Python RegExp Builder

This is a library that generates regular expressions given what Unicode codepoints should be matched. It is able to generate RegExp using surrogate pairs, hence supporting codepoints outside the BMP (i.e., those above U+FFFF) in narrow Python builds.

Python 2 is currently required; Python 3 support is soon forthcoming.

`builder` module

This contains, well, everything. It exports two functions: builder and sequence_builder.

builder takes an iterable containing either (or a mixture) of codepoints (given either by a number type or as a unicode string) or two-tuples of codepoints, where (x, y) represents a range from U+x to U+y inclusive.

For example:

>>> builder([0x61])
u'a'
>>> builder([(0x61, 0x7A)])
u'[a-z]'
>>> builder([(0x61, 0x7A), (0x41, 0x5A)])
u'[A-Za-z]'
>>> builder([(u"a", u"z"), (0x41, 0x5A)])
u'[A-Za-z]'
>>> builder([(u"a", 0x7A), (0x41, 0x5A)])
u'[A-Za-z]'

enumerable_builder takes an iterable and enumerates it (through the built-in enumerate(), hence requirements that places hold) treating the nth value enumerated as U+n and allows it if the value is truthful.

For example:

>>> enumerable_builder([False, False, True])
u'\\\x02'
>>> enumerable_builder([True, True, True, True])
u'[\x00-\x03]'

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
README.md		README.md
builder.py		builder.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Python RegExp Builder

`builder` module

About

Uh oh!

Releases

Packages

Languages

gsnedders/py-regexp-builder

Folders and files

Latest commit

History

Repository files navigation

Python RegExp Builder

builder module

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

`builder` module

Packages