Added elemental methods to EZRegex, and added flag methods
smartycope committed Apr 21, 2024
1 parent 6e138e7 commit 936a86c
Showing 7 changed files with 309 additions and 28 deletions.
32 changes: 30 additions & 2 deletions README.md
@@ -38,21 +38,26 @@ TLDR: This is to regular expressions what CMake is to makefiles
* [Usage](#usage)
* [Invert](#inverting)
* [Generate](#generation)
* [Elements and Methods](#elements-and-methods)
* [Dialects](#dialects)
* [Documentation](#documentation)
* [Developer Docs](#developer-documentation)
* [Installation](#installation)
* [Todo](#todo)
* [License](#license)
* [Credits](#credits)

## Usage

Quickstart
```python
from ezregex import *
'foo' + number + optional(whitespace) + word
# Or, using methods
number.append(whitespace.optional).prepend('foo').append(word)
# Matches `foo123abc` and `foo123 abc`
# but not `abc123foo` or `foo bar`

```
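
To make the quickstart claim concrete, here is a rough plain-`re` equivalent. The hand-written pattern `foo\d+\s?\w+` is an assumption about what the expression compiles to (it assumes `number`, `whitespace`, and `word` behave like `\d+`, `\s`, and `\w+`); it is a sketch for checking the four sample strings, not EZRegex's actual output.

```python
import re

# Assumed hand-written equivalent of: 'foo' + number + optional(whitespace) + word
pattern = re.compile(r'foo\d+\s?\w+')

samples = ['foo123abc', 'foo123 abc', 'abc123foo', 'foo bar']
# search() is used so the pattern may match anywhere in the string
results = {s: bool(pattern.search(s)) for s in samples}

for sample, matched in results.items():
    print(f'{sample!r}: {"match" if matched else "no match"}')
```

The first two samples match and the last two do not, which agrees with the comments in the quickstart.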

Importing as a named package is recommended
@@ -126,11 +131,23 @@ The `invert` function (available as er.invert(`expression`), `expression`.invert


## Generation
In version 1.7.0 we introduced a new function: `generate_regex`. It takes in 2 sets of strings, and returns a regular expression that will match everything in the first set and nothing in the second set. It may be a bit crude, but it can be a good starting point if you don't know where to start. It's also really good at [regex golf](http://regex.alf.nu/).
In version v1.7.0 we introduced a new function: `generate_regex`. It takes in 2 sets of strings, and returns a regular expression that will match everything in the first set and nothing in the second set. It may be a bit crude, but it can be a good starting point if you don't know where to start. It's also really good at [regex golf](http://regex.alf.nu/).

## Elements and Methods
As of v2.1.0, EZRegex objects have *elemental methods* in addition to the basic elements. These shadow their element counterparts exactly and work the same way; they exist purely for convenience and preference.

For example, these are all equivalent:
```python
# Element functions
optional(whitespace) + group(either(repeat('a'), 'b')) + if_followed_by(word)
# Elemental methods
whitespace.optional.append(literal('a').repeat.or_('b').unnamed).if_followed_by(word)
# Mixed
whitespace.optional + repeat('a').or_('b').unnamed + if_followed_by(word)
```
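
The idea of a method that shadows a function can be sketched in a few lines. This is a toy illustration only: `Pattern`, `as_regex`, and the simplified `optional` and `either` below are hypothetical stand-ins, far simpler than EZRegex's real elements, and the regex they emit is an assumption made for the example.

```python
import re


class Pattern:
    """Hypothetical stand-in for an EZRegex-like object."""

    def __init__(self, regex):
        self.regex = regex

    def __add__(self, other):
        return Pattern(self.regex + as_regex(other))

    def __radd__(self, other):
        return Pattern(as_regex(other) + self.regex)

    # The method forms simply delegate to the function forms below
    @property
    def optional(self):
        return optional(self)

    def or_(self, other):
        return either(self, other)


def as_regex(thing):
    """Escape plain strings; unwrap Pattern objects."""
    return thing.regex if isinstance(thing, Pattern) else re.escape(thing)


def optional(thing):
    return Pattern(f'(?:{as_regex(thing)})?')


def either(a, b):
    return Pattern(f'(?:{as_regex(a)}|{as_regex(b)})')


whitespace = Pattern(r'\s')

function_style = optional(whitespace) + either('a', 'b')
method_style = whitespace.optional + Pattern('a').or_('b')
print(function_style.regex == method_style.regex)
```

Because each method only forwards to its function counterpart, both spellings build the same pattern, which is the property the README describes.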

## Dialects
As of version 1.6.0, the concepts of *dialects* was introduced. Different languages often have slight variations on the regular expression syntax. As this library is meant to be language independent (even though it's written in Python), you should be able to compile regular expressions to work with other languages as well. To do that, you can simply import all the elements as a sub-package, and they should work identically, although some languages may not have the same features as others.
As of version v1.6.0, the concepts of *dialects* was introduced. Different languages often have slight variations on the regular expression syntax. As this library is meant to be language independent (even though it's written in Python), you should be able to compile regular expressions to work with other languages as well. To do that, you can simply import all the elements as a sub-package, and they should work identically, although some languages may not have the same features as others.
```python
>>> import ezregex as er # The Python dialect is the default dialect
>>> er.group(digit, 'name') + er.earlier_group('name')
@@ -1069,3 +1086,14 @@ See the [GitHub Issue Page](https://github.com/smartycope/ezregex/issues)

## License
EZRegex is distributed under the [MIT License](https://choosealicense.com/licenses/mit)

## Credits
This library was written from scratch entirely by Copeland Carter.
Inspirations for this project include:

- [PyParsing](https://github.com/pyparsing/pyparsing)
- I stole a bunch of the operators (especially the [] operator) from them, though we happened upon the same basic structure independently (convergent evolution, anyone?)
- [regular-expressions.info](https://www.regular-expressions.info/refflavors.html)
- Their reference is where I got a lot of the other regex flavors
- [human-regex](https://github.com/fleetingbytes/human-regex)
- Gave me the idea for including element methods, instead of solely element functions
134 changes: 127 additions & 7 deletions ezregex/EZRegex.py
@@ -7,6 +7,8 @@
from .generate import *
from .invert import invert

from .base import base, psuedonymns

# TODO: Separate EZRegex into a "bytes" mode vs "string" mode
# TODO: consider changing add_flags to "outer" or "end" or something
# TODO: Seriously consider removing the debug functions
@@ -23,7 +25,7 @@ def __init__(self, definition, *, sanatize=True, replacement=False, flags=''):
still work from the user's end
"""
# Set attributes like this so the class can remain "immutable", while still being usable
self.__setattr__('flags', flags, True)
self.__setattr__('_flags', flags, True)
self.__setattr__('_sanatize', sanatize, True)
self.__setattr__('replacement', replacement, True)

@@ -78,7 +80,7 @@ def _compile(self, add_flags=True):
if add_flags:
regex = self._beginning + regex + self._end

if len(self.flags):
if len(self._flags):
regex = self._flag_func(regex)
return regex

@@ -90,10 +92,13 @@ def _copy(self, definition=..., sanatize=..., replacement=..., flags=...):
if replacement is Ellipsis:
replacement = self.replacement
if flags is Ellipsis:
flags = self.flags
flags = self._flags

return type(self)(definition, sanatize=sanatize, replacement=replacement, flags=flags)

def _base(self, element, /, *args, **kwargs):
""" Constructs the specified base element and returns the result of calling it with any additional arguments """
return type(self)(**base[element])(*args, **kwargs)

# Regular functions
def str(self):
@@ -195,6 +200,118 @@ def inverse(self, amt=1, **kwargs):
def invert(self, amt=1, **kwargs):
return self.inverse(amt, **kwargs)

# Elemental functions
def group(self, name=None):
return self._base('group', self, name=name)

def named(self, name):
return self.group(name)

@property
def unnamed(self):
return self.group()

def not_preceded_by(self, input):
return self._base('if_not_preceded_by', self, input)

def preceded_by(self, input):
return self._base('if_preceded_by', self, input)

def not_proceded_by(self, input):
return self._base('if_not_proceeded_by', self, input)

def proceded_by(self, input):
return self._base('if_proceeded_by', self, input)

def enclosed_with(self, open, closed=None):
return self._base('if_enclosed_with', self, open, closed)

@property
def optional(self):
return self._base('optional', self)

@property
def repeat(self):
return self._base('repeat', self)

@property
def exactly(self):
return self._base('is_exactly', self)

def at_least(self, min):
return self._base('at_least', min, self)

def more_than(self, min):
return self._base('more_than', min, self)

def amt(self, amt):
return self._base('amt', amt, self)

def at_most(self, max):
return self._base('at_most', max, self)

def between(self, min, max, greedy=True, possessive=False):
return self._base('between', min, max, self, greedy=greedy, possessive=possessive)

def at_least_one(self, greedy=True, possessive=False):
return self._base('at_least_one', self, greedy=greedy, possessive=possessive)

def at_least_none(self, greedy=True, possessive=False):
return self._base('at_least_none', self, greedy=greedy, possessive=possessive)

def or_(self, input):
return self._base('either', self, input)

@property
def ASCII(self):
return self.set_flags('a')

@property
def IGNORECASE(self):
return self.set_flags('i')

@property
def DOTALL(self):
return self.set_flags('s')

@property
def LOCALE(self):
return self.set_flags('L')

@property
def MULTILINE(self):
return self.set_flags('m')

@property
def UNICODE(self):
return self.set_flags('u')


# Named operator functions
def append(self, input):
return self + input

def prepend(self, input):
return input + self

# Flag functions
@property
def flags(self):
return self._flags

def set_flags(self, to):
return self._copy(flags=to)

def add_flag(self, flag):
if flag not in self._flags:
return self._copy(flags=self._flags + flag)
return self

def remove_flag(self, flag):
if flag in self._flags:
return self._copy(flags=self._flags.replace(flag, ''))
return self

# Magic Functions
def __call__(self, *args, **kwargs):
""" This should be called by the user to specify the specific parameters of this instance i.e. anyof('a', 'b') """
@@ -250,14 +367,14 @@ def __add__(self, thing):
self._funcList + [partial(lambda cur=...: cur + self._sanitizeInput(thing))],
sanatize=self._sanatize or thing._sanatize if isinstance(thing, EZRegex) else self._sanatize,
replacement=self.replacement or thing.replacement if isinstance(thing, EZRegex) else self.replacement,
flags=(self.flags + thing.flags) if isinstance(thing, EZRegex) else self.flags
flags=(self._flags + thing.flags) if isinstance(thing, EZRegex) else self._flags
)

def __radd__(self, thing):
return self._copy([partial(lambda cur=...: self._sanitizeInput(thing) + cur)] + self._funcList,
sanatize=self._sanatize or thing._sanatize if isinstance(thing, EZRegex) else self._sanatize,
replacement=self.replacement or thing.replacement if isinstance(thing, EZRegex) else self.replacement,
flags=(self.flags + thing.flags) if isinstance(thing, EZRegex) else self.flags
flags=(self._flags + thing.flags) if isinstance(thing, EZRegex) else self._flags
)

def __iadd__(self, thing):
@@ -406,5 +523,8 @@ def __setattr__(self, name, value, ignore=False):
else:
raise TypeError('EZRegex objects are immutable')

def __delattr__(self, *args):
raise TypeError('EZRegex objects are immutable')
def __delattr__(self, name, ignore=False):
if ignore:
del self.__dict__[name]
else:
raise TypeError('EZRegex objects are immutable')
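
The flag methods and the `ignore` escape hatch above follow a common pattern for immutable objects: attribute writes are blocked unless explicitly allowed, and "modifying" methods return a new object. Below is a minimal, self-contained sketch of that pattern. `Frozen` is a hypothetical class, and the body of its `if ignore:` branch in `__setattr__` is an assumption, since the diff only shows the `else` branch.

```python
class Frozen:
    """Hypothetical minimal class mirroring the immutability pattern in the diff."""

    def __init__(self, flags=''):
        # Passing ignore=True is the only sanctioned way to write an attribute
        self.__setattr__('_flags', flags, True)

    def __setattr__(self, name, value, ignore=False):
        if ignore:
            # Assumed implementation of the write branch
            self.__dict__[name] = value
        else:
            raise TypeError('Frozen objects are immutable')

    def __delattr__(self, name, ignore=False):
        if ignore:
            del self.__dict__[name]
        else:
            raise TypeError('Frozen objects are immutable')

    @property
    def flags(self):
        return self._flags

    def add_flag(self, flag):
        # Return a new object rather than mutating this one
        if flag not in self._flags:
            return type(self)(self._flags + flag)
        return self

    def remove_flag(self, flag):
        if flag in self._flags:
            return type(self)(self._flags.replace(flag, ''))
        return self


a = Frozen('i')
b = a.add_flag('m')
print(a.flags, b.flags)
```

Ordinary assignment such as `a.flags = 'x'` raises `TypeError`, while `add_flag` and `remove_flag` leave the original object untouched.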
50 changes: 49 additions & 1 deletion ezregex/EZRegex.pyi
@@ -1,7 +1,7 @@
import re
import sys
from functools import partial
from typing import Any, Callable, Iterator
from typing import Any, Callable, Iterator, Self

from mypy_extensions import DefaultNamedArg, VarArg

@@ -41,6 +41,54 @@ class EZRegex:
""" "Inverts" the current Regex expression to give an example of a string it would match.
Useful for debugging purposes. """

# Elemental functions
def group(self, name:str|None=None) -> EZRegex: ...
def named(self, name:str) -> EZRegex: ...
@property
def unnamed(self) -> EZRegex: ...
def not_preceded_by(self, input:InputType) -> EZRegex: ...
def preceded_by(self, input:InputType) -> EZRegex: ...
def not_proceded_by(self, input:InputType) -> EZRegex: ...
def proceded_by(self, input:InputType) -> EZRegex: ...
def enclosed_with(self, open:str, closed:str|None=None) -> EZRegex: ...
@property
def optional(self) -> EZRegex: ...
@property
def repeat(self) -> EZRegex: ...
@property
def exactly(self) -> EZRegex: ...
def at_least(self, min:int) -> EZRegex: ...
def more_than(self, min:int) -> EZRegex: ...
def amt(self, amt:int) -> EZRegex: ...
def at_most(self, max:int) -> EZRegex: ...
def between(self, min:int, max:int, greedy:bool=True, possessive:bool=False) -> EZRegex: ...
def at_least_one(self, greedy:bool=True, possessive:bool=False) -> EZRegex: ...
def at_least_none(self, greedy:bool=True, possessive:bool=False) -> EZRegex: ...
def or_(self, input:InputType) -> EZRegex: ...
@property
def ASCII(self) -> EZRegex: ...
@property
def IGNORECASE(self) -> EZRegex: ...
@property
def DOTALL(self) -> EZRegex: ...
@property
def LOCALE(self) -> EZRegex: ...
@property
def MULTILINE(self) -> EZRegex: ...
@property
def UNICODE(self) -> EZRegex: ...

# Named operator functions
def append(self, input:InputType) -> EZRegex: ...
def prepend(self, input:InputType) -> EZRegex: ...

# Flag functions
@property
def flags(self) -> str: ...
def set_flags(self, to:str) -> EZRegex: ...
def add_flag(self, flag:str) -> EZRegex: ...
def remove_flag(self, flag:str) -> EZRegex: ...

# Magic Functions
def __call__(self, *args, **kwargs) -> EZRegex | str:
""" This should be called by the user to specify the specific parameters of this instance i.e. anyof('a', 'b') """
27 changes: 13 additions & 14 deletions ezregex/base/elements.py
@@ -262,12 +262,12 @@ def if_exists(num_or_name, does, doesnt, cur=...):
'unicode': {'definition': lambda name, cur=...: fr'\N{name}'},

# Amounts
'match_max': {'definition': match_max},
'match_num': {'definition': match_num},
'match_more_than': {'definition': match_more_than},
'match_at_least': {'definition': match_at_least},
'match_at_most': {'definition': match_at_most},
'match_range': {'definition': match_range},
'repeat': {'definition': match_max},
'amt': {'definition': match_num},
'more_than': {'definition': match_more_than},
'at_least': {'definition': match_at_least},
'at_most': {'definition': match_at_most},
'between': {'definition': match_range},
'at_least_one': {'definition': at_least_one},
'at_least_none': {'definition': at_least_none},

@@ -317,12 +317,12 @@ def if_exists(num_or_name, does, doesnt, cur=...):


psuedonymns = {
'match_max': ('matchMax',),
'match_at_most': ('matchAtMost', 'atMost', 'at_most',),
'match_num': ('matchNum', 'matchAmt', 'match_amt', 'amt', 'num',),
'match_range': ('matchRange',),
'match_more_than': ('matchMoreThan', 'match_greater_than', 'matchGreaterThan', 'moreThan', 'more_than',),
'match_at_least': ('matchAtLeast', 'match_min', 'matchMin', 'atLeast', 'at_least',),
'repeat': ('matchMax', 'match_max'),
'at_most': ('matchAtMost', 'atMost', 'match_at_most',),
'amt': ('matchNum', 'matchAmt', 'match_amt', 'match_num', 'num',),
'between': ('matchRange', 'match_range'),
'more_than': ('matchMoreThan', 'match_greater_than', 'matchGreaterThan', 'moreThan', 'match_more_than',),
'at_least': ('matchAtLeast', 'match_min', 'matchMin', 'atLeast', 'match_at_least',),
'line_starts_with': ('lineStartsWith', 'line_start', 'lineStart',),
'string_starts_with': ('stringStartsWith', 'string_start', 'stringStart',),
'line_ends_with': ('lineEndsWith', 'line_end', 'lineEnd',),
@@ -335,7 +335,6 @@ def if_exists(num_or_name, does, doesnt, cur=...):
'letter': ('alpha',),
'alpha_num': ('alphanum' , 'alpha_num',),
'whitechunk': ('white' ,),
'at_least_none': ('anyAmt', 'any_amt', 'zeroOrMore', 'zero_or_more',),
'any_between': ('anyBetween',),
'word_char': ('wordChar',),
'hex_digit': ('hexDigit', 'hex',),
@@ -366,7 +365,7 @@ def if_exists(num_or_name, does, doesnt, cur=...):
'is_exactly': ('exactly', 'isExactly',),
'optional': ('oneOrNone', 'one_or_none', 'opt',),
'at_least_one': ('oneOrMore', 'one_or_more', 'atLeastOne', 'atLeast1', 'at_least_1',),
'at_least_none': ('noneOrMore', 'none_or_more', 'atLeastNone', 'at_least_0', 'atLeast0',),
'at_least_none': ('noneOrMore', 'none_or_more', 'atLeastNone', 'at_least_0', 'atLeast0', 'anyAmt', 'any_amt', 'zeroOrMore', 'zero_or_more',),
'ASCII': ('ascii', 'a',),
'DOTALL': ('dotall', 's',),
'IGNORECASE': ('ignorecase', 'i', 'ignoreCase', 'ignore_case',),
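
The diff does not show how the `psuedonymns` table is consumed, so the following is only a sketch of one way a canonical-name table and an alias table could be combined. The miniature `base` and `psuedonymns` dicts, their lambda definitions, and the `expand_aliases` helper are all hypothetical.

```python
# Hypothetical miniature versions of the two tables; the real ones are much larger
base = {
    'repeat': {'definition': lambda pattern: f'(?:{pattern})+'},
    'amt': {'definition': lambda num, pattern: f'(?:{pattern}){{{num}}}'},
}
psuedonymns = {
    'repeat': ('matchMax', 'match_max'),
    'amt': ('matchNum', 'matchAmt', 'match_amt', 'match_num', 'num'),
}


def expand_aliases(elements, aliases):
    """Map every canonical name and every alias to the same definition."""
    namespace = {name: spec['definition'] for name, spec in elements.items()}
    for canonical, names in aliases.items():
        for alias in names:
            namespace[alias] = namespace[canonical]
    return namespace


ns = expand_aliases(base, psuedonymns)
print(ns['match_num'](3, 'a'))
```

Keeping the old `match_*` names as aliases of the new canonical names is what lets a rename like this one stay backwards compatible.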
8 changes: 4 additions & 4 deletions ezregex/base/interface.py
@@ -92,19 +92,19 @@ def match_num(num: int, input: InputType) -> EZRegex:
"Match `num` amount of `input` in the string"
...

def match_more_than(min: int, input: InputType) -> EZRegex:
def more_than(min: int, input: InputType) -> EZRegex:
"Match more than `min` sequences of `input` in the string"
...

def match_at_least(min:int, input:InputType) -> EZRegex:
def at_least(min:int, input:InputType) -> EZRegex:
"Match at least `min` sequences of `input` in the string"
...

def match_at_most(max:int, input:InputType) -> EZRegex:
def at_most(max:int, input:InputType) -> EZRegex:
"Match at most `max` instances of `input` in the string"
...

def match_range(min:int, max:int, input:InputType, greedy:bool=True, possessive:bool=False) -> EZRegex:
def between(min:int, max:int, input:InputType, greedy:bool=True, possessive:bool=False) -> EZRegex:
""" Match between `min` and `max` sequences of `input` in the string. This also accepts `greedy` and `possessive` parameters
Max can be an empty string to indicate no maximum
`greedy` means it will try to match as many repetitions as possible
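
The effect of a greedy versus non-greedy range can be seen with the standard `re` module. This is plain `re`, not EZRegex code, and how `between` renders its `greedy` parameter internally is an assumption; the sketch only illustrates the underlying quantifier behaviour.

```python
import re

text = 'aaaaa'

# Greedy: takes as many repetitions as the range allows
greedy = re.match(r'a{2,4}', text).group()

# Lazy (non-greedy): stops as soon as the minimum is satisfied
lazy = re.match(r'a{2,4}?', text).group()

print(greedy, lazy)
```

With five `a` characters available, the greedy form consumes four and the lazy form consumes two.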