re module: number of named groups is limited to 100 max #66627
While writing a lexer for JavaScript, I hit the limit on the number of named groups in a single regexp: it is 100. The check is in the compile() function of sre_compile.py, and there is even an XXX comment on it. Unfortunately, I'm not an expert in this module, so I'm not sure whether this check can be lifted, or at least whether the number can be bumped to 200 or 500 (why is it 100, by the way?). Please share your thoughts.
It is 100 to avoid a syntactic ambiguity between numbered groups and octal numbers, if I remember correctly. I can't remember whether that constraint still applies in Python 3, where octal notation was made stricter in general.
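The ambiguity mentioned above can be seen with a minimal sketch like the following, assuming a reasonably recent CPython 3 `re` module. The variable names are only illustrative.

```python
import re

# One or two digits after the backslash: a backreference to a numbered group.
backref = re.fullmatch(r"(a)\1", "aa")

# Three octal digits after the backslash: an octal character escape.
# Octal 141 is the character "a".
octal_a = re.fullmatch(r"\141", "a")

# \100 is therefore read as octal 100 (the character "@"),
# not as a reference to group number 100.
octal_at = re.fullmatch(r"\100", "@")

print(bool(backref), bool(octal_a), bool(octal_at))
```

This is why a three-digit `\N` form cannot simply be reused for groups numbered 100 and above.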
In the regex module, I borrowed the \g<...> escape from .sub's replacement string to provide an alternative way to refer to a group in a pattern, and that let me remove the limit.
There are two reasons for this limitation. The first is the one David mentioned: there is no syntax to backreference a group with a number greater than 99 (although there is such a syntax for conditional groups and for substitutions). The second is that the current implementation of the regexp engine uses a constant-size array for groups. Here is a patch which increases the static limit to 1000 groups. It also allows specifying an arbitrary group number in the form "(?P=number)", which is consistent with the syntax for conditional groups and for substitutions.
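As a rough illustration of the two places where a three-digit group number can already be written (conditional groups and substitution templates), here is a small sketch. The pattern itself is made up for the example, and it assumes a CPython 3 `re` module.

```python
import re

# 99 single-character groups, then an optional group 100, then a conditional
# that tests whether group 100 took part in the match.
pattern = re.compile("(.)" * 99 + "(!)?" + "(?(100)yes|no)")

with_bang = pattern.fullmatch("a" * 99 + "!yes")
without_bang = pattern.fullmatch("a" * 99 + "no")

# In a replacement template, \g<100> refers to group 100 unambiguously.
replaced = pattern.sub(r"[\g<100>]", "a" * 99 + "!yes")

print(bool(with_bang), bool(without_bang), replaced)
```

Neither form collides with octal escapes, because the group number is delimited by parentheses or angle brackets rather than following a bare backslash.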
Serhiy, this is awesome! Is it possible to split the patch in two, and commit the one that just increases the group limit to 3.4 as well? Thank you.
This is definitely not a bug fix. Maybe Matthew will commit it to the regex
Here is a patch which removes the static limit. It is much more complicated than the first patch, and I would prefer to apply the first patch first. Aren't 1000 groups enough for everyone?
I'm fine with either one, Serhiy. The static one looks good to me.
New changeset 0b85ea4bd1af by Serhiy Storchaka in branch 'default':
Thank you, Antoine, for your review. To avoid a discrepancy between re and regex (and other engines), I have committed only part of the dynamic patch, without adding support for backreferences with an index over 99. A hand-written regular expression is unlikely to reach this limit, and in a generated regular expression you can use named groups. I found that backreference syntax is one of the most inconsistent things in regular expressions. There are at least 8 different variants (\N, \gN, \g<N>, \g{N}, \k<N>, \k'N', \k{N}, (?P=N)), and \g<N> has a different meaning in Perl.
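A generated pattern that relies on named groups might look like the sketch below. The token table (`TOK0` to `TOK149`, matching `kw0` to `kw149`) is purely hypothetical, and the code assumes a Python version that includes the change above; older versions would reject the first pattern because it has more than 100 groups.

```python
import re

# Hypothetical token table: 150 keyword-like tokens named TOK0..TOK149.
token_names = [f"TOK{i}" for i in range(150)]
lexer = re.compile(
    "|".join(rf"(?P<{name}>kw{i}\b)" for i, name in enumerate(token_names))
)

# lastgroup reports which named alternative matched.
match = lexer.match("kw120 rest")
print(lexer.groups, match.lastgroup)

# A group past index 99 can still be backreferenced by name with (?P=name).
doubled_tail = re.compile(
    "".join(f"(?P<g{i}>.)" for i in range(1, 121)) + "(?P=g120)"
)
print(bool(doubled_tail.fullmatch("x" * 119 + "yy")))
print(bool(doubled_tail.fullmatch("x" * 119 + "yz")))
```

Since every group is addressed by name, the generated pattern never needs the numeric `\N` form at all.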