Unmatched Group issue - workaround #43640
Comments
Using sre.sub[n], an "unmatched group" error can occur. The test I used is this pattern:
sre.sub("foo(?:b(ar)|baz)","\\1","foobaz")
This will cause the following backtrace to occur:
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "lib/python2.4/sre.py", line 142, in sub
return _compile(pattern, 0).sub(repl, string, count)
File "lib/python2.4/sre.py", line 260, in filter
return sre_parse.expand_template(template, match)
File "lib/python2.4/sre_parse.py", line 782, in expand_template
raise error, "unmatched group"
sre_constants.error: unmatched group
Python Version 2.4.3, Mac OS X (behaviour has been verified on
This behaviour, while by design, is unwanted because this type of
The example that I was trying resembles the following:
sre.sub("User: (?:Registered User #(\d+)|Guest)","%USERID|\1%",data)
The intended behaviour is that the function returns "" when the user is
However, when this function encounters a Guest, it raises an exception
Perl and other regex engines behave as I have described, substituting |
The current behavior also makes the "sub" function useless when you need to backreference a group that might not capture, since you have no chance to deal with the exception. |
AFAIK the findall function works as desired in this respect: empty matches will return empty strings. |
This is still a problem which has just given me a headache, because |
Hi All, I found a workaround for the re.sub method so it does not raise an This is the nutshell: When doing a search and replace with sub, replace the group represented If there’s nothing matched by this group the empty subexpression A complete description is in my post: Regards, Gerard. |
Looking at your code example, that solution seems quite obvious now, and |
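A minimal sketch of one reading of the workaround described above, assuming it means giving the group an empty alternative instead of making the group optional. The pattern and the name `extract_id` are hypothetical and used only for illustration.

```python
import re

# Assumed form of the workaround: instead of an optional group "(\d+)?",
# use a group with an empty alternative "(\d+|)". The group then always
# participates in the match and captures '' when no digits are present,
# so the "\1" backreference in the template never hits an unmatched group.
PATTERN = re.compile(r'id=(\d+|);')

def extract_id(text):
    """Replace 'id=<digits>;' with '[<digits>]', tolerating missing digits."""
    return PATTERN.sub(r'[\1]', text)

print(extract_id('id=42;'))
print(extract_id('id=;'))
```

This only helps when the group can be rewritten to match the empty string in place. It does not help when the capturing group sits on one branch of an alternation, as in "foo(?:b(ar)|baz)".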
How would I apply that workaround to my example? re.sub("foo(?:b(ar)|baz)","\\1","foobaz") |
Dear Bobby, I don't see what would be the part that generates the empty string? Regards, Gerard. |
Well, in this example the group (ar) is unmatched, so sre throws the
A better example is probably
The correct behaviour, as I have observed in other regex
implementations, is to replace the group by the empty string; for
example, in Javascript:
>>> 'foobar'.replace(/foo(?:b(ar)|baz)/,'$1')
"ar"
>>> 'foobaz'.replace(/foo(?:b(ar)|baz)/,'$1')
"" |
Bobby, Can you post the actual text you need this for? The back ref indeed Semantically speaking ... If there's a "b" then return the "ar", because Kind regards, Gerard. |
It was so long ago, I've since redone half my codebase (the hack is Sorry about that. |
This has been addressed in issue bpo-2636. |
Matthew, Thanx for the heads-up! Regards, Gerard. |
If I understand "This has been addressed in issue bpo-2636.", this issue should be closed as, perhaps, out-of-date or duplicate, with 2636 as superseder. Correct? |
Issue bpo-2636 resulted in the new regex module (also available on PyPI), so this issue is addressed by that, but there's no patch for the re module. |
It would be nice if you could port 'pieces' of bpo-2636 to Python, in order to fix this and other bugs (and possibly add more features too). |
I'm having the same issue as the original author of this issue was. The workaround does not apply to the situation where the captured text is on one side of an "or" grouping, rather than just being optional. I'm trying to remove groups of text in parentheses that come at the end of a string, but if the content in a pair of parentheses is a number, I want to retain it. My regular expression looks like so:
These work:
>>> re.sub(r'(?:\((?:(\d+)|.*?)\)\s*)+$','\\1','avatar (2009)')
'avatar 2009'
>>> re.sub(r'(?:\((?:(\d+)|.*?)\)\s*)+$','\\1','avatar (2009) (special edition)')
'avatar 2009'
This doesn't:
>>> re.sub(r'(?:\((?:(\d+)|.*?)\)\s*)+$','\\1','avatar (special Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.6/re.py", line 151, in sub
return _compile(pattern, 0).sub(repl, string, count)
File "/usr/lib/python2.6/re.py", line 278, in filter
return sre_parse.expand_template(template, match)
File "/usr/lib/python2.6/sre_parse.py", line 793, in expand_template
raise error, "unmatched group"
sre_constants.error: unmatched groupedition)')
Is there some way I can apply this workaround to this situation? |
Sorry, the non-working command should look as follows: re.sub(r'(?:\((?:(\d+)|.*?)\)\s*)+$','\\1','avatar (special edition)') |
The replacement can be a callable, so you could do this: re.sub(r'(?:\((?:(\d+)|.*?)\)\s*)+$', lambda m: m.group(1) or '', 'avatar (special edition)') |
Perfect; thank you! |
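The callable replacement suggested above can be wrapped so it is reusable. This is only a sketch: `group_or_empty` and `strip_trailing_parens` are hypothetical names, and the pattern is the one quoted earlier in this thread.

```python
import re

# Pattern from the thread: trailing parenthesised groups, capturing a
# purely numeric one in group 1.
PATTERN = re.compile(r'(?:\((?:(\d+)|.*?)\)\s*)+$')

def group_or_empty(match, index=1):
    """Return the text of group `index`, or '' if it did not participate."""
    # match.group() returns None for a group that did not take part in the match.
    return match.group(index) or ''

def strip_trailing_parens(title):
    """Drop trailing parenthesised text, keeping a number if one was captured."""
    # Passing a function as the replacement avoids template expansion,
    # so no "unmatched group" error can be raised.
    return PATTERN.sub(group_or_empty, title)

print(repr(strip_trailing_parens('avatar (2009)')))
print(repr(strip_trailing_parens('avatar (special edition)')))
```

Because the function receives the match object, it can also substitute a default other than the empty string when the group is missing.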
Here is a patch which makes unmatched groups be replaced by an empty string. These changes look more like a new feature than a bug fix and therefore can be applied only to 3.5. |
New changeset bd2f1ea04025 by Serhiy Storchaka in branch 'default': |
Thank you for your review Antoine. |
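A short sketch of what the change described above means for code that must run on several interpreter versions, assuming the patch shipped in 3.5 as stated. `expand_first_group` is a hypothetical helper name.

```python
import re
import sys

def expand_first_group(pattern, text):
    """Replace each match with group 1, treating an unmatched group as ''."""
    if sys.version_info >= (3, 5):
        # From 3.5 on, an unmatched group in the template expands to ''.
        return re.sub(pattern, r'\1', text)
    # Older interpreters raise "unmatched group", so fall back to a callable.
    return re.sub(pattern, lambda m: m.group(1) or '', text)

print(repr(expand_first_group(r'foo(?:b(ar)|baz)', 'foobar')))
print(repr(expand_first_group(r'foo(?:b(ar)|baz)', 'foobaz')))
```

On either branch the second call yields an empty string, which matches the JavaScript behaviour quoted earlier in the thread.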