2.3a1 computes lastindex incorrectly #37819
Comments
In Python 2.[012], the code

import re
exp = re.compile("(?P<NCName>[a-zA-Z_](\w|[_.-])*)")
print exp.match("namespace").lastgroup

prints "NCName". In Python 2.3a1, it prints "None". The |
Logged In: YES I believe the discrepancy was deliberately introduced in revision 2.84 of _sre.c. I agree with you that lastindex should return the index of the matching group with the rightmost closing parenthesis (perhaps some elaboration in the docs is also in order). If this is the correct interpretation, two places need to be patched: 1) the handling of SRE_OP_MARK needs to be reverted to the 2.22 code and 2) the code in the lastmark_restore function needs to be tweaked so that lastindex is not accidentally set to the last matched group entered.

Thinking further though, given a (contrived) pattern like this:

re.match('((x))y(?:(a)b|ac)', 'xyac')

what should lastindex be? I assume 1, given the definition above (lastindex = matching group with rightmost close parens). In 2.22 it is 3, since group 3 matched before the branch failed at the 'b'. In 2.3a1 it is 2, since lastindex is restored (after the branch fails) using the saved lastmark.

Anyway, if it should be 1, then I think _sre.c will have to save lastindex as well as lastmark when processing the three opcodes which may end up calling lastmark_restore. |
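The "rightmost closing parenthesis" reading proposed in this comment can be probed directly. The sketch below assumes a current CPython 3 interpreter, not the 2.x releases discussed in this thread. The first four patterns and their expected values are the ones given in the `re` documentation for `lastindex`; the last pattern is the contrived one from this comment, printed without a claimed result for `lastindex`.

```python
import re

# Patterns and expected lastindex values as listed in the re docs,
# each applied to the string "ab".
cases = [
    (r"(a)b", 1),
    (r"((a)(b))", 1),   # group 1 closes last, so it wins over groups 2 and 3
    (r"((ab))", 1),
    (r"(a)(b)", 2),
]
for pattern, expected in cases:
    m = re.match(pattern, "ab")
    print(pattern, "lastindex =", m.lastindex, "documented =", expected)

# The contrived pattern from this comment: group 3 matches 'a' inside the
# first alternative, which then fails at 'b', so group 3 ends up unset.
m = re.match(r"((x))y(?:(a)b|ac)", "xyac")
print(m.groups())     # ('x', 'x', None)
print(m.lastindex)    # compare against the values debated in this thread
```

Running the last two lines on different interpreter versions is a quick way to see which of the interpretations discussed here a given release implements.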
Logged In: YES Assigning to Gustavo, since he wrote 2.84. Gustavo, can you |
Logged In: YES Martin, the lastgroup/lastindex handling was quite broken in
>>> import re
>>> exp = re.compile("(?P<NCName>[a-zA-Z_](\w|[_.-])*)")
>>> match = exp.match("namespace")
>>> match.groups()
('namespace', 'e')
>>> match.groupdict()
{'NCName': 'namespace'}
This has the same result in any python you execute. This About the None result, that's also correct. In the example, lastgroup In the case above, the group didn't have a name. If we check Greg, your example is correctly showing one of the bugs in
>>> re.match('((x))y(?:(a)b|ac)', 'xyac').groups()
('x', 'x', None)
>>> re.match('((x))y(?:(a)b|ac)', 'xyac').lastindex
2
(notice that groups always start at 1, as group 0 is the Martin, if you agree, please close the bug. If you have any |
Logged In: YES Gustavo, I agree that the numbering of groups is and should I also agree that *if* lastindex is 2, lastgroup should be None. I still think this value is incorrect, though. It is It is illogical because group 1 ends *after* group 2 ends, It is unhelpful because one of the primary purposes of It is incompatible because earlier Python versions behaved |
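To illustrate why the value of `lastgroup` matters when a named group contains a nested unnamed one, here is a minimal tokenizer sketch in the style of the named-alternatives idiom. It is an illustration only, not the Scanner code mentioned in this thread: the `Number` and `Space` token names and the `tokenize` helper are hypothetical, and the code assumes a current CPython 3.

```python
import re

# NCName reuses the pattern from this report (it contains a nested,
# unnamed group). Number and Space are hypothetical additions.
TOKEN_RE = re.compile(
    r"(?P<NCName>[a-zA-Z_](\w|[_.-])*)"
    r"|(?P<Number>\d+)"
    r"|(?P<Space>\s+)"
)

def tokenize(text):
    """Yield (kind, value) pairs, using lastgroup to name the alternative."""
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        if m.lastgroup != "Space":
            yield m.lastgroup, m.group()
        pos = m.end()

tokens = list(tokenize("ns.name 42 _x"))
print(tokens)  # [('NCName', 'ns.name'), ('Number', '42'), ('NCName', '_x')]
```

If `lastgroup` were `None` for the `NCName` alternative, as reported for 2.3a1, a dispatcher written this way could not tell which alternative had matched.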
Logged In: YES Why do you think lastindex is incorrect? Isn't 2 the lastindex?
>>> import re
>>> exp = re.compile("(?P<NCName>[a-zA-Z_](\w|[_.-])*)")
>>> match = exp.match("namespace")
>>> match.group(0)
'namespace'
>>> match.group(1)
'namespace'
>>> match.group(2)
'e'
It works like this in all Python versions. Also, if you It is incompatible with old versions because old versions
>>> exp = re.compile("(?P<NCName>[a-zA-Z_])")
>>> print exp.match("namespace").lastindex
1
How does this change anything? As we agreed, groups are numbered Hummm.. perhaps you think that the old behavior was to show
>>> re.compile("(a(b)?)((c)d)?").match("abce").lastindex
4
>>> re.compile("(a(b)?)((c)d)?").match("abce").groups()
('ab', 'b', None, None)
In Python 2.3:
>>> re.compile("(a(b)?)((c)d)?").match("abce").lastindex
2
>>> re.compile("(a(b)?)((c)d)?").match("abce").groups()
('ab', 'b', None, None)
I don't understand this. If you want the entire match, just Can you show me how I've broken the Scanner? If PyXML is broken, it trusted in an undocumented and broken |
Logged In: YES I'll just add my two cents here. Gustavo, I think given your I also agree that it is an incompatible change. Although the FYI, I posted a patch here to revert back to the previous behavior: http://www.python.org/sf/712900 You two may want to look at it to see if it looks like it's on the right
>>> re.compile("(a(b)?)((c)d)?").match("abce").lastindex
1
As you can see, it reports the correct value for lastindex given |
Logged In: YES
The problem here is defining what's the 2.2 definition. I've checked with other examples, and it looks like your If that's the case, the documentation is *very* misleading, |
Logged In: YES Concluding:
|
Logged In: YES I backed out the changes made in 2.84 which changed the Modules/_sre.c: 2.90 I'm also including some examples in the lastindex Sorry about the trouble this may have caused. |
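The two examples debated in this thread can be re-run as a quick regression check. This is a sketch assuming a current CPython 3 interpreter in which the backed-out behavior still holds. It uses raw strings and `print()` where the original comments use Python 2 syntax.

```python
import re

# Example from the later comments: the optional third group fails at 'e',
# so groups 3 and 4 stay unset and group 1 is the last one to have closed.
m = re.compile(r"(a(b)?)((c)d)?").match("abce")
print(m.groups())     # ('ab', 'b', None, None)
print(m.lastindex)    # 1

# Example from the original report: the named outer group closes after
# the nested unnamed group, so lastgroup reports the outer group's name.
ncname = re.compile(r"(?P<NCName>[a-zA-Z_](\w|[_.-])*)").match("namespace")
print(ncname.groups())      # ('namespace', 'e')
print(ncname.lastindex)     # 1
print(ncname.lastgroup)     # NCName
```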