
bpo-35859: re module, fix wrong capturing groups in rare cases #11756

Open: wants to merge 10 commits into master from animalize:issue35859

4 participants
@animalize (Contributor) commented Feb 4, 2019

The MARK_PUSH(lastmark) macro didn't protect MARK 0 when it was the only available mark.

https://bugs.python.org/issue35859
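To make the failure mode concrete, here is a toy Python model of saving and restoring marks around a branch that fails. It assumes, as the description implies, that the buggy guard only saves marks when `lastmark > 0`; `mark_push`, `mark_pop` and `try_failing_branch` are hypothetical names, and the real macros in CPython's `_sre` C code are more involved.

```python
def mark_push(stack, marks, lastmark, buggy):
    """Save marks[0..lastmark] before trying a branch (toy model)."""
    # Assumed guard: the buggy version skips the save when lastmark == 0.
    should_save = lastmark > 0 if buggy else lastmark >= 0
    if should_save:
        stack.append(marks[:lastmark + 1])  # slice copies the saved marks
    return should_save


def mark_pop(stack, marks, saved):
    """Restore the marks saved by mark_push, if any were saved."""
    if saved:
        restored = stack.pop()
        marks[:len(restored)] = restored


def try_failing_branch(buggy):
    marks = [0]      # MARK 0: group 1 starts at position 0
    lastmark = 0     # MARK 0 is the only available mark
    stack = []
    saved = mark_push(stack, marks, lastmark, buggy)
    marks[0] = 1     # the branch overwrites MARK 0 ...
    # ... and then fails, so the engine backtracks and restores the marks.
    mark_pop(stack, marks, saved)
    return marks[0]


print(try_failing_branch(buggy=True))   # 1: stale value from the failed branch
print(try_failing_branch(buggy=False))  # 0: original value restored
```

With the assumed `> 0` guard, the value written by the failed branch survives the backtrack, which is the kind of stale capture this PR targets.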

animalize added some commits Jan 19, 2019

@animalize (Contributor, Author) commented Feb 8, 2019

This fix is not correct; I'll update this PR when I can use my computer.

@animalize animalize changed the title bpo-35859: in re module, save marks before JUMP_MIN_UNTIL_3 jump bpo-35859: re module, fix wrong capturing groups in rare cases Feb 9, 2019

animalize added some commits Feb 9, 2019

@animalize animalize force-pushed the animalize:issue35859 branch from 3830fa0 to a279f3f Feb 18, 2019

animalize added some commits Feb 18, 2019

@animalize (Contributor, Author) commented Feb 18, 2019

@serhiy-storchaka
I'm afraid this PR can't be merged into the 2.7 branch automatically. I will create a PR for the 2.7 branch tomorrow, along with the patch in #11546.

```python
def test_bug_35859(self):
    # Capture behavior depends on the order of an alternation
    s = 'ab'
    self.assertEqual(re.search(r'(ab|a)*?b', s).groups(), ('a',))
```

@serhiy-storchaka (Member), Feb 18, 2019

Why is search() used instead of match() or fullmatch()?
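For reference: `search()` scans for the first position where the pattern matches, `match()` only tries position 0, and `fullmatch()` additionally requires the match to cover the whole string. For the test string `'ab'` all three find the same overall match, so switching to `match()` or `fullmatch()` would not change what the test exercises. This sketch compares only overall match spans, not group captures; `spans` is a hypothetical helper.

```python
import re

PATTERN = r'(ab|a)*?b'

def spans(text):
    """Return the overall match span (or None) for each matching function."""
    results = {}
    for func in (re.search, re.match, re.fullmatch):
        m = func(PATTERN, text)
        results[func.__name__] = m.span() if m else None
    return results

print(spans('ab'))   # {'search': (0, 2), 'match': (0, 2), 'fullmatch': (0, 2)}
print(spans('xab'))  # {'search': (1, 3), 'match': None, 'fullmatch': None}
print(spans('abx'))  # {'search': (0, 2), 'match': (0, 2), 'fullmatch': None}
```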

ab|a is equivalent to ab?. Is there a reason to use the former? If there is a difference, it is better to use .b|a instead, because ab|a may be transformed into ab? by the RE compiler in future versions.
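The equivalence can be spot-checked by brute force. The sketch below (helper names hypothetical) compares the overall `search()` span of two patterns on every string over a small alphabet up to a fixed length; it is evidence, not a proof. `ab|a` and `ab?` agree because both try `ab` before `a`, while `.b|a` already differs on `'bb'`, so a compiler could not rewrite it as `ab?`.

```python
import itertools
import re

def search_span(pattern, text):
    """Overall span of the leftmost match, or None if there is no match."""
    m = re.search(pattern, text)
    return m.span() if m else None

def first_difference(p1, p2, alphabet='abc', max_len=4):
    """Return the first string (shortest first) on which the spans differ, else None."""
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            text = ''.join(chars)
            if search_span(p1, text) != search_span(p2, text):
                return text
    return None

print(first_difference(r'ab|a', r'ab?'))  # None
print(first_difference(r'.b|a', r'ab?'))  # bb
```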

@animalize (Author, Contributor), Feb 18, 2019

Nice catch: this PR doesn't fix the problem.

>>> re.match(r'(ab?)*?b', 'ab').groups()
('',)

The correct output, as produced by the third-party regex module, is:

>>> regex.match(r'(ab?)*?b', 'ab').groups()
('a',)

I will recheck the patch tomorrow.
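Why `('a',)` is the expected capture can be seen with a tiny hand-written backtracking matcher for the single shape `(alt1|alt2|...)*?b`. This is a hypothetical illustration, not how `_sre` or `regex` work internally. It assumes `(ab?)` behaves like the alternatives `['ab', 'a']` (greedy `?` tries `ab` first) and passes the current capture down the recursion by value, so a failed iteration's capture is discarded automatically on backtracking; in the real engine that is the job of saving and restoring marks.

```python
def match_lazy_then_b(s, alternatives, pos=0, capture=None):
    """Anchored match of (alt1|alt2|...)*?b at pos.

    Returns (end, capture) or None. Only literal, non-empty alternatives
    are supported.
    """
    # Lazy quantifier: first try to stop repeating and match the literal 'b'.
    if s.startswith('b', pos):
        return pos + 1, capture
    # Otherwise try one more iteration, alternatives in priority order.
    for alt in alternatives:
        if alt and s.startswith(alt, pos):
            # The capture becomes this iteration's text only along this path;
            # if the recursion fails, the caller's `capture` is untouched.
            result = match_lazy_then_b(s, alternatives, pos + len(alt), alt)
            if result is not None:
                return result
    return None


print(match_lazy_then_b('ab', ['ab', 'a']))  # (2, 'a')
```

The iteration that consumes `ab` leaves nothing for the final `b` and is abandoned, so only the `a` iteration remains on the successful path.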

```python
s = 'ab'
self.assertEqual(re.search(r'(ab|a)*?b', s).groups(), ('a',))
self.assertEqual(re.search(r'(ab|a)+?b', s).groups(), ('a',))
self.assertEqual(re.search(r'(ab|a){0,}?b', s).groups(), ('a',))
```

@serhiy-storchaka (Member), Feb 18, 2019

X{0,}? is equivalent to X*?, so this test is redundant.
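A quick sanity check that both spellings behave identically: `X{0,}?` sets the minimum to 0 and leaves the maximum unbounded, which is what `X*?` means. The sample texts, the choices of `X`, and the `results` helper are arbitrary and for illustration only; agreement on samples is not a proof.

```python
import re

TEXTS = ['', 'b', 'ab', 'aab', 'abab', 'xab']

def results(pattern):
    """Overall span and groups of re.search() for each sample text."""
    out = []
    for text in TEXTS:
        m = re.search(pattern, text)
        out.append((m.span(), m.groups()) if m else None)
    return out

for x in ['a', '[ab]', '(ab|a)']:
    assert results(x + '*?b') == results(x + '{0,}?b'), x
print('X*? and X{0,}? agree on all samples')
```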
