Conversation

@Abhi210
Contributor

@Abhi210 Abhi210 commented Nov 3, 2025

This PR adds validation to re.Scanner.__init__ that rejects lexicon patterns containing capturing groups. If a user-supplied pattern contains a capturing group, Scanner now raises ValueError with a clear message advising the use of non-capturing groups (?:...) instead. Tests were also added: one asserting that ValueError is raised for lexicons containing capturing groups, and one confirming that a non-capturing group is accepted.
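For readers unfamiliar with the undocumented re.Scanner class, here is a minimal sketch of a lexicon that stays within the new rule by grouping only with (?:...). The token patterns and actions are illustrative assumptions, not taken from the PR or the CPython test suite.

```python
import re

# Each lexicon entry is a (pattern, action) pair. The action receives the
# scanner and the matched text; an action of None discards the match.
# Grouping is done with (?:...) only, so no capturing groups are defined.
lexicon = [
    (r"(?:\d+\.\d+|\d+)", lambda scanner, token: float(token)),
    (r"[a-zA-Z_]\w*", lambda scanner, token: token),
    (r"\s+", None),
]

scanner = re.Scanner(lexicon)

# scan() returns the list of action results and the unmatched remainder.
tokens, remainder = scanner.scan("foo 3 4.5")
print(tokens, repr(remainder))
```

Because the patterns use only non-capturing groups, this lexicon should behave the same way both before and after the change.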

@Abhi210 Abhi210 force-pushed the fix-scanner-capturing-groups-backup branch from e9e76d0 to 12bf67b Compare November 3, 2025 13:13
@Abhi210 Abhi210 force-pushed the fix-scanner-capturing-groups-backup branch from 0ebfddd to 29db6ca Compare November 3, 2025 14:00

@wjssz wjssz left a comment

After you update the patch, you may ask core developer Serhiy to review.

@Abhi210 Abhi210 force-pushed the fix-scanner-capturing-groups-backup branch from f1d2c8b to 81f3675 Compare November 4, 2025 05:39

@wjssz wjssz left a comment

You seem to be new to this, so please be patient.
Once my review is complete, ask core developer Serhiy to review.

for phrase, action in lexicon:
    sub_pattern = _parser.parse(phrase, flags)
    if sub_pattern.state.groups != 1:
        raise ValueError(

You can write this on one line. A line should be <= 80 characters.

raise ValueError("Can not use capturing groups in re.Scanner.")

Contributor Author

Thank you for the suggestion. I have resolved it now
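For code that must run on interpreters without this change, a similar check can be sketched with the public API alone. The helper names has_capturing_groups and check_lexicon and the error text are hypothetical; the sketch relies on the documented Pattern.groups attribute rather than the private _parser state used in the patch.

```python
import re


def has_capturing_groups(pattern, flags=0):
    """Return True if the pattern defines at least one capturing group."""
    # Pattern.groups is the number of capturing groups in the pattern;
    # non-capturing groups (?:...) and lookarounds do not count.
    return re.compile(pattern, flags).groups > 0


def check_lexicon(lexicon, flags=0):
    """Raise ValueError if any lexicon pattern has a capturing group.

    Hypothetical pre-check, not the code merged in this PR.
    """
    for phrase, _action in lexicon:
        if has_capturing_groups(phrase, flags):
            raise ValueError(
                f"capturing group in lexicon pattern {phrase!r}")


check_lexicon([("(?:a)b", None)])  # passes silently
```

Named groups such as (?P<name>...) are capturing groups as well, so this check rejects them too.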


#Capturing group throws an error
lex = [("(a)b", None)]
with self.assertRaises(ValueError):

You may also check the exception message.

msg = "Can not use capturing groups in re.Scanner"
with self.assertRaisesRegex(ValueError, msg):
    ...

Contributor Author

Thank you for the suggestion! I have resolved it now. Need to learn a lot!
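The assertRaisesRegex suggestion can be tried in isolation with a stand-in. In the sketch below, checked_scanner and MSG are hypothetical: they imitate the validation so the test runs on any Python version, and the message text is illustrative rather than the exact string merged in the PR.

```python
import re
import unittest

# Illustrative message text; assertRaisesRegex searches for it as a regex.
MSG = "Cannot use capturing groups"


def checked_scanner(lexicon, flags=0):
    """Hypothetical wrapper: validate the lexicon, then build a Scanner."""
    for phrase, _action in lexicon:
        # Pattern.groups counts the capturing groups in the pattern.
        if re.compile(phrase, flags).groups:
            raise ValueError(MSG)
    return re.Scanner(lexicon, flags)


class ScannerGroupTests(unittest.TestCase):
    def test_capturing_group_rejected(self):
        # Checks both the exception type and its message.
        with self.assertRaisesRegex(ValueError, MSG):
            checked_scanner([("(a)b", None)])

    def test_non_capturing_group_accepted(self):
        scanner = checked_scanner([("(?:a)b", lambda s, t: t)])
        self.assertEqual(scanner.scan("abab"), (["ab", "ab"], ""))
```

Saved as a module, these tests can be run with python -m unittest.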

@Abhi210 Abhi210 force-pushed the fix-scanner-capturing-groups-backup branch from 81f3675 to 401969c Compare November 4, 2025 06:35
'op+', 'bar'], ''))

def test_bug_140797(self):
    #bug 140797: remove capturing groups compilation form re.Scanner

Please add a space after # in comments.

- #Capturing group throws an error
+ # Capturing group throws an error

Also add a space after the comma in function arguments.

- with self.assertRaisesRegex(ValueError,msg):
+ with self.assertRaisesRegex(ValueError, msg):

Then it looks good to me.

Contributor Author

Oops! Thank you again! Resolved

@Abhi210 Abhi210 force-pushed the fix-scanner-capturing-groups-backup branch from 401969c to db0ea4a Compare November 4, 2025 06:49
(['sum', 'op=', 3, 'op*', 'foo', 'op+', 312.5,
'op+', 'bar'], ''))

def test_bug_140797(self):

My oversight: please use test_bug_gh140797 as the name.
Without the "gh" prefix, the number may be taken to refer to the previous bug tracker.
Sorry about that.

- # bug 140797: remove capturing groups compilation form re.Scanner
+ # gh140797: capturing groups is not allowed in re.Scanner

Contributor Author

Done! Thank you again for your time and suggestions!

@Abhi210 Abhi210 force-pushed the fix-scanner-capturing-groups-backup branch from db0ea4a to 0f9a934 Compare November 4, 2025 07:05
# Capturing group throws an error
lex = [("(a)b", None)]
with self.assertRaisesRegex(ValueError, msg):
    Scanner(lex)

This saves a line...
My fault again.

- Scanner(lex)
+ Scanner([("(a)b", None)])

Contributor Author

Done! Testing sure takes the time 😂

@Abhi210 Abhi210 force-pushed the fix-scanner-capturing-groups-backup branch from 0f9a934 to 0197f65 Compare November 4, 2025 07:14
@wjssz

wjssz commented Nov 4, 2025

I have checked for basic problems; @serhiy-storchaka, please review.

Member

@serhiy-storchaka serhiy-storchaka left a comment

Thank you for your PR @Abhi210 and thank you for your review @wjssz.

There are some minor nitpicks, overall LGTM.

for phrase, action in lexicon:
    sub_pattern = _parser.parse(phrase, flags)
    if sub_pattern.state.groups != 1:
        raise ValueError("Can not use capturing groups in re.Scanner")
Member

"Cannot". This is the most commonly used variant.

Contributor Author

Done! Thank you

@@ -0,0 +1,4 @@
The re.Scanner class now forbids regular expressions containing capturing
Member

Mention that the class is undocumented. You can also use some formatting, even if the link does not work: :class:`!re.Scanner`.

Contributor Author

Done! Thank you 😊

@Abhi210 Abhi210 force-pushed the fix-scanner-capturing-groups-backup branch from 0197f65 to feaee4e Compare November 4, 2025 08:31
Member

@serhiy-storchaka serhiy-storchaka left a comment

Two more nitpicks and LGTM. 👍

Abhi210 and others added 2 commits November 4, 2025 15:29
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@serhiy-storchaka serhiy-storchaka merged commit fa9c3ee into python:main Nov 4, 2025
46 checks passed
@serhiy-storchaka serhiy-storchaka added the needs backport to 3.13 bugs and security fixes label Nov 4, 2025
@serhiy-storchaka serhiy-storchaka added the needs backport to 3.14 bugs and security fixes label Nov 4, 2025
@miss-islington-app

Thanks @Abhi210 for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.14.
🐍🍒⛏🤖

@miss-islington-app

Thanks @Abhi210 for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Nov 4, 2025
…ns (pythonGH-140944)

(cherry picked from commit fa9c3ee)

Co-authored-by: Abhishek Tiwari <Abhi210@users.noreply.github.com>
@bedevere-app

bedevere-app bot commented Nov 4, 2025

GH-140982 is a backport of this pull request to the 3.14 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.14 bugs and security fixes label Nov 4, 2025
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Nov 4, 2025
…ns (pythonGH-140944)

(cherry picked from commit fa9c3ee)

Co-authored-by: Abhishek Tiwari <Abhi210@users.noreply.github.com>
@bedevere-app

bedevere-app bot commented Nov 4, 2025

GH-140983 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Nov 4, 2025
serhiy-storchaka pushed a commit that referenced this pull request Nov 4, 2025
…rns (GH-140944) (GH-140983)

(cherry picked from commit fa9c3ee)

Co-authored-by: Abhishek Tiwari <Abhi210@users.noreply.github.com>
serhiy-storchaka pushed a commit that referenced this pull request Nov 4, 2025
…rns (GH-140944) (GH-140982)

(cherry picked from commit fa9c3ee)

Co-authored-by: Abhishek Tiwari <Abhi210@users.noreply.github.com>