-
-
Notifications
You must be signed in to change notification settings - Fork 33.3k
gh-140797: Forbid capturing groups in re.Scanner lexicon patterns #140944
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-140797: Forbid capturing groups in re.Scanner lexicon patterns #140944
Conversation
12bf67b to
e9e76d0
Compare
e9e76d0 to
12bf67b
Compare
It adds validation to re.Scanner.init that rejects lexicon patterns containing capturing groups. If a user-supplied pattern contains any capturing groups, Scanner now raises ValueError with a clear message advising the use of non-capturing groups (?:...) instead.
0ebfddd to
29db6ca
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After you update the patch, you may ask core developer Serhiy to review.
f1d2c8b to
81f3675
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that you are a beginner, please be patient.
After my review is completed, then ask core developer Serhiy to review.
Lib/re/__init__.py
Outdated
| for phrase, action in lexicon: | ||
| sub_pattern = _parser.parse(phrase, flags) | ||
| if sub_pattern.state.groups != 1: | ||
| raise ValueError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can write in one line. A line should <= 80 characters.
raise ValueError("Can not use capturing groups in re.Scanner.")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the suggestion. I have resolved it now
Lib/test/test_re.py
Outdated
|
|
||
| #Capturing group throws an error | ||
| lex = [("(a)b", None)] | ||
| with self.assertRaises(ValueError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may check exception message.
msg = "Can not use capturing groups in re.Scanner"
with self.assertRaisesRegex(ValueError, msg):
...There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the suggestion! I have resolved it now. Need to learn a lot!
81f3675 to
401969c
Compare
Lib/test/test_re.py
Outdated
| 'op+', 'bar'], '')) | ||
|
|
||
| def test_bug_140797(self): | ||
| #bug 140797: remove capturing groups compilation form re.Scanner |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a space after # in comments.
- #Capturing group throws an error
+ # Capturing group throws an errorAnd add a space after , in functions arguments.
- with self.assertRaisesRegex(ValueError,msg):
+ with self.assertRaisesRegex(ValueError, msg):Then looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops! Thank you again! Resolved
401969c to
db0ea4a
Compare
Lib/test/test_re.py
Outdated
| (['sum', 'op=', 3, 'op*', 'foo', 'op+', 312.5, | ||
| 'op+', 'bar'], '')) | ||
|
|
||
| def test_bug_140797(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My negligence, please use test_bug_gh140797 as the name.
If no "gh", It may refer to the previous bug tracker.
Sorry for this.
- # bug 140797: remove capturing groups compilation form re.Scanner
+ # gh140797: capturing groups is not allowed in re.ScannerThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! Thank you again for your time and suggestions!
db0ea4a to
0f9a934
Compare
Lib/test/test_re.py
Outdated
| # Capturing group throws an error | ||
| lex = [("(a)b", None)] | ||
| with self.assertRaisesRegex(ValueError, msg): | ||
| Scanner(lex) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This saves a line...
My fault again.
- Scanner(lex)
+ Scanner([("(a)b", None)])There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! Testing sure takes the time 😂
0f9a934 to
0197f65
Compare
|
I have checked basic problems, @serhiy-storchaka please review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lib/re/__init__.py
Outdated
| for phrase, action in lexicon: | ||
| sub_pattern = _parser.parse(phrase, flags) | ||
| if sub_pattern.state.groups != 1: | ||
| raise ValueError("Can not use capturing groups in re.Scanner") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Cannot". This is the most commonly used variant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! Thank you
| @@ -0,0 +1,4 @@ | |||
| The re.Scanner class now forbids regular expressions containing capturing | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mention that that class is undocumented. You can also use some formatting, even if the link does not work: :class:`!re.Scanner`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! Thank you 😊
0197f65 to
feaee4e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two more nitpicks and LGTM. 👍
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
|
Thanks @Abhi210 for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.14. |
|
Thanks @Abhi210 for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13. |
…ns (pythonGH-140944) (cherry picked from commit fa9c3ee) Co-authored-by: Abhishek Tiwari <Abhi210@users.noreply.github.com>
|
GH-140982 is a backport of this pull request to the 3.14 branch. |
…ns (pythonGH-140944) (cherry picked from commit fa9c3ee) Co-authored-by: Abhishek Tiwari <Abhi210@users.noreply.github.com>
|
GH-140983 is a backport of this pull request to the 3.13 branch. |
This PR adds validation to
re.Scanner.__init__that rejects lexicon patterns containing capturing groups. If a user-supplied pattern contains any capturing groups, Scanner now raisesValueErrorwith a clear message advising the use of non-capturing groups (?:...) instead. Further, tests were added to assertValueErrorfor lexicons containing capturing groups and a passing test for non-capturing group.re.Scanner#140797