gh-139516: Fix lambda colon start format spec in f-string in tokenizer #139657
Conversation
Ping @pablogsal. I added the test to
Please add a test to the f-string test file as well, as this is a semantic test that needs to hold true even if we change the tokenizer or some other implementation doesn't have the same tokenizer.
Lib/test/test_fstring.py

# gh-139516
# The '\n' is explicit to ensure no trailing whitespace which would invalidate the test.
# Must use tokenize instead of compile so that source is parsed by line which exposes the bug.
list(tokenize.tokenize(BytesIO('''f"{f(a=lambda: 'à'\n)}"'''.encode()).readline))
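As the test comment notes, the reproducer must go through tokenize rather than compile because tokenize pulls source one line at a time via the readline protocol, which is what exposes state carried across lines. A minimal sketch of that mechanism, using a trivial source just to illustrate the readline-driven flow:

```python
import io
import tokenize

# tokenize.tokenize() repeatedly calls the readline callable to fetch one
# source line at a time, so tokenizer state persists across line boundaries.
src = b"x = 1\ny = 2\n"
names = [tok.string
         for tok in tokenize.tokenize(io.BytesIO(src).readline)
         if tok.type == tokenize.NAME]
print(names)  # the NAME tokens seen across both lines
```
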
I am confused. Isn't it possible to trigger this in an exec or eval call? Or perhaps a file with an encoding?
See below VVV
Done. But I had to use Let me know if this test is good enough or if you want something else.
Yes, going via the tokenizer makes no sense here. The purpose of what I asked is that alternative implementations will still run these test files to check if they are compliant, and we need to provide a way to run a file or exec some code and say "this is what we expect". You are triggering the bug via a specific aspect of CPython, but I would prefer if we could trigger it end-to-end via a file. There are more tests executing Python over files, check in
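The end-to-end style described here, running the source as a real file, can be sketched as follows. The helper name is illustrative, not CPython's actual test helper (the test suite has its own utilities under test.support for this), and a trivial source is used for demonstration:

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical helper: write the source to a real .py file and run it with the
# current interpreter, so the test exercises the full pipeline (decode,
# tokenize, parse, execute) instead of one implementation's tokenizer API.
def run_source_as_script(source: str) -> subprocess.CompletedProcess:
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False,
                                     encoding="utf-8") as fp:
        fp.write(source)
        path = fp.name
    try:
        return subprocess.run([sys.executable, path],
                              capture_output=True, text=True)
    finally:
        os.unlink(path)

proc = run_source_as_script("print('hello')\n")
assert proc.returncode == 0
assert proc.stdout.strip() == "hello"
```

A compliant implementation would be expected to run the reproducer file cleanly, regardless of how its tokenizer is structured internally.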
Running the error as a script.
LGTM. Thank you very much @tom-pytel!
Thanks @tom-pytel for the PR, and @pablogsal for merging it 🌮🎉. I'm working now to backport this PR to: 3.13, 3.14.
…kenizer (pythonGH-139657) (cherry picked from commit 539461d9ec8e5322ead638f7be733fd196aa6c79) Co-authored-by: Tomasz Pytel <tompytel@gmail.com>
Sorry, @tom-pytel and @pablogsal, I could not cleanly backport this to
GH-139701 is a backport of this pull request to the 3.14 branch.
@tom-pytel can you make the backport following the instructions?
Sure, in a bit.
GH-139726 is a backport of this pull request to the 3.13 branch.
A `=` followed by a `:` in an f-string expression could cause the tokenizer to erroneously think it was starting a format spec, leading to incorrect internal state and possible decode errors if this results in split unicode characters on copy. This PR fixes this by disallowing `=` to set `in_debug` state unless it is encountered at the top level of an f-string expression. This problem exists back to py 3.13 and this PR can probably be backported easily enough.
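For context, the ambiguity the fix resolves comes from the dual role of `:` in a replacement field: at the top level of the field it starts a format spec, while inside nested parentheses it belongs to the expression (for example, a lambda's colon). A small illustration of that rule, unrelated to the bug itself:

```python
# At the top level of a replacement field, ':' starts a format spec.
value = 255
hex_str = f"{value:#06x}"       # '#06x' is a format spec, not an expression
print(hex_str)                  # -> 0x00ff

# Inside parentheses, ':' is part of the expression, e.g. a lambda's colon.
lam_str = f"{(lambda: 'ok')()}"
print(lam_str)                  # -> ok
```

The bug occurred when a `=` (which can introduce f-string debug mode, as in `f"{x=}"`) appeared in a nested position such as a keyword argument, making the tokenizer misread a subsequent lambda colon as the start of a format spec.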