
generate_tokens tokenizes $ differently with Python 3.12 than earlier #104802

Closed
pekkaklarck opened this issue May 23, 2023 · 4 comments

Labels: interpreter-core (Objects, Python, Grammar, and Parser dirs), type-bug (An unexpected behavior, bug, or error)

@pekkaklarck

I tested Python 3.12 beta 1 with Robot Framework and noticed that tokenize.generate_tokens() handles expressions containing $ differently than earlier versions. Previously $ yielded an ERRORTOKEN, but now we get an OP token:

Python 3.11.3 (main, Apr  5 2023, 14:15:06) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tokenize import generate_tokens
>>> from io import StringIO
>>> next(generate_tokens(StringIO('$x').readline))
TokenInfo(type=60 (ERRORTOKEN), string='$', start=(1, 0), end=(1, 1), line='$x')

Python 3.12.0b1 (main, May 22 2023, 23:31:26) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tokenize import generate_tokens
>>> from io import StringIO
>>> next(generate_tokens(StringIO('$x').readline))
TokenInfo(type=55 (OP), string='$', start=(1, 0), end=(1, 1), line='$x\n')

We support Python evaluation with special variables like $var > 1 in Robot Framework data, and this change breaks our tokenizing code. I didn't notice anything related in the release notes, so I decided to report this. If the change is intentional, we can easily update our code to handle these semantics as well.
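For anyone hitting the same change, here is a minimal, illustrative sketch of how a tokenizing helper can accept both the old ERRORTOKEN and the new OP classification for $ (the helper name is hypothetical and this is not Robot Framework's actual code):

from io import StringIO
from tokenize import ERRORTOKEN, OP, generate_tokens

def starts_with_dollar(source):
    """Check whether the first token of *source* is a '$'.

    Python <= 3.11 reports '$' as ERRORTOKEN, while Python 3.12
    reports it as OP, so both token types are accepted here.
    """
    token = next(generate_tokens(StringIO(source).readline))
    return token.string == '$' and token.type in (ERRORTOKEN, OP)

print(starts_with_dollar('$var > 1'))    # True on both 3.11 and 3.12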

Notice also that there is a small change in TokenInfo.line above. With Python 3.12 there is an additional \n even though the original string didn't contain any newlines.
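If downstream code compares TokenInfo.line against the original source line, a small workaround (assuming the consumer only cares about the line's text) is to normalize the trailing newline before comparing:

from io import StringIO
from tokenize import generate_tokens

token = next(generate_tokens(StringIO('$x').readline))
# 3.11 reports line='$x', 3.12 reports line='$x\n'; strip before comparing.
assert token.line.rstrip('\n') == '$x'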

pekkaklarck added the type-bug label on May 23, 2023
@sunmy2019
Member

I didn't notice anything related in the release notes

The change just made it in before the deadline. I think the docs can be updated later.
#104323 (comment)

@pekkaklarck
Author

If I understand the referenced comment correctly, generate_tokens now uses a completely different tokenizer underneath, and there can be subtle differences like this one. If that's the case, this likely won't be considered a bug and the old behavior won't be restored. That's fine, I can update our code, but mentioning this under backwards-incompatible changes in the release notes would be a good idea.

@pablogsal
Member

pablogsal commented May 23, 2023

Thanks for raising this with us! I can confirm that this is an expected side effect of the change. Docs are being worked on here:

#104824

I am closing this issue as we will cover the docs part in the linked PR.

pablogsal closed this as not planned on May 23, 2023
@pablogsal
Member

Opened #104825 for the newline character.

fsc-eriker added a commit to fsc-eriker/cpython that referenced this issue on Feb 1, 2024
fsc-eriker added a commit to fsc-eriker/cpython that referenced this issue on Feb 14, 2024
zware pushed a commit to fsc-eriker/cpython that referenced this issue on Feb 14, 2024