
generate_tokens tokenizes $ differently with Python 3.12 than earlier #104802

Closed
pekkaklarck opened this issue May 23, 2023 · 4 comments

Labels: interpreter-core (Objects, Python, Grammar, and Parser dirs), type-bug (An unexpected behavior, bug, or error)

@pekkaklarck

I tested Python 3.12 beta 1 with Robot Framework and noticed that tokenize.generate_tokens() handles expressions containing $ differently than earlier versions. Previously $ yielded an ERRORTOKEN, but now we get an OP token:

Python 3.11.3 (main, Apr  5 2023, 14:15:06) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tokenize import generate_tokens
>>> from io import StringIO
>>> next(generate_tokens(StringIO('$x').readline))
TokenInfo(type=60 (ERRORTOKEN), string='$', start=(1, 0), end=(1, 1), line='$x')

Python 3.12.0b1 (main, May 22 2023, 23:31:26) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tokenize import generate_tokens
>>> from io import StringIO
>>> next(generate_tokens(StringIO('$x').readline))
TokenInfo(type=55 (OP), string='$', start=(1, 0), end=(1, 1), line='$x\n')

We support Python evaluation with special variables like $var > 1 in Robot Framework data, and this change breaks our tokenizing code. I didn't notice anything related in the release notes, so I decided to report this. If the change is intentional, we can easily update our code to handle these semantics as well.
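For anyone hitting the same change, here is a minimal, illustrative sketch of how a tokenizing helper can accept both the old ERRORTOKEN and the new OP classification for $ (the helper name is hypothetical and this is not Robot Framework's actual code):

from io import StringIO
from tokenize import ERRORTOKEN, OP, generate_tokens

def starts_with_dollar(source):
    """Check whether the first token of *source* is a '$'.

    Python <= 3.11 reports '$' as ERRORTOKEN, while Python 3.12
    reports it as OP, so both token types are accepted here.
    """
    token = next(generate_tokens(StringIO(source).readline))
    return token.string == '$' and token.type in (ERRORTOKEN, OP)

print(starts_with_dollar('$var > 1'))    # True on both 3.11 and 3.12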

Notice also that there is a small change in TokenInfo.line above. With Python 3.12 there is an additional \n even though the original string didn't contain any newlines.
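If downstream code compares TokenInfo.line against the original source line, a small workaround (assuming the consumer only cares about the line's text) is to normalize the trailing newline before comparing:

from io import StringIO
from tokenize import generate_tokens

token = next(generate_tokens(StringIO('$x').readline))
# 3.11 reports line='$x', 3.12 reports line='$x\n'; strip before comparing.
assert token.line.rstrip('\n') == '$x'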

pekkaklarck added the type-bug label on May 23, 2023
@sunmy2019
Member

I didn't notice anything related in the release notes

The change just made it in before the deadline. I think the docs can be updated later.
#104323 (comment)

@pekkaklarck
Author

If I understand the referenced comment correctly, generate_tokens now uses a completely different tokenizer underneath, and there can be subtle differences like this one. If that's the case, this likely won't be considered a bug and the old behavior won't be restored. That's fine, I can update our code, but mentioning this under backwards-incompatible changes in the release notes would be a good idea.

@pablogsal
Member

pablogsal commented May 23, 2023

Thanks for raising this with us! I can confirm that this is an expected side effect of the change. Docs are being worked on here:

#104824

I am closing this issue as we will cover the docs part in the linked PR.

pablogsal closed this as not planned on May 23, 2023
@pablogsal
Member

Opened #104825 for the newline character.

fsc-eriker added a commit to fsc-eriker/cpython that referenced this issue on Feb 1, 2024
fsc-eriker added a commit to fsc-eriker/cpython that referenced this issue on Feb 14, 2024
zware pushed a commit to fsc-eriker/cpython that referenced this issue on Feb 14, 2024