The lines in tokens from tokenize.generate_tokens incorrectly indicate multiple lines. #104972

Closed
nedbat opened this issue May 26, 2023 · 6 comments
Labels
type-bug An unexpected behavior, bug, or error

Comments

@nedbat
Member

nedbat commented May 26, 2023

The .line attribute of tokens returned by tokenize.generate_tokens incorrectly contains multiple lines. Tokens should satisfy an invariant: using the column offsets in .start and .end to index into .line produces .string.

tokbug.py:

import io
import sys
import tokenize

SOURCE = r"""
a + \
b
"""

print(sys.version)
readline = io.StringIO(SOURCE).readline
for tok in tokenize.generate_tokens(readline):
    correct = (tok.string) == (tok.line[tok.start[1]: tok.end[1]])
    print(tok, "" if correct else "<*****!!!")

Run with 3.12.0a7:

% /usr/local/pyenv/pyenv/versions/3.12.0a7/bin/python tokbug.py
3.12.0a7 (main, Apr  5 2023, 05:51:58) [Clang 14.0.3 (clang-1403.0.22.14.1)]
TokenInfo(type=62 (NL), string='\n', start=(1, 0), end=(1, 1), line='\n')
TokenInfo(type=1 (NAME), string='a', start=(2, 0), end=(2, 1), line='a + \\\n')
TokenInfo(type=54 (OP), string='+', start=(2, 2), end=(2, 3), line='a + \\\n')
TokenInfo(type=1 (NAME), string='b', start=(3, 0), end=(3, 1), line='b\n')
TokenInfo(type=4 (NEWLINE), string='\n', start=(3, 1), end=(3, 2), line='b\n')
TokenInfo(type=0 (ENDMARKER), string='', start=(4, 0), end=(4, 0), line='')

Run with 3.12.0b1:

% /usr/local/pyenv/pyenv/versions/3.12.0b1/bin/python tokbug.py
3.12.0b1 (main, May 23 2023, 16:19:59) [Clang 14.0.3 (clang-1403.0.22.14.1)]
TokenInfo(type=65 (NL), string='\n', start=(1, 0), end=(1, 1), line='\n')
TokenInfo(type=1 (NAME), string='a', start=(2, 0), end=(2, 1), line='a + \\\n')
TokenInfo(type=55 (OP), string='+', start=(2, 2), end=(2, 3), line='a + \\\n')
TokenInfo(type=1 (NAME), string='b', start=(3, 0), end=(3, 1), line='a + \\\nb\n') <*****!!!
TokenInfo(type=4 (NEWLINE), string='\n', start=(3, 1), end=(3, 2), line='a + \\\nb\n') <*****!!!
TokenInfo(type=0 (ENDMARKER), string='', start=(4, 0), end=(4, 0), line='')
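
To make the flagged lines concrete, here is a minimal sketch that applies the same slice to the `b` token as reported by each version. The TokenInfo values are copied by hand from the two outputs above rather than produced by the tokenizer, and `line_slice` is a hypothetical helper name, not part of the tokenize module.

```python
import tokenize

def line_slice(tok):
    """Return the part of tok.line selected by the token's column offsets."""
    return tok.line[tok.start[1]:tok.end[1]]

# NAME token for `b`, copied from the 3.12.0a7 output: .line is one physical line.
good = tokenize.TokenInfo(tokenize.NAME, "b", (3, 0), (3, 1), "b\n")

# The same token as reported by 3.12.0b1: .line also carries the previous
# physical line, so column 0 now points at `a` instead of `b`.
bad = tokenize.TokenInfo(tokenize.NAME, "b", (3, 0), (3, 1), "a + \\\nb\n")

print(repr(line_slice(good)))  # 'b'
print(repr(line_slice(bad)))   # 'a'
```

The column offsets are relative to the physical line the token sits on, so they only index correctly when .line holds exactly that line.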

Related to #104825? cc @pablogsal


@nedbat nedbat added the type-bug An unexpected behavior, bug, or error label May 26, 2023
@nedbat
Member Author

nedbat commented May 26, 2023

The tip of 3.12 shows one more style of change, due to omitting the newlines from .line:

3.12.0b1+ (heads/3.12:6324458bef, May 26 2023, 06:25:21) [Clang 14.0.3 (clang-1403.0.22.14.1)]
TokenInfo(type=65 (NL), string='\n', start=(1, 0), end=(1, 1), line='') <*****!!!
TokenInfo(type=1 (NAME), string='a', start=(2, 0), end=(2, 1), line='a + \\')
TokenInfo(type=55 (OP), string='+', start=(2, 2), end=(2, 3), line='a + \\')
TokenInfo(type=1 (NAME), string='b', start=(3, 0), end=(3, 1), line='a + \\\nb') <*****!!!
TokenInfo(type=4 (NEWLINE), string='\n', start=(3, 1), end=(3, 2), line='a + \\\nb') <*****!!!
TokenInfo(type=0 (ENDMARKER), string='', start=(4, 0), end=(4, 0), line='')
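
A rough sketch of why dropping the trailing newline from .line breaks the check for the NL and NEWLINE tokens even on an otherwise correct line. The TokenInfo values are again hand-copied from the output above, and `slice_matches` is a hypothetical helper, not a tokenize API.

```python
import tokenize

def slice_matches(tok):
    """True if slicing tok.line by the token's columns reproduces tok.string."""
    return tok.string == tok.line[tok.start[1]:tok.end[1]]

# NL token on the first (empty) line: .line is '' so there is no '\n' to slice.
nl = tokenize.TokenInfo(tokenize.NL, "\n", (1, 0), (1, 1), "")

# NAME token `a`: unaffected, because its columns fall before the stripped newline.
name_a = tokenize.TokenInfo(tokenize.NAME, "a", (2, 0), (2, 1), "a + \\")

# NEWLINE token after `b`: .line is both multi-line and missing its newline.
newline = tokenize.TokenInfo(tokenize.NEWLINE, "\n", (3, 1), (3, 2), "a + \\\nb")

for tok in (nl, name_a, newline):
    print(tokenize.tok_name[tok.type], slice_matches(tok))
```

Only the token for `a` passes here; the two newline tokens have a .string that no longer exists anywhere at their columns in .line.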

@pablogsal
Member

CC: @mgmacias95

pablogsal added a commit to pablogsal/cpython that referenced this issue May 26, 2023
…e module are correct

Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 26, 2023
…e module are correct (pythonGH-104975)

(cherry picked from commit 3fdb55c)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
@pablogsal
Member

@nedbat can you check with main now?

terryjreedy pushed a commit that referenced this issue May 26, 2023
…ze module are correct (GH-104975) (#104982)

gh-104972: Ensure that line attributes in tokens in the tokenize module are correct (GH-104975)
(cherry picked from commit 3fdb55c)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
@terryjreedy
Member

If you set automerge on the backport, the usual test-hypothesis failure disabled it. I merged it.

@nedbat
Member Author

nedbat commented May 26, 2023

@pablogsal Gorgeous! Thanks for the quick turnaround.

@pablogsal
Member

> If you set automerge on the backport, the usual test-hypothesis failure disabled it. I merged it.

Thanks a lot @terryjreedy !
