(🐞) Column marker in error message off when line contains non ASCII character #102310

Closed
KotlinIsland opened this issue Feb 28, 2023 · 8 comments
Labels
3.10 only security fixes 3.11 only security fixes 3.12 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@KotlinIsland
Contributor

KotlinIsland commented Feb 28, 2023

b"Ā"
👉 python test.py
  File "/test.py", line 1
    b"Ā"
         ^
SyntaxError: bytes can only contain ASCII literal characters

Here the caret is pointing to whitespace after the code.

Each non-ASCII character adds further to the incorrect offset:

👉 $c:temp = "b'ĀĀĀĀĀĀĀĀ'"
👉 py temp
  File "C:\temp", line 1
    b'ĀĀĀĀĀĀĀĀ'
                       ^

Expected

👉 python test.py
  File "/test.py", line 1
    b"Ā"
      ^
SyntaxError: bytes can only contain ASCII literal characters
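The reported column can also be inspected without a file. The sketch below assumes that `compile()` raises the same error as running the script does; the helper name `syntax_error_info` is made up for illustration. The value of `offset` depends on the Python version, so no expected column is shown.

```python
def syntax_error_info(source: str) -> dict:
    """Compile `source` and return the location fields of the SyntaxError it raises."""
    try:
        compile(source, "<test>", "exec")
    except SyntaxError as err:
        return {
            "msg": err.msg,
            "lineno": err.lineno,
            "offset": err.offset,  # 1-based column reported by the compiler
            "end_offset": getattr(err, "end_offset", None),  # only on Python 3.10+
            "text": err.text,
        }
    raise ValueError("source compiled without a SyntaxError")


info = syntax_error_info('b"Ā"\n')
print(info)
```

Comparing `offset` with the position of `Ā` in `text` shows whether the caret lands on the offending character.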


@KotlinIsland KotlinIsland added the type-bug An unexpected behavior, bug, or error label Feb 28, 2023
@AlexWaygood AlexWaygood added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Feb 28, 2023
@arhadthedev
Member

Can confirm on Python 3.12.0a5+ (heads/main:bcadcde712, Feb 26 2023, 12:54:46) [MSC v.1929 64 bit (AMD64)] on win32.

Note: Ā here is UTF-8-encoded as C4 80 (a single code point, U+0100 LATIN CAPITAL LETTER A WITH MACRON). So incorrect column advancing on combining code points is ruled out, unless text normalization happens inside the parser.
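The caret positions in the report seem consistent with a column counted in UTF-8 bytes rather than in characters. This is an inference from the outputs above, not a confirmed diagnosis. The sketch below shows the two counts diverging and one way to map a byte offset back to a character offset. The helper `byte_offset_to_char_offset` is hypothetical and is not CPython's parser code.

```python
def byte_offset_to_char_offset(text: str, byte_offset: int) -> int:
    """Map a 0-based UTF-8 byte offset to a 0-based character offset.

    An offset that falls inside a multi-byte character is rounded down,
    because the truncated trailing bytes are ignored when decoding.
    """
    prefix = text.encode("utf-8")[:byte_offset]
    return len(prefix.decode("utf-8", errors="ignore"))


short = 'b"Ā"'
long_line = "b'" + "Ā" * 8 + "'"

for line in (short, long_line):
    n_chars = len(line)
    n_bytes = len(line.encode("utf-8"))
    # Each Ā takes two bytes (C4 80), so the gap grows by one per character.
    print(line, n_chars, n_bytes, n_bytes - n_chars)
```

For `b"Ā"` the counts are 4 characters and 5 bytes; for the eight-character example they are 11 and 19, which matches the caret drifting further right as more non-ASCII characters are added.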

@KotlinIsland KotlinIsland changed the title (🐞) Column marker in error message off for bytes with non ASCII character (🐞) Column marker in error message off when line contains non ASCII character Feb 28, 2023
@arhadthedev
Member

arhadthedev commented Feb 28, 2023

@benjaminp, @pablogsal (as parser experts)

update: @gvanrossum, @lysnikolaou (as new parser experts)

@terryjreedy
Member

In IDLE, Python 3.12.0a5+ (heads/main:0f89acf6cc, Feb 27 2023, 21:33:28) says "incomplete input" for this and "\N{a}", reported in #102312, and marks the closing quote, but this is an effect of how codeop compiles.
In REPL, I see the above and the #102312 message. I closed #102312 as duplicate.

@terryjreedy terryjreedy added 3.11 only security fixes 3.10 only security fixes 3.12 bugs and security fixes labels Feb 28, 2023
@terryjreedy
Member

Verified bug in installed 3.10.10, 3.11.2, and 3.12.0a5.

@sobolevn
Member

sobolevn commented Feb 28, 2023

What do you think about this?

» ./python.exe ex.py
  File "/Users/sobolev/Desktop/cpython/ex.py", line 1
    x = b"Ā"
        ^^^^^
SyntaxError: bytes can only contain ASCII literal characters

This can be achieved with:
[Screenshot 2023-02-28 at 14 02 29]

Or do we need to highlight the symbol exactly?
If my solution is fine, I can add a test case and send a PR.

@arhadthedev
Member

> If my solution is fine, I can add a test case and send a PR.

@sobolevn In the absence of other feedback, your solution looks better than nothing. So it's worth creating a PR and getting it merged before 3.12b1.

@sobolevn
Member

Will do :)

@lysnikolaou
Contributor

lysnikolaou commented Apr 25, 2023

Can this be closed?

Edit: Actually, I'm closing this. Re-open in case there's more to it.

6 participants