
gh-145234: Fix SystemError in parser when \r is introduced after code… #145276

Closed

gourijain029-del wants to merge 1 commit into python:main from gourijain029-del:fix-parser-r-systemerror


gourijain029-del commented Feb 26, 2026

Description
This PR fixes the error "SystemError: Parser/string_parser.c:286: bad argument to internal function", which occurred when a Python source file used an encoding (such as UTF-7) whose decoded text contains \r characters.

Root Cause
The crash was caused by a synchronization failure between the tokenizer, the lexer, and the string parser:

Tokenizer: When the file tokenizer recoded a line (e.g., from UTF-7 to UTF-8), it was not normalizing newlines. If the codec introduced a \r, it remained in the buffer.
Lexer: The lexer skipped \r characters but did not correctly trigger "beginning-of-line" (atbol) logic. This meant that if a \r followed a comment (#...), the lexer would remain in a state where it thought it was still on the same line, causing it to merge the comment and the subsequent string literal into a single, invalid token.
String Parser: When _PyPegen_parse_string received this broken token (which did not start with a quote character), it raised a SystemError.

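A minimal Python-level illustration of the root cause, using a hypothetical reproducer (the actual crashing input is not shown in this PR). The bytes +AA0- are the UTF-7 spelling of U+000D, so a UTF-7 source file can contain a \r that only appears after decoding. Splitting on \n alone leaves the comment and the string literal on one line, while str.splitlines(), which also breaks on a lone \r, separates them.

```python
# Hypothetical reproducer bytes: a coding spec, then a comment followed by
# "+AA0-" (UTF-7 for U+000D, i.e. a carriage return) and a string literal.
raw = b"# coding: utf-7\n# comment+AA0-'abc'\n"

# The "\r" does not exist in the raw bytes; it is produced by the codec.
text = raw.decode("utf-7")
print(repr(text))

# Breaking lines only on "\n" keeps the comment and the literal together...
print(text.split("\n"))

# ...whereas also treating a lone "\r" as a line break separates them.
print(text.splitlines())
```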
Changes

Parser/lexer/lexer.c: Updated the lexer to treat a standalone \r as a full newline. It now correctly sets atbol = 1 and resets the current token start, preventing the "merging" of tokens across lines.
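A toy model of the lexer change, not the C code in Parser/lexer/lexer.c. The hypothetical helper code_lines splits text into lines and drops comment-only ones. With lone_cr_is_newline=False, a stray \r is skipped without starting a new line (the pre-fix behaviour described under Root Cause), so the string literal is swallowed by the comment; with True, the \r begins a new line, loosely mirroring the atbol = 1 change.

```python
def code_lines(text: str, lone_cr_is_newline: bool) -> list[str]:
    """Hypothetical helper: return the physical lines that are not comment-only or blank.

    lone_cr_is_newline=False models the pre-fix lexer (a stray "\\r" is skipped
    and the current line continues); True models treating it as a newline.
    """
    lines: list[str] = []
    current: list[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\n" or (ch == "\r" and lone_cr_is_newline):
            if ch == "\r" and text[i + 1:i + 2] == "\n":
                i += 1  # treat "\r\n" as a single line break
            lines.append("".join(current))
            current = []
        elif ch == "\r":
            pass  # pre-fix model: skip the "\r" and stay on the same line
        else:
            current.append(ch)
        i += 1
    if current:
        lines.append("".join(current))
    # Drop comment-only and blank lines, since they contribute no code tokens.
    return [ln for ln in lines if ln.strip() and not ln.lstrip().startswith("#")]


source = "# comment\r'abc'\n"
print(code_lines(source, lone_cr_is_newline=False))  # literal lost inside the comment
print(code_lines(source, lone_cr_is_newline=True))   # literal survives on its own line
```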

Parser/tokenizer/file_tokenizer.c:
Updated tok_readline_recode to explicitly call _PyTokenizer_translate_newlines on the UTF-8 decoded buffer.
Optimized tok_underflow_file to immediately discard and re-decode the buffer as soon as a coding spec is identified, preventing raw bytes from leaking into the parser.
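A rough Python analogue of the ordering these tokenizer changes aim for, not the C implementation: detect the coding spec, decode the whole buffer with it, and only then normalize newlines, so a \r produced by the codec is caught. It assumes _PyTokenizer_translate_newlines converts both \r\n and a lone \r to \n. decode_source is a hypothetical helper; tokenize.detect_encoding and bytes.decode are standard library calls.

```python
import io
import tokenize


def decode_source(raw: bytes) -> str:
    """Hypothetical helper: decode per the PEP 263 coding spec, then normalize newlines."""
    # Look for a coding spec (or BOM) in the first two lines; defaults to UTF-8.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(raw).readline)
    # Decode the whole buffer with the declared codec before anything else sees it.
    text = raw.decode(encoding)
    # Normalize only after decoding, so a "\r" produced by the codec is caught.
    # Replace "\r\n" first so it becomes one "\n" rather than two.
    return text.replace("\r\n", "\n").replace("\r", "\n")


raw = b"# coding: utf-7\n# comment+AA0-'abc'\n"
print(repr(decode_source(raw)))
```

Normalizing the raw bytes before decoding would miss this case, because the \r only exists in the decoded text.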

Lib/test/test_parser_utf7_r.py: Added a new regression test that uses a UTF-7 encoded \r to reproduce the original crash.

bedevere-app bot commented Feb 26, 2026

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

pablogsal (Member)

Closing as per https://devguide.python.org/getting-started/generative-ai/

Please don't randomly open AI generated PRs

pablogsal closed this Feb 26, 2026
