reject surrogate code points in \U unicode escapes #493

Merged
frostming merged 1 commit into python-poetry:master from netliomax25-code:reject-surrogate-unicode-escape
Jun 8, 2026

@netliomax25-code

Contributor
  1. _peek_unicode detects surrogates by inspecting the first two hex digits of the escape, which only works for the 4-digit \u form (\uD800 through \uDFFF).
  2. In the 8-digit \U form, a surrogate is written with leading zeros (\U0000D800), so the first two digits are 00, the check is skipped, and the escape is accepted as a lone surrogate that later fails UTF-8 encoding.
  3. This change tests the decoded code point against U+D800..U+DFFF instead, so both forms are rejected with InvalidUnicodeValueError.

The valid boundary values (U+D7FF, U+E000) and astral code points still parse. A regression test is included.
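
For readers who want to see the difference between the two checks, here is a minimal standalone sketch. It is not the code from this PR: `looks_like_surrogate_by_prefix` and `decode_escape_payload` are hypothetical names, the prefix check is only an illustration of the behaviour described in point 1, and the sketch returns `None` where the parser would raise InvalidUnicodeValueError.

```python
import string
from typing import Optional

HEX_DIGITS = set(string.hexdigits)


def looks_like_surrogate_by_prefix(hex_digits: str) -> bool:
    # Illustrative stand-in for a prefix-based check: a 4-digit surrogate
    # always starts with D8..DF, so only the first two digits are inspected.
    return hex_digits[0] in "dD" and hex_digits[1] in "89abcdefABCDEF"


def decode_escape_payload(hex_digits: str) -> Optional[str]:
    # Accept only the 4-digit (\u) or 8-digit (\U) payload, hex characters only.
    if len(hex_digits) not in (4, 8) or not set(hex_digits) <= HEX_DIGITS:
        return None
    code_point = int(hex_digits, 16)
    # Range check on the decoded value, so leading zeros cannot hide a surrogate.
    if 0xD800 <= code_point <= 0xDFFF:
        return None
    # Values above U+10FFFF are not Unicode code points at all.
    if code_point > 0x10FFFF:
        return None
    return chr(code_point)


# Compare the two checks on the cases named in the description.
for payload in ("D800", "0000D800", "D7FF", "E000", "0001F600"):
    flagged_by_prefix = looks_like_surrogate_by_prefix(payload)
    accepted_by_range = decode_escape_payload(payload) is not None
    print(payload, flagged_by_prefix, accepted_by_range)
```

The prefix check misses `0000D800` because its first two digits are `00`, while the range check on the decoded integer treats `D800` and `0000D800` identically.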

@frostming merged commit b1399c3 into python-poetry:master on Jun 8, 2026
25 checks passed