fix quadratic backtracking in unicode escape normalization#5163

Merged
cobaltt7 merged 2 commits into psf:main from sahvx655-wq:unicode-escape-quadratic-regex on Jun 3, 2026

Conversation

@sahvx655-wq
Contributor

normalize_unicode_escape_sequences runs UNICODE_ESCAPE_RE over every string leaf. The pattern matched a run of backslashes with (\\+) immediately followed by a required escape body. When a long run of backslashes has no trailing u/U/x/N escape, every match attempt fails: the engine restarts at each position in the run, and at each start the greedy \\+ backtracks through every shorter run before giving up. Formatting a string of backslashes is therefore quadratic in its length: a 64KB literal already takes minutes, and blackd accepts far larger bodies over the wire. Making the escape body optional lets a single match consume the whole run, which restores linear time and leaves the output identical.
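The mechanism can be reproduced with a simplified stand-in. This is a sketch under stated assumptions: the escape grammar, SLOW_RE, FAST_RE and the replace normalizer (which lowercases hex digits and uppercases \N{...} names) are hypothetical illustrations, not Black's actual UNICODE_ESCAPE_RE or normalize_unicode_escape_sequences. It shows the required-body pattern failing on a bare backslash run only after backtracking from every start position, while the optional-body pattern consumes the run in one match and produces the same output.

```python
import re
import time

# Hypothetical, simplified escape grammar; not Black's actual UNICODE_ESCAPE_RE.
ESCAPE_BODY = (
    r"u[0-9a-fA-F]{4}"    # \uXXXX
    r"|U[0-9a-fA-F]{8}"   # \UXXXXXXXX
    r"|x[0-9a-fA-F]{2}"   # \xXX
    r"|N\{[^}]+\}"        # \N{NAME}
)

# Before: the body is required, so a backslash run with no escape after it
# fails at every start position, after backtracking through every shorter run.
SLOW_RE = re.compile(r"(?P<backslashes>\\+)(?P<body>" + ESCAPE_BODY + r")")

# After: the body is optional, so one match swallows the whole run.
FAST_RE = re.compile(r"(?P<backslashes>\\+)(?P<body>" + ESCAPE_BODY + r")?")


def replace(m: re.Match) -> str:
    """Hypothetical normalizer: lowercase hex digits, uppercase \\N names."""
    backslashes, body = m.group("backslashes"), m.group("body")
    if body is None or len(backslashes) % 2 == 0:
        # No escape body, or the backslash itself is escaped: leave unchanged.
        return m.group(0)
    if body[0] == "N":
        return backslashes + "N{" + body[2:-1].upper() + "}"
    return backslashes + body[0] + body[1:].lower()


def normalize(pattern: re.Pattern, text: str) -> str:
    return pattern.sub(replace, text)


def timed(pattern: re.Pattern, text: str) -> float:
    start = time.perf_counter()
    normalize(pattern, text)
    return time.perf_counter() - start


if __name__ == "__main__":
    sample = r"\xAB \\xAB \N{bullet} \u00E9 \\\\ plain"
    assert normalize(SLOW_RE, sample) == normalize(FAST_RE, sample)
    print(normalize(FAST_RE, sample))

    # Doubling the run should roughly quadruple the slow pattern's time,
    # while the fast pattern grows roughly linearly.
    for n in (1_000, 2_000, 4_000):
        run = "\\" * n
        print(n, f"slow={timed(SLOW_RE, run):.3f}s", f"fast={timed(FAST_RE, run):.5f}s")
```

In this sketch, making the body optional moves the "no escape here" case into the replacement function, which must return the match unchanged when body is None; that guard is the first branch of replace.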

@github-actions
Contributor

github-actions Bot commented Jun 2, 2026

diff-shades results comparing this PR (b06aa63) to main (d246367):

--preview style: no changes

--stable style: no changes



@JelleZijlstra
Collaborator

Thanks, could you add a changelog entry? Something like "Improve performance on strings containing many consecutive backslashes"

@sahvx655-wq
Contributor Author

Added it under Performance, thanks.

cobaltt7 merged commit 67ffc34 into psf:main Jun 3, 2026
57 checks passed