Fix infinite hang in regex parser for invalid UTF-8 with unicode flag #3857
Merged
When scanSourceCharacter() encountered invalid UTF-8 in unicode mode, it returned "" without advancing the position, causing callers in loops (like scanAlternative) to loop forever. Now advance past the invalid byte before returning.
Agent-Logs-Url: https://github.com/microsoft/typescript-go/sessions/31168130-b57f-43d2-a183-fbcb47341a53
Co-authored-by: jakebailey <5341706+jakebailey@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix infinite hang in tsgo for invalid UTF-8 in regex" to "Fix infinite hang in regex parser for invalid UTF-8 with unicode flag" on May 15, 2026
Pull request overview
Fixes an infinite loop in the regex parser when scanning invalid UTF-8 bytes inside a unicode-mode regex (/u). Previously, scanSourceCharacter() returned "" without advancing the position on utf8.RuneError, causing callers like scanAlternative to spin forever.
Changes:
- Advance past the invalid byte in the unicode-mode branch of scanSourceCharacter() to mirror non-unicode behavior.
- Add a minimal compiler test reproducing the hang.
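The change described above can be illustrated with a minimal Go sketch. This is not the code from internal/scanner/regexp.go: the miniScanner type and the scanAll helper are hypothetical stand-ins, and only the names scanSourceCharacter and the `pos < end` loop shape come from the PR text. It shows why a scanner that returns "" on invalid UTF-8 must still consume the offending byte for the calling loop to terminate.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// miniScanner is a hypothetical stand-in for the regex parser state.
type miniScanner struct {
	text string
	pos  int
}

// scanSourceCharacter returns the next character, or "" at end of input
// or on an invalid UTF-8 byte. On an invalid byte it still advances pos,
// which is the behavior the PR describes adding.
func (s *miniScanner) scanSourceCharacter() string {
	if s.pos >= len(s.text) {
		return ""
	}
	r, size := utf8.DecodeRuneInString(s.text[s.pos:])
	if r == utf8.RuneError && size == 1 {
		// Invalid encoding: consume the byte so callers make progress.
		// Without this increment, the loop in scanAll would never end.
		s.pos++
		return ""
	}
	ch := s.text[s.pos : s.pos+size]
	s.pos += size
	return ch
}

// scanAll mimics a caller that loops while pos is before the end
// (an assumed simplification of loops such as the one in scanAlternative).
func scanAll(text string) (chars []string, invalid int) {
	s := &miniScanner{text: text}
	for s.pos < len(s.text) {
		ch := s.scanSourceCharacter()
		if ch == "" {
			invalid++
			continue
		}
		chars = append(chars, ch)
	}
	return chars, invalid
}

func main() {
	chars, invalid := scanAll("a\xffb")
	fmt.Println(chars, invalid) // prints [a b] 1
}
```

Removing the `s.pos++` line in the invalid-byte branch reproduces the shape of the original bug: the loop condition stays true and the same byte is decoded forever.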
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| internal/scanner/regexp.go | Splits the size == 0/RuneError branch and consumes the invalid byte on RuneError to prevent infinite loops. |
| testdata/tests/cases/compiler/regexInvalidUtf8WithUnicodeFlag.ts | New compiler test exercising invalid UTF-8 in a /u regex. |
| testdata/baselines/reference/compiler/regexInvalidUtf8WithUnicodeFlag.{js,types,symbols} | Generated baselines for the new test. |
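The "size == 0/RuneError" split mentioned for regexp.go relies on how Go's standard utf8.DecodeRuneInString reports its results: an empty string yields (RuneError, 0), an invalid encoding yields (RuneError, 1), and a correctly encoded U+FFFD yields the same rune value with size 3. The sketch below only demonstrates those three outcomes; the classify helper is hypothetical and is not claimed to match how the actual file structures its branches.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// classify reports which decoding case the start of s falls into.
// It is a hypothetical helper used only to show the return values
// of utf8.DecodeRuneInString.
func classify(s string) string {
	r, size := utf8.DecodeRuneInString(s)
	switch {
	case size == 0:
		// Nothing left to read: there is no byte to consume.
		return "empty"
	case r == utf8.RuneError && size == 1:
		// Invalid encoding: exactly one byte should be skipped.
		return "invalid"
	default:
		// A well-formed rune, possibly a literal U+FFFD (size 3).
		return "valid"
	}
}

func main() {
	fmt.Println(classify(""))       // prints empty
	fmt.Println(classify("\xff"))   // prints invalid
	fmt.Println(classify("\uFFFD")) // prints valid
	fmt.Println(classify("é"))      // prints valid
}
```

Treating the empty case and the invalid-byte case separately matters because only the second one has a byte that needs to be consumed.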
jakebailey approved these changes on May 15, 2026
DanielRosenwasser approved these changes on May 15, 2026
scanSourceCharacter() in unicode mode returned "" without advancing the position on invalid UTF-8, causing callers like scanAlternative to spin forever in their for p.pos() < p.end loops. The non-unicode path already handled this correctly by consuming the byte.
- Advance past the invalid byte in scanSourceCharacter() before returning, matching the non-unicode behavior
- Add regexInvalidUtf8WithUnicodeFlag.ts compiler test