
Fix infinite hang in regex parser for invalid UTF-8 with unicode flag #3857

Merged
DanielRosenwasser merged 3 commits into main from copilot/fix-infinite-hang-tsgo on May 15, 2026

Contributor

Copilot AI commented May 15, 2026

scanSourceCharacter() in unicode mode returned "" without advancing the position on invalid UTF-8, causing callers like scanAlternative to spin forever in their for p.pos() < p.end loops. The non-unicode path already handled this correctly by consuming the byte.

printf '/\200/u' > bug.ts  # writes the raw byte 0x80; some shells' echo would emit a literal "\x80" instead
tsgo --ignoreConfig bug.ts  # hangs forever
  • Fix: Advance past the invalid byte in the unicode-mode branch of scanSourceCharacter() before returning, matching the non-unicode behavior
  • Test: Added regexInvalidUtf8WithUnicodeFlag.ts compiler test
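
The hang and the fix described above can be illustrated with a minimal, self-contained Go sketch. It is not the code from internal/scanner/regexp.go: miniScanner, scanAll, the fixed switch, and the step budget are hypothetical names introduced here only for illustration, and the exact condition used to detect invalid UTF-8 in the real parser is an assumption. The sketch shows why a caller loop that keeps going while the position is before the end never terminates if the callee returns without advancing.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// miniScanner is a hypothetical stand-in for the regex parser state.
type miniScanner struct {
	text string
	pos  int
}

// scanSourceCharacter returns the next character, or "" if none can be
// produced. With fixed == false it mimics the old unicode-mode behavior:
// on invalid UTF-8 it returns "" without moving pos. With fixed == true it
// consumes the invalid byte first, as the non-unicode path is said to do.
func (s *miniScanner) scanSourceCharacter(fixed bool) string {
	r, size := utf8.DecodeRuneInString(s.text[s.pos:])
	if size == 0 {
		return "" // end of input: nothing to consume
	}
	if r == utf8.RuneError && size == 1 {
		if fixed {
			s.pos++ // advance past the invalid byte so callers make progress
		}
		return ""
	}
	ch := s.text[s.pos : s.pos+size]
	s.pos += size
	return ch
}

// scanAll mimics a caller loop such as scanAlternative. The budget exists
// only so this demo terminates; it reports whether the loop finished.
func scanAll(text string, fixed bool, budget int) (steps int, finished bool) {
	s := &miniScanner{text: text}
	for s.pos < len(s.text) {
		if steps == budget {
			return steps, false // would have spun forever
		}
		s.scanSourceCharacter(fixed)
		steps++
	}
	return steps, true
}

func main() {
	input := "a\x80b" // 0x80 is a lone continuation byte, invalid UTF-8

	steps, finished := scanAll(input, false, 1000)
	fmt.Println("old behavior:", steps, finished)

	steps, finished = scanAll(input, true, 1000)
	fmt.Println("fixed behavior:", steps, finished)
}
```

With the old behavior the loop exhausts the budget while stuck on the invalid byte; with the fix it finishes after one call per byte of this input.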

Copilot AI and others added 2 commits May 15, 2026 17:39
When scanSourceCharacter() encountered invalid UTF-8 in unicode mode,
it returned "" without advancing the position, causing callers in loops
(like scanAlternative) to loop forever. Now advance past the invalid
byte before returning.

Agent-Logs-Url: https://github.com/microsoft/typescript-go/sessions/31168130-b57f-43d2-a183-fbcb47341a53

Co-authored-by: jakebailey <5341706+jakebailey@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix infinite hang in tsgo for invalid UTF-8 in regex" to "Fix infinite hang in regex parser for invalid UTF-8 with unicode flag" May 15, 2026
Copilot AI requested a review from jakebailey May 15, 2026 17:41
@jakebailey jakebailey marked this pull request as ready for review May 15, 2026 17:48
Copilot AI review requested due to automatic review settings May 15, 2026 17:48
Contributor

Copilot AI left a comment

Pull request overview

Fixes an infinite loop in the regex parser when scanning invalid UTF-8 bytes inside a unicode-mode regex (/u). Previously, scanSourceCharacter() returned "" without advancing the position on utf8.RuneError, causing callers like scanAlternative to spin forever.

Changes:

  • Advance past the invalid byte in the unicode-mode branch of scanSourceCharacter() to mirror non-unicode behavior.
  • Add a minimal compiler test reproducing the hang.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

File | Description
internal/scanner/regexp.go | Splits the size == 0/RuneError branch and consumes the invalid byte on RuneError to prevent infinite loops.
testdata/tests/cases/compiler/regexInvalidUtf8WithUnicodeFlag.ts | New compiler test exercising invalid UTF-8 in a /u regex.
testdata/baselines/reference/compiler/regexInvalidUtf8WithUnicodeFlag.{js,types,symbols} | Generated baselines for the new test.
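
To make the "size == 0/RuneError" split concrete, here is a small Go sketch of how the standard library's utf8.DecodeRuneInString reports its cases. The classify helper is hypothetical and is not part of the PR; it only shows that empty input yields (RuneError, 0), an invalid byte yields (RuneError, 1), and a correctly encoded U+FFFD is a valid three-byte rune. Only the invalid-byte case needs the position to advance by one byte.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// classify is a hypothetical helper that names the case
// utf8.DecodeRuneInString reports for the start of s.
func classify(s string) string {
	r, size := utf8.DecodeRuneInString(s)
	switch {
	case size == 0:
		// Empty input: nothing to consume, so the position must not move.
		return "end of input"
	case r == utf8.RuneError && size == 1:
		// Invalid encoding: one byte must be consumed to make progress.
		return "invalid byte"
	default:
		// Includes a correctly encoded U+FFFD, which has size 3.
		return "valid rune"
	}
}

func main() {
	for _, s := range []string{"", "\x80", "\uFFFD", "é"} {
		fmt.Printf("%q -> %s\n", s, classify(s))
	}
}
```

Treating the first two cases as one branch is what allows a return without consuming input; separating them lets the end-of-input case stay put while the invalid-byte case advances.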

@DanielRosenwasser DanielRosenwasser added this pull request to the merge queue May 15, 2026
Merged via the queue into main with commit 9522325 May 15, 2026
25 checks passed
@DanielRosenwasser DanielRosenwasser deleted the copilot/fix-infinite-hang-tsgo branch May 15, 2026 20:58

Development

Successfully merging this pull request may close these issues.

Infinite hang in tsgo for invalid UTF-8 in /u or /v regex

4 participants