Fix JSX entity decoder skipping entities after non-entity ampersand #3897

Merged
DanielRosenwasser merged 5 commits into main from copilot/fix-jsx-entity-decoder on May 18, 2026

Contributor

Copilot AI commented May 16, 2026

Fixes #1399

Analysis

The decodeEntities function in internal/transformers/jsxtransforms/jsx.go had a bug where it would find the first & and then search for the next ;, but didn't account for the possibility of another & appearing between them.

For example, with input &&amp;:

  1. Find & at position 0
  2. Find ; at position 5
  3. Extract entity &amp (positions 1-5), which is invalid because it contains &
  4. Write the whole thing as-is: &&amp;

The TypeScript reference implementation uses a regex &((#((\d+)|x([\da-fA-F]+)))|(\w+)); where \w+ naturally excludes &, so it correctly matches only &amp; starting from the second &.
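To make the regex behavior concrete, here is a minimal Go sketch that compiles the pattern quoted above and reports where the first match starts. The helper name firstEntitySpan is hypothetical and exists only for this illustration; it is not part of tsgo or the TypeScript codebase.

```go
package main

import (
	"fmt"
	"regexp"
)

// entityPattern is the regex quoted in the analysis above, compiled with Go's regexp package.
var entityPattern = regexp.MustCompile(`&((#((\d+)|x([\da-fA-F]+)))|(\w+));`)

// firstEntitySpan (hypothetical helper) returns the [start, end) byte offsets
// of the first entity-shaped match in text, or nil when there is none.
func firstEntitySpan(text string) []int {
	return entityPattern.FindStringIndex(text)
}

func main() {
	// The leading '&' cannot start a match because \w+ never matches '&',
	// so the match begins at the second '&'.
	fmt.Println(firstEntitySpan("&&amp;"))     // [1 6]
	fmt.Println(firstEntitySpan("a&b&c&amp;")) // [5 10]
	fmt.Println(firstEntitySpan("a & b"))      // []
}
```

Because the regex engine retries at each later position on its own, the reference implementation never needs an explicit "skip the stray ampersand" step.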

Fix

Added an inner loop in the decodeEntities function that skips past all intervening & characters between the current & and the next ;, emitting each as a literal. The loop adjusts the semi index as it advances, so the entity span is only searched once. This correctly handles cases with multiple consecutive non-entity ampersands before a valid entity (e.g. a&b&c&amp; → "a&b&c&").
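The approach described above can be sketched roughly as follows. This is a simplified stand-in, not the actual code in jsx.go: the function name decodeEntitiesSketch and the four-entry entity table are assumptions made for illustration, and numeric entities are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// sketchEntities is a tiny illustrative table (an assumption for this sketch);
// a real decoder knows many more names and also numeric forms.
var sketchEntities = map[string]string{"amp": "&", "lt": "<", "gt": ">", "quot": "\""}

// decodeEntitiesSketch is a hypothetical, simplified decoder showing the inner
// loop that emits intervening '&' characters as literals.
func decodeEntitiesSketch(text string) string {
	var result strings.Builder
	for {
		amp := strings.IndexByte(text, '&')
		if amp < 0 {
			break
		}
		result.WriteString(text[:amp])
		text = text[amp:] // text[0] is now '&'
		semi := strings.IndexByte(text, ';')
		if semi < 0 {
			break
		}
		// Skip every '&' that appears before the ';', writing it (and the
		// text before the next '&') as a literal and shifting semi to match.
		for {
			nextAmp := strings.IndexByte(text[1:semi], '&')
			if nextAmp < 0 {
				break
			}
			result.WriteString(text[:nextAmp+1])
			text = text[nextAmp+1:]
			semi -= nextAmp + 1
		}
		if decoded, ok := sketchEntities[text[1:semi]]; ok {
			result.WriteString(decoded)
		} else {
			result.WriteString(text[:semi+1]) // unknown name: keep as-is
		}
		text = text[semi+1:]
	}
	result.WriteString(text)
	return result.String()
}

func main() {
	fmt.Println(decodeEntitiesSketch("&&amp;"))           // &&
	fmt.Println(decodeEntitiesSketch("a&b&c&amp;"))       // a&b&c&
	fmt.Println(decodeEntitiesSketch("a&b&amp;c&d&lt;e")) // a&b&c&d<e
}
```

Adjusting semi inside the inner loop, rather than searching for ';' again, is what keeps the candidate span from being rescanned from scratch after each literal ampersand.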

Test case jsxEntityDecoderAfterNonEntityAmpersand.tsx covers mixed entity/non-entity ampersand sequences, including &&amp;, a&b&amp;c&d&lt;e, and a&b&c&amp;, along with two further cases whose decoded output is && and &&&.

Copilot Checklist

I successfully ran these commands at the end of my session, and they completed without error:

  • npx hereby build
  • npx hereby test
  • npx hereby lint
  • npx hereby format

When decodeEntities encounters an '&' followed by ';', it now checks if
there's another '&' between them. If so, the first '&' is treated as a
literal character and processing restarts from the next '&'. This fixes
cases like '&&amp;' being incorrectly left as-is instead of decoded to '&&'.

Agent-Logs-Url: https://github.com/microsoft/typescript-go/sessions/5ff0998f-b249-4679-a68b-71dd0a3f168c

Co-authored-by: jakebailey <5341706+jakebailey@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix tsgo JSX entity decoder for non-entity ampersands" to "Fix JSX entity decoder skipping entities after non-entity ampersand" May 16, 2026
Copilot AI requested a review from jakebailey May 16, 2026 15:57
@jakebailey jakebailey marked this pull request as ready for review May 16, 2026 16:31
Copilot AI review requested due to automatic review settings May 16, 2026 16:31
Contributor

Copilot AI left a comment

Pull request overview

Fixes a bug in the JSX text entity decoder (decodeEntities) where a non-entity ampersand could cause the decoder to skip subsequent valid entities, aligning tsgo behavior with the TypeScript reference implementation.

Changes:

  • Update decodeEntities to detect an intervening & before ; and restart scanning so subsequent entities are still decoded.
  • Add a new compiler test case covering mixed entity and non-entity ampersand sequences in JSX text.
  • Add new reference baselines (.js, .types, .symbols, .errors.txt) for the test.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Summary per file

| File | Description |
| --- | --- |
| internal/transformers/jsxtransforms/jsx.go | Adjusts entity scanning logic to avoid consuming invalid &...; spans that contain another &. |
| testdata/tests/cases/compiler/jsxEntityDecoderAfterNonEntityAmpersand.tsx | New regression test for entity decoding after non-entity ampersands in JSX text. |
| testdata/baselines/reference/compiler/jsxEntityDecoderAfterNonEntityAmpersand.js | Expected JS emit showing decoded JSX text strings. |
| testdata/baselines/reference/compiler/jsxEntityDecoderAfterNonEntityAmpersand.types | Expected type baseline for the new test. |
| testdata/baselines/reference/compiler/jsxEntityDecoderAfterNonEntityAmpersand.symbols | Expected symbol baseline for the new test. |
| testdata/baselines/reference/compiler/jsxEntityDecoderAfterNonEntityAmpersand.errors.txt | Expected diagnostic baseline for the new test. |

// Check if there's another '&' between the current '&' and the ';'.
// If so, the current '&' is not part of a valid entity; write it as
// a literal and restart from the next '&'.
nextAmp := strings.IndexByte(text[1:], '&')
Member

@copilot check and consider similar problems

Contributor Author

Good catch. Narrowed the search from text[1:] to text[1:semi] so it only inspects the candidate entity span. I also checked the rest of the function — the other two IndexByte calls (lines 864 and 900) search for & to find the next entity start, so they correctly need to scan the full remaining text. Fixed in 6c7989a.
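As a small illustration of what the narrower slice changes, the sketch below compares the two searches on an input where an unrelated & appears after the entity. The helper interveningAmp is a hypothetical name introduced here, and it assumes text[0] is '&' and semi is the index of the first ';'.

```go
package main

import (
	"fmt"
	"strings"
)

// interveningAmp (hypothetical helper) returns the offset, relative to
// text[1:], of an '&' lying strictly between text[0] and text[semi],
// or -1 if the candidate entity span contains none.
func interveningAmp(text string, semi int) int {
	return strings.IndexByte(text[1:semi], '&')
}

func main() {
	text := "&amp; x & y"
	semi := strings.IndexByte(text, ';') // 4

	// Searching the whole remainder walks past the ';' and finds an
	// ampersand that has nothing to do with the candidate entity.
	fmt.Println(strings.IndexByte(text[1:], '&')) // 7

	// Searching only the candidate span stops at the ';'.
	fmt.Println(interveningAmp(text, semi)) // -1
}
```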

Narrows the IndexByte scan from text[1:] to text[1:semi] so only the
candidate entity span is inspected, avoiding an unnecessary O(n) scan
of the remaining text for each entity.

Agent-Logs-Url: https://github.com/microsoft/typescript-go/sessions/895e4b37-80ee-4c96-af8b-acf8a085ba78

Co-authored-by: jakebailey <5341706+jakebailey@users.noreply.github.com>
Copilot AI requested a review from jakebailey May 16, 2026 16:50
// @target: esnext

const a = <div>&&amp;</div>;
const b = <div>a&b&amp;c&d&lt;e</div>;
Member

This is totally missing the case of a&b&c&amp;, isn't it?

Contributor Author

Added a&b&c&amp; as test case e. It correctly produces "a&b&c&". Fixed in 0654017.

// If so, the current '&' is not part of a valid entity; write it as
// a literal and restart from the next '&'.
nextAmp := strings.IndexByte(text[1:semi], '&')
if nextAmp >= 0 {
Member

This probably needs to be a loop.

Contributor Author

Refactored into a proper inner for loop that skips all intervening & characters in one pass, adjusting semi as it goes. Fixed in 0654017.

@jakebailey
Member

@copilot address comments

Comment on lines +883 to +884
for nextAmp := strings.IndexByte(text[1:semi], '&'); nextAmp >= 0; nextAmp = strings.IndexByte(text[1:semi], '&') {
result.WriteString(text[:nextAmp+1])
Member

@DanielRosenwasser DanielRosenwasser May 18, 2026

I think I'd just prefer

Suggested change
- for nextAmp := strings.IndexByte(text[1:semi], '&'); nextAmp >= 0; nextAmp = strings.IndexByte(text[1:semi], '&') {
-     result.WriteString(text[:nextAmp+1])
+ for {
+     // Stop once no '&' remains inside the candidate entity span.
+     nextAmp := strings.IndexByte(text[1:semi], '&')
+     if nextAmp < 0 {
+         break
+     }
+     result.WriteString(text[:nextAmp+1])

Member

@copilot do it

Contributor Author

Restructured to for { ... break } style in 94e06d5.

Member

@DanielRosenwasser DanielRosenwasser left a comment

Looks okay, but @jakebailey what do you think of the nit I left?

@DanielRosenwasser DanielRosenwasser added this pull request to the merge queue May 18, 2026
Merged via the queue into main with commit 1bf0f86 May 18, 2026
21 checks passed
@DanielRosenwasser DanielRosenwasser deleted the copilot/fix-jsx-entity-decoder branch May 18, 2026 21:32
Development

Successfully merging this pull request may close these issues.

tsgo JSX entity decoder skips entities that follow a non-entity ampersand

4 participants