Fix JSX entity decoder skipping entities after non-entity ampersand#3897
Conversation
When decodeEntities encounters an '&' followed by ';', it now checks if there's another '&' between them. If so, the first '&' is treated as a literal character and processing restarts from the next '&'. This fixes cases like '&&' being incorrectly left as-is instead of decoded to '&&'. Agent-Logs-Url: https://github.com/microsoft/typescript-go/sessions/5ff0998f-b249-4679-a68b-71dd0a3f168c Co-authored-by: jakebailey <5341706+jakebailey@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Fixes a bug in the JSX text entity decoder (decodeEntities) where a non-entity ampersand could cause the decoder to skip subsequent valid entities, aligning tsgo behavior with the TypeScript reference implementation.
Changes:
- Update
decodeEntitiesto detect an intervening&before;and restart scanning so subsequent entities are still decoded. - Add a new compiler test case covering mixed entity and non-entity ampersand sequences in JSX text.
- Add new reference baselines (
.js,.types,.symbols,.errors.txt) for the test.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
internal/transformers/jsxtransforms/jsx.go |
Adjusts entity scanning logic to avoid consuming invalid &...; spans that contain another &. |
testdata/tests/cases/compiler/jsxEntityDecoderAfterNonEntityAmpersand.tsx |
New regression test for entity decoding after non-entity ampersands in JSX text. |
testdata/baselines/reference/compiler/jsxEntityDecoderAfterNonEntityAmpersand.js |
Expected JS emit showing decoded JSX text strings. |
testdata/baselines/reference/compiler/jsxEntityDecoderAfterNonEntityAmpersand.types |
Expected type baseline for the new test. |
testdata/baselines/reference/compiler/jsxEntityDecoderAfterNonEntityAmpersand.symbols |
Expected symbol baseline for the new test. |
testdata/baselines/reference/compiler/jsxEntityDecoderAfterNonEntityAmpersand.errors.txt |
Expected diagnostic baseline for the new test. |
| // Check if there's another '&' between the current '&' and the ';'. | ||
| // If so, the current '&' is not part of a valid entity; write it as | ||
| // a literal and restart from the next '&'. | ||
| nextAmp := strings.IndexByte(text[1:], '&') |
There was a problem hiding this comment.
Good catch. Narrowed the search from text[1:] to text[1:semi] so it only inspects the candidate entity span. I also checked the rest of the function — the other two IndexByte calls (lines 864 and 900) search for & to find the next entity start, so they correctly need to scan the full remaining text. Fixed in 6c7989a.
Narrows the IndexByte scan from text[1:] to text[1:semi] so only the candidate entity span is inspected, avoiding an unnecessary O(n) scan of the remaining text for each entity. Agent-Logs-Url: https://github.com/microsoft/typescript-go/sessions/895e4b37-80ee-4c96-af8b-acf8a085ba78 Co-authored-by: jakebailey <5341706+jakebailey@users.noreply.github.com>
| // @target: esnext | ||
|
|
||
| const a = <div>&&</div>; | ||
| const b = <div>a&b&c&d<e</div>; |
There was a problem hiding this comment.
This is totally missing the case of a&b&c&, isn't it?
There was a problem hiding this comment.
Added a&b&c& as test case e. It correctly produces "a&b&c&". Fixed in 0654017.
| // If so, the current '&' is not part of a valid entity; write it as | ||
| // a literal and restart from the next '&'. | ||
| nextAmp := strings.IndexByte(text[1:semi], '&') | ||
| if nextAmp >= 0 { |
There was a problem hiding this comment.
This probably needs to be a loop.
There was a problem hiding this comment.
Refactored into a proper inner for loop that skips all intervening & characters in one pass, adjusting semi as it goes. Fixed in 0654017.
|
@copilot address comments |
…case Agent-Logs-Url: https://github.com/microsoft/typescript-go/sessions/2a9af1f9-b711-45d4-9abb-dccb033ef954 Co-authored-by: jakebailey <5341706+jakebailey@users.noreply.github.com>
| for nextAmp := strings.IndexByte(text[1:semi], '&'); nextAmp >= 0; nextAmp = strings.IndexByte(text[1:semi], '&') { | ||
| result.WriteString(text[:nextAmp+1]) |
There was a problem hiding this comment.
I think I'd just prefer
| for nextAmp := strings.IndexByte(text[1:semi], '&'); nextAmp >= 0; nextAmp = strings.IndexByte(text[1:semi], '&') { | |
| result.WriteString(text[:nextAmp+1]) | |
| for { | |
| nextAmp := strings.IndexByte(text[1:semi], '&') | |
| if nextAmp >= 0 { | |
| break | |
| } | |
| result.WriteString(text[:nextAmp+1]) |
There was a problem hiding this comment.
Restructured to for { ... break } style in 94e06d5.
DanielRosenwasser
left a comment
There was a problem hiding this comment.
Looks okay, but @jakebailey what do you think of the nit I left?
Agent-Logs-Url: https://github.com/microsoft/typescript-go/sessions/31b2e5d6-fd33-413f-8a6e-04cc6aae425b Co-authored-by: jakebailey <5341706+jakebailey@users.noreply.github.com>
Fixes #1399
Analysis
The
decodeEntitiesfunction ininternal/transformers/jsxtransforms/jsx.gohad a bug where it would find the first&and then search for the next;, but didn't account for the possibility of another&appearing between them.For example, with input
&&:&at position 0;at position 5&(positions 1-5) — invalid because it contains&&&The TypeScript reference implementation uses a regex
&((#((\d+)|x([\da-fA-F]+)))|(\w+));where\w+naturally excludes&, so it correctly matches only&starting from the second&.Fix
Added an inner loop in the
decodeEntitiesfunction that skips past all intervening&characters between the current&and the next;, emitting each as a literal. The loop adjusts thesemiindex as it advances, so the entity span is only searched once. This correctly handles cases with multiple consecutive non-entity ampersands before a valid entity (e.g.a&b&c&→"a&b&c&").Test case
jsxEntityDecoderAfterNonEntityAmpersand.tsxcovers mixed entity/non-entity ampersand sequences including&&,a&b&c&d<e,&&,&&&, anda&b&c&.Copilot Checklist
I successfully ran these commands at the end of my session, and they completed without error: