fix(build): scrub hidden Unicode from every text file in dist/#654
Conversation
fro-bot
left a comment
There was a problem hiding this comment.
Verdict: CONDITIONAL
Can merge after addressing the one blocking issue below.
Blocking issues
Shared stateful regex across concurrent async iterations
HIDDEN_UNICODE_RE is a module-level constant with the /g flag. A global-flag regex is stateful: .test() advances lastIndex to just past the match, and subsequent calls resume from that position. The plugin processes files concurrently via Promise.all, so multiple async tasks share the same regex object.
The manual reset on the line after the test:
if (!HIDDEN_UNICODE_RE.test(content)) return
HIDDEN_UNICODE_RE.lastIndex = 0 // ← too late; another task may have already called .test()...does not help because Node.js event-loop turns can interleave between the await readFile and the .test() call of another iteration. In practice this means:
- A file whose hidden-Unicode character happened to land at the position where
lastIndexwas left by a sibling iteration will returnfalsefrom.test()and be silently skipped. - Conversely, a file with no hidden Unicode may read a stale non-zero
lastIndexfrom a prior iteration butString.prototype.replaceresetslastIndexto 0 after running, so the replacement pass is not affected — only the guard test is.
The net effect is non-deterministic: some files with hidden Unicode may be silently left unmodified, defeating the purpose of the plugin.
Fix options (pick one):
-
Drop the
/gflag from the guard test; keep it only on.replace():const HIDDEN_UNICODE_TEST_RE = /[\u00A0\u00AD...]/ // no /g — stateless const HIDDEN_UNICODE_RE = /[\u00A0\u00AD...]/g // /g for replace if (!HIDDEN_UNICODE_TEST_RE.test(content)) return
-
Create the regex inside the callback so each iteration gets its own instance:
const re = /[\u00A0\u00AD\u1680\u2000-\u200C\u200E\u200F\u2028\u2029\u202A-\u202F\u205F\u3000\uFEFF]/g if (!re.test(content)) return re.lastIndex = 0 const fixed = content.replace(re, ...)
-
Use
String.prototype.search(orincludes-equivalent viare.source) for the guard, then.replace()with a fresh/reset regex.
Option 1 is minimal; option 2 is cleanest.
Non-blocking concerns
sourcemap: falseis the right call for an action runtime. Worth noting in a comment that this is intentional and not an oversight, since future readers may wonder why maps are suppressed.- The binary-extension allowlist is a reasonable heuristic. Files without an extension (e.g. a generated
dist/LICENSEordist/NOTICE) are not excluded and will be processed as text — that's the correct behaviour, just worth confirming against the actualdist/output shape. - The
extderivation via.split('.').pop()returns the full filename for files with no dot (e.g.LICENSE), which will not be inBINARY_EXTENSIONSand will therefore be processed. That is correct, but it also means a file literally namedbr(no extension) would be skipped. Negligible risk given the context.
Missing tests
The escape plugin has no unit tests. The blocking regex-race is hard to catch without a test that spawns concurrent invocations. A test that calls writeBundle with a mock dir containing several files, at least one with and one without hidden Unicode, would have caught this. Not a blocker for this specific fix, but worth filing as a follow-up.
Risk assessment (MED)
The sourcemap: false change is low risk and clearly correct. The binary-extension broadening is low risk. The regex-race bug is the concern: because lastIndex is non-deterministic at build time, some runs may produce a dist/ that still contains raw hidden-Unicode bytes, which would mean the Renovate warning reappears intermittently — exactly the symptom this PR is meant to permanently silence. Blast radius is limited to build output correctness, not runtime behaviour.
Run Summary
| Field | Value |
|---|---|
| Event | pull_request |
| Repository | fro-bot/agent |
| Run ID | 26148369596 |
| Cache | hit |
| Session | ses_1bbaea4daffeJJyjpzVjviOHon |
fro-bot
left a comment
There was a problem hiding this comment.
Verdict: CONDITIONAL
Can merge after addressing the one blocking issue below.
Blocking issues
Shared stateful /g regex across concurrent async iterations
HIDDEN_UNICODE_RE is a module-level constant with the /g flag. A global-flag regex is stateful: .test() advances lastIndex to just past the match, and subsequent calls resume from that position. The plugin processes files concurrently via Promise.all, so multiple async tasks share the same regex object.
The manual reset after the test:
if (!HIDDEN_UNICODE_RE.test(content)) return
HIDDEN_UNICODE_RE.lastIndex = 0 // ← too late; another task may have already called .test()...does not fully protect against the race. Node.js event-loop turns interleave between the await readFile and the .test() of another iteration. In practice:
- A file whose hidden-Unicode character lands at the position where
lastIndexwas left by a sibling iteration will returnfalsefrom.test()and be silently skipped. - The
.replace()call itself resetslastIndexto 0 after running, so files that do proceed to replacement are fine — only the guard test is unreliable.
Net effect: non-deterministic. Some builds may produce a dist/ that still contains raw hidden-Unicode bytes, which is exactly the intermittent symptom this PR is meant to permanently close.
Fix options (pick one):
- Use a separate stateless regex (no
/g) for the guard test:
const HIDDEN_UNICODE_TEST_RE = /[\u00A0\u00AD\u1680\u2000-\u200C\u200E\u200F\u2028\u2029\u202A-\u202F\u205F\u3000\uFEFF]/
const HIDDEN_UNICODE_RE = /[\u00A0\u00AD\u1680\u2000-\u200C\u200E\u200F\u2028\u2029\u202A-\u202F\u205F\u3000\uFEFF]/g
if (!HIDDEN_UNICODE_TEST_RE.test(content)) return- Create the regex inside the callback so each iteration owns its instance:
const re = /[\u00A0\u00AD\u1680\u2000-\u200C\u200E\u200F\u2028\u2029\u202A-\u202F\u205F\u3000\uFEFF]/g
if (!re.test(content)) return
re.lastIndex = 0
const fixed = content.replace(re, ...)Option 1 is a minimal change; option 2 is cleanest and avoids the dual-definition.
Non-blocking concerns
sourcemap: falseis the right call for an action runtime. A brief inline comment noting this is intentional (not an oversight) would help future readers.- Files with no dot in their name (e.g.
LICENSE,NOTICE) have.split('.').pop()return the full filename, which won't match anyBINARY_EXTENSIONSentry, so they're processed as text. That's correct behaviour, just worth confirming against actualdist/output.
Missing tests
The escape plugin has no unit tests. A test that calls writeBundle with a mock dir containing several concurrent files — at least one with and one without hidden Unicode — would have caught the regex-race. Not required to unblock, but worth a follow-up issue.
Risk assessment (MED)
sourcemap: false and the broadened extension filtering are low-risk and clearly correct. The regex-race is the only concern: it makes the fix non-deterministic at build time, potentially leaving hidden Unicode in dist/ on some runs. Blast radius is limited to build output correctness; no runtime security or data-integrity impact.
Run Summary
| Field | Value |
|---|---|
| Event | pull_request |
| Repository | fro-bot/agent |
| Run ID | 26148369596 |
| Cache | hit |
| Session | ses_1bbaea4daffeJJyjpzVjviOHon |
The Renovate Dependency Dashboard kept logging a 'Hidden Unicode characters' warning even after the prior fix that replaced flagged bytes in dist/*.js with \uXXXX escapes. The detector runs before ignorePaths filters, so any committed file under dist/ that contains characters from Renovate's hidden-Unicode regex triggers the warning regardless of how we tell Renovate to skip the path. The next tsdown version flipped the default for emitting source maps, which makes the bundler write dist/*.js.map files containing raw vendor bytes (smart-quote artifacts in minimatch, the nbsp entry in the fast-xml-parser HTML entity table). The escape plugin missed those because it only processed .js files. Two changes lock the door: - Set sourcemap: false explicitly in tsdown.config.ts so the build output stays slim and reproducible across tsdown versions. The action runtime does not consume source maps, and committing them tripled the size of dist/ on the tsdown 0.22.0 branch. - Widen the escape plugin to process every text file in dist/, with a small allowlist of binary extensions excluded to avoid round-trip corruption. Future Renovate-flagged content in any committed dist/ file will be neutralized at build time. dist/ rebuilds byte-identical with the current tsdown version, so this ships as a config-only change with no bundle churn.
c131cbb to
2f29ccf
Compare
fro-bot
left a comment
There was a problem hiding this comment.
Verdict: PASS
The blocking issue from the prior review has been addressed correctly. All changes look good.
Blocking issues
None.
Non-blocking concerns
None. The sourcemap: false comment was added. The stateless/stateful regex split is clearly explained in the inline comment.
One minor observation, not a blocker: replaceAll with a /g regex is correct per spec (replaceAll resets lastIndex before and after the call), but if a reader wonders whether the shared /g instance is still safe here, the comment covers the rationale. No action needed.
Missing tests
The escape plugin still has no unit tests, as noted in the prior review. This remains a suggested follow-up, not a blocker.
Risk assessment (LOW)
The regex race is fixed. sourcemap: false is intentional and clearly documented. The binary-extension allowlist is sound. No correctness, security, or API-breaking issues remain.
Run Summary
| Field | Value |
|---|---|
| Event | pull_request |
| Repository | fro-bot/agent |
| Run ID | 26148934138 |
| Cache | hit |
| Session | ses_1bbaea4daffeJJyjpzVjviOHon |
The Renovate Dependency Dashboard keeps logging a "Hidden Unicode characters" warning. The prior fix that replaced flagged bytes in
dist/*.jswith\uXXXXescapes addressed one vector, butignorePaths: ['dist/**']doesn't help because Renovate runs the detector before ignore filters are applied.The next tsdown version (0.22.0) flipped the default for emitting source maps, which makes the bundler write
dist/*.js.mapfiles containing raw vendor bytes:@actions/cache's minimatch smart-quote artifact (options ||\u00A0{})fast-xml-parser's HTML entity decoding table (nbsp:"\u00A0")The escape plugin missed both because it only processed
.jsfiles. Renovate'stsdown 0.22.0update PR (#648) has hit this on every branch scan since it opened.This locks the door on both fronts:
sourcemap: falseexplicit in tsdown config. The action runtime doesn't consume source maps, and committing them tripleddist/size on the tsdown 0.22.0 branch. Reproducibility across tsdown versions, no more accidental source map emission.dist/, with a small binary-extension allowlist (png, woff, etc.) to avoid round-trip corruption. Future Renovate-flagged content anywhere in committeddist/gets neutralized at build time.dist/rebuilds byte-identical with the current tsdown (0.21.10) — config-only change, no bundle churn.After this lands, the dashboard's hidden-Unicode warning should clear within one Renovate scan cycle, and the pending tsdown 0.22.0 PR should be able to merge without retriggering the alert.