Skip to content

fix(build): scrub hidden Unicode from every text file in dist/#654

Merged
marcusrbrown merged 1 commit into
mainfrom
fix/renovate-hidden-unicode
May 20, 2026
Merged

fix(build): scrub hidden Unicode from every text file in dist/#654
marcusrbrown merged 1 commit into
mainfrom
fix/renovate-hidden-unicode

Conversation

@marcusrbrown
Copy link
Copy Markdown
Collaborator

The Renovate Dependency Dashboard keeps logging a "Hidden Unicode characters" warning. The prior fix that replaced flagged bytes in dist/*.js with \uXXXX escapes addressed one vector, but ignorePaths: ['dist/**'] doesn't help because Renovate runs the detector before ignore filters are applied.

The next tsdown version (0.22.0) flipped the default for emitting source maps, which makes the bundler write dist/*.js.map files containing raw vendor bytes:

  • @actions/cache's minimatch smart-quote artifact (options ||\u00A0{})
  • fast-xml-parser's HTML entity decoding table (nbsp:"\u00A0")

The escape plugin missed both because it only processed .js files. Renovate's tsdown 0.22.0 update PR (#648) has hit this on every branch scan since it opened.

This locks the door on both fronts:

  • sourcemap: false explicit in tsdown config. The action runtime doesn't consume source maps, and committing them tripled dist/ size on the tsdown 0.22.0 branch. Reproducibility across tsdown versions, no more accidental source map emission.
  • Escape plugin broadened to every text file in dist/, with a small binary-extension allowlist (png, woff, etc.) to avoid round-trip corruption. Future Renovate-flagged content anywhere in committed dist/ gets neutralized at build time.

dist/ rebuilds byte-identical with the current tsdown (0.21.10) — config-only change, no bundle churn.

After this lands, the dashboard's hidden-Unicode warning should clear within one Renovate scan cycle, and the pending tsdown 0.22.0 PR should be able to merge without retriggering the alert.

Copy link
Copy Markdown
Owner

@fro-bot fro-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verdict: CONDITIONAL

Can merge after addressing the one blocking issue below.

Blocking issues

Shared stateful regex across concurrent async iterations

HIDDEN_UNICODE_RE is a module-level constant with the /g flag. A global-flag regex is stateful: .test() advances lastIndex to just past the match, and subsequent calls resume from that position. The plugin processes files concurrently via Promise.all, so multiple async tasks share the same regex object.

The manual reset on the line after the test:

if (!HIDDEN_UNICODE_RE.test(content)) return
HIDDEN_UNICODE_RE.lastIndex = 0          // ← too late; another task may have already called .test()

...does not help because Node.js event-loop turns can interleave between the await readFile and the .test() call of another iteration. In practice this means:

  • A file whose hidden-Unicode character happened to land at the position where lastIndex was left by a sibling iteration will return false from .test() and be silently skipped.
  • Conversely, a file with no hidden Unicode may read a stale non-zero lastIndex from a prior iteration but String.prototype.replace resets lastIndex to 0 after running, so the replacement pass is not affected — only the guard test is.

The net effect is non-deterministic: some files with hidden Unicode may be silently left unmodified, defeating the purpose of the plugin.

Fix options (pick one):

  1. Drop the /g flag from the guard test; keep it only on .replace():

    const HIDDEN_UNICODE_TEST_RE = /[\u00A0\u00AD...]/   // no /g — stateless
    const HIDDEN_UNICODE_RE      = /[\u00A0\u00AD...]/g  // /g for replace
    if (!HIDDEN_UNICODE_TEST_RE.test(content)) return
  2. Create the regex inside the callback so each iteration gets its own instance:

    const re = /[\u00A0\u00AD\u1680\u2000-\u200C\u200E\u200F\u2028\u2029\u202A-\u202F\u205F\u3000\uFEFF]/g
    if (!re.test(content)) return
    re.lastIndex = 0
    const fixed = content.replace(re, ...)
  3. Use String.prototype.search (or includes-equivalent via re.source) for the guard, then .replace() with a fresh/reset regex.

Option 1 is minimal; option 2 is cleanest.

Non-blocking concerns

  • sourcemap: false is the right call for an action runtime. Worth noting in a comment that this is intentional and not an oversight, since future readers may wonder why maps are suppressed.
  • The binary-extension allowlist is a reasonable heuristic. Files without an extension (e.g. a generated dist/LICENSE or dist/NOTICE) are not excluded and will be processed as text — that's the correct behaviour, just worth confirming against the actual dist/ output shape.
  • The ext derivation via .split('.').pop() returns the full filename for files with no dot (e.g. LICENSE), which will not be in BINARY_EXTENSIONS and will therefore be processed. That is correct, but it also means a file literally named br (no extension) would be skipped. Negligible risk given the context.

Missing tests

The escape plugin has no unit tests. The blocking regex-race is hard to catch without a test that spawns concurrent invocations. A test that calls writeBundle with a mock dir containing several files, at least one with and one without hidden Unicode, would have caught this. Not a blocker for this specific fix, but worth filing as a follow-up.

Risk assessment (MED)

The sourcemap: false change is low risk and clearly correct. The binary-extension broadening is low risk. The regex-race bug is the concern: because lastIndex is non-deterministic at build time, some runs may produce a dist/ that still contains raw hidden-Unicode bytes, which would mean the Renovate warning reappears intermittently — exactly the symptom this PR is meant to permanently silence. Blast radius is limited to build output correctness, not runtime behaviour.


Run Summary
Field Value
Event pull_request
Repository fro-bot/agent
Run ID 26148369596
Cache hit
Session ses_1bbaea4daffeJJyjpzVjviOHon

Copy link
Copy Markdown
Owner

@fro-bot fro-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verdict: CONDITIONAL

Can merge after addressing the one blocking issue below.

Blocking issues

Shared stateful /g regex across concurrent async iterations

HIDDEN_UNICODE_RE is a module-level constant with the /g flag. A global-flag regex is stateful: .test() advances lastIndex to just past the match, and subsequent calls resume from that position. The plugin processes files concurrently via Promise.all, so multiple async tasks share the same regex object.

The manual reset after the test:

if (!HIDDEN_UNICODE_RE.test(content)) return
HIDDEN_UNICODE_RE.lastIndex = 0          // ← too late; another task may have already called .test()

...does not fully protect against the race. Node.js event-loop turns interleave between the await readFile and the .test() of another iteration. In practice:

  • A file whose hidden-Unicode character lands at the position where lastIndex was left by a sibling iteration will return false from .test() and be silently skipped.
  • The .replace() call itself resets lastIndex to 0 after running, so files that do proceed to replacement are fine — only the guard test is unreliable.

Net effect: non-deterministic. Some builds may produce a dist/ that still contains raw hidden-Unicode bytes, which is exactly the intermittent symptom this PR is meant to permanently close.

Fix options (pick one):

  1. Use a separate stateless regex (no /g) for the guard test:
const HIDDEN_UNICODE_TEST_RE = /[\u00A0\u00AD\u1680\u2000-\u200C\u200E\u200F\u2028\u2029\u202A-\u202F\u205F\u3000\uFEFF]/
const HIDDEN_UNICODE_RE      = /[\u00A0\u00AD\u1680\u2000-\u200C\u200E\u200F\u2028\u2029\u202A-\u202F\u205F\u3000\uFEFF]/g
if (!HIDDEN_UNICODE_TEST_RE.test(content)) return
  1. Create the regex inside the callback so each iteration owns its instance:
const re = /[\u00A0\u00AD\u1680\u2000-\u200C\u200E\u200F\u2028\u2029\u202A-\u202F\u205F\u3000\uFEFF]/g
if (!re.test(content)) return
re.lastIndex = 0
const fixed = content.replace(re, ...)

Option 1 is a minimal change; option 2 is cleanest and avoids the dual-definition.

Non-blocking concerns

  • sourcemap: false is the right call for an action runtime. A brief inline comment noting this is intentional (not an oversight) would help future readers.
  • Files with no dot in their name (e.g. LICENSE, NOTICE) have .split('.').pop() return the full filename, which won't match any BINARY_EXTENSIONS entry, so they're processed as text. That's correct behaviour, just worth confirming against actual dist/ output.

Missing tests

The escape plugin has no unit tests. A test that calls writeBundle with a mock dir containing several concurrent files — at least one with and one without hidden Unicode — would have caught the regex-race. Not required to unblock, but worth a follow-up issue.

Risk assessment (MED)

sourcemap: false and the broadened extension filtering are low-risk and clearly correct. The regex-race is the only concern: it makes the fix non-deterministic at build time, potentially leaving hidden Unicode in dist/ on some runs. Blast radius is limited to build output correctness; no runtime security or data-integrity impact.


Run Summary
Field Value
Event pull_request
Repository fro-bot/agent
Run ID 26148369596
Cache hit
Session ses_1bbaea4daffeJJyjpzVjviOHon

The Renovate Dependency Dashboard kept logging a 'Hidden Unicode
characters' warning even after the prior fix that replaced flagged bytes
in dist/*.js with \uXXXX escapes. The detector runs before ignorePaths
filters, so any committed file under dist/ that contains characters from
Renovate's hidden-Unicode regex triggers the warning regardless of how
we tell Renovate to skip the path.

The next tsdown version flipped the default for emitting source maps,
which makes the bundler write dist/*.js.map files containing raw vendor
bytes (smart-quote artifacts in minimatch, the nbsp entry in the
fast-xml-parser HTML entity table). The escape plugin missed those
because it only processed .js files.

Two changes lock the door:

- Set sourcemap: false explicitly in tsdown.config.ts so the build
  output stays slim and reproducible across tsdown versions. The action
  runtime does not consume source maps, and committing them tripled the
  size of dist/ on the tsdown 0.22.0 branch.

- Widen the escape plugin to process every text file in dist/, with a
  small allowlist of binary extensions excluded to avoid round-trip
  corruption. Future Renovate-flagged content in any committed dist/
  file will be neutralized at build time.

dist/ rebuilds byte-identical with the current tsdown version, so this
ships as a config-only change with no bundle churn.
@marcusrbrown marcusrbrown force-pushed the fix/renovate-hidden-unicode branch from c131cbb to 2f29ccf Compare May 20, 2026 07:47
Copy link
Copy Markdown
Owner

@fro-bot fro-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verdict: PASS

The blocking issue from the prior review has been addressed correctly. All changes look good.

Blocking issues

None.

Non-blocking concerns

None. The sourcemap: false comment was added. The stateless/stateful regex split is clearly explained in the inline comment.

One minor observation, not a blocker: replaceAll with a /g regex is correct per spec (replaceAll resets lastIndex before and after the call), but if a reader wonders whether the shared /g instance is still safe here, the comment covers the rationale. No action needed.

Missing tests

The escape plugin still has no unit tests, as noted in the prior review. This remains a suggested follow-up, not a blocker.

Risk assessment (LOW)

The regex race is fixed. sourcemap: false is intentional and clearly documented. The binary-extension allowlist is sound. No correctness, security, or API-breaking issues remain.


Run Summary
Field Value
Event pull_request
Repository fro-bot/agent
Run ID 26148934138
Cache hit
Session ses_1bbaea4daffeJJyjpzVjviOHon

@marcusrbrown marcusrbrown merged commit d5351b6 into main May 20, 2026
10 checks passed
@marcusrbrown marcusrbrown deleted the fix/renovate-hidden-unicode branch May 20, 2026 07:52
@fro-bot fro-bot mentioned this pull request May 20, 2026
47 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants