fix: string-literal-aware comment stripper in build.js (#45, #38) #51

Merged: jacaudi merged 1 commit into main from fix/build-comment-stripper on Apr 11, 2026
@jacaudi jacaudi commented Apr 11, 2026

Summary

Closes #45 (URL literal mangling) and the build-script portion of #38 (block comments ignored).

The old comment-stripper in scripts/build.js was a single regex:

.replace(/\/\/.*$/gm, '')

On any line containing //, this silently deleted everything from the first // to the end of the line, so a URL string literal like "https://example.com" was cut off after "https:. It also left /* block */ comments untouched. Both are latent defects: no worker source hits them today, but the build output would silently break the next time someone adds a URL literal.
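A minimal reproduction of the failure mode, for illustration. The `oldStrip` wrapper and the sample input are hypothetical; only the regex itself is taken from the old build script.

```javascript
// Hypothetical wrapper around the old one-line regex stripper.
function oldStrip(source) {
  return source.replace(/\/\/.*$/gm, '');
}

// Sample input (made up for this demo): one URL literal with a trailing
// line comment, and one line that starts with a block comment.
const demoInput = [
  'const api = "https://example.com"; // endpoint',
  '/* block */ const x = 1;',
].join('\n');

const demoOutput = oldStrip(demoInput);

// Line 1 is cut at the "//" inside the URL, leaving an unterminated string.
// Line 2 is unchanged: the regex never looks for /* ... */.
console.log(demoOutput);
// const api = "https:
// /* block */ const x = 1;
```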

Change

  • New helper scripts/strip-comments.js — hand-written state-machine tokenizer (~176 LOC with JSDoc). Handles:
    • double- and single-quoted strings with \ escapes
    • template literals with nested ${...} interpolations (brace-depth tracked)
    • // line and /* block */ comments outside strings
    • best-effort regex literal disambiguation so /a\/b/ doesn't crash
  • scripts/build.js swaps the regex call for stripComments(). No other changes.
  • Zero new dependencies. This project deliberately has no package.json.
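To illustrate the general approach, here is a much smaller sketch of a string-aware stripper. It is not the code in scripts/strip-comments.js: the name `stripCommentsSketch` is hypothetical, and the sketch deliberately omits template literals and regex-literal disambiguation, so input such as `/\//` would still be misread as a line comment.

```javascript
// Hypothetical, reduced sketch of a string-aware comment stripper.
// Handles only '...' and "..." strings, // comments, and /* */ comments.
function stripCommentsSketch(source) {
  let out = '';
  let i = 0;
  const n = source.length;

  while (i < n) {
    const ch = source[i];
    const next = source[i + 1];

    // Quoted string: copy verbatim up to the matching unescaped quote.
    if (ch === '"' || ch === "'") {
      const quote = ch;
      out += ch;
      i++;
      while (i < n) {
        const c = source[i];
        if (c === '\\' && i + 1 < n) {
          // Escape: copy the backslash and the escaped character together.
          out += c + source[i + 1];
          i += 2;
          continue;
        }
        out += c;
        i++;
        // Stop at the closing quote; a raw newline ends an unterminated string.
        if (c === quote || c === '\n') break;
      }
      continue;
    }

    // Line comment: skip to end of line but keep the newline itself.
    if (ch === '/' && next === '/') {
      while (i < n && source[i] !== '\n') i++;
      continue;
    }

    // Block comment: skip past the closing */ (or to EOF if unterminated).
    if (ch === '/' && next === '*') {
      const end = source.indexOf('*/', i + 2);
      i = end === -1 ? n : end + 2;
      continue;
    }

    out += ch;
    i++;
  }
  return out;
}

console.log(stripCommentsSketch('const u = "https://example.com"; // note'));
// const u = "https://example.com";   (the space before the comment is kept)
```

Scanning character by character keeps the "am I inside a string?" question explicit, which a single regex cannot answer reliably.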

Tests

New file app/tests/test-comment-stripper.js — 20 assertions covering URL preservation, all string-literal forms, escape sequences, block comments, and edge cases (unterminated strings, empty input).

TDD cycle followed: test file written first, RED verified (MODULE_NOT_FOUND), GREEN verified after implementation.

Test plan

  • Full suite: 153 tests pass / 0 fail (baseline 133 + 20 new)
  • Built app/index.html byte-identical to baseline (229229 bytes) — no real content was affected, correctness proven by unit tests
  • CI green
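The byte-identical check can be scripted with a few lines of Node. This is a sketch only: the helper name `filesByteIdentical` and the baseline path in the usage comment are assumptions, not something taken from this repository.

```javascript
const fs = require('node:fs');

// Returns true when the two files contain exactly the same bytes.
// Buffer#equals compares both length and content.
function filesByteIdentical(pathA, pathB) {
  const a = fs.readFileSync(pathA);
  const b = fs.readFileSync(pathB);
  return a.equals(b);
}

// Usage (hypothetical paths): save a copy of the build output before the
// change, rebuild, then compare.
// filesByteIdentical('/tmp/baseline-index.html', 'app/index.html');
```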

Replace the regex-based comment stripper with a hand-written
state-machine tokenizer in scripts/strip-comments.js. The old regex
(/\/\/.*$/gm) would silently truncate any line containing a `//`,
including URL string literals like "https://example.com". It also
didn't handle /* block */ comments at all.

The new stripper is aware of:
- double- and single-quoted strings with backslash escapes
- template literals with nested ${} interpolations
- best-effort regex literal disambiguation
- both // line comments and /* block */ comments outside strings

New test file app/tests/test-comment-stripper.js (20 assertions) asserts
URL preservation, string literal handling, escape sequences, block
comments, and edge cases. Built app/index.html is byte-identical to
the baseline because no current worker source hits the bug — this
is a latent-defect fix verified by unit tests, not output diff.

Closes #45
Closes #38 (build-script portion; Go portion addressed separately)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jacaudi jacaudi merged commit 76d0832 into main Apr 11, 2026
13 checks passed
@jacaudi jacaudi deleted the fix/build-comment-stripper branch April 11, 2026 07:11

Development

Successfully merging this pull request may close these issues.

fix: comment-stripping regex in scripts/build.js mangles URL string literals
