perf: fast lexer using charCodeAt comparisons and sticky regexes #4

Merged
jogibear9988 merged 4 commits into main from copilot/improve-lexer-performance on Mar 17, 2026
Copilot AI commented Mar 17, 2026

Applies the es-module-shims fast lexer principles to the CSS parser: numeric character-code comparisons instead of string comparisons, and sticky regexes to eliminate substring allocations.

Lexer (src/parse/lexer.ts)

  • charCodeAt() over string indexing — all hot-path character checks (skipWhitespace, tryOpenBrace, tryCloseBrace, tryColon, skipSemicolonAndWhitespace, tryCommaAndWhitespace) now compare numeric char codes against named constants instead of creating single-char strings
  • Sticky regex matching: matchRegex() sets re.lastIndex = this.pos and matches against the full input with the y flag, eliminating the this.remaining substring slice (previously an O(n) allocation per call)
  • Range-based advance: _advanceRange(from, to) iterates the original input by index instead of receiving an intermediate string
// Before: string comparison + substring allocation
const ch = this.input[this.pos];        // allocates 1-char string
if (ch === '{') { ... }
const m = re.exec(this.remaining);      // allocates O(n) substring

// After: numeric comparison + sticky regex on full input
const ch = this.input.charCodeAt(this.pos);  // returns number
if (ch === Ch_OPEN) { ... }
re.lastIndex = this.pos;
const m = re.exec(this.input);               // no allocation
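
The before/after fragment above can be expanded into a small runnable sketch. This is not the code from src/parse/lexer.ts: the class name MiniLexer, the re_ident pattern, and the line/column bookkeeping are assumptions made for illustration. Only the technique (char-code comparisons, sticky matching against the full input, and a range-based advance) follows the description.

```typescript
// Hypothetical sketch; names and patterns are illustrative, not the PR's code.
const Ch_TAB = 9;       // '\t'
const Ch_NEWLINE = 10;  // '\n'
const Ch_FF = 12;       // '\f'
const Ch_CR = 13;       // '\r'
const Ch_SPACE = 32;    // ' '
const Ch_OPEN = 123;    // '{'

class MiniLexer {
  readonly input: string;
  pos = 0;
  line = 1;
  column = 1;

  constructor(input: string) {
    this.input = input;
  }

  /** Skip whitespace by comparing char codes against numeric constants. */
  skipWhitespace(): void {
    let i = this.pos;
    while (i < this.input.length) {
      const ch = this.input.charCodeAt(i);
      if (ch !== Ch_SPACE && ch !== Ch_TAB && ch !== Ch_NEWLINE &&
          ch !== Ch_CR && ch !== Ch_FF) break;
      i++;
    }
    this.advanceRange(this.pos, i);
  }

  /** Consume '{' if present. charCodeAt returns NaN past the end, so this is false there. */
  tryOpenBrace(): boolean {
    if (this.input.charCodeAt(this.pos) !== Ch_OPEN) return false;
    this.advanceRange(this.pos, this.pos + 1);
    return true;
  }

  /** Match a y-flag regex at the current position against the full input. */
  matchRegex(re: RegExp): RegExpExecArray | null {
    if (!re.sticky) throw new Error("matchRegex expects a sticky (y-flag) regex");
    re.lastIndex = this.pos;
    const m = re.exec(this.input);
    if (m !== null) this.advanceRange(this.pos, this.pos + m[0].length);
    return m;
  }

  /** Update line/column by walking input[from, to) by index, then move pos. */
  private advanceRange(from: number, to: number): void {
    for (let i = from; i < to; i++) {
      if (this.input.charCodeAt(i) === Ch_NEWLINE) {
        this.line++;
        this.column = 1;
      } else {
        this.column++;
      }
    }
    this.pos = to;
  }
}

// Usage: tokenize the start of a rule without slicing the input.
const re_ident = /[a-zA-Z_-][\w-]*/y;
const lexer = new MiniLexer("  body {\n  color");
lexer.skipWhitespace();
const selector = lexer.matchRegex(re_ident);
lexer.skipWhitespace();
const opened = lexer.tryOpenBrace();
lexer.skipWhitespace();
const property = lexer.matchRegex(re_ident);
```

A sticky regex only matches exactly at lastIndex, so it behaves like a ^-anchored regex applied to the remaining input, without the substring being built first.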

Parser (src/parse/index.ts)

  • Pre-compiled sticky regexes at module scope — all 20+ regex patterns (re_comment, re_media, re_keyframesName, re_pageMarginBox, etc.) compiled once as y-flag constants instead of per-parse() call
  • Inlined at-rule parsers — replaced _compileAtRule factory with explicit functions using module-level regex constants for @import, @charset, @namespace
  • charCodeAt guards: lexer.charAt() === '@' → lexer.charCodeAt() === Ch_AT throughout
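
A rough sketch of the module-scope pattern described in the list above. The regex bodies and the helper names readAtKeyword and skipComment are assumptions for illustration; the actual patterns in src/parse/index.ts are not shown in this description.

```typescript
// Hypothetical sketch; the patterns below are illustrative, not the PR's regexes.
const Ch_AT = 64; // '@'

// Compiled once when the module loads, rather than on every parse() call.
const re_comment = /\/\*[\s\S]*?\*\//y;
const re_atKeyword = /@([\w-]+)/y;

/** Returns the at-keyword name starting at `pos`, or null if there is none. */
function readAtKeyword(input: string, pos: number): string | null {
  // Cheap numeric guard before invoking the regex engine.
  if (input.charCodeAt(pos) !== Ch_AT) return null;
  // The regex object is shared, so lastIndex must be set before every exec.
  re_atKeyword.lastIndex = pos;
  const m = re_atKeyword.exec(input);
  return m?.[1] ?? null;
}

/** Returns the index just past a comment starting at `pos`, or `pos` if none starts there. */
function skipComment(input: string, pos: number): number {
  re_comment.lastIndex = pos;
  // On a successful sticky match, lastIndex points just past the match.
  return re_comment.exec(input) === null ? pos : re_comment.lastIndex;
}

const source = "/* c */@media screen";
const afterComment = skipComment(source, 0);
const keyword = readAtKeyword(source, afterComment);
```

One consequence of sharing regex objects at module scope is that lastIndex becomes shared mutable state. Setting it immediately before each exec, as above, keeps calls independent of one another.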

String search utilities (src/utils/stringSearch.ts)

  • charCodeAt() for character dispatch in bracket/quote-aware search loops
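
As an illustration of what a bracket/quote-aware search loop with char-code dispatch might look like, here is a minimal sketch. The function name indexOfTopLevel and its exact rules (parentheses only, backslash escapes inside strings) are assumptions, not the contents of src/utils/stringSearch.ts.

```typescript
// Hypothetical sketch; not the PR's stringSearch implementation.
const Ch_DQUOTE = 34;    // '"'
const Ch_SQUOTE = 39;    // "'"
const Ch_LPAREN = 40;    // '('
const Ch_RPAREN = 41;    // ')'
const Ch_SEMICOLON = 59; // ';'
const Ch_BACKSLASH = 92; // '\\'

/**
 * Index of the first `target` char code that is outside quotes and
 * parentheses, or -1. `target` is assumed not to be a quote or parenthesis.
 */
function indexOfTopLevel(input: string, target: number, from = 0): number {
  let depth = 0;
  let quote = 0; // 0 outside a string, otherwise the opening quote's char code
  for (let i = from; i < input.length; i++) {
    const ch = input.charCodeAt(i);
    if (quote !== 0) {
      if (ch === Ch_BACKSLASH) i++;      // skip the escaped character
      else if (ch === quote) quote = 0;  // closing quote
      continue;
    }
    if (ch === Ch_DQUOTE || ch === Ch_SQUOTE) quote = ch;
    else if (ch === Ch_LPAREN) depth++;
    else if (ch === Ch_RPAREN) { if (depth > 0) depth--; }
    else if (ch === target && depth === 0) return i;
  }
  return -1;
}

// Usage: find the end of a declaration whose value contains a quoted ';'.
const decl = 'background: url("a;b") no-repeat; color: red';
const end = indexOfTopLevel(decl, Ch_SEMICOLON);
const firstDeclaration = decl.slice(0, end);
```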

Security

9 CodeQL ReDoS alerts on @import/@charset/@namespace regex patterns are pre-existing — the pattern is unchanged from the original _compileAtRule factory, only moved to module scope. The at-import-dos test case validates handling of pathological input.

Warning

Firewall rules blocked me from connecting to the following addresses:

  • guybedford.com
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node /home/REDACTED/work/_temp/ghcca-node/node/bin/node --enable-source-maps /home/REDACTED/work/_temp/copilot-developer-action-main/dist/index.js (dns block)
  • web.archive.org
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node /home/REDACTED/work/_temp/ghcca-node/node/bin/node --enable-source-maps /home/REDACTED/work/_temp/copilot-developer-action-main/dist/index.js (dns block)


Copilot AI and others added 3 commits March 17, 2026 19:45
Inspired by the es-module-shims fast lexer approach:
- Replace string character comparisons with charCodeAt() numeric comparisons
- Convert all ^-anchored regexes to y-flag (sticky) regexes to avoid
  creating temporary substring slices on every matchRegex call
- Define character code constants (Ch_*) for readability
- Use charCodeAt in stringSearch utilities for hot-path comparisons
- Add _advanceRange() method to avoid intermediate string allocation

Co-authored-by: jogibear9988 <364896+jogibear9988@users.noreply.github.com>

- Move pageMarginBoxRegex to module-level pre-compiled constant
- Replace _compileAtRule with explicit pre-compiled regex constants
  for import, charset, and namespace at-rules
- Convert new RegExp() to regex literals per biome lint recommendations

Co-authored-by: jogibear9988 <364896+jogibear9988@users.noreply.github.com>

Co-authored-by: jogibear9988 <364896+jogibear9988@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Investigate fast lexer technique for improved performance" to "perf: fast lexer using charCodeAt comparisons and sticky regexes" Mar 17, 2026
Copilot AI requested a review from jogibear9988 March 17, 2026 19:57
@jogibear9988 jogibear9988 marked this pull request as ready for review March 17, 2026 20:01
@jogibear9988 jogibear9988 merged commit c11ec8f into main Mar 17, 2026
@jogibear9988 jogibear9988 deleted the copilot/improve-lexer-performance branch March 17, 2026 20:01