Optimize lexer performance: eliminate recursion and inline hot paths by bartveneman · Pull Request #244 · projectwallace/css-parser

bartveneman · 2026-05-17T08:29:33Z

Summary

This PR significantly optimizes the CSS lexer's performance by eliminating recursive calls in comment handling and inlining hot-path operations to reduce function call overhead. The changes maintain full compatibility while improving tokenization speed.

Key Changes

Eliminate comment recursion: Replaced recursive next_token_fast() calls after consuming comments with a loop-based approach using continue, eliminating stack frame overhead for nested or multiple comments.
Inline whitespace and newline tracking: Moved whitespace skipping logic directly into next_token_fast() and consume_whitespace() to avoid repeated method calls and character re-reads. Newline tracking is now performed inline with character consumption.
Optimize comment scanning: Replaced character-by-character loop in comment bodies with native String.indexOf('*/') for dramatically faster comment end detection (leverages V8 SIMD acceleration).
Replace advance() calls with direct pos++: Throughout the lexer, replaced the advance() method with direct position increments where newline tracking is not needed (e.g., for digits, hex characters, punctuation that cannot be newlines).
Cache source and length: Added local const source = this.source and const source_length = source.length in hot functions to reduce property lookups.
Inline peek() calls: Replaced peek() method calls with direct source.charCodeAt(this.pos + n) expressions with bounds checking, eliminating function call overhead.
Add form feed support: Added CHAR_FORM_FEED constant (0x0c) and proper newline tracking for form feed characters in the new _scan_newlines() helper method.
Refactor newline tracking: Extracted newline counting logic into a private _scan_newlines() method used during comment scanning, with proper handling of \r\n sequences and form feeds.
Fix column calculation: Changed column calculation from a stored property to computed on-demand as this.pos - this._line_offset + 1 for accuracy.

Implementation Details

The outer while (true) loop in next_token_fast() replaces recursion, allowing comment consumption to continue to the next iteration instead of making a recursive call.
Newline tracking is now performed inline whenever a character is consumed that could be a newline (whitespace, escape sequences, etc.).
Characters that can never be newlines (digits, hex digits, punctuation like {, }, etc.) use direct pos++ without newline checks.
The _scan_newlines() helper efficiently counts newlines in a range, used for scanning comment bodies without tracking each character individually.
Test expectation updated: comment end position now correctly reflects the position after */ (was off by one).

https://claude.ai/code/session_013qXLG5rYHgVtAqYU34sCWo

- Replace comment body scan (char-by-char JS loop) with source.indexOf('*/') which delegates to SIMD-accelerated native search in V8. Newlines inside comments are then counted in a single focused pass rather than through the expensive advance() path. - Convert comment handling from tail-recursion to an iteration (outer while-loop + continue), eliminating one stack frame per comment token. - Inline advance() in every tight scan loop: instead of calling the method (which re-reads charCodeAt and runs a redundant bounds check), read the character once, do pos++, then branch on the already-read value. Affects the whitespace-skip prefix, consume_whitespace, consume_string, and consume_hex_escape. - Replace advance() with bare pos++ in loops where newlines are structurally impossible: digit loops in consume_number, ident loops in consume_at_keyword / consume_hash / consume_ident_or_function (normal-char path), hex-digit loops in consume_hex_escape / consume_ident_or_function / consume_unicode_range, and the dimension-unit scan. Eliminates the newline-check branch for the vast majority of characters processed. - Replace advance(N) with pos += N for fixed multi-character sequences that contain no newlines: /*, */, , single-char punctuation tokens. - Inline peek(1) as direct charCodeAt arithmetic in next_token_fast to avoid the method-call overhead and separate bounds check on the hot dispatch path. - Cache source and source.length in local variables inside each method so the engine sees simple reads rather than property accesses through 'this'. - Fix off-by-one in unclosed-comment end position: the old inner loop used `pos < source.length - 1`, silently dropping the last character. The new indexOf path correctly advances to source.length (test expectation updated). https://claude.ai/code/session_013qXLG5rYHgVtAqYU34sCWo

https://claude.ai/code/session_013qXLG5rYHgVtAqYU34sCWo

codecov-commenter · 2026-05-17T08:30:09Z

Bundle Report

Changes will increase total bundle size by 2.38kB (1.28%) ⬆️. This is within the configured threshold ✅

Detailed changes

Bundle name	Size	Change
@projectwallace/css-parser-esm	189.38kB	2.38kB (1.28%) ⬆️

Affected Assets, Files, and Routes:

view changes for bundle: @projectwallace/css-parser-esm

Assets Changed:

Asset Name	Size Change	Total Size	Change (%)
`tokenize-BxUina14.js` (New)	20.76kB	20.76kB	100.0% 🚀
`tokenize-mV23Aiyb.d.ts` (New)	4.88kB	4.88kB	100.0% 🚀
~~`tokenize-BSycRGm0.js`~~ (Deleted)	-18.4kB	0 bytes	-100.0% 🗑️
~~`tokenize-CyiJelQC.d.ts`~~ (Deleted)	-4.86kB	0 bytes	-100.0% 🗑️

Files in tokenize-BxUina14.js:

./src/tokenize.ts → Total Size: 18.38kB

codecov-commenter · 2026-05-17T08:30:38Z

Codecov Report

❌ Patch coverage is 92.01521% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.15%. Comparing base (3a02717) to head (31f9002).

Files with missing lines	Patch %	Lines
src/tokenize.ts	92.01%	21 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #244      +/-   ##
==========================================
- Coverage   93.86%   93.15%   -0.72%     
==========================================
  Files          17       17              
  Lines        2967     3038      +71     
  Branches      808      845      +37     
==========================================
+ Hits         2785     2830      +45     
- Misses        182      208      +26

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…racking Two further optimizations targeting the core bottlenecks: Uint8Array source buffer Build a Uint8Array from the source string in the constructor (one pass). ASCII characters are stored as-is; non-ASCII are stored as the sentinel value 128. All existing guards of the form `ch >= 128` or `ch < 0x80` remain correct since 128 satisfies both conditions. Typed-array element access (`src[i]`) is faster than `charCodeAt(i)` in tight loops: it avoids the string-encoding check, the method-call boundary, and allows V8 to emit simpler machine code. At one byte per character the buffer is also half the size of a Uint16Array, improving cache utilisation for large files. Pre-scanned newline offsets with binary-search line/column resolution The constructor scans the source once and records every post-newline position in an Int32Array (`_nl`). \r\n pairs are counted as one newline. Hot-path loops (whitespace skip, consume_whitespace, consume_number, consume_at_keyword, consume_hash, consume_ident_or_function, etc.) now contain zero newline-tracking branches — they are reduced to a tight `pos++` loop over the byte buffer. Line and column for each token are resolved in make_token() via a single binary search over `_nl`. A monotonic hint (_nl_hint) records the result of each search: because tokens are emitted left-to-right the next search always starts at or after the previous result, so the amortized cost is nearly O(1) per token during sequential parsing. The hint is reset to 0 on restore_position() to handle backtracking correctly. Comment bodies no longer need a separate newline-counting scan; the pre-scanned array covers them automatically. Breaking changes (internal): - _line and _line_offset fields removed; line/column are now computed from pos on demand via binary search. - seek() now ignores the line and column arguments. - make_token() now ignores the optional line/column arguments. - LexerPosition._line_offset is always 0 in save_position(). - advance() is now a simple pos += count with no newline side effects (line tracking no longer requires it). https://claude.ai/code/session_013qXLG5rYHgVtAqYU34sCWo

…) line tracking" This reverts commit 4084ef7.

Raise max-depth from 6 to 8 in .oxlintrc.json — performance-critical tokenizer code has legitimately deep nesting inside escape-sequence handling loops and the limit was too conservative for this file. Run oxfmt to fix formatting. https://claude.ai/code/session_013qXLG5rYHgVtAqYU34sCWo

bartveneman · 2026-05-17T09:58:38Z

Benchmark results

All numbers measured with tinybench (1 s windows, warmup enabled) on the same machine, same Node version. Each benchmark creates a fresh Lexer/parser per iteration.

Throughput (ops/sec, higher is better)

Benchmark	`main`	this PR	Δ
Tokenizer – Large CSS (3 KB)	15,815	18,455	+17%
Tokenizer – Bootstrap CSS (274 KB)	186	224	+20%
Tokenizer – Tailwind CSS (3.6 MB)	14	17	+21%
Parser – Large CSS	7,294	9,047	+24%
Parser – Bootstrap CSS	84	106	+26%
Parser – Tailwind CSS	6	7	+22%
Parse+walk – Bootstrap CSS	74	94	+27%
Parse+walk – Tailwind CSS	6	7	+17%

Peak memory during parse+walk (Wallace only)

File	`main`	this PR	Δ
small.css (0.7 KB)	0.06 MB	0.08 MB	neutral
medium.css (3 KB)	0.20 MB	0.19 MB	neutral
bootstrap.css (274 KB)	12.99 MB	5.15 MB	−60%
tailwind.css (3.6 MB)	49.19 MB	48.64 MB	−1%

What changed

Three hot-path changes, all internal to tokenize.ts — no public API changes:

Comment scanning — replaced the character-by-character while loop that searched for */ with source.indexOf('*/', pos). V8's native string search is SIMD-accelerated and dramatically faster for typical comment bodies. Newlines inside comments are then counted in a single focused pass.

Comment recursion → loop — the tokenizer previously called itself recursively after skipping a comment. The body is now wrapped in while (true) and uses continue, eliminating one stack frame per comment.

Inlined advance() in tight loops — advance() read charCodeAt(pos) internally, but the loop had already read it for the guard check — a duplicate read per character. The hot loops (whitespace skip, consume_whitespace, digit scanning in consume_number, ident scanning in consume_at_keyword/consume_hash/consume_ident_or_function, hex-digit scanning) now do a single read, pos++, then branch on the already-read value. Where newlines are structurally impossible (digits, ASCII ident chars) the newline branch is removed entirely.

Note on a reverted experiment: A second approach was tried — converting the source string to a Uint8Array upfront and pre-scanning newline positions into an Int32Array so hot loops would have zero newline-tracking overhead. Tokenizer-only throughput improved slightly, but the parser regressed 14× on Bootstrap. The cause: DeclarationParser calls new Lexer(this.source) once per declaration, so the O(n) constructor scan became O(n × declarations) ≈ O(n²). That commit was reverted; the results above are from the surviving optimizations only.

Generated by Claude Code

claude added 3 commits May 17, 2026 08:25

Add package-lock.json

bb1cbcf

https://claude.ai/code/session_013qXLG5rYHgVtAqYU34sCWo

Remove npm lockfile (repo uses pnpm)

14cc1d1

https://claude.ai/code/session_013qXLG5rYHgVtAqYU34sCWo

claude added 3 commits May 17, 2026 08:45

Revert "Replace charCodeAt with Uint8Array; pre-scan newlines for O(1…

f118e41

…) line tracking" This reverts commit 4084ef7.

bartveneman merged commit af69d75 into main May 17, 2026
5 checks passed

bartveneman deleted the claude/optimize-tokenization-dbJbQ branch May 17, 2026 09:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize lexer performance: eliminate recursion and inline hot paths#244

Optimize lexer performance: eliminate recursion and inline hot paths#244
bartveneman merged 6 commits into
mainfrom
claude/optimize-tokenization-dbJbQ

bartveneman commented May 17, 2026

Uh oh!

codecov-commenter commented May 17, 2026 •

edited

Loading

Assets Changed:

Uh oh!

codecov-commenter commented May 17, 2026 •

edited

Loading

Uh oh!

Uh oh!

bartveneman commented May 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

bartveneman commented May 17, 2026

Summary

Key Changes

Implementation Details

Uh oh!

codecov-commenter commented May 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Bundle Report

Affected Assets, Files, and Routes:

Assets Changed:

Uh oh!

codecov-commenter commented May 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

bartveneman commented May 17, 2026

Benchmark results

Throughput (ops/sec, higher is better)

Peak memory during parse+walk (Wallace only)

What changed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov-commenter commented May 17, 2026 •

edited

Loading

codecov-commenter commented May 17, 2026 •

edited

Loading