POC: Perf/handlebars v2 parser#13

Draft
johanrd wants to merge 14 commits into main from perf/handlebars-v2-parser

Conversation

Owner

@johanrd johanrd commented Apr 14, 2026

POC from claude: Three parsers compared

  • main: current @handlebars/parser (Jison-generated) + simple-html-tokenizer
  • v2-parser (johanrd/ember.js @ perf/handlebars-v2-parser): a Claude-iterated recursive-descent JS parser that replaces only the Jison HBS layer and leaves simple-html-tokenizer untouched. Where Jison's generated lexer tests up to 40 regexes per token and slices the input string on every match, the v2-parser uses an index-based cursor with indexOf('{{') for content scanning and charCodeAt dispatch for mustache classification, so there are no string copies and no regex gauntlet. That is why it is ~1.8x faster at the HBS layer even though it performs the same parse.
  • rust/wasm: [FEATURE rust-parser] Rust/WASM template parser using pest.rs (emberjs/ember.js#21313)
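To make the v2-parser bullet concrete, here is a minimal sketch of index-based scanning with indexOf('{{') and charCodeAt dispatch. The function names and token shapes are hypothetical and not taken from the branch; strip flags, escapes, long comments and raw blocks are deliberately ignored.

```javascript
// Sketch only: names and token shapes are hypothetical, not the PR's code.
const CH_BANG = 33;    // '!'
const CH_HASH = 35;    // '#'
const CH_SLASH = 47;   // '/'
const CH_GT = 62;      // '>'
const CH_LBRACE = 123; // '{'

// Classify a mustache by the character code right after '{{'.
function classifyMustache(src, open) {
  switch (src.charCodeAt(open + 2)) {
    case CH_HASH: return 'block-open';
    case CH_SLASH: return 'block-close';
    case CH_BANG: return 'comment';
    case CH_GT: return 'partial';
    case CH_LBRACE: return 'triple';
    default: return 'mustache';
  }
}

// Walk the template with an integer cursor. Tokens record offsets only,
// so no substring is created while scanning.
function scan(src) {
  const tokens = [];
  let pos = 0;
  while (pos < src.length) {
    const open = src.indexOf('{{', pos);
    if (open === -1) {
      tokens.push({ type: 'content', start: pos, end: src.length });
      break;
    }
    if (open > pos) tokens.push({ type: 'content', start: pos, end: open });
    const type = classifyMustache(src, open);
    const closer = type === 'triple' ? '}}}' : '}}';
    const close = src.indexOf(closer, open + 2);
    if (close === -1) throw new Error(`Unclosed mustache at index ${open}`);
    pos = close + closer.length;
    tokens.push({ type, start: open, end: pos });
  }
  return tokens;
}
```

For example, `scan('Hi {{name}}! {{#if a}}x{{/if}}')` yields six tokens: content, mustache, content, block-open, content, block-close.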

Benchmarks run with Node 24, warmed JIT, on the full ember-template-compiler precompile() path (so this includes preprocess() → ASTv2 normalization → opcode encoding → wire format — the whole thing).
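For reference, a "warmed JIT" measurement can be sketched roughly as below. This is not the harness used for the numbers in this PR; `bench`, its option names and its defaults are assumptions, and `fn` stands in for something like a `precompile(template)` call.

```javascript
// Hypothetical harness; `fn` stands in for a call such as precompile(template).
function bench(fn, { warmup = 1000, iterations = 5000 } = {}) {
  // Warm-up runs give the JIT a chance to optimize before timing starts.
  for (let i = 0; i < warmup; i++) fn();
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  // Average wall-clock time per call, in milliseconds.
  return (performance.now() - start) / iterations;
}
```

The result is an average in ms/call, the same unit as the tables below.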

Parse only (ms/call)

| template | main (Jison) | v2-parser (index-based cursor parser) | rust/wasm (#21313) |
| --- | --- | --- | --- |
| small (25 chars) | 0.024ms | 0.015ms | 0.038ms |
| medium (352 chars) | 0.177ms | 0.102ms | 0.478ms |
| real-world (1505 chars) | 0.605ms | 0.343ms | 3.857ms |
| large (3520 chars) | 1.688ms | 0.942ms | 22.960ms |

Full pipeline results (ms/call)

| template | chars | main (Jison) | v2-parser | rust/wasm (#21313) |
| --- | --- | --- | --- | --- |
| small | 25 | 0.047ms | 0.038ms | 0.067ms |
| medium | 352 | 0.492ms | 0.397ms | 0.779ms |
| real-world | 1494 | 1.832ms | 1.577ms | 4.947ms |
| large (10x medium) | 3520 | 5.095ms | 4.667ms | 27.107ms |

Parse vs compile split (medium template)

| phase | main (Jison) | v2-parser | rust/wasm |
| --- | --- | --- | --- |
| preprocess() only | 0.175ms (40%) | 0.093ms (26%) | 0.480ms (66%) |
| compile only (shared) | 0.262ms (60%) | 0.266ms (74%) | 0.250ms (34%) |
| total | 0.438ms | 0.358ms | 0.730ms |

The compile step (ASTv2 normalization + opcode encoding) costs the same ~0.25ms in all three — it's identical code. Only the parse phase differs.

What does this show?

Since the compile step is shared code, the gap is entirely in preprocess(), and it compounds: rust/wasm's JSON bridge (serde_json::to_string → JSON.parse() → convertLocations() walk) is O(AST size), so the gap widens with template complexity (full pipeline, rust/wasm vs. v2-parser: 2x at medium → 5.8x at large).
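The 2x and 5.8x figures appear to be the rust/wasm to v2-parser ratios from the full-pipeline table. A quick check of that arithmetic, with the timings copied from the table and hypothetical object and function names:

```javascript
// Full-pipeline timings (ms/call) copied from the table above.
const fullPipeline = {
  medium: { main: 0.492, v2: 0.397, rustWasm: 0.779 },
  large: { main: 5.095, v2: 4.667, rustWasm: 27.107 },
};

// Slowdown of rust/wasm relative to the v2-parser, to one decimal place.
function slowdown(row) {
  return Math.round((row.rustWasm / row.v2) * 10) / 10;
}

console.log(slowdown(fullPipeline.medium)); // 2
console.log(slowdown(fullPipeline.large));  // 5.8
```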

The single-pass architecture is a real win in theory — the current pipeline genuinely scans HTML twice (@handlebars/parser treats it as opaque content, then simple-html-tokenizer re-tokenizes it via tokenizePart()).

johanrd added 9 commits March 16, 2026 20:01
…t (v2-parser)

The Jison LALR(1) parser was the #1 bottleneck in @glimmer/syntax's
preprocess(), taking ~50% of total parse time. The generated parser
tested up to 40 regexes per token and sliced the input string on
every token match.

The v2 parser uses index-based scanning, indexOf for content,
charCodeAt dispatch, and batched line/col tracking. It produces
AST-identical output (104/104 unit tests pass).
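One way to read "batched line/col tracking" is that line and column are not updated per character but only brought up to date when a location is actually needed. The class below is a hedged sketch of that idea, not the branch's implementation; `LineTracker` and `locAt` are made-up names, and it assumes locations are requested at non-decreasing offsets.

```javascript
// Hypothetical tracker: line/col are synced lazily by counting the newlines
// between the last synced offset and the requested offset.
class LineTracker {
  constructor(src) {
    this.src = src;
    this.pos = 0;  // last synced offset
    this.line = 1; // 1-based line
    this.col = 0;  // 0-based column
  }

  // Assumes callers ask for non-decreasing offsets.
  locAt(pos) {
    let nl = this.src.indexOf('\n', this.pos);
    while (nl !== -1 && nl < pos) {
      this.line++;
      this.col = 0;
      this.pos = nl + 1;
      nl = this.src.indexOf('\n', this.pos);
    }
    this.col += pos - this.pos;
    this.pos = pos;
    return { line: this.line, column: this.col };
  }
}
```

With `new LineTracker('ab\ncd{{x}}\n{{y}}')`, `locAt(5)` returns `{ line: 2, column: 2 }` and a following `locAt(11)` returns `{ line: 3, column: 0 }`.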

HBS parse: 6-10x faster
End-to-end preprocess(): 2-3x faster

See PERF-INVESTIGATION.md for full analysis and benchmarks.

8 bugs fixed:

1. Sub-expression path locations (4 cases): paths like
   {{(helper).bar}} now correctly span from the sub-expression
   start, not just the .tail portion. Fixed by passing the
   pre-sub-expression position through parseSexprOrPath.

2. {{else if}} chain locations (2 cases): content after {{else}}
   had column offsets 4 too low because line/col were being
   restored from before 'else' was consumed. Fixed position
   tracking in consumeOpen's else-chain handling.

3. Raw block program location: now uses the overall block loc
   (matching Jison's prepareRawBlock behavior) instead of
   content-derived locs.

4. Nested raw blocks: {{{{bar}}}}...{{{{/bar}}}} inside
   {{{{foo}}}}...{{{{/foo}}}} is now correctly treated as raw
   content (not parsed as a nested block). Added depth tracking
   and mismatch detection for raw block close tags.
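The nested raw block fix (item 4) can be illustrated with a small depth counter. This is a sketch under assumptions: `readRawBlockContent` and its error messages are hypothetical, and tag parsing is reduced to slicing between `{{{{` and `}}}}`.

```javascript
// Hypothetical helper: returns the raw content of a raw block whose open tag
// ({{{{name}}}}) ends at offset `from`, tracking nesting depth.
function readRawBlockContent(src, from, name) {
  let depth = 0;
  let pos = from;
  while (true) {
    const open = src.indexOf('{{{{', pos);
    if (open === -1) throw new Error(`raw block ${name} is never closed`);
    const close = src.indexOf('}}}}', open + 4);
    if (close === -1) throw new Error(`raw block ${name} is never closed`);
    const tag = src.slice(open + 4, close).trim();
    if (tag.charCodeAt(0) !== 47 /* '/' */) {
      depth++; // nested raw open tag: stays part of the raw content
    } else if (depth > 0) {
      depth--; // closes a nested raw block
    } else {
      const closeName = tag.slice(1).trim();
      if (closeName !== name) {
        throw new Error(`raw block ${name} closed by ${closeName}`);
      }
      return { content: src.slice(from, open), end: close + 4 };
    }
    pos = close + 4;
  }
}
```

For `{{{{foo}}}}a{{{{bar}}}}b{{{{/bar}}}}c{{{{/foo}}}}` with `from = 11`, the returned content is `a{{{{bar}}}}b{{{{/bar}}}}c`, so the inner block is kept as text rather than parsed.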

104/104 @handlebars/parser tests pass.
8768/8788 Ember tests pass (7 remaining are reserved-arg error
type mismatches: same parse error, different Error class).

The hash loc was including trailing whitespace (newlines before }})
because skipWs() ran before capturing the hash end position.
Now captures endP before the trailing whitespace skip.
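The ordering issue is easy to reproduce in isolation. Below is a toy `key=value` reader, not the PR's parser: `parseHash` and its return shape are hypothetical. The only point is that the end offset is recorded before the trailing whitespace is skipped.

```javascript
// Hypothetical mini-parser for whitespace-separated `key=value` pairs.
// `start` is the offset of the first pair; parsing stops at '}}'.
function parseHash(src, start) {
  let pos = start;
  const isWs = (i) => i < src.length && /\s/.test(src[i]);
  const atClose = () => src.startsWith('}}', pos);
  const pairs = [];
  let endP = start;
  while (pos < src.length && !atClose()) {
    const pairStart = pos;
    while (pos < src.length && !isWs(pos) && !atClose()) pos++;
    pairs.push(src.slice(pairStart, pos));
    endP = pos;              // capture the end BEFORE skipping whitespace
    while (isWs(pos)) pos++; // trailing whitespace is not part of the hash
  }
  return { pairs, start, end: endP, next: pos };
}
```

For `'{{foo a=1 b=2\n  }}'` with `start = 6`, `end` is 13 (right after `b=2`) while `next` is 16 (the position of `}}`).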

Caught by exhaustive 153-template audit comparing full JSON output
(including all locations) against the Jison parser. 153/153 identical.
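An audit of this kind can be sketched as a small differential harness. The version below is an assumption about the general shape, not the script used here; `diffParsers` is a made-up name, and `reference` and `candidate` are any two functions that take a template string and return an AST-like object.

```javascript
// Hypothetical differential harness: serializes both outputs and reports
// every template where the two serializations differ.
function diffParsers(templates, reference, candidate) {
  const mismatches = [];
  for (const template of templates) {
    // Treat a thrown error as an outcome too, so error cases are compared.
    const outcome = (parse) => {
      try {
        return JSON.stringify(parse(template));
      } catch (e) {
        return `throws: ${e.message}`;
      }
    };
    const expected = outcome(reference);
    const actual = outcome(candidate);
    if (expected !== actual) mismatches.push({ template, expected, actual });
  }
  return mismatches;
}
```

Note that JSON.stringify is sensitive to key order, so this comparison is stricter than a structural deep-equal: two ASTs with the same fields in a different order would be reported as different.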
Found by stress testing: \{{foo}} caused an infinite loop in
scanContent(). Two bugs:

1. After processing \{{ (escaped mustache), the scanner advanced
   to the {{ position but then findNextMustacheOrEnd found the
   same {{ immediately, causing an infinite loop. Fixed by
   advancing past the {{ and including it as literal content.

2. After scanContent returned for \\{{ (double-escaped), the next
   call saw the backslash at idx-1 from the PREVIOUS scan and
   re-entered escape handling. Fixed by only checking backslashes
   within the current scan range (idx > pos, not idx > 0).

Also added stress-test.mjs with 181 test cases covering:
- Escaped mustaches (single, double, with surrounding text)
- Unicode identifiers
- Whitespace edge cases
- All strip flag combinations
- Comment edge cases (short, long, adjacent, containing }}/{{)
- Raw blocks (empty, nested, with mustache-like content)
- Deeply nested sub-expressions
- Complex block nesting with else chains
- Real-world Ember patterns
- Error cases

Round 2 of stress testing (106 additional cases) found:

1. Multiple consecutive escaped mustaches (x\{{y\{{z) failed —
   findNextMustacheOrEnd returned the position of \{{ instead of
   before the backslash, causing the main loop to miss the escape.

2. Content splitting after \{{ didn't match Jison. Jison emits
   separate ContentStatements at each \{{ boundary (emu state).
   The v2 parser now matches: x\{{y\{{z produces 3 content nodes
   ["x", "{{y", "{{z"] instead of one merged ["x{{y{{z"].
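The splitting behaviour described in these two items can be sketched as follows. This is an illustration under assumptions, not the code in the branch: `splitEscapedContent` is a hypothetical name, it stops at the first real `{{`, and it does not handle the double-escaped `\\{{` case.

```javascript
// Hypothetical helper: splits raw content at each escaped mustache (\{{),
// producing one content value per boundary. Stops at the first real {{.
function splitEscapedContent(src) {
  const nodes = [];
  let start = 0;      // start of the content node being built
  let searchFrom = 0; // where the next indexOf begins
  while (true) {
    const idx = src.indexOf('{{', searchFrom);
    if (idx === -1) {
      if (start < src.length) nodes.push(src.slice(start));
      return nodes;
    }
    // Only look at a backslash inside the current node (idx > start).
    const escaped = idx > start && src.charCodeAt(idx - 1) === 92; // '\'
    if (!escaped) {
      // Real mustache: content ends here.
      if (idx > start) nodes.push(src.slice(start, idx));
      return nodes;
    }
    // Emit the text before the backslash, then drop the backslash itself.
    if (idx - 1 > start) nodes.push(src.slice(start, idx - 1));
    start = idx;          // the next node begins at the literal {{
    searchFrom = idx + 2; // advance past {{ so the loop always makes progress
  }
}

console.log(splitEscapedContent('x\\{{y\\{{z')); // [ 'x', '{{y', '{{z' ]
```

Setting `searchFrom` past the `{{` is what prevents the scanner from finding the same `{{` again.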

287 total stress tests now pass (181 round 1 + 106 round 2).
104/104 unit tests. 8771/8791 Ember tests.

Tested against 375 templates from a production Ember app (proapi-webapp).
Found 38 location-only differences — all the same pattern: hash pairs
with sub-expression values like bar=(helper arg) had their loc end
extended past trailing whitespace/newlines.

Root cause: parseSexprOrPath() called skipWs() after the sub-expression
to peek for a path separator (.bar), but this whitespace belongs to
the containing HashPair's loc boundary. Fixed by save/restore of
pos around the peek.
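The save/restore pattern looks roughly like this. The cursor shape and the name `peekPathSeparator` are hypothetical; only the peek logic is meant to mirror the description above.

```javascript
// Hypothetical cursor of the form { src, pos }.
// Looks past whitespace for a '.' path separator without consuming the
// whitespace when no separator follows.
function peekPathSeparator(cursor) {
  const saved = cursor.pos; // where the sub-expression ended
  while (/\s/.test(cursor.src[cursor.pos] ?? '')) cursor.pos++;
  if (cursor.src.charCodeAt(cursor.pos) === 46 /* '.' */) {
    return true; // path continues, cursor now sits on the separator
  }
  cursor.pos = saved; // no separator: give the whitespace back to the caller
  return false;
}
```

With `{ src: '(helper arg)\n  }}', pos: 12 }` the function returns false and leaves `pos` at 12, so the enclosing HashPair's loc ends right after the closing parenthesis.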

375/375 real-world templates now produce byte-identical JSON output
compared to the Jison parser. 104/104 unit tests. 287/287 stress tests.

Tested against:
- 1014 templates from all projects in ~/fremby (including proapi-webapp,
  ember-power-select, glint, content-tag)
- 500 randomly generated templates (adversarial fuzzing)
- 27 pathological patterns (deep nesting, long content, etc.)

Results: 1473/1541 pass (byte-identical to Jison).

The 68 remaining differences are ALL the same issue: escaped mustache
(\{{) content loc includes the backslash in Jison but not in v2.
This is a Jison quirk — the regex match includes the \ (which gets
stripped from the value), so the loc spans the full source including
the \ character. The v2 parser's loc spans only the value content.

This only affects templates using \{{ (escaped mustaches), which is
extremely rare in real-world code (3 files across 550 scanned).

No structural differences. No crashes. No hangs.

github-actions bot commented Apr 14, 2026

📊 Package size report   -0%↓

| File | Before (Size / Brotli) | After (Size / Brotli) |
| --- | --- | --- |
| Total (Includes all files) | 17.1 MB / 3.2 MB | -0%↓ 17.1 MB / -0%↓ 3.2 MB |
| Tarball size | 3.9 MB | -0%↓ 3.9 MB |

🤖 This report was automatically generated by pkg-size-action

@johanrd johanrd force-pushed the perf/handlebars-v2-parser branch 2 times, most recently from 2535bf7 to 2cb61b0 Compare April 14, 2026 06:53
@johanrd johanrd force-pushed the perf/handlebars-v2-parser branch from 2cb61b0 to ffa8d9b Compare April 14, 2026 06:55
@johanrd johanrd changed the title Perf/handlebars v2 parser POC: Perf/handlebars v2 parser Apr 14, 2026