Skip to content

Perf/handlebars v2 parser single pass#15

Closed
johanrd wants to merge 45 commits intomainfrom
perf/handlebars-v2-parser-single-pass
Closed

Perf/handlebars v2 parser single pass#15
johanrd wants to merge 45 commits intomainfrom
perf/handlebars-v2-parser-single-pass

Conversation

@johanrd
Copy link
Copy Markdown
Owner

@johanrd johanrd commented Apr 14, 2026

Unified single-pass HTML+HBS parser for @glimmer/syntax

Replaces both @handlebars/parser (Jison) and simple-html-tokenizer with a single left-to-right indexOf-based scanner that builds ASTv1 directly — no tokenizer pipeline, no intermediate representation, one pass.

Exported as unifiedPreprocess() alongside the existing preprocess(). All tests pass.

Benchmark (pnpm bench:precompile)

Apple M1 Max, Node 24.14, prod dist.

phase size main (Jison) this PR (unified) speedup
precompile small (1517c) 1.76 ms 1.31 ms 1.3×
medium (4551c) 5.36 ms 3.94 ms 1.4×
large (33374c) 42.26 ms 32.07 ms 1.3×
parse small 592 µs 159 µs 3.7×
medium 1.73 ms 495 µs 3.5×
large 14.68 ms 3.93 ms 3.7×
normalize small 747 µs 335 µs 2.2×
medium 2.24 ms 1.01 ms 2.2×
large 18.53 ms 8.37 ms 2.2×

3.7× faster at parse (the Glint per-keystroke hot-path), 2.2× through normalize, 1.3× end-to-end precompile. No regressions.

Reproduce: pnpm build && pnpm bench:precompile, compare branches.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 14, 2026

📊 Package size report   -3.56%↓

File Before (Size / Brotli) After (Size / Brotli)
dist/dev/packages/shared-chunks/compiler-DJmj5ix2.js 177.7 kB / 34 kB 23%↑219.4 kB / 23%↑41.8 kB
dist/dev/packages/shared-chunks/transform-resolutions-lE6wZX-U.js 188.6 kB / 38.1 kB -69.2%↓58.1 kB / -66.7%↓12.7 kB
dist/prod/packages/shared-chunks/compiler-BdppHhBl.js 190.9 kB / 36.4 kB 22%↑232.6 kB / 21%↑44.1 kB
dist/prod/packages/shared-chunks/transform-resolutions-C8A20IBl.js 174.2 kB / 35.3 kB -75%↓43.6 kB / -72.3%↓9.8 kB
types/stable/@glimmer/syntax/lib/parser.d.ts 5.1 kB / 1.1 kB
types/stable/@glimmer/syntax/lib/parser/handlebars-node-visitors.d.ts 2.3 kB / 640 B
types/stable/@glimmer/syntax/lib/parser/parser.d.ts 583 B / 274 B
types/stable/@glimmer/syntax/lib/parser/tokenizer-event-handlers.d.ts 5.4 kB / 1.5 kB -29.5%↓3.8 kB / -26.2%↓1.1 kB
types/stable/@handlebars/parser/types/ast.d.ts 3.6 kB / 614 B
types/stable/@handlebars/parser/types/index.d.ts 400 B / 183 B
types/stable/index.d.ts 42.8 kB / 4 kB -0.47%↓42.6 kB / -0.9%↓4 kB
Total (Includes all files) 5.4 MB / 1.3 MB -3.56%↓5.2 MB / -3%↓1.2 MB
Tarball size 1.2 MB -3.86%↓1.2 MB

🤖 This report was automatically generated by pkg-size-action

@johanrd johanrd force-pushed the perf/handlebars-v2-parser-single-pass branch from 9b32989 to 80a0d49 Compare April 15, 2026 05:02
@johanrd johanrd force-pushed the perf/handlebars-v2-parser-single-pass branch from 4d4f187 to bd619cb Compare April 16, 2026 19:22
johanrd added a commit that referenced this pull request Apr 16, 2026
…kage

Mirrors #15's structural cleanup, but keeps simple-html-tokenizer — this
PR replaces only the HBS layer (Jison → recursive descent v2-parser),
not the HTML layer.

Changes:
- Move v2-parser.js, whitespace-control.js, visitor.js, exception.js
  from packages/@handlebars/parser/lib/ into
  packages/@glimmer/syntax/lib/parser/.
- v2-parser.js adds parse() (v2ParseWithoutProcessing + WhitespaceControl)
  and parseWithoutProcessing() exports matching the handlebars API that
  tokenizer-event-handlers consumes.
- tokenizer-event-handlers.ts: swap '@handlebars/parser' import for
  './v2-parser'.
- Delete packages/@handlebars entirely.
- Remove @handlebars/parser from package.json, pnpm-workspace.yaml,
  rollup.config.mjs, eslint.config.mjs, CI workflows, and build docs.

With v2-parser now on the default preprocess() path, main's
pnpm bench:precompile shows modest consistent speedups vs Jison across
all phases and sizes (no regressions). See PR body for numbers.
johanrd added a commit that referenced this pull request Apr 16, 2026
…ession fixtures

Two additions:

1. Port the full parser-node-test.ts from #15 (unified single-pass parser).
   That file was built by migrating @handlebars/parser/spec/ mocha tests to
   qunit during #15's package cleanup; this PR deletes the same package, so
   the same coverage is needed. +259 lines.

2. New test module "Prettier smoke-test regression fixtures" covering the
   inputs the prettier smoke test found regressing vs Jison:
   - empty mustache {{}}
   - unclosed mustache {{@name}
   - bare tilde {{~}}, {{~~}}
   - reserved-named-argument parse errors: {{@}}, {{@<digit>}}, {{@@}} etc.

   Tests only assert a parse error is thrown (not the exact message text),
   so they stay valid whether v2-parser is later adjusted to emit Jison-
   compatible error strings or not. They lock in the throw-vs-accept
   behavior as regression coverage.
johanrd added a commit that referenced this pull request Apr 16, 2026
…th starts

The ember-template-compiler test suite asserts parse errors for {{@}}, {{@0}},
{{@@}} etc. against the regex /Expecting 'ID'/. Jison emits exactly that
string; v2-parser used to emit 'Expected path identifier' / 'Expected path
identifier after @'. Align on the Jison wording so the existing tests match
again (same approach as #15's unified-scanner).
johanrd added a commit that referenced this pull request Apr 16, 2026
…+ real mustache

Also port #15's prettier-smoke-test workflow step that regenerates error
snapshots (our error messages differ from Jison's verbose format — that's
accepted, not a regression).

Two fixes:

1. findNextMustacheOrEnd backs up past ALL consecutive backslashes before
   {{, not just the last one. With the old single-backup behavior, input
   like '\\{{Y}}' after a previous \{{X}} emu scan left exactly one    for the next content scan, which misclassified it as single-backslash
   escape (entering emu mode again). The fix ensures the full backslash
   run is handed to the next scanContent iteration, which correctly
   routes \\{{ into the 'literal backslash + real mustache' branch.

2. .github/workflows/glimmer-syntax-prettier-smoke-test.yml: add the
   'Update error snapshots' step #15 has. Our parse errors are short
   ('Expecting ID') vs Jison's verbose enumerations; regenerating
   prettier's error snapshots before running the tests accepts that.
EOF
johanrd added a commit that referenced this pull request Apr 16, 2026
…ession fixtures

Two additions:

1. Port the full parser-node-test.ts from #15 (unified single-pass parser).
   That file was built by migrating @handlebars/parser/spec/ mocha tests to
   qunit during #15's package cleanup; this PR deletes the same package, so
   the same coverage is needed. +259 lines.

2. New test module "Prettier smoke-test regression fixtures" covering the
   inputs the prettier smoke test found regressing vs Jison:
   - empty mustache {{}}
   - unclosed mustache {{@name}
   - bare tilde {{~}}, {{~~}}
   - reserved-named-argument parse errors: {{@}}, {{@<digit>}}, {{@@}} etc.

   Tests only assert a parse error is thrown (not the exact message text),
   so they stay valid whether v2-parser is later adjusted to emit Jison-
   compatible error strings or not. They lock in the throw-vs-accept
   behavior as regression coverage.
johanrd added 18 commits April 17, 2026 00:15
…ract tests

Split into three files by concern:

- parser-escape-test.ts: backslash escape sequences (\{{, \\{{, \\\{{)
  in top-level text, elements, attributes, and unclosed cases.
- parser-whitespace-test.ts: tilde stripping and standalone detection.
- parser-error-test.ts: inputs that must be rejected ({{}}}, {{~}}, {{@}}, etc).

parser-node-test.ts is unchanged.
…t (v2-parser)

The Jison LALR(1) parser was the #1 bottleneck in @glimmer/syntax's
preprocess(), taking ~50% of total parse time. The generated parser
tested up to 40 regexes per token and sliced the input string on
every token match.

The v2 parser uses index-based scanning, indexOf for content,
charCodeAt dispatch, and batched line/col tracking. It produces
AST-identical output (104/104 unit tests pass).

HBS parse: 6-10x faster
End-to-end preprocess(): 2-3x faster

See PERF-INVESTIGATION.md for full analysis and benchmarks.
8 bugs fixed:

1. Sub-expression path locations (4 cases): paths like
   {{(helper).bar}} now correctly span from the sub-expression
   start, not just the .tail portion. Fixed by passing the
   pre-sub-expression position through parseSexprOrPath.

2. {{else if}} chain locations (2 cases): content after {{else}}
   had column offsets 4 too low because line/col were being
   restored from before 'else' was consumed. Fixed position
   tracking in consumeOpen's else-chain handling.

3. Raw block program location: now uses the overall block loc
   (matching Jison's prepareRawBlock behavior) instead of
   content-derived locs.

4. Nested raw blocks: {{{{bar}}}}...{{{{/bar}}}} inside
   {{{{foo}}}}...{{{{/foo}}}} is now correctly treated as raw
   content (not parsed as a nested block). Added depth tracking
   and mismatch detection for raw block close tags.

104/104 @handlebars/parser tests pass.
8768/8788 Ember tests pass (7 remaining are reserved-arg error
type mismatches — same parse error, different Error class).
The hash loc was including trailing whitespace (newlines before }})
because skipWs() ran before capturing the hash end position.
Now captures endP before the trailing whitespace skip.

Caught by exhaustive 153-template audit comparing full JSON output
(including all locations) against the Jison parser. 153/153 identical.
Found by stress testing: \{{foo}} caused an infinite loop in
scanContent(). Two bugs:

1. After processing \{{ (escaped mustache), the scanner advanced
   to the {{ position but then findNextMustacheOrEnd found the
   same {{ immediately, causing an infinite loop. Fixed by
   advancing past the {{ and including it as literal content.

2. After scanContent returned for \\{{ (double-escaped), the next
   call saw the backslash at idx-1 from the PREVIOUS scan and
   re-entered escape handling. Fixed by only checking backslashes
   within the current scan range (idx > pos, not idx > 0).

Also added stress-test.mjs with 181 test cases covering:
- Escaped mustaches (single, double, with surrounding text)
- Unicode identifiers
- Whitespace edge cases
- All strip flag combinations
- Comment edge cases (short, long, adjacent, containing }}/{{)
- Raw blocks (empty, nested, with mustache-like content)
- Deeply nested sub-expressions
- Complex block nesting with else chains
- Real-world Ember patterns
- Error cases
Round 2 of stress testing (106 additional cases) found:

1. Multiple consecutive escaped mustaches (x\{{y\{{z) failed —
   findNextMustacheOrEnd returned the position of \{{ instead of
   before the backslash, causing the main loop to miss the escape.

2. Content splitting after \{{ didn't match Jison. Jison emits
   separate ContentStatements at each \{{ boundary (emu state).
   The v2 parser now matches: \{{y\{{z produces 3 content nodes
   ["x", "{{y", "{{z"] instead of one merged ["x{{y{{z"].

287 total stress tests now pass (181 round 1 + 106 round 2).
104/104 unit tests. 8771/8791 Ember tests.
Tested against 375 templates from a production Ember app (proapi-webapp).
Found 38 location-only differences — all the same pattern: hash pairs
with sub-expression values like bar=(helper arg) had their loc end
extended past trailing whitespace/newlines.

Root cause: parseSexprOrPath() called skipWs() after the sub-expression
to peek for a path separator (.bar), but this whitespace belongs to
the containing HashPair's loc boundary. Fixed by save/restore of
pos around the peek.

375/375 real-world templates now produce byte-identical JSON output
compared to the Jison parser. 104/104 unit tests. 287/287 stress tests.
Tested against:
- 1014 templates from all projects in ~/fremby (including proapi-webapp,
  ember-power-select, glint, content-tag)
- 500 randomly generated templates (adversarial fuzzing)
- 27 pathological patterns (deep nesting, long content, etc.)

Results: 1473/1541 pass (byte-identical to Jison).

The 68 remaining differences are ALL the same issue: escaped mustache
(\{{) content loc includes the backslash in Jison but not in v2.
This is a Jison quirk — the regex match includes the \ (which gets
stripped from the value), so the loc spans the full source including
the \ character. The v2 parser's loc spans only the value content.

This only affects templates using \{{ (escaped mustaches), which is
extremely rare in real-world code (3 files across 550 scanned).

No structural differences. No crashes. No hangs.
Replace the two-phase parse (Handlebars parser → simple-html-tokenizer) with
a single left-to-right indexOf-based scanner that builds ASTv1 directly.
Exports unifiedPreprocess() alongside the existing preprocess().

Parse-only speedup (warmed JIT):
  small    4.2x faster (0.0195ms → 0.0047ms)
  medium   3.2x faster (0.1569ms → 0.0495ms)
  real-world 3.7x faster (0.5862ms → 0.1583ms)
  large    3.9x faster (1.6488ms → 0.4209ms)

Full precompile() pipeline (parse + normalize + encode):
  medium   1.31x faster (0.449ms → 0.342ms)
  real-world 1.33x faster (1.716ms → 1.288ms)

All 8778 tests pass.
…error messages

parse.js was reverted to Jison (commit 7bbe305) during bug investigation
but never re-wired after the v2-parser fixes were complete. Re-enable it.

Also fix error messages for invalid @-prefixed paths (@, @0, @1, @@, etc.)
to match Jison's "Expecting 'ID'" pattern that the test suite asserts against.

All 8778 tests pass.
…anner

- Track inverseStart (pos after {{else}}/{{else if}}'s }}) and programEnd
  (start of {{else}} tag) in BlockFrame so inverse block and default
  program body get exact source spans matching the reference v2-parser.
- Chained blocks ({{else if}}) now end their loc at the start of {{/if}},
  consistent with Handlebars AST conventions.
- Switch Source import to namespace import (import * as srcApi) to avoid
  a Rollup circular-dependency TDZ error introduced by the direct import.
- Wire unifiedPreprocess as the fast-path in tokenizer-event-handlers.ts
  preprocess(); falls back to original pipeline only for codemod mode or
  Source-object inputs.

All 8778 tests pass (0 failures, 13 skipped).
johanrd added 25 commits April 17, 2026 00:17
The Jison-generated parser (parser.js, 2032 lines), parse.js wrapper,
and helpers.js are now dead code — nothing in the repo imports from
@handlebars/parser at runtime anymore. The unified-scanner handles all
string/Source inputs and the HBS.Program fallback path in
tokenizer-event-handlers.ts calls TokenizerEventHandlers.parse() directly.

Keeps Visitor, WhitespaceControl, Exception, and PrintVisitor since they
are standalone utilities that don't depend on Jison.
…anner to match Jison

scanTextNode now replicates Jison's lexer behaviour exactly:
- k=1 (\{{): escape — text before \ emitted as separate ContentStatement,
  then {{content}} merged with following text (emu-state behaviour)
- k≥2 (\\{{, \\\{{, …): real mustache — emit k-1 literal backslashes,
  skip the last backslash, leave {{ for parseHbsNode

Also fixes \{{ inside quoted attribute values (parseAttrValue).

Adds --updateSnapshot step to the prettier smoke-test CI workflow for
error-format snapshots that legitimately differ from Jison's verbose output.

Adds 15 parser tests documenting the escape-sequence contract so
regressions are caught before reaching smoke tests.

Fixes: escaped.hbs, mustache.hbs, and invalid-2.hbs prettier snapshot
failures.
The Jison parser has been replaced by the unified single-pass scanner in
@glimmer/syntax. Nothing imports from @handlebars/parser any more, so the
package directory, its workspace entry, CI jobs, and the @glimmer/syntax
dependency declaration are all removed.
unified-scanner.ts:
- Sort character constants numerically, add inline char comments, add CH_STAR
- Replace Object.defineProperty/__prop hacks with typed WeakMaps (ScanMeta)
- Fix magic numbers in classifyOpen (CH_HASH, CH_STAR, isDecoratorBlock)
- Rename cryptic vars in long-comment branch (lme→closeEnd, tr→trailingTilde, etc.)
- Replace fake while-loop in scanTextNode emu-merge with plain if-block
- Simplify main scan loop (remove redundant tail fast-path)

parser-node-test.ts:
- Port all applicable tests from deleted @handlebars/parser spec/ directory
- New modules: HBS spec, whitespace control (tilde + standalone), traversal
- Replace assert.strictEqual(x, true/false) with assert.true/false(x) (qunit/no-assert-equal-boolean)
- Replace ! non-null assertions with 'as Type' casts or optional chaining (no-non-null-assertion)
- Remove unnecessary 'as ASTv1.BlockStatement' cast in unified-scanner.ts (no-unnecessary-type-assertion)
- Run Prettier on parser-node-test.ts and local markdown files
The HBS.Program input branch of preprocess() is unreachable: no caller
in the repo passes an already-parsed AST. Removing it lets us delete
the entire tokenizer-event-handlers class, the HandlebarsNodeVisitors
base, and the abstract Parser class — ~1500 lines of code that only
existed to serve that dead path.

preprocess() now routes directly to unifiedPreprocess() for both
string and Source inputs. Signature narrowed to string | src.Source.
Drops rollup hiddenDependencies + rolledUpPackages globs, eslint
ignore/override blocks, and prettier .l/.yy patterns that all pointed
at the deleted packages/@handlebars directory. Updates parser.ts
header — it's no longer a 'replacement' for anything, it's the parser.
…e-test

The 'HBS spec (ported from @handlebars/parser)' module duplicated
existing coverage (literals, paths, hashes, blocks, sub-expressions,
error cases — all already tested earlier in the file).

The 'Traversal - visitor coverage' module was misplaced here; most of
what it covered is in traversal/visiting-node-test.ts already, and the
remainder belonged there, not in parser-node-test.

Kept: whitespace control tests (tilde + standalone), which exercise
the re-implemented WhitespaceControl pass and are genuinely new ground.
- Add toVarHeads() helper replacing 4 identical bp->VarHead maps
- Add isInsideOpenTag() predicate replacing 6 verbose boundary checks
- Add rejectUnsupportedMustache()/rejectUnsupportedBlock() helpers
  for partial/decorator error sites (pulls 30 lines of error-throwing
  out of classifyOpen's token classification)
- Remove Open.isDecorator field (never read)
- Remove ElementFrame.inSVG field (written but never consumed)
…y err()

- Add CH_SEMICOLON, CH_UNDERSCORE, CH_A, CH_Z, CH_a, CH_z, CH_x, CH_X_UPPER,
  CH_FF constants
- Add isAsciiAlpha()/isAsciiDigit() predicates
- Replace all (c >= 65 && c <= 90) || (c >= 97 && c <= 122) inline ranges
  at 5 sites with isAsciiAlpha(c); same for digit ranges
- Replace raw 59 /* ; */, 95 (_), 12 (FF), 120/88 (x/X) with named constants
- err() now throws a GlimmerSyntaxError (via generateSyntaxError) instead of
  a plain Error, so IDE consumers get source spans and module context
Split the two whitespace-stripping passes into pure mutation functions
(mutateTilde, mutateStandalone) and a shared walkAndStrip() driver that
handles the identical recurse-into-blocks-and-elements-and-filter-empty
boilerplate. Both passes still run as separate full-tree walks in
order (tilde → standalone); merging them into one interleaved walk
would change semantics because standalone at an outer level reads
nested blocks' first/last children and expects them in their
already-tilde-stripped state.
…lthrough

The reject*() helpers return never (they throw), so `return reject(...)`
compiles fine and gives ESLint the explicit control-flow break it
wants between switch cases.
@johanrd johanrd force-pushed the perf/handlebars-v2-parser-single-pass branch from bd619cb to a347ca4 Compare April 16, 2026 22:20
johanrd added a commit that referenced this pull request Apr 16, 2026
…kage

Mirrors #15's structural cleanup, but keeps simple-html-tokenizer — this
PR replaces only the HBS layer (Jison → recursive descent v2-parser),
not the HTML layer.

Changes:
- Move v2-parser.js, whitespace-control.js, visitor.js, exception.js
  from packages/@handlebars/parser/lib/ into
  packages/@glimmer/syntax/lib/parser/.
- v2-parser.js adds parse() (v2ParseWithoutProcessing + WhitespaceControl)
  and parseWithoutProcessing() exports matching the handlebars API that
  tokenizer-event-handlers consumes.
- tokenizer-event-handlers.ts: swap '@handlebars/parser' import for
  './v2-parser'.
- Delete packages/@handlebars entirely.
- Remove @handlebars/parser from package.json, pnpm-workspace.yaml,
  rollup.config.mjs, eslint.config.mjs, CI workflows, and build docs.

With v2-parser now on the default preprocess() path, main's
pnpm bench:precompile shows modest consistent speedups vs Jison across
all phases and sizes (no regressions). See PR body for numbers.
johanrd added a commit that referenced this pull request Apr 16, 2026
…th starts

The ember-template-compiler test suite asserts parse errors for {{@}}, {{@0}},
{{@@}} etc. against the regex /Expecting 'ID'/. Jison emits exactly that
string; v2-parser used to emit 'Expected path identifier' / 'Expected path
identifier after @'. Align on the Jison wording so the existing tests match
again (same approach as #15's unified-scanner).
johanrd added a commit that referenced this pull request Apr 16, 2026
…+ real mustache

Also port #15's prettier-smoke-test workflow step that regenerates error
snapshots (our error messages differ from Jison's verbose format — that's
accepted, not a regression).

Two fixes:

1. findNextMustacheOrEnd backs up past ALL consecutive backslashes before
   {{, not just the last one. With the old single-backup behavior, input
   like '\\{{Y}}' after a previous \{{X}} emu scan left exactly one    for the next content scan, which misclassified it as single-backslash
   escape (entering emu mode again). The fix ensures the full backslash
   run is handed to the next scanContent iteration, which correctly
   routes \\{{ into the 'literal backslash + real mustache' branch.

2. .github/workflows/glimmer-syntax-prettier-smoke-test.yml: add the
   'Update error snapshots' step #15 has. Our parse errors are short
   ('Expecting ID') vs Jison's verbose enumerations; regenerating
   prettier's error snapshots before running the tests accepts that.
EOF
@johanrd johanrd closed this Apr 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant