POC: perf(@glimmer/syntax): unified single-pass HTML+HBS scanner (3.2x–4.2x faster parse) #14
Open
johanrd wants to merge 3 commits into perf/handlebars-v2-parser from
Replace the two-phase parse (Handlebars parser → simple-html-tokenizer) with a single left-to-right indexOf-based scanner that builds ASTv1 directly. Exports `unifiedPreprocess()` alongside the existing `preprocess()`.

Parse-only speedup (warmed JIT):
- small: 4.2x faster (0.0195ms → 0.0047ms)
- medium: 3.2x faster (0.1569ms → 0.0495ms)
- real-world: 3.7x faster (0.5862ms → 0.1583ms)
- large: 3.9x faster (1.6488ms → 0.4209ms)

Full precompile() pipeline (parse + normalize + encode):
- medium: 1.31x faster (0.449ms → 0.342ms)
- real-world: 1.33x faster (1.716ms → 1.288ms)

All 8778 tests pass.
📊 Package size report 4%↑
…error messages parse.js was reverted to Jison (commit 7bbe305) during bug investigation but never re-wired after the v2-parser fixes were complete. Re-enable it. Also fix error messages for invalid @-prefixed paths (@, @0, @1, @@, etc.) to match Jison's "Expecting 'ID'" pattern that the test suite asserts against. All 8778 tests pass.
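A minimal sketch of the @-path head check described above. This is illustrative, not the actual parser code: the function name, the exact character classes, and the message format are assumptions; the only grounded detail is that invalid heads like `@`, `@0`, `@1`, and `@@` must be rejected with a Jison-style "Expecting 'ID'" message so the existing test-suite assertions keep passing.

```typescript
// Hypothetical helper (not the real @glimmer/syntax code): reject
// @-prefixed paths whose head is not a valid identifier start, using a
// Jison-style "Expecting 'ID'" message as the PR describes.
function checkAtHead(path: string): void {
  if (!path.startsWith("@")) return;
  const head = path[1];
  // Assumed identifier-start class; the real grammar may differ.
  const isIdStart = head !== undefined && /[A-Za-z_]/.test(head);
  if (!isIdStart) {
    throw new Error(`Parse error: Expecting 'ID', got '${head ?? "EOF"}'`);
  }
}

checkAtHead("@model"); // valid: does not throw
```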
…anner
- Track inverseStart (pos after {{else}}/{{else if}}'s }}) and programEnd
(start of {{else}} tag) in BlockFrame so inverse block and default
program body get exact source spans matching the reference v2-parser.
- Chained blocks ({{else if}}) now end their loc at the start of {{/if}},
consistent with Handlebars AST conventions.
- Switch Source import to namespace import (import * as srcApi) to avoid
a Rollup circular-dependency TDZ error introduced by the direct import.
- Wire unifiedPreprocess as the fast-path in tokenizer-event-handlers.ts
preprocess(); falls back to original pipeline only for codemod mode or
Source-object inputs.
All 8778 tests pass (0 failures, 13 skipped).
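The fast-path wiring in the last bullet can be sketched as below. All names and shapes here are illustrative stand-ins, not the real `tokenizer-event-handlers.ts` code; the grounded behavior is only the dispatch rule: plain-string input goes through the unified scanner, while codemod mode or a Source-object input falls back to the original two-phase pipeline.

```typescript
// Hypothetical sketch of the preprocess() dispatch described above.
interface PreprocessOptions {
  mode?: "codemod" | "precompile";
}

type Ast = { type: "Template"; via: string };

// Stand-ins for the two real entry points.
const unifiedPreprocess = (_src: string): Ast => ({ type: "Template", via: "unified" });
const legacyPreprocess = (_src: unknown): Ast => ({ type: "Template", via: "legacy" });

function preprocess(input: string | object, options: PreprocessOptions = {}): Ast {
  // Fast path only for plain strings outside codemod mode; Source objects
  // and codemod runs keep the original pipeline.
  const canFastPath = typeof input === "string" && options.mode !== "codemod";
  return canFastPath ? unifiedPreprocess(input as string) : legacyPreprocess(input);
}
```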
POC Unified single-pass HTML+HBS scanner
Claude continuing the perf work from perf/handlebars-v2-parser. Where that branch replaced the Jison-generated HBS parser with a hand-written recursive descent JS parser (keeping `simple-html-tokenizer` for the HTML layer), this PR replaces both parsers with a single left-to-right indexOf-based scanner that builds ASTv1 directly — no tokenizer pipeline at all.
The current parse path scans every template twice: once through the Handlebars parser for the HBS layer, and once through simple-html-tokenizer for the HTML layer.
The unified scanner does one left-to-right pass with cursor arithmetic (`indexOf('{{', pos)`, `indexOf('<', pos)`) and builds the full ASTv1 tree — `ElementNode`, `MustacheStatement`, `BlockStatement`, `TextNode`, etc. — without the intermediate representation.
Exports as `unifiedPreprocess()` alongside the existing `preprocess()`.
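The cursor-arithmetic idea can be sketched in miniature. This is a toy, not the real scanner: it handles only top-level text and simple mustaches (no elements, blocks, or source spans), and the node shapes are simplified stand-ins for ASTv1.

```typescript
// Minimal sketch of an indexOf-based single pass: split a template into
// text and mustache nodes with no intermediate tokenizer representation.
type Node =
  | { type: "TextNode"; chars: string }
  | { type: "MustacheStatement"; path: string };

function scan(template: string): Node[] {
  const nodes: Node[] = [];
  let pos = 0;
  while (pos < template.length) {
    const open = template.indexOf("{{", pos);
    if (open === -1) {
      // No more mustaches: the rest is one text node.
      nodes.push({ type: "TextNode", chars: template.slice(pos) });
      break;
    }
    if (open > pos) {
      nodes.push({ type: "TextNode", chars: template.slice(pos, open) });
    }
    const close = template.indexOf("}}", open + 2);
    if (close === -1) throw new Error(`Unclosed mustache at ${open}`);
    nodes.push({
      type: "MustacheStatement",
      path: template.slice(open + 2, close).trim(),
    });
    pos = close + 2; // advance the cursor past the closing braces
  }
  return nodes;
}

const ast = scan("Hello {{name}}!");
// ast: [TextNode "Hello ", MustacheStatement "name", TextNode "!"]
```

The real scanner additionally races `indexOf('<', pos)` against `indexOf('{{', pos)` to decide whether the next construct is HTML or HBS, which is what lets it build `ElementNode`s in the same pass.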
Four parsers compared
All benchmarks: Node 24, warmed JIT, same machine.
IDE case: parse-only (ms/call)
The Glint hot-path: one `preprocess()` call per keystroke in a `.gts` file.
Build case: full precompile() → wire format (ms/call)
The ember-cli/Vite path: parse + ASTv2 normalize + opcode encode + wire format.
The unified-1pass column is `unified_parse + (precompile_v2 − preprocess_v2)` — the compile step is identical code in all parsers.
Parse vs compile split (medium template)
500-template build projection
Using real-world template timing:
What this shows
The two use cases have very different profiles:
IDE case (parse-only): The unified scanner is 3.8x–4.3x faster than Jison and 1.1x–1.3x faster than v2-parser on real-world templates. The per-keystroke parse cost drops from 0.61ms to 0.15ms on a real-world template. This directly benefits Glint's reparse-on-keystroke hot path.
Build case (full pipeline): The parse improvement is smaller in absolute terms because the compile step (ASTv2 normalization + opcode encoding) costs ~0.30ms regardless of which parser is used. On real-world templates the unified scanner is 1.26x faster end-to-end vs Jison; the v2-parser had already captured most of that build-time gain (1.22x), and unified takes it to 1.26x.
rust/wasm: The JSON bridge (`serde_json::to_string` → `JSON.parse()` → `convertLocations()` walk) is O(AST size), so the gap grows with template size (1.6x slower than Jison at medium → 5.4x at large). The unified scanner is faster than Jison at all sizes without any FFI overhead.
Parse is now 12% of the pipeline for unified vs 38% for Jison and 14% for v2. The compile step dominates, so further parse improvements have diminishing returns on build time — though the IDE case still benefits fully since it's parse-only.
Correctness
All 8778 tests pass, including the WhitespaceControl/standalone-stripping semantics which required careful port of the Handlebars post-pass for block helpers on their own lines and chained `{{else if}}` blocks.
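The standalone rule that the whitespace-control port has to reproduce can be sketched as follows. This is a hedged simplification: the real Handlebars post-pass operates on AST nodes and source spans, not raw strings, and the function here is illustrative only.

```typescript
// Simplified sketch of the "standalone" test: a block tag (e.g. {{#if x}})
// counts as standalone when only spaces/tabs precede it on its line and only
// spaces/tabs followed by a newline (or end of input) come after it. Such
// lines have their surrounding whitespace stripped from output.
function isStandalone(template: string, tagStart: number, tagEnd: number): boolean {
  // Walk back toward the previous newline: only spaces/tabs allowed.
  let i = tagStart - 1;
  while (i >= 0 && (template[i] === " " || template[i] === "\t")) i--;
  const openOk = i < 0 || template[i] === "\n";
  // Walk forward: only spaces/tabs, then a newline or end of input.
  let j = tagEnd;
  while (j < template.length && (template[j] === " " || template[j] === "\t")) j++;
  const closeOk = j >= template.length || template[j] === "\n";
  return openOk && closeOk;
}

const src = "a\n  {{#if x}}\nb\n{{/if}}\n";
const open = src.indexOf("{{#if x}}");
isStandalone(src, open, open + "{{#if x}}".length); // true: tag is alone on its line
```

Chained `{{else if}}` blocks complicate this because one physical line can simultaneously close one program and open the next, which is why the port needed care.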