Perf: cache newline offsets in Source for O(log n) loc conversion#21314

Merged
NullVoxPopuli merged 2 commits into emberjs:main from johanrd:perf/source-newline-cache on Apr 16, 2026
Conversation

Contributor

@johanrd johanrd commented Apr 15, 2026

While investigating a faster single-pass parser, Claude found an interesting quick-win perf optimization that is parser-independent.

Co-written by Claude:

Cache newline offsets in Source

A single-file fix to an accidentally-quadratic utility in @glimmer/syntax that dominates compile time on real-world route templates.

The problem

Source.hbsPosFor(offset) converts a char offset to {line, column}. The existing implementation repeatedly calls source.indexOf('\n'), scanning from position 0 on every call — O(lines-until-offset) per invocation. charPosFor (the inverse) does the same.

The normalize phase (ASTv1 → ASTv2) calls these once per AST node location. For a template with N nodes and L lines, total work is O(N·L) — effectively O(n²) in template size.

This kicks in at template sizes typical of a route (1–25k chars), not at inline components.
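To make the quadratic pattern concrete, here is a minimal TypeScript sketch of the rescanning approach described above — `hbsPosForNaive` is an illustrative name and a simplified shape, not the actual @glimmer/syntax implementation:

```typescript
// Illustrative sketch only — not the real @glimmer/syntax code.
// Each call rescans from position 0 via indexOf('\n'), so cost is
// O(lines-until-offset) per call; called once per AST node location,
// total work is O(N·L).
function hbsPosForNaive(
  source: string,
  offset: number
): { line: number; column: number } {
  let line = 0;
  let lineStart = 0;
  while (true) {
    const nextNewline = source.indexOf('\n', lineStart);
    if (nextNewline === -1 || nextNewline >= offset) break;
    line += 1;
    lineStart = nextNewline + 1;
  }
  // 1-based line, 0-based column
  return { line: line + 1, column: offset - lineStart };
}
```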

The fix

packages/@glimmer/syntax/lib/source/source.ts — precompute an array of '\n' offsets in the Source constructor, binary-search it for conversions. O(n) build, O(log n) per call.

readonly #newlineOffsets: readonly number[]; // built in constructor

hbsPosFor(offset: number): Nullable<SourcePosition> {
  // binary search for first newline >= offset
}

charPosFor(position: SourcePosition): number | null {
  // direct index lookup via #newlineOffsets
}

No API changes. The table is built once per Source instance and reused across every hbsPosFor / charPosFor call on that source.
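As a rough sketch of the cached approach — the class name, method shapes, and conventions below are illustrative assumptions, not the exact source.ts diff:

```typescript
// Illustrative sketch of the newline-offset cache, not the real diff.
class NewlineCache {
  readonly #newlineOffsets: number[];

  constructor(source: string) {
    // Single linear pass in the constructor: O(n) build, reused by every call.
    this.#newlineOffsets = [];
    for (let i = source.indexOf('\n'); i !== -1; i = source.indexOf('\n', i + 1)) {
      this.#newlineOffsets.push(i);
    }
  }

  // char offset -> { line, column }: binary search for how many newlines
  // fall strictly before `offset`. O(log n) per call.
  hbsPosFor(offset: number): { line: number; column: number } {
    const offsets = this.#newlineOffsets;
    let lo = 0;
    let hi = offsets.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (offsets[mid] < offset) lo = mid + 1;
      else hi = mid;
    }
    // `lo` newlines precede `offset`, so it sits on line `lo + 1` (1-based).
    const lineStart = lo === 0 ? 0 : offsets[lo - 1] + 1;
    return { line: lo + 1, column: offset - lineStart };
  }

  // { line, column } -> char offset: direct index lookup, O(1).
  charPosFor(line: number, column: number): number | null {
    if (line < 1 || line > this.#newlineOffsets.length + 1) return null;
    const lineStart = line === 1 ? 0 : this.#newlineOffsets[line - 2] + 1;
    return lineStart + column;
  }
}
```

A round-trip check (charPosFor(hbsPosFor(o)) === o, which the PR preserves as a DEBUG-mode assertion) is a cheap way to guard both directions against off-by-one errors.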

Impact

Measured via the mitata harness landed separately in #21316. Run pnpm build && pnpm bench:precompile on each branch and diff ms/iter. Apple M1 Max, Node 24.14; control = current main, experiment = this branch. The bench loads dist/prod/ by default; the dev numbers below came from a manual swap to dist/dev/. ember-source's package exports resolve development → dev and production → prod, so both tables reflect builds real consumers actually load (vite dev gets dev and vite build gets prod by default — Vite resolves the condition via NODE_ENV).

Per-char cost (µs/char) — the O(n²) → O(n log n) flattening

normalize phase alone (ASTv1 → ASTv2, derived as normalize − parse):

| size | chars | before (µs/char) | after (µs/char) | speedup |
|------|-------|------------------|-----------------|---------|
| small | 1517 | 0.117 | 0.113 | 1.04× |
| medium | 4551 | 0.182 | 0.110 | 1.65× |
| large (Discourse-scale) | 33374 | 0.682 | 0.141 | 4.84× |

Before rises sharply at large — the O(n²) is kicking in. After stays essentially flat across this size range.

Absolute speedups on large templates

With large Discourse-scale route template at 33374 chars:

Prod build (vite build, production deploys)

| phase | before | after | speedup |
|-------|--------|-------|---------|
| parse (preprocess) | 36.2 ms | 14.1 ms | 2.57× |
| normalize phase alone | 22.8 ms | 4.7 ms | 4.85× |
| full precompile() | 93.5 ms | 41.8 ms | 2.24× |

Dev build (vite dev, ember serve, local app development)

| phase | before | after | speedup |
|-------|--------|-------|---------|
| parse (preprocess) | 36.0 ms | 14.4 ms | 2.50× |
| normalize phase alone | 23.2 ms | 4.6 ms | 5.10× |
| full precompile() | 134.9 ms | 42.8 ms | 3.15× |

precompile measured end-to-end through ember-template-compiler (the user-facing entry, which runs the core compile plus ember-specific AST transform plugins). The dev build's larger speedup reflects that DEBUG-mode assertions remain active throughout the dev compile pipeline; some of them interact with the loc machinery this fix accelerates.

Who benefits

Every consumer of @glimmer/syntax:

  • ember-template-compiler.precompile() — the ember-cli / Vite build path, invoked once per .gts / .hbs file.
  • Glint's per-keystroke pipeline on .gts files — each edit re-extracts templates via content-tag and calls @glimmer/syntax.preprocess for ASTv1, then walks ASTv1 directly for type extraction. A 33k-char route template's parse step currently burns ~36 ms, mostly re-scanning newlines; after this fix, ~14 ms.
  • Any AST-plugin / codemod / prettier-plugin-ember-template-tag path that does parse + normalize.

This is pipeline-agnostic — it doesn't matter which parser feeds Source; every compile goes through hbsPosFor/charPosFor.

Testing

  • All 9138 / 9138 repo tests pass.
  • New targeted unit tests in packages/@glimmer/syntax/test/source-boundary-test.ts lock in the hbsPosFor / charPosFor contract at boundary positions: empty source, single char, offset at \n, column past line-end, out-of-range line, negative inputs.
  • DEBUG-mode round-trip assertion in charPosFor (preserved from the original implementation) verifies charPosFor(hbsPosFor(o)) === o.
  • Reproducible: the mitata bench at bin/precompile.bench.mjs (the "add mitata harness for precompile (parse/normalize/precompile)" PR, #21316) produced the numbers above.

Compat

  • No public API changes.
  • Source constructor behavior unchanged.
  • Memory overhead: one number[] of newline offsets per Source instance, typically small (one number per line, a few hundred for route templates). Built once in the constructor — a single linear pass over source.

Scope

  • packages/@glimmer/syntax/lib/source/source.ts — the fix (diff +50 -55 lines).
  • packages/@glimmer/syntax/test/source-boundary-test.ts — new unit tests for hbsPosFor / charPosFor boundaries.

@johanrd johanrd marked this pull request as draft April 15, 2026 14:01
@NullVoxPopuli
Contributor

before we look at this, we first need a mitata benchmark for syntax parse/print <3

@NullVoxPopuli
Contributor

there is an example of this in the ember-eslint-parser repo

johanrd added a commit to johanrd/ember.js that referenced this pull request Apr 15, 2026
Reproducible side-by-side benchmark of preprocess, normalize (ASTv1 → ASTv2),
and full precompile() against a control worktree. Run via:

  pnpm bench:syntax                             # standalone
  pnpm bench:syntax -- --control-dir /path/to/main-checkout

Size ladder matches the PR emberjs#21314 body (462 / 1494 / 4482 / 32868 chars,
the last sized to match Discourse's admin-user/index.gjs). Fixtures live
under bench/fixtures/; large and extra-large are medium × 3 / × 22.

Emits mitata's standard ms/iter + boxplot + summary output, plus a µs/char
summary table across the size ladder that makes the O(n²) → O(n) flattening
visible at a glance.
@johanrd johanrd force-pushed the perf/source-newline-cache branch from 96ca173 to aca513f Compare April 15, 2026 18:11
@johanrd
Contributor Author

johanrd commented Apr 15, 2026

@NullVoxPopuli thanks, added now with an attempt to align with ember-eslint-parser, but please tell if it should be done differently.

@johanrd johanrd marked this pull request as ready for review April 15, 2026 18:34
Comment thread packages/@glimmer/syntax/bench/syntax.bench.mjs Outdated
Comment thread packages/@glimmer/syntax/bench/syntax.bench.mjs Outdated
Comment thread package.json Outdated
johanrd added a commit to johanrd/ember.js that referenced this pull request Apr 16, 2026
@johanrd johanrd force-pushed the perf/source-newline-cache branch from aca513f to a5c0a3f Compare April 16, 2026 06:31
johanrd added a commit to johanrd/ember.js that referenced this pull request Apr 16, 2026
@johanrd johanrd force-pushed the perf/source-newline-cache branch from a5c0a3f to e0431fc Compare April 16, 2026 07:14
@johanrd johanrd requested a review from NullVoxPopuli April 16, 2026 07:42
@NullVoxPopuli
Contributor

thanks for adding the benchmark -- can you PR that separately so I can run it on main when reviewing this PR?

johanrd added a commit to johanrd/ember.js that referenced this pull request Apr 16, 2026
@johanrd
Contributor Author

johanrd commented Apr 16, 2026

@NullVoxPopuli see #21316

johanrd added 2 commits April 16, 2026 20:57
hbsPosFor() and charPosFor() were doing a fresh indexOf('\n') scan of
the source on every call, making each lookup O(lines_until_offset).
These are called once per AST node loc by the ASTv2 normalize pass, so
total cost was effectively O(n²) in template size.

CPU profile of a full precompile() showed hbsPosFor dominating at ~28%
of self-time, scattered across many call sites.

Fix: precompute an array of newline offsets on first use, binary-search
it for conversions. O(log n) per call, O(n) to build once per source.

Impact (Node 24, warmed JIT, full precompile()):
  real-world template (1494 chars):  1.34ms -> 1.24ms
  large template (3520 chars):       4.53ms -> 3.06ms  (32% faster)

The normalize phase specifically (ASTv1 -> ASTv2) drops from ~1.72ms
to ~0.43ms on the large template — a 4x speedup in that phase.

All tests pass.
- charPosFor now returns null for out-of-range lines (lineIdx > newlineOffsets.length)
  and negative columns, matching its 'number | null' return type. Previously it
  silently returned column or source.length in those cases.
- Adds direct unit tests for Source.hbsPosFor / charPosFor covering empty source,
  single-char/newline, exact-newline offsets, column-past-line-end clamping, and
  negative/out-of-range inputs.
@johanrd johanrd force-pushed the perf/source-newline-cache branch from e0431fc to 4f15abb Compare April 16, 2026 19:06
@NullVoxPopuli
Contributor

NullVoxPopuli commented Apr 16, 2026

This passes the perf test for realsies (I never trust AI results here, because it rarely ever exposes its methodology)

main
benchmark                   avg (min … max) p75 / p99    (min … top 1%)
------------------------------------------- -------------------------------
parse      small (1517c)       2.03 ms/iter   1.93 ms  █▇                  
                        (1.35 ms … 5.88 ms)   5.12 ms ███▃                 
                    (285.14 kb …   5.08 mb)   1.69 mb ████▂▃▂▂▁▁▂▃▅▂▃▂▂▂▁▂▁

normalize  small (1517c)       2.38 ms/iter   2.61 ms  █                   
                        (1.85 ms … 5.93 ms)   4.94 ms ▂█                   
                    (863.23 kb …   3.62 mb)   2.25 mb ███▅▄▆▆▃▃▂▂▁▁▂▁▁▁▁▁▁▁

precompile small (1517c)       4.78 ms/iter   5.19 ms █▇                   
                        (3.74 ms … 9.69 ms)   8.83 ms ██                   
                    (  1.63 mb …   5.76 mb)   3.29 mb █████▄▄█▃▂▃▂▂▂▁▂▂▂▂▁▂

parse      medium (4551c)      5.59 ms/iter   6.45 ms  █                   
                       (4.48 ms … 11.52 ms)   9.32 ms ▅█▂                  
                    (  1.48 mb …   6.80 mb)   5.02 mb ███▄▂▃▃▁▅▅▂▂▂▄▃▄▁▁▁▂▂

normalize  medium (4551c)      7.40 ms/iter   7.69 ms  █▂                  
                       (6.28 ms … 10.50 ms)  10.44 ms ▃██                  
                    (  4.85 mb …   8.56 mb)   6.73 mb ███▄▆▄▇▃▂▃▃▂▃▂▂▃▆▂▁▄▂

precompile medium (4551c)     17.13 ms/iter  17.87 ms ▂█  ▂ ██ █     ▂    ▂
                      (13.55 ms … 23.41 ms)  22.65 ms ██  █ ██▅█     █    █
                    (  7.17 mb …  12.83 mb)   9.94 mb ██▇▇█▇████▁▇▁▁▇█▁▁▁▁█

parse      large (33374c)     67.38 ms/iter  69.60 ms           █          
                      (58.44 ms … 82.00 ms)  73.94 ms           █         █
                    ( 41.03 mb …  42.98 mb)  41.63 mb █▁█▁▁▁██▁██▁▁▁█▁▁▁▁▁█

normalize  large (33374c)    101.39 ms/iter 103.56 ms         █       █    
                     (94.21 ms … 114.91 ms) 105.69 ms ▅ ▅  ▅ ▅█      ▅█ ▅ ▅
                    (128.80 kb …  55.72 mb)   9.63 mb █▁█▁▁█▁██▁▁▁▁▁▁██▁█▁█

precompile large (33374c)    167.42 ms/iter 168.17 ms      █               
                    (149.29 ms … 222.16 ms) 181.56 ms ▅▅ ▅ █   ▅▅▅▅▅      ▅
                    ( 12.68 mb …  25.30 mb)  18.11 mb ██▁█▁█▁▁▁█████▁▁▁▁▁▁█

this branch
benchmark                   avg (min … max) p75 / p99    (min … top 1%)
------------------------------------------- -------------------------------
parse      small (1517c)       1.85 ms/iter   1.82 ms ▄█                   
                        (1.28 ms … 5.40 ms)   4.98 ms ██▅                  
                    ( 50.09 kb …   5.27 mb)   1.68 mb ███▅▄▂▂▂▂▂▄▃▃▁▁▁▂▁▁▁▁

normalize  small (1517c)       2.20 ms/iter   2.35 ms  █                   
                        (1.74 ms … 4.76 ms)   4.49 ms ▇█                   
                    (208.55 kb …   3.30 mb)   2.24 mb ██▇▅▂▃▄▃▄▃▁▂▁▁▁▁▁▁▁▂▁

precompile small (1517c)       4.59 ms/iter   5.15 ms  █                   
                       (3.56 ms … 10.41 ms)   8.86 ms ▇█                   
                    (  1.33 mb …   5.22 mb)   3.29 mb ███▇▃▃▅▄▃▂▄▂▃▃▂▂▂▁▁▁▂

parse      medium (4551c)      4.99 ms/iter   5.40 ms  █                   
                        (3.91 ms … 9.99 ms)   8.19 ms  █▃                  
                    (  1.67 mb …   7.23 mb)   5.02 mb ▆███▄▃▄▃▆▂▂▂▂▂▂▂▂▂▄▂▂

normalize  medium (4551c)      6.49 ms/iter   6.75 ms  █▃▇                 
                       (5.28 ms … 10.82 ms)  10.07 ms ▃███                 
                    (  4.94 mb …   8.22 mb)   6.72 mb █████▇▃▃▃▄▁▂▇▁▂▆▂▃▂▂▂

precompile medium (4551c)     15.86 ms/iter  17.00 ms █▅                   
                      (12.31 ms … 25.97 ms)  25.73 ms ██▇ ▅▂      ▂        
                    (  7.19 mb …  12.83 mb)   9.86 mb ███▄██▁▇▁▄▁▄█▁▄▁▁▁▇▁▄

parse      large (33374c)     39.64 ms/iter  40.12 ms        █▂            
                      (33.83 ms … 49.13 ms)  45.20 ms        ██            
                    ( 39.89 mb …  42.97 mb)  41.27 mb ▇▁▁▁▇▁▁██▁▁▇▇▁▁▁▁▇▁▁▇

normalize  large (33374c)     55.69 ms/iter  57.80 ms              █       
                      (50.53 ms … 61.30 ms)  58.28 ms ▅   ▅   ▅    █ ▅   ▅▅
                    (245.27 kb …  56.41 mb)   9.92 mb █▁▁▁█▁▁▁█▁▁▁▁█▁█▁▁▁██

precompile large (33374c)    110.94 ms/iter 116.99 ms  ██                █ 
                     (99.87 ms … 147.92 ms) 118.30 ms ▅██  ▅ ▅        ▅  █▅
                    ( 12.42 mb …  25.93 mb)  18.11 mb ███▁▁█▁█▁▁▁▁▁▁▁▁█▁▁██

it looks like this branch is both faster and uses less RAM in most cases


@NullVoxPopuli
Contributor

📊 Package size report   0.01%↑

| File | Before (Size / Brotli) | After (Size / Brotli) |
|------|------------------------|-----------------------|
| dist/dev/packages/shared-chunks/compiler-BdWF4Soc.js | 177.3 kB / 33.8 kB | 0.2%↑ 177.7 kB / 0.8%↑ 34 kB |
| dist/prod/packages/shared-chunks/compiler-DDPqc0HO.js | 190.5 kB / 36.1 kB | 0.2%↑ 190.9 kB / 0.7%↑ 36.4 kB |
| Total (includes all files) | 5.4 MB / 1.3 MB | 0.01%↑ 5.4 MB / 0.04%↑ 1.3 MB |
| Tarball size | 1.2 MB | 0.05%↑ 1.2 MB |

@NullVoxPopuli NullVoxPopuli changed the title POC/Perf: cache newline offsets in Source for O(log n) loc conversion Perf: cache newline offsets in Source for O(log n) loc conversion Apr 16, 2026
@NullVoxPopuli NullVoxPopuli merged commit f9ecab5 into emberjs:main Apr 16, 2026
41 checks passed