Perf: cache newline offsets in Source for O(log n) loc conversion#21314

Merged
NullVoxPopuli merged 2 commits into emberjs:main from johanrd:perf/source-newline-cache on Apr 16, 2026
Conversation

Contributor

@johanrd johanrd commented Apr 15, 2026

While investigating a faster single-pass parser, Claude found an interesting quick-win perf optimization that is parser-independent.

Co-written by Claude:

Cache newline offsets in Source

A single-file fix to an accidentally-quadratic utility in @glimmer/syntax that dominates compile time on real-world route templates.

The problem

Source.hbsPosFor(offset) converts a char offset to {line, column}. The existing implementation repeatedly calls source.indexOf('\n'), scanning from position 0 on every call — O(lines-until-offset) per invocation. charPosFor (the inverse) does the same.

The normalize phase (ASTv1 → ASTv2) calls these once per AST node location. For a template with N nodes and L lines, total work is O(N·L) — effectively O(n²) in template size.

This kicks in at template sizes typical of a route (1–25k chars), not at inline components.
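To make the quadratic pattern concrete, here is a minimal TypeScript sketch of the rescanning approach described above — `hbsPosForNaive` is an illustrative name and a simplified shape, not the actual @glimmer/syntax implementation:

```typescript
// Illustrative sketch only — not the real @glimmer/syntax code.
// Each call rescans from position 0 via indexOf('\n'), so cost is
// O(lines-until-offset) per call; called once per AST node location,
// total work is O(N·L).
function hbsPosForNaive(
  source: string,
  offset: number
): { line: number; column: number } {
  let line = 0;
  let lineStart = 0;
  while (true) {
    const nextNewline = source.indexOf('\n', lineStart);
    if (nextNewline === -1 || nextNewline >= offset) break;
    line += 1;
    lineStart = nextNewline + 1;
  }
  // 1-based line, 0-based column
  return { line: line + 1, column: offset - lineStart };
}
```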

The fix

packages/@glimmer/syntax/lib/source/source.ts — precompute an array of '\n' offsets in the Source constructor, binary-search it for conversions. O(n) build, O(log n) per call.

readonly #newlineOffsets: readonly number[]; // built in constructor

hbsPosFor(offset: number): Nullable<SourcePosition> {
  // binary search for first newline >= offset
}

charPosFor(position: SourcePosition): number | null {
  // direct index lookup via #newlineOffsets
}

No API changes. The table is built once per Source instance and reused across every hbsPosFor / charPosFor call on that source.
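As a rough sketch of the cached approach — the class name, method shapes, and conventions below are illustrative assumptions, not the exact source.ts diff:

```typescript
// Illustrative sketch of the newline-offset cache, not the real diff.
class NewlineCache {
  readonly #newlineOffsets: number[];

  constructor(source: string) {
    // Single linear pass in the constructor: O(n) build, reused by every call.
    this.#newlineOffsets = [];
    for (let i = source.indexOf('\n'); i !== -1; i = source.indexOf('\n', i + 1)) {
      this.#newlineOffsets.push(i);
    }
  }

  // char offset -> { line, column }: binary search for how many newlines
  // fall strictly before `offset`. O(log n) per call.
  hbsPosFor(offset: number): { line: number; column: number } {
    const offsets = this.#newlineOffsets;
    let lo = 0;
    let hi = offsets.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (offsets[mid] < offset) lo = mid + 1;
      else hi = mid;
    }
    // `lo` newlines precede `offset`, so it sits on line `lo + 1` (1-based).
    const lineStart = lo === 0 ? 0 : offsets[lo - 1] + 1;
    return { line: lo + 1, column: offset - lineStart };
  }

  // { line, column } -> char offset: direct index lookup, O(1).
  charPosFor(line: number, column: number): number | null {
    if (line < 1 || line > this.#newlineOffsets.length + 1) return null;
    const lineStart = line === 1 ? 0 : this.#newlineOffsets[line - 2] + 1;
    return lineStart + column;
  }
}
```

A round-trip check (charPosFor(hbsPosFor(o)) === o, which the PR preserves as a DEBUG-mode assertion) is a cheap way to guard both directions against off-by-one errors.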

Impact

Measured via the mitata harness landed separately in #21316. Run pnpm build && pnpm bench:precompile on each branch and diff ms/iter. Apple M1 Max, Node 24.14; control = current main, experiment = this branch. The bench loads dist/prod/ by default; the dev numbers below came from a manual swap to dist/dev/. ember-source's package exports resolve development → dev and production → prod, so both tables reflect builds real consumers actually load (vite dev gets dev and vite build gets prod by default — Vite resolves the condition via NODE_ENV).

Per-char cost (µs/char) — the O(n²) → O(n log n) flattening

normalize phase alone (ASTv1 → ASTv2, derived as normalize − parse):

| size | chars | before (µs/char) | after (µs/char) | speedup |
|------|-------|------------------|-----------------|---------|
| small | 1517 | 0.117 | 0.113 | 1.04× |
| medium | 4551 | 0.182 | 0.110 | 1.65× |
| large (Discourse-scale) | 33374 | 0.682 | 0.141 | 4.84× |

Before rises sharply at large — the O(n²) is kicking in. After stays essentially flat across this size range.

Absolute speedups on large templates

With large Discourse-scale route template at 33374 chars:

Prod build (vite build, production deploys)

| phase | before | after | speedup |
|-------|--------|-------|---------|
| parse (preprocess) | 36.2 ms | 14.1 ms | 2.57× |
| normalize phase alone | 22.8 ms | 4.7 ms | 4.85× |
| full precompile() | 93.5 ms | 41.8 ms | 2.24× |

Dev build (vite dev, ember serve, local app development)

| phase | before | after | speedup |
|-------|--------|-------|---------|
| parse (preprocess) | 36.0 ms | 14.4 ms | 2.50× |
| normalize phase alone | 23.2 ms | 4.6 ms | 5.10× |
| full precompile() | 134.9 ms | 42.8 ms | 3.15× |

precompile measured end-to-end through ember-template-compiler (the user-facing entry, which runs the core compile plus ember-specific AST transform plugins). The dev build's larger speedup reflects that DEBUG-mode assertions remain active throughout the dev compile pipeline; some of them interact with the loc machinery this fix accelerates.

Who benefits

Every consumer of @glimmer/syntax:

  • ember-template-compiler.precompile() — the ember-cli / Vite build path, invoked once per .gts / .hbs file.
  • Glint's per-keystroke pipeline on .gts files — each edit re-extracts templates via content-tag and calls @glimmer/syntax.preprocess for ASTv1, then walks ASTv1 directly for type extraction. A 33k-char route template's parse step currently burns ~36 ms, mostly re-scanning newlines; after this fix, ~14 ms.
  • Any AST-plugin / codemod / prettier-plugin-ember-template-tag path that does parse + normalize.

This is pipeline-agnostic — it doesn't matter which parser feeds Source; every compile goes through hbsPosFor/charPosFor.

Testing

  • All 9138 / 9138 repo tests pass.
  • New targeted unit tests in packages/@glimmer/syntax/test/source-boundary-test.ts lock in the hbsPosFor / charPosFor contract at boundary positions: empty source, single char, offset at \n, column past line-end, out-of-range line, negative inputs.
  • DEBUG-mode round-trip assertion in charPosFor (preserved from the original implementation) verifies charPosFor(hbsPosFor(o)) === o.
  • Reproducible: the mitata bench at bin/precompile.bench.mjs (the "add mitata harness for precompile (parse/normalize/precompile)" PR, #21316) produced the numbers above.

Compat

  • No public API changes.
  • Source constructor behavior unchanged.
  • Memory overhead: one number[] of newline offsets per Source instance, typically small (one number per line, a few hundred for route templates). Built once in the constructor — a single linear pass over source.

Scope

  • packages/@glimmer/syntax/lib/source/source.ts — the fix (diff +50 -55 lines).
  • packages/@glimmer/syntax/test/source-boundary-test.ts — new unit tests for hbsPosFor / charPosFor boundaries.

@johanrd johanrd marked this pull request as draft April 15, 2026 14:01
@NullVoxPopuli
Contributor

before we look at this, we first need a mitata benchmark for syntax parse/print <3

@NullVoxPopuli
Contributor

there is an example of this in the ember-eslint-parser repo

johanrd added a commit to johanrd/ember.js that referenced this pull request Apr 15, 2026
Reproducible side-by-side benchmark of preprocess, normalize (ASTv1 → ASTv2),
and full precompile() against a control worktree. Run via:

  pnpm bench:syntax                             # standalone
  pnpm bench:syntax -- --control-dir /path/to/main-checkout

Size ladder matches the PR emberjs#21314 body (462 / 1494 / 4482 / 32868 chars,
the last sized to match Discourse's admin-user/index.gjs). Fixtures live
under bench/fixtures/; large and extra-large are medium × 3 / × 22.

Emits mitata's standard ms/iter + boxplot + summary output, plus a µs/char
summary table across the size ladder that makes the O(n²) → O(n) flattening
visible at a glance.
@johanrd johanrd force-pushed the perf/source-newline-cache branch from 96ca173 to aca513f Compare April 15, 2026 18:11
@johanrd
Contributor Author

johanrd commented Apr 15, 2026

@NullVoxPopuli thanks, added now with an attempt to align with ember-eslint-parser, but please tell if it should be done differently.

@johanrd johanrd marked this pull request as ready for review April 15, 2026 18:34
Comment thread packages/@glimmer/syntax/bench/syntax.bench.mjs Outdated
Comment thread packages/@glimmer/syntax/bench/syntax.bench.mjs Outdated
Comment thread package.json Outdated
johanrd added a commit to johanrd/ember.js that referenced this pull request Apr 16, 2026
@johanrd johanrd force-pushed the perf/source-newline-cache branch from aca513f to a5c0a3f Compare April 16, 2026 06:31
johanrd added a commit to johanrd/ember.js that referenced this pull request Apr 16, 2026
@johanrd johanrd force-pushed the perf/source-newline-cache branch from a5c0a3f to e0431fc Compare April 16, 2026 07:14
@johanrd johanrd requested a review from NullVoxPopuli April 16, 2026 07:42
@NullVoxPopuli
Contributor

thanks for adding the benchmark -- can you PR that separately so I can run it on main when reviewing this PR?

johanrd added a commit to johanrd/ember.js that referenced this pull request Apr 16, 2026
@johanrd
Contributor Author

johanrd commented Apr 16, 2026

@NullVoxPopuli see #21316

johanrd added 2 commits April 16, 2026 20:57
hbsPosFor() and charPosFor() were doing a fresh indexOf('\n') scan of
the source on every call, making each lookup O(lines_until_offset).
These are called once per AST node loc by the ASTv2 normalize pass, so
total cost was effectively O(n²) in template size.

CPU profile of a full precompile() showed hbsPosFor dominating at ~28%
of self-time, scattered across many call sites.

Fix: precompute an array of newline offsets on first use, binary-search
it for conversions. O(log n) per call, O(n) to build once per source.

Impact (Node 24, warmed JIT, full precompile()):
  real-world template (1494 chars):  1.34ms -> 1.24ms
  large template (3520 chars):       4.53ms -> 3.06ms  (32% faster)

The normalize phase specifically (ASTv1 -> ASTv2) drops from ~1.72ms
to ~0.43ms on the large template — a 4x speedup in that phase.

All tests pass.
- charPosFor now returns null for out-of-range lines (lineIdx > newlineOffsets.length)
  and negative columns, matching its 'number | null' return type. Previously it
  silently returned column or source.length in those cases.
- Adds direct unit tests for Source.hbsPosFor / charPosFor covering empty source,
  single-char/newline, exact-newline offsets, column-past-line-end clamping, and
  negative/out-of-range inputs.
@johanrd johanrd force-pushed the perf/source-newline-cache branch from e0431fc to 4f15abb Compare April 16, 2026 19:06
@NullVoxPopuli
Contributor

NullVoxPopuli commented Apr 16, 2026

This passes the perf test for realsies (I never trust AI results here, because it rarely ever exposes its methodology)

main
benchmark                   avg (min … max) p75 / p99    (min … top 1%)
------------------------------------------- -------------------------------
parse      small (1517c)       2.03 ms/iter   1.93 ms  █▇                  
                        (1.35 ms … 5.88 ms)   5.12 ms ███▃                 
                    (285.14 kb …   5.08 mb)   1.69 mb ████▂▃▂▂▁▁▂▃▅▂▃▂▂▂▁▂▁

normalize  small (1517c)       2.38 ms/iter   2.61 ms  █                   
                        (1.85 ms … 5.93 ms)   4.94 ms ▂█                   
                    (863.23 kb …   3.62 mb)   2.25 mb ███▅▄▆▆▃▃▂▂▁▁▂▁▁▁▁▁▁▁

precompile small (1517c)       4.78 ms/iter   5.19 ms █▇                   
                        (3.74 ms … 9.69 ms)   8.83 ms ██                   
                    (  1.63 mb …   5.76 mb)   3.29 mb █████▄▄█▃▂▃▂▂▂▁▂▂▂▂▁▂

parse      medium (4551c)      5.59 ms/iter   6.45 ms  █                   
                       (4.48 ms … 11.52 ms)   9.32 ms ▅█▂                  
                    (  1.48 mb …   6.80 mb)   5.02 mb ███▄▂▃▃▁▅▅▂▂▂▄▃▄▁▁▁▂▂

normalize  medium (4551c)      7.40 ms/iter   7.69 ms  █▂                  
                       (6.28 ms … 10.50 ms)  10.44 ms ▃██                  
                    (  4.85 mb …   8.56 mb)   6.73 mb ███▄▆▄▇▃▂▃▃▂▃▂▂▃▆▂▁▄▂

precompile medium (4551c)     17.13 ms/iter  17.87 ms ▂█  ▂ ██ █     ▂    ▂
                      (13.55 ms … 23.41 ms)  22.65 ms ██  █ ██▅█     █    █
                    (  7.17 mb …  12.83 mb)   9.94 mb ██▇▇█▇████▁▇▁▁▇█▁▁▁▁█

parse      large (33374c)     67.38 ms/iter  69.60 ms           █          
                      (58.44 ms … 82.00 ms)  73.94 ms           █         █
                    ( 41.03 mb …  42.98 mb)  41.63 mb █▁█▁▁▁██▁██▁▁▁█▁▁▁▁▁█

normalize  large (33374c)    101.39 ms/iter 103.56 ms         █       █    
                     (94.21 ms … 114.91 ms) 105.69 ms ▅ ▅  ▅ ▅█      ▅█ ▅ ▅
                    (128.80 kb …  55.72 mb)   9.63 mb █▁█▁▁█▁██▁▁▁▁▁▁██▁█▁█

precompile large (33374c)    167.42 ms/iter 168.17 ms      █               
                    (149.29 ms … 222.16 ms) 181.56 ms ▅▅ ▅ █   ▅▅▅▅▅      ▅
                    ( 12.68 mb …  25.30 mb)  18.11 mb ██▁█▁█▁▁▁█████▁▁▁▁▁▁█

this branch
benchmark                   avg (min … max) p75 / p99    (min … top 1%)
------------------------------------------- -------------------------------
parse      small (1517c)       1.85 ms/iter   1.82 ms ▄█                   
                        (1.28 ms … 5.40 ms)   4.98 ms ██▅                  
                    ( 50.09 kb …   5.27 mb)   1.68 mb ███▅▄▂▂▂▂▂▄▃▃▁▁▁▂▁▁▁▁

normalize  small (1517c)       2.20 ms/iter   2.35 ms  █                   
                        (1.74 ms … 4.76 ms)   4.49 ms ▇█                   
                    (208.55 kb …   3.30 mb)   2.24 mb ██▇▅▂▃▄▃▄▃▁▂▁▁▁▁▁▁▁▂▁

precompile small (1517c)       4.59 ms/iter   5.15 ms  █                   
                       (3.56 ms … 10.41 ms)   8.86 ms ▇█                   
                    (  1.33 mb …   5.22 mb)   3.29 mb ███▇▃▃▅▄▃▂▄▂▃▃▂▂▂▁▁▁▂

parse      medium (4551c)      4.99 ms/iter   5.40 ms  █                   
                        (3.91 ms … 9.99 ms)   8.19 ms  █▃                  
                    (  1.67 mb …   7.23 mb)   5.02 mb ▆███▄▃▄▃▆▂▂▂▂▂▂▂▂▂▄▂▂

normalize  medium (4551c)      6.49 ms/iter   6.75 ms  █▃▇                 
                       (5.28 ms … 10.82 ms)  10.07 ms ▃███                 
                    (  4.94 mb …   8.22 mb)   6.72 mb █████▇▃▃▃▄▁▂▇▁▂▆▂▃▂▂▂

precompile medium (4551c)     15.86 ms/iter  17.00 ms █▅                   
                      (12.31 ms … 25.97 ms)  25.73 ms ██▇ ▅▂      ▂        
                    (  7.19 mb …  12.83 mb)   9.86 mb ███▄██▁▇▁▄▁▄█▁▄▁▁▁▇▁▄

parse      large (33374c)     39.64 ms/iter  40.12 ms        █▂            
                      (33.83 ms … 49.13 ms)  45.20 ms        ██            
                    ( 39.89 mb …  42.97 mb)  41.27 mb ▇▁▁▁▇▁▁██▁▁▇▇▁▁▁▁▇▁▁▇

normalize  large (33374c)     55.69 ms/iter  57.80 ms              █       
                      (50.53 ms … 61.30 ms)  58.28 ms ▅   ▅   ▅    █ ▅   ▅▅
                    (245.27 kb …  56.41 mb)   9.92 mb █▁▁▁█▁▁▁█▁▁▁▁█▁█▁▁▁██

precompile large (33374c)    110.94 ms/iter 116.99 ms  ██                █ 
                     (99.87 ms … 147.92 ms) 118.30 ms ▅██  ▅ ▅        ▅  █▅
                    ( 12.42 mb …  25.93 mb)  18.11 mb ███▁▁█▁█▁▁▁▁▁▁▁▁█▁▁██

it looks like this branch is both faster and uses less RAM in most cases


@NullVoxPopuli
Contributor

📊 Package size report   0.01%↑

| File | Before (Size / Brotli) | After (Size / Brotli) |
|------|------------------------|-----------------------|
| dist/dev/packages/shared-chunks/compiler-BdWF4Soc.js | 177.3 kB / 33.8 kB | 0.2%↑ 177.7 kB / 0.8%↑ 34 kB |
| dist/prod/packages/shared-chunks/compiler-DDPqc0HO.js | 190.5 kB / 36.1 kB | 0.2%↑ 190.9 kB / 0.7%↑ 36.4 kB |
| Total (includes all files) | 5.4 MB / 1.3 MB | 0.01%↑ 5.4 MB / 0.04%↑ 1.3 MB |
| Tarball size | 1.2 MB | 0.05%↑ 1.2 MB |

@NullVoxPopuli NullVoxPopuli changed the title POC/Perf: cache newline offsets in Source for O(log n) loc conversion Perf: cache newline offsets in Source for O(log n) loc conversion Apr 16, 2026
@NullVoxPopuli NullVoxPopuli merged commit f9ecab5 into emberjs:main Apr 16, 2026
41 checks passed