perf(concat): hoist per-token chunk state out of the per-iteration closure #331

Open
Boshen wants to merge 1 commit into main from perf/concat-hoist-state

Boshen commented May 25, 2026

Summary

`add_sourcemap`'s per-token loop captured `&mut self` inside the `get_source_id().map(...)` / `get_name_id().map(...)` closures so it could update `token_chunk_prev_source_id` / `token_chunk_prev_name_id` on every iteration. That forced the compiler to round-trip the `prev_*` fields through memory (load, mutate, store via the `self` pointer) for every token rather than keeping them in registers.

Track the running prev values in plain locals, then write them back to `self` once at the end. Also handle the first token separately so the `i == 0` dedup check no longer runs every iteration.

Behavior, including the duplicate-token de-duplication at the concatenation seam, is unchanged.
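
For readers without the diff open, here is a minimal sketch of the pattern described above. It is not the crate's code: `Tok`, `Builder`, `add_before`, `add_after`, and the exact dedup rule (drop the first incoming token when it equals the last stored one) are all hypothetical simplifications chosen for illustration.

```rust
// Hypothetical, simplified stand-ins; not the crate's real types.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Tok {
    pub dst_line: u32,
    pub source_id: Option<u32>,
    pub name_id: Option<u32>,
}

#[derive(Default, Debug, PartialEq)]
pub struct Builder {
    pub tokens: Vec<Tok>,
    pub prev_source_id: u32,
    pub prev_name_id: u32,
}

impl Builder {
    /// Before: the `map` closures write through `self` for every token,
    /// and the `i == 0` seam check is evaluated on every iteration.
    pub fn add_before(&mut self, input: &[Tok], source_offset: u32, name_offset: u32) {
        for (i, t) in input.iter().enumerate() {
            let tok = Tok {
                dst_line: t.dst_line,
                source_id: t.source_id.map(|x| {
                    self.prev_source_id = x + source_offset;
                    self.prev_source_id
                }),
                name_id: t.name_id.map(|x| {
                    self.prev_name_id = x + name_offset;
                    self.prev_name_id
                }),
            };
            // Assumed dedup rule: skip a first token that repeats the last one.
            if i == 0 && self.tokens.last() == Some(&tok) {
                continue;
            }
            self.tokens.push(tok);
        }
    }

    /// After: running state lives in locals, the first token is handled
    /// outside the loop, and `self` is updated once at the end.
    pub fn add_after(&mut self, input: &[Tok], source_offset: u32, name_offset: u32) {
        let mut prev_source_id = self.prev_source_id;
        let mut prev_name_id = self.prev_name_id;
        // Captures only the locals, never `self`.
        let mut remap = |t: &Tok| Tok {
            dst_line: t.dst_line,
            source_id: t.source_id.map(|x| {
                prev_source_id = x + source_offset;
                prev_source_id
            }),
            name_id: t.name_id.map(|x| {
                prev_name_id = x + name_offset;
                prev_name_id
            }),
        };

        let mut iter = input.iter();
        // Seam handling happens exactly once, for the first token only.
        if let Some(first) = iter.next() {
            let tok = remap(first);
            if self.tokens.last() != Some(&tok) {
                self.tokens.push(tok);
            }
        }
        // Remaining tokens need no seam check.
        for t in iter {
            self.tokens.push(remap(t));
        }

        // Single write-back of the hoisted state.
        self.prev_source_id = prev_source_id;
        self.prev_name_id = prev_name_id;
    }
}
```

In this sketch both methods leave the builder in the same state for the same input, which is the property the PR claims for the real change. Whether the optimizer actually keeps the locals in registers depends on the surrounding code and is not something the sketch demonstrates.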

codspeed-hq Bot commented May 25, 2026

Merging this PR will improve performance by 2.5%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 4 improved benchmarks
✅ 12 untouched benchmarks
⏩ 5 skipped benchmarks [1]

Performance Changes

| Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|
| `add_sourcemap_loop` | 363 µs | 353 µs | +2.83% |
| `from_sourcemaps` | 344.3 µs | 334.3 µs | +3.02% |
| `lookup_table[real_medium]` | 1.5 µs | 1.5 µs | +1.97% |
| `lookup_table[real_small]` | 1.4 µs | 1.3 µs | +2.18% |

Tip

Curious why this is faster? Comment `@codspeedbot explain why this is faster` on this PR, or directly use the CodSpeed MCP with your agent.


Comparing perf/concat-hoist-state (15efed1) with main (9323530)

Footnotes

  1. 5 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Boshen force-pushed the perf/concat-hoist-state branch from 10ebd62 to 15efed1 on May 25, 2026 05:42