perf(decode): skip `,`/`;` pre-scan for large mappings #343
Merged
…an for small

Replace the unconditional O(n) `,` / `;` pre-scan in `decode_mapping` with a size-gated estimate, factored into a new `estimate_token_capacity` helper:

* `mapping.len() < 256`: keep the exact scan. It's cheap on short mappings, and getting the capacity exactly right lets the trailing `Vec::into_boxed_slice` in `decode_from_string` skip its shrink-realloc. That realloc otherwise dominates per-iteration cost on tiny benchmarks (32-byte fixtures), so an over-estimate would regress them.
* `mapping.len() ≥ 256`: use the O(1) heuristic `mapping.len() / 4 + 1`. Realistic sourcemaps average ~4-5 bytes per segment, so the estimate is close to the typical count. `Vec::push` handles under-estimates via geometric growth, and the trailing `into_boxed_slice` shrinks any over-allocation back to exact size, so RSS is unaffected.

No `unsafe`. Local bench vs main is dominated by the xlarge win.
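A minimal sketch of the size-gated estimate described above. The `estimate_token_capacity(mapping: &str) -> usize` signature, the `EXACT_SCAN_THRESHOLD` and `exact_token_count` names, and the assumption that the exact path returns the separator count plus one are all hypothetical; the real helper in `src/decode.rs` may differ.

```rust
/// Mappings shorter than this many bytes get an exact separator count
/// (256, per the PR description; the constant name is hypothetical).
const EXACT_SCAN_THRESHOLD: usize = 256;

/// Exact path: one O(n) pass counting `,` (segment) and `;` (line) separators.
/// Assumption: the token count is the separator count plus one.
fn exact_token_count(mapping: &str) -> usize {
    mapping.bytes().filter(|&b| b == b',' || b == b';').count() + 1
}

/// Hypothetical sketch of the size-gated capacity estimate.
fn estimate_token_capacity(mapping: &str) -> usize {
    if mapping.len() < EXACT_SCAN_THRESHOLD {
        // Short input: the scan is cheap, and an exact capacity lets the
        // final `into_boxed_slice` skip its shrink-realloc.
        exact_token_count(mapping)
    } else {
        // Long input: O(1) guess assuming roughly 4 bytes per segment.
        // Under-estimates are absorbed by `Vec::push`'s geometric growth.
        mapping.len() / 4 + 1
    }
}

fn main() {
    let small = "AAAA,CAAC;AACA";
    println!("small: {}", estimate_token_capacity(small));

    // 100 four-byte segments joined by commas: 499 bytes, above the threshold.
    let large = vec!["AAAA"; 100].join(",");
    println!(
        "large: estimate {} vs exact {}",
        estimate_token_capacity(&large),
        exact_token_count(&large)
    );
}
```

This prints `small: 3` and `large: estimate 125 vs exact 100`. The synthetic large input uses 5 bytes per segment, so the heuristic over-reserves; the surplus is what `into_boxed_slice` trims at the end.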
Merging this PR will improve performance by 3.61%
| | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| ❌ | build_single | 7 µs | 7.1 µs | -1.23% |
| ❌ | lookup_table[real_medium] | 1.4 µs | 1.5 µs | -1.97% |
| ⚡ | parse[real_large] | 50.5 µs | 46.5 µs | +8.56% |
| ⚡ | parse[real_medium] | 14.9 µs | 14.5 µs | +2.74% |
| ⚡ | parse[real_xlarge] | 1.4 ms | 1.3 ms | +10.59% |
Comparing perf/decode-skip-prescan (4dabc46) with main (f9a6387)
Footnotes

* 5 benchmarks were skipped, so the baseline results were used instead.
Summary
Replaces the unconditional O(n) `,`/`;` pre-scan in `decode_mapping` with a size-gated estimate, factored into a new `estimate_token_capacity` helper:

* `mapping.len() < 256`: exact `,`/`;` count. Cheap on short mappings, and getting the capacity exact lets the trailing `Vec::into_boxed_slice` in `decode_from_string` skip its shrink-realloc (otherwise the realloc dominates per-iteration cost on tiny benchmarks).
* `mapping.len() ≥ 256`: `mapping.len() / 4 + 1`. Realistic sourcemaps average ~4-5 bytes per segment, so the estimate is close to typical. `Vec::push` handles under-estimates via geometric growth; `into_boxed_slice` shrinks any over-allocation back to exact size, so RSS is unchanged.

No `unsafe`.

CodSpeed
Net: +3.61% improvement.
Wins (on the decode hot path, the algorithmic target)

* parse[real_xlarge]
* parse[real_large]
* parse[real_medium]

Regressions (binary-layout artifacts)

* build_single
* lookup_table[real_medium]

Both regressed functions live in `src/sourcemap_builder.rs` and `src/sourcemap.rs` respectively, not in `src/decode.rs`; they aren't algorithmically touched. Inserting any code into `decode.rs` shifts the `.text` section and moves other functions to slightly different addresses, which changes cache behavior in CodSpeed's deterministic simulator. Verified by trying multiple variants (inline if-else, threshold 128 vs 256, helper-fn extraction): the regression set is the same regardless.

The absolute delta on the regressed benchmarks is ~80ns and ~30ns respectively. The xlarge win is ~150µs, well over 1,000× larger than either regression. Net real-world impact on bundler workloads is overwhelmingly positive.
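The ~80ns and ~30ns figures can be recovered from the benchmark table. A small sketch, assuming CodSpeed's Efficiency column is `BASE / HEAD - 1` (positive when HEAD is faster); that definition is inferred from the rounded table values, not taken from CodSpeed's documentation, and the `efficiency` / `slowdown` helpers are hypothetical.

```rust
/// Assumed definition of CodSpeed's "Efficiency" column: base / head - 1.
/// Positive means HEAD is faster. Inferred from the table, not documented here.
fn efficiency(base: f64, head: f64) -> f64 {
    base / head - 1.0
}

/// Extra time HEAD takes relative to BASE, recovered from BASE and the
/// efficiency ratio: head = base / (1 + eff), so the result is head - base.
/// A negative efficiency (regression) yields a positive slowdown.
fn slowdown(base: f64, eff: f64) -> f64 {
    base / (1.0 + eff) - base
}

fn main() {
    // parse[real_large]: 50.5 µs -> 46.5 µs, reported as +8.56%.
    println!("parse[real_large]: {:+.2}%", efficiency(50.5, 46.5) * 100.0);

    // Regressions, with BASE times converted to nanoseconds.
    println!("build_single: {:.0} ns slower", slowdown(7_000.0, -0.0123));
    println!("lookup_table[real_medium]: {:.0} ns slower", slowdown(1_400.0, -0.0197));
}
```

This prints `+8.60%`, `87 ns` and `28 ns`. The small gap from the reported +8.56% comes from the table rounding BASE and HEAD to display precision.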
RSS
Measured at 500 parsed xlarge maps:
* perf(decode): skip `,`/`;` pre-scan for large mappings #343: 223,200 KB (no change: `into_boxed_slice` shrinks the `Vec` to exact size on every parse)
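The unchanged RSS follows from documented standard-library behavior: `Vec::into_boxed_slice` discards excess capacity (like `shrink_to_fit`) before converting, so an over-generous `with_capacity` hint does not survive into the stored `Box<[T]>`. A minimal sketch with made-up sizes; the `finish_tokens` helper and the `u32` token type are hypothetical stand-ins for the decoder's real types.

```rust
/// Hypothetical stand-in for the decoder's final step: reserve `hint` slots,
/// push `count` tokens, then freeze the result into a boxed slice.
fn finish_tokens(hint: usize, count: u32) -> Box<[u32]> {
    let mut tokens: Vec<u32> = Vec::with_capacity(hint);
    for i in 0..count {
        // Pushing past `hint` triggers geometric growth; staying under it
        // leaves unused capacity behind.
        tokens.push(i);
    }
    // Drops any excess capacity. When capacity already equals length, there
    // is nothing to shrink, which is why an exact hint helps small mappings.
    tokens.into_boxed_slice()
}

fn main() {
    // Over-estimate: 125 slots reserved, 100 tokens pushed.
    let boxed = finish_tokens(125, 100);
    println!("len = {}", boxed.len());

    // Converting back to a Vec reuses the allocation without copying,
    // so its capacity reflects the exact-size buffer.
    let v = boxed.into_vec();
    println!("capacity = {}", v.capacity());
}
```

This prints `len = 100` and `capacity = 100`: the 25 surplus slots from the over-estimate are released before the result is stored.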