chore(bench): split specifier microbenches into a separate bench binary by stormslowly · Pull Request #247 · rstackjs/rspack-resolver

stormslowly · 2026-05-27T03:00:32Z

Why

specifier/realistic[rw/hash-only] (~340 instructions) and specifier/branches[fragment/short] (~170 instructions) are short enough that a single cold instruction-cache fill on the CodSpeed measurement iteration dominates the result. ~200 cycles of fixed overhead per case translates to +5% to +10% deltas under any unrelated binary-layout shift, even when the parser itself is unchanged or faster. CodSpeed surfaces these as false-positive regressions.

Concrete signature from the #246 investigation: every "regressed" short specifier bench showed Ir Δ = 0, cycles Δ ≈ +105, I1mr Δ = +1, ILmr Δ = +1. The values were identical across all cases, which points to a cache-line shift artifact rather than a real performance change.
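To illustrate why a fixed cost of this size only shows up on the short cases, here is a minimal arithmetic sketch. The +105 cycle delta comes from the investigation above; the baseline cycle counts are hypothetical values chosen for illustration, since the PR does not state per-bench totals.

```rust
/// Relative change, in percent, caused by adding `delta_cycles` to `baseline_cycles`.
fn relative_delta_percent(baseline_cycles: u64, delta_cycles: u64) -> f64 {
    delta_cycles as f64 / baseline_cycles as f64 * 100.0
}

fn main() {
    // Fixed delta reported in the #246 investigation.
    let delta = 105;
    // Hypothetical baselines: two short cases and one long-running case.
    for baseline in [1_050_u64, 2_100, 100_000] {
        println!(
            "{baseline:>7} cycles: +{:.2}%",
            relative_delta_percent(baseline, delta)
        );
    }
}
```

Under these assumed baselines, the same fixed delta lands in the 5% to 10% range for the short cases and becomes negligible for the long one, which is consistent with only the shortest benches being flagged.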

What

Stabilize the short-specifier microbenches against binary-layout noise two complementary ways:

1. Separate `[[bench]]` binary

Move the four specifier/* bench groups into their own benches/specifier.rs and register it as a second [[bench]] in Cargo.toml. Each [[bench]] runs in its own process, so the specifier binary gets a fresh, much smaller instruction-cache footprint instead of competing with the large bench_resolver code for cache lines.

| | Before | After |
| --- | --- | --- |
| Bench binaries | 1 (`resolver`) | 2 (`resolver`, `specifier`) |
| Specifier code shares I-cache with | resolver setup + symlink fixtures + tokio runtime + … | only the 4 specifier bench groups |
| Cold-start misses per short case | unpredictable (shifts with any unrelated change) | scoped to the specifier binary |

- `benches/specifier.rs`: new file. Allocator wrapper mirrors the one in `bench_resolver` so allocation costs are measured identically across binaries.
- `benches/resolver.rs`: drops the `specifier/*` groups and their helpers; only `bench_resolver` remains.
- `Cargo.toml`: adds `[[bench]] name = "specifier"`.
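For readers unfamiliar with Cargo bench targets, the registration described above might look roughly like the fragment below. Only `name = "specifier"` is stated in this PR. The `harness` and `required-features` keys are assumptions: the first is the usual setting for Criterion-style benches, and the second is inferred from the review summary saying the target is gated by `__internal_bench`.

```toml
# New bench target; Cargo looks for benches/specifier.rs by default
# and builds it as its own executable.
[[bench]]
name = "specifier"
harness = false                           # assumption: custom (Criterion-style) harness
required-features = ["__internal_bench"]  # assumption: gating feature from the review summary
```

Because each `[[bench]]` entry produces a separate executable, the specifier benches no longer share a code layout with the resolver benches.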

2. Warm parse case in setup, outside Callgrind window

Each bench_with_input now runs 32 parses of the actual input before b.iter, via a warm_parse helper. CodSpeed's internal WARMUP_RUNS=5 happens inside b.iter and is intended to prime the harness, not absorb cold cache misses from a freshly-relayouted binary. The setup warmup runs before CodSpeed flips on Callgrind instrumentation, so it pages in parse code, lazy-inits the allocator, and trains the branch predictor on the actual input without inflating the measured counters.

| Phase | When | Instrumented? | Purpose |
| --- | --- | --- | --- |
| `warm_parse(s)` (NEW) | before `b.iter` | no | absorb single cold I-fetch miss caused by layout shifts |
| `WARMUP_RUNS=5` | inside `b.iter`, before measurement | no | harness priming |
| measurement iter | inside `b.iter` | yes | the one number CodSpeed records |
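The setup warmup can be sketched without the benchmark harness as follows. This is a minimal illustration, not the code from `benches/specifier.rs`: `parse_stand_in` is a hypothetical substitute for `Specifier::parse`, and the body of `warm_parse` is an assumption based only on the description above (32 parses of the actual input before measurement).

```rust
use std::hint::black_box;

/// Hypothetical stand-in for `Specifier::parse`: splits a specifier into
/// (path, query, fragment). The real parser is not shown in this PR.
fn parse_stand_in(s: &str) -> (&str, &str, &str) {
    // The fragment starts at the first '#'; the query at the first '?' before it.
    let (rest, fragment) = match s.find('#') {
        Some(i) => (&s[..i], &s[i..]),
        None => (s, ""),
    };
    let (path, query) = match rest.find('?') {
        Some(i) => (&rest[..i], &rest[i..]),
        None => (rest, ""),
    };
    (path, query, fragment)
}

/// Number of unmeasured parses; 32 is the count given in the PR description.
const WARM_PARSES: usize = 32;

/// Runs the parser on the real input before measurement starts and
/// returns how many warmup parses were performed.
fn warm_parse(input: &str) -> usize {
    let mut done = 0;
    for _ in 0..WARM_PARSES {
        // black_box keeps the optimizer from discarding the unused results.
        black_box(parse_stand_in(black_box(input)));
        done += 1;
    }
    done
}

fn main() {
    let input = "./style.css?inline#hash";
    // Setup phase: in a Criterion bench this would sit before `b.iter(...)`.
    let warmed = warm_parse(input);
    // Stand-in for the measured phase: the call whose cost would be recorded.
    let parsed = black_box(parse_stand_in(black_box(input)));
    println!("warmed {warmed} times, parsed {parsed:?}");
}
```

The key design point is placement rather than the loop itself: the warmup uses the same input as the measured call, and it runs in the setup closure so that its cost stays outside the instrumented window.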

No code in Specifier::parse is touched; this is purely test-infrastructure stabilization.

Test plan

- `cargo build --benches --features __internal_bench`: both binaries build
- `cargo clippy --all-features -- -D warnings`: clean
- CodSpeed benchmark workflow on this PR shows specifier deltas stable across follow-up unrelated commits (verified by landing a no-op binary-layout change after this and confirming the specifier deltas don't move)

Note

The first CodSpeed run after this lands will show a one-shot reset for the specifier benches (different binary → different baseline). Subsequent runs are the actual stable measurements.

codspeed-hq · 2026-05-27T03:06:50Z

Merging this PR will improve performance by 4.25%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments, which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 1 improved benchmark
✅ 9 untouched benchmarks
🆕 80 new benchmarks
⏩ 84 skipped benchmarks¹

Performance Changes

| | Mode | Benchmark | `BASE` | `HEAD` | Efficiency |
| --- | --- | --- | --- | --- | --- |
| 🆕 | Memory | `both-tail[len_64]` | N/A | 0 B | N/A |
| 🆕 | Memory | `query-tail[len_8]` | N/A | 0 B | N/A |
| 🆕 | Memory | `specifier/branches[none/short]` | N/A | 0 B | N/A |
| 🆕 | Memory | `frag-tail[len_64]` | N/A | 0 B | N/A |
| 🆕 | Memory | `path-only[len_8]` | N/A | 0 B | N/A |
| 🆕 | Memory | `specifier/branches[bare-module]` | N/A | 0 B | N/A |
| 🆕 | Memory | `specifier/branches[query+fragment/medium]` | N/A | 0 B | N/A |
| 🆕 | Memory | `specifier/branches[multi-question]` | N/A | 0 B | N/A |
| 🆕 | Memory | `both-tail[len_1536]` | N/A | 0 B | N/A |
| 🆕 | Memory | `specifier/branches[query/short]` | N/A | 0 B | N/A |
| 🆕 | Memory | `specifier/branches[fragment/medium]` | N/A | 0 B | N/A |
| 🆕 | Memory | `specifier/branches[query+fragment/short]` | N/A | 0 B | N/A |
| 🆕 | Memory | `both-tail[len_256]` | N/A | 0 B | N/A |
| 🆕 | Memory | `specifier/realistic[rw/css-modules]` | N/A | 0 B | N/A |
| 🆕 | Memory | `specifier/branches[query/medium]` | N/A | 0 B | N/A |
| 🆕 | Memory | `frag-tail[len_1536]` | N/A | 0 B | N/A |
| 🆕 | Memory | `specifier/realistic[rw/loader-chain]` | N/A | 0 B | N/A |
| 🆕 | Memory | `path-only[len_1536]` | N/A | 0 B | N/A |
| 🆕 | Memory | `both-tail[len_8]` | N/A | 0 B | N/A |
| 🆕 | Memory | `specifier/escapes[escapes_64]` | N/A | 1 KB | N/A |
| ... | ... | ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

Comparing bench/stable-short-specifier (68da90c) with main (9dd63ca)

¹ 84 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, they can be archived in CodSpeed to remove them from the performance reports.

`specifier/realistic[rw/hash-only]` (~340 instructions) and `specifier/branches[fragment/short]` (~170 instructions) are short enough that the cold instruction-cache fill on each CodSpeed measurement iteration dominates the result: ~200 cycles of fixed overhead translates to +5% to +10% deltas under any unrelated binary-layout shift, even when the parser itself is unchanged or faster. CodSpeed surfaces these as false-positive regressions.

Move the four `specifier/*` bench groups into their own `benches/specifier.rs` and register it as a second `[[bench]]` in Cargo.toml. Each `[[bench]]` runs in its own process, so the specifier binary gets a fresh, much smaller instruction-cache footprint instead of competing with the large `bench_resolver` code for cache lines. The per-case Ir is unchanged. What changes is the working set the kernel and the L1/LL caches see before measurement starts, which makes cold-start misses predictable across runs.

- `benches/specifier.rs`: new file. Allocator wrapper mirrors the one in `bench_resolver` so allocation costs are measured identically.
- `benches/resolver.rs`: drops `specifier/*` groups, helpers, unused imports.
- `Cargo.toml`: adds `[[bench]] name = "specifier"`.

No code paths in `Specifier::parse` are touched; this is purely test-infrastructure stabilization.

CodSpeed's `WARMUP_RUNS=5` inside `b.iter` primes the harness but does not absorb the single cold I-fetch miss (~105 estimated cycles) that a binary-layout shift can introduce on short cases like `specifier/realistic[rw/hash-only]`. Add a per-input `warm_parse` setup pass that runs 32 parses outside the Callgrind instrumentation window, paging in parse code, lazy-initializing the allocator, and training the branch predictor on the actual input before measurement begins.

Copilot

Pull request overview

This PR stabilizes CodSpeed measurements for very short specifier/* microbenchmarks by isolating them from unrelated binary-layout changes and by pre-warming the parse hot path outside the instrumented benchmark window.

Changes:

- Add a second Criterion bench binary (`specifier`) dedicated to `Specifier::parse` microbenches.
- Move all `specifier/*` benchmarks from `benches/resolver.rs` into the new `benches/specifier.rs`.
- Add a `warm_parse` pre-pass before `b.iter` to reduce cold instruction-cache effects on short cases.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `Cargo.toml` | Registers a new `[[bench]]` (`specifier`) gated by `__internal_bench`. |
| `benches/specifier.rs` | New dedicated specifier benchmark binary, including allocator wrapper + pre-`b.iter` warmup. |
| `benches/resolver.rs` | Removes the `specifier/*` benchmark groups and associated helpers/imports from the resolver bench binary. |

stormslowly force-pushed the bench/stable-short-specifier branch from 6198697 to 18984c2 Compare May 27, 2026 03:36

stormslowly force-pushed the bench/stable-short-specifier branch from 18984c2 to 4e33856 Compare May 27, 2026 05:46

stormslowly changed the title ~~chore(bench): amortize cold-start cache noise on short specifier benches~~ chore(bench): split specifier microbenches into a separate bench binary May 27, 2026

stormslowly mentioned this pull request May 27, 2026

perf(specifier): byte-level dispatch in require_without_parse #246

Closed

3 tasks

stormslowly force-pushed the bench/stable-short-specifier branch 2 times, most recently from f5fd078 to 4e33856 Compare May 27, 2026 06:42

stormslowly requested a review from hardfist May 27, 2026 15:29

stormslowly marked this pull request as ready for review May 27, 2026 15:29

Copilot AI review requested due to automatic review settings May 27, 2026 15:29

Copilot started reviewing on behalf of stormslowly May 27, 2026 15:29 View session

Copilot AI reviewed May 27, 2026

Comment thread benches/specifier.rs

hardfist approved these changes May 28, 2026

stormslowly merged commit 2808986 into main May 28, 2026
22 checks passed

stormslowly deleted the bench/stable-short-specifier branch May 28, 2026 03:18

stormslowly mentioned this pull request May 28, 2026

chore: release 0.9.2 #251

Open

stormslowly merged 2 commits into main from bench/stable-short-specifier