bench(engine): profile ReorderBuffer cache behavior at 1M items (#1854) by oferchen · Pull Request #4180 · oferchen/rsync

oferchen · 2026-05-16T20:25:27Z

Summary

Adds crates/engine/benches/reorder_buffer_cache.rs, a Criterion bench that exercises ReorderBuffer insert/drain at 1M out-of-order indexed items.
Varies item payload size {32 B, 256 B, 4 KB} and insertion order pattern {fully reverse, random shuffle, near-in-order with 10% deltas}; reports ops/sec via Throughput::Elements.
The bench file header documents how to run with perf stat -e cache-misses,cache-references (Linux) and valgrind --tool=cachegrind (any platform Valgrind supports), plus what favorable or unfavorable counter results would imply for the storage layout: keep Box<[Option<T>]>, switch to a flat Vec<T> plus occupancy bitmap, or adopt a hot/cold split.
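
To make the layout alternatives concrete, here is a minimal sketch contrasting a per-slot Option layout with a flat payload array plus an occupancy bitmap. It is not the engine's code: `OptionSlots`, `BitmapSlots`, and the fixed `[u8; 32]` payload are hypothetical stand-ins, and the flat variant zero-fills its slots, which only works for payloads that have a cheap default value.

```rust
use std::mem::size_of;

/// Hypothetical payload stand-in, sized like the smallest bench cell (32 B).
pub type Payload = [u8; 32];

/// Sketch of a per-slot Option layout: presence is stored next to each payload.
pub struct OptionSlots {
    slots: Box<[Option<Payload>]>,
}

impl OptionSlots {
    pub fn new(capacity: usize) -> Self {
        Self { slots: vec![None; capacity].into_boxed_slice() }
    }

    pub fn insert(&mut self, slot: usize, value: Payload) {
        self.slots[slot] = Some(value);
    }

    /// Removes and returns the payload, leaving the slot empty.
    pub fn take(&mut self, slot: usize) -> Option<Payload> {
        self.slots[slot].take()
    }
}

/// Sketch of the alternative: payloads stored flat, presence kept in a
/// separate bitmap with one bit per slot.
pub struct BitmapSlots {
    values: Vec<Payload>,
    occupied: Vec<u64>,
}

impl BitmapSlots {
    pub fn new(capacity: usize) -> Self {
        Self {
            values: vec![[0u8; 32]; capacity],
            // Round up so every slot has a bit.
            occupied: vec![0u64; (capacity + 63) / 64],
        }
    }

    pub fn is_occupied(&self, slot: usize) -> bool {
        self.occupied[slot / 64] & (1u64 << (slot % 64)) != 0
    }

    pub fn insert(&mut self, slot: usize, value: Payload) {
        self.values[slot] = value;
        self.occupied[slot / 64] |= 1u64 << (slot % 64);
    }

    /// Clears the occupancy bit and returns a copy of the payload.
    pub fn take(&mut self, slot: usize) -> Option<Payload> {
        if !self.is_occupied(slot) {
            return None;
        }
        self.occupied[slot / 64] &= !(1u64 << (slot % 64));
        Some(self.values[slot])
    }
}

/// Bytes per slot in the Option layout for this payload type.
pub fn slot_bytes_option() -> usize {
    size_of::<Option<Payload>>()
}

/// Bytes per slot in the flat layout (the bitmap adds one bit per slot).
pub fn slot_bytes_flat() -> usize {
    size_of::<Payload>()
}
```

A plain byte array has no spare bit pattern for the Option discriminant, so the Option slot is wider than the payload. Payload types with a niche (for example a Box or Vec) would not pay that cost, which is one reason the decision is left to measured counters rather than to reasoning about sizes alone.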

Notes

Bench is gated behind BENCH_REORDER_CACHE=1 so default cargo bench runs remain fast; per-cell wall time is 5-30 s at 1M items.
Payloads are pre-built outside the timed window via iter_batched with BatchSize::PerIteration, so the measurement reflects ReorderBuffer operations, not allocator work.
Capacity is sized per pattern so no insert ever triggers grow(); the steady-state ring is what gets profiled.
No changes to ReorderBuffer production code.
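
The gating and capacity-sizing notes above can be sketched without Criterion. Both helpers below are hypothetical illustrations, not code from the bench: `gate_enabled` assumes the gate accepts only the exact value "1", and `required_capacity` assumes a ring that must span every index from the next expected one up to the highest index inserted so far.

```rust
/// Hypothetical gate check: enabled only when the variable is exactly "1".
pub fn gate_enabled(value: Option<&str>) -> bool {
    value == Some("1")
}

/// Reads the gate from the environment variable named in the PR.
pub fn bench_enabled() -> bool {
    gate_enabled(std::env::var("BENCH_REORDER_CACHE").ok().as_deref())
}

/// Smallest window that lets every insert in `order` land without growing,
/// under the assumption stated above. `order` must be a permutation of
/// 0..order.len().
pub fn required_capacity(order: &[u64]) -> usize {
    let mut seen = vec![false; order.len()];
    let mut next = 0usize; // next index the consumer is waiting for
    let mut max_span = 0usize;
    for &idx in order {
        let idx = idx as usize;
        seen[idx] = true;
        // The ring must cover next..=idx at this moment.
        max_span = max_span.max(idx - next + 1);
        // Drain everything that is now contiguous.
        while next < seen.len() && seen[next] {
            next += 1;
        }
    }
    max_span
}
```

Under this model a fully reversed order needs a window as large as the whole input, while an in-order stream needs a single slot, which is why a per-pattern capacity is a natural way to keep grow() out of the measurement.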

Test plan

CI: fmt + clippy + nextest pass on Linux, macOS, Windows.
Operator: BENCH_REORDER_CACHE=1 cargo bench -p engine --bench reorder_buffer_cache completes on a Linux host.
Operator: perf stat -e cache-misses,cache-references,L1-dcache-load-misses,LLC-load-misses numbers captured for the 1M cells (driven from instructions in the bench file header).
Operator: cachegrind run on at least one cell (e.g., payload256/shuffle) annotated via cg_annotate.
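
When comparing captured counters across cells, it can help to reduce each run to a miss ratio first. The helper below is a small sketch for that step; `CacheCounters` is a hypothetical type, and the numbers are expected to be copied by hand from the cache-references and cache-misses lines of a perf stat run.

```rust
/// Hypothetical container for two counters copied from a perf stat run.
#[derive(Debug, Clone, Copy)]
pub struct CacheCounters {
    pub references: u64,
    pub misses: u64,
}

impl CacheCounters {
    /// Fraction of cache references that missed, or None if nothing was counted.
    pub fn miss_ratio(&self) -> Option<f64> {
        if self.references == 0 {
            None
        } else {
            Some(self.misses as f64 / self.references as f64)
        }
    }
}

/// Difference in miss ratio between two cells (candidate minus baseline),
/// for example a 4 KB payload cell against a 32 B payload cell.
pub fn miss_ratio_delta(baseline: CacheCounters, candidate: CacheCounters) -> Option<f64> {
    Some(candidate.miss_ratio()? - baseline.miss_ratio()?)
}
```

The sketch deliberately sets no threshold for what counts as favorable; that interpretation is left to the guidance in the bench file header.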

Add a Criterion bench that exercises ReorderBuffer's insert/drain path at 1M out-of-order indexed items across three payload sizes (32 B, 256 B, 4 KB) and three insertion patterns (fully reverse, random shuffle, near-in-order with 10% deltas). The bench is gated behind BENCH_REORDER_CACHE=1 so default cargo bench runs stay fast. The top-of-file documents how to drive perf stat (Linux) and cachegrind (any platform) against the produced binary, and how to interpret the numbers - favorable cache metrics keep the existing Box<[Option<T>]> layout, while unfavorable metrics motivate a flat Vec plus occupancy-bitmap layout or a hot/cold storage split. Payloads are pre-built outside the timed section via iter_batched so the measurement reflects ReorderBuffer ops, not allocation. Capacity is sized per pattern so no insert ever triggers grow().
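
The three insertion patterns named above could be generated along these lines. This is a std-only sketch, not the bench's code: `Pattern`, `XorShift64`, and `insertion_order` are hypothetical names, and the reading of "near-in-order with 10% deltas" as "roughly one position in ten swapped with its immediate successor" is an assumption.

```rust
/// Insertion-order patterns, named after the three described in the text.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Pattern {
    Reverse,
    Shuffle,
    NearInOrder,
}

/// Tiny xorshift64 generator so the sketch needs no external crates.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // The xorshift state must never be zero.
        Self(seed.max(1))
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Value in 0..bound; modulo bias is acceptable for bench inputs.
    pub fn below(&mut self, bound: usize) -> usize {
        (self.next_u64() % bound as u64) as usize
    }
}

/// Returns a permutation of 0..n in the order the indices would be inserted.
pub fn insertion_order(pattern: Pattern, n: usize, seed: u64) -> Vec<u64> {
    let mut order: Vec<u64> = (0..n as u64).collect();
    let mut rng = XorShift64::new(seed);
    match pattern {
        Pattern::Reverse => order.reverse(),
        Pattern::Shuffle => {
            // Fisher-Yates shuffle over the whole range.
            for i in (1..n).rev() {
                let j = rng.below(i + 1);
                order.swap(i, j);
            }
        }
        Pattern::NearInOrder => {
            // ASSUMPTION: about 10% of positions trade places with their
            // immediate successor; everything else stays in order.
            let mut i = 0;
            while i + 1 < n {
                if rng.below(10) == 0 {
                    order.swap(i, i + 1);
                    i += 2; // skip the partner so no element moves twice
                } else {
                    i += 1;
                }
            }
        }
    }
    order
}
```

A fixed seed keeps the shuffled and near-in-order inputs identical between runs, so counter differences between two runs reflect the code under test rather than a different permutation.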

… (#4180)

* chore(bench): profile ReorderBuffer cache behavior at 1M items (#1854)
* style(engine): fix clippy doc-lazy-continuation and hex literal grouping
* style(engine): break paragraph out of nested list in bench doc

oferchen force-pushed the bench/reorderbuffer-cache-behavior-1854 branch 3 times, most recently from 423b9ae to 012a769, May 16, 2026 22:08

oferchen added 3 commits May 17, 2026 01:33

style(engine): fix clippy doc-lazy-continuation and hex literal grouping

9cf4863

style(engine): break paragraph out of nested list in bench doc

726a694

oferchen force-pushed the bench/reorderbuffer-cache-behavior-1854 branch from 012a769 to 726a694, May 16, 2026 22:33

oferchen merged commit 620d31f into master May 16, 2026
14 checks passed

oferchen deleted the bench/reorderbuffer-cache-behavior-1854 branch May 16, 2026 22:33

This was referenced May 17, 2026

bench(engine): decompose parallel dispatch overhead at 100K items (#1551) #4206

Merged

docs(design): ReorderBuffer spill-to-tempfile for stalled successors (#1884) #4228

Merged

oferchen merged 3 commits into master from bench/reorderbuffer-cache-behavior-1854
