bench(engine): profile ReorderBuffer cache behavior at 1M items (#1854) #4180

Merged
oferchen merged 3 commits into master from bench/reorderbuffer-cache-behavior-1854 on May 16, 2026

Conversation

@oferchen
Owner

Summary

  • Adds crates/engine/benches/reorder_buffer_cache.rs, a Criterion bench that exercises ReorderBuffer insert/drain at 1M out-of-order indexed items.
  • Varies item payload size {32 B, 256 B, 4 KB} and insertion order pattern {fully reverse, random shuffle, near-in-order with 10% deltas}; reports ops/sec via Throughput::Elements.
  • The bench file header documents how to run with perf stat -e cache-misses,cache-references (Linux) and valgrind --tool=cachegrind (any platform), and what favorable or unfavorable counter results would imply for the storage layout: keep Box<[Option<T>]>, or switch to a flat Vec<T> plus occupancy bitmap or to a hot/cold split.
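
To make the layout question in the last bullet concrete, here is a minimal std-only sketch contrasting the two storage shapes. It is not ReorderBuffer's actual code: `OptionSlots` and `BitmapSlots` are hypothetical names, and the `T: Default + Clone` bound on the bitmap variant is an assumption made only to keep the sketch free of `unsafe`.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for the current layout: one `Option<T>` per slot.
/// For payloads without a niche (such as `[u8; 32]`), every slot stores a
/// discriminant alongside the payload.
struct OptionSlots<T> {
    slots: Box<[Option<T>]>,
}

impl<T> OptionSlots<T> {
    fn new(capacity: usize) -> Self {
        let slots = (0..capacity).map(|_| None).collect::<Vec<_>>().into_boxed_slice();
        Self { slots }
    }

    /// Returns false if the slot was already occupied.
    fn insert(&mut self, idx: usize, item: T) -> bool {
        if self.slots[idx].is_some() {
            return false;
        }
        self.slots[idx] = Some(item);
        true
    }

    fn take(&mut self, idx: usize) -> Option<T> {
        self.slots[idx].take()
    }
}

/// Hypothetical alternative: payloads stored flat, occupancy tracked in a
/// separate bitmap (one bit per slot, packed into u64 words).
struct BitmapSlots<T> {
    items: Vec<T>,
    occupied: Vec<u64>,
}

impl<T: Default + Clone> BitmapSlots<T> {
    fn new(capacity: usize) -> Self {
        Self {
            items: vec![T::default(); capacity],
            occupied: vec![0; (capacity + 63) / 64],
        }
    }

    /// Returns false if the slot was already occupied.
    fn insert(&mut self, idx: usize, item: T) -> bool {
        let (word, bit) = (idx / 64, 1u64 << (idx % 64));
        if self.occupied[word] & bit != 0 {
            return false;
        }
        self.items[idx] = item;
        self.occupied[word] |= bit;
        true
    }

    fn take(&mut self, idx: usize) -> Option<T> {
        let (word, bit) = (idx / 64, 1u64 << (idx % 64));
        if self.occupied[word] & bit == 0 {
            return None;
        }
        self.occupied[word] &= !bit;
        // Leave a default value behind; the cleared bit marks the slot empty.
        Some(std::mem::take(&mut self.items[idx]))
    }
}

fn main() {
    type Payload = [u8; 32];
    println!(
        "Option<Payload>: {} bytes, Payload: {} bytes",
        size_of::<Option<Payload>>(),
        size_of::<Payload>()
    );

    let mut a = OptionSlots::<Payload>::new(8);
    let mut b = BitmapSlots::<Payload>::new(8);
    for idx in [5usize, 2, 7] {
        assert!(a.insert(idx, [idx as u8; 32]));
        assert!(b.insert(idx, [idx as u8; 32]));
    }
    // Both layouts answer the same queries; only the memory shape differs.
    assert_eq!(a.take(2), b.take(2));
    assert_eq!(a.take(3), None);
    assert_eq!(b.take(3), None);
}
```

Whether the denser flat layout pays off is the question the perf and cachegrind counters are meant to answer; the sketch only shows what the two candidates look like.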

Notes

  • Bench is gated behind BENCH_REORDER_CACHE=1 so default cargo bench runs remain fast; per-cell wall time is 5-30 s at 1M items.
  • Payloads are pre-built outside the timed window via iter_batched(PerIteration) so the measurement reflects ReorderBuffer operations, not allocator work.
  • Capacity is sized per pattern so no insert ever triggers grow(); the steady-state ring is what gets profiled.
  • No changes to ReorderBuffer production code.
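
The gating and capacity-sizing notes above can be sketched with the standard library alone. Everything here is illustrative rather than taken from the bench: the function names, the xorshift PRNG, the exact `== "1"` comparison, and the reading of "near-in-order with 10% deltas" as one adjacent swap per block of ten indices are all assumptions. `required_capacity` further assumes the buffer drains its contiguous prefix eagerly after each insert.

```rust
use std::env;

/// Small deterministic xorshift64 PRNG so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Fully reverse insertion order: n-1, n-2, ..., 0.
fn reverse_order(n: usize) -> Vec<usize> {
    (0..n).rev().collect()
}

/// Random shuffle via Fisher-Yates, seeded for reproducibility.
fn shuffled_order(n: usize, seed: u64) -> Vec<usize> {
    let mut rng = XorShift64(seed.max(1)); // xorshift state must be non-zero
    let mut order: Vec<usize> = (0..n).collect();
    for i in (1..n).rev() {
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        order.swap(i, j);
    }
    order
}

/// Assumed near-in-order pattern: swap one adjacent pair per block of ten.
fn near_in_order(n: usize) -> Vec<usize> {
    let mut order: Vec<usize> = (0..n).collect();
    let mut i = 0;
    while i + 1 < n {
        order.swap(i, i + 1);
        i += 10;
    }
    order
}

/// Smallest window (in slots) that accepts every insert of `order` without
/// growing. `order` must be a permutation of 0..order.len().
fn required_capacity(order: &[usize]) -> usize {
    let mut seen = vec![false; order.len()];
    let mut next = 0usize; // next index the buffer would emit
    let mut needed = 0usize;
    for &idx in order {
        needed = needed.max(idx - next + 1);
        seen[idx] = true;
        while next < seen.len() && seen[next] {
            next += 1;
        }
    }
    needed
}

/// Opt-in gate so a default run returns immediately.
fn bench_enabled() -> bool {
    env::var("BENCH_REORDER_CACHE").map(|v| v == "1").unwrap_or(false)
}

fn main() {
    if !bench_enabled() {
        println!("BENCH_REORDER_CACHE=1 not set; skipping");
        return;
    }
    let n = 1_000_000;
    for (name, order) in [
        ("reverse", reverse_order(n)),
        ("shuffle", shuffled_order(n, 0x5EED)),
        ("near_in_order", near_in_order(n)),
    ] {
        println!("{}: capacity {}", name, required_capacity(&order));
    }
}
```

Under these assumptions the reverse pattern needs a window as large as the whole input, while the near-in-order pattern needs only a couple of slots, which is why a single fixed capacity would either waste memory or trigger grow().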

Test plan

  • CI: fmt + clippy + nextest pass on Linux, macOS, Windows.
  • Operator: BENCH_REORDER_CACHE=1 cargo bench -p engine --bench reorder_buffer_cache completes on a Linux host.
  • Operator: perf stat -e cache-misses,cache-references,L1-dcache-load-misses,LLC-load-misses numbers captured for the 1M cells (driven from instructions in the bench file header).
  • Operator: cachegrind run on at least one cell (e.g., payload256/shuffle) annotated via cg_annotate.

@oferchen oferchen force-pushed the bench/reorderbuffer-cache-behavior-1854 branch 3 times, most recently from 423b9ae to 012a769 Compare May 16, 2026 22:08
oferchen added 3 commits May 17, 2026 01:33
Add a Criterion bench that exercises ReorderBuffer's insert/drain
path at 1M out-of-order indexed items across three payload sizes
(32 B, 256 B, 4 KB) and three insertion patterns (fully reverse,
random shuffle, near-in-order with 10% deltas). The bench is gated
behind BENCH_REORDER_CACHE=1 so default cargo bench runs stay fast.

The top-of-file documents how to drive perf stat (Linux) and
cachegrind (any platform) against the produced binary, and how to
interpret the numbers - favorable cache metrics keep the existing
Box<[Option<T>]> layout, while unfavorable metrics motivate a flat
Vec plus occupancy-bitmap layout or a hot/cold storage split.

Payloads are pre-built outside the timed section via iter_batched
so the measurement reflects ReorderBuffer ops, not allocation.
Capacity is sized per pattern so no insert ever triggers grow().
@oferchen oferchen force-pushed the bench/reorderbuffer-cache-behavior-1854 branch from 012a769 to 726a694 Compare May 16, 2026 22:33
@oferchen oferchen merged commit 620d31f into master May 16, 2026
14 checks passed
@oferchen oferchen deleted the bench/reorderbuffer-cache-behavior-1854 branch May 16, 2026 22:33
oferchen added a commit that referenced this pull request May 18, 2026
… (#4180)

* chore(bench): profile ReorderBuffer cache behavior at 1M items (#1854)

Add a Criterion bench that exercises ReorderBuffer's insert/drain
path at 1M out-of-order indexed items across three payload sizes
(32 B, 256 B, 4 KB) and three insertion patterns (fully reverse,
random shuffle, near-in-order with 10% deltas). The bench is gated
behind BENCH_REORDER_CACHE=1 so default cargo bench runs stay fast.

The top-of-file documents how to drive perf stat (Linux) and
cachegrind (any platform) against the produced binary, and how to
interpret the numbers - favorable cache metrics keep the existing
Box<[Option<T>]> layout, while unfavorable metrics motivate a flat
Vec plus occupancy-bitmap layout or a hot/cold storage split.

Payloads are pre-built outside the timed section via iter_batched
so the measurement reflects ReorderBuffer ops, not allocation.
Capacity is sized per pattern so no insert ever triggers grow().

* style(engine): fix clippy doc-lazy-continuation and hex literal grouping

* style(engine): break paragraph out of nested list in bench doc