
bench(engine): ReorderBuffer memory occupancy at 100K/500K/1M (#1564) #4204

Merged
oferchen merged 1 commit into master from bench/reorderbuffer-memory-1564 on May 17, 2026

@oferchen
Owner

Summary

  • Adds `crates/engine/benches/reorderbuffer_memory.rs`, a Criterion benchmark that synthesises 100K, 500K, and 1M out-of-order inserts across drift windows of 32, 256, 2048, and 16K to profile the merge point of the parallel delta pipeline.
  • Reports insert+drain throughput via `Throughput::Elements` and prints `metrics().max_depth` once per (count, drift) pair so operators can see the high-water mark.
  • Documents how to interpret `max_depth` against the in-flight dispatch bound, and what favorable vs. unfavorable readings mean for the spill (#1884) and adaptive-sizing (#1834) tracks. The 1M case is gated behind `BENCH_REORDER_MEMORY_1M=1` to keep default invocations fast.
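The PR does not show the engine's `ReorderBuffer` internals. As a rough mental model of what the benchmark measures, here is a minimal sketch, assuming a `BTreeMap`-backed buffer keyed by sequence number. `ReorderBuffer`, `Metrics`, `insert`, and `pending_len` below are hypothetical stand-ins, not the engine's actual API; only the idea of a `metrics().max_depth` high-water mark comes from this PR.

```rust
use std::collections::BTreeMap;

/// Hypothetical occupancy counters, loosely modelled on the
/// `metrics().max_depth` accessor named in this PR.
#[derive(Debug, Default, Clone, Copy)]
pub struct Metrics {
    /// Largest number of items held at once (measured right after an insert).
    pub max_depth: usize,
}

/// Hypothetical minimal reorder buffer: accepts `(seq, item)` pairs in any
/// order and releases items strictly in sequence order starting at 0.
/// Assumes each sequence number is inserted exactly once.
pub struct ReorderBuffer<T> {
    next_seq: u64,
    pending: BTreeMap<u64, T>,
    metrics: Metrics,
}

impl<T> ReorderBuffer<T> {
    pub fn new() -> Self {
        Self { next_seq: 0, pending: BTreeMap::new(), metrics: Metrics::default() }
    }

    /// Buffers `item` under `seq`, updates the high-water mark, then appends
    /// every item that is now contiguous with `next_seq` to `out`.
    pub fn insert(&mut self, seq: u64, item: T, out: &mut Vec<T>) {
        self.pending.insert(seq, item);
        self.metrics.max_depth = self.metrics.max_depth.max(self.pending.len());
        // Drain the in-order prefix, if the next expected item has arrived.
        while let Some(ready) = self.pending.remove(&self.next_seq) {
            out.push(ready);
            self.next_seq += 1;
        }
    }

    pub fn metrics(&self) -> Metrics {
        self.metrics
    }

    /// Items still waiting for an earlier sequence number.
    pub fn pending_len(&self) -> usize {
        self.pending.len()
    }
}
```

Keying by sequence number in an ordered map reduces the drain check to a lookup of `next_seq`; the real engine may well use a ring buffer or slot array instead, which would change the constant factors but not what `max_depth` means.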

Test plan

  • `cargo bench -p engine --bench reorderbuffer_memory` runs the 100K and 500K cases and emits one `max_depth` line per (count, drift) pair.
  • `BENCH_REORDER_MEMORY_1M=1 cargo bench -p engine --bench reorderbuffer_memory` additionally exercises the heavy 1M case.
  • CI fmt and clippy gates remain green; no production code is touched.
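The test plan gates the 1M case on `BENCH_REORDER_MEMORY_1M=1`. A minimal sketch of such a gate using `std::env::var`, assuming the case is enabled only by the exact value `1` (the PR does not say how other values are treated). `counts_for` and `bench_counts` are hypothetical names; splitting out the pure helper lets the rule be checked without mutating the process environment.

```rust
use std::env;

/// Counts that always run, per the test plan (100K and 500K).
const DEFAULT_COUNTS: [usize; 2] = [100_000, 500_000];
/// Heavy case, only enabled when the environment flag is set.
const HEAVY_COUNT: usize = 1_000_000;

/// Hypothetical pure helper: assumes exactly "1" enables the 1M case.
fn counts_for(flag: Option<&str>) -> Vec<usize> {
    let mut counts = DEFAULT_COUNTS.to_vec();
    if flag == Some("1") {
        counts.push(HEAVY_COUNT);
    }
    counts
}

/// Reads the gate from the environment; an unset or non-UTF-8 value
/// leaves the 1M case disabled.
fn bench_counts() -> Vec<usize> {
    let flag = env::var("BENCH_REORDER_MEMORY_1M").ok();
    counts_for(flag.as_deref())
}
```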

Add a Criterion benchmark that synthesises 100K, 500K, and 1M out-of-order
inserts across drift windows of 32, 256, 2048, and 16K, then reports
insert+drain throughput together with the peak occupancy via the
`metrics().max_depth` accessor.

The benchmark pre-allocates the drifted permutation outside the timed
section and prints `max_depth` once per (count, drift) pair so operators
can compare against in-flight dispatch capacity and decide whether the
spill (#1884) or adaptive-sizing (#1834) paths are warranted. The 1M case
is gated behind `BENCH_REORDER_MEMORY_1M=1` to keep default runs fast.
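The commit message says the drifted permutation is pre-allocated outside the timed section but does not show how it is generated. A minimal, deterministic sketch, assuming a chunk-reversal scheme (the actual bench may shuffle randomly within each window instead); `drifted_permutation` is a hypothetical name.

```rust
/// Hypothetical generator: builds a permutation of `0..count` in which each
/// sequence number lands fewer than `drift` positions from its in-order slot,
/// by reversing every consecutive chunk of `drift` elements.
/// All allocation happens here, once, so a timed loop can iterate over the
/// returned vector without allocating.
fn drifted_permutation(count: usize, drift: usize) -> Vec<u64> {
    assert!(drift > 0, "drift window must be non-zero");
    let mut seqs: Vec<u64> = (0..count as u64).collect();
    // Within each chunk the lowest sequence number arrives last, which is
    // the worst case for a buffer that releases items in order.
    for chunk in seqs.chunks_mut(drift) {
        chunk.reverse();
    }
    seqs
}
```

Under this scheme, a buffer that releases items in sequence order has to hold an entire chunk before it can drain, so the reported high-water mark tracks the drift window rather than the total count.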
@oferchen merged commit 0a24a3c into master on May 17, 2026
40 checks passed
@oferchen deleted the bench/reorderbuffer-memory-1564 branch on May 17, 2026 at 08:13
oferchen added a commit that referenced this pull request May 17, 2026
…#4212)

Document the four CAPACITY_MULTIPLIER sites and the two duplicate
hard-coded `2`s in delta_pipeline.rs, justify each against the recent
dispatch benches (#4203 channel overhead, #4204 reorder memory, #4206
dispatch decomposition, #4209 sp vs mp), and recommend keeping the
default at 2 with one follow-up bench specified to challenge it.
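The #4212 commit refers to a `CAPACITY_MULTIPLIER` with a recommended default of 2 but does not state how it maps onto the in-flight dispatch bound. Purely as an illustration, the sketch below assumes the bound is `workers * CAPACITY_MULTIPLIER` and classifies a printed `max_depth` reading against it. `in_flight_bound`, `classify`, and `Reading` are hypothetical, and the mapping from reading to "spill or adaptive sizing may be warranted" is one possible interpretation, not the documented one.

```rust
/// Assumed default, following the #4212 recommendation to keep it at 2.
const CAPACITY_MULTIPLIER: usize = 2;

/// Hypothetical classification of a `max_depth` reading.
#[derive(Debug, PartialEq, Eq)]
enum Reading {
    /// Peak occupancy fits inside the assumed in-flight bound.
    WithinBound,
    /// Peak occupancy exceeded the bound by `overshoot` items; under this
    /// sketch's assumptions, a hint that spill or adaptive sizing may help.
    ExceedsBound { overshoot: usize },
}

/// ASSUMPTION: the in-flight dispatch bound is workers * CAPACITY_MULTIPLIER.
fn in_flight_bound(workers: usize) -> usize {
    workers * CAPACITY_MULTIPLIER
}

fn classify(max_depth: usize, workers: usize) -> Reading {
    let bound = in_flight_bound(workers);
    if max_depth <= bound {
        Reading::WithinBound
    } else {
        Reading::ExceedsBound { overshoot: max_depth - bound }
    }
}
```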
oferchen added a commit that referenced this pull request May 18, 2026
…4204)
oferchen added a commit that referenced this pull request May 18, 2026
…#4212)