bench(engine): ReorderBuffer memory occupancy at 100K/500K/1M (#1564) by oferchen · Pull Request #4204 · oferchen/rsync

oferchen · 2026-05-17T06:57:20Z

Summary

Adds `crates/engine/benches/reorderbuffer_memory.rs`, a Criterion benchmark that synthesises 100K, 500K, and 1M out-of-order inserts across drift windows of 32, 256, 2048, and 16K to profile the merge point of the parallel delta pipeline.
Reports insert+drain throughput via `Throughput::Elements` and prints `metrics().max_depth` per (count, drift) pair so operators can see the high-water mark.
Documents how to interpret `max_depth` against the in-flight dispatch bound and what favorable versus unfavorable readings mean for the spill (#1884) and adaptive-sizing (#1834) tracks. The 1M case is gated behind `BENCH_REORDER_MEMORY_1M=1` to keep default invocations fast.
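
To make the high-water mark concrete, here is a minimal std-only sketch of a reorder buffer that tracks its peak occupancy. `SketchReorderBuffer` and its methods are hypothetical stand-ins, not the engine's `ReorderBuffer` API, and the sketch assumes the depth is sampled right after each insert, before any in-order items are released.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a reorder buffer; NOT the engine's real type.
pub struct SketchReorderBuffer<T> {
    /// Next sequence number that may be released in order.
    next_seq: u64,
    /// Items that arrived ahead of `next_seq`, keyed by sequence number.
    pending: BTreeMap<u64, T>,
    /// Largest number of items ever held at once (the high-water mark).
    max_depth: usize,
}

impl<T> SketchReorderBuffer<T> {
    pub fn new() -> Self {
        Self { next_seq: 0, pending: BTreeMap::new(), max_depth: 0 }
    }

    /// Buffers `item`, updates the high-water mark, then appends every
    /// item that is now contiguous with `next_seq` to `out`.
    pub fn insert(&mut self, seq: u64, item: T, out: &mut Vec<T>) {
        self.pending.insert(seq, item);
        // Assumption: depth is sampled after the insert and before the drain.
        self.max_depth = self.max_depth.max(self.pending.len());
        while let Some(ready) = self.pending.remove(&self.next_seq) {
            out.push(ready);
            self.next_seq += 1;
        }
    }

    /// Peak occupancy observed so far.
    pub fn max_depth(&self) -> usize {
        self.max_depth
    }
}
```

Under these assumptions a fully in-order stream never exceeds a depth of 1, while a stream whose first item arrives last holds everything before it until that item shows up.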

Test plan

`cargo bench -p engine --bench reorderbuffer_memory` runs the 100K and 500K cases and emits a `max_depth` line per drift.
`BENCH_REORDER_MEMORY_1M=1 cargo bench -p engine --bench reorderbuffer_memory` exercises the heavy 1M case.
CI fmt + clippy gates remain green; no production code touched.
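
The environment gate in the test plan could be implemented along these lines. This is a sketch only: the helper names are hypothetical, and it assumes the gate opens on the exact value `1`, whereas the real benchmark may accept any set value.

```rust
use std::env;

/// Name of the opt-in variable from the test plan.
const GATE_VAR: &str = "BENCH_REORDER_MEMORY_1M";

/// Assumption: only the exact value "1" enables the heavy case.
fn gate_enabled(value: Option<&str>) -> bool {
    value == Some("1")
}

/// Item counts to benchmark; the 1M case is appended only when requested.
fn item_counts(include_1m: bool) -> Vec<usize> {
    let mut counts = vec![100_000, 500_000];
    if include_1m {
        counts.push(1_000_000);
    }
    counts
}

/// Reads the gate from the process environment (hypothetical helper).
fn counts_from_env() -> Vec<usize> {
    let value = env::var(GATE_VAR).ok();
    item_counts(gate_enabled(value.as_deref()))
}
```

Keeping the parsing in `gate_enabled` separate from the environment lookup lets the gate be checked without mutating process-wide state.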

Add a Criterion benchmark that synthesises 100K, 500K, and 1M out-of-order inserts across drift windows of 32, 256, 2048, and 16K, then reports insert+drain throughput together with the peak occupancy via the `metrics().max_depth` accessor. The benchmark pre-allocates the drifted permutation outside the timed section and prints `max_depth` once per (count, drift) pair so operators can compare against in-flight dispatch capacity and decide whether the spill (#1884) or adaptive-sizing (#1834) paths are warranted. The 1M case is gated behind `BENCH_REORDER_MEMORY_1M=1` to keep default runs fast.
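
One way to build a drifted permutation ahead of the timed section is sketched below. The commit message does not say how the drift is generated, so this assumes a deterministic shuffle inside consecutive blocks of `drift` elements, which keeps every sequence number fewer than `drift` positions from its in-order slot. All names here are hypothetical, and `std::time::Instant` stands in for Criterion's timing loop.

```rust
use std::time::{Duration, Instant};

/// Small xorshift64 generator so the sketch stays std-only and repeatable.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Returns 0..count with each block of `drift` elements shuffled in place.
/// Assumption: block-local shuffling is how "drift" is modelled here.
fn drifted_permutation(count: usize, drift: usize, seed: u64) -> Vec<u64> {
    let mut seqs: Vec<u64> = (0..count as u64).collect();
    let mut rng = XorShift64(seed.max(1)); // xorshift state must be nonzero
    for block in seqs.chunks_mut(drift.max(1)) {
        // Fisher-Yates shuffle restricted to this block.
        for i in (1..block.len()).rev() {
            let j = (rng.next_u64() % (i as u64 + 1)) as usize;
            block.swap(i, j);
        }
    }
    seqs
}

/// Times only the consumption of an already-built permutation.
fn time_pass(seqs: &[u64], mut consume: impl FnMut(u64)) -> Duration {
    let start = Instant::now();
    for &seq in seqs {
        consume(seq);
    }
    start.elapsed()
}
```

Building the vector first and passing a slice to `time_pass` keeps the cost of generating the permutation out of the measured interval.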

…#4212) Document the four CAPACITY_MULTIPLIER sites and the two duplicate hard-coded `2`s in delta_pipeline.rs, justify each against the recent dispatch benches (#4203 channel overhead, #4204 reorder memory, #4206 dispatch decomposition, #4209 sp vs mp), and recommend keeping the default at 2 with one follow-up bench specified to challenge it.
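
As an illustration of comparing a `max_depth` reading against dispatch capacity, the sketch below classifies a reading as favorable or unfavorable. The formula `workers * CAPACITY_MULTIPLIER` is an assumption made for this example; the source only states that the default multiplier is 2, not how the in-flight bound is derived, and every name other than `CAPACITY_MULTIPLIER` is hypothetical.

```rust
/// Mirrors the documented default of 2; how it is applied is assumed below.
const CAPACITY_MULTIPLIER: usize = 2;

/// Outcome of comparing peak occupancy with the in-flight bound.
#[derive(Debug, PartialEq, Eq)]
enum Reading {
    /// Peak occupancy fits inside the in-flight bound.
    Favorable,
    /// Peak occupancy exceeds the bound; spill or adaptive sizing may help.
    Unfavorable,
}

/// Assumption: the in-flight bound scales linearly with worker count.
fn in_flight_capacity(workers: usize) -> usize {
    workers.saturating_mul(CAPACITY_MULTIPLIER)
}

/// Classifies one (count, drift) reading against a given capacity.
fn classify(max_depth: usize, capacity: usize) -> Reading {
    if max_depth <= capacity {
        Reading::Favorable
    } else {
        Reading::Unfavorable
    }
}
```

With 8 workers this assumed bound is 16, so a printed `max_depth` of 16 would count as favorable and 17 as unfavorable.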

oferchen merged commit 0a24a3c into master May 17, 2026
40 checks passed

oferchen deleted the bench/reorderbuffer-memory-1564 branch May 17, 2026 08:13

This was referenced May 17, 2026

bench(engine): decompose parallel dispatch overhead at 100K items (#1551) #4206

Merged

docs(design): tune CAPACITY_MULTIPLIER from parallel-dispatch benches (#1553) #4212

Merged

oferchen mentioned this pull request May 17, 2026

docs(design): ReorderBuffer spill-to-tempfile for stalled successors (#1884) #4228

Merged

3 tasks

oferchen merged 1 commit into master from bench/reorderbuffer-memory-1564
