bench(engine): ReorderBuffer memory occupancy at 100K/500K/1M (#1564) #4204
Merged
Add a Criterion benchmark that synthesises 100K, 500K, and 1M out-of-order inserts across drift windows of 32, 256, 2048, and 16K, then reports insert+drain throughput together with the peak occupancy via the `metrics().max_depth` accessor. The benchmark pre-allocates the drifted permutation outside the timed section and prints `max_depth` once per (count, drift) pair so operators can compare against in-flight dispatch capacity and decide whether the spill (#1884) or adaptive-sizing (#1834) paths are warranted. The 1M case is gated behind `BENCH_REORDER_MEMORY_1M=1` to keep default runs fast.
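The measurement described above can be sketched with the standard library alone. This is a minimal illustration, not the benchmark in this PR: `ToyReorderBuffer`, `drifted_permutation`, and `peak_occupancy` are hypothetical names, the real `ReorderBuffer` API is not shown in this description, the drift model (shuffling inside consecutive blocks of `drift` items) is an assumption, and "16K" is assumed to mean 16 * 1024. Criterion timing is left out so the sketch has no external dependencies.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the engine's ReorderBuffer (not the real type).
/// It holds out-of-order items and releases them in sequence order.
struct ToyReorderBuffer {
    pending: BTreeMap<u64, u64>,
    next_seq: u64,
    max_depth: usize,
}

impl ToyReorderBuffer {
    fn new() -> Self {
        Self { pending: BTreeMap::new(), next_seq: 0, max_depth: 0 }
    }

    /// Insert one item, record the high-water mark, then drain every item
    /// that is now deliverable in order into `out`.
    fn insert_and_drain(&mut self, seq: u64, value: u64, out: &mut Vec<u64>) {
        self.pending.insert(seq, value);
        self.max_depth = self.max_depth.max(self.pending.len());
        while let Some(v) = self.pending.remove(&self.next_seq) {
            out.push(v);
            self.next_seq += 1;
        }
    }
}

/// Sequence numbers 0..count, shuffled inside consecutive blocks of `drift`
/// items (assumed drift model). Built up front, outside any timed section.
fn drifted_permutation(count: u64, drift: usize, seed: u64) -> Vec<u64> {
    let mut seqs: Vec<u64> = (0..count).collect();
    let mut state = seed | 1; // xorshift requires a non-zero state
    for block in seqs.chunks_mut(drift.max(1)) {
        // Fisher-Yates shuffle within the block, driven by xorshift64.
        for i in (1..block.len()).rev() {
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            let j = (state % (i as u64 + 1)) as usize;
            block.swap(i, j);
        }
    }
    seqs
}

/// Run insert+drain over a drifted input; return peak occupancy and output.
fn peak_occupancy(count: u64, drift: usize) -> (usize, Vec<u64>) {
    let input = drifted_permutation(count, drift, 0x9E37_79B9_7F4A_7C15);
    let mut buf = ToyReorderBuffer::new();
    let mut out = Vec::with_capacity(input.len());
    for seq in input {
        buf.insert_and_drain(seq, seq, &mut out);
    }
    (buf.max_depth, out)
}

fn main() {
    // Drift windows from the description; 16K assumed to be 16 * 1024.
    for drift in [32usize, 256, 2048, 16 * 1024] {
        let (max_depth, out) = peak_occupancy(100_000, drift);
        println!("count=100000 drift={drift} max_depth={max_depth} drained={}", out.len());
    }
}
```

Under this block-shuffle model the peak occupancy cannot exceed the drift window, which is the kind of bound an operator would compare against in-flight dispatch capacity.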
This was referenced May 17, 2026
oferchen added a commit that referenced this pull request on May 17, 2026:
…#4212) Document the four CAPACITY_MULTIPLIER sites and the two duplicate hard-coded `2`s in delta_pipeline.rs, justify each against the recent dispatch benches (#4203 channel overhead, #4204 reorder memory, #4206 dispatch decomposition, #4209 sp vs mp), and recommend keeping the default at 2 with one follow-up bench specified to challenge it.
oferchen added a commit that referenced this pull request on May 18, 2026:
…4204) Add a Criterion benchmark that synthesises 100K, 500K, and 1M out-of-order inserts across drift windows of 32, 256, 2048, and 16K, then reports insert+drain throughput together with the peak occupancy via the `metrics().max_depth` accessor. The benchmark pre-allocates the drifted permutation outside the timed section and prints `max_depth` once per (count, drift) pair so operators can compare against in-flight dispatch capacity and decide whether the spill (#1884) or adaptive-sizing (#1834) paths are warranted. The 1M case is gated behind `BENCH_REORDER_MEMORY_1M=1` to keep default runs fast.
Summary
- Adds `crates/engine/benches/reorderbuffer_memory.rs`, a Criterion benchmark that synthesises 100K, 500K, and 1M out-of-order inserts across drift windows of 32, 256, 2048, and 16K to profile the merge point of the parallel delta pipeline.
- Reports throughput via `Throughput::Elements` and prints `metrics().max_depth` per (count, drift) pair so operators see the high-water mark.
- Documents how to read `max_depth` against the in-flight dispatch bound and what favorable vs unfavorable readings mean for the spill (#1884) and adaptive-sizing (#1834) tracks. The 1M case is gated behind `BENCH_REORDER_MEMORY_1M=1` to keep default invocations fast.

Test plan
- `cargo bench -p engine --bench reorderbuffer_memory` runs the 100K and 500K cases and emits a `max_depth` line per drift.
- `BENCH_REORDER_MEMORY_1M=1 cargo bench -p engine --bench reorderbuffer_memory` exercises the heavy 1M case.
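One way the environment-variable gate in the test plan could work is sketched below. `counts_for` is a hypothetical helper, and treating only the exact value `1` as enabling the 1M case is an assumption; the benchmark in this PR may parse the variable differently.

```rust
use std::env;

/// Hypothetical helper: choose the insert counts to run from the gate value.
/// Assumption: only the exact string "1" enables the heavy 1M case.
fn counts_for(gate: Option<&str>) -> Vec<usize> {
    let mut counts = vec![100_000, 500_000];
    if gate == Some("1") {
        counts.push(1_000_000);
    }
    counts
}

fn main() {
    // Read the gate named in the PR description; unset means the default run.
    let gate = env::var("BENCH_REORDER_MEMORY_1M").ok();
    let counts = counts_for(gate.as_deref());
    println!("running counts: {counts:?}");
}
```

Keeping the decision in a pure function that takes the gate value as an argument makes it testable without mutating the process environment.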