bench(engine): single-producer vs multi-producer WorkQueue overhead (#1572) #4209

Merged
oferchen merged 1 commit into master from bench/sp-vs-mp-workqueue-overhead-1572 on May 17, 2026

Conversation

@oferchen
Owner

Summary

  • Adds crates/engine/benches/sp_vs_mp_workqueue.rs (Criterion bench) with two groups that both move 100K items through the concurrent delta work queue:
    • sp/1p/100k: one producer thread pushes 100K pre-allocated DeltaWork items via the default-build Send + !Clone WorkQueueSender path.
    • mp/4p/100k: four producer threads each push 25K pre-allocated items via the gated Clone impl on WorkQueueSender. Compiled only when --features multi-producer is set.
  • Both groups report Throughput::Elements(100_000) so items/sec compare directly regardless of producer count. Inputs are pre-allocated outside the timed section via iter_batched, matching the discipline used by the parallel_dispatch_overhead bench (bench(engine): decompose parallel dispatch overhead at 100K items (#1551) #4206).
  • No Cargo.toml feature change is required; the engine crate already declares multi-producer = [] at crates/engine/Cargo.toml:91. Only the [[bench]] entry is added.
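The shape of the two groups can be approximated outside Criterion with only the standard library. This is a minimal sketch, not the bench itself: `std::sync::mpsc::sync_channel` stands in for the engine's concurrent delta work queue, `DeltaWork` is a hypothetical placeholder struct, and the channel capacity of 64 is an arbitrary assumption. Batches are built before the timer starts, mirroring the `iter_batched` setup/measure split, and the same 100K total is moved whether it goes through one producer or four. Because std's `SyncSender` is `Clone`, the sketch cannot express the engine's default `Send + !Clone` restriction.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical placeholder for the engine's `DeltaWork` item.
struct DeltaWork {
    index: u64,
}

/// Setup (untimed): split `total` pre-allocated items across `producers` batches.
fn prepare_batches(total: u64, producers: u64) -> Vec<Vec<DeltaWork>> {
    assert!(producers > 0 && total % producers == 0, "total must split evenly");
    let per = total / producers;
    (0..producers)
        .map(|p| (0..per).map(|i| DeltaWork { index: p * per + i }).collect())
        .collect()
}

/// Measured section: push every batch through one bounded channel, one thread
/// per batch, and drain on the calling thread. Returns (received, index sum, elapsed).
fn run_transfer(batches: Vec<Vec<DeltaWork>>, capacity: usize) -> (u64, u64, Duration) {
    let (tx, rx) = sync_channel::<DeltaWork>(capacity);
    let start = Instant::now();
    let producers: Vec<_> = batches
        .into_iter()
        .map(|batch| {
            // std's SyncSender is Clone; the engine's default sender would be
            // moved into a single producer instead of cloned.
            let tx = tx.clone();
            thread::spawn(move || {
                for item in batch {
                    tx.send(item).expect("receiver dropped early");
                }
            })
        })
        .collect();
    // Drop the original sender so `rx` ends once every producer finishes.
    drop(tx);
    let (mut received, mut index_sum) = (0u64, 0u64);
    // Drain before joining: producers block while the bounded channel is full.
    for item in rx {
        received += 1;
        index_sum += item.index;
    }
    for p in producers {
        p.join().expect("producer panicked");
    }
    (received, index_sum, start.elapsed())
}
```

`run_transfer(prepare_batches(100_000, 1), 64)` is the SP analogue and `run_transfer(prepare_batches(100_000, 4), 64)` the MP analogue; dividing 100,000 by each elapsed time gives directly comparable items/sec, which is the figure `Throughput::Elements(100_000)` makes Criterion report.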

Why

PR #4173 audited every live WorkQueueSender site and concluded that all three production producer sites are correctly single-producer today. Task #1572 produces the quantitative SP-vs-MP overhead figure that gates whether the multi-producer feature should stay opt-in or graduate to default-on if a future caller wants MP fan-in. The bench's top-of-file documentation spells out the decision criteria (>=15% delta threshold, matching the buffer-pool sharded-benchmark gate) and cross-references the #4173 audit, the #4203 sync_channel bench, and the #4206 parallel_dispatch_overhead bench so reviewers can read all four data points together.
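The >=15% gate reduces to simple arithmetic over the two items/sec figures. A sketch, assuming the delta is measured relative to SP throughput and compared in either direction; the exact formula used in the bench's documentation is not reproduced here, and the timings below are illustrative, not measured results.

```rust
/// Decision threshold from the bench documentation (>=15% SP-vs-MP delta).
const GATE_PCT: f64 = 15.0;

/// Items/sec from an element count and a mean time per iteration in seconds,
/// i.e. what a `Throughput::Elements(n)` report expresses.
fn items_per_sec(elements: u64, mean_secs: f64) -> f64 {
    elements as f64 / mean_secs
}

/// Assumed definition: MP throughput change relative to SP, in percent.
/// Positive means MP is slower than SP.
fn sp_vs_mp_delta_pct(sp_ips: f64, mp_ips: f64) -> f64 {
    (sp_ips - mp_ips) / sp_ips * 100.0
}

/// True when the delta meets or exceeds the gate in either direction.
fn exceeds_gate(sp_ips: f64, mp_ips: f64) -> bool {
    sp_vs_mp_delta_pct(sp_ips, mp_ips).abs() >= GATE_PCT
}
```

With hypothetical means of 10 ms (SP) and 12 ms (MP) per 100K-item iteration, SP runs at 10,000,000 items/s, MP at about 8,333,333 items/s, and the delta is about 16.7%, which meets the gate; at 11 ms for MP the delta is about 9.1% and does not.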

Test plan

  • CI fmt + clippy passes.
  • cargo bench -p engine --bench sp_vs_mp_workqueue builds and runs the SP group on the default build (locally, only cargo fmt was run; bench execution is deferred to CI per repo policy).
  • cargo bench -p engine --features multi-producer --bench sp_vs_mp_workqueue builds and runs both groups when the feature is enabled.
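The difference between the two commands, SP only on a default build and both groups with the feature, comes down to conditional compilation. A minimal sketch of that idea; `registered_groups` is a hypothetical helper, and the actual bench presumably applies `#[cfg(feature = "multi-producer")]` to the MP bench function and its Criterion registration rather than to a list of names.

```rust
/// Hypothetical helper: the bench groups a given build would register.
fn registered_groups() -> Vec<&'static str> {
    // `mut` is only needed when the feature is on.
    #[allow(unused_mut)]
    let mut groups = vec!["sp/1p/100k"];
    // Compiled in only with `--features multi-producer`.
    #[cfg(feature = "multi-producer")]
    groups.push("mp/4p/100k");
    groups
}
```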

…1572)

Adds `crates/engine/benches/sp_vs_mp_workqueue.rs` with two Criterion
groups that both move 100K items through the concurrent delta work
queue:

- `sp/1p/100k`: one producer thread pushes 100K pre-allocated DeltaWork
  items via the default-build `Send + !Clone` WorkQueueSender path.
- `mp/4p/100k`: four producer threads each push 25K pre-allocated items
  via the gated `Clone` impl on WorkQueueSender. Compiled only when
  `--features multi-producer` is set.

Both groups report `Throughput::Elements(100_000)` so items/sec figures
compare directly regardless of how the work is split. Inputs are
pre-allocated outside the timed section via `iter_batched`, matching
the discipline used by the parallel_dispatch_overhead bench.

The MP group is feature-gated behind the engine crate's existing
`multi-producer` feature - no Cargo.toml dependency change is required;
the gate already exists at `crates/engine/Cargo.toml:91`. The top-of-
file documentation cross-references the audit at
`docs/audits/workqueue-sender-multi-producer-audit.md` (PR #4173),
#4203 sync_channel bench, and #4206 parallel_dispatch_overhead bench,
and spells out the decision criteria (>=15% SP-vs-MP delta) that this
bench informs.
oferchen added a commit that referenced this pull request May 17, 2026
…#4212)

Document the four CAPACITY_MULTIPLIER sites and the two duplicate
hard-coded `2`s in delta_pipeline.rs, justify each against the recent
dispatch benches (#4203 channel overhead, #4204 reorder memory, #4206
dispatch decomposition, #4209 sp vs mp), and recommend keeping the
default at 2 with one follow-up bench specified to challenge it.
@oferchen oferchen merged commit 105cb92 into master May 17, 2026
40 checks passed
@oferchen oferchen deleted the bench/sp-vs-mp-workqueue-overhead-1572 branch May 17, 2026 19:13
oferchen added a commit that referenced this pull request May 17, 2026
…1405) (#4224)

Replaces the prior multi-root-focused note at the same path with a
focused #1405 design discussion for parallel generator fan-in.
Surveys the current SP / MP shape (feature-gated Clone at
work_queue/multi_producer.rs), the candidate use cases, the three
adjacent designs (Clone via #1569, Arc-wrap via #1610, explicit
producer-vector constructor), and the ordering, capacity, and bench
evidence that would have to land before flipping the default.

Recommends keeping WorkQueueSender Send + !Clone in default builds
and the multi-producer cargo feature opt-in. Documents the cost so
reviewers can reject naive MP refactors. Promotion to default-on is
gated on PR #4209 SP-vs-MP throughput parity, PR #4214 drain bench
showing no regression, and an actual fan-in caller materialising.
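The default-build guarantee described here, a sender that is `Send` but not `Clone` unless the `multi-producer` feature is on, can be enforced purely at the type level. A sketch under stated assumptions: `SketchSender` and `work_queue` are hypothetical names wrapping `std::sync::mpsc::SyncSender`, not the engine's `WorkQueueSender` or its `work_queue/multi_producer.rs` module. Because the newtype does not derive `Clone`, a second producer cannot be created by cloning in default builds; with the feature enabled, the gated impl restores `Clone` for fan-in.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SendError, SyncSender};

/// Hypothetical single-producer sender: `Send` (when `T: Send`), not `Clone` by default.
pub struct SketchSender<T> {
    inner: SyncSender<T>,
}

/// Opt-in multi-producer fan-in, compiled only with the feature enabled.
#[cfg(feature = "multi-producer")]
impl<T> Clone for SketchSender<T> {
    fn clone(&self) -> Self {
        Self { inner: self.inner.clone() }
    }
}

impl<T> SketchSender<T> {
    /// Blocks while the bounded queue is full.
    pub fn send(&self, item: T) -> Result<(), SendError<T>> {
        self.inner.send(item)
    }
}

/// Hypothetical constructor for a bounded queue with one producer handle.
pub fn work_queue<T>(capacity: usize) -> (SketchSender<T>, Receiver<T>) {
    let (inner, rx) = sync_channel(capacity);
    (SketchSender { inner }, rx)
}

/// Compile-time check that a type can be moved to another thread.
fn assert_send<S: Send>() {}
```

In a default build, calling `.clone()` on a `SketchSender` is a compile error, which is the property that lets reviewers reject a naive MP refactor before it reaches runtime.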
oferchen added a commit that referenced this pull request May 18, 2026
…#4212)

Document the four CAPACITY_MULTIPLIER sites and the two duplicate
hard-coded `2`s in delta_pipeline.rs, justify each against the recent
dispatch benches (#4203 channel overhead, #4204 reorder memory, #4206
dispatch decomposition, #4209 sp vs mp), and recommend keeping the
default at 2 with one follow-up bench specified to challenge it.
oferchen added a commit that referenced this pull request May 18, 2026
…1572) (#4209)

Adds `crates/engine/benches/sp_vs_mp_workqueue.rs` with two Criterion
groups that both move 100K items through the concurrent delta work
queue:

- `sp/1p/100k`: one producer thread pushes 100K pre-allocated DeltaWork
  items via the default-build `Send + !Clone` WorkQueueSender path.
- `mp/4p/100k`: four producer threads each push 25K pre-allocated items
  via the gated `Clone` impl on WorkQueueSender. Compiled only when
  `--features multi-producer` is set.

Both groups report `Throughput::Elements(100_000)` so items/sec figures
compare directly regardless of how the work is split. Inputs are
pre-allocated outside the timed section via `iter_batched`, matching
the discipline used by the parallel_dispatch_overhead bench.

The MP group is feature-gated behind the engine crate's existing
`multi-producer` feature - no Cargo.toml dependency change is required;
the gate already exists at `crates/engine/Cargo.toml:91`. The top-of-
file documentation cross-references the audit at
`docs/audits/workqueue-sender-multi-producer-audit.md` (PR #4173),
#4203 sync_channel bench, and #4206 parallel_dispatch_overhead bench,
and spells out the decision criteria (>=15% SP-vs-MP delta) that this
bench informs.
oferchen added a commit that referenced this pull request May 18, 2026
…1405) (#4224)

Replaces the prior multi-root-focused note at the same path with a
focused #1405 design discussion for parallel generator fan-in.
Surveys the current SP / MP shape (feature-gated Clone at
work_queue/multi_producer.rs), the candidate use cases, the three
adjacent designs (Clone via #1569, Arc-wrap via #1610, explicit
producer-vector constructor), and the ordering, capacity, and bench
evidence that would have to land before flipping the default.

Recommends keeping WorkQueueSender Send + !Clone in default builds
and the multi-producer cargo feature opt-in. Documents the cost so
reviewers can reject naive MP refactors. Promotion to default-on is
gated on PR #4209 SP-vs-MP throughput parity, PR #4214 drain bench
showing no regression, and an actual fan-in caller materialising.