docs(audits): WorkQueueSender multi-producer usage audit (#1383) by oferchen · Pull Request #4173 · oferchen/rsync

oferchen · 2026-05-16T19:55:07Z

Summary

Promotes the in-source audit at crates/engine/src/concurrent_delta/multi_producer_audit.rs into a workspace-level audit at docs/audits/workqueue-sender-multi-producer-audit.md with explicit file:line citations for every production producer site and a complete inventory of test-only sites.
Confirms that all 3 production producer sites (two layers of ParallelDeltaPipeline / ThresholdDeltaPipeline plus the sender-drop shutdown signal) correctly use single-producer ownership; zero sites require multi-producer; zero use Arc/Mutex wrappers that would qualify as pseudo-multi-producer.
Top-1 recommendation: keep WorkQueueSender Send + !Clone by default, keep the multi-producer cargo feature gated, and do not introduce an Arc<WorkQueueSender> primitive (#1610 / #1613). The crossbeam_channel::Sender is already internally reference-counted like an Arc, so wrapping it in another Arc only adds a second refcount layer and buys nothing.
Feeds tasks #1405 (design multi-producer WorkQueue for parallel generator fan-in) and #1569 / #1610 (Arc-wrapped sender design). Cross-references the existing docs/audits/workqueue-sp-vs-mp-overhead.md benchmark plan (#1572) and docs/design/arc-workqueue-sender-eval.md (#1383 evaluation note).
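A minimal sketch of the single-producer ownership pattern the audit describes. It assumes a `std::sync::mpsc::SyncSender` in place of the engine's crossbeam_channel sender and a hypothetical `DeltaWork(u32)` payload; it is not the engine's actual `WorkQueueSender`. The wrapper is `Send`, implements `Clone` only under `cfg(feature = "multi-producer")`, and the producer owns the sender by value, so dropping it when the thread returns is the shutdown signal that ends the receiver loop.

```rust
use std::sync::mpsc::{self, Receiver, SendError, SyncSender};
use std::thread;

/// Hypothetical stand-in for the engine's work item.
#[derive(Debug, PartialEq)]
pub struct DeltaWork(pub u32);

/// Single-owner sender: `Send` because `SyncSender<DeltaWork>` is `Send`,
/// but deliberately not `Clone` in a default build.
pub struct WorkQueueSender {
    inner: SyncSender<DeltaWork>,
}

/// `Clone` exists only when the (assumed) `multi-producer` feature is enabled.
#[cfg(feature = "multi-producer")]
impl Clone for WorkQueueSender {
    fn clone(&self) -> Self {
        WorkQueueSender { inner: self.inner.clone() }
    }
}

impl WorkQueueSender {
    pub fn send(&self, work: DeltaWork) -> Result<(), SendError<DeltaWork>> {
        self.inner.send(work)
    }
}

/// Bounded queue: one sender handle, one receiver.
pub fn work_queue(capacity: usize) -> (WorkQueueSender, Receiver<DeltaWork>) {
    let (tx, rx) = mpsc::sync_channel(capacity);
    (WorkQueueSender { inner: tx }, rx)
}

/// The producer thread takes ownership of the sender. When the closure
/// returns, `tx` is dropped, and `rx.iter()` ends because no sender remains.
pub fn run_pipeline(items: u32) -> Vec<DeltaWork> {
    let (tx, rx) = work_queue(16);
    let producer = thread::spawn(move || {
        for i in 0..items {
            tx.send(DeltaWork(i)).expect("receiver alive");
        }
        // `tx` is dropped here: this is the shutdown signal.
    });
    let received: Vec<DeltaWork> = rx.iter().collect();
    producer.join().expect("producer panicked");
    received
}
```

With the gate on the `impl Clone`, a default build has no way to mint a second owned sender; adding producers requires enabling the feature.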

Test plan

CI fmt+clippy
CI nextest (stable)
CI Windows, macOS, Linux musl
Pure docs change; no source files modified

Promote the in-source audit at multi_producer_audit.rs into a workspace-level audit with explicit file:line citations for every production producer site and a complete inventory of test-only sites. Findings: all 3 production producer sites correctly use single-producer ownership; zero sites require or pseudo-require multi-producer. Keep WorkQueueSender Send+!Clone by default, keep the multi-producer feature gated, and do not introduce an Arc<WorkQueueSender> primitive.

… WorkQueue (#1573) (#4207) Engages with the #4173 audit conclusion that WorkQueueSender stays single-producer. Shows that I1 (#2196) is the wrong instrument for the first-byte hypothesis: enumeration runs before send_file_list is entered, so I1 excludes it by construction. Recommends deferring pending a W1 benchmark (process start to first inbound flist byte) on multi-root cold-cache workloads.
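A toy model of why I1 cannot see enumeration time. It assumes I1 is armed on entry to send_file_list and stopped at the first flist byte (the note only states that enumeration precedes that entry point). `Timeline`, `Window`, `i1`, `w1`, and `enumeration` are illustrative names, and any timestamps fed to them are hypothetical inputs, not measurements.

```rust
/// Hypothetical event timestamps, in microseconds since process start.
#[derive(Debug, Clone, Copy)]
pub struct Timeline {
    pub process_start: u64,
    pub enumeration_start: u64,
    pub enumeration_end: u64,
    pub send_file_list_entry: u64,
    pub first_flist_byte: u64,
}

/// Half-open interval [start, end).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Window {
    pub start: u64,
    pub end: u64,
}

impl Window {
    pub fn duration(&self) -> u64 {
        self.end - self.start
    }

    /// Length of the intersection of two windows (0 if they do not overlap).
    pub fn overlap(&self, other: Window) -> u64 {
        let lo = self.start.max(other.start);
        let hi = self.end.min(other.end);
        hi.saturating_sub(lo)
    }
}

/// Assumed shape of I1: starts at send_file_list entry.
pub fn i1(t: &Timeline) -> Window {
    Window { start: t.send_file_list_entry, end: t.first_flist_byte }
}

/// W1 as proposed: process start to first inbound flist byte.
pub fn w1(t: &Timeline) -> Window {
    Window { start: t.process_start, end: t.first_flist_byte }
}

pub fn enumeration(t: &Timeline) -> Window {
    Window { start: t.enumeration_start, end: t.enumeration_end }
}
```

Whenever enumeration ends at or before `send_file_list_entry`, `i1(&t).overlap(enumeration(&t))` is zero, while the W1 window spans the whole enumeration phase.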

Design note for the lock-free MPSC variant of WorkQueueReceiver::drain_parallel. Sketches the crossbeam_channel swap, documents the ordering contract delegated to ReorderBuffer, defines the 20% threshold on the #4214 drain_parallel_alternatives bench that gates the migration, and lays out a feature-flag rollout plan. Recommendation is to defer the implementation until the #4214 numbers land on the reference host. Cross-refs #4170, #4173, #4203, #4214.
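One way to picture the ordering contract: workers push `(sequence, result)` pairs into an MPSC channel in whatever order they finish, and a reorder buffer releases results strictly by sequence. This sketch uses `std::sync::mpsc` as a dependency-free stand-in for crossbeam_channel and a hypothetical `BTreeMap`-backed `ReorderBuffer`; the engine's `ReorderBuffer` and `drain_parallel` signatures are not reproduced, and the doubling step stands in for real delta work.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc;
use std::thread;

/// Hypothetical reorder buffer: accepts (seq, value) in any order and
/// releases values strictly in ascending sequence order.
pub struct ReorderBuffer<T> {
    next: u64,
    pending: BTreeMap<u64, T>,
}

impl<T> ReorderBuffer<T> {
    pub fn new() -> Self {
        ReorderBuffer { next: 0, pending: BTreeMap::new() }
    }

    /// Insert one item and return every item that is now ready, in order.
    pub fn push(&mut self, seq: u64, value: T) -> Vec<T> {
        self.pending.insert(seq, value);
        let mut ready = Vec::new();
        while let Some(v) = self.pending.remove(&self.next) {
            ready.push(v);
            self.next += 1;
        }
        ready
    }
}

/// MPSC fan-in: worker `w` handles inputs w, w + workers, ... and tags each
/// result with its input index. Arrival order is arbitrary; the reorder
/// buffer restores input order on the receiving side.
pub fn drain_parallel_mpsc(n: u64, workers: u64) -> Vec<u64> {
    let (tx, rx) = mpsc::channel::<(u64, u64)>();
    let mut handles = Vec::new();
    for w in 0..workers {
        let tx = tx.clone(); // one sender handle per worker
        handles.push(thread::spawn(move || {
            let mut seq = w;
            while seq < n {
                tx.send((seq, seq * 2)).expect("receiver alive"); // simulated compute
                seq += workers;
            }
        }));
    }
    drop(tx); // release the original handle so rx.iter() ends after the workers
    let mut reorder = ReorderBuffer::new();
    let mut out = Vec::with_capacity(n as usize);
    for (seq, value) in rx.iter() {
        out.extend(reorder.push(seq, value));
    }
    for h in handles {
        h.join().expect("worker panicked");
    }
    out
}
```

In this shape the channel never has to preserve order, which is the property that makes the collector swappable behind the reorder buffer.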

…1572) (#4209) Adds `crates/engine/benches/sp_vs_mp_workqueue.rs` with two Criterion groups that each move 100K items through the concurrent delta work queue:

- `sp/1p/100k`: one producer thread pushes 100K pre-allocated DeltaWork items via the default-build `Send + !Clone` WorkQueueSender path.
- `mp/4p/100k`: four producer threads each push 25K pre-allocated items via the gated `Clone` impl on WorkQueueSender. Compiled only when `--features multi-producer` is set.

Both groups report `Throughput::Elements(100_000)`, so items/sec figures compare directly regardless of how the work is split. Inputs are pre-allocated outside the timed section via `iter_batched`, matching the discipline used by the parallel_dispatch_overhead bench. The MP group is gated behind the engine crate's existing `multi-producer` feature, so no Cargo.toml dependency change is required; the gate already exists at `crates/engine/Cargo.toml:91`. The top-of-file documentation cross-references the audit at `docs/audits/workqueue-sender-multi-producer-audit.md` (PR #4173), the #4203 sync_channel bench, and the #4206 parallel_dispatch_overhead bench, and spells out the decision criterion (a >=15% SP-vs-MP delta) that this bench informs.
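The workload split can be sketched without Criterion: one producer pushes all 100,000 items, or four producers push 25,000 each, through a bounded channel, with inputs built before the clock starts. `std::sync::mpsc::sync_channel` and `Instant` stand in for the engine's WorkQueue and Criterion's harness, and the single-producer case moves the original sender instead of cloning it, mirroring the `!Clone` path. The `relative_delta` helper assumes the 15% criterion is measured relative to the SP throughput, which the commit message does not spell out.

```rust
use std::sync::mpsc::{self, SyncSender};
use std::thread;
use std::time::Instant;

/// Items per run, matching the 100K figure in the bench description.
pub const TOTAL: usize = 100_000;
/// Assumed decision threshold: a 15% SP-vs-MP throughput delta.
pub const THRESHOLD: f64 = 0.15;

/// Push TOTAL pre-allocated items through a bounded channel, split evenly
/// across `producers` threads. Returns (items received, elapsed seconds).
pub fn run(producers: usize, capacity: usize) -> (usize, f64) {
    assert!(producers > 0 && TOTAL % producers == 0, "producers must divide TOTAL");
    let per = TOTAL / producers;

    // Build every producer's input before timing (cf. iter_batched).
    let batches: Vec<Vec<u64>> = (0..producers)
        .map(|p| ((p * per) as u64..((p + 1) * per) as u64).collect())
        .collect();

    let (tx, rx) = mpsc::sync_channel::<u64>(capacity);
    // SP: the lone producer takes the original sender (no clone).
    // MP: producers - 1 clones plus the original.
    let mut senders: Vec<SyncSender<u64>> = (1..producers).map(|_| tx.clone()).collect();
    senders.push(tx);

    let start = Instant::now();
    let handles: Vec<_> = batches
        .into_iter()
        .zip(senders)
        .map(|(batch, tx)| {
            thread::spawn(move || {
                for item in batch {
                    tx.send(item).expect("receiver alive");
                }
                // tx dropped here; rx.iter() ends once every sender is gone
            })
        })
        .collect();
    let received = rx.iter().count();
    let elapsed = start.elapsed().as_secs_f64();
    for h in handles {
        h.join().expect("producer panicked");
    }
    (received, elapsed)
}

/// Relative throughput delta, measured against the SP figure (assumption).
pub fn relative_delta(sp_items_per_sec: f64, mp_items_per_sec: f64) -> f64 {
    (mp_items_per_sec - sp_items_per_sec).abs() / sp_items_per_sec
}

pub fn crosses_threshold(sp_items_per_sec: f64, mp_items_per_sec: f64) -> bool {
    relative_delta(sp_items_per_sec, mp_items_per_sec) >= THRESHOLD
}
```

Wall-clock figures from a single run like this are only indicative; the Criterion groups are what provide the repeatable items/sec numbers.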

Adds a drain_parallel_alternatives benchmark comparing three fan-in strategies on the WorkQueueReceiver drain path: the current sharded Mutex<Vec> indexed by rayon thread index, a per-thread Vec with a final concat (no mutex), and a crossbeam_channel MPSC drain. Runs at 10K and 100K items across 4, 8, and 16 rayon workers and reports throughput in elements per second, so reviewers can compare per-iteration cost directly. Pre-allocates DeltaWork items outside the timed section, isolates each worker count in a private rayon pool, and uses the same simulated per-item compute for every strategy, so the only difference between groups is the collector itself. Closes the measurement gap that #1681 needs filled before deciding whether the current sharded Mutex<Vec> warrants replacement (#1682, refs #4170 / #4173 / #4203).
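The three collectors can be contrasted in a few lines. This sketch uses `std::thread::scope` with an explicit worker index as a stand-in for rayon's pool and thread index, and a hypothetical `work` function as the shared per-item compute. It checks only that all three strategies produce the same multiset of results, not their relative cost.

```rust
use std::sync::{mpsc, Mutex};
use std::thread;

/// Hypothetical simulated per-item compute, shared by all strategies.
fn work(x: u64) -> u64 {
    x.wrapping_mul(2654435761) % 1_000
}

/// Worker `w` handles inputs w, w + workers, w + 2 * workers, ...
fn slice(n: u64, workers: usize, w: usize) -> impl Iterator<Item = u64> {
    (w as u64..n).step_by(workers)
}

/// 1. Sharded Mutex<Vec>: one shard per worker, indexed by worker id.
pub fn sharded_mutex(n: u64, workers: usize) -> Vec<u64> {
    let shards: Vec<Mutex<Vec<u64>>> = (0..workers).map(|_| Mutex::new(Vec::new())).collect();
    thread::scope(|s| {
        for w in 0..workers {
            let shards = &shards;
            s.spawn(move || {
                for x in slice(n, workers, w) {
                    shards[w].lock().unwrap().push(work(x)); // lock per item
                }
            });
        }
    });
    shards.into_iter().flat_map(|m| m.into_inner().unwrap()).collect()
}

/// 2. Per-thread Vec: no shared state; results concatenated at the end.
pub fn per_thread_concat(n: u64, workers: usize) -> Vec<u64> {
    thread::scope(|s| {
        let handles: Vec<_> = (0..workers)
            .map(|w| s.spawn(move || slice(n, workers, w).map(work).collect::<Vec<u64>>()))
            .collect();
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}

/// 3. MPSC drain: workers send results into an unbounded channel. Here the
/// caller drains after the scope joins, which cannot deadlock because the
/// channel is unbounded.
pub fn mpsc_drain(n: u64, workers: usize) -> Vec<u64> {
    let (tx, rx) = mpsc::channel();
    thread::scope(|s| {
        for w in 0..workers {
            let tx = tx.clone();
            s.spawn(move || {
                for x in slice(n, workers, w) {
                    tx.send(work(x)).unwrap();
                }
            });
        }
    });
    drop(tx); // all worker clones are gone; rx.into_iter() will terminate
    rx.into_iter().collect()
}
```

Only the sharded variant takes a lock per item; the per-thread variant defers all merging to the final concat, and the channel variant moves items through a queue rather than a shared vector.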