docs(audit): apply_batch_parallel verify-vs-write overlap potential (ABW-1) by oferchen · Pull Request #4670 · oferchen/rsync

oferchen · 2026-05-21T13:40:04Z

Summary

Pure research/audit (no source changes). Investigates whether
ParallelDeltaApplier::apply_batch_parallel should pipeline its parallel
verify phase with its serial write phase (the question raised in
project_apply_batch_write_serial.md).

- Catalogues the current two-phase shape at
  crates/engine/src/concurrent_delta/parallel_apply.rs:515-542: a
  par_iter().collect() barrier between verify and a serial drain that
  acquires the per-file Mutex<FileSlot> per chunk.
- Quantifies wall-clock breakdown across balanced, CPU-bound-verify, and
  I/O-bound-write regimes. Writes dominate in all three because verify
  scales with K workers and writes do not.
- Sketches the bounded-channel + writer-thread pipelined alternative,
  identifies the data dependency (per-file chunk_sequence order, already
  enforced by FileSlot::ingest + ReorderBuffer), and estimates the
  speedup ceiling: ~1.13x balanced, ~1.5x verify-dominated, ~1.03x
  write-dominated.
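
For readers who have not seen the shape before, the bounded-channel + writer-thread alternative can be sketched roughly as below. This is a minimal std-only illustration, not the crate's code: `Chunk`, `verify`, and `apply_pipelined` are hypothetical names, the byte-sum check stands in for the real verification, a `BTreeMap` stands in for the reorder step that FileSlot::ingest + ReorderBuffer perform, and sequences are assumed to run 0..n without gaps.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc::sync_channel;
use std::thread;

/// Hypothetical chunk type for illustration; not the crate's real type.
#[derive(Clone)]
struct Chunk {
    chunk_sequence: u64,
    data: Vec<u8>,
    expected_sum: u32,
}

/// Stand-in for the real verification: a trivial byte sum.
fn verify(chunk: &Chunk) -> bool {
    let sum: u32 = chunk.data.iter().map(|&b| u32::from(b)).sum();
    sum == chunk.expected_sum
}

/// Verifies chunks on `workers` threads while one writer thread appends
/// verified chunks in `chunk_sequence` order (assumed to be 0..n, no gaps).
fn apply_pipelined(
    chunks: Vec<Chunk>,
    workers: usize,
    capacity: usize,
) -> Result<Vec<u8>, String> {
    let workers = workers.max(1);
    // Bounded channel: verifiers block once `capacity` results are queued.
    let (tx, rx) = sync_channel::<Result<Chunk, String>>(capacity);

    thread::scope(|s| {
        // Single writer: reorders by sequence, then "writes" (appends).
        let writer = s.spawn(move || -> Result<Vec<u8>, String> {
            let mut out = Vec::new();
            let mut pending: BTreeMap<u64, Vec<u8>> = BTreeMap::new();
            let mut next = 0u64;
            for msg in rx {
                // Returning early drops `rx`, which makes senders fail and stop.
                let chunk = msg?;
                pending.insert(chunk.chunk_sequence, chunk.data);
                while let Some(data) = pending.remove(&next) {
                    out.extend_from_slice(&data);
                    next += 1;
                }
            }
            if pending.is_empty() {
                Ok(out)
            } else {
                Err("gap or duplicate in chunk sequence".to_string())
            }
        });

        // Verify workers: worker `w` takes chunks w, w + workers, ...
        let chunks = &chunks;
        for w in 0..workers {
            let tx = tx.clone();
            s.spawn(move || {
                for chunk in chunks.iter().skip(w).step_by(workers) {
                    let msg = if verify(chunk) {
                        Ok(chunk.clone())
                    } else {
                        Err(format!("chunk {} failed verification", chunk.chunk_sequence))
                    };
                    if tx.send(msg).is_err() {
                        break; // writer has already stopped
                    }
                }
            });
        }
        drop(tx); // the writer's loop ends once every worker is done
        writer.join().expect("writer thread panicked")
    })
}
```

Even in this toy form the extra moving parts are visible: a channel capacity to tune, a dedicated writer thread, and an error path that has to travel through the channel instead of through a collected `Result`.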

Recommendation

Skip ABW-2/3 unless BR-3i.f
(crates/engine/benches/parallel_verify_chunk.rs) and
parallel_receive_delta_perf show verify and write costs within 2x of
each other on a production-relevant workload. Otherwise the design
complexity (bounded channel, writer thread, error propagation rework, new
test cells) exceeds the gain. Per-file apply_batch_parallel has zero
production callers today (per RJN-1 / PR #4656), so no current path
regresses.
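
One way to read the 2x gate is through a simple overlap model, sketched below. It is an assumption made for illustration, not necessarily the model the audit used: the two-phase shape costs parallel verify time plus write time, an ideal pipeline costs only the slower of the two, and "cost" is taken to mean per-phase wall-clock time. All function names and inputs are hypothetical.

```rust
/// Wall-clock of the two-phase shape: parallel verify, barrier, serial write.
/// `verify_cpu_secs` is total single-threaded verify work (assumed input).
fn two_phase_secs(verify_cpu_secs: f64, write_secs: f64, workers: u32) -> f64 {
    verify_cpu_secs / f64::from(workers) + write_secs
}

/// Idealised pipeline: the slower stage fully hides the faster one.
/// Ignores pipeline fill/drain and channel overhead, so it is an upper bound.
fn pipelined_secs(verify_cpu_secs: f64, write_secs: f64, workers: u32) -> f64 {
    (verify_cpu_secs / f64::from(workers)).max(write_secs)
}

/// Speedup ceiling of pipelined over two-phase under this model.
fn speedup_ceiling(verify_cpu_secs: f64, write_secs: f64, workers: u32) -> f64 {
    two_phase_secs(verify_cpu_secs, write_secs, workers)
        / pipelined_secs(verify_cpu_secs, write_secs, workers)
}

/// The decision gate as read here: the two phase wall-clocks are within 2x.
fn within_2x(verify_wall_secs: f64, write_wall_secs: f64) -> bool {
    let hi = verify_wall_secs.max(write_wall_secs);
    let lo = verify_wall_secs.min(write_wall_secs);
    hi <= 2.0 * lo
}
```

Under these assumptions the ceiling is 1 plus the ratio of the shorter phase to the longer one, so it can never exceed 2x, and two phases within 2x of each other correspond to a ceiling of at least 1.5x. A parallel verify phase that takes about 13% of the write time yields roughly 1.13x, which falls outside the gate.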

Test plan

Audit reviewed; recommendation actioned via ABW-2 design doc or by
closing the line of work with a note on
project_apply_batch_write_serial.md.

…ABW-1)

Catalogues the current `apply_batch_parallel` two-phase shape (parallel verify + serial drain), quantifies the wall-clock breakdown across balanced/CPU-bound/I/O-bound regimes, sketches the bounded-channel pipelined alternative, and recommends gating ABW-2/3 on BR-3i.f bench evidence showing verify and write costs within 2x of each other.

Findings:
- Today's shape has zero production callers (per RJN-1 audit); promotion is the open question.
- For single-file batches the per-file Mutex serialises every write; pipelining buys nothing.
- For balanced workloads the expected speedup is ~1.13x; for verify-dominated workloads ~1.5x; for write-dominated workloads ~1.03x.
- Only the middle regime justifies the design complexity (bounded channel, writer thread, error propagation rework).

Recommendation: skip ABW-2/3 unless measurement places verify and write within 2x; otherwise close the line as investigated.

Discharges the ABW-1 audit's recommendation (PR #4670, section 4) to skip the pipelined verify/write design for apply_batch_parallel until bench evidence shows verify and write costs within 2x of each other on a production-relevant workload cell.

- Recaps the ABW-1 quantified speedup table (1.13x balanced, 1.50x CPU-bound, 1.03x I/O-bound, ~0x single-file).
- Documents why deferring the design (not just the implementation) is the right call: peak benefit is workload-dependent, complexity-to-payoff is poor in the measured cells, and the PIP-3+5 dispatch heuristic (PR #4666) already gates the degenerate single-file case out of parallel-receive-delta.
- Names BR-3j.f (#2508) as the gating re-bench task and lifts the audit's decision gate verbatim.
- Preserves the option: per-file Mutex is the real bottleneck; a future multi-threaded-per-file writer or a CPU-bound verify regime would re-open ABW-2.

…-2 rename (#4676)

RJN-2 (PR #4660, merged) chose the rename path over the fanout-refactor path, discharging the RJN-1 audit (PR #4656, merged) with apply_chunk_parallel -> apply_one_chunk plus a rustdoc redirect to apply_batch_parallel. RJN-3 (implement fanout) and RJN-4 (bench scheduler shape) were the "if RJN-2 chose refactor" branch that did not get taken; this doc closes both.

- RJN-3 stays closed: zero production callers of apply_one_chunk; the real multi-chunk win sits in apply_batch_parallel, where ABW-1 (PR #4670) already recommended deferring per its quantified 1.03x-1.50x speedup range; the ABW-2/3/4 closure doc defers that track pending BR-3j.f (#2508) bench data.
- RJN-4 is N/A: with RJN-3 deferred there is no "after" cell to measure; the production-weighted scheduler-shape bench effort belongs in parallel_receive_delta_perf via BR-3j.f, not at the per-chunk entry point.
- Re-open conditions: a production caller of apply_one_chunk ships AND profiling shows the per-chunk path is hot.

Project memory references project_rayon_join_per_chunk_noop.md and project_apply_batch_write_serial.md; both observations remain accurate under the renamed function.

github-actions Bot added the documentation Improvements or additions to documentation label May 21, 2026

oferchen merged commit e9d9872 into master May 21, 2026
10 checks passed

oferchen deleted the docs/audit-abw-1-verify-write-overlap branch May 21, 2026 13:42

oferchen mentioned this pull request May 21, 2026

docs(design): defer ABW-2/3/4 pending BR-3j.f bench evidence (ABW-1 audit closure) #4673

Merged

2 tasks

oferchen mentioned this pull request May 21, 2026

docs(design): defer RJN-3 (fanout) and RJN-4 (bench) as N/A after RJN-2 rename #4676

Merged

3 tasks

oferchen merged 1 commit into master from docs/audit-abw-1-verify-write-overlap
