feat(engine): SpillableReorderBuffer wiring + parallel receive delta apply (#1884, #1368) by oferchen · Pull Request #4319 · oferchen/rsync

oferchen · 2026-05-17T21:45:51Z

Summary

Wires the existing SpillableReorderBuffer into DeltaConsumer behind a new opt-in ConcurrentDeltaConfig { spill_threshold_bytes, spill_dir } and exposes spill activity via DeltaConsumer::stats() -> DeltaConsumerStats { spill_events }. Default path (spawn, spawn_bypass, spawn_with_config(.., ConcurrentDeltaConfig::default())) is bit-for-bit identical to the prior behaviour; the existing histogram/metrics machinery on the bare path is preserved verbatim.
Lands the parallel receive-side delta apply scaffold behind a new parallel-receive-delta cargo feature on the engine crate (forwarded from transfer). engine::concurrent_delta::parallel_apply introduces DeltaChunk and ParallelDeltaApplier with per-file Mutex write serialization and per-file ReorderBuffer chunk replay. ReceiverContext::enable_parallel_receive_delta installs the existing ParallelDeltaPipeline only when the feature is compiled in.
crates/engine/src/concurrent_delta/mod.rs re-exports the union of ConcurrentDeltaConfig, DeltaConsumer, DeltaConsumerStats, HistogramStats, ReorderMetrics, ReorderBuffer, and (feature-gated) DeltaChunk, ParallelDeltaApplier.

Replaces the conflict-stalled PRs #4299 and #4300 with a single combined change on top of current master. The wire protocol is unchanged - the spill layer is strictly receiver-local memory management; the parallel apply path is opt-in for benchmarking the rollout in docs/design/parallel-receive-delta-application.md.

Test plan

…apply Wires the existing SpillableReorderBuffer into DeltaConsumer behind a new opt-in ConcurrentDeltaConfig, and lands the parallel receive-side delta apply scaffold behind the parallel-receive-delta cargo feature. SpillableReorderBuffer wiring (#1884) - New ConcurrentDeltaConfig { spill_threshold_bytes, spill_dir } selects between the bare ReorderBuffer (default, behaviour unchanged) and the bounded-memory SpillableReorderBuffer when a threshold is supplied. - DeltaConsumer::spawn_with_config dispatches via a ReorderMode enum so spawn / spawn_bypass / spawn_with_config share one inner loop entry. - DeltaConsumerStats surfaces the cumulative spill_events counter via a lock-free AtomicU64 published by the reorder thread. - Spill backend construction or I/O failures map to DeltaResult::failed for the offending sequence so the receiver maps to upstream exit code 11 (FileIo) and aborts. Existing histogram/metrics machinery on the bare path is preserved verbatim. Parallel receive delta apply (#1368) - New parallel-receive-delta feature on engine (forwarded from transfer). Default off so production receivers continue to drive the sequential apply loop in receiver/transfer.rs. - engine::concurrent_delta::parallel_apply adds DeltaChunk and ParallelDeltaApplier. Per-file Mutex serialises destination writes, per-file ReorderBuffer replays chunks in submission order, and rayon::join / par_iter fans the verify step across the rayon pool while keeping per-file byte order exact. - ReceiverContext::enable_parallel_receive_delta installs the existing ParallelDeltaPipeline only when the feature is compiled in, leaving the default receiver loop untouched. Re-exports the union of ConcurrentDeltaConfig, DeltaConsumerStats, and (feature-gated) DeltaChunk / ParallelDeltaApplier from crates/engine/src/concurrent_delta/mod.rs alongside the existing HistogramStats, ReorderMetrics, and ReorderBuffer surface. Tests - spillable_consumer_preserves_order_under_pressure drives 1000 items through a 1 KiB budget with a deliberately delayed head-of-line item and asserts both in-order delivery and spill_events > 0. - spillable_consumer_matches_bare_output_byte_for_byte compares spill vs bare paths via SpillCodec encoding. - spawn_with_config_off_matches_spawn and stats_zero_when_spill_disabled pin the default-off invariants. - parallel_apply: in-order, shuffled, and batched byte-equality tests plus a proptest over random chunk sizes / deterministic permutations. Replaces the conflict-stalled PRs #4299 and #4300 with a single combined change on top of current master. Closes #1884 Closes #1368

Add criterion bench and design doc to inform the default-on decision for the parallel-receive-delta feature (#1368 followup, PR #4319 scaffold). The bench drives ParallelDeltaApplier vs a sequential baseline across three workload classes: small_files (10000 x 4 KiB, 50/50 delta/whole), mixed (1000 files, 4 KiB - 4 MiB, 50/50), and large_files (10 x 256 MiB, all delta). Both cells write to in-memory sinks so the comparison isolates apply-loop scheduling cost from disk I/O. The doc lays out five promotion criteria (small_files >= 10% wall-clock win at 4+ threads, zero wire-format divergence, no single-workload regression > 5%, one release cycle of opt-in soak, two consecutive nightly green runs) and three promotion paths (default-features flip, runtime auto-detect on file_count/total_size, per-workload CLI flag), with Path B as the recommended default unless the bench shows parallel wins universally.

…apply (#4319) Wires the existing SpillableReorderBuffer into DeltaConsumer behind a new opt-in ConcurrentDeltaConfig, and lands the parallel receive-side delta apply scaffold behind the parallel-receive-delta cargo feature. SpillableReorderBuffer wiring (#1884) - New ConcurrentDeltaConfig { spill_threshold_bytes, spill_dir } selects between the bare ReorderBuffer (default, behaviour unchanged) and the bounded-memory SpillableReorderBuffer when a threshold is supplied. - DeltaConsumer::spawn_with_config dispatches via a ReorderMode enum so spawn / spawn_bypass / spawn_with_config share one inner loop entry. - DeltaConsumerStats surfaces the cumulative spill_events counter via a lock-free AtomicU64 published by the reorder thread. - Spill backend construction or I/O failures map to DeltaResult::failed for the offending sequence so the receiver maps to upstream exit code 11 (FileIo) and aborts. Existing histogram/metrics machinery on the bare path is preserved verbatim. Parallel receive delta apply (#1368) - New parallel-receive-delta feature on engine (forwarded from transfer). Default off so production receivers continue to drive the sequential apply loop in receiver/transfer.rs. - engine::concurrent_delta::parallel_apply adds DeltaChunk and ParallelDeltaApplier. Per-file Mutex serialises destination writes, per-file ReorderBuffer replays chunks in submission order, and rayon::join / par_iter fans the verify step across the rayon pool while keeping per-file byte order exact. - ReceiverContext::enable_parallel_receive_delta installs the existing ParallelDeltaPipeline only when the feature is compiled in, leaving the default receiver loop untouched. Re-exports the union of ConcurrentDeltaConfig, DeltaConsumerStats, and (feature-gated) DeltaChunk / ParallelDeltaApplier from crates/engine/src/concurrent_delta/mod.rs alongside the existing HistogramStats, ReorderMetrics, and ReorderBuffer surface. Tests - spillable_consumer_preserves_order_under_pressure drives 1000 items through a 1 KiB budget with a deliberately delayed head-of-line item and asserts both in-order delivery and spill_events > 0. - spillable_consumer_matches_bare_output_byte_for_byte compares spill vs bare paths via SpillCodec encoding. - spawn_with_config_off_matches_spawn and stats_zero_when_spill_disabled pin the default-off invariants. - parallel_apply: in-order, shuffled, and batched byte-equality tests plus a proptest over random chunk sizes / deterministic permutations. Replaces the conflict-stalled PRs #4299 and #4300 with a single combined change on top of current master. Closes #1884 Closes #1368

Add criterion bench and design doc to inform the default-on decision for the parallel-receive-delta feature (#1368 followup, PR #4319 scaffold). The bench drives ParallelDeltaApplier vs a sequential baseline across three workload classes: small_files (10000 x 4 KiB, 50/50 delta/whole), mixed (1000 files, 4 KiB - 4 MiB, 50/50), and large_files (10 x 256 MiB, all delta). Both cells write to in-memory sinks so the comparison isolates apply-loop scheduling cost from disk I/O. The doc lays out five promotion criteria (small_files >= 10% wall-clock win at 4+ threads, zero wire-format divergence, no single-workload regression > 5%, one release cycle of opt-in soak, two consecutive nightly green runs) and three promotion paths (default-features flip, runtime auto-detect on file_count/total_size, per-workload CLI flag), with Path B as the recommended default unless the bench shows parallel wins universally.

…apply (#4319) Wires the existing SpillableReorderBuffer into DeltaConsumer behind a new opt-in ConcurrentDeltaConfig, and lands the parallel receive-side delta apply scaffold behind the parallel-receive-delta cargo feature. SpillableReorderBuffer wiring (#1884) - New ConcurrentDeltaConfig { spill_threshold_bytes, spill_dir } selects between the bare ReorderBuffer (default, behaviour unchanged) and the bounded-memory SpillableReorderBuffer when a threshold is supplied. - DeltaConsumer::spawn_with_config dispatches via a ReorderMode enum so spawn / spawn_bypass / spawn_with_config share one inner loop entry. - DeltaConsumerStats surfaces the cumulative spill_events counter via a lock-free AtomicU64 published by the reorder thread. - Spill backend construction or I/O failures map to DeltaResult::failed for the offending sequence so the receiver maps to upstream exit code 11 (FileIo) and aborts. Existing histogram/metrics machinery on the bare path is preserved verbatim. Parallel receive delta apply (#1368) - New parallel-receive-delta feature on engine (forwarded from transfer). Default off so production receivers continue to drive the sequential apply loop in receiver/transfer.rs. - engine::concurrent_delta::parallel_apply adds DeltaChunk and ParallelDeltaApplier. Per-file Mutex serialises destination writes, per-file ReorderBuffer replays chunks in submission order, and rayon::join / par_iter fans the verify step across the rayon pool while keeping per-file byte order exact. - ReceiverContext::enable_parallel_receive_delta installs the existing ParallelDeltaPipeline only when the feature is compiled in, leaving the default receiver loop untouched. Re-exports the union of ConcurrentDeltaConfig, DeltaConsumerStats, and (feature-gated) DeltaChunk / ParallelDeltaApplier from crates/engine/src/concurrent_delta/mod.rs alongside the existing HistogramStats, ReorderMetrics, and ReorderBuffer surface. Tests - spillable_consumer_preserves_order_under_pressure drives 1000 items through a 1 KiB budget with a deliberately delayed head-of-line item and asserts both in-order delivery and spill_events > 0. - spillable_consumer_matches_bare_output_byte_for_byte compares spill vs bare paths via SpillCodec encoding. - spawn_with_config_off_matches_spawn and stats_zero_when_spill_disabled pin the default-off invariants. - parallel_apply: in-order, shuffled, and batched byte-equality tests plus a proptest over random chunk sizes / deterministic permutations. Replaces the conflict-stalled PRs #4299 and #4300 with a single combined change on top of current master. Closes #1884 Closes #1368

Add criterion bench and design doc to inform the default-on decision for the parallel-receive-delta feature (#1368 followup, PR #4319 scaffold). The bench drives ParallelDeltaApplier vs a sequential baseline across three workload classes: small_files (10000 x 4 KiB, 50/50 delta/whole), mixed (1000 files, 4 KiB - 4 MiB, 50/50), and large_files (10 x 256 MiB, all delta). Both cells write to in-memory sinks so the comparison isolates apply-loop scheduling cost from disk I/O. The doc lays out five promotion criteria (small_files >= 10% wall-clock win at 4+ threads, zero wire-format divergence, no single-workload regression > 5%, one release cycle of opt-in soak, two consecutive nightly green runs) and three promotion paths (default-features flip, runtime auto-detect on file_count/total_size, per-workload CLI flag), with Path B as the recommended default unless the bench shows parallel wins universally.

…euristic (PIP-3 + PIP-5) (#4666) * perf(transfer): enable parallel receive-delta by default via Path B heuristic Wires the receiver-side parallel delta apply path into production per `docs/design/parallel-receive-delta-default-on.md` Path B, combining steps 4 and 5 in a single change. - Add `PARALLEL_RECEIVE_FILE_COUNT_THRESHOLD = 100` and `PARALLEL_RECEIVE_BYTES_THRESHOLD = 64 MiB` on the receiver. Thresholds match the existing rayon parallel-stat cutoff convention and the `copy_file_range` 64 MiB crossover. - Add `ReceiverContext::select_receiver_strategy(file_count, total_size)` (pure heuristic), `total_source_bytes()` (sums the in-memory file list), and `dispatch_receiver_strategy()` (logs the decision under the `GENR` debug channel and swaps the delta pipeline when the `parallel-receive-delta` feature is on). - Call `dispatch_receiver_strategy` at the top of `run_sync`, `run_pipelined`, and `run_pipelined_incremental`, immediately after `setup_transfer` returns the file count and the file list is in memory. - Surface the choice as `TransferStats::receiver_strategy_chosen` via the new `ReceiverStrategy { Sequential, Parallel }` enum (`Sequential` by default). - Promote `parallel-receive-delta` into the `default = [...]` set on `engine`, `transfer`, `core`, `cli`, and the workspace binary so the shipped `oc-rsync` picks up the dispatch with no opt-in flag. Stay compatible with `--no-default-features` builds: when the feature is compiled out the dispatcher logs `receiver_strategy=parallel_unavailable` and falls back to sequential so the telemetry counter never lies about the path actually taken. Tests: - Unit coverage for the heuristic boundary matrix (below/above each threshold, exact-boundary, empty transfer). - File-list-integration coverage for `total_source_bytes` and the full `dispatch_receiver_strategy` flow with populated file lists. Wire-format parity stays guarded by the existing proptests (#4300 + #4319). The soak + bench gates (criteria 1, 3, 4, 5 in section 5) are explicit risk acceptance; PIP-4 (interop suite re-run) and PIP-6 (bench backfill) remain as follow-ups. Refs #1368, #2566, #2568. * style(transfer): apply rustfmt to PIP-3+5 receiver-strategy code CI fmt+clippy diff: collapsed select_receiver_strategy multi-line signature and the FileEntry::new_file argument list in dispatch_large_bytes_picks_parallel onto single lines. Behaviour-neutral. * fix(transfer): drop identity-op multiplier in receiver-strategy tests clippy::identity-op fired on `1 * 1024 * 1024`. Collapsed to `1024 * 1024` in the two boundary-matrix tests.

…PIP-3+5 wire-up (#4677) Combined closure note covering three trackers whose deliverables have already shipped or were absorbed into existing design surface: - PIP-2 (#2565): the design doc was supposed to describe the token_loop -> ParallelDeltaApplier migration. That document already exists as docs/design/parallel-receive-delta-default-on.md (shipped in PR #4319 scaffold, extended by PR #4666 default-on flip). Marked completed retroactively. - FFB-3 (#2576): conditional task that only activated if FFB-1 chose Option A or B. FFB-1 (PR #4659) chose Option D - barrier baked into finish_file - so every existing caller already gets the barrier without callsite migration. Marked deferred N-A. - FFB-4 (#2577): explicit flush_workers call at the PIP-3 wire-up site. PIP-3 + PIP-5 (PR #4666 merged) routes through finish_file, which already barriers via Option D. No explicit call needed. Marked satisfied-by-design. Note: FFB-2 implementation (PR #4665, in CI) needed a spin-then-yield fix around finish_file's try_unwrap to close a Windows release-race; future work to restructure DecrementGuard would be FFB-5+ if filed. No source changes.

oferchen merged commit 82fd842 into master May 17, 2026
4 of 15 checks passed

oferchen mentioned this pull request May 17, 2026

bench(engine): parallel-receive-delta perf for default-on decision (#1368 followup) #4330

Merged

5 tasks

github-actions Bot added the enhancement New feature or request label May 17, 2026

oferchen deleted the feat/spill-and-parallel-recv-delta-redo branch May 19, 2026 19:29

oferchen mentioned this pull request May 21, 2026

perf(transfer): enable parallel receive-delta by default via Path B heuristic (PIP-3 + PIP-5) #4666

Merged

5 tasks

oferchen mentioned this pull request May 21, 2026

docs(design): close FFB-3/FFB-4/PIP-2 as satisfied by FFB-1 design + PIP-3+5 wire-up #4677

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(engine): SpillableReorderBuffer wiring + parallel receive delta apply (#1884, #1368)#4319

feat(engine): SpillableReorderBuffer wiring + parallel receive delta apply (#1884, #1368)#4319
oferchen merged 1 commit into
masterfrom
feat/spill-and-parallel-recv-delta-redo

oferchen commented May 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

oferchen commented May 17, 2026

Summary

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant