feat(engine): SpillableReorderBuffer wiring + parallel receive delta apply (#1884, #1368)#4319
Merged
Merged
Conversation
…apply Wires the existing SpillableReorderBuffer into DeltaConsumer behind a new opt-in ConcurrentDeltaConfig, and lands the parallel receive-side delta apply scaffold behind the parallel-receive-delta cargo feature. SpillableReorderBuffer wiring (#1884) - New ConcurrentDeltaConfig { spill_threshold_bytes, spill_dir } selects between the bare ReorderBuffer (default, behaviour unchanged) and the bounded-memory SpillableReorderBuffer when a threshold is supplied. - DeltaConsumer::spawn_with_config dispatches via a ReorderMode enum so spawn / spawn_bypass / spawn_with_config share one inner loop entry. - DeltaConsumerStats surfaces the cumulative spill_events counter via a lock-free AtomicU64 published by the reorder thread. - Spill backend construction or I/O failures map to DeltaResult::failed for the offending sequence so the receiver maps to upstream exit code 11 (FileIo) and aborts. Existing histogram/metrics machinery on the bare path is preserved verbatim. Parallel receive delta apply (#1368) - New parallel-receive-delta feature on engine (forwarded from transfer). Default off so production receivers continue to drive the sequential apply loop in receiver/transfer.rs. - engine::concurrent_delta::parallel_apply adds DeltaChunk and ParallelDeltaApplier. Per-file Mutex serialises destination writes, per-file ReorderBuffer replays chunks in submission order, and rayon::join / par_iter fans the verify step across the rayon pool while keeping per-file byte order exact. - ReceiverContext::enable_parallel_receive_delta installs the existing ParallelDeltaPipeline only when the feature is compiled in, leaving the default receiver loop untouched. Re-exports the union of ConcurrentDeltaConfig, DeltaConsumerStats, and (feature-gated) DeltaChunk / ParallelDeltaApplier from crates/engine/src/concurrent_delta/mod.rs alongside the existing HistogramStats, ReorderMetrics, and ReorderBuffer surface. Tests - spillable_consumer_preserves_order_under_pressure drives 1000 items through a 1 KiB budget with a deliberately delayed head-of-line item and asserts both in-order delivery and spill_events > 0. - spillable_consumer_matches_bare_output_byte_for_byte compares spill vs bare paths via SpillCodec encoding. - spawn_with_config_off_matches_spawn and stats_zero_when_spill_disabled pin the default-off invariants. - parallel_apply: in-order, shuffled, and batched byte-equality tests plus a proptest over random chunk sizes / deterministic permutations. Replaces the conflict-stalled PRs #4299 and #4300 with a single combined change on top of current master. Closes #1884 Closes #1368
5 tasks
oferchen
added a commit
that referenced
this pull request
May 17, 2026
Add criterion bench and design doc to inform the default-on decision for the parallel-receive-delta feature (#1368 followup, PR #4319 scaffold). The bench drives ParallelDeltaApplier vs a sequential baseline across three workload classes: small_files (10000 x 4 KiB, 50/50 delta/whole), mixed (1000 files, 4 KiB - 4 MiB, 50/50), and large_files (10 x 256 MiB, all delta). Both cells write to in-memory sinks so the comparison isolates apply-loop scheduling cost from disk I/O. The doc lays out five promotion criteria (small_files >= 10% wall-clock win at 4+ threads, zero wire-format divergence, no single-workload regression > 5%, one release cycle of opt-in soak, two consecutive nightly green runs) and three promotion paths (default-features flip, runtime auto-detect on file_count/total_size, per-workload CLI flag), with Path B as the recommended default unless the bench shows parallel wins universally.
oferchen
added a commit
that referenced
this pull request
May 18, 2026
…apply (#4319) Wires the existing SpillableReorderBuffer into DeltaConsumer behind a new opt-in ConcurrentDeltaConfig, and lands the parallel receive-side delta apply scaffold behind the parallel-receive-delta cargo feature. SpillableReorderBuffer wiring (#1884) - New ConcurrentDeltaConfig { spill_threshold_bytes, spill_dir } selects between the bare ReorderBuffer (default, behaviour unchanged) and the bounded-memory SpillableReorderBuffer when a threshold is supplied. - DeltaConsumer::spawn_with_config dispatches via a ReorderMode enum so spawn / spawn_bypass / spawn_with_config share one inner loop entry. - DeltaConsumerStats surfaces the cumulative spill_events counter via a lock-free AtomicU64 published by the reorder thread. - Spill backend construction or I/O failures map to DeltaResult::failed for the offending sequence so the receiver maps to upstream exit code 11 (FileIo) and aborts. Existing histogram/metrics machinery on the bare path is preserved verbatim. Parallel receive delta apply (#1368) - New parallel-receive-delta feature on engine (forwarded from transfer). Default off so production receivers continue to drive the sequential apply loop in receiver/transfer.rs. - engine::concurrent_delta::parallel_apply adds DeltaChunk and ParallelDeltaApplier. Per-file Mutex serialises destination writes, per-file ReorderBuffer replays chunks in submission order, and rayon::join / par_iter fans the verify step across the rayon pool while keeping per-file byte order exact. - ReceiverContext::enable_parallel_receive_delta installs the existing ParallelDeltaPipeline only when the feature is compiled in, leaving the default receiver loop untouched. Re-exports the union of ConcurrentDeltaConfig, DeltaConsumerStats, and (feature-gated) DeltaChunk / ParallelDeltaApplier from crates/engine/src/concurrent_delta/mod.rs alongside the existing HistogramStats, ReorderMetrics, and ReorderBuffer surface. Tests - spillable_consumer_preserves_order_under_pressure drives 1000 items through a 1 KiB budget with a deliberately delayed head-of-line item and asserts both in-order delivery and spill_events > 0. - spillable_consumer_matches_bare_output_byte_for_byte compares spill vs bare paths via SpillCodec encoding. - spawn_with_config_off_matches_spawn and stats_zero_when_spill_disabled pin the default-off invariants. - parallel_apply: in-order, shuffled, and batched byte-equality tests plus a proptest over random chunk sizes / deterministic permutations. Replaces the conflict-stalled PRs #4299 and #4300 with a single combined change on top of current master. Closes #1884 Closes #1368
oferchen
added a commit
that referenced
this pull request
May 18, 2026
Add criterion bench and design doc to inform the default-on decision for the parallel-receive-delta feature (#1368 followup, PR #4319 scaffold). The bench drives ParallelDeltaApplier vs a sequential baseline across three workload classes: small_files (10000 x 4 KiB, 50/50 delta/whole), mixed (1000 files, 4 KiB - 4 MiB, 50/50), and large_files (10 x 256 MiB, all delta). Both cells write to in-memory sinks so the comparison isolates apply-loop scheduling cost from disk I/O. The doc lays out five promotion criteria (small_files >= 10% wall-clock win at 4+ threads, zero wire-format divergence, no single-workload regression > 5%, one release cycle of opt-in soak, two consecutive nightly green runs) and three promotion paths (default-features flip, runtime auto-detect on file_count/total_size, per-workload CLI flag), with Path B as the recommended default unless the bench shows parallel wins universally.
oferchen
added a commit
that referenced
this pull request
May 18, 2026
…apply (#4319) Wires the existing SpillableReorderBuffer into DeltaConsumer behind a new opt-in ConcurrentDeltaConfig, and lands the parallel receive-side delta apply scaffold behind the parallel-receive-delta cargo feature. SpillableReorderBuffer wiring (#1884) - New ConcurrentDeltaConfig { spill_threshold_bytes, spill_dir } selects between the bare ReorderBuffer (default, behaviour unchanged) and the bounded-memory SpillableReorderBuffer when a threshold is supplied. - DeltaConsumer::spawn_with_config dispatches via a ReorderMode enum so spawn / spawn_bypass / spawn_with_config share one inner loop entry. - DeltaConsumerStats surfaces the cumulative spill_events counter via a lock-free AtomicU64 published by the reorder thread. - Spill backend construction or I/O failures map to DeltaResult::failed for the offending sequence so the receiver maps to upstream exit code 11 (FileIo) and aborts. Existing histogram/metrics machinery on the bare path is preserved verbatim. Parallel receive delta apply (#1368) - New parallel-receive-delta feature on engine (forwarded from transfer). Default off so production receivers continue to drive the sequential apply loop in receiver/transfer.rs. - engine::concurrent_delta::parallel_apply adds DeltaChunk and ParallelDeltaApplier. Per-file Mutex serialises destination writes, per-file ReorderBuffer replays chunks in submission order, and rayon::join / par_iter fans the verify step across the rayon pool while keeping per-file byte order exact. - ReceiverContext::enable_parallel_receive_delta installs the existing ParallelDeltaPipeline only when the feature is compiled in, leaving the default receiver loop untouched. Re-exports the union of ConcurrentDeltaConfig, DeltaConsumerStats, and (feature-gated) DeltaChunk / ParallelDeltaApplier from crates/engine/src/concurrent_delta/mod.rs alongside the existing HistogramStats, ReorderMetrics, and ReorderBuffer surface. Tests - spillable_consumer_preserves_order_under_pressure drives 1000 items through a 1 KiB budget with a deliberately delayed head-of-line item and asserts both in-order delivery and spill_events > 0. - spillable_consumer_matches_bare_output_byte_for_byte compares spill vs bare paths via SpillCodec encoding. - spawn_with_config_off_matches_spawn and stats_zero_when_spill_disabled pin the default-off invariants. - parallel_apply: in-order, shuffled, and batched byte-equality tests plus a proptest over random chunk sizes / deterministic permutations. Replaces the conflict-stalled PRs #4299 and #4300 with a single combined change on top of current master. Closes #1884 Closes #1368
oferchen
added a commit
that referenced
this pull request
May 18, 2026
Add criterion bench and design doc to inform the default-on decision for the parallel-receive-delta feature (#1368 followup, PR #4319 scaffold). The bench drives ParallelDeltaApplier vs a sequential baseline across three workload classes: small_files (10000 x 4 KiB, 50/50 delta/whole), mixed (1000 files, 4 KiB - 4 MiB, 50/50), and large_files (10 x 256 MiB, all delta). Both cells write to in-memory sinks so the comparison isolates apply-loop scheduling cost from disk I/O. The doc lays out five promotion criteria (small_files >= 10% wall-clock win at 4+ threads, zero wire-format divergence, no single-workload regression > 5%, one release cycle of opt-in soak, two consecutive nightly green runs) and three promotion paths (default-features flip, runtime auto-detect on file_count/total_size, per-workload CLI flag), with Path B as the recommended default unless the bench shows parallel wins universally.
Merged
5 tasks
oferchen
added a commit
that referenced
this pull request
May 21, 2026
…euristic (PIP-3 + PIP-5) (#4666) * perf(transfer): enable parallel receive-delta by default via Path B heuristic Wires the receiver-side parallel delta apply path into production per `docs/design/parallel-receive-delta-default-on.md` Path B, combining steps 4 and 5 in a single change. - Add `PARALLEL_RECEIVE_FILE_COUNT_THRESHOLD = 100` and `PARALLEL_RECEIVE_BYTES_THRESHOLD = 64 MiB` on the receiver. Thresholds match the existing rayon parallel-stat cutoff convention and the `copy_file_range` 64 MiB crossover. - Add `ReceiverContext::select_receiver_strategy(file_count, total_size)` (pure heuristic), `total_source_bytes()` (sums the in-memory file list), and `dispatch_receiver_strategy()` (logs the decision under the `GENR` debug channel and swaps the delta pipeline when the `parallel-receive-delta` feature is on). - Call `dispatch_receiver_strategy` at the top of `run_sync`, `run_pipelined`, and `run_pipelined_incremental`, immediately after `setup_transfer` returns the file count and the file list is in memory. - Surface the choice as `TransferStats::receiver_strategy_chosen` via the new `ReceiverStrategy { Sequential, Parallel }` enum (`Sequential` by default). - Promote `parallel-receive-delta` into the `default = [...]` set on `engine`, `transfer`, `core`, `cli`, and the workspace binary so the shipped `oc-rsync` picks up the dispatch with no opt-in flag. Stay compatible with `--no-default-features` builds: when the feature is compiled out the dispatcher logs `receiver_strategy=parallel_unavailable` and falls back to sequential so the telemetry counter never lies about the path actually taken. Tests: - Unit coverage for the heuristic boundary matrix (below/above each threshold, exact-boundary, empty transfer). - File-list-integration coverage for `total_source_bytes` and the full `dispatch_receiver_strategy` flow with populated file lists. Wire-format parity stays guarded by the existing proptests (#4300 + #4319). The soak + bench gates (criteria 1, 3, 4, 5 in section 5) are explicit risk acceptance; PIP-4 (interop suite re-run) and PIP-6 (bench backfill) remain as follow-ups. Refs #1368, #2566, #2568. * style(transfer): apply rustfmt to PIP-3+5 receiver-strategy code CI fmt+clippy diff: collapsed select_receiver_strategy multi-line signature and the FileEntry::new_file argument list in dispatch_large_bytes_picks_parallel onto single lines. Behaviour-neutral. * fix(transfer): drop identity-op multiplier in receiver-strategy tests clippy::identity-op fired on `1 * 1024 * 1024`. Collapsed to `1024 * 1024` in the two boundary-matrix tests.
2 tasks
oferchen
added a commit
that referenced
this pull request
May 21, 2026
…PIP-3+5 wire-up (#4677) Combined closure note covering three trackers whose deliverables have already shipped or were absorbed into existing design surface: - PIP-2 (#2565): the design doc was supposed to describe the token_loop -> ParallelDeltaApplier migration. That document already exists as docs/design/parallel-receive-delta-default-on.md (shipped in PR #4319 scaffold, extended by PR #4666 default-on flip). Marked completed retroactively. - FFB-3 (#2576): conditional task that only activated if FFB-1 chose Option A or B. FFB-1 (PR #4659) chose Option D - barrier baked into finish_file - so every existing caller already gets the barrier without callsite migration. Marked deferred N-A. - FFB-4 (#2577): explicit flush_workers call at the PIP-3 wire-up site. PIP-3 + PIP-5 (PR #4666 merged) routes through finish_file, which already barriers via Option D. No explicit call needed. Marked satisfied-by-design. Note: FFB-2 implementation (PR #4665, in CI) needed a spin-then-yield fix around finish_file's try_unwrap to close a Windows release-race; future work to restructure DecrementGuard would be FFB-5+ if filed. No source changes.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
SpillableReorderBufferintoDeltaConsumerbehind a new opt-inConcurrentDeltaConfig { spill_threshold_bytes, spill_dir }and exposes spill activity viaDeltaConsumer::stats() -> DeltaConsumerStats { spill_events }. Default path (spawn,spawn_bypass,spawn_with_config(.., ConcurrentDeltaConfig::default())) is bit-for-bit identical to the prior behaviour; the existing histogram/metrics machinery on the bare path is preserved verbatim.parallel-receive-deltacargo feature on theenginecrate (forwarded fromtransfer).engine::concurrent_delta::parallel_applyintroducesDeltaChunkandParallelDeltaApplierwith per-fileMutexwrite serialization and per-fileReorderBufferchunk replay.ReceiverContext::enable_parallel_receive_deltainstalls the existingParallelDeltaPipelineonly when the feature is compiled in.crates/engine/src/concurrent_delta/mod.rsre-exports the union ofConcurrentDeltaConfig,DeltaConsumer,DeltaConsumerStats,HistogramStats,ReorderMetrics,ReorderBuffer, and (feature-gated)DeltaChunk,ParallelDeltaApplier.Replaces the conflict-stalled PRs #4299 and #4300 with a single combined change on top of current master. The wire protocol is unchanged - the spill layer is strictly receiver-local memory management; the parallel apply path is opt-in for benchmarking the rollout in
docs/design/parallel-receive-delta-application.md.Test plan
cargo check -p engine --all-features --testscleancargo check -p transfer --features parallel-receive-delta --testscleancargo fmt --all -- --checkcleanspillable_consumer_preserves_order_under_pressure- 1000 items through a 1 KiB budget, deliberately delayed head-of-line item, verifies in-order delivery andspill_events > 0spillable_consumer_matches_bare_output_byte_for_byte- compares spill vs bare paths viaSpillCodecencodingspawn_with_config_off_matches_spawn,stats_zero_when_spill_disabledpin the default-off invariantsConcurrentDeltaConfigcoversdefault,off,with_spill_threshold,with_spill_dir,spill_enabledparallel_apply::*- in-order, shuffled, batched byte-equality across two files, missing/double registration, finish-with-pending errorsrandom_chunk_sizes_and_permutations_match_sequential(48 cases, deterministic permutation per seed)concurrent_delta::consumer::*tests pass alongside the new spill suite