docs(design): adaptive thread-pool sizing replaces static num_cpus*2 #3652

Merged
oferchen merged 1 commit into master from docs/adaptive-thread-pool-sizing on May 5, 2026

@oferchen oferchen commented May 5, 2026

Summary

Cross-cutting design ADR that replaces the static num_cpus * 2 defaults in two recently-landed designs with feedback-driven adaptive sizing.

Amends the designs introduced in PRs #3645 (buffer pool sharding) and #3649 (daemon async accept).

The static default underutilizes idle cores on I/O-bound workloads (long-haul SSH, slow disk) and oversubscribes on CPU-bound workloads (compression, checksum). This note specifies a single adaptive sizer both sites can opt into, joining the existing adaptive family alongside the BufferPool capacity resizer (#1640 / #1641) and adaptive queue depth (#1735).

Key sections

  • Telemetry signals: pool utilization, queue depth/stall, per-worker idle EWMA, CPU vs I/O classification - all reused from existing adaptive subsystems with no new hot-path counters.
  • Two sizing modes considered: PI-controller in [60%, 85%] band (recommended) vs AIMD (rejected as too jittery for long-running transfers).
  • Hard bounds: [max(2, num_cpus / 2), num_cpus * 4].
  • Asymmetric cadence: 5 s grow, 30 s shrink. Convergence guard for thrash detection.
  • Override path: OC_RSYNC_ADAPTIVE_THREADS=0 env-var disable; transfer-worker-threads = adaptive | <fixed> daemon config knob; BufferPool::with_capacity(Adaptive | Fixed) API.
  • Failure semantics: sizer runs on a dedicated thread under catch_unwind; on panic, pool stays at last-known-good size, no respawn loop.
  • Telemetry: structured debug_log! line at -vv per sizing decision (mirrors the #1369 SPSC contention metrics pattern).
  • Migration plan: 5 phases with default-on + env-var disable; remove disable after one release once telemetry confirms stability.
  • Risks: oscillation, memory bloat, sizer overhead at low concurrency, sizer-vs-sizer interaction, integration churn - each with explicit mitigation.
  • Rayon thread pool sizing explicitly NOT adopted (it conflicts with rayon's own scheduler).
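
For reviewers who want to see how the band, bounds, and cadence bullets fit together, here is a minimal sketch. It is not the code from the design doc. The `PiSizer` type, the `kp`/`ki` gains, the integral clamp, and the choice to start from the old `num_cpus * 2` value are all assumptions made for illustration, and the cadence is read here as "grow at most every 5 s, shrink at most every 30 s".

```rust
use std::time::{Duration, Instant};

// Values taken from the design summary.
const BAND_LOW: f64 = 0.60;
const BAND_HIGH: f64 = 0.85;
const GROW_INTERVAL: Duration = Duration::from_secs(5);
const SHRINK_INTERVAL: Duration = Duration::from_secs(30);

/// Hard bounds: [max(2, num_cpus / 2), num_cpus * 4].
fn hard_bounds(num_cpus: usize) -> (usize, usize) {
    let n = num_cpus.max(1); // guard against a zero CPU count
    ((n / 2).max(2), n * 4)
}

/// Hypothetical sizer state; names and gains are illustrative assumptions.
struct PiSizer {
    min: usize,
    max: usize,
    size: usize,
    integral: f64,
    kp: f64,
    ki: f64,
    last_grow: Instant,
    last_shrink: Instant,
}

impl PiSizer {
    fn new(num_cpus: usize, now: Instant) -> Self {
        let (min, max) = hard_bounds(num_cpus);
        // Start from the old static default, clamped into the bounds.
        let size = (num_cpus.max(1) * 2).clamp(min, max);
        Self { min, max, size, integral: 0.0, kp: 8.0, ki: 2.0, last_grow: now, last_shrink: now }
    }

    /// Feed one utilization sample in 0.0..=1.0; returns the pool size to use.
    fn step(&mut self, utilization: f64, now: Instant) -> usize {
        // Error is measured from the nearest band edge and is zero inside the band.
        let error = if utilization > BAND_HIGH {
            utilization - BAND_HIGH
        } else if utilization < BAND_LOW {
            utilization - BAND_LOW
        } else {
            self.integral = 0.0;
            return self.size;
        };
        // Clamp the integral term so it cannot wind up while rate-limited.
        self.integral = (self.integral + error).clamp(-1.0, 1.0);
        let delta = (self.kp * error + self.ki * self.integral).round() as i64;

        // Asymmetric cadence: grow at most every 5 s, shrink at most every 30 s.
        let allowed = if delta > 0 {
            now.duration_since(self.last_grow) >= GROW_INTERVAL
        } else if delta < 0 {
            now.duration_since(self.last_shrink) >= SHRINK_INTERVAL
        } else {
            false
        };
        if allowed {
            let next = (self.size as i64 + delta).clamp(self.min as i64, self.max as i64);
            self.size = next as usize;
            if delta > 0 { self.last_grow = now } else { self.last_shrink = now }
        }
        self.size
    }
}
```

Passing `now` in explicitly keeps the sketch testable without sleeping. A real sizer would presumably sample `Instant::now()` on its dedicated thread and feed in the pool-utilization signal listed under telemetry.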

Wire-compat invariants

Zero impact. Pool sizing is process-internal. The sizer does not change the rsync wire protocol, multiplex framing, daemon greeting, auth handshake, file-list / delta / end-of-transfer envelopes, or any frame's byte count, byte order, or padding. Same invariant the two parent designs already establish.

Test plan

  • No code changes - design doc only, so the only verification is documentation review.
  • CI fmt + lint on the markdown file (no Rust touched).
  • Reviewer cross-check of cited file:line references against current crates/engine/src/local_copy/buffer_pool/ and crates/daemon/src/daemon/.

Cross-cutting follow-up to PRs #3645 (buffer pool sharding) and
#3649 (daemon async accept). Both designs default to num_cpus * 2
which underutilizes I/O-bound workloads and oversubscribes
CPU-bound ones. This note specifies a feedback-driven adaptive
sizer with a PI-controller in the [60%, 85%] utilization band,
hard bounds [max(2, n/2), n*4], 5 s grow / 30 s shrink cadence,
and a daemon config knob "transfer-worker-threads = adaptive |
<int>" plus an OC_RSYNC_ADAPTIVE_THREADS env-var disable for the
first release. Zero wire-protocol impact.
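
A rough sketch of how the `transfer-worker-threads = adaptive | <int>` knob and the `OC_RSYNC_ADAPTIVE_THREADS=0` disable might be resolved follows. The `WorkerThreads` enum and both function names are hypothetical. Rejecting `0` as a fixed value and falling back to the static `num_cpus * 2` size when the env var disables adaptive mode are assumptions, not something this PR states.

```rust
/// Hypothetical representation of the `transfer-worker-threads` knob.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WorkerThreads {
    Adaptive,
    Fixed(usize),
}

/// Parse the config value: the literal `adaptive` or a positive integer.
fn parse_worker_threads(value: &str) -> Result<WorkerThreads, String> {
    let v = value.trim();
    if v == "adaptive" {
        return Ok(WorkerThreads::Adaptive);
    }
    match v.parse::<usize>() {
        // Assumption: a fixed size of zero is treated as a config error.
        Ok(n) if n > 0 => Ok(WorkerThreads::Fixed(n)),
        _ => Err(format!("invalid transfer-worker-threads value: {v:?}")),
    }
}

/// Resolve the effective mode. `env_value` is the raw value of
/// OC_RSYNC_ADAPTIVE_THREADS, if the variable is set.
fn resolve(config: WorkerThreads, env_value: Option<&str>, num_cpus: usize) -> WorkerThreads {
    match (config, env_value.map(str::trim)) {
        // Assumption: disabling adaptive sizing restores the old static default.
        (WorkerThreads::Adaptive, Some("0")) => WorkerThreads::Fixed(num_cpus.max(1) * 2),
        // An explicit fixed size is never overridden by the env var.
        _ => config,
    }
}
```

A caller could pass `std::env::var("OC_RSYNC_ADAPTIVE_THREADS").ok().as_deref()` as `env_value`. Keeping the environment lookup outside `resolve` makes the precedence rule easy to unit-test.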
@github-actions github-actions Bot added the documentation label May 5, 2026
@oferchen oferchen merged commit 52477b2 into master May 5, 2026
8 checks passed
@oferchen oferchen deleted the docs/adaptive-thread-pool-sizing branch May 5, 2026 09:59
oferchen added a commit that referenced this pull request May 18, 2026
…3652)