docs(design): daemon async accept loop with sync transfer workers#3649

Merged
oferchen merged 1 commit into master from docs/daemon-async-accept-design-1674
May 5, 2026

oferchen commented May 5, 2026

Summary

Adds docs/design/daemon-async-accept-sync-workers.md (TODO #1674), a 624-line design note specifying a hybrid execution model for the rsync daemon at high concurrency: a tokio async accept loop hands off accepted connections to a fixed-size pool of synchronous worker threads via a bounded channel. Transfer state machine and wire format are unchanged; the design only replaces the accept-and-dispatch layer.

Key sections

  • Problem statement at 1k-10k concurrent connections (thread spawn cost, accept serialisation, stack VM pressure), citing crates/daemon/src/daemon/sections/server_runtime/accept_loop.rs and connection.rs.
  • Today's synchronous model: serve_connections, run_single_listener_loop, run_dual_stack_loop, spawn_connection_worker, with catch_unwind panic isolation.
  • Three pool-handoff strategies (bounded channel + dedicated pool, tokio::task::spawn_blocking, explicit-join pool) with reasoning.
  • Recommended approach: bounded sync channel + dedicated std::thread pool (Strategy A), reusing the channel abstraction documented in docs/design/async-channel-abstraction.md.
  • Pool sizing: num_cpus * 2 default, configurable via transfer-worker-threads directive.
  • Backpressure via bounded handoff plus kernel SYN backlog; operator guidance for listen_backlog and net.core.somaxconn.
  • Migration plan: feature-gated behind existing async feature in crates/daemon/Cargo.toml, opt-in via use-async-listener = true, default off until benchmarked.
  • Failure semantics: per-worker catch_unwind, drop stream, release accounting, loop. Pool is not poisoned.
  • Auth/TLS placement: TLS termination at the async layer (future work; references the stunnel replacement, #2052); auth stays in the sync worker.
  • Risks: tokio overhead on small daemons, pool starvation, accept-vs-worker imbalance.
  • Tracking items (not added to persistent TODO list): implementation, benchmarks at 100/1k/10k concurrency, panic-isolation test, somaxconn documentation, drop-counter metric, stuck-worker watchdog.
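
To make Strategy A and the failure semantics above concrete, here is a minimal sketch of a bounded handoff into a fixed pool of `std::thread` workers with per-job `catch_unwind`. It is an illustration under stated assumptions, not the design note's implementation: `WorkerPool`, `try_dispatch`, `shutdown`, and both counters are hypothetical names, the job type is generic rather than a TCP stream, and the tokio accept loop that would call `try_dispatch` is left out so the block depends only on the standard library. Whether the accept side should block or drop when the queue is full is also an assumption here; the sketch drops and counts.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical pool type for illustration; not a type from the daemon crate.
pub struct WorkerPool<T: Send + 'static> {
    tx: SyncSender<T>,
    workers: Vec<JoinHandle<()>>,
    /// Handoffs refused because the bounded queue was full or closed.
    pub dropped: Arc<AtomicUsize>,
    /// Handler panics caught by workers.
    pub panics: Arc<AtomicUsize>,
}

impl<T: Send + 'static> WorkerPool<T> {
    /// Starts `threads` workers fed by a channel holding at most `queue_depth` jobs.
    pub fn new<F>(threads: usize, queue_depth: usize, handler: F) -> Self
    where
        F: Fn(T) + Send + Sync + 'static,
    {
        let (tx, rx) = sync_channel::<T>(queue_depth);
        let rx = Arc::new(Mutex::new(rx));
        let handler = Arc::new(handler);
        let panics = Arc::new(AtomicUsize::new(0));
        let workers = (0..threads)
            .map(|_| {
                let rx = Arc::clone(&rx);
                let handler = Arc::clone(&handler);
                let panics = Arc::clone(&panics);
                thread::spawn(move || loop {
                    // The lock guard is released at the end of this statement,
                    // so the handler below never runs while holding it.
                    let received = rx.lock().unwrap().recv();
                    let Ok(job) = received else { break }; // sender gone: exit
                    // Isolate a panicking job; `job` is dropped during unwind
                    // and the worker loops to take the next one.
                    if catch_unwind(AssertUnwindSafe(|| (*handler)(job))).is_err() {
                        panics.fetch_add(1, Ordering::Relaxed);
                    }
                })
            })
            .collect();
        WorkerPool { tx, workers, dropped: Arc::new(AtomicUsize::new(0)), panics }
    }

    /// Non-blocking handoff. Returns the job to the caller when it is refused.
    pub fn try_dispatch(&self, item: T) -> Result<(), T> {
        match self.tx.try_send(item) {
            Ok(()) => Ok(()),
            Err(TrySendError::Full(item)) | Err(TrySendError::Disconnected(item)) => {
                self.dropped.fetch_add(1, Ordering::Relaxed);
                Err(item)
            }
        }
    }

    /// Closes the channel and waits for workers to drain the queue and exit.
    pub fn shutdown(self) {
        let WorkerPool { tx, workers, .. } = self;
        drop(tx);
        for worker in workers {
            let _ = worker.join();
        }
    }
}
```

Because the receiver lock is released before the handler runs, a panic inside the handler cannot poison it, which mirrors the "pool is not poisoned" property listed above.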

Wire-compat invariants confirmed

  • Zero impact on the wire protocol: handshake, version negotiation, auth, multiplex framing, file-list / delta / end-of-transfer all unchanged.
  • The transfer state machine (Handshaking -> Authenticating -> Listing | Transferring -> Completed | Failed in crates/daemon/src/daemon/session_registry.rs) runs in the same handle_session body, byte-for-byte the same code path.
  • No changes to crates/protocol, crates/engine, crates/checksums, crates/transfer, or crates/core are implied by this design.
  • Default daemon behaviour stays on the existing synchronous accept loop; the async path is opt-in and feature-gated.
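
As a rough illustration of the opt-in behaviour, the sketch below resolves the two directives named in this PR, `use-async-listener` and `transfer-worker-threads`, from `key = value` text. Everything else is an assumption: `ListenerConfig`, `parse_listener_config`, and `default_worker_threads` are hypothetical names, only `true`/`false` are accepted as booleans, and `std::thread::available_parallelism` stands in for whatever CPU count the daemon actually uses for the `num_cpus * 2` default.

```rust
use std::thread::available_parallelism;

/// Hypothetical resolved settings; field names are illustrative only.
#[derive(Debug, PartialEq)]
pub struct ListenerConfig {
    pub use_async_listener: bool,
    pub transfer_worker_threads: usize,
}

/// Assumed stand-in for the documented `num_cpus * 2` default.
pub fn default_worker_threads() -> usize {
    available_parallelism().map(|n| n.get()).unwrap_or(1) * 2
}

/// Parses the two directives; unrelated lines and directives are ignored.
pub fn parse_listener_config(text: &str) -> Result<ListenerConfig, String> {
    let mut cfg = ListenerConfig {
        use_async_listener: false, // default off: the sync accept loop stays in place
        transfer_worker_threads: default_worker_threads(),
    };
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let Some((key, value)) = line.split_once('=') else { continue };
        let value = value.trim();
        match key.trim() {
            "use-async-listener" => {
                cfg.use_async_listener = match value {
                    "true" => true,
                    "false" => false,
                    other => return Err(format!("invalid boolean: {other}")),
                };
            }
            "transfer-worker-threads" => {
                let n: usize = value
                    .parse()
                    .map_err(|_| format!("invalid thread count: {value}"))?;
                if n == 0 {
                    return Err("transfer-worker-threads must be at least 1".to_string());
                }
                cfg.transfer_worker_threads = n;
            }
            _ => {} // not part of this sketch
        }
    }
    Ok(cfg)
}
```

With an empty configuration the result keeps `use_async_listener` false, matching the default-off migration plan.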

Test plan

  • Reviewer reads doc; confirms file/line citations resolve.
  • Reviewer confirms the strategy choice (A) matches the async-channel abstraction in docs/design/async-channel-abstraction.md.
  • No code changes; CI runs documentation/lint workflows only.

Specifies a hybrid model where a tokio async accept loop hands off
accepted connections to a fixed-size pool of synchronous worker
threads via a bounded channel. Transfer state machine and wire format
are unchanged; the design only replaces the accept-and-dispatch layer.

Tracking #1674. References #1591 (async-channel abstraction),
#1934/#1935 (async-daemon feature gate), #2052 (TLS termination).
github-actions bot added the documentation label May 5, 2026
oferchen merged commit 6c60e1b into master May 5, 2026
8 checks passed
oferchen deleted the docs/daemon-async-accept-design-1674 branch May 5, 2026 09:59
oferchen added a commit that referenced this pull request May 5, 2026
…3652)

Cross-cutting follow-up to PRs #3645 (buffer pool sharding) and
#3649 (daemon async accept). Both designs default to num_cpus * 2
which underutilizes I/O-bound workloads and oversubscribes
CPU-bound ones. This note specifies a feedback-driven adaptive
sizer with a PI-controller in the [60%, 85%] utilization band,
hard bounds [max(2, n/2), n*4], 5 s grow / 30 s shrink cadence,
and a daemon config knob "transfer-worker-threads = adaptive |
<int>" plus an OC_RSYNC_ADAPTIVE_THREADS env-var disable for the
first release. Zero wire-protocol impact.
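
The numeric limits in this commit message can be sketched as follows. This is deliberately not a PI controller: the message gives no gains, so the block only shows the hard bounds `[max(2, n/2), n*4]`, the `[60%, 85%]` band, and the two cadences as a one-thread-at-a-time step rule. The function names, the single-thread step, the direction of the rule (grow above the band, shrink below it), and reading "5 s grow / 30 s shrink" as minimum intervals between changes are all assumptions.

```rust
use std::time::Duration;

/// Assumed reading of the cadence: minimum time between size changes.
pub const GROW_INTERVAL: Duration = Duration::from_secs(5);
pub const SHRINK_INTERVAL: Duration = Duration::from_secs(30);

/// Hard bounds [max(2, n/2), n*4] for `n` CPUs (treated as at least 1).
pub fn bounds(n_cpus: usize) -> (usize, usize) {
    let n = n_cpus.max(1);
    ((n / 2).max(2), n * 4)
}

/// One step of a dead-band sizer: grow above 85% utilization, shrink below
/// 60%, hold inside the band, and always stay within the hard bounds.
pub fn next_size(
    current: usize,
    n_cpus: usize,
    utilization: f64,
    since_last_change: Duration,
) -> usize {
    let (lo, hi) = bounds(n_cpus);
    let target = if utilization > 0.85 && since_last_change >= GROW_INTERVAL {
        current + 1
    } else if utilization < 0.60 && since_last_change >= SHRINK_INTERVAL {
        current.saturating_sub(1)
    } else {
        current
    };
    target.clamp(lo, hi)
}
```

On an 8-CPU host these bounds give a range of 4 to 32 threads, with the fixed `num_cpus * 2` default of 16 sitting inside it.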
oferchen added a commit that referenced this pull request May 18, 2026
)

oferchen added a commit that referenced this pull request May 18, 2026
…3652)
