docs(design): sharded BufferPool layout for high thread counts #3645
Merged
Design note for #1295 covering current two-level pool, contention hypothesis at 32+ threads, three sharding approaches, and the recommended shard-then-fallback layout. Activation gated at 16+ rayon workers; trait surface and wire protocol unchanged.
oferchen added a commit that referenced this pull request on May 5, 2026
…3652) Cross-cutting follow-up to PRs #3645 (buffer pool sharding) and #3649 (daemon async accept). Both designs default to num_cpus * 2 which underutilizes I/O-bound workloads and oversubscribes CPU-bound ones. This note specifies a feedback-driven adaptive sizer with a PI-controller in the [60%, 85%] utilization band, hard bounds [max(2, n/2), n*4], 5 s grow / 30 s shrink cadence, and a daemon config knob "transfer-worker-threads = adaptive | <int>" plus an OC_RSYNC_ADAPTIVE_THREADS env-var disable for the first release. Zero wire-protocol impact.
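The sizing rule in that commit message can be illustrated with a minimal Rust sketch. It assumes `n` is the logical CPU count, that utilization above the band grows the worker count and utilization below it shrinks it, and that the controller is a plain PI loop with a deadband. The names `AdaptiveSizer`, `hard_bounds`, `KP`, and `KI` are hypothetical, the gain values are made up for illustration, and the 5 s grow / 30 s shrink cadence is not modeled.

```rust
/// Utilization band from the note: hold steady inside [60%, 85%].
const BAND_LOW: f64 = 0.60;
const BAND_HIGH: f64 = 0.85;
/// Hypothetical controller gains; the note does not give values.
const KP: f64 = 8.0;
const KI: f64 = 2.0;

/// Hard bounds [max(2, n/2), n*4] for `n` logical CPUs.
fn hard_bounds(n: usize) -> (usize, usize) {
    let n = n.max(1); // guard against a zero CPU count
    ((n / 2).max(2), n * 4)
}

/// Hypothetical sizer state: current worker count plus the PI integral term.
struct AdaptiveSizer {
    lo: usize,
    hi: usize,
    threads: usize,
    integral: f64,
}

impl AdaptiveSizer {
    /// Start from the static default (num_cpus * 2), clamped to the hard bounds.
    fn new(n: usize) -> Self {
        let (lo, hi) = hard_bounds(n);
        Self { lo, hi, threads: (n.max(1) * 2).clamp(lo, hi), integral: 0.0 }
    }

    /// Feed one utilization sample in 0.0..=1.0 and return the new worker count.
    fn step(&mut self, utilization: f64) -> usize {
        // Error is the distance to the nearest band edge, zero inside the band.
        let error = if utilization > BAND_HIGH {
            utilization - BAND_HIGH
        } else if utilization < BAND_LOW {
            utilization - BAND_LOW
        } else {
            0.0
        };
        // Reset the integral inside the band or when the error changes sign.
        if error == 0.0 || error * self.integral < 0.0 {
            self.integral = 0.0;
        }
        if error == 0.0 {
            return self.threads;
        }
        // Clamp the integral so it cannot wind up while pinned at a bound.
        self.integral = (self.integral + error).clamp(-1.0, 1.0);
        let delta = (KP * error + KI * self.integral).round() as i64;
        let next = (self.threads as i64 + delta).clamp(self.lo as i64, self.hi as i64);
        self.threads = next as usize;
        self.threads
    }
}
```

Under these assumptions, a sizer built with `AdaptiveSizer::new(8)` starts at 16 workers, stays there while samples fall inside the band, and never leaves the range 4 to 32 however long the load stays saturated or idle.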
oferchen added a commit that referenced this pull request on May 18, 2026
Summary
Design note for #1295 covering the current two-level BufferPool, a contention hypothesis at 32+ threads, three sharding approaches, and the recommended shard-then-fallback layout activated at 16+ rayon workers.
Key sections
- Current two-level pool, with crates/engine/src/local_copy/buffer_pool/ line citations (pool.rs, thread_local_cache.rs, pressure.rs, allocator.rs).
- Contention hypothesis: crossbeam_queue::ArrayQueue head/tail cursors at high producer/consumer asymmetry.
- Recommended layout: shard count num_cpus * 2 clamped 4..64, shard capacity small, global fallback retains the existing soft-cap admission protocol.
- Activation: rayon::current_num_threads() >= 16. Decision is frozen at construction.
- Trait surface (BufferAllocator unchanged), telemetry additions, risks, and the follow-up benchmark workloads bound to #1297 (Handle extended ASCII whitespace in remote shell parser).
Wire-compat
Zero impact. The buffer pool is internal to local-copy and delta-transfer paths; it does not appear in any wire format, capability string, or persisted state.
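The shard-then-fallback layout listed under Key sections can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the design note: `ShardedPool`, `shard_count`, and `ACTIVATION_THREADS` are hypothetical names, `Mutex<Vec<_>>` stands in for `crossbeam_queue::ArrayQueue` so the sketch needs only the standard library, the 4..64 clamp is assumed to be inclusive, the shard capacity of 4 is a placeholder for "small", and the worker count is passed in rather than read from `rayon::current_num_threads()`. The soft-cap admission protocol of the global pool is not modeled.

```rust
use std::sync::Mutex;

/// Activation threshold from the note: shard only at 16+ workers.
const ACTIVATION_THREADS: usize = 16;

/// Shard count: num_cpus * 2, clamped to 4..=64 (inclusive bounds assumed).
fn shard_count(num_cpus: usize) -> usize {
    (num_cpus * 2).clamp(4, 64)
}

/// Hypothetical pool: small per-shard free lists in front of a global list.
struct ShardedPool {
    shards: Vec<Mutex<Vec<Vec<u8>>>>, // empty when sharding is inactive
    shard_capacity: usize,            // placeholder value for "small"
    global: Mutex<Vec<Vec<u8>>>,      // fallback; soft cap not modeled here
    buffer_size: usize,
}

impl ShardedPool {
    /// The sharded/unsharded decision is made once here and never revisited.
    fn new(worker_threads: usize, num_cpus: usize, buffer_size: usize) -> Self {
        let n = if worker_threads >= ACTIVATION_THREADS { shard_count(num_cpus) } else { 0 };
        Self {
            shards: (0..n).map(|_| Mutex::new(Vec::new())).collect(),
            shard_capacity: 4,
            global: Mutex::new(Vec::new()),
            buffer_size,
        }
    }

    fn is_sharded(&self) -> bool {
        !self.shards.is_empty()
    }

    /// Map a worker index onto a shard, or None when sharding is inactive.
    fn shard_for(&self, worker_index: usize) -> Option<&Mutex<Vec<Vec<u8>>>> {
        if self.shards.is_empty() {
            None
        } else {
            Some(&self.shards[worker_index % self.shards.len()])
        }
    }

    /// Try the worker's shard, then the global list, then allocate fresh.
    fn acquire(&self, worker_index: usize) -> Vec<u8> {
        if let Some(shard) = self.shard_for(worker_index) {
            if let Some(buf) = shard.lock().unwrap().pop() {
                return buf;
            }
        }
        if let Some(buf) = self.global.lock().unwrap().pop() {
            return buf;
        }
        vec![0u8; self.buffer_size]
    }

    /// Return to the worker's shard if it has room, otherwise to the global list.
    fn release(&self, worker_index: usize, buf: Vec<u8>) {
        if let Some(shard) = self.shard_for(worker_index) {
            let mut free = shard.lock().unwrap();
            if free.len() < self.shard_capacity {
                free.push(buf);
                return;
            }
        }
        self.global.lock().unwrap().push(buf);
    }
}
```

In this sketch a pool built for 8 workers never allocates shards and behaves like a single global free list, while one built for 32 workers on 8 CPUs gets 16 shards and only touches the global list when a shard is empty or full.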
Test plan
- … crates/engine/src/local_copy/buffer_pool/ source.
- … num_cpus * 2 shard count.