Add brand parsing helpers for environment overrides #1370
Merged
oferchen added a commit that referenced this pull request on May 7, 2026
Propose promoting the single-slot thread_local cache (#1329) into a fixed-depth per-thread cache to keep most acquire/return cycles off the shared ArrayQueue, with overflow falling through to the existing pool. Documents the idle-memory trade-off and cites the sharded-queue alternative in #1271 as fallback.
oferchen added a commit that referenced this pull request on May 16, 2026
Add criterion microbench under crates/transfer/benches profiling four result-collection strategies (shared Arc<Mutex<Vec>>, sharded Mutex<Vec> by rayon worker id, crossbeam SegQueue, crossbeam unbounded channel) over 100K items at 1/4/8/16 rayon worker counts. Throughput is reported in elements/sec so the bench output names the worker count at which the single shared mutex saturates. The shipping parallel-stat path in crates/transfer/src/parallel_io.rs does not use Arc<Mutex<Vec>>; it collects via into_par_iter().map(f).collect(), which delegates to rayon's lock-free reducer. Document this in docs/audits/parallel-stat-collection.md and keep the microbench as the baseline against which any future PR that proposes reintroducing a shared mutex on this path must be measured (tracked under #1192, #1271, #1297, #1370, #1682).
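As a rough illustration of the contrast this commit message draws, the sketch below compares two of the four strategies using only the standard library: one shared `Arc<Mutex<Vec<_>>>` that every worker locks per item, and per-worker vectors merged after the workers finish. It is not the benchmark from `crates/transfer/benches`; plain `std::thread` stands in for rayon, and the function names and the `i * 2` workload are assumptions made for the example.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Strategy 1: every worker pushes each result into one shared vector,
/// taking the same mutex once per item. `workers` must be non-zero.
fn collect_shared(items: u64, workers: u64) -> Vec<u64> {
    let out = Arc::new(Mutex::new(Vec::with_capacity(items as usize)));
    let handles: Vec<_> = (0..workers)
        .map(|w| {
            let out = Arc::clone(&out);
            thread::spawn(move || {
                // Worker `w` handles items w, w + workers, w + 2 * workers, ...
                for i in (w..items).step_by(workers as usize) {
                    out.lock().unwrap().push(i * 2);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // All clones were dropped when the workers finished, so this succeeds.
    Arc::try_unwrap(out).unwrap().into_inner().unwrap()
}

/// Strategy 2: each worker fills a private vector; the results are merged
/// once per worker after the join, so no lock is taken per item.
fn collect_sharded(items: u64, workers: u64) -> Vec<u64> {
    let handles: Vec<_> = (0..workers)
        .map(|w| {
            thread::spawn(move || {
                (w..items)
                    .step_by(workers as usize)
                    .map(|i| i * 2)
                    .collect::<Vec<u64>>()
            })
        })
        .collect();
    let mut out = Vec::with_capacity(items as usize);
    for h in handles {
        out.extend(h.join().unwrap());
    }
    out
}

fn main() {
    let shared = collect_shared(100_000, 4);
    let sharded = collect_sharded(100_000, 4);
    println!("shared: {} items, sharded: {} items", shared.len(), sharded.len());
}
```

Both functions return the same set of values in a possibly different order; the difference the microbench is meant to expose is how the per-item lock in the first variant scales as the worker count grows.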
oferchen added a commit that referenced this pull request on May 16, 2026
… (#4170)
* chore(bench): add parallel-stat collector contention microbench
  Add criterion microbench under crates/transfer/benches profiling four result-collection strategies (shared Arc<Mutex<Vec>>, sharded Mutex<Vec> by rayon worker id, crossbeam SegQueue, crossbeam unbounded channel) over 100K items at 1/4/8/16 rayon worker counts. Throughput is reported in elements/sec so the bench output names the worker count at which the single shared mutex saturates. The shipping parallel-stat path in crates/transfer/src/parallel_io.rs does not use Arc<Mutex<Vec>>; it collects via into_par_iter().map(f).collect(), which delegates to rayon's lock-free reducer. Document this in docs/audits/parallel-stat-collection.md and keep the microbench as the baseline against which any future PR that proposes reintroducing a shared mutex on this path must be measured (tracked under #1192, #1271, #1297, #1370, #1682).
* chore: sync Cargo.lock after dev-dep addition
oferchen added a commit that referenced this pull request on May 17, 2026
Adds a Criterion bench in crates/transfer/benches that compares three
rayon dispatch shapes - `into_par_iter` over an owned Vec, `par_bridge`
over a Vec iterator, and `par_bridge` over a pure generator - across
{1, 4, 8, 16} workers on a fixed 100K small-file workload. Throughput
is reported in items/sec via `Throughput::Elements`. The numbers feed
the dispatch-shape decisions tracked in #1284, #1370, and #1681.
oferchen added a commit that referenced this pull request on May 17, 2026
…ems (#4203)
Add a criterion bench that compares per-item send+recv cost across `std::sync::mpsc::channel`, `crossbeam_channel::unbounded`, and `crossbeam_channel::bounded(1024)` at 100K items. Sweeps payload sizes {32 B, 256 B, 4 KB} and thread shapes {1S+1R, 4S+1R, 1S+4R, 4S+4R} to expose the contention regimes that appear in the engine and receiver pipelines. Payloads are pre-allocated outside the timed section so the measurement isolates channel cost from allocation. Throughput is reported via `Throughput::Elements(100_000)` so the criterion summary gives items/sec directly. Pure userspace synchronisation - no I/O, no platform-specific paths, runs identically on Linux, macOS, and Windows.
Refs #1592 #1681 #1370
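To make the measurement shape concrete, here is a minimal sketch of the N-senders, one-receiver case (1S+1R and 4S+1R) using only `std::sync::mpsc::channel`. It is not the Criterion bench itself: the crossbeam channels and the multi-receiver shapes are left out, timing uses a plain `Instant`, and the function name is an assumption for illustration. What it does preserve is the point made above, that payloads are allocated before the clock starts.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Instant;

const ITEMS: usize = 100_000;

/// Sends `ITEMS` pre-allocated payloads from `senders` threads to a single
/// receiver and returns how many items arrived. `senders` must be non-zero.
fn run_n_senders_one_receiver(senders: usize, payload_len: usize) -> usize {
    // Allocate every payload up front so allocation stays out of the timing.
    let mut batches: Vec<Vec<Vec<u8>>> = (0..senders).map(|_| Vec::new()).collect();
    for i in 0..ITEMS {
        batches[i % senders].push(vec![0u8; payload_len]);
    }

    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    // Note: thread spawn cost is included in this timing; a real bench
    // harness would amortise or exclude it.
    let start = Instant::now();
    let handles: Vec<_> = batches
        .into_iter()
        .map(|batch| {
            let tx = tx.clone();
            thread::spawn(move || {
                for payload in batch {
                    tx.send(payload).unwrap();
                }
            })
        })
        .collect();
    // Drop the original sender so the receive loop ends once all workers finish.
    drop(tx);
    let received = rx.iter().count();
    for h in handles {
        h.join().unwrap();
    }
    let elapsed = start.elapsed();
    println!(
        "{senders}S+1R, {payload_len} B payload: {received} items in {elapsed:?}"
    );
    received
}

fn main() {
    for &senders in &[1usize, 4] {
        for &payload_len in &[32usize, 256] {
            run_n_senders_one_receiver(senders, payload_len);
        }
    }
}
```

Dividing the item count by the elapsed time gives an items/sec figure comparable in spirit to what `Throughput::Elements(100_000)` reports, though without Criterion's warm-up and statistical treatment.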
oferchen added a commit that referenced this pull request on May 17, 2026
Consolidates issues #1271 and #1370 into a single design proposal for promoting the engine BufferPool's thread-local cache from a single slot in front of a shared ArrayQueue to a per-thread slab as the primary storage, with the existing ArrayQueue demoted to a bounded global overflow / balancer. Covers: contrast with sharded-mutex (#1295) and per-thread-cache (#1370) alternatives, LIFO slab structure with cross-thread return handling, bounded memory accounting layered on top of #2245's byte budget, preserved PooledBuffer Drop and BufferAllocator surface, failure modes (thread teardown, panicking thread, long-lived buffers), comparison table, trigger conditions for adoption, and a five-step implementation sequence. Recommendation is to defer implementation until profiling at 32+ sustained threads proves the existing two-level layout actually contends.
oferchen added a commit that referenced this pull request on May 17, 2026
Adds an opt-in per-thread LIFO slab in front of BufferPool, gated behind the new `thread-slab-pool` feature (default off). The slab keeps a depth-bounded `Vec<Vec<u8>>` per thread with explicit slot and byte caps (default 8 slots, 1 MiB), and falls through to the existing lock-free central queue for cross-thread returns, overflow, and periodic donation of long-lived buffers. The public surface (BufferPool, BufferGuard, BorrowedBufferGuard Drop) is unchanged - callers always go through `pool.return_buffer` and the guard Drop signatures stay byte-for-byte identical. Switching between the single-slot and slab backends happens at compile time inside `thread_local_cache.rs`, so `pool.rs` reads the same on both paths. Implements the design at docs/design/per-thread-buffer-slab.md merged via #4230. Closes part of #1271 and #1370.
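The slab behaviour described in this commit message can be sketched as follows. This is a simplified model under stated assumptions, not the code in `thread_local_cache.rs`: a `Mutex<Vec<_>>` stands in for the lock-free central queue, the free functions `acquire` and `release` replace the real `BufferPool` and guard types, and `BUF_SIZE` is a made-up buffer size. Only the default caps (8 slots, 1 MiB) come from the message above.

```rust
use std::cell::RefCell;
use std::sync::Mutex;

const MAX_SLOTS: usize = 8; // default slot cap named in the commit message
const MAX_BYTES: usize = 1 << 20; // default byte cap (1 MiB)
const BUF_SIZE: usize = 64 * 1024; // hypothetical buffer size for this sketch

// Stand-in for the shared central queue (the real one is lock-free).
static CENTRAL: Mutex<Vec<Vec<u8>>> = Mutex::new(Vec::new());

struct Slab {
    bufs: Vec<Vec<u8>>, // LIFO: push and pop at the end
    bytes: usize,       // total capacity currently cached on this thread
}

thread_local! {
    static SLAB: RefCell<Slab> = RefCell::new(Slab { bufs: Vec::new(), bytes: 0 });
}

/// Take a buffer: thread-local slab first, then the central pool, then allocate.
fn acquire() -> Vec<u8> {
    let local = SLAB.with(|slab| {
        let mut slab = slab.borrow_mut();
        let buf = slab.bufs.pop();
        if let Some(b) = &buf {
            slab.bytes -= b.capacity();
        }
        buf
    });
    if let Some(buf) = local {
        return buf;
    }
    if let Some(buf) = CENTRAL.lock().unwrap().pop() {
        return buf;
    }
    Vec::with_capacity(BUF_SIZE)
}

/// Return a buffer: keep it locally while both caps allow, otherwise overflow
/// to the central pool.
fn release(mut buf: Vec<u8>) {
    buf.clear();
    let overflow = SLAB.with(|slab| {
        let mut slab = slab.borrow_mut();
        let cap = buf.capacity();
        if slab.bufs.len() < MAX_SLOTS && slab.bytes + cap <= MAX_BYTES {
            slab.bytes += cap;
            slab.bufs.push(buf);
            None
        } else {
            Some(buf)
        }
    });
    if let Some(buf) = overflow {
        CENTRAL.lock().unwrap().push(buf);
    }
}

fn local_len() -> usize {
    SLAB.with(|slab| slab.borrow().bufs.len())
}

fn central_len() -> usize {
    CENTRAL.lock().unwrap().len()
}

fn main() {
    for _ in 0..10 {
        release(Vec::with_capacity(BUF_SIZE));
    }
    println!("local = {}, central = {}", local_len(), central_len());
    let buf = acquire(); // served from the thread-local slab, no lock taken
    release(buf);
}
```

With 64 KiB buffers the slot cap is reached before the byte cap, so returning ten buffers on a fresh thread leaves eight in the local slab and sends two to the central pool. Cross-thread returns and periodic donation of long-lived buffers, both mentioned in the commit message, are not modelled here.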
Summary
* Add `Brand::from_str` with a documented `BrandParseError` so branding aliases are parsed consistently
* Apply it to the `OC_RSYNC_BRAND` override and extend unit tests to cover accepted and rejected inputs

Testing
https://chatgpt.com/codex/tasks/task_e_6900c62f2d14832387a6e2f3866f6acc
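For readers without the diff at hand, the summary above could look roughly like the sketch below. Only the names `Brand`, `Brand::from_str`, `BrandParseError`, and `OC_RSYNC_BRAND` come from this PR; the enum variants, the accepted aliases, the case-insensitive matching, and the `brand_from_env` helper are all assumptions made for illustration and may differ from the merged code.

```rust
use std::env;
use std::fmt;
use std::str::FromStr;

/// Hypothetical variants; the real enum may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Brand {
    Upstream,
    Oc,
}

/// Error returned when a brand string matches no known alias.
#[derive(Debug, Clone, PartialEq, Eq)]
struct BrandParseError {
    input: String,
}

impl fmt::Display for BrandParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unrecognised brand: {:?}", self.input)
    }
}

impl std::error::Error for BrandParseError {}

impl FromStr for Brand {
    type Err = BrandParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // The alias list and case-insensitivity are assumptions for this sketch.
        match s.trim().to_ascii_lowercase().as_str() {
            "upstream" | "rsync" => Ok(Brand::Upstream),
            "oc" | "oc-rsync" => Ok(Brand::Oc),
            _ => Err(BrandParseError { input: s.to_owned() }),
        }
    }
}

/// Hypothetical helper: read the `OC_RSYNC_BRAND` override, falling back to
/// `default` when the variable is unset or not valid Unicode.
fn brand_from_env(default: Brand) -> Result<Brand, BrandParseError> {
    match env::var("OC_RSYNC_BRAND") {
        Ok(value) => value.parse(),
        Err(_) => Ok(default),
    }
}

fn main() {
    match brand_from_env(Brand::Oc) {
        Ok(brand) => println!("active brand: {brand:?}"),
        Err(err) => eprintln!("ignoring override: {err}"),
    }
}
```

Routing the environment override through the same `FromStr` implementation means a value is accepted or rejected identically wherever a brand string is parsed, which is the consistency the summary is after.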