Add brand parsing helpers for environment overrides by oferchen · Pull Request #1370 · oferchen/rsync

oferchen · 2025-10-28T13:44:58Z

Summary

Implement Brand::from_str with a documented BrandParseError so branding aliases are parsed consistently.
Reuse the parser when evaluating the OC_RSYNC_BRAND override, and extend unit tests to cover accepted and rejected inputs.
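
A minimal sketch of the shape this summary describes, for readers who have not opened the diff. The variant names, the alias strings, and the helper functions `brand_override` and `brand_from_env` are assumptions made for illustration; only `Brand`, `BrandParseError`, `from_str`, and the `OC_RSYNC_BRAND` variable come from the description above.

```rust
use std::env;
use std::fmt;
use std::str::FromStr;

/// Hypothetical brand variants; the real enum in the crate may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Brand {
    Upstream,
    Oc,
}

/// Error returned when the input matches no known alias.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BrandParseError {
    input: String,
}

impl fmt::Display for BrandParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unrecognised brand {:?}", self.input)
    }
}

impl std::error::Error for BrandParseError {}

impl FromStr for Brand {
    type Err = BrandParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // The alias list is illustrative only; matching ignores case and
        // surrounding whitespace.
        match s.trim().to_ascii_lowercase().as_str() {
            "upstream" | "rsync" => Ok(Brand::Upstream),
            "oc" | "oc-rsync" => Ok(Brand::Oc),
            _ => Err(BrandParseError { input: s.to_owned() }),
        }
    }
}

/// Parse an optional raw override value through the same `FromStr` path.
pub fn brand_override(raw: Option<&str>) -> Result<Option<Brand>, BrandParseError> {
    raw.map(|s| s.parse::<Brand>()).transpose()
}

/// Read the override from the environment; an unset variable yields `Ok(None)`.
pub fn brand_from_env() -> Result<Option<Brand>, BrandParseError> {
    brand_override(env::var("OC_RSYNC_BRAND").ok().as_deref())
}
```

Routing the environment override through `FromStr` means a rejected value surfaces as the same typed error that any other caller of the parser would see.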

Testing

cargo test

https://chatgpt.com/codex/tasks/task_e_6900c62f2d14832387a6e2f3866f6acc

Propose promoting the single-slot thread_local cache (#1329) into a fixed-depth per-thread cache to keep most acquire/return cycles off the shared ArrayQueue, with overflow falling through to the existing pool. Documents the idle-memory trade-off and cites the sharded-queue alternative in #1271 as fallback.
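
The two-level layout proposed here can be sketched with the standard library alone. This is an illustration under stated assumptions, not the engine's code: `SharedPool`, `LOCAL_DEPTH`, and the `Mutex<Vec<_>>` standing in for the shared ArrayQueue are all hypothetical.

```rust
use std::cell::RefCell;
use std::sync::Mutex;

/// Hypothetical per-thread depth; the proposal text does not fix a number.
const LOCAL_DEPTH: usize = 4;

thread_local! {
    /// Fixed-depth per-thread cache consulted before the shared pool.
    static LOCAL: RefCell<Vec<Vec<u8>>> = RefCell::new(Vec::with_capacity(LOCAL_DEPTH));
}

/// Stand-in for the shared pool. A `Mutex<Vec<_>>` is used here only to
/// keep the sketch dependency-free; the real pool is described as an ArrayQueue.
pub struct SharedPool {
    queue: Mutex<Vec<Vec<u8>>>,
    buf_size: usize,
}

impl SharedPool {
    pub fn new(buf_size: usize) -> Self {
        Self { queue: Mutex::new(Vec::new()), buf_size }
    }

    /// Try the thread-local cache, then the shared pool, then allocate.
    pub fn acquire(&self) -> Vec<u8> {
        if let Some(buf) = LOCAL.with(|c| c.borrow_mut().pop()) {
            return buf;
        }
        if let Some(buf) = self.queue.lock().unwrap().pop() {
            return buf;
        }
        Vec::with_capacity(self.buf_size)
    }

    /// Return a buffer: keep it local while there is room, otherwise
    /// overflow to the shared pool.
    pub fn release(&self, mut buf: Vec<u8>) {
        buf.clear();
        let overflow = LOCAL.with(|c| {
            let mut cache = c.borrow_mut();
            if cache.len() < LOCAL_DEPTH {
                cache.push(buf);
                None
            } else {
                Some(buf)
            }
        });
        if let Some(buf) = overflow {
            self.queue.lock().unwrap().push(buf);
        }
    }

    /// Number of buffers currently parked in the shared pool.
    pub fn shared_len(&self) -> usize {
        self.queue.lock().unwrap().len()
    }
}
```

The idle-memory trade-off mentioned above is visible in the sketch: each thread can hold up to `LOCAL_DEPTH` buffers that no other thread can reuse until they overflow or the thread exits.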

Add criterion microbench under crates/transfer/benches profiling four result-collection strategies (shared Arc<Mutex<Vec>>, sharded Mutex<Vec> by rayon worker id, crossbeam SegQueue, crossbeam unbounded channel) over 100K items at 1/4/8/16 rayon worker counts. Throughput is reported in elements/sec so the bench output names the worker count at which the single shared mutex saturates. The shipping parallel-stat path in crates/transfer/src/parallel_io.rs does not use Arc<Mutex<Vec>>; it collects via into_par_iter().map(f).collect(), which delegates to rayon's lock-free reducer. Document this in docs/audits/parallel-stat-collection.md and keep the microbench as the baseline against which any future PR that proposes reintroducing a shared mutex on this path must be measured (tracked under #1192, #1271, #1297, #1370, #1682).
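
To make the contrast between collection strategies concrete, here is a dependency-free sketch of two of them: a single shared mutex, and per-worker vectors merged at join as a rough stand-in for the sharded variant. It uses `std::thread::scope` rather than rayon, and the function names are hypothetical; it is neither the benchmark itself nor the shipping `collect()` path.

```rust
use std::sync::Mutex;
use std::thread;

/// Every worker pushes into one shared `Mutex<Vec<_>>`, so each push
/// contends on the same lock.
pub fn collect_shared(items: usize, workers: usize) -> Vec<u64> {
    assert!(workers > 0, "step_by panics on a zero step");
    let out = Mutex::new(Vec::with_capacity(items));
    thread::scope(|s| {
        for w in 0..workers {
            let out = &out;
            s.spawn(move || {
                // Worker `w` handles items w, w + workers, w + 2 * workers, ...
                for i in (w..items).step_by(workers) {
                    out.lock().unwrap().push(i as u64);
                }
            });
        }
    });
    out.into_inner().unwrap()
}

/// Each worker fills its own vector without locking; the vectors are
/// concatenated once after the workers are joined.
pub fn collect_sharded(items: usize, workers: usize) -> Vec<u64> {
    assert!(workers > 0, "step_by panics on a zero step");
    thread::scope(|s| {
        let handles: Vec<_> = (0..workers)
            .map(|w| {
                s.spawn(move || {
                    (w..items)
                        .step_by(workers)
                        .map(|i| i as u64)
                        .collect::<Vec<u64>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().unwrap())
            .collect::<Vec<u64>>()
    })
}
```

Both functions return the same set of items in an unspecified order; only the amount of lock traffic differs, which is the quantity the microbench is meant to expose.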

… (#4170)
* chore(bench): add parallel-stat collector contention microbench
* chore: sync Cargo.lock after dev-dep addition

Adds a Criterion bench in crates/transfer/benches that compares three rayon dispatch shapes - `into_par_iter` over an owned Vec, `par_bridge` over a Vec iterator, and `par_bridge` over a pure generator - across {1, 4, 8, 16} workers on a fixed 100K small-file workload. Throughput is reported in items/sec via `Throughput::Elements`. The numbers feed the dispatch-shape decisions tracked in #1284, #1370, and #1681.

…ems (#4203)

Add a criterion bench that compares per-item send+recv cost across `std::sync::mpsc::channel`, `crossbeam_channel::unbounded`, and `crossbeam_channel::bounded(1024)` at 100K items. Sweeps payload sizes {32 B, 256 B, 4 KB} and thread shapes {1S+1R, 4S+1R, 1S+4R, 4S+4R} to expose the contention regimes that appear in the engine and receiver pipelines. Payloads are pre-allocated outside the timed section so the measurement isolates channel cost from allocation. Throughput is reported via `Throughput::Elements(100_000)` so the criterion summary gives items/sec directly. Pure userspace synchronisation - no I/O, no platform-specific paths, runs identically on Linux, macOS, and Windows.

Refs #1592 #1681 #1370
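
The measurement pattern described in this commit message (allocate payloads first, then time only the channel traffic) can be sketched with `std::sync::mpsc` alone. The function `time_mpsc` is hypothetical and is not the Criterion bench; it covers only the N-senders, one-receiver shapes, and unlike a Criterion harness it includes thread spawn cost in the timed region.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Send `items` pre-allocated payloads from `senders` threads to a single
/// receiver and return how many arrived plus the elapsed wall time.
pub fn time_mpsc(items: usize, payload_len: usize, senders: usize) -> (usize, Duration) {
    assert!(senders > 0, "at least one sender is required");

    // Allocate every payload before the clock starts so the timing
    // reflects channel cost rather than allocation.
    let mut batches: Vec<Vec<Vec<u8>>> = (0..senders).map(|_| Vec::new()).collect();
    for i in 0..items {
        batches[i % senders].push(vec![0u8; payload_len]);
    }

    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    let start = Instant::now();
    let received = thread::scope(|s| {
        for batch in batches {
            let tx = tx.clone();
            s.spawn(move || {
                for payload in batch {
                    tx.send(payload).unwrap();
                }
            });
        }
        // Drop the original sender so the receive loop ends once every
        // worker has finished and dropped its clone.
        drop(tx);
        rx.iter().count()
    });
    (received, start.elapsed())
}
```

Dividing the received count by the elapsed time gives an items/sec figure comparable in spirit to what `Throughput::Elements` reports, though without Criterion's warm-up and statistical treatment.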

Consolidates issues #1271 and #1370 into a single design proposal for promoting the engine BufferPool's thread-local cache from a single slot in front of a shared ArrayQueue to a per-thread slab as the primary storage, with the existing ArrayQueue demoted to a bounded global overflow / balancer. Covers: contrast with sharded-mutex (#1295) and per-thread-cache (#1370) alternatives, LIFO slab structure with cross-thread return handling, bounded memory accounting layered on top of #2245's byte budget, preserved PooledBuffer Drop and BufferAllocator surface, failure modes (thread teardown, panicking thread, long-lived buffers), comparison table, trigger conditions for adoption, and a five-step implementation sequence. Recommendation is to defer implementation until profiling at 32+ sustained threads proves the existing two-level layout actually contends.

Adds an opt-in per-thread LIFO slab in front of BufferPool, gated behind the new `thread-slab-pool` feature (default off). The slab keeps a depth-bounded `Vec<Vec<u8>>` per thread with explicit slot and byte caps (default 8 slots, 1 MiB), and falls through to the existing lock-free central queue for cross-thread returns, overflow, and periodic donation of long-lived buffers. The public surface (BufferPool, BufferGuard, BorrowedBufferGuard Drop) is unchanged - callers always go through `pool.return_buffer` and the guard Drop signatures stay byte-for-byte identical. Switching between the single-slot and slab backends happens at compile time inside `thread_local_cache.rs`, so `pool.rs` reads the same on both paths. Implements the design at docs/design/per-thread-buffer-slab.md merged via #4230. Closes part of #1271 and #1370.
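
A minimal sketch of a depth-bounded LIFO slab with slot and byte caps, assuming byte accounting by `Vec::capacity`. The type `Slab` and its methods are hypothetical names for illustration; only the `Vec<Vec<u8>>` storage and the default caps (8 slots, 1 MiB) are taken from the description above. A rejected buffer is handed back to the caller, which in the described design would forward it to the central queue.

```rust
/// Hypothetical per-thread slab: a LIFO stack of buffers bounded by both
/// a slot count and a total byte budget.
pub struct Slab {
    slots: Vec<Vec<u8>>,
    bytes: usize,
    max_slots: usize,
    max_bytes: usize,
}

impl Slab {
    pub fn new(max_slots: usize, max_bytes: usize) -> Self {
        Self { slots: Vec::with_capacity(max_slots), bytes: 0, max_slots, max_bytes }
    }

    /// Defaults quoted in the commit message: 8 slots, 1 MiB.
    pub fn with_defaults() -> Self {
        Self::new(8, 1 << 20)
    }

    /// Store a buffer if both caps allow it; otherwise return it so the
    /// caller can fall through to the central queue.
    pub fn put(&mut self, buf: Vec<u8>) -> Result<(), Vec<u8>> {
        let cost = buf.capacity();
        let over_slots = self.slots.len() >= self.max_slots;
        let over_bytes = self.bytes.saturating_add(cost) > self.max_bytes;
        if over_slots || over_bytes {
            return Err(buf);
        }
        self.bytes += cost;
        self.slots.push(buf);
        Ok(())
    }

    /// Pop the most recently stored buffer (LIFO), which is the one most
    /// likely to still be warm in cache.
    pub fn take(&mut self) -> Option<Vec<u8>> {
        let buf = self.slots.pop()?;
        self.bytes -= buf.capacity();
        Some(buf)
    }

    /// Bytes currently held by the slab, measured by buffer capacity.
    pub fn cached_bytes(&self) -> usize {
        self.bytes
    }
}
```

Keeping the caps inside the slab type means the fall-through decision is a plain `Result`, so the calling code can stay the same whichever backend is compiled in.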