
Add brand parsing helpers for environment overrides #1370

Merged
oferchen merged 1 commit into master from prepare-for-production-release
Oct 28, 2025

Conversation

@oferchen
Owner

Summary

  • implement Brand::from_str with a documented BrandParseError so branding aliases are parsed consistently
  • reuse the parser when evaluating the OC_RSYNC_BRAND override and extend unit tests to cover accepted and rejected inputs
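A minimal sketch of what such a parser could look like, assuming a two-variant `Brand` enum and a handful of aliases. The variant names, the alias strings, the normalisation rules, and the `brand_override` helper are all hypothetical and chosen for illustration; only `Brand`, `BrandParseError`, `from_str`, and `OC_RSYNC_BRAND` come from the summary above.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical brand variants; the real enum in this PR may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Brand {
    Upstream,
    Oc,
}

/// Error returned when the input matches no known brand alias.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BrandParseError {
    input: String,
}

impl fmt::Display for BrandParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unrecognised brand {:?}", self.input)
    }
}

impl std::error::Error for BrandParseError {}

impl FromStr for Brand {
    type Err = BrandParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Assumed normalisation: trim whitespace and ignore ASCII case.
        match s.trim().to_ascii_lowercase().as_str() {
            // Alias strings are illustrative assumptions.
            "upstream" | "rsync" => Ok(Brand::Upstream),
            "oc" | "oc-rsync" => Ok(Brand::Oc),
            _ => Err(BrandParseError { input: s.to_owned() }),
        }
    }
}

/// Hypothetical helper: resolve an override value such as the contents of
/// OC_RSYNC_BRAND. `None` or a blank string means "no override".
pub fn brand_override(raw: Option<&str>) -> Result<Option<Brand>, BrandParseError> {
    match raw {
        None => Ok(None),
        Some(v) if v.trim().is_empty() => Ok(None),
        Some(v) => v.parse::<Brand>().map(Some),
    }
}
```

A caller could feed the environment variable through the same path with `brand_override(std::env::var("OC_RSYNC_BRAND").ok().as_deref())`, so the override and any other input share one set of accepted and rejected strings.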

Testing

  • cargo test

https://chatgpt.com/codex/tasks/task_e_6900c62f2d14832387a6e2f3866f6acc

@oferchen oferchen merged commit 8092c45 into master Oct 28, 2025
@oferchen oferchen deleted the prepare-for-production-release branch October 28, 2025 13:45
oferchen added a commit that referenced this pull request May 7, 2026
Propose promoting the single-slot thread_local cache (#1329) into a
fixed-depth per-thread cache to keep most acquire/return cycles off the
shared ArrayQueue, with overflow falling through to the existing pool.
Documents the idle-memory trade-off and cites the sharded-queue
alternative in #1271 as fallback.
oferchen added a commit that referenced this pull request May 16, 2026
Add criterion microbench under crates/transfer/benches profiling four
result-collection strategies (shared Arc<Mutex<Vec>>, sharded
Mutex<Vec> by rayon worker id, crossbeam SegQueue, crossbeam unbounded
channel) over 100K items at 1/4/8/16 rayon worker counts. Throughput
is reported in elements/sec so the bench output names the worker
count at which the single shared mutex saturates.

The shipping parallel-stat path in crates/transfer/src/parallel_io.rs
does not use Arc<Mutex<Vec>>; it collects via
into_par_iter().map(f).collect(), which delegates to rayon's lock-free
reducer. Document this in docs/audits/parallel-stat-collection.md and
keep the microbench as the baseline against which any future PR that
proposes reintroducing a shared mutex on this path must be measured
(tracked under #1192, #1271, #1297, #1370, #1682).
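To illustrate the contrast this commit message draws, here is a std-only sketch of two collection shapes: one shared `Arc<Mutex<Vec>>` that every worker locks per item, and per-worker vectors merged after the workers finish. It is not the criterion bench from the commit and does not use rayon; plain `std::thread` workers stand in for the rayon pool, and the function names and the `i * 2` workload are hypothetical.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Strategy A: every worker pushes into one shared Vec behind a mutex,
/// taking the lock once per item.
pub fn collect_shared_mutex(items: usize, workers: usize) -> Vec<u64> {
    assert!(workers > 0);
    let out = Arc::new(Mutex::new(Vec::with_capacity(items)));
    let handles: Vec<_> = (0..workers)
        .map(|w| {
            let out = Arc::clone(&out);
            thread::spawn(move || {
                // Worker `w` handles indices w, w + workers, w + 2 * workers, ...
                for i in (w..items).step_by(workers) {
                    out.lock().unwrap().push(i as u64 * 2);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // All workers have exited, so this is the only remaining Arc handle.
    let mut v = Arc::try_unwrap(out).unwrap().into_inner().unwrap();
    v.sort_unstable();
    v
}

/// Strategy B: each worker fills a private Vec; results are merged once
/// after join, so no lock is taken while items are produced.
pub fn collect_per_worker(items: usize, workers: usize) -> Vec<u64> {
    assert!(workers > 0);
    let handles: Vec<_> = (0..workers)
        .map(|w| {
            thread::spawn(move || {
                (w..items)
                    .step_by(workers)
                    .map(|i| i as u64 * 2)
                    .collect::<Vec<u64>>()
            })
        })
        .collect();
    let mut v = Vec::with_capacity(items);
    for h in handles {
        v.extend(h.join().unwrap());
    }
    v.sort_unstable();
    v
}
```

Both functions return the same sorted result; the difference is only where synchronisation happens, which is the property a contention benchmark of this kind is meant to expose.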
oferchen added a commit that referenced this pull request May 16, 2026
… (#4170)

* chore(bench): add parallel-stat collector contention microbench

Add criterion microbench under crates/transfer/benches profiling four
result-collection strategies (shared Arc<Mutex<Vec>>, sharded
Mutex<Vec> by rayon worker id, crossbeam SegQueue, crossbeam unbounded
channel) over 100K items at 1/4/8/16 rayon worker counts. Throughput
is reported in elements/sec so the bench output names the worker
count at which the single shared mutex saturates.

The shipping parallel-stat path in crates/transfer/src/parallel_io.rs
does not use Arc<Mutex<Vec>>; it collects via
into_par_iter().map(f).collect(), which delegates to rayon's lock-free
reducer. Document this in docs/audits/parallel-stat-collection.md and
keep the microbench as the baseline against which any future PR that
proposes reintroducing a shared mutex on this path must be measured
(tracked under #1192, #1271, #1297, #1370, #1682).

* chore: sync Cargo.lock after dev-dep addition
oferchen added a commit that referenced this pull request May 17, 2026
Adds a Criterion bench in crates/transfer/benches that compares three
rayon dispatch shapes - `into_par_iter` over an owned Vec, `par_bridge`
over a Vec iterator, and `par_bridge` over a pure generator - across
{1, 4, 8, 16} workers on a fixed 100K small-file workload. Throughput
is reported in items/sec via `Throughput::Elements`. The numbers feed
the dispatch-shape decisions tracked in #1284, #1370, and #1681.
oferchen added a commit that referenced this pull request May 17, 2026
…ems (#4203)

Add a criterion bench that compares per-item send+recv cost across
`std::sync::mpsc::channel`, `crossbeam_channel::unbounded`, and
`crossbeam_channel::bounded(1024)` at 100K items. Sweeps payload sizes
{32 B, 256 B, 4 KB} and thread shapes {1S+1R, 4S+1R, 1S+4R, 4S+4R} to
expose the contention regimes that appear in the engine and receiver
pipelines. Payloads are pre-allocated outside the timed section so the
measurement isolates channel cost from allocation. Throughput is
reported via `Throughput::Elements(100_000)` so the criterion summary
gives items/sec directly.

Pure userspace synchronisation - no I/O, no platform-specific paths,
runs identically on Linux, macOS, and Windows.

Refs #1592 #1681 #1370
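As a rough illustration of one cell of that sweep, the sketch below times a multi-sender, single-receiver run over `std::sync::mpsc::channel` with payloads allocated before the clock starts. It is a hand-rolled stand-in, not the criterion bench from the commit: the `run_mpsc` name and its parameters are hypothetical, thread spawn cost is included in the timed section, and the crossbeam variants are omitted to keep the example std-only.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Send `total` pre-allocated payloads from `senders` threads to a single
/// receiver. Returns the number of items received and the elapsed time.
pub fn run_mpsc(senders: usize, total: usize, payload_len: usize) -> (usize, Duration) {
    assert!(senders > 0 && total % senders == 0);
    let per_sender = total / senders;

    // Allocate every payload up front so allocation stays outside the timing.
    let batches: Vec<Vec<Vec<u8>>> = (0..senders)
        .map(|_| (0..per_sender).map(|_| vec![0u8; payload_len]).collect())
        .collect();

    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    let start = Instant::now();
    let handles: Vec<_> = batches
        .into_iter()
        .map(|batch| {
            let tx = tx.clone();
            thread::spawn(move || {
                for payload in batch {
                    tx.send(payload).unwrap();
                }
            })
        })
        .collect();
    // Drop the original sender so the receive loop ends once all clones are gone.
    drop(tx);

    let received = rx.iter().count();
    let elapsed = start.elapsed();
    for h in handles {
        h.join().unwrap();
    }
    (received, elapsed)
}
```

Calling `run_mpsc(4, 100_000, 32)` corresponds to the 4S+1R shape at the smallest payload size; dividing the item count by the elapsed seconds gives an items/sec figure comparable in spirit to what `Throughput::Elements` reports.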
oferchen added a commit that referenced this pull request May 17, 2026
Consolidates issues #1271 and #1370 into a single design proposal for
promoting the engine BufferPool's thread-local cache from a single
slot in front of a shared ArrayQueue to a per-thread slab as the
primary storage, with the existing ArrayQueue demoted to a bounded
global overflow / balancer.

Covers: contrast with sharded-mutex (#1295) and per-thread-cache
(#1370) alternatives, LIFO slab structure with cross-thread return
handling, bounded memory accounting layered on top of #2245's byte
budget, preserved PooledBuffer Drop and BufferAllocator surface,
failure modes (thread teardown, panicking thread, long-lived
buffers), comparison table, trigger conditions for adoption, and a
five-step implementation sequence. Recommendation is to defer
implementation until profiling at 32+ sustained threads proves the
existing two-level layout actually contends.
oferchen added a commit that referenced this pull request May 17, 2026
Adds an opt-in per-thread LIFO slab in front of BufferPool, gated behind
the new `thread-slab-pool` feature (default off). The slab keeps a
depth-bounded `Vec<Vec<u8>>` per thread with explicit slot and byte caps
(default 8 slots, 1 MiB), and falls through to the existing lock-free
central queue for cross-thread returns, overflow, and periodic donation
of long-lived buffers.

The public surface (BufferPool, BufferGuard, BorrowedBufferGuard Drop)
is unchanged - callers always go through `pool.return_buffer` and the
guard Drop signatures stay byte-for-byte identical. Switching between
the single-slot and slab backends happens at compile time inside
`thread_local_cache.rs`, so `pool.rs` reads the same on both paths.

Implements the design at docs/design/per-thread-buffer-slab.md merged
via #4230. Closes part of #1271 and #1370.
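The slab behaviour described in this commit message can be sketched as follows, under stated assumptions. The 8-slot and 1 MiB caps are the defaults quoted above; everything else (the `acquire`, `release`, and `local_len` names, the `Slab` struct, and a `Mutex<Vec<_>>` standing in for the lock-free central queue) is hypothetical. Feature gating, cross-thread return detection, and periodic donation of long-lived buffers are not modelled.

```rust
use std::cell::RefCell;
use std::sync::Mutex;

pub const SLOT_CAP: usize = 8; // default slot cap quoted in the commit message
pub const BYTE_CAP: usize = 1 << 20; // default byte cap: 1 MiB

/// Stand-in for the central queue. The real pool is described as lock-free;
/// a mutex keeps this sketch std-only.
static CENTRAL: Mutex<Vec<Vec<u8>>> = Mutex::new(Vec::new());

struct Slab {
    bufs: Vec<Vec<u8>>, // LIFO stack of idle buffers owned by this thread
    bytes: usize,       // sum of the capacities held in `bufs`
}

thread_local! {
    static SLAB: RefCell<Slab> = RefCell::new(Slab { bufs: Vec::new(), bytes: 0 });
}

/// Take a buffer: local slab first, then the central queue, then allocate.
pub fn acquire(min_capacity: usize) -> Vec<u8> {
    let local = SLAB.with(|s| {
        let mut s = s.borrow_mut();
        let buf = s.bufs.pop(); // most recently returned buffer
        if let Some(b) = &buf {
            s.bytes -= b.capacity();
        }
        buf
    });
    let reused = match local {
        Some(buf) => Some(buf),
        None => CENTRAL.lock().unwrap().pop(),
    };
    match reused {
        Some(mut buf) => {
            buf.clear();
            buf.reserve(min_capacity); // grows only if the buffer is too small
            buf
        }
        None => Vec::with_capacity(min_capacity),
    }
}

/// Return a buffer. Yields `true` if it stayed in this thread's slab and
/// `false` if a cap was hit and it fell through to the central queue.
pub fn release(buf: Vec<u8>) -> bool {
    let cap = buf.capacity();
    let overflow = SLAB.with(|s| {
        let mut s = s.borrow_mut();
        if s.bufs.len() < SLOT_CAP && s.bytes + cap <= BYTE_CAP {
            s.bytes += cap;
            s.bufs.push(buf);
            None
        } else {
            Some(buf)
        }
    });
    match overflow {
        None => true,
        Some(buf) => {
            CENTRAL.lock().unwrap().push(buf);
            false
        }
    }
}

/// Number of buffers currently idle in the calling thread's slab.
pub fn local_len() -> usize {
    SLAB.with(|s| s.borrow().bufs.len())
}
```

Because both caps are checked on every return, an idle thread holds at most eight buffers and at most 1 MiB in this sketch, which is the bounded-memory property the commit message emphasises.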
oferchen added a commit that referenced this pull request May 18, 2026
Propose promoting the single-slot thread_local cache (#1329) into a
fixed-depth per-thread cache to keep most acquire/return cycles off the
shared ArrayQueue, with overflow falling through to the existing pool.
Documents the idle-memory trade-off and cites the sharded-queue
alternative in #1271 as fallback.
oferchen added a commit that referenced this pull request May 18, 2026
… (#4170)

* chore(bench): add parallel-stat collector contention microbench

Add criterion microbench under crates/transfer/benches profiling four
result-collection strategies (shared Arc<Mutex<Vec>>, sharded
Mutex<Vec> by rayon worker id, crossbeam SegQueue, crossbeam unbounded
channel) over 100K items at 1/4/8/16 rayon worker counts. Throughput
is reported in elements/sec so the bench output names the worker
count at which the single shared mutex saturates.

The shipping parallel-stat path in crates/transfer/src/parallel_io.rs
does not use Arc<Mutex<Vec>>; it collects via
into_par_iter().map(f).collect(), which delegates to rayon's lock-free
reducer. Document this in docs/audits/parallel-stat-collection.md and
keep the microbench as the baseline against which any future PR that
proposes reintroducing a shared mutex on this path must be measured
(tracked under #1192, #1271, #1297, #1370, #1682).

* chore: sync Cargo.lock after dev-dep addition
oferchen added a commit that referenced this pull request May 18, 2026
Adds a Criterion bench in crates/transfer/benches that compares three
rayon dispatch shapes - `into_par_iter` over an owned Vec, `par_bridge`
over a Vec iterator, and `par_bridge` over a pure generator - across
{1, 4, 8, 16} workers on a fixed 100K small-file workload. Throughput
is reported in items/sec via `Throughput::Elements`. The numbers feed
the dispatch-shape decisions tracked in #1284, #1370, and #1681.
oferchen added a commit that referenced this pull request May 18, 2026
…ems (#4203)

Add a criterion bench that compares per-item send+recv cost across
`std::sync::mpsc::channel`, `crossbeam_channel::unbounded`, and
`crossbeam_channel::bounded(1024)` at 100K items. Sweeps payload sizes
{32 B, 256 B, 4 KB} and thread shapes {1S+1R, 4S+1R, 1S+4R, 4S+4R} to
expose the contention regimes that appear in the engine and receiver
pipelines. Payloads are pre-allocated outside the timed section so the
measurement isolates channel cost from allocation. Throughput is
reported via `Throughput::Elements(100_000)` so the criterion summary
gives items/sec directly.

Pure userspace synchronisation - no I/O, no platform-specific paths,
runs identically on Linux, macOS, and Windows.

Refs #1592 #1681 #1370
oferchen added a commit that referenced this pull request May 18, 2026
Consolidates issues #1271 and #1370 into a single design proposal for
promoting the engine BufferPool's thread-local cache from a single
slot in front of a shared ArrayQueue to a per-thread slab as the
primary storage, with the existing ArrayQueue demoted to a bounded
global overflow / balancer.

Covers: contrast with sharded-mutex (#1295) and per-thread-cache
(#1370) alternatives, LIFO slab structure with cross-thread return
handling, bounded memory accounting layered on top of #2245's byte
budget, preserved PooledBuffer Drop and BufferAllocator surface,
failure modes (thread teardown, panicking thread, long-lived
buffers), comparison table, trigger conditions for adoption, and a
five-step implementation sequence. Recommendation is to defer
implementation until profiling at 32+ sustained threads proves the
existing two-level layout actually contends.
oferchen added a commit that referenced this pull request May 18, 2026
Adds an opt-in per-thread LIFO slab in front of BufferPool, gated behind
the new `thread-slab-pool` feature (default off). The slab keeps a
depth-bounded `Vec<Vec<u8>>` per thread with explicit slot and byte caps
(default 8 slots, 1 MiB), and falls through to the existing lock-free
central queue for cross-thread returns, overflow, and periodic donation
of long-lived buffers.

The public surface (BufferPool, BufferGuard, BorrowedBufferGuard Drop)
is unchanged - callers always go through `pool.return_buffer` and the
guard Drop signatures stay byte-for-byte identical. Switching between
the single-slot and slab backends happens at compile time inside
`thread_local_cache.rs`, so `pool.rs` reads the same on both paths.

Implements the design at docs/design/per-thread-buffer-slab.md merged
via #4230. Closes part of #1271 and #1370.