docs(design): per-thread buffer slab for engine BufferPool (#1271, #1370) by oferchen · Pull Request #4230 · oferchen/rsync

oferchen · 2026-05-17T19:24:57Z

Summary

Consolidates issues #1271 and #1370 into a single design note proposing a per-thread buffer slab as the primary storage layer for the engine BufferPool, with the current crossbeam_queue::ArrayQueue demoted to a bounded global overflow / balancer.
Distinguishes this proposal from the two existing nearby designs: buffer-pool-sharding.md (#1295, sharded ArrayQueue with global fallback) and per-thread-buffer-pools.md (#1370 first pass, per-thread cache in front of the queue).
Recommends deferring implementation until profiling at 32+ sustained threads on production-representative workloads shows that the existing two-level layout actually contends; the doc lays out the bench extensions and trigger thresholds that must be met before either this slab or sharding lands.

Test plan

Reviewers confirm the comparison table (single queue vs sharded vs slab) is accurate against pool.rs, byte_budget.rs, and buffer-pool-sharding.md.
Reviewers confirm the cross-thread return strategy choice (global overflow vs steal-from-other-thread) is the right default.
Reviewers confirm the trigger conditions in section 8 are the right gates before any implementation lands.

Consolidates issues #1271 and #1370 into a single design proposal for promoting the engine BufferPool's thread-local cache from a single slot in front of a shared ArrayQueue to a per-thread slab as the primary storage, with the existing ArrayQueue demoted to a bounded global overflow / balancer. Covers: contrast with sharded-mutex (#1295) and per-thread-cache (#1370) alternatives, LIFO slab structure with cross-thread return handling, bounded memory accounting layered on top of #2245's byte budget, preserved PooledBuffer Drop and BufferAllocator surface, failure modes (thread teardown, panicking thread, long-lived buffers), comparison table, trigger conditions for adoption, and a five-step implementation sequence. Recommendation is to defer implementation until profiling at 32+ sustained threads proves the existing two-level layout actually contends.
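To make the "preserved Drop surface" point concrete, here is a minimal sketch of the RAII pattern it relies on: a guard whose Drop hands the buffer back through a single `return_buffer` entry point, so the storage behind that entry point can change without touching callers. `SketchPool` and `SketchGuard` are hypothetical names invented for this illustration, the 4096-byte buffer size is an assumption, and the `Mutex<Vec<_>>` free list is only a stand-in; none of it is the actual PooledBuffer or BufferAllocator code.

```rust
use std::ops::{Deref, DerefMut};
use std::sync::Mutex;

/// Hypothetical stand-in for the pool; only the return path matters here.
struct SketchPool {
    free: Mutex<Vec<Vec<u8>>>,
}

impl SketchPool {
    fn new() -> Self {
        Self { free: Mutex::new(Vec::new()) }
    }

    /// Hands out a guard that returns its buffer automatically on drop.
    fn acquire(&self) -> SketchGuard<'_> {
        let reused = self.free.lock().ok().and_then(|mut free| free.pop());
        // 4096 is an assumed buffer size for the sketch.
        let buf = reused.unwrap_or_else(|| Vec::with_capacity(4096));
        SketchGuard { pool: self, buf: Some(buf) }
    }

    /// Single return entry point. Whatever storage sits behind it
    /// (single slot, slab, shared queue) is invisible to the guard.
    fn return_buffer(&self, mut buf: Vec<u8>) {
        buf.clear();
        // If the lock is poisoned (a thread panicked while holding it),
        // drop the buffer instead of panicking inside Drop.
        if let Ok(mut free) = self.free.lock() {
            free.push(buf);
        }
    }
}

struct SketchGuard<'a> {
    pool: &'a SketchPool,
    buf: Option<Vec<u8>>,
}

impl Deref for SketchGuard<'_> {
    type Target = Vec<u8>;
    fn deref(&self) -> &Vec<u8> {
        self.buf.as_ref().expect("buffer is present until drop")
    }
}

impl DerefMut for SketchGuard<'_> {
    fn deref_mut(&mut self) -> &mut Vec<u8> {
        self.buf.as_mut().expect("buffer is present until drop")
    }
}

impl Drop for SketchGuard<'_> {
    fn drop(&mut self) {
        if let Some(buf) = self.buf.take() {
            self.pool.return_buffer(buf);
        }
    }
}
```

Because the guard only ever calls `return_buffer`, swapping the storage layer behind it leaves the guard type and its Drop untouched, which is the property the design note says it keeps.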

Adds an opt-in per-thread LIFO slab in front of BufferPool, gated behind the new `thread-slab-pool` feature (default off). The slab keeps a depth-bounded `Vec<Vec<u8>>` per thread with explicit slot and byte caps (default 8 slots, 1 MiB), and falls through to the existing lock-free central queue for cross-thread returns, overflow, and periodic donation of long-lived buffers. The public surface (BufferPool, BufferGuard, BorrowedBufferGuard Drop) is unchanged - callers always go through `pool.return_buffer` and the guard Drop signatures stay byte-for-byte identical. Switching between the single-slot and slab backends happens at compile time inside `thread_local_cache.rs`, so `pool.rs` reads the same on both paths. Implements the design at docs/design/per-thread-buffer-slab.md merged via #4230. Closes part of #1271 and #1370.
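The slab layout described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `thread_local_cache.rs`: `ThreadSlab`, `SlabPool`, `central_cap`, and `buf_size` are hypothetical names, and a `Mutex<VecDeque<_>>` stands in for the lock-free central queue so the example needs only the standard library. The caps (8 slots, 1 MiB) are the defaults quoted in the description. The sketch covers only the overflow path; it does not detect cross-thread returns or donate long-lived buffers, and its thread-local slab is shared by every `SlabPool` on the thread.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::sync::Mutex;

// Defaults quoted in the PR description.
const MAX_SLOTS: usize = 8;
const MAX_BYTES: usize = 1 << 20; // 1 MiB

/// Hypothetical per-thread LIFO slab with slot and byte caps.
struct ThreadSlab {
    slots: Vec<Vec<u8>>,
    bytes: usize, // sum of the capacities currently held
}

impl ThreadSlab {
    fn new() -> Self {
        Self { slots: Vec::with_capacity(MAX_SLOTS), bytes: 0 }
    }

    /// LIFO pop: the most recently returned buffer comes back first.
    fn pop(&mut self) -> Option<Vec<u8>> {
        let buf = self.slots.pop()?;
        self.bytes -= buf.capacity();
        Some(buf)
    }

    /// Hands the buffer back to the caller if either cap would be exceeded.
    fn push(&mut self, buf: Vec<u8>) -> Result<(), Vec<u8>> {
        let cap = buf.capacity();
        if self.slots.len() >= MAX_SLOTS || self.bytes + cap > MAX_BYTES {
            return Err(buf);
        }
        self.bytes += cap;
        self.slots.push(buf);
        Ok(())
    }
}

thread_local! {
    static SLAB: RefCell<ThreadSlab> = RefCell::new(ThreadSlab::new());
}

/// Hypothetical pool: thread-local slab first, bounded central queue second.
struct SlabPool {
    central: Mutex<VecDeque<Vec<u8>>>, // stand-in for the lock-free queue
    central_cap: usize,
    buf_size: usize,
}

impl SlabPool {
    fn new(central_cap: usize, buf_size: usize) -> Self {
        Self {
            central: Mutex::new(VecDeque::with_capacity(central_cap)),
            central_cap,
            buf_size,
        }
    }

    fn acquire(&self) -> Vec<u8> {
        if let Some(buf) = SLAB.with(|s| s.borrow_mut().pop()) {
            return buf; // fast path: no shared state touched
        }
        if let Some(buf) = self.central.lock().unwrap().pop_front() {
            return buf;
        }
        Vec::with_capacity(self.buf_size)
    }

    fn return_buffer(&self, mut buf: Vec<u8>) {
        buf.clear();
        let overflow = SLAB.with(|s| s.borrow_mut().push(buf));
        if let Err(buf) = overflow {
            let mut central = self.central.lock().unwrap();
            if central.len() < self.central_cap {
                central.push_back(buf);
            }
            // Otherwise the buffer is dropped, which keeps memory bounded.
        }
    }
}
```

On the fast path a thread that acquires and returns a buffer touches only its own `RefCell`. Shared state is reached only when the slab is empty or full, which is the contention argument behind the slab.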


oferchen merged commit 09194b6 into master May 17, 2026
8 checks passed

github-actions bot added the documentation label May 17, 2026

oferchen mentioned this pull request May 17, 2026

feat(engine): per-thread buffer slab BufferPool alternative (#1271, #1370) #4286

Merged

4 tasks

oferchen deleted the docs/per-thread-buffer-slab-1271 branch May 19, 2026 19:27

oferchen merged 1 commit into master from docs/per-thread-buffer-slab-1271
