docs(design): per-thread buffer slab for engine BufferPool (#1271, #1370) #4230
Merged
Consolidates issues #1271 and #1370 into a single design proposal for promoting the engine BufferPool's thread-local cache from a single slot in front of a shared ArrayQueue to a per-thread slab as the primary storage, with the existing ArrayQueue demoted to a bounded global overflow / balancer. Covers: contrast with sharded-mutex (#1295) and per-thread-cache (#1370) alternatives, LIFO slab structure with cross-thread return handling, bounded memory accounting layered on top of #2245's byte budget, preserved PooledBuffer Drop and BufferAllocator surface, failure modes (thread teardown, panicking thread, long-lived buffers), comparison table, trigger conditions for adoption, and a five-step implementation sequence. Recommendation is to defer implementation until profiling at 32+ sustained threads proves the existing two-level layout actually contends.
oferchen added a commit that referenced this pull request on May 17, 2026
Adds an opt-in per-thread LIFO slab in front of BufferPool, gated behind the new `thread-slab-pool` feature (default off). The slab keeps a depth-bounded `Vec<Vec<u8>>` per thread with explicit slot and byte caps (default 8 slots, 1 MiB), and falls through to the existing lock-free central queue for cross-thread returns, overflow, and periodic donation of long-lived buffers. The public surface (BufferPool, BufferGuard, BorrowedBufferGuard Drop) is unchanged - callers always go through `pool.return_buffer` and the guard Drop signatures stay byte-for-byte identical. Switching between the single-slot and slab backends happens at compile time inside `thread_local_cache.rs`, so `pool.rs` reads the same on both paths. Implements the design at docs/design/per-thread-buffer-slab.md merged via #4230. Closes part of #1271 and #1370.
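The slab described in this commit message can be illustrated with a minimal sketch. It is not the code from the commit: the names `Slab`, `CentralQueue`, `acquire`, `release`, `MAX_SLOTS`, and `MAX_BYTES` are hypothetical, and a `Mutex<Vec<_>>` stands in for the lock-free central queue so the example needs only the standard library. Only the slot cap, the byte cap, and the overflow path are modeled; cross-thread returns and periodic donation of long-lived buffers are left out.

```rust
use std::cell::RefCell;
use std::sync::Mutex;

// Hypothetical constants mirroring the defaults named in the commit message.
pub const MAX_SLOTS: usize = 8;
pub const MAX_BYTES: usize = 1 << 20; // 1 MiB

/// Stand-in for the shared central queue (assumption: the real one is lock-free).
pub struct CentralQueue {
    buffers: Mutex<Vec<Vec<u8>>>,
}

impl CentralQueue {
    pub fn new() -> Self {
        Self { buffers: Mutex::new(Vec::new()) }
    }
    pub fn push(&self, buf: Vec<u8>) {
        self.buffers.lock().unwrap().push(buf);
    }
    pub fn pop(&self) -> Option<Vec<u8>> {
        self.buffers.lock().unwrap().pop()
    }
    pub fn len(&self) -> usize {
        self.buffers.lock().unwrap().len()
    }
}

/// Per-thread LIFO slab bounded by slot count and by total buffer capacity.
#[derive(Default)]
pub struct Slab {
    slots: Vec<Vec<u8>>,
    bytes: usize,
}

impl Slab {
    /// Keep `buf` locally, or hand it back if either cap would be exceeded.
    pub fn try_push(&mut self, buf: Vec<u8>) -> Result<(), Vec<u8>> {
        let cap = buf.capacity();
        if self.slots.len() >= MAX_SLOTS || self.bytes + cap > MAX_BYTES {
            return Err(buf);
        }
        self.bytes += cap;
        self.slots.push(buf);
        Ok(())
    }

    /// Pop the most recently returned buffer (LIFO order).
    pub fn pop(&mut self) -> Option<Vec<u8>> {
        let buf = self.slots.pop()?;
        self.bytes -= buf.capacity();
        Some(buf)
    }

    pub fn len(&self) -> usize {
        self.slots.len()
    }
}

thread_local! {
    static SLAB: RefCell<Slab> = RefCell::new(Slab::default());
}

/// Take a buffer from the local slab, then the central queue, then the allocator.
pub fn acquire(central: &CentralQueue, size: usize) -> Vec<u8> {
    let mut buf = SLAB
        .with(|s| s.borrow_mut().pop())
        .or_else(|| central.pop())
        .unwrap_or_default();
    buf.clear();
    buf.resize(size, 0);
    buf
}

/// Return a buffer to the local slab; overflow falls through to the central queue.
pub fn release(central: &CentralQueue, buf: Vec<u8>) {
    if let Err(overflow) = SLAB.with(|s| s.borrow_mut().try_push(buf)) {
        central.push(overflow);
    }
}
```

In this sketch the byte cap is charged against `Vec::capacity` rather than `len`, since capacity is what the slab actually retains; whether the real accounting does the same is an assumption.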
Summary
- … `BufferPool`, with the current `crossbeam_queue::ArrayQueue` demoted to a bounded global overflow / balancer.
- … `buffer-pool-sharding.md` (#1295, sharded `ArrayQueue` with global fallback) and `per-thread-buffer-pools.md` (#1370 first pass, per-thread cache in front of the queue).

Test plan
- … `pool.rs`, `byte_budget.rs`, and `buffer-pool-sharding.md`.