docs(design): per-thread buffer slab for engine BufferPool (#1271, #1370)#4230

Merged: oferchen merged 1 commit into master from docs/per-thread-buffer-slab-1271 on May 17, 2026
@oferchen
Owner

Summary

Test plan

  • Reviewers confirm the comparison table (single queue vs sharded vs slab) is accurate against pool.rs, byte_budget.rs, and buffer-pool-sharding.md.
  • Reviewers confirm the cross-thread return strategy choice (global overflow vs steal-from-other-thread) is the right default.
  • Reviewers confirm the trigger conditions in section 8 are the right gates before any implementation lands.

Consolidates issues #1271 and #1370 into a single design proposal for
promoting the engine BufferPool's thread-local cache from a single
slot in front of a shared ArrayQueue to a per-thread slab as the
primary storage, with the existing ArrayQueue demoted to a bounded
global overflow / balancer.

Covers: contrast with sharded-mutex (#1295) and per-thread-cache
(#1370) alternatives, LIFO slab structure with cross-thread return
handling, bounded memory accounting layered on top of #2245's byte
budget, preserved PooledBuffer Drop and BufferAllocator surface,
failure modes (thread teardown, panicking thread, long-lived
buffers), comparison table, trigger conditions for adoption, and a
five-step implementation sequence. Recommendation is to defer
implementation until profiling at 32+ sustained threads proves the
existing two-level layout actually contends.

oferchen merged commit 09194b6 into master on May 17, 2026
8 checks passed
github-actions (bot) added the documentation label on May 17, 2026
oferchen added a commit that referenced this pull request May 17, 2026
Adds an opt-in per-thread LIFO slab in front of BufferPool, gated behind
the new `thread-slab-pool` feature (default off). The slab keeps a
depth-bounded `Vec<Vec<u8>>` per thread with explicit slot and byte caps
(default 8 slots, 1 MiB), and falls through to the existing lock-free
central queue for cross-thread returns, overflow, and periodic donation
of long-lived buffers.
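
A minimal sketch of the slab behaviour described above, for illustration only. It is not the code in `thread_local_cache.rs`: the names `ThreadSlab`, `acquire`, `release`, `MAX_SLOTS`, `MAX_BYTES`, and `CENTRAL` are hypothetical, and a `Mutex<Vec<_>>` stands in for the lock-free central queue so the example needs only the standard library. Periodic donation of long-lived buffers is not modelled.

```rust
use std::cell::RefCell;
use std::sync::Mutex;

// Hypothetical constants mirroring the defaults quoted above (8 slots, 1 MiB).
const MAX_SLOTS: usize = 8;
const MAX_BYTES: usize = 1024 * 1024;

// Stand-in for the central queue. The real pool is described as lock-free;
// a mutex-guarded, unbounded Vec is an assumption made to stay std-only.
static CENTRAL: Mutex<Vec<Vec<u8>>> = Mutex::new(Vec::new());

// Per-thread LIFO stack of buffers with a slot cap and a byte cap.
struct ThreadSlab {
    slots: Vec<Vec<u8>>,
    bytes: usize, // sum of the capacities of the cached buffers
}

impl ThreadSlab {
    fn new() -> Self {
        Self { slots: Vec::new(), bytes: 0 }
    }

    // Cache the buffer locally, or hand it back if either cap would be exceeded.
    fn push(&mut self, buf: Vec<u8>) -> Result<(), Vec<u8>> {
        let cap = buf.capacity();
        if self.slots.len() >= MAX_SLOTS || self.bytes + cap > MAX_BYTES {
            return Err(buf);
        }
        self.bytes += cap;
        self.slots.push(buf);
        Ok(())
    }

    // Take the most recently returned buffer (LIFO keeps it cache-warm).
    fn pop(&mut self) -> Option<Vec<u8>> {
        let buf = self.slots.pop()?;
        self.bytes -= buf.capacity();
        Some(buf)
    }
}

thread_local! {
    static SLAB: RefCell<ThreadSlab> = RefCell::new(ThreadSlab::new());
}

// Local slab first, then the central queue, then a fresh allocation.
pub fn acquire(size: usize) -> Vec<u8> {
    let local = SLAB.with(|s| s.borrow_mut().pop());
    let mut buf = local
        .or_else(|| CENTRAL.lock().unwrap().pop())
        .unwrap_or_default();
    buf.clear();
    buf.resize(size, 0);
    buf
}

// Return to the local slab; overflow falls through to the central queue.
pub fn release(buf: Vec<u8>) {
    if let Err(overflow) = SLAB.with(|s| s.borrow_mut().push(buf)) {
        CENTRAL.lock().unwrap().push(overflow);
    }
}
```

In this sketch a buffer released on a thread other than the one that acquired it simply lands in the releasing thread's slab, or in the central stand-in when that slab is full. How the actual implementation routes cross-thread returns is only summarised in the commit message, so that part is left as an assumption.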

The public surface (BufferPool, BufferGuard, BorrowedBufferGuard Drop)
is unchanged - callers always go through `pool.return_buffer` and the
guard Drop signatures stay byte-for-byte identical. Switching between
the single-slot and slab backends happens at compile time inside
`thread_local_cache.rs`, so `pool.rs` reads the same on both paths.
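
The compile-time switch could look roughly like the following sketch. Only the feature name `thread-slab-pool` comes from the commit message; the `backend` module and the `try_take` / `try_store` functions are hypothetical names, and the real file may be organised differently. The point is that both backends expose the same two functions, so the caller never needs a `cfg` of its own.

```rust
// Default build: a single cached buffer per thread.
#[cfg(not(feature = "thread-slab-pool"))]
mod backend {
    use std::cell::RefCell;

    thread_local! {
        static SLOT: RefCell<Option<Vec<u8>>> = RefCell::new(None);
    }

    pub fn try_take() -> Option<Vec<u8>> {
        SLOT.with(|s| s.borrow_mut().take())
    }

    // Hands the buffer back when the slot is already occupied.
    pub fn try_store(buf: Vec<u8>) -> Result<(), Vec<u8>> {
        SLOT.with(|s| {
            let mut slot = s.borrow_mut();
            if slot.is_some() {
                return Err(buf);
            }
            *slot = Some(buf);
            Ok(())
        })
    }
}

// Opt-in build: a depth-bounded LIFO stack per thread (byte cap omitted here).
#[cfg(feature = "thread-slab-pool")]
mod backend {
    use std::cell::RefCell;

    const MAX_SLOTS: usize = 8; // hypothetical constant for the default depth

    thread_local! {
        static SLAB: RefCell<Vec<Vec<u8>>> = RefCell::new(Vec::new());
    }

    pub fn try_take() -> Option<Vec<u8>> {
        SLAB.with(|s| s.borrow_mut().pop())
    }

    // Hands the buffer back when the slab is full.
    pub fn try_store(buf: Vec<u8>) -> Result<(), Vec<u8>> {
        SLAB.with(|s| {
            let mut slab = s.borrow_mut();
            if slab.len() >= MAX_SLOTS {
                return Err(buf);
            }
            slab.push(buf);
            Ok(())
        })
    }
}

// Callers see one surface regardless of which backend was compiled in.
pub use backend::{try_store, try_take};
```

With this shape, a caller such as `pool.rs` would call `try_store` and push any `Err(buf)` onto the central queue, which is one way the file could "read the same on both paths".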

Implements the design at docs/design/per-thread-buffer-slab.md merged
via #4230. Closes part of #1271 and #1370.
oferchen deleted the docs/per-thread-buffer-slab-1271 branch on May 19, 2026 19:27