From d51ae22b65a7f1ee6dfe506f18e3eb3ba6ca74c3 Mon Sep 17 00:00:00 2001
From: Ofer Chen
Date: Tue, 5 May 2026 22:52:27 +0300
Subject: [PATCH] docs(audits): BufferPool capacity selection and sizing assumptions (#1637)

Audit current pool defaults (COPY_BUFFER_SIZE = 128 KiB, max_buffers =
available_parallelism()), document the workloads they were tuned for,
and call out where the defaults underprovision (1M-file parallel bursts)
and overprovision (sequential single-threaded transfers, long-lived
daemon mode). Cover the adaptive grow/shrink policy from #1638-#1641,
the OC_RSYNC_BUFFER_POOL_SIZE and OC_RSYNC_BUFFER_POOL_STATS env vars
from #1643, and the memory cap interaction from #1188. Close with sizing
rules of thumb for operators and open questions for the pending #1642
benchmark.
---
 docs/audits/buffer-pool-capacity-sizing.md | 433 +++++++++++++++++++++
 1 file changed, 433 insertions(+)
 create mode 100644 docs/audits/buffer-pool-capacity-sizing.md

diff --git a/docs/audits/buffer-pool-capacity-sizing.md b/docs/audits/buffer-pool-capacity-sizing.md
new file mode 100644
index 000000000..a372eff76
--- /dev/null
+++ b/docs/audits/buffer-pool-capacity-sizing.md
@@ -0,0 +1,433 @@
+# BufferPool capacity selection and sizing assumptions
+
+Last verified: 2026-05-05 against
+`crates/engine/src/local_copy/buffer_pool/{mod,pool,global,pressure,memory_cap,allocator,thread_local_cache,throughput,guard}.rs`,
+`crates/engine/src/local_copy/mod.rs`,
+`crates/engine/src/local_copy/context_impl/state.rs`,
+`crates/engine/src/local_copy/executor/file/copy/transfer/execute.rs`,
+`crates/engine/src/local_copy/executor/directory/parallel_checksum.rs`,
+`crates/engine/src/lib.rs`, `crates/transfer/tests/buffer_pool_cross_crate.rs`,
+and benches under `crates/engine/benches/`.
+
+Tracking issue: #1637. Related: #1010-#1012 (global pool, completed),
+#1187-#1189 (memory cap, completed), #1265 (lock-free swap, completed),
+#1295 (sharded design, completed), #1297 (benchmark, pending), #1329-#1330
+(`ArrayQueue` migration, completed), #1336 (split guard, completed), #1342
+(`BufferAllocator` trait, completed), #1363 (dynamic capacity, completed),
+#1638-#1641 (adaptive sizing, completed), #1642 (adaptive bench, pending),
+#1643 (CLI/env override, completed), #2045 (io_uring + adaptive design,
+in flight).
+
+## Scope
+
+Document where the `BufferPool` capacity numbers come from, where they hold,
+where they break down, what the adaptive policy added in #1638-#1641 does
+and does not solve, and what knobs are exposed to operators today via the
+`OC_RSYNC_BUFFER_POOL_SIZE` and `OC_RSYNC_BUFFER_POOL_STATS` environment
+variables wired in #1643. The output is a sizing reference for the pending
+#1642 benchmark and a pre-flight checklist for the #2045 io_uring fixed
+buffer registration design.
+
+## 1. Current default capacity
+
+Two distinct quantities matter here. Conflating them is the most common
+mistake in tuning the pool.
+
+- **Buffer size (per-buffer byte length)**: 128 KiB, set by the constant
+  `COPY_BUFFER_SIZE = 128 * 1024` at
+  `crates/engine/src/local_copy/mod.rs:165`. This is also the value of
+  `ADAPTIVE_BUFFER_MEDIUM` at `crates/engine/src/local_copy/mod.rs:172`
+  and is the default `buffer_size` field initialized in
+  `BufferPool::new()` at `crates/engine/src/local_copy/buffer_pool/pool.rs:172`.
+- **Pool capacity (number of buffers retained)**: `available_parallelism()`
+  with a fallback of 4, returned by `BufferPool::default()` at
+  `crates/engine/src/local_copy/buffer_pool/pool.rs:835-840` and used by
+  `GlobalBufferPoolConfig::default()` at
+  `crates/engine/src/local_copy/buffer_pool/global.rs:57-71`.
+
+Two further constants govern the queue's underlying storage and the
+adaptive resizer's hard limits:
+
+- `DEFAULT_QUEUE_CAPACITY = 256` at
+  `crates/engine/src/local_copy/buffer_pool/pool.rs:32`. The lock-free
+  `crossbeam_queue::ArrayQueue` is sized to the larger of `max_buffers`
+  and this constant via `queue_capacity()` at
+  `crates/engine/src/local_copy/buffer_pool/pool.rs:40-42`. The soft cap
+  is enforced separately on return.
+- `MAX_CAPACITY = 256` and `MIN_CAPACITY = 2` at
+  `crates/engine/src/local_copy/buffer_pool/pressure.rs:53` and
+  `crates/engine/src/local_copy/buffer_pool/pressure.rs:47`. These bound
+  the adaptive resizer's grow and shrink decisions.
+
+The adaptive size table is at
+`crates/engine/src/local_copy/mod.rs:168-180` and dispatched by
+`adaptive_buffer_size()` at `crates/engine/src/local_copy/mod.rs:203-215`:
+8 KiB, 32 KiB, 128 KiB, 512 KiB, 1 MiB, switching at 64 KiB, 1 MiB, 64 MiB,
+and 256 MiB file-size thresholds.
+
+## 2. Where the value came from
+
+The numbers are not arbitrary but they were not chosen against a
+representative workload either. The trail in the comments and history is:
+
+- `COPY_BUFFER_SIZE = 128 * 1024` predates the buffer pool and matches the
+  `read`/`write` block size used historically across the local-copy
+  executor. The constant doubles as `ADAPTIVE_BUFFER_MEDIUM` and is the
+  bucket selected for files between 1 MiB and 64 MiB. Tests at
+  `crates/engine/src/local_copy/buffer_pool/tests.rs:290-292` assert the
+  invariant `ADAPTIVE_BUFFER_MEDIUM == COPY_BUFFER_SIZE` so that the
+  thread-local fast path can short-circuit when an adaptive request lands
+  in the medium bucket (see `acquire_adaptive_from()` at
+  `crates/engine/src/local_copy/buffer_pool/pool.rs:432-449`).
+- `available_parallelism()` was introduced in PR #2979 (commit
+  `dfeba4e3a`, "feat: implement global bounded buffer pool singleton")
+  and motivated by the rayon thread pool topology: one buffer per worker
+  thread is the saturation point for the dominant workload (one file per
+  worker at a time).
+- `DEFAULT_QUEUE_CAPACITY = 256` was sized to match `MAX_CAPACITY = 256`
+  in the adaptive resizer landed by PR #3248 (commit `a1c4ec3cf`,
+  "feat: add adaptive BufferPool resizing based on allocation pressure").
+  The comment at `crates/engine/src/local_copy/buffer_pool/pool.rs:30-31`
+  cites 8 MiB of pooled memory at 64 buffers x 128 KiB; the upper bound
+  reaches 32 MiB at 256 buffers x 128 KiB, matching the budget cited at
+  `crates/engine/src/local_copy/buffer_pool/pressure.rs:51-52`.
+- The env var `OC_RSYNC_BUFFER_POOL_SIZE` and the telemetry counters
+  printed via `OC_RSYNC_BUFFER_POOL_STATS=1` landed in PR #3253
+  (commit `6edfd1667`, "feat: add BufferPool telemetry counters and env
+  var pool sizing"). See
+  `crates/engine/src/local_copy/buffer_pool/global.rs:49`,
+  `crates/engine/src/local_copy/buffer_pool/global.rs:61-65`, and the
+  drop-time print at
+  `crates/engine/src/local_copy/buffer_pool/pool.rs:849-861`.
+- The throughput tracker came in PR #3032 (commit `9687f9434`,
+  "feat: add EMA throughput tracker and dynamic buffer sizing to
+  BufferPool"). It is opt-in via `with_throughput_tracking()` at
+  `crates/engine/src/local_copy/buffer_pool/pool.rs:270-273` and is not
+  active by default.
+
+## 3. Workloads it was tuned for
+
+The defaults assume a steady-state local-copy run that looks like a rayon
+parallel walk with one in-flight buffer per worker thread:
+
+- **100k small files in parallel**: `parallel_checksum.rs:92` pulls the
+  global pool once and threads `Arc` into `hash_file_contents`
+  at `crates/engine/src/local_copy/executor/directory/parallel_checksum.rs:142,151,153`.
+  Each worker holds one buffer, returns it on file end, and the
+  thread-local slot at
+  `crates/engine/src/local_copy/buffer_pool/thread_local_cache.rs:24-27`
+  serves the next acquire with zero synchronization. The central queue
+  is touched only on the first acquire per thread.
+- **1 GiB single-file local copy**: the executor at
+  `crates/engine/src/local_copy/executor/file/copy/transfer/execute.rs:376-379`
+  takes one buffer via `BufferPool::acquire_adaptive_from()` for the
+  duration of the copy. With `available_parallelism()` >= 1, that fits
+  trivially. The 1 MiB `ADAPTIVE_BUFFER_HUGE` bucket at
+  `crates/engine/src/local_copy/mod.rs:175-180` halves the syscall count
+  on the read/write fallback path versus `ADAPTIVE_BUFFER_LARGE`.
+- **Multi-thread parallel stat / parallel checksum**: same pattern as the
+  100k case. Workers process candidates sequentially and recycle a single
+  buffer through the thread-local slot. The lock-free `ArrayQueue` only
+  comes into play when a thread retires its slot while another thread's
+  slot is full, an uncommon event in steady state.
+
+In all three cases the buffer pool's role is amortizing the
+`vec![0u8; 128 * 1024]` that `DefaultAllocator::allocate()` at
+`crates/engine/src/local_copy/buffer_pool/allocator.rs:51-53` would
+otherwise issue per file. The hot-path per-acquire cost in steady state
+is one `RefCell` borrow plus one `set_len()` (see the unsafe block at
+`crates/engine/src/local_copy/buffer_pool/pool.rs:550-556` that elides
+the `resize(size, 0)` memset that profiling measured at 26 % of runtime
+before #3253).
+
+## 4. Where it underprovisions
+
+The default capacity equals hardware parallelism, which is exactly enough
+for the "one buffer per worker" assumption and nothing more. Workloads
+that violate that assumption see allocator pressure that the pool cannot
+absorb until the adaptive resizer reacts:
+
+- **1M small files in parallel**: rayon spawns `available_parallelism()`
+  workers but the thread-local cache is per OS thread, not per rayon
+  task. Bursts of `par_iter()` work can route returns through the
+  central queue. With `max_buffers = N_cpus`, every return past the
+  thread-local slot lands directly at the soft cap. Subsequent acquires
+  on a cold thread allocate fresh through `pop_buffer()` at
+  `crates/engine/src/local_copy/buffer_pool/pool.rs:621-641`. Miss rate
+  spikes during workload acceleration until the adaptive resizer fires
+  at the next 64-op boundary.
+- **Sub-tasking inside a single file**: e.g. the parallel checksum
+  pipeline acquiring a second buffer for verify or hash-strong rehash.
+  With one slot per thread, the second acquire goes to the central
+  queue or a fresh allocation. At `max_buffers = N_cpus` the central
+  queue is empty in steady state, so the second-buffer path is
+  fresh-allocate every time.
+- **Memory-cap'd configurations with `try_acquire_from()`**: the
+  non-blocking variant at
+  `crates/engine/src/local_copy/buffer_pool/pool.rs:391-422` returns
+  `None` at the cap. Callers that retry without backoff (e.g. the
+  io_uring submission loop in #2045) will burn CPU spinning. There is
+  no test coverage today for the rate-limited acquire pattern; #1642
+  should add it.
+
+The grow path doubles capacity at most once every 64 acquires (see
+`CHECK_INTERVAL = 64` at
+`crates/engine/src/local_copy/buffer_pool/pressure.rs:29`). Going from 8
+buffers to the 256 ceiling takes five doublings, which is 320 acquires
+of accumulated pressure, plus the 64 ops between checks. On a 1M-file
+workload this is invisible. On a short burst it is the entire workload.
+
+## 5. Where it overprovisions
+
+The other failure mode is keeping memory pinned that nothing will reuse.
+The pool's idle footprint is bounded but not negligible:
+
+- **Sequential single-threaded transfers**: an `oc-rsync src dst` invocation
+  with `RAYON_NUM_THREADS=1` still picks `max_buffers = N_cpus` from
+  `available_parallelism()`. On a 16-core host that is 16 x 128 KiB =
+  2 MiB of central pool capacity that the workload will never fill,
+  plus one TLS slot in active use.
+- **Long-lived process with bursty workloads**: the soft cap is the
+  retention target for the central queue, not a high-water mark. Once
+  the adaptive resizer has grown the pool to a peak, the shrink path at
+  `crates/engine/src/local_copy/buffer_pool/pressure.rs:155-163` only
+  fires when `utilization < 30 %` and `miss_rate < 10 %` simultaneously.
+  A workload that oscillates between idle and saturated misses the
+  shrink window and stays at peak.
+- **Daemon mode with short-lived sessions**: each session inherits the
+  process-wide singleton (see
+  `crates/engine/src/local_copy/context_impl/state.rs:36`), so the pool
+  retains the worst-case sizing across sessions. There is no per-session
+  reset.
+
+The hard ceiling at 256 buffers x 128 KiB caps idle memory at 32 MiB,
+which is safe for a server-class deployment but not negligible on a
+constrained embedded target. The 4 KiB / 256 KiB clamp in
+`recommended_buffer_size()` at
+`crates/engine/src/local_copy/buffer_pool/pool.rs:326-339` interacts
+poorly with `ADAPTIVE_BUFFER_HUGE = 1 MiB`: a throughput-tracked pool
+would never recommend a 1 MiB buffer even when the file-size adaptive
+table would.
+
+## 6. Adaptive grow/shrink policy from #1638-#1641
+
+Implemented in `crates/engine/src/local_copy/buffer_pool/pressure.rs`
+and wired into the acquire path via `pop_buffer()` at
+`crates/engine/src/local_copy/buffer_pool/pool.rs:621-641` and
+`maybe_resize()` at
+`crates/engine/src/local_copy/buffer_pool/pool.rs:650-678`.
+
+Policy summary:
+
+- **Check cadence**: every 64 acquires
+  (`CHECK_INTERVAL` at `crates/engine/src/local_copy/buffer_pool/pressure.rs:29`,
+  power of two for bitwise modular check at
+  `crates/engine/src/local_copy/buffer_pool/pressure.rs:107-110`).
+- **Grow trigger**: `miss_rate > 20 %`
+  (`MISS_RATE_GROW_THRESHOLD` at
+  `crates/engine/src/local_copy/buffer_pool/pressure.rs:35`). New
+  capacity is `min(current * 2, 256)` per
+  `crates/engine/src/local_copy/buffer_pool/pressure.rs:136-142`.
+- **Shrink trigger**: `utilization < 30 %` AND `miss_rate < 10 %`
+  (`UTILIZATION_SHRINK_THRESHOLD` at
+  `crates/engine/src/local_copy/buffer_pool/pressure.rs:41` combined with
+  the `MISS_RATE_GROW_THRESHOLD / 2` guard at
+  `crates/engine/src/local_copy/buffer_pool/pressure.rs:155-156`). New
+  capacity is `max(current / 2, 2)`.
+- **Shrink reclamation**: excess buffers above the new cap are popped
+  and deallocated immediately at
+  `crates/engine/src/local_copy/buffer_pool/pool.rs:664-676`, decrementing
+  `central_count` per reclamation.
+
+What it solves:
+
+- Mismatched defaults on hosts where `N_cpus` is much smaller than the
+  effective parallelism (e.g. a workload that fans out via
+  `rayon::scope` and exceeds `available_parallelism()` briefly).
+- Long-running daemon processes whose workload mix changes shape over
+  time. Periodic re-evaluation eventually converges to the new
+  steady-state size.
+
+What it does not solve:
+
+- **Cold-start bursts**: the first 64 ops on a fresh pool are evaluated
+  but cannot trigger growth because there are no prior samples; the
+  resizer's first useful evaluation is at op 128.
+- **Churn under uneven workloads**: a workload that alternates between
+  bursts and idle in cycles shorter than `CHECK_INTERVAL` ops will not
+  reach a stable size. The shrink path's dual threshold (utilization
+  AND low miss rate) prevents the worst thrashing, but the pool can
+  still oscillate between sizes that are both wrong for the average
+  workload.
+- **Per-thread starvation**: the resizer adjusts the central queue's
+  soft cap. It cannot influence which thread holds a TLS slot, so a
+  workload with fewer hot threads than buffers leaves the queue full
+  while a few threads spin acquire-allocate-deallocate.
+- **Hard memory cap interaction**: the resizer has no view of
+  `MemoryCap::outstanding()`. Growing the soft cap when checked-out
+  memory is already at the hard cap is a no-op against acquire
+  blocking, but it does increase eventual idle memory after returns.
+
+## 7. CLI/env override surface (#1643)
+
+There is no `--buffer-pool-size` CLI flag today. The override surface is
+two environment variables, both consumed by the engine crate without
+proxying through `cli` or `core`.
+
+- **`OC_RSYNC_BUFFER_POOL_SIZE`** at
+  `crates/engine/src/local_copy/buffer_pool/global.rs:49`. Parsed at
+  `GlobalBufferPoolConfig::default()` at
+  `crates/engine/src/local_copy/buffer_pool/global.rs:61-65`:
+  - Type: positive `usize`. Zero, negative, and non-numeric values are
+    silently ignored and the pool falls back to
+    `available_parallelism()`. The behaviour is fixed by tests at
+    `crates/engine/src/local_copy/buffer_pool/global.rs:243-289`.
+  - Default: `available_parallelism()` with a fallback of 4.
+  - Bounds: lower bound 1 (zero is rejected). Upper bound is whatever
+    the OS allocator can serve at the per-buffer cost of 128 KiB. The
+    adaptive resizer's `MAX_CAPACITY = 256` does not cap the env-var
+    value because the env var sets the soft cap directly. A user who
+    sets `OC_RSYNC_BUFFER_POOL_SIZE=10000` gets a pool that retains up
+    to 10000 buffers (about 1.22 GiB at 128 KiB each), and the underlying
+    `ArrayQueue` will be sized accordingly via
+    `queue_capacity()` at `crates/engine/src/local_copy/buffer_pool/pool.rs:40-42`.
+  - Read once at first access. Setting the variable after the singleton
+    has been initialized has no effect.
+- **`OC_RSYNC_BUFFER_POOL_STATS`** at
+  `crates/engine/src/local_copy/buffer_pool/pool.rs:850`. Boolean (`"1"`
+  enables, anything else is off). Checked only at pool drop, prints
+  `reuses`, `allocations`, `growths`, and `hit_rate` on stderr. No
+  effect on pool behaviour, telemetry only.
+
+There is no programmatic flag plumbed through `core::CoreConfig` or
+`cli::Args`. The `init_global_buffer_pool()` function at
+`crates/engine/src/local_copy/buffer_pool/global.rs:112-120` is exposed
+publicly for embedders but the binary entry points do not call it.
+
+## 8. Memory cap interaction (#1188)
+
+The memory cap is a hard upper bound on outstanding (checked-out) bytes
+implemented in `crates/engine/src/local_copy/buffer_pool/memory_cap.rs`.
+It is opt-in via `with_memory_cap()` at
+`crates/engine/src/local_copy/buffer_pool/pool.rs:253-257` and is not
+configured by default. The default `BufferPool` and the global singleton
+both run uncapped.
+
+Interaction surface:
+
+- **What the cap counts**: bytes outstanding (checked out by callers).
+  Idle buffers in the central queue or in TLS slots do not count, since
+  they are immediately reusable. The accounting is at
+  `crates/engine/src/local_copy/buffer_pool/memory_cap.rs:14-22`.
+- **Backpressure semantics**: `wait_and_reserve()` at
+  `crates/engine/src/local_copy/buffer_pool/memory_cap.rs:56-109` uses a
+  CAS fast path then a condvar slow path. Returners notify all waiters
+  via `track_return()` at
+  `crates/engine/src/local_copy/buffer_pool/memory_cap.rs:142-152`.
+- **Adaptive growth interaction**: the adaptive resizer can grow the
+  soft cap above what the memory cap will admit. When this happens,
+  `acquire_from()` blocks at the cap regardless of the soft cap value.
+  No deadlock is possible because returns are unconditional, but
+  throughput collapses to the rate of returns. This is the failure
+  mode that #1642 must measure.
+- **`recommended_buffer_size()` clamp**: at
+  `crates/engine/src/local_copy/buffer_pool/pool.rs:326-339`, the
+  recommended size is capped at `memory_cap / 4`. This protects against
+  a single buffer pinning more than a quarter of the cap, but it also
+  caps the recommendation below `ADAPTIVE_BUFFER_HUGE = 1 MiB` for any
+  cap below 4 MiB.
+
+The cap and the adaptive resizer were designed independently. They
+compose correctly (no shared mutable state) but not optimally (the
+resizer cannot detect cap-induced misses, since cap waits do not
+register as pool misses; only fresh allocations do).
+
+## 9. Recommended sizing rules of thumb for users
+
+Given the analysis above, here are the heuristics worth documenting in
+operator-facing notes:
+
+- **Default is correct for most local copies**. If `OC_RSYNC_BUFFER_POOL_SIZE`
+  is unset and the workload is one rsync invocation per minute or less,
+  the pool's idle memory is bounded by 256 x 128 KiB = 32 MiB and the
+  hot path is amortized. Do not tune.
+- **Override when `N_cpus` >> hot threads**. On a 64-core host running
+  oc-rsync with `--no-detach` and a single-threaded workload, set
+  `OC_RSYNC_BUFFER_POOL_SIZE=4` or `8` to cap idle memory.
+- **Override when sub-tasking is heavy**. Workloads using
+  `--checksum` plus `--whole-file` plus a deep parallel verify can hold
+  more than one buffer per thread. Set
+  `OC_RSYNC_BUFFER_POOL_SIZE=2*N_cpus` and verify with
+  `OC_RSYNC_BUFFER_POOL_STATS=1` that `hit_rate > 95 %` and `growths == 0`.
+- **Avoid setting above 256 unless you have measured the win**. The
+  adaptive resizer caps growth at 256 for a reason; an env-var override
+  larger than that just disables the implicit safety bound on the
+  central queue.
+- **Memory cap is for adversarial environments**. Containerized
+  deployments with strict cgroups limits should plumb a memory cap via
+  `init_global_buffer_pool()` plus a custom `with_memory_cap()` call;
+  the env var alone does not expose this.
+- **Use stats output before tuning**. Run with
+  `OC_RSYNC_BUFFER_POOL_STATS=1` and inspect the stderr line at process
+  exit. A `hit_rate > 95 %` means the default is fine; below 80 % means
+  the workload is allocating fresh buffers more often than the resizer
+  can react. A non-zero `growths` count means the workload exceeded the
+  initial capacity at least once; if it grows every run, raise the env
+  var to the steady-state size.
+
+## 10. Open questions for #1642 benchmark
+
+The benchmark under #1642 should answer the following, none of which are
+settled by the existing micro-benchmarks at
+`crates/engine/benches/buffer_pool_benchmark.rs` or
+`crates/engine/benches/buffer_pool_contention.rs`:
+
+1. What is the cold-start miss rate on a 1M-file workload at
+   `available_parallelism()` capacity, and how many `CHECK_INTERVAL`
+   boundaries elapse before the resizer reaches steady-state? The
+   theoretical lower bound is 5 doublings x 64 ops = 320 acquires; the
+   observed value depends on rayon scheduling.
+2. Does the grow path ever actually fire on the dominant local-copy
+   workload, or is the global singleton's startup capacity already
+   sufficient? Use `total_growths()` at
+   `crates/engine/src/local_copy/buffer_pool/pool.rs:779-781`.
+3. Does shrink ever fire in long-running daemon mode, or does idle
+   memory accrete monotonically? Add a workload alternation (saturate,
+   idle, saturate) and verify the resizer reaches the shrink threshold.
+4. What is the cap-blocked acquire rate when `with_memory_cap()` is set
+   below `max_buffers x buffer_size`? This is not measurable from
+   `total_misses()` because cap waits do not record as misses; the
+   benchmark must instrument `MemoryCap::outstanding()` directly.
+5. What is the measured win of the unsafe `set_len()` shortcut at
+   `crates/engine/src/local_copy/buffer_pool/pool.rs:550-556` over the
+   `resize(size, 0)` it replaces? The 26 % CPU figure cited in the
+   comment was measured pre-#3253; revalidate on the current code path
+   to justify keeping the unsafe block.
+6. How does the adaptive policy interact with the throughput tracker's
+   `recommended_buffer_size()` when both are enabled? The recommendation
+   targets 10 ms of data per buffer
+   (`TARGET_BUFFER_DURATION_SECS = 0.01` at
+   `crates/engine/src/local_copy/buffer_pool/throughput.rs:48`), which
+   for a 1 GiB/s sustained throughput recommends 10 MiB clamped to the
+   `MAX_BUFFER_SIZE = 256 * 1024` ceiling at
+   `crates/engine/src/local_copy/buffer_pool/throughput.rs:42`. The
+   adaptive table for >= 256 MiB files would prefer 1 MiB buffers; the
+   tracker forces 256 KiB. Which wins on real hardware?
+7. How does the soft cap interact with the lock-free `ArrayQueue`'s
+   fixed hard capacity when `OC_RSYNC_BUFFER_POOL_SIZE > 256`? The
+   queue is sized via `queue_capacity()` at
+   `crates/engine/src/local_copy/buffer_pool/pool.rs:40-42` to
+   `max(max_buffers, 256)`, so a value of 10000 produces a 10000-slot
+   queue and matching idle memory. Confirm via stress test that the
+   admission CAS at
+   `crates/engine/src/local_copy/buffer_pool/pool.rs:584-612` does not
+   regress when capacity is two orders of magnitude above the default.
+8. What is the right default for the pending io_uring registered-buffer
+   path under #2045? The fixed-buffer registration cost amortizes only
+   if the pool's churn rate is low; the benchmark should establish a
+   baseline reuse rate against which #2045 can claim a win.
+
+The answers feed back into whether the 128 KiB / `N_cpus` defaults
+should change, whether `OC_RSYNC_BUFFER_POOL_SIZE` should be promoted to
+a `--buffer-pool-size` CLI flag, and whether the adaptive resizer's
+thresholds need workload-specific overrides.
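
Reviewer aid, outside the hunk: the grow/shrink rule summarized in section 6 and the ramp-up arithmetic quoted in section 4 and in open question 1 can be checked with a few lines of Rust. This is a minimal sketch under stated assumptions, not the code in `pressure.rs`: the function names `next_capacity` and `checks_to_ceiling` are hypothetical, and passing miss rate and utilization as `f64` fractions is an illustrative choice. Only the constant values (64, 2, 256, 20 %, 30 %) and the `min(current * 2, 256)` / `max(current / 2, 2)` formulas come from the audit text.

```rust
// Illustrative sketch only; not the implementation in pressure.rs.
// Constant values are the ones quoted in the audit; function names are hypothetical.
const CHECK_INTERVAL: usize = 64;
const MIN_CAPACITY: usize = 2;
const MAX_CAPACITY: usize = 256;
const MISS_RATE_GROW_THRESHOLD: f64 = 0.20;
const UTILIZATION_SHRINK_THRESHOLD: f64 = 0.30;

/// Soft cap chosen at one check boundary, given the miss rate and
/// utilization observed since the previous check (fractions in 0.0..=1.0).
fn next_capacity(current: usize, miss_rate: f64, utilization: f64) -> usize {
    if miss_rate > MISS_RATE_GROW_THRESHOLD {
        // Grow: double, bounded by the hard ceiling.
        (current * 2).min(MAX_CAPACITY)
    } else if utilization < UTILIZATION_SHRINK_THRESHOLD
        && miss_rate < MISS_RATE_GROW_THRESHOLD / 2.0
    {
        // Shrink: halve, bounded by the floor. Both conditions must hold.
        (current / 2).max(MIN_CAPACITY)
    } else {
        current
    }
}

/// Number of consecutive growing checks needed to reach MAX_CAPACITY.
fn checks_to_ceiling(start: usize) -> usize {
    // Clamp to the floor so a start of 0 cannot loop forever.
    let mut capacity = start.max(MIN_CAPACITY);
    let mut checks = 0;
    while capacity < MAX_CAPACITY {
        // Worst-case pressure: every acquire in the window missed.
        capacity = next_capacity(capacity, 1.0, 1.0);
        checks += 1;
    }
    checks
}

fn main() {
    let checks = checks_to_ceiling(8);
    println!("{} checks, {} acquires", checks, checks * CHECK_INTERVAL);
}
```

Starting from 8 buffers the loop runs five times (16, 32, 64, 128, 256), which reproduces the "five doublings, 320 acquires" figure. The sketch deliberately ignores the cold-start window and the memory cap, both of which section 6 lists as gaps in the real policy.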