
CQL compression might cause reactor stalls on buffer allocation #13437

Closed

michoecho opened this issue Apr 4, 2023 · 1 comment
@michoecho
Contributor

Compression libraries operate on contiguous buffers. We sometimes compress large chunks of data, which creates a need for large contiguous buffers, and those are problematic for the allocator.

The allocator impact can be minimized by reusing buffers, as our CQL frame compression routines already do.
However, as shown by https://github.com/scylladb/scylla-enterprise/issues/2694, this reuse isn't aggressive enough: even if the buffers could be reused perfectly, they are periodically reallocated (once per 100,000 uses) anyway, often enough for the resulting stalls to be worrisome.
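
For illustration, a minimal sketch (not Scylla's actual code, and with hypothetical names) of the reuse pattern described above: the buffer grows to fit each request, but is unconditionally dropped every 100'000 uses, so a large contiguous allocation recurs no matter how perfectly the buffer could otherwise be reused.

```cpp
#include <cstddef>
#include <vector>

class reused_buffer {
    std::vector<char> _buf;
    std::size_t _use_count = 0;
public:
    char* get(std::size_t size) {
        if (++_use_count % 100'000 == 0) {
            _buf = {};              // periodic reset: frees the buffer...
        }
        if (_buf.size() < size) {
            _buf.resize(size);      // ...so the next large request must reallocate
        }
        return _buf.data();
    }
};
```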

michoecho added a commit to michoecho/scylla that referenced this issue Apr 4, 2023
Large contiguous buffers put large pressure on the allocator
and are a common source of reactor stalls. Therefore, Scylla avoids
their use, replacing it with fragmented buffers whenever possible.
However, the use of large contiguous buffers is impossible to avoid
when dealing with some external libraries (e.g. compression
libraries, like LZ4).

Fortunately, calls to external libraries are synchronous, so we can
minimize the allocator impact by reusing a single buffer between calls.

An implementation of such a reusable buffer has two conflicting goals:
to allocate as rarely as possible, and to waste as little memory as
possible. The bigger the buffer, the more likely it is to be able
to handle future requests without reallocation, but also the more
memory it ties up.

If request sizes are repetitive, the near-optimal solution is to
simply grow the buffer to match the biggest request seen so far,
and never shrink it.

However, if we anticipate pathologically large requests, which are
caused by an application/configuration bug and are never repeated
again after they are fixed, we might want to resize down after such
pathological requests stop, so that the memory they took isn't tied
up forever.

The current implementation of reusable buffers handles this by
resizing down to 0 every 100'000 requests.

This patch attempts to solve a few shortcomings of the current
implementation.
1. Resizing to 0 is too aggressive. During regular operation, we will
surely need to resize it back to the previous size again. If something
is allocated in the hole left by the old buffer, this might cause
a stall. We prefer to resize down only after pathological requests.
2. When resizing, the current implementation allocates the new buffer
before freeing the old one. This increases allocator pressure for no
reason.
3. When resizing up, the buffer is resized to exactly the requested
size. That is, if the current size is 1MiB, subsequent requests
of 1MiB+1B and 1MiB+2B will each cause a resize.
It's preferable to limit the set of possible sizes, so that requests
of almost the same size don't keep causing resizes (see the rounding
sketch after this list). The natural set of sizes is powers of 2,
because that's what the underlying buddy allocator uses, so rounding
the allocation up to a power of 2 wastes no memory.
4. The interval of 100'000 uses is both too low and too arbitrary.
This is up for discussion, but I think that it's preferable to base
the dynamics of the buffer on time, rather than the number of uses.
It's more predictable to humans.
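
As a hedged illustration of the rounding proposed in point 3: std::bit_ceil (C++20) returns the smallest power of two not less than its argument, which matches what the buddy allocator hands out anyway.

```cpp
#include <bit>
#include <cstddef>
#include <cstdio>

int main() {
    constexpr std::size_t mib = std::size_t(1) << 20;
    for (std::size_t request : {mib, mib + 1, mib + 2}) {
        std::printf("request %zu -> capacity %zu\n", request, std::bit_ceil(request));
    }
    // 1MiB stays 1MiB; both 1MiB+1B and 1MiB+2B map to a 2MiB capacity,
    // so the second of those requests no longer triggers another resize.
}
```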

The implementation proposed in this patch addresses these as follows:
1. Instead of resizing down to 0, we resize to the biggest size
   seen in the last period.
   As long as at least one maximal (up to a power of 2) "normal" request
   appears each period, the buffer will never have to be resized.
2. The capacity of the buffer is always rounded up to the nearest
   power of 2.
3. The resize down period is no longer measured in number of requests
   but in real time.
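
Putting points 1-3 together, a minimal sketch of the proposed scheme, with assumed names and a made-up period length; the actual Scylla implementation differs in detail (e.g. it uses seastar's clock and allocator):

```cpp
#include <algorithm>
#include <bit>
#include <chrono>
#include <cstddef>
#include <vector>

class reusable_buffer {
    using clock = std::chrono::steady_clock;
    std::vector<char> _buf;
    std::size_t _max_this_period = 0;   // biggest (rounded) request seen this period
    clock::time_point _period_start = clock::now();
    static constexpr auto period = std::chrono::seconds(10);  // assumed value
public:
    char* get(std::size_t size) {
        std::size_t want = std::bit_ceil(size);        // 2. capacities are powers of 2
        if (clock::now() - _period_start >= period) {  // 3. downsize on a time basis
            if (_buf.size() > _max_this_period) {
                _buf = {};                       // free the old buffer first
                _buf.resize(_max_this_period);   // 1. shrink to the period's maximum
            }
            _max_this_period = 0;
            _period_start = clock::now();
        }
        _max_this_period = std::max(_max_this_period, want);
        if (_buf.size() < want) {
            _buf = {};            // free before allocating, to reduce allocator pressure
            _buf.resize(want);
        }
        return _buf.data();
    }
};
```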

Additionally, since a shared buffer in asynchronous code is quite a
footgun, some rudimentary refcounting is added to assert that only
one reference to the buffer exists at a time, and that the buffer isn't
downsized while a reference to it exists.
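
One hypothetical shape for that check (names invented here, not necessarily Scylla's): a guard object marks the buffer as referenced and asserts that at most one reference exists at a time, while the buffer asserts that no reference is live before downsizing.

```cpp
#include <cassert>

class buffer_ref_guard {
    int& _refcount;
public:
    explicit buffer_ref_guard(int& refcount) : _refcount(refcount) {
        assert(_refcount == 0 && "only one reference may exist at a time");
        ++_refcount;
    }
    ~buffer_ref_guard() { --_refcount; }
    buffer_ref_guard(const buffer_ref_guard&) = delete;
    buffer_ref_guard& operator=(const buffer_ref_guard&) = delete;
};
// A downsizing routine would then begin with assert(_refcount == 0).
```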

Fixes scylladb#13437
@DoronArazii added this to the 5.3 milestone May 1, 2023
@avikivity
Member

Not backporting, performance only and not a regression.
