
Performance of ArrayRetainableByteBufferPool.acquire() can degenerate pathologically as the buckets grow in size #9311

Closed
lorban opened this issue Feb 3, 2023 · 10 comments · Fixed by #9320
Assignees: lorban
Labels: Bug (For general bugs on Jetty side), Performance

lorban (Contributor) commented Feb 3, 2023

Jetty version(s)
12

Description
Since ArrayRetainableByteBufferPool.acquire() is backed by ConcurrentPool.acquire(), each execution requires iterating over a number of entries proportional to the usage level of the bucket, e.g. a bucket that is 90% used requires iterating over about 90% of its entries on average.

When the number of buffers in a single bucket is quite small (in the hundreds), those iterations are so cheap that their cost is negligible, as the array is small enough to fit into the CPU cache. But when the number of buffers reaches the tens of thousands, these iterations become a pathological bottleneck.
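For illustration, the acquire path behaves roughly like the following sketch. This is a simplified stand-in, not the actual ConcurrentPool code: the LinearScanBucket class, its fields and the null-means-busy convention are assumptions made for this example.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Simplified model of a linear-scan bucket: a non-null slot holds an idle
// pooled object, a null slot means that object is currently in use.
final class LinearScanBucket<T>
{
    private final AtomicReferenceArray<T> slots;

    LinearScanBucket(int size)
    {
        slots = new AtomicReferenceArray<>(size);
    }

    // Each acquire walks the array from the start, skipping busy (null) slots,
    // so its average cost grows with the bucket's usage level.
    T acquire()
    {
        for (int i = 0; i < slots.length(); i++)
        {
            T pooled = slots.get(i);
            if (pooled != null && slots.compareAndSet(i, pooled, null))
                return pooled;
        }
        return null; // everything busy: the caller must reserve/allocate
    }

    void release(int index, T pooled)
    {
        slots.set(index, pooled);
    }
}
```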

lorban added the Bug (For general bugs on Jetty side) and Jetty 12 labels Feb 3, 2023
lorban (Contributor, Author) commented Feb 3, 2023

This was found in MultiPartServletTest.testLargePart() when the default buffer pool is used, as this test needs 262144 buffers (2GB split into 8KB chunks) to complete.

With the ArrayRetainableByteBufferPool.acquire() implementation being:

  1. ConcurrentPool.acquire()
  2. if null, reserve and allocate

this means the test needs on the order of 262144 × 262143 / 2 ≈ 3.4 × 10^10 compareAndSet calls to eventually finish, since each acquire scans past every already-busy entry before falling back to reserve-and-allocate.
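For scale, a back-of-the-envelope estimate under that simplifying assumption (the k-th acquire scans past the k-1 entries that are already busy):

```java
public class AcquireCostEstimate
{
    public static void main(String[] args)
    {
        long n = 262_144;             // 2GB split into 8KB buffers
        long scans = n * (n - 1) / 2; // 1 + 2 + ... + (n - 1), a triangular number
        System.out.println(scans);    // 34359607296, i.e. roughly 3.4e10 entry scans / CAS attempts
    }
}
```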

@lachlan-roberts could you please add a TODO referring to this issue to your test that uses a non-pooling pool as a workaround?

lorban (Contributor, Author) commented Feb 3, 2023

The root of this issue seems to be that the algorithm backing ConcurrentPool does not scale well to large sizes.

There are two major users of the ConcurrentPool:

  1. connection pooling
    This seems to be less of a problem for this use-case: a connection pool sized at 1000 entries is already quite substantial and very uncommon, and acquiring from it millions of times per second while it mostly contains busy connections sounds like a very niche use-case.
    It is nevertheless a possibility, even if a remote one.

  2. buffer pooling
    Having tens of thousands of buffers of a specific size (especially the small ones < 16KB) is neither unrealistic nor uncommon, so solving this issue seems more pressing.
    Since ConcurrentPool is fully abstracted by ArrayRetainableByteBufferPool, we could add some mitigation strategies to handle this problem, for instance:

    • hard-capping the number of buffers per bucket to a few hundred
    • adding another oeju.Pool (org.eclipse.jetty.util.Pool) implementation with a different algorithm that scales well to large bucket sizes (e.g. the previous queue-based pool algorithm would) and making ArrayRetainableByteBufferPool choose between the two based on a heuristic

Maybe we should also expose a constant like ConcurrentPool.MAX_REASONABLE_SIZE, set to a few hundred, and warn when a ConcurrentPool is instantiated with a size over that constant?
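A minimal sketch of that last suggestion, assuming a hypothetical guard in the constructor (the class name, constant value and warning below are illustrative, not actual Jetty code):

```java
// Hypothetical guard: warn when a pool is created larger than a size the
// linear-scan algorithm is known to handle well.
class BoundedConcurrentPoolSketch<T>
{
    // Illustrative value only; the real threshold would need benchmarking.
    static final int MAX_REASONABLE_SIZE = 256;

    BoundedConcurrentPoolSketch(int maxSize)
    {
        if (maxSize > MAX_REASONABLE_SIZE)
            System.err.printf("Pool size %d exceeds MAX_REASONABLE_SIZE=%d; acquire() performance may degrade%n",
                maxSize, MAX_REASONABLE_SIZE);
        // ... allocate the entry array, etc.
    }
}
```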

lorban (Contributor, Author) commented Feb 3, 2023

@gregw @sbordet

lachlan-roberts added a commit that referenced this issue Feb 6, 2023
Signed-off-by: Lachlan Roberts <lachlan@webtide.com>
sbordet (Contributor) commented Feb 6, 2023

Re-instating a Pool implementation based on a queue seems the best solution.

It would have the downside of being based either on a ConcurrentLinkedDeque (and hence a linked-list data structure, not particularly memory friendly) or on a locked queue (which would be contended).

Another idea could be to see whether using a bitmap would help, see class java.util.BitSet.
With simple bit operations (which typically map to single CPU instructions) it would be possible to find the first idle entry in the list.
For example, let 1 be non-idle and 0 be idle: if the bitmap is 11111110000, it is possible to quickly find that the first 0 is at index 7 (from the left).
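A quick sketch of that idea using java.util.BitSet (single-threaded for clarity; the example bitmap matches the one above):

```java
import java.util.BitSet;

// Bit set = entry in use, bit clear = entry idle. nextClearBit() scans a whole
// 64-bit word at a time, so finding the first idle entry stays cheap even for
// large buckets.
public class BitmapIndexSketch
{
    public static void main(String[] args)
    {
        BitSet inUse = new BitSet(11);
        inUse.set(0, 7);                       // 11111110000, as in the example above

        int firstIdle = inUse.nextClearBit(0); // -> 7
        System.out.println(firstIdle);
        inUse.set(firstIdle);                  // mark it busy

        // Note: BitSet is not thread-safe; a concurrent pool would need something
        // like an AtomicLongArray with compareAndSet over 64-bit words instead.
    }
}
```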

gregw (Contributor) commented Feb 6, 2023

There are 2 reasons for using a Pool:

  1. to put a limit on the total number of resources that can be allocated
  2. to avoid stressing the GC algorithm by frequently allocating/releasing objects that can easily be reused.

So for ByteBuffers we don't really want 1., because we don't want to block any allocation in any part of the code; instead we police the maximum memory commitment with other limits (on numbers/sizes).
Thus we are really only interested in 2., so if we start finding ourselves implementing algorithms with the same problems as a GC (finding holes in large memory areas), then we should probably not bother and just delegate to the smart folks who write GCs.

Let's set up a meeting to discuss the options.

gregw (Contributor) commented Feb 6, 2023

Currently the algorithm in ArrayByteBufferPool (aka ArrayRetainableByteBufferPool) is:

  • if there is no Bucket for the size/type
    • return a newly allocated buffer
  • if we can acquire an entry from the bucket
    • return its buffer
  • if we can reserve a new entry from the bucket
    • return the newly enabled entry with a newly allocated buffer and releaser
  • return a newly allocated buffer

I think there is scope to add in a queue as a "second level cache":

  • if there is no Bucket for the size/type
    • return a newly allocated buffer
  • if we can acquire an entry from the bucket
    • return its buffer
  • if we can reserve a new entry from the bucket
    • return the newly enabled entry with a newly allocated buffer and releaser
  • if we can pop a buffer from the secondary queue (held by the bucket)
    • return it
  • return a newly allocated buffer

Using buffers from a queue not only saves the allocation of the ByteBuffer, but also the allocation of the retainable wrapper, at the potential cost of allocating a node in the queue structure.

So we would configure this pool with:

  • a maximum bucket pool size (possibly 0), which must be small enough to be efficient
  • a maximum bucket queue size (possibly 0 or unbounded), to control how many buffers are in the secondary "cache"
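A simplified sketch of this two-level scheme (the class, field names and queue choice are assumptions for illustration; the real implementation also deals with entries, reservation and retainable wrappers, which are omitted here):

```java
import java.nio.ByteBuffer;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;

// A bucket with a small primary pool (cheap to scan) plus a queued
// second-level "cache" consulted before allocating a new buffer.
class QueuedBucketSketch
{
    private final AtomicReferenceArray<ByteBuffer> primary;   // kept small for efficiency
    private final Queue<ByteBuffer> secondary = new ConcurrentLinkedQueue<>();
    private final AtomicInteger queued = new AtomicInteger(); // ConcurrentLinkedQueue.size() is O(n)
    private final int maxQueued;
    private final int bufferCapacity;

    QueuedBucketSketch(int poolSize, int maxQueued, int bufferCapacity)
    {
        this.primary = new AtomicReferenceArray<>(poolSize);
        this.maxQueued = maxQueued;
        this.bufferCapacity = bufferCapacity;
    }

    ByteBuffer acquire()
    {
        // 1. Try the small primary pool.
        for (int i = 0; i < primary.length(); i++)
        {
            ByteBuffer pooled = primary.get(i);
            if (pooled != null && primary.compareAndSet(i, pooled, null))
                return pooled;
        }
        // 2. Try the second-level queue before allocating.
        ByteBuffer fromQueue = secondary.poll();
        if (fromQueue != null)
        {
            queued.decrementAndGet();
            return fromQueue;
        }
        // 3. Nothing pooled: allocate a fresh buffer.
        return ByteBuffer.allocateDirect(bufferCapacity);
    }

    void release(ByteBuffer buffer)
    {
        buffer.clear();
        // Prefer an empty slot in the primary pool.
        for (int i = 0; i < primary.length(); i++)
        {
            if (primary.get(i) == null && primary.compareAndSet(i, null, buffer))
                return;
        }
        // Primary full: keep a bounded number in the second-level queue, else let GC take it.
        if (queued.incrementAndGet() <= maxQueued)
            secondary.offer(buffer);
        else
            queued.decrementAndGet();
    }
}
```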

gregw (Contributor) commented Feb 6, 2023

See #9319 for a queued 2nd layer "cache" on the pool.

gregw (Contributor) commented Feb 7, 2023

See also #9320, which is a Pool.Wrapper that can apply a queue over any other pool. I like this better: it might not be as fast as #9319, but it is more general.

After chatting with Simone, I think the key thing to do now is to remove the bucket size from the ByteBufferPool constructor (or at least deprecate that constructor) and work out the pool size vs queue size heuristically.

lorban (Contributor, Author) commented Feb 7, 2023

I like #9320 better, but we should probably benchmark the differences with #9319, both with a micro-benchmark and a macro one (our perf test?): the introduction of ArrayRetainableByteBufferPool in 10.0.x was motivated by the noticeable perf boost it brought over the previous queue-based pool, and we don't want to go back to the previous level of perf.

I agree that bucket size should be an internal-only concern that we should figure out with a heuristic.

It looks like such a change would slightly move ArrayRetainableByteBufferPool from the land of buffer pools into the land of memory allocators.

gregw (Contributor) commented Feb 7, 2023

@lorban please take over those branches for further work on this.

gregpoulos pushed a commit to gregpoulos/jetty.project that referenced this issue Feb 9, 2023
…x-documentation-operations-logging

* upstream/jetty-12.0.x: (35 commits)
  Fixes jetty#9326 - Rename DecryptedEndPoint to SslEndPoint.
  Jetty 10 Upgrade to Hazelcast 5 and totally disable auto join multicast etc.. (fix build on CI) (jetty#9331)
  jetty#9328 - changes from review
  jetty#9287 - catch error in ee9 maxRequestSize MultiPart test
  Jetty 12.0.x 9301 fix ee10 jstl jpms (jetty#9321)
  Issue jetty#9301 Fix dependencies for ee10-glassfish-jstl module (jetty#9303)
  Jetty 12 Hazelcast 5.x and disable auto detection/multicast" (jetty#9332)
  jetty#9287 - fix further test failures
  Fixed imports.
  Issue jetty#7650 - Fix race condition when stopping QueuedThreadPool (jetty#9325)
  jetty#9287 - remove unpaired release of Content.Chunk
  Issue jetty#8991 - rename websocket isDemanding() method to isAutoDemanding()
  Issue jetty#9287 - fix failing tests
  changes f rom review
  add todo to revert to normal pool after fix for jetty#9311
  Issue jetty#9309 - Introducing test for requestlog format with spaces
  use non-pooling RetainableByteBufferPool to work around performance bug
  consumeAvailable should use number of reads instead of bytes
  fix for retainable merge
  changes from review
  ...
lorban added a commit that referenced this issue Feb 23, 2023
 - fixed double-release bug in MultiPartFormData
 - used ByteBufferPool.NonPooling pool to work around #9311

Signed-off-by: Ludovic Orban <lorban@bitronix.be>
lorban added a commit that referenced this issue Feb 24, 2023
 - fixed double-release bug in MultiPartFormData
 - used ByteBufferPool.NonPooling pool to work around #9311

Signed-off-by: Ludovic Orban <lorban@bitronix.be>
lorban added a commit that referenced this issue Feb 27, 2023
#9408: restored HugeResourceTest:
 - fixed double-release bug in MultiPartFormData
 - used ByteBufferPool.NonPooling pool to work around #9311

Signed-off-by: Ludovic Orban <lorban@bitronix.be>
lorban pushed a commit that referenced this issue Feb 27, 2023
This adds a QueuedPool that can wrap any other pool and provides a queue of Entries as a kind of cache.
lorban pushed a commit that referenced this issue Feb 27, 2023
This adds a QueuedPool that can wrap any other pool and provides a queue of Entries as a kind of cache.
@lorban lorban self-assigned this Feb 27, 2023
@lorban lorban closed this as completed Feb 27, 2023
@lorban lorban linked a pull request Feb 27, 2023 that will close this issue