
Performance of ArrayRetainableByteBufferPool.acquire() can degenerate pathologically as the buckets grow in size #9311

Closed
lorban opened this issue Feb 3, 2023 · 10 comments · Fixed by #9320
Assignees: lorban
Labels: Bug (For general bugs on Jetty side), Performance

lorban (Contributor) commented Feb 3, 2023

Jetty version(s)
12

Description
Since ArrayRetainableByteBufferPool.acquire() is backed by ConcurrentPool.acquire(), each execution requires iterating over a number of entries proportional to the usage level of the bucket, e.g. a bucket that is 90% used requires iterating over about 90% of its entries on average.

When the number of buffers in a single bucket is quite small (in the hundreds), those iterations are so cheap that their cost is negligible, as the array is small enough to fit into the CPU cache. But when the number of buffers reaches the tens of thousands, these iterations become a pathological bottleneck.
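For illustration, the acquire path behaves roughly like the following sketch. This is a simplified stand-in, not the actual ConcurrentPool code: the LinearScanBucket class, its fields and the null-means-busy convention are assumptions made for this example.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Simplified model of a linear-scan bucket: a non-null slot holds an idle
// pooled object, a null slot means that object is currently in use.
final class LinearScanBucket<T>
{
    private final AtomicReferenceArray<T> slots;

    LinearScanBucket(int size)
    {
        slots = new AtomicReferenceArray<>(size);
    }

    // Each acquire walks the array from the start, skipping busy (null) slots,
    // so its average cost grows with the bucket's usage level.
    T acquire()
    {
        for (int i = 0; i < slots.length(); i++)
        {
            T pooled = slots.get(i);
            if (pooled != null && slots.compareAndSet(i, pooled, null))
                return pooled;
        }
        return null; // everything busy: the caller must reserve/allocate
    }

    void release(int index, T pooled)
    {
        slots.set(index, pooled);
    }
}
```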

lorban added the Bug (For general bugs on Jetty side) and Jetty 12 labels Feb 3, 2023
lorban (Contributor, Author) commented Feb 3, 2023

This was found in MultiPartServletTest.testLargePart() when the default buffer pool is used, as this test needs 262144 buffers (2GB split into 8KB chunks) to complete.

With the ArrayRetainableByteBufferPool.acquire() implementation being:

  1. ConcurrentPool.acquire()
  2. if null, reserve and allocate

this means the test needs on the order of 262144 × 262143 / 2 ≈ 3.4 × 10^10 compareAndSet calls to eventually finish, since each acquire scans past every already-busy entry before falling back to reserve-and-allocate.
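For scale, a back-of-the-envelope estimate under that simplifying assumption (the k-th acquire scans past the k-1 entries that are already busy):

```java
public class AcquireCostEstimate
{
    public static void main(String[] args)
    {
        long n = 262_144;             // 2GB split into 8KB buffers
        long scans = n * (n - 1) / 2; // 1 + 2 + ... + (n - 1), a triangular number
        System.out.println(scans);    // 34359607296, i.e. roughly 3.4e10 entry scans / CAS attempts
    }
}
```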

@lachlan-roberts could you please add a TODO referring to this issue to your test that uses a non-pooling pool as a workaround?

lorban (Contributor, Author) commented Feb 3, 2023

The root of this issue seems to be that the algorithm backing ConcurrentPool does not scale well to large sizes.

There are two major users of the ConcurrentPool:

  1. connection pooling
    This seems to be less of a problem for this use-case: a connection pool sized at 1000 entries is already quite substantial and very uncommon, and acquiring from it millions of times per second while it mostly contains busy connections sounds like a very niche use-case.
    It is nevertheless a possibility, even if a remote one.

  2. buffer pooling
    Having tens of thousands of buffers of a specific size (especially the small ones < 16KB) is neither unrealistic nor uncommon, so solving this issue seems more pressing.
    Since ConcurrentPool is fully abstracted by ArrayRetainableByteBufferPool, we could add some mitigation strategies to handle this problem, for instance:

    • hard-capping the number of buffers per bucket to a few hundred
    • adding another oeju.Pool (org.eclipse.jetty.util.Pool) implementation with a different algorithm that scales well to large bucket sizes (e.g. the previous queue-based pool algorithm would) and making ArrayRetainableByteBufferPool choose between the two based on a heuristic

Maybe we should also expose a constant like ConcurrentPool.MAX_REASONABLE_SIZE, set to a few hundred, and warn when a ConcurrentPool is instantiated with a size over that constant?
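A minimal sketch of that last suggestion, assuming a hypothetical guard in the constructor (the class name, constant value and warning below are illustrative, not actual Jetty code):

```java
// Hypothetical guard: warn when a pool is created larger than a size the
// linear-scan algorithm is known to handle well.
class BoundedConcurrentPoolSketch<T>
{
    // Illustrative value only; the real threshold would need benchmarking.
    static final int MAX_REASONABLE_SIZE = 256;

    BoundedConcurrentPoolSketch(int maxSize)
    {
        if (maxSize > MAX_REASONABLE_SIZE)
            System.err.printf("Pool size %d exceeds MAX_REASONABLE_SIZE=%d; acquire() performance may degrade%n",
                maxSize, MAX_REASONABLE_SIZE);
        // ... allocate the entry array, etc.
    }
}
```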

lorban (Contributor, Author) commented Feb 3, 2023

@gregw @sbordet

lachlan-roberts added a commit that referenced this issue Feb 6, 2023
Signed-off-by: Lachlan Roberts <lachlan@webtide.com>
sbordet (Contributor) commented Feb 6, 2023

Re-instating a Pool implementation based on a queue seems the best solution.

It would have the downside of being based either on a ConcurrentLinkedDeque (and hence a linked-list data structure, not particularly memory friendly) or on a locked queue (which would be contended).

Another idea could be to see whether using a bitmap would help, see class java.util.BitSet.
With simple bit operations (which typically map to single CPU instructions) it would be possible to find the first idle entry in the list.
For example, let 1 be non-idle and 0 be idle: if the bitmap is 11111110000, it is possible to quickly find that the first 0 is at index 7 (from the left).
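A quick sketch of that idea using java.util.BitSet (single-threaded for clarity; the example bitmap matches the one above):

```java
import java.util.BitSet;

// Bit set = entry in use, bit clear = entry idle. nextClearBit() scans a whole
// 64-bit word at a time, so finding the first idle entry stays cheap even for
// large buckets.
public class BitmapIndexSketch
{
    public static void main(String[] args)
    {
        BitSet inUse = new BitSet(11);
        inUse.set(0, 7);                       // 11111110000, as in the example above

        int firstIdle = inUse.nextClearBit(0); // -> 7
        System.out.println(firstIdle);
        inUse.set(firstIdle);                  // mark it busy

        // Note: BitSet is not thread-safe; a concurrent pool would need something
        // like an AtomicLongArray with compareAndSet over 64-bit words instead.
    }
}
```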

gregw (Contributor) commented Feb 6, 2023

There are 2 reasons for using a Pool:

  1. to put a limit on the total number of resources that can be allocated
  2. to avoid stressing the GC algorithm by frequently allocating/releasing objects that can easily be reused.

So for ByteBuffers we don't really want 1., because we don't want to block any allocation in any part of the code; instead we police the maximum memory commitment with other limits (on numbers/sizes).
Thus we are really only interested in 2., so if we start finding ourselves implementing algorithms with the same problems as a GC (finding holes in large memory areas), then we should probably not bother and just delegate to the smart folks who write GCs.

Let's set up a meeting to discuss the options.

gregw (Contributor) commented Feb 6, 2023

Currently the algorithm in ArrayByteBufferPool (aka ArrayRetainableByteBufferPool) is:

  • if there is no Bucket for the size/type
    • return a newly allocated buffer
  • if we can acquire an entry from the bucket
    • return its buffer
  • if we can reserve a new entry from the bucket
    • return the newly enabled entry with a newly allocated buffer and releaser
  • return a newly allocated buffer

I think there is scope to add in a queue as a "second level cache":

  • if there is no Bucket for the size/type
    • return a newly allocated buffer
  • if we can acquire an entry from the bucket
    • return its buffer
  • if we can reserve a new entry from the bucket
    • return the newly enabled entry with a newly allocated buffer and releaser
  • if we can pop a buffer from the secondary queue (held by the bucket)
    • return it
  • return a newly allocated buffer

Using buffers from a queue not only saves the allocation of the ByteBuffer, but also the allocation of the retainable wrapper, at the potential cost of allocating a node in the queue structure.

So we would configure this pool with:

  • a maximum bucket pool size (possibly 0), which must be small enough to be efficient
  • a maximum bucket queue size (possibly 0 or unbounded), to control how many buffers are in the secondary "cache"
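A simplified sketch of this two-level scheme (the class, field names and queue choice are assumptions for illustration; the real implementation also deals with entries, reservation and retainable wrappers, which are omitted here):

```java
import java.nio.ByteBuffer;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;

// A bucket with a small primary pool (cheap to scan) plus a queued
// second-level "cache" consulted before allocating a new buffer.
class QueuedBucketSketch
{
    private final AtomicReferenceArray<ByteBuffer> primary;   // kept small for efficiency
    private final Queue<ByteBuffer> secondary = new ConcurrentLinkedQueue<>();
    private final AtomicInteger queued = new AtomicInteger(); // ConcurrentLinkedQueue.size() is O(n)
    private final int maxQueued;
    private final int bufferCapacity;

    QueuedBucketSketch(int poolSize, int maxQueued, int bufferCapacity)
    {
        this.primary = new AtomicReferenceArray<>(poolSize);
        this.maxQueued = maxQueued;
        this.bufferCapacity = bufferCapacity;
    }

    ByteBuffer acquire()
    {
        // 1. Try the small primary pool.
        for (int i = 0; i < primary.length(); i++)
        {
            ByteBuffer pooled = primary.get(i);
            if (pooled != null && primary.compareAndSet(i, pooled, null))
                return pooled;
        }
        // 2. Try the second-level queue before allocating.
        ByteBuffer fromQueue = secondary.poll();
        if (fromQueue != null)
        {
            queued.decrementAndGet();
            return fromQueue;
        }
        // 3. Nothing pooled: allocate a fresh buffer.
        return ByteBuffer.allocateDirect(bufferCapacity);
    }

    void release(ByteBuffer buffer)
    {
        buffer.clear();
        // Prefer an empty slot in the primary pool.
        for (int i = 0; i < primary.length(); i++)
        {
            if (primary.get(i) == null && primary.compareAndSet(i, null, buffer))
                return;
        }
        // Primary full: keep a bounded number in the second-level queue, else let GC take it.
        if (queued.incrementAndGet() <= maxQueued)
            secondary.offer(buffer);
        else
            queued.decrementAndGet();
    }
}
```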

gregw (Contributor) commented Feb 6, 2023

See #9319 for a queued 2nd layer "cache" on the pool.

gregw (Contributor) commented Feb 7, 2023

See also #9320, which is a Pool.Wrapper that can apply a queue over any other pool. I like this better: it might not be as fast as #9319, but it is more general.

After chatting with Simone, I think the key thing to do now is to remove the bucket size from the ByteBufferPool constructor (or at least deprecate that constructor) and work out the pool size vs queue size heuristically.

lorban (Contributor, Author) commented Feb 7, 2023

I like #9320 better, but we should probably benchmark the differences with #9319, both with a micro-benchmark and a macro one (our perf test?): the introduction of ArrayRetainableByteBufferPool in 10.0.x was motivated by the noticeable perf boost it brought over the previous queue-based pool, and we don't want to go back to the previous level of perf.

I agree that bucket size should be an internal-only concern that we should figure out with a heuristic.

It looks like such a change would slightly move ArrayRetainableByteBufferPool from the land of buffer pools into the land of memory allocators.

gregw (Contributor) commented Feb 7, 2023

@lorban please take over those branches for further work on this.

gregpoulos pushed a commit to gregpoulos/jetty.project that referenced this issue Feb 9, 2023
…x-documentation-operations-logging

* upstream/jetty-12.0.x: (35 commits)
  Fixes jetty#9326 - Rename DecryptedEndPoint to SslEndPoint.
  Jetty 10 Upgrade to Hazelcast 5 and totally disable auto join multicast etc.. (fix build on CI) (jetty#9331)
  jetty#9328 - changes from review
  jetty#9287 - catch error in ee9 maxRequestSize MultiPart test
  Jetty 12.0.x 9301 fix ee10 jstl jpms (jetty#9321)
  Issue jetty#9301 Fix dependencies for ee10-glassfish-jstl module (jetty#9303)
  Jetty 12 Hazelcast 5.x and disable auto detection/multicast" (jetty#9332)
  jetty#9287 - fix further test failures
  Fixed imports.
  Issue jetty#7650 - Fix race condition when stopping QueuedThreadPool (jetty#9325)
  jetty#9287 - remove unpaired release of Content.Chunk
  Issue jetty#8991 - rename websocket isDemanding() method to isAutoDemanding()
  Issue jetty#9287 - fix failing tests
  changes f rom review
  add todo to revert to normal pool after fix for jetty#9311
  Issue jetty#9309 - Introducing test for requestlog format with spaces
  use non-pooling RetainableByteBufferPool to work around performance bug
  consumeAvailable should use number of reads instead of bytes
  fix for retainable merge
  changes from review
  ...
lorban added a commit that referenced this issue Feb 23, 2023
 - fixed double-release bug in MultiPartFormData
 - used ByteBufferPool.NonPooling pool to work around #9311

Signed-off-by: Ludovic Orban <lorban@bitronix.be>
lorban added a commit that referenced this issue Feb 24, 2023
 - fixed double-release bug in MultiPartFormData
 - used ByteBufferPool.NonPooling pool to work around #9311

Signed-off-by: Ludovic Orban <lorban@bitronix.be>
lorban added a commit that referenced this issue Feb 27, 2023
#9408: restored HugeResourceTest:
 - fixed double-release bug in MultiPartFormData
 - used ByteBufferPool.NonPooling pool to work around #9311

Signed-off-by: Ludovic Orban <lorban@bitronix.be>
lorban pushed a commit that referenced this issue Feb 27, 2023
This adds a QueuedPool that can wrap any other pool and provides a queue of Entries as a kind of cache.
lorban pushed a commit that referenced this issue Feb 27, 2023
This adds a QueuedPool that can wrap any other pool and provides a queue of Entries as a kind of cache.
@lorban lorban self-assigned this Feb 27, 2023
@lorban lorban closed this as completed Feb 27, 2023
@lorban lorban linked a pull request Feb 27, 2023 that will close this issue