Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resteasy-reactive runs out of direct buffer memory when processing large multipart messages #26187

Closed
fedinskiy opened this issue Jun 16, 2022 · 25 comments · Fixed by #27805
Closed
Labels
area/resteasy-reactive kind/bug Something isn't working
Milestone

Comments

@fedinskiy
Copy link
Contributor

Describe the bug

I have two Rest Reactive methods, one accepts File and another multipart of File and String. If running on resource limit machine(this behavior first arose on Github Actions), the second method fails with cryptic error while processing files of 2048 MB size, while the first handles larger(2054 MB) files without problems.

Expected behavior

Method should fail with more concise error or not fail at all.

Actual behavior

Method fails with this error:

2022-06-16T13:28:18.2721622Z 13:28:18,223 INFO  [app] 13:28:18,105 Request failed: java.io.IOException: java.io.IOException: java.lang.OutOfMemoryError: Direct buffer memory
2022-06-16T13:28:18.2726667Z 13:28:18,224 INFO  [app] 	at org.jboss.resteasy.reactive.server.vertx.VertxInputStream$VertxBlockingInput.readBlocking(VertxInputStream.java:253)
2022-06-16T13:28:18.2730751Z 13:28:18,224 INFO  [app] 	at org.jboss.resteasy.reactive.server.vertx.VertxInputStream.readIntoBuffer(VertxInputStream.java:120)
2022-06-16T13:28:18.2735393Z 13:28:18,226 INFO  [app] 	at org.jboss.resteasy.reactive.server.vertx.VertxInputStream.read(VertxInputStream.java:82)
2022-06-16T13:28:18.2736182Z 13:28:18,227 INFO  [app] 	at org.jboss.resteasy.reactive.server.vertx.VertxInputStream.read(VertxInputStream.java:70)
2022-06-16T13:28:18.2795835Z 13:28:18,228 INFO  [app] 	at org.jboss.resteasy.reactive.server.core.multipart.MultiPartParserDefinition$MultiPartUploadHandler.parseBlocking(MultiPartParserDefinition.java:217)
2022-06-16T13:28:18.2806940Z 13:28:18,228 INFO  [app] 	at org.jboss.resteasy.reactive.server.handlers.FormBodyHandler.handle(FormBodyHandler.java:88)
2022-06-16T13:28:18.2811409Z 13:28:18,229 INFO  [app] 	at org.jboss.resteasy.reactive.server.handlers.FormBodyHandler.handle(FormBodyHandler.java:25)
2022-06-16T13:28:18.2816304Z 13:28:18,229 INFO  [app] 	at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run(AbstractResteasyReactiveContext.java:141)
2022-06-16T13:28:18.2822693Z 13:28:18,229 INFO  [app] 	at io.quarkus.vertx.core.runtime.VertxCoreRecorder$13.runWith(VertxCoreRecorder.java:545)
2022-06-16T13:28:18.2826225Z 13:28:18,230 INFO  [app] 	at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2449)
2022-06-16T13:28:18.2830935Z 13:28:18,230 INFO  [app] 	at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1478)
2022-06-16T13:28:18.2836012Z 13:28:18,230 INFO  [app] 	at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
2022-06-16T13:28:18.2888414Z 13:28:18,231 INFO  [app] 	at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
2022-06-16T13:28:18.2892511Z 13:28:18,232 INFO  [app] 	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
2022-06-16T13:28:18.2898697Z 13:28:18,232 INFO  [app] 	at java.base/java.lang.Thread.run(Thread.java:829)
2022-06-16T13:28:18.2899383Z 13:28:18,232 INFO  [app] 	Suppressed: java.io.IOException: java.io.IOException: java.lang.OutOfMemoryError: Direct buffer memory
2022-06-16T13:28:18.2903591Z 13:28:18,232 INFO  [app] 		at org.jboss.resteasy.reactive.server.vertx.VertxInputStream$VertxBlockingInput.readBlocking(VertxInputStream.java:253)
2022-06-16T13:28:18.2904557Z 13:28:18,233 INFO  [app] 		at org.jboss.resteasy.reactive.server.vertx.VertxInputStream.readIntoBuffer(VertxInputStream.java:120)
2022-06-16T13:28:18.2914225Z 13:28:18,233 INFO  [app] 		at org.jboss.resteasy.reactive.server.vertx.VertxInputStream.close(VertxInputStream.java:148)
2022-06-16T13:28:18.2917218Z 13:28:18,233 INFO  [app] 		at org.jboss.resteasy.reactive.server.core.multipart.MultiPartParserDefinition$MultiPartUploadHandler.parseBlocking(MultiPartParserDefinition.java:214)
2022-06-16T13:28:18.2921708Z 13:28:18,233 INFO  [app] 		... 10 more
2022-06-16T13:28:18.2926611Z 13:28:18,233 INFO  [app] 	Caused by: java.io.IOException: java.lang.OutOfMemoryError: Direct buffer memory
2022-06-16T13:28:18.2931355Z 13:28:18,234 INFO  [app] 		at org.jboss.resteasy.reactive.server.vertx.VertxInputStream$VertxBlockingInput$2.handle(VertxInputStream.java:200)
2022-06-16T13:28:18.2934186Z 13:28:18,234 INFO  [app] 		at org.jboss.resteasy.reactive.server.vertx.VertxInputStream$VertxBlockingInput$2.handle(VertxInputStream.java:196)
2022-06-16T13:28:18.2938685Z 13:28:18,234 INFO  [app] 		at io.vertx.core.impl.AbstractContext.dispatch(AbstractContext.java:100)
2022-06-16T13:28:18.2943012Z 13:28:18,235 INFO  [app] 		at io.vertx.core.http.impl.HttpEventHandler.handleException(HttpEventHandler.java:89)
2022-06-16T13:28:18.2947915Z 13:28:18,235 INFO  [app] 		at io.vertx.core.http.impl.Http1xServerRequest.handleException(Http1xServerRequest.java:635)
2022-06-16T13:28:18.2948945Z 13:28:18,235 INFO  [app] 		at io.vertx.core.http.impl.Http1xServerConnection.handleException(Http1xServerConnection.java:479)
2022-06-16T13:28:18.2953455Z 13:28:18,235 INFO  [app] 		at io.vertx.core.net.impl.VertxHandler.exceptionCaught(VertxHandler.java:136)
2022-06-16T13:28:18.2957910Z 13:28:18,235 INFO  [app] 		at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
2022-06-16T13:28:18.2964868Z 13:28:18,236 INFO  [app] 		at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
2022-06-16T13:28:18.2967697Z 13:28:18,236 INFO  [app] 		at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:273)
2022-06-16T13:28:18.2972847Z 13:28:18,236 INFO  [app] 		at io.netty.channel.DefaultChannelPipeline$HeadContext.exceptionCaught(DefaultChannelPipeline.java:1377)
2022-06-16T13:28:18.2973933Z 13:28:18,238 INFO  [app] 		at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
2022-06-16T13:28:18.2980565Z 13:28:18,239 INFO  [app] 		at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
2022-06-16T13:28:18.2989152Z 13:28:18,239 INFO  [app] 		at io.netty.channel.DefaultChannelPipeline.fireExceptionCaught(DefaultChannelPipeline.java:907)
2022-06-16T13:28:18.2994600Z 13:28:18,239 INFO  [app] 		at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.handleReadException(AbstractNioByteChannel.java:125)
2022-06-16T13:28:18.2998533Z 13:28:18,239 INFO  [app] 		at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:177)
2022-06-16T13:28:18.3004558Z 13:28:18,239 INFO  [app] 		at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
2022-06-16T13:28:18.3007277Z 13:28:18,239 INFO  [app] 		at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
2022-06-16T13:28:18.3014467Z 13:28:18,240 INFO  [app] 		at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
2022-06-16T13:28:18.3018855Z 13:28:18,240 INFO  [app] 		at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
2022-06-16T13:28:18.3023471Z 13:28:18,240 INFO  [app] 		at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
2022-06-16T13:28:18.3028808Z 13:28:18,240 INFO  [app] 		at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
2022-06-16T13:28:18.3032242Z 13:28:18,240 INFO  [app] 		... 2 more
2022-06-16T13:28:18.3035985Z 13:28:18,240 INFO  [app] 	Caused by: java.lang.OutOfMemoryError: Direct buffer memory
2022-06-16T13:28:18.3040268Z 13:28:18,240 INFO  [app] 		at java.base/java.nio.Bits.reserveMemory(Bits.java:175)
2022-06-16T13:28:18.3044246Z 13:28:18,241 INFO  [app] 		at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118)
2022-06-16T13:28:18.3048818Z 13:28:18,241 INFO  [app] 		at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:317)
2022-06-16T13:28:18.3049459Z 13:28:18,241 INFO  [app] 		at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:648)
2022-06-16T13:28:18.3053956Z 13:28:18,241 INFO  [app] 		at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:623)
2022-06-16T13:28:18.3054534Z 13:28:18,241 INFO  [app] 		at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:202)
2022-06-16T13:28:18.3058583Z 13:28:18,241 INFO  [app] 		at io.netty.buffer.PoolArena.tcacheAllocateNormal(PoolArena.java:186)
2022-06-16T13:28:18.3059067Z 13:28:18,241 INFO  [app] 		at io.netty.buffer.PoolArena.allocate(PoolArena.java:136)
2022-06-16T13:28:18.3063255Z 13:28:18,242 INFO  [app] 		at io.netty.buffer.PoolArena.allocate(PoolArena.java:126)
2022-06-16T13:28:18.3072936Z 13:28:18,242 INFO  [app] 		at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:394)
2022-06-16T13:28:18.3077717Z 13:28:18,242 INFO  [app] 		at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:188)
2022-06-16T13:28:18.3089443Z 13:28:18,246 INFO  [app] 		at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:179)
2022-06-16T13:28:18.3099295Z 13:28:18,246 INFO  [app] 		at io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:140)
2022-06-16T13:28:18.3111835Z 13:28:18,251 INFO  [app] 		at io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:120)
2022-06-16T13:28:18.3123149Z 13:28:18,252 INFO  [app] 		at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:150)
2022-06-16T13:28:18.3128101Z 13:28:18,252 INFO  [app] 		... 8 more
2022-06-16T13:28:18.3139424Z 13:28:18,252 INFO  [app] 	[CIRCULAR REFERENCE:java.io.IOException: java.lang.OutOfMemoryError: Direct buffer memory]

How to Reproduce?

This steps should be run on memory limited machine(preferably, VM)

  1. git clone https://github.com/fedinskiy/quarkus-test-suite.git -b reproducer/multipart_oom tests
  2. cd tests
  3. mvn -Dall-modules -pl http/rest-client-reactive/ clean verify -Dit.test=LargeFileHandlingIT
    Test uploadMultipart fails, tests uploadBigFileThroughClient and uploadThroughClient work.

Output of uname -a or ver

4.18.0-372.9.1.el8.x86_64

Output of java -version

openjdk 11.0.13 2021-10-19 LTS(also temurin 11.0.15)

GraalVM version (if different from Java)

No response

Quarkus version or git rev

2.7.6, 2.9.1 and ca836996b345a6cad93a79a6d8600c5dd809ed92

Build tool (ie. output of mvnw --version or gradlew --version)

3.8.3, 3.8.4 and 3.8.6

Additional information

CI logs:
https://github.com/quarkus-qe/quarkus-test-suite/runs/6919232274?check_suite_focus=true

On 4G VM, which is described in the environment section, the test do not threw any errors, but hangs with warnings like these, but with different thread numbers:

12:55:30,733 INFO  [app] Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "vert.x-eventloop-thread-2"
12:55:34,739 INFO  [app] Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "vertx-blocked-thread-checker"
12:55:38,741 INFO  [app] Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "executor-thread-0"
@fedinskiy fedinskiy added the kind/bug Something isn't working label Jun 16, 2022
@fedinskiy fedinskiy changed the title Resteaqsy-reactive runs out of direct buffer memory when processing large multipart messages Resteasy-reactive runs out of direct buffer memory when processing large multipart messages Jun 16, 2022
@quarkus-bot
Copy link

quarkus-bot bot commented Jun 17, 2022

/cc @FroMage, @geoand, @stuartwdouglas

@geoand
Copy link
Contributor

geoand commented Jun 17, 2022

@stuartwdouglas mind taking a look at this one?

@Sgitario
Copy link
Contributor

This issue is due to the usage of DirectByteBuffer in Netty.
The problem is that JDK will cache the byte buffer arrays with unlimited size.
One workaround is to configure the JDK to only cache smaller instances, for example, using -Djdk.nio.maxCachedBufferSize=262144, then JDK won't cache buffers larger than 256Kb. I tried this solution and now the test passed even after repeating the same test up to 20 times.

Having said this, I'm not sure how to approach this issue:
a. Either document that users need to disable buffers caching when dealing with large files
b. Or configuring Vert.x - Netty somehow, @geoand anybody who could help here?
c. Or providing a new property

If there is nothing we can do to tune Vert.x/Netty for a general use case (not only large files), I would say we should go for option a.

@geoand
Copy link
Contributor

geoand commented Jun 22, 2022

Thanks a lot!

I'll have to check it out

@stuartwdouglas
Copy link
Member

Any idea what is allocating the large buffers? This sounds like a bug, we should not be allocating large buffers.

@geoand
Copy link
Contributor

geoand commented Jun 23, 2022

To be honest, I'm not completely satisfied with the explanation, but haven't had time to look into it.
Hopefully I will today

@geoand
Copy link
Contributor

geoand commented Jun 23, 2022

How would I pass JVM args or system properties to the launched application with the testing framework in use?

@Sgitario
Copy link
Contributor

You can pass system properties by adding them here: https://github.com/quarkus-qe/quarkus-test-suite/blob/1f9a89d20514c269b4e87777dfb7dc62ef613f6b/http/rest-client-reactive/src/test/java/io/quarkus/ts/http/restclient/reactive/LargeFileHandlingIT.java#L50

But JVM args are not supported. Though ping me if you really need it and I will help here.

@geoand
Copy link
Contributor

geoand commented Jun 23, 2022

Thanks

@geoand
Copy link
Contributor

geoand commented Jun 23, 2022

I was not able to reproduce this under any combination of JVM options I tried.
Also, using -Dio.netty.leakDetection.level=paranoid did not yield any leaks.
Furthermore, I saw that the Netty PoolArena was only allocating 64k and 128k buffers, so nothing unusual.

@geoand
Copy link
Contributor

geoand commented Sep 1, 2022

Closing this as we could not reproduce it.

If you have newer information to provide, we will certainly take another look

@geoand geoand closed this as not planned Won't fix, can't repro, duplicate, stale Sep 1, 2022
@geoand geoand added the triage/invalid This doesn't seem right label Sep 1, 2022
@fedinskiy
Copy link
Contributor Author

@geoand what newer information you need to reproduce this bug?
I created this PR(with a reproducer), so you can run GH actions on it quarkus-qe/quarkus-test-suite#816. Is there anything else I can help you with?

@geoand
Copy link
Contributor

geoand commented Sep 1, 2022

I had tried multiple times to reproduce but could not

@rsvoboda
Copy link
Member

rsvoboda commented Sep 1, 2022

quarkus-qe/quarkus-test-suite#816 was created ~1 hour ago, so the issue is still present.

@geoand did you try to reproduce the issue using the reproducer provided by Fedor or did you use custom app/test?

@geoand
Copy link
Contributor

geoand commented Sep 1, 2022

I'll reopen but if I can't reproduce it there is not much I can do

@geoand geoand removed the triage/invalid This doesn't seem right label Sep 1, 2022
@geoand
Copy link
Contributor

geoand commented Sep 1, 2022

did you try to reproduce the issue using the reproducer provided by Fedor or did you use custom app/test?

I was using the reproducer

@fedinskiy
Copy link
Contributor Author

@geoand how did you recreate memory-limited environment? The reproducer doesn't work on my machine, that why I had to put it into GH Actions.

@geoand
Copy link
Contributor

geoand commented Sep 1, 2022

I don't remember. I'll have to retry it next week

@rsvoboda rsvoboda reopened this Sep 1, 2022
@geoand
Copy link
Contributor

geoand commented Sep 2, 2022

I can't reproduce this.
Furthermore, although I understand why a custom testing framework is being used here, it makes it very hard for anyone outside Quarkus QE to tinkering.

I propose creating a reproducer using that standard Quarkus testing facilities - that way we will be able to try various things.

@fedinskiy
Copy link
Contributor Author

fedinskiy commented Sep 5, 2022

@geoand I updated the linked reproducer[1] so it doesn't use test framework(see module "reproducer").

Also, I have an important addition: the problem does not appear, when uploadMultipart test is run alone, but do appear, when all tests in a class are ran. Hope that helps and sorry for any inconvenience.

[1] quarkus-qe/quarkus-test-suite#816

@geoand
Copy link
Contributor

geoand commented Sep 5, 2022

Thanks for the update.

I'll have a look soon

@geoand
Copy link
Contributor

geoand commented Sep 5, 2022

I was still not able to reproduce the problem, but I'll have a closer look tomorrow.

@geoand
Copy link
Contributor

geoand commented Sep 5, 2022

I was finally able to reproduce it

@geoand
Copy link
Contributor

geoand commented Sep 7, 2022

This is a pretty weird issue - I was able to track down the root cause (I'll write it up later).

Now I need to see how it can be addressed

geoand added a commit to geoand/quarkus that referenced this issue Sep 8, 2022
@rsvoboda
Copy link
Member

rsvoboda commented Sep 8, 2022

Vert.x component issue - eclipse-vertx/vert.x#4473

geoand added a commit to geoand/quarkus that referenced this issue Sep 8, 2022
@quarkus-bot quarkus-bot bot added this to the 2.13 - main milestone Sep 9, 2022
cescoffier added a commit that referenced this issue Sep 9, 2022
Prevent possible memory leak from Reactive REST Client multipart upload usage
@gsmet gsmet modified the milestones: 2.13 - main, 2.12.2.Final Sep 12, 2022
gsmet pushed a commit to gsmet/quarkus that referenced this issue Sep 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/resteasy-reactive kind/bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants