
OutOfMemory error while doing a file upload of size 7GB through Netty #3559

Closed
karuturi opened this issue Apr 1, 2015 · 15 comments

@karuturi

karuturi commented Apr 1, 2015

This is the exception I see

java.lang.OutOfMemoryError: Java heap space
    at io.netty.buffer.UnpooledHeapByteBuf.capacity(UnpooledHeapByteBuf.java:114)
    at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:251)
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:849)
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:841)
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:831)
    at io.netty.handler.codec.http.multipart.HttpPostMultipartRequestDecoder.offer(HttpPostMultipartRequestDecoder.java:337)
    at io.netty.handler.codec.http.multipart.HttpPostMultipartRequestDecoder.offer(HttpPostMultipartRequestDecoder.java:48)
    at io.netty.handler.codec.http.multipart.HttpPostRequestDecoder.offer(HttpPostRequestDecoder.java:236)
    at org.apache.cloudstack.storage.resource.HttpUploadServerHandler.channelRead0(HttpUploadServerHandler.java:200)
    at org.apache.cloudstack.storage.resource.HttpUploadServerHandler.channelRead0(HttpUploadServerHandler.java:65)
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
    at io.netty.handler.codec.MessageToMessageCodec.channelRead(MessageToMessageCodec.java:111)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:182)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
    at java.lang.Thread.run(Thread.java:745)

I see that it's trying to allocate a byte array for the new capacity.

byte[] newArray = new byte[newCapacity];

But why is this happening only for big files? Is it not releasing the existing memory?
Is there any max file size limit?

@rfielding

HttpPostRequestDecoder is doing it wrong. The whole point of the interface is to iterate over the metadata/data for each part in upload order and to get a ChannelBuffer for each part rather than a byte array, but the way the object is constructed defeats the purpose of this interface. Even for gigabyte-sized files there is no need to buffer more than a few kilobytes during the upload; memory usage should be proportional to the number of concurrent sessions and totally unrelated to the size of the files being uploaded. There is a similar issue with realizing HTTP responses into byte arrays.

Not only is HttpPostRequestDecoder doing it wrong, but other parts of the framework assume that the HTTP request is fully parsed before it reaches your handler. Some URLs are known to upload and download large requests; for those, we need to be certain that nothing beyond the HTTP header is parsed ahead of time.
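For reference, a minimal sketch of the streaming pattern described above, using Netty 4.1's chunked decoder APIs (the handler, path and variable names are illustrative, not from this issue): the decoder is built from the request head only, each HttpContent chunk is offered as it arrives, and a disk-backed data factory keeps heap usage bounded regardless of file size.

import java.io.File;

import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.SimpleChannelInboundHandler;
import io.netty.handler.codec.http.HttpContent;
import io.netty.handler.codec.http.HttpObject;
import io.netty.handler.codec.http.HttpRequest;
import io.netty.handler.codec.http.LastHttpContent;
import io.netty.handler.codec.http.multipart.DefaultHttpDataFactory;
import io.netty.handler.codec.http.multipart.FileUpload;
import io.netty.handler.codec.http.multipart.HttpDataFactory;
import io.netty.handler.codec.http.multipart.HttpPostRequestDecoder;
import io.netty.handler.codec.http.multipart.HttpPostRequestDecoder.EndOfDataDecoderException;
import io.netty.handler.codec.http.multipart.InterfaceHttpData;
import io.netty.handler.codec.http.multipart.InterfaceHttpData.HttpDataType;

// Assumes the pipeline has HttpServerCodec but NO HttpObjectAggregator, so the body
// arrives as a stream of HttpContent chunks instead of one FullHttpRequest.
public class StreamingUploadHandler extends SimpleChannelInboundHandler<HttpObject> {

    private final HttpDataFactory factory = new DefaultHttpDataFactory(true); // disk-backed parts
    private HttpPostRequestDecoder decoder;

    @Override
    protected void channelRead0(ChannelHandlerContext ctx, HttpObject msg) throws Exception {
        if (msg instanceof HttpRequest) {
            // Only the request head is parsed here; nothing of the body is buffered yet.
            decoder = new HttpPostRequestDecoder(factory, (HttpRequest) msg);
        } else if (decoder != null && msg instanceof HttpContent) {
            decoder.offer((HttpContent) msg); // consume one chunk; disk-backed parts grow on disk
            drainCompletedParts();
            if (msg instanceof LastHttpContent) {
                decoder.destroy();            // release buffers and any remaining temp files
                decoder = null;
            }
        }
    }

    private void drainCompletedParts() throws Exception {
        try {
            while (decoder.hasNext()) {       // parts become available strictly in upload order
                InterfaceHttpData data = decoder.next();
                if (data.getHttpDataType() == HttpDataType.FileUpload) {
                    FileUpload upload = (FileUpload) data;
                    // Content is already in a temp file; renameTo just moves it into place.
                    upload.renameTo(new File("/tmp", upload.getFilename()));
                }
            }
        } catch (EndOfDataDecoderException expected) {
            // no more complete parts in this chunk
        }
    }
}

With this shape, heap usage is bounded by the chunk size and the number of concurrent connections rather than by the file size.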

@jknack

jknack commented Jun 7, 2016

Also ran into this problem with a 1.8GB file.

@rfielding

@jknack I ran into not only this problem last year but also all the dependents of this code that make the same assumption (netty -> Finagle/Finatra -> user apps). I eventually gave up on the framework. This issue was closed, so you can assume that it will never be fixed.

http4s downloaded a 47GB file, though I didn't try uploading such a large file. Or you can use a raw HTTP connection and implement your own multipart handling (it's not that hard once you get rid of the layers of abstraction). Eventually I moved on to Go, where I could control the details of the HTTP connection completely. The point is that you only need buffers large enough to absorb incoming traffic without creating delays, which is really only a few kB, and to stream the data the way HTTP was designed.

@Scottmitch
Member

I'm not familiar with the multipart decoder but it is concerning that we must hold so much in memory ... @normanmaurer @trustin - any thoughts?

@jknack

jknack commented Jun 7, 2016

Here is the stacktrace for 4.1.x

Caused by: java.lang.OutOfMemoryError: Java heap space
    at io.netty.buffer.UnpooledHeapByteBuf.copy(UnpooledHeapByteBuf.java:473) ~[netty-buffer-4.1.0.CR7.jar:4.1.0.CR7]
    at io.netty.handler.codec.http.multipart.HttpPostMultipartRequestDecoder.readFileUploadByteMultipart(HttpPostMultipartRequestDecoder.java:1492) ~[netty-codec-http-4.1.0.CR7.jar:4.1.0.CR7]
    at io.netty.handler.codec.http.multipart.HttpPostMultipartRequestDecoder.getFileUpload(HttpPostMultipartRequestDecoder.java:902) ~[netty-codec-http-4.1.0.CR7.jar:4.1.0.CR7]
    at io.netty.handler.codec.http.multipart.HttpPostMultipartRequestDecoder.decodeMultipart(HttpPostMultipartRequestDecoder.java:563) ~[netty-codec-http-4.1.0.CR7.jar:4.1.0.CR7]
    at io.netty.handler.codec.http.multipart.HttpPostMultipartRequestDecoder.findMultipartDisposition(HttpPostMultipartRequestDecoder.java:804) ~[netty-codec-http-4.1.0.CR7.jar:4.1.0.CR7]
    at io.netty.handler.codec.http.multipart.HttpPostMultipartRequestDecoder.decodeMultipart(HttpPostMultipartRequestDecoder.java:501) ~[netty-codec-http-4.1.0.CR7.jar:4.1.0.CR7]
    at io.netty.handler.codec.http.multipart.HttpPostMultipartRequestDecoder.findMultipartDelimiter(HttpPostMultipartRequestDecoder.java:657) ~[netty-codec-http-4.1.0.CR7.jar:4.1.0.CR7]
    at io.netty.handler.codec.http.multipart.HttpPostMultipartRequestDecoder.decodeMultipart(HttpPostMultipartRequestDecoder.java:488) ~[netty-codec-http-4.1.0.CR7.jar:4.1.0.CR7]
    at io.netty.handler.codec.http.multipart.HttpPostMultipartRequestDecoder.parseBodyMultipart(HttpPostMultipartRequestDecoder.java:453) ~[netty-codec-http-4.1.0.CR7.jar:4.1.0.CR7]
    at io.netty.handler.codec.http.multipart.HttpPostMultipartRequestDecoder.parseBody(HttpPostMultipartRequestDecoder.java:422) ~[netty-codec-http-4.1.0.CR7.jar:4.1.0.CR7]
    at io.netty.handler.codec.http.multipart.HttpPostMultipartRequestDecoder.offer(HttpPostMultipartRequestDecoder.java:347) ~[netty-codec-http-4.1.0.CR7.jar:4.1.0.CR7]
    at io.netty.handler.codec.http.multipart.HttpPostMultipartRequestDecoder.<init>(HttpPostMultipartRequestDecoder.java:193) ~[netty-codec-http-4.1.0.CR7.jar:4.1.0.CR7]
    at io.netty.handler.codec.http.multipart.HttpPostRequestDecoder.<init>(HttpPostRequestDecoder.java:97) ~[netty-codec-http-4.1.0.CR7.jar:4.1.0.CR7]
    at io.netty.handler.codec.http.multipart.HttpPostRequestDecoder.<init>(HttpPostRequestDecoder.java:68) ~[netty-codec-http-4.1.0.CR7.jar:4.1.0.CR7]

The HttpPostRequestDecoder was created like:

HttpPostRequestDecoder decoder = new HttpPostRequestDecoder(new DefaultHttpDataFactory(true), req);

where req is a FullHttpRequest.
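For reference: the stack trace above shows offer() being invoked from the decoder constructor, i.e. the whole aggregated body is parsed (and copied) during construction. A hedged, illustrative contrast keeps the same disk-backed factory but passes an un-aggregated request head, deferring body parsing to incremental offer() calls (variable names here are not from this issue):

HttpPostRequestDecoder decoder =
        new HttpPostRequestDecoder(new DefaultHttpDataFactory(true), request); // request is the HttpRequest head only, not a FullHttpRequest

// later, once per arriving chunk:
decoder.offer(content); // content is an HttpContent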

@rfielding

@Scottmitch it's the general issue of holding things in byte arrays. I spent quite a few days debug-stepping through Netty+Finagle trying to troubleshoot large file uploads and downloads. Increasing the maximum allowed size just expands memory usage until we run out of memory.

The first problem was trying to read the entire request (head and body) into byte arrays, and the other was serializing objects into byte arrays before sending them out. The ChannelBuffer abstraction is supposed to prevent this, but there are places that turn the channel buffers into byte arrays. This can easily take thousands of times more memory than is actually required.
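On the download side, one way to avoid realizing a whole response in memory is Netty's chunked-write support. A minimal sketch under Netty 4.1 APIs, assuming a ChunkedWriteHandler is installed in the pipeline (class and variable names are illustrative):

import java.io.File;
import java.io.RandomAccessFile;

import io.netty.channel.ChannelHandlerContext;
import io.netty.handler.codec.http.DefaultHttpResponse;
import io.netty.handler.codec.http.HttpChunkedInput;
import io.netty.handler.codec.http.HttpResponse;
import io.netty.handler.codec.http.HttpResponseStatus;
import io.netty.handler.codec.http.HttpUtil;
import io.netty.handler.codec.http.HttpVersion;
import io.netty.handler.stream.ChunkedFile;

final class ChunkedDownload {
    // Streams a large file as fixed-size chunks; heap usage stays at roughly one chunk.
    static void writeFile(ChannelHandlerContext ctx, File file) throws Exception {
        RandomAccessFile raf = new RandomAccessFile(file, "r");
        HttpResponse response = new DefaultHttpResponse(HttpVersion.HTTP_1_1, HttpResponseStatus.OK);
        HttpUtil.setContentLength(response, raf.length());
        ctx.write(response);
        // ChunkedWriteHandler pulls 8 KiB chunks from ChunkedFile as the channel becomes writable;
        // HttpChunkedInput appends the trailing LastHttpContent once the file is exhausted.
        ctx.writeAndFlush(new HttpChunkedInput(new ChunkedFile(raf, 0, raf.length(), 8192)));
    }
}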

@Scottmitch
Member

@rfielding @jknack - Thanks for the additional info. I understand the issue but I don't have time to dig in at the moment. @rfielding - Did you come up with alternative approaches through your investigation into Netty+Finagle, and do you have any interest in submitting a PR?

@rfielding

rfielding commented Jun 7, 2016

@Scottmitch I made a change to Netty that fixed the problem for me, but higher layers built on Netty also presumed they could access uploaded items in any order (which in turn presumes that they are sitting in a hashtable), which caused Finagle to crash before it even reached my handler.

So, unfortunately, I had to switch frameworks to meet my requirements. I can now do multi-gigabyte file up/down (even on a Raspberry Pi) and stay under a few hundred kilobytes of memory while doing so (I did it in Go, but I think I could have done it in Java). The main thing is using ChannelBuffers to avoid holding whole objects in memory. HTTP bodies do not fit into memory. Multipart MIME must be handled with an iterator, because the structure of the request is regular, like HTTPHEAD (MPHEAD MPBODY)*, where each MPBODY must yield an IO handle (ChannelBuffer, etc.). Similarly for returning data: there can never be places where you are expected to realize the entire object in memory before sending it back.

You can't get high concurrency if one session is hogging all of memory, and there is no benefit to using more memory than the intermediate buffer requires. Java threads already consume a lot of that memory. And at high throughput, don't forget that a lot of memory use is hidden in the kernel in the form of unacked packets (roughly round-trip time * bandwidth bytes per client).

Some things to note from my initial performance investigation:

  • The OS read itself almost always yields about 4k at a time(!!).
  • When there is a lot going on, I have seen it yield 32k.
  • If your input rate is 3MB/s, then you need to retire that data (to disk, S3, a DB, etc.) at the same rate. If there were no variability, you wouldn't need much more than the small 32k buffer; because of variability, you may need to keep buffering data while retiring it is stalled (see the backpressure sketch after this list).
  • There should be NO correlation at all between file size and memory consumption; memory consumption should correlate with the number of concurrent sessions. At best, larger files mean sessions are more likely to overlap in time.
  • There will be application-specific data structures that use a lot of memory, but the framework itself should generally not consume much memory of its own.
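A hedged sketch of the backpressure point above: with autoRead disabled, the handler requests the next chunk only after the previous one has been retired, so per-connection buffering stays near one chunk. The WritableByteChannel is a stand-in for disk/S3/db; names are illustrative, and a real server would move the blocking write off the event loop.

import java.nio.channels.WritableByteChannel;

import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.handler.codec.http.HttpContent;
import io.netty.util.ReferenceCountUtil;

public class BackpressuredUploadHandler extends ChannelInboundHandlerAdapter {

    private final WritableByteChannel sink; // stand-in for wherever the data is retired to

    public BackpressuredUploadHandler(WritableByteChannel sink) {
        this.sink = sink;
    }

    @Override
    public void channelActive(ChannelHandlerContext ctx) {
        ctx.channel().config().setAutoRead(false); // we decide when to read from the socket
        ctx.read();                                // ask for the first chunk
    }

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
        try {
            if (msg instanceof HttpContent) {
                // Retire the chunk before requesting more; if this stalls, the socket read stalls too,
                // and TCP flow control pushes the backpressure to the client.
                // (A real implementation would loop until the buffer is fully written.)
                sink.write(((HttpContent) msg).content().nioBuffer());
            }
        } finally {
            ReferenceCountUtil.release(msg);
        }
        ctx.read(); // only now ask the OS for the next chunk
    }
}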

Even though I effectively deserted Java over this (my main project is now in Go), I am still affected by it, because I need to write REST APIs that are called from within these frameworks. My change to HttpPostRequestDecoder was essentially to make a copy of it and change the constructor and implementation so that I could iterate it in order, get those ChannelBuffers, and drain them out to disk.

@jknack

jknack commented Jun 7, 2016

@rfielding agreed. My app uses http://jooby.org which has support for multiple servers including undertow, jetty and netty (netty is the default web server).

As a workaround, I switched to undertow and had no problem with big uploads.

@fredericBregier
Member

DefaultHttpDataFactory has a specific argument (the boolean useDisk passed to its constructor):

If you set this, you should not have any memory issue (unless there is a bug in the code); that's why there are three implementations of attributes: memory-based (the default), disk-based, and a mixed one.
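For illustration, those three flavors correspond to these constructors of io.netty.handler.codec.http.multipart.DefaultHttpDataFactory (the 16 KB threshold below is just an example value):

HttpDataFactory inMemory = new DefaultHttpDataFactory(false);      // keep attributes/uploads on the heap
HttpDataFactory onDisk   = new DefaultHttpDataFactory(true);       // always back them with temp files on disk
HttpDataFactory mixed    = new DefaultHttpDataFactory(16 * 1024L); // memory below the size threshold, disk above it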

Perhaps changing the default behavior would be a nice thing and would prevent this kind of issue?

@fredericBregier
Member

fredericBregier commented Dec 30, 2016

Quick note: I see that you set "true", meaning disk-based storage, so if there is such a memory issue it means there is a bug. However, looking again at the code of DiskFileUpload and its abstract parent AbstractDiskHttpData, it should be appending to the file and not to memory.

So there might be something else.

@rfielding

It's good that at least large files can be written to disk. (Go's framework actually works like this by default: files over 10MB spill out to disk to free up memory.) There is still a problem if the file is written to disk, though: if the file comes in over SSL, you don't want to spill the beans by leaving a plaintext version of it sitting in a cache on the filesystem. If you just get file handles for incoming files in the order they arrive, you can push them through the appropriate ciphers on their way to disk or S3, etc.
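A hedged sketch of that idea, assuming partial upload data is available as a ByteBuf; key and IV handling here is purely illustrative and not a complete design:

import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.SecureRandom;

import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

import io.netty.buffer.ByteBuf;

final class EncryptingSink {

    // Open a file sink that encrypts everything written through it, so no plaintext
    // copy of the upload ever lands on the filesystem. Key management is out of scope.
    static OutputStream openEncrypted(Path target) throws Exception {
        SecretKey key = KeyGenerator.getInstance("AES").generateKey();
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        return new CipherOutputStream(Files.newOutputStream(target), cipher);
    }

    // Drain one chunk of partial upload data straight through the cipher to disk.
    static void drain(ByteBuf chunk, OutputStream encryptedOut) throws Exception {
        chunk.readBytes(encryptedOut, chunk.readableBytes());
    }
}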

@normanmaurer
Member

Let me close this as it has had no activity for a long time... Please re-open if you still think there is an issue.

@ysde

ysde commented Oct 24, 2018

Hi, has this Netty OOM problem been fixed?

Thank you

@hyperxpro
Contributor

Same issue here too.
