OutOfMemory error while doing a file upload of size 7GB through Netty #3559
Comments
HttpPostRequestDecoder is doing it wrong. The whole point of the interface is to iterate the metadata/data for each part in upload order, and to get a ChannelBuffer for the part rather than a byte array. But the way that the object is constructed defeats the purpose of this interface. Even for gigabyte-sized files, there is no need to buffer more than a few kilobytes during the upload. In other words, memory usage should be proportional to the number of concurrent sessions and be totally unrelated to the size of the files being uploaded. There is a similar issue with realizing HTTP responses into byte arrays. Not only is HttpPostRequestDecoder doing it wrong, but other parts of the framework assume that the HTTP request is parsed before it gets to your handler. Some URLs are known to upload and download large requests. For those, we need to be certain that nothing beyond the HTTP header is parsed ahead of time.
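For context, the 4.x decoder can be driven chunk by chunk when the pipeline has no HttpObjectAggregator, which keeps heap usage at roughly one chunk per connection regardless of file size. A minimal sketch of that pattern, assuming a disk-backed factory and a handler sitting after HttpServerCodec (class and method names here are illustrative, not from this issue):

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.SimpleChannelInboundHandler;
import io.netty.handler.codec.http.HttpContent;
import io.netty.handler.codec.http.HttpObject;
import io.netty.handler.codec.http.HttpRequest;
import io.netty.handler.codec.http.LastHttpContent;
import io.netty.handler.codec.http.multipart.DefaultHttpDataFactory;
import io.netty.handler.codec.http.multipart.FileUpload;
import io.netty.handler.codec.http.multipart.HttpDataFactory;
import io.netty.handler.codec.http.multipart.HttpPostRequestDecoder;
import io.netty.handler.codec.http.multipart.InterfaceHttpData;

public class StreamingUploadHandler extends SimpleChannelInboundHandler<HttpObject> {

    // true => parts are spooled to temp files instead of being held on the heap
    private final HttpDataFactory factory = new DefaultHttpDataFactory(true);
    private HttpPostRequestDecoder decoder;

    @Override
    protected void channelRead0(ChannelHandlerContext ctx, HttpObject msg) {
        if (msg instanceof HttpRequest) {
            // assumes the request is a multipart POST; real code should check first
            decoder = new HttpPostRequestDecoder(factory, (HttpRequest) msg);
        }
        if (decoder != null && msg instanceof HttpContent) {
            decoder.offer((HttpContent) msg);   // feed only the current chunk
            drainCompletedParts();
            if (msg instanceof LastHttpContent) {
                decoder.destroy();              // releases buffers and temp files
                decoder = null;
            }
        }
    }

    private void drainCompletedParts() {
        try {
            while (decoder.hasNext()) {
                InterfaceHttpData part = decoder.next();
                if (part instanceof FileUpload) {
                    // the part is complete here; move/stream it to its destination
                    // (decoder.destroy() on the last chunk handles cleanup)
                }
            }
        } catch (HttpPostRequestDecoder.EndOfDataDecoderException ignore) {
            // no more complete parts in the data received so far
        }
    }
}
```

Whether this helps depends on nothing upstream aggregating the full request first, which is exactly the assumption the later comments complain about.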
Also ran into this problem with a 1.8GB file.
@jknack I had run into not only this problem last year, but also all the dependents on this code that make the same assumption (Netty -> Finagle/Finatra -> user apps). I gave up on the framework eventually. This issue was closed, so you can assume that it will never be fixed. With http4s I downloaded a 47GB file, though I didn't try uploading anything that large. Or you can use a raw HTTP connection and implement your own multipart handling (it's not that hard once you get rid of the layers of abstraction). Actually, I eventually moved on to Go, where I could control the details of the HTTP connection completely. The key point is that you only need buffers large enough to absorb incoming traffic without creating delays, which is really only a few kB, and to stream the data the way that HTTP was designed.
I'm not familiar with the multipart decoder but it is concerning that we must hold so much in memory ... @normanmaurer @trustin - any thoughts?
Here is the stacktrace for 4.1.x
The HttpPostRequestDecoder was created like: `HttpPostRequestDecoder decoder = new HttpPostRequestDecoder(new DefaultHttpDataFactory(true), req);` where …
@Scottmitch it's the general issue of holding things in byte arrays. I spent quite a few days debug-stepping through Netty+Finagle, trying to troubleshoot large file uploads and downloads. Increasing the maximum allowed size just expands memory usage until we run out of memory. The first problem was trying to read the entire request (head and body) into byte arrays, and the other problem was serializing objects into byte arrays before sending them out. The ChannelBuffer abstraction is supposed to prevent this, but there are locations that turn the channel buffers into byte arrays. This can easily take thousands of times more memory than is actually required.
@rfielding @jknack - Thanks for the additional info. I understand the issue but I don't have time to dig in at the moment. @rfielding - Did you come up with alternative approaches through your investigation into Netty+Finagle, and do you have any interest in submitting a PR?
@Scottmitch I had made a change to Netty that fixed the problem for me, but then higher layers that used Netty also presumed that they could access uploaded items in any order (which in turn presumes that they are sitting in a hashtable), which caused Finagle to crash before it even reached my handler. So, I unfortunately had to switch frameworks to meet my requirements. I can now do multi-gigabyte file up/down (even on a Raspberry Pi) and stay under a few hundred kilobytes while doing so (I did it in Go, but I think I could have done it in Java). The main thing is using ChannelBuffers to avoid holding objects in memory. HTTP bodies do not fit into memory. Multipart MIME must be handled with an iterator, because the structure of the request is regular, like: HTTPHEAD (MPHEAD MPBODY)* ... where MPBODY must yield an IO handle (ChannelBuffer, etc). Similarly for returning data, there can never be places where you are expected to realize the entire object in memory before sending it back. You can't get high concurrency if one session is hogging all of memory, and there is no benefit to using more memory than is required for the intermediate buffer. Java threads consume a lot of that memory already. And if it's high throughput, you can't forget that a lot of memory use is hidden in the kernel in the form of unacked packets (roughly round-trip-time × bandwidth bytes per client). Something to note from my initial performance investigation:
Even though I effectively deserted Java over this (my main project is now Go), I am still affected by it, because I need to write REST APIs that are called from within these frameworks. My change to HttpPostRequestDecoder was essentially to make a new copy of it and change the constructor and the implementation so that I iterated it in order, got those ChannelBuffers, and drained them out to disk.
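In current 4.x terms, the "drain it out to disk" step the commenter describes can at least avoid a byte[] copy once a part is complete: with a disk-backed factory, renameTo moves the temp file rather than reading it back into the heap. A hypothetical helper (the names are mine, not from the thread):

```java
import java.io.File;
import java.io.IOException;
import io.netty.handler.codec.http.multipart.FileUpload;

final class UploadSink {
    // Moves a completed, disk-backed part to its final location without pulling
    // the content onto the heap (for memory-backed parts, renameTo writes the
    // buffered bytes out to the destination file instead).
    static void persist(FileUpload upload, File targetDir) throws IOException {
        // note: a real implementation should sanitize the client-supplied filename
        File dest = new File(targetDir, upload.getFilename());
        if (!upload.renameTo(dest)) {
            throw new IOException("could not move upload to " + dest);
        }
    }
}
```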
@rfielding agreed. My app uses http://jooby.org which has support for multiple servers including Undertow, Jetty and Netty (Netty is the default web server). As a workaround, I switched to Undertow and had no problem with big uploads.
DefaultHttpDataFactory has a specific argument:
If this is set, you should not have any memory issue (except if there is a bug in the code); that's why there are three implementations of attributes: memory based (the default), disk based, and a mixed one. Perhaps changing the default behavior would be a nice thing, preventing this kind of issue?
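For reference, those three behaviors correspond to the DefaultHttpDataFactory constructors in Netty 4.x; a small sketch (the 16 KB threshold is only illustrative):

```java
import io.netty.handler.codec.http.multipart.DefaultHttpDataFactory;
import io.netty.handler.codec.http.multipart.HttpDataFactory;

final class Factories {
    // memory based: every attribute/file upload is held on the heap
    static final HttpDataFactory IN_MEMORY = new DefaultHttpDataFactory(false);

    // disk based: each part is spooled to a temporary file as it arrives
    static final HttpDataFactory ON_DISK = new DefaultHttpDataFactory(true);

    // mixed: parts below the size threshold stay in memory, larger ones go to disk
    static final HttpDataFactory MIXED = new DefaultHttpDataFactory(16 * 1024);
}
```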
Quick note: I see that you set "true", meaning disk-based storage is used, so if there is such a memory issue, it means there is a bug. However, looking at the code again in DiskFileUpload and its abstract parent AbstractDiskHttpData, it should be using file append and not memory append. So there might be something else going on.
That's good that at least large files can be written to disk. (Go's framework actually works like this by default, where files over 10MB spill out to disk to free up memory.) There is still a problem if the file is being written to disk: if the file comes in over SSL, you don't want to spill the beans by leaving the plaintext version of it sitting in a cache on the filesystem. If you just get file handles for incoming files in the order they arrive, you can push them through the appropriate ciphers on their way to disk or S3, etc.
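One way to act on that concern is to route the spilled bytes through a cipher stream on their way to disk, so the plaintext never touches the filesystem. A generic Java sketch, not Netty-specific; the key and IV handling here are placeholders:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class EncryptedSpill {
    // Returns an OutputStream that encrypts everything written to it before it
    // reaches the spill file. Key/IV management is out of scope for this sketch.
    static OutputStream open(File spillFile, byte[] key, byte[] iv) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return new CipherOutputStream(new FileOutputStream(spillFile), cipher);
    }
}
```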
Let me close this as it had no activity for a long time... Please re-open if you still think there is an issue |
Hi, has this Netty OOM problem been fixed? Thank you
Same issue here too. |
This is the exception I see
I see that it's trying to allocate bytes for new capacity.
But why is this happening only for big files? Is it not releasing the existing memory?
Is there any max file size limit?