Minimize get byte multipart and fix buffer reuse #11001

fredericBregier · 2021-02-07T15:39:43Z

Fix HttpPostMultipartRequestDecoder buffer usages and allocations

Motivation:

The current implementation of HttpPostMultipartRequestDecoder calls getByte(position)
far too often.
This causes excessive work, which is especially visible when the PARANOID leak-detection level is active,
but it also holds in standard mode.

Apply the buffer fix previously made in HttpPostMultipartRequestDecoder to
HttpPostStandardRequestDecoder as well.

Finally, to ensure that already decoded HttpData is not overwritten while decoding the
following ones within a multipart request, the buffers must be copied rather than kept as retained slices.
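
To illustrate why a slice is unsafe here, the sketch below uses the JDK's java.nio.ByteBuffer as a stand-in for Netty's ByteBuf (an assumption made only to keep the example self-contained; the class and method names are hypothetical and this is not the decoder's code). A slice shares memory with the chunk it was cut from, so rewriting the chunk changes what the slice sees, while a copy stays intact. ByteBuf.retainedSlice() and ByteBuf.copy() differ in the same way.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class SliceVersusCopy {
    /** Reads the remaining bytes of a buffer as ASCII text without moving its position. */
    static String text(ByteBuffer buf) {
        ByteBuffer view = buf.duplicate();
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return new String(out, StandardCharsets.US_ASCII);
    }

    /** Returns a view of the first {@code length} bytes that shares memory with {@code source}. */
    static ByteBuffer sliceOf(ByteBuffer source, int length) {
        ByteBuffer window = source.duplicate();
        window.position(0);
        window.limit(length);
        return window.slice();
    }

    /** Returns an independent copy of the first {@code length} bytes of {@code source}. */
    static ByteBuffer copyOf(ByteBuffer source, int length) {
        ByteBuffer window = source.duplicate();
        window.position(0);
        window.limit(length);
        ByteBuffer copy = ByteBuffer.allocate(length);
        copy.put(window);
        copy.flip();
        return copy;
    }

    public static void main(String[] args) {
        // Hypothetical chunk holding two decoded parts back to back.
        ByteBuffer chunk = ByteBuffer.wrap("part-1part-2".getBytes(StandardCharsets.US_ASCII));
        ByteBuffer shared = sliceOf(chunk, 6);
        ByteBuffer copied = copyOf(chunk, 6);

        // Simulate buffer reuse: drop the consumed bytes and move the tail to the front.
        chunk.position(6);
        chunk.compact();

        System.out.println(text(shared)); // prints "part-2": the slice was overwritten
        System.out.println(text(copied)); // prints "part-1": the copy is unaffected
    }
}
```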

Modifications:

Use the bytesBefore(...) method instead of getByte(pos) to limit external accesses
to the underlying buffer, by iteratively locating the next candidate start
position.
It is used to find both line breaks (LF/CRLF) and the delimiter.
Two methods were added to HttpPostBodyUtil for that purpose.
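
The search strategy described above can be sketched as follows. This is a minimal illustration on a plain byte array, not the code added to HttpPostBodyUtil: the class, the method names and the local bytesBefore helper are hypothetical stand-ins (the helper mimics the semantics of ByteBuf.bytesBefore(index, length, value), which returns an offset relative to index, or -1). The idea is to jump to the next occurrence of the delimiter's first byte with one bulk scan, and only then compare the remaining bytes.

```java
import java.nio.charset.StandardCharsets;

final class DelimiterSearch {
    /** Stand-in for a bulk scan: offset of {@code value} relative to {@code index}, or -1. */
    static int bytesBefore(byte[] buf, int index, int length, byte value) {
        for (int i = 0; i < length; i++) {
            if (buf[index + i] == value) {
                return i;
            }
        }
        return -1;
    }

    /** Returns true if {@code delimiter} occurs in {@code buf} starting at {@code pos}. */
    static boolean matchesAt(byte[] buf, int pos, byte[] delimiter) {
        for (int i = 0; i < delimiter.length; i++) {
            if (buf[pos + i] != delimiter[i]) {
                return false;
            }
        }
        return true;
    }

    /** Absolute index of the first occurrence of a non-empty {@code delimiter} at or after {@code from}, or -1. */
    static int findDelimiter(byte[] buf, int from, byte[] delimiter) {
        if (delimiter.length == 0) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        int last = buf.length - delimiter.length; // last index where a match can start
        int pos = from;
        while (pos <= last) {
            // One bulk scan to the next candidate instead of one access per byte.
            int skip = bytesBefore(buf, pos, last - pos + 1, delimiter[0]);
            if (skip < 0) {
                return -1;
            }
            pos += skip;
            if (matchesAt(buf, pos, delimiter)) {
                return pos;
            }
            pos++; // false candidate, resume just after it
        }
        return -1;
    }

    /** Absolute index where the next line break starts (the CR of a CRLF, else the LF), or -1. */
    static int findLineBreak(byte[] buf, int from) {
        int skip = bytesBefore(buf, from, buf.length - from, (byte) '\n');
        if (skip < 0) {
            return -1;
        }
        int lf = from + skip;
        return lf > from && buf[lf - 1] == '\r' ? lf - 1 : lf;
    }

    public static void main(String[] args) {
        byte[] data = "field-value\r\n--boundary\r\nnext".getBytes(StandardCharsets.US_ASCII);
        byte[] delimiter = "--boundary".getBytes(StandardCharsets.US_ASCII);
        System.out.println(findLineBreak(data, 0));            // prints 11
        System.out.println(findDelimiter(data, 0, delimiter)); // prints 13
    }
}
```

Note that the single '-' inside "field-value" is a false candidate: the scan stops there, the comparison fails, and the search resumes one byte later.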

The undecodedChunk is copied when a chunk is added to a DataMultipart.
The same buffer is then rewritten so that the memory of the copied part can be released.

Result:

Note that for Memory, Disk and Mixed mode factories alike, the release has to be done as follows:

for (InterfaceHttpData httpData: decoder.getBodyHttpDatas()) {
    httpData.release();
    factory.removeHttpDataFromClean(request, httpData);
}
factory.cleanAllHttpData();
decoder.destroy();

Memory usage is minimal in Disk or Mixed mode. In Memory mode, a big file is still
held in memory, no longer in the undecodedChunk but in its own (copied) buffer.

In terms of benchmarking, the results are:

Original code:

Benchmark                                                                           Mode  Cnt  Score    Error   Units
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigAdvancedLevel   thrpt    6  0,152 ±  0,100  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigDisabledLevel   thrpt    6  0,543 ±  0,218  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigParanoidLevel   thrpt    6  0,001 ±  0,001  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigSimpleLevel     thrpt    6  0,615 ±  0,070  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighAdvancedLevel  thrpt    6  0,114 ±  0,063  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighDisabledLevel  thrpt    6  0,664 ±  0,034  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighParanoidLevel  thrpt    6  0,001 ±  0,001  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighSimpleLevel    thrpt    6  0,620 ±  0,140  ops/ms

New code:

Benchmark                                                                           Mode  Cnt  Score   Error   Units
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigAdvancedLevel   thrpt    6  4,253 ± 0,333  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigDisabledLevel   thrpt    6  4,422 ± 0,250  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigParanoidLevel   thrpt    6  0,877 ± 0,014  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigSimpleLevel     thrpt    6  4,151 ± 0,481  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighAdvancedLevel  thrpt    6  2,167 ± 0,098  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighDisabledLevel  thrpt    6  2,520 ± 0,043  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighParanoidLevel  thrpt    6  0,177 ± 0,003  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighSimpleLevel    thrpt    6  2,419 ± 0,061  ops/ms

In short, at Simple level, the new code is about 7 times faster for big file transfers
and about 4 times faster for a high number of HttpData.
At Paranoid level, the new code is about 800 times faster for big file transfers
and about 170 times faster for a high number of HttpData.
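
For reference, the ratios quoted above can be recomputed from the reported scores with the small sketch below (the class and method names are hypothetical; decimal commas from the benchmark output are written as decimal points). Keep in mind that the original Paranoid scores are reported as 0,001 ± 0,001 ops/ms, which is at the resolution limit of the report, so the Paranoid ratios are only rough orders of magnitude.

```java
import java.util.Locale;

final class SpeedupRatios {
    /** Throughput ratio: how many times more operations per millisecond the new code achieves. */
    static double speedup(double originalOpsPerMs, double newOpsPerMs) {
        return newOpsPerMs / originalOpsPerMs;
    }

    public static void main(String[] args) {
        // Scores copied from the "Original code" and "New code" tables (ops/ms).
        System.out.printf(Locale.ROOT, "Big, Simple level:    %.1fx%n", speedup(0.615, 4.151));
        System.out.printf(Locale.ROOT, "High, Simple level:   %.1fx%n", speedup(0.620, 2.419));
        System.out.printf(Locale.ROOT, "Big, Paranoid level:  %.0fx%n", speedup(0.001, 0.877));
        System.out.printf(Locale.ROOT, "High, Paranoid level: %.0fx%n", speedup(0.001, 0.177));
    }
}
```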

Small improvements are also included: better buffer allocation, removal of a useless check in MixedAttribute, and extended use of Charset.

Add some tests to check consistency in Http Codec Multipart

Motivation:

Underlying buffer usages might be erroneous when the buffers are released internally
in HttpPostMultipartRequestDecoder.

Two bugs occur:

1) The final file upload does not seem to have the right size.
2) Memory, even in Disk mode, increases continuously, while it shouldn't.

Modification:

Add some tests to check consistency for HttpPostMultipartRequestDecoder.
Add a package-private method for testing purposes only.

Results:

Without the fixes, these tests fail. With the fixes, they pass.

fredericBregier · 2021-02-07T15:41:15Z

I tried to rewrite the previous proposal from scratch.
I believe it is clearer, and I also found some bugs still present (decoding errors, memory allocation errors) in the current implementation.

franz1981 · 2021-02-07T18:34:22Z

I see that I should have used this PR benchmark to test the effectiveness of #10737 in a more realistic use case too :)

fredericBregier · 2021-02-07T19:04:53Z

> I see that I should have used this PR benchmark to test the effectiveness of #10737 in a more realistic use case too :)

@franz1981
I'm not sure which point you find relevant, but if it is that calling getByte while iterating over positions to find a value is quite costly (mainly because accesses are partially or totally recorded within Netty to track down bad releases, though not only for that reason), then using bytesBefore seems more efficient.
If that is the point, then yes, avoiding iteration with getByte is a good approach (just as, in "almost" every language, it is better not to read a file byte by byte).

In addition, we could implement a dedicated method in ByteBuf to search for a "series of bytes", but I think it might be too specific to my case, so I decided to leave it in HttpCodec as a static helper method.

franz1981 · 2021-02-07T19:21:12Z

> I'm not sure to what point you think it is relevant

If you use bytesBefore or indexOf, you will likely get an additional speedup thanks to the PR I linked ;)

normanmaurer · 2021-02-08T08:47:12Z

@fredericBregier does this replace #10982 ?

fredericBregier · 2021-02-08T10:03:36Z

@normanmaurer yes, I forgot to close the previous one.
It is a rewrite of both the original #10623 and the following ones.

chrisvest

I had some comments.

codec-http/src/main/java/io/netty/handler/codec/http/multipart/HttpPostBodyUtil.java

...ttp/src/main/java/io/netty/handler/codec/http/multipart/HttpPostMultipartRequestDecoder.java

codec-http/src/test/java/io/netty/handler/codec/http/multipart/HttpPostRequestDecoderTest.java

chrisvest · 2021-02-10T14:27:47Z

I forgot to mention: the benchmarks look great, and I like how many parts of the code became simpler. Nice work!

codec-http/src/main/java/io/netty/handler/codec/http/multipart/MixedAttribute.java

chrisvest

Had another comment, but otherwise I think this looks good.

Fix typos and naming, fix Charset usage, change while to if, improve comments, make tests smaller, factorize them and split them by factory type, optimize buffer allocation when the given buffer is empty, improve findDelimiter, and remove a useless check in MixedAttribute.

Still pending: AbstractSearchProcessorFactory usage, to be checked.

fredericBregier · 2021-02-13T10:11:00Z

@franz1981 Done, thank you !!

chrisvest · 2021-02-26T13:25:09Z

@fredericBregier Thanks! (sorry about the delay)

Motivation:

- Underlying buffer usages might be erroneous when the buffers are released internally in HttpPostMultipartRequestDecoder. Two bugs occur:
  1) The final file upload does not seem to have the right size.
  2) Memory, even in Disk mode, increases continuously, while it shouldn't.
- Method `getByte(position)` is called too often within the current implementation of the HttpPostMultipartRequestDecoder. This causes excessive work, which is visible when PARANOID mode is active. This is also true in standard mode. Apply the same fix on buffer from HttpPostMultipartRequestDecoder to HttpPostStandardRequestDecoder made previously. Finally, to ensure that already decoded HttpData is not overwritten while decoding the following ones within multipart, the buffers must be copied rather than kept as retained slices.

Modification:

- Add some tests to check consistency for HttpPostMultipartRequestDecoder. Add a package-private method for testing purposes only.
- Use the `bytesBefore(...)` method instead of `getByte(pos)` to limit external accesses to the underlying buffer, by iteratively locating the next candidate start position. It is used to find both LF/CRLF and the delimiter. Two methods were added to HttpPostBodyUtil for that purpose. The undecodedChunk is copied when a chunk is added to a DataMultipart. The same buffer is then rewritten so that the memory of the copied part can be released.

Result:

Note that for Memory, Disk and Mixed mode factories alike, the release has to be done as follows:

for (InterfaceHttpData httpData: decoder.getBodyHttpDatas()) {
    httpData.release();
    factory.removeHttpDataFromClean(request, httpData);
}
factory.cleanAllHttpData();
decoder.destroy();

Memory usage is minimal in Disk or Mixed mode. In Memory mode, a big file is still held in memory, no longer in the undecodedChunk but in its own (copied) buffer.

In terms of benchmarking, the results are:

Original code:

Benchmark                                                                           Mode  Cnt  Score    Error   Units
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigAdvancedLevel   thrpt    6  0,152 ±  0,100  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigDisabledLevel   thrpt    6  0,543 ±  0,218  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigParanoidLevel   thrpt    6  0,001 ±  0,001  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigSimpleLevel     thrpt    6  0,615 ±  0,070  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighAdvancedLevel  thrpt    6  0,114 ±  0,063  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighDisabledLevel  thrpt    6  0,664 ±  0,034  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighParanoidLevel  thrpt    6  0,001 ±  0,001  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighSimpleLevel    thrpt    6  0,620 ±  0,140  ops/ms

New code:

Benchmark                                                                           Mode  Cnt  Score   Error   Units
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigAdvancedLevel   thrpt    6  4,037 ± 0,358  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigDisabledLevel   thrpt    6  4,226 ± 0,471  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigParanoidLevel   thrpt    6  0,875 ± 0,029  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigSimpleLevel     thrpt    6  4,346 ± 0,275  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighAdvancedLevel  thrpt    6  2,044 ± 0,020  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighDisabledLevel  thrpt    6  2,278 ± 0,159  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighParanoidLevel  thrpt    6  0,174 ± 0,004  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighSimpleLevel    thrpt    6  2,370 ± 0,065  ops/ms

In short, at Simple level, the new code is about 7 times faster for big file transfers and about 4 times faster for a high number of HttpData. At Paranoid level, the new code is about 800 times faster for big file transfers and about 170 times faster for a high number of HttpData.

jameskleeh · 2021-04-06T16:00:29Z

This change broke the semantics of the decoder API. Will file an issue.

fredericBregier added 2 commits February 7, 2021 13:49

fredericBregier requested review from normanmaurer and chrisvest February 7, 2021 15:39

fredericBregier marked this pull request as ready for review February 9, 2021 08:44

chrisvest reviewed Feb 10, 2021

chrisvest reviewed Feb 12, 2021

chrisvest approved these changes Feb 12, 2021

fredericBregier force-pushed the MinimizeGetByteMultipartFixBufferReuse branch from b2757ae to 6daeb0c Compare February 12, 2021 10:50

fredericBregier mentioned this pull request Feb 15, 2021

OutOfDirectMemoryError for large uploads using HttpPostMultipartRequestDecoder #10973

Closed

chrisvest merged commit 1529ef1 into netty:4.1 Feb 26, 2021

chrisvest mentioned this pull request Feb 26, 2021

HttpPostMultipartRequestDecoder performance regression #10508

Closed

jameskleeh mentioned this pull request Apr 6, 2021

HttpPostMultipartRequestDecoder may not add content to an existing upload after being offered data #11143

Closed

chrisvest mentioned this pull request Apr 8, 2021

HttpPostRequestDecoder may cause memory leak #11109

Closed
