-
-
Notifications
You must be signed in to change notification settings - Fork 15.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minimize get byte multipart and fix buffer reuse #11001
Minimize get byte multipart and fix buffer reuse #11001
Conversation
Motivation: Underlying buffer usages might be erroneous when releasing them internaly in HttpPostMultipartRequestDecoder. 2 bugs occurs: 1) Final File upload seems not to be of the right size 2) Memory, even in Disk mode, is increasing continuously, while it shouldn't Modification: Add some tests to check consistency for HttpPostMultipartRequestDecoder. Add a package protected method for testing purpose only.
Motivation: Method `getByte(position)` is too often called within the current implementation of the HttpPostMultipartRequestDecoder. This implies too much activities which is visible when PARANOID mode is active. This is also true in standard mode. Apply the same fix on buffer from HttpPostMultipartRequestDecoder to HttpPostStandardRequestDecoder made previously. Finally in order to ensure we do not rewrite already decoded HttpData when decoding next ones within multipart, we must ensure the buffers are copied and not a retained slice. Modifications: Use the `bytesBefore(...)` method instead of `getByte(pos)` in order to limit the external access to the underlying buffer by retrieving iteratively the beginning of a correct start position. It is used to find both LF/CRLF and delimiter. 2 methods in HttpPostBodyUtil were created for that. The undecodedChunk is copied when adding a chunk to a DataMultipart is loaded. The same buffer is also rewritten in order to release the copied memory part. Result: Just for note, for both Memory or Disk or Mixed mode factories, the release has to be done as: for (InterfaceHttpData httpData: decoder.getBodyHttpDatas()) { httpData.release(); factory.removeHttpDataFromClean(request, httpData); } factory.cleanAllHttpData(); decoder.destroy(); The memory used is minimal in Disk or Mixed mode. In Memory mode, a big file is still in memory but not more in the undecodedChunk but its own buffer (copied). In terms of benchmarking, the results are: Original code Benchmark Mode Cnt Score Error Units HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigAdvancedLevel thrpt 6 0,152 ± 0,100 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigDisabledLevel thrpt 6 0,543 ± 0,218 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigParanoidLevel thrpt 6 0,001 ± 0,001 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigSimpleLevel thrpt 6 0,615 ± 0,070 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighAdvancedLevel thrpt 6 0,114 ± 0,063 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighDisabledLevel thrpt 6 0,664 ± 0,034 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighParanoidLevel thrpt 6 0,001 ± 0,001 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighSimpleLevel thrpt 6 0,620 ± 0,140 ops/ms New code Benchmark Mode Cnt Score Error Units HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigAdvancedLevel thrpt 6 4,037 ± 0,358 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigDisabledLevel thrpt 6 4,226 ± 0,471 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigParanoidLevel thrpt 6 0,875 ± 0,029 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigSimpleLevel thrpt 6 4,346 ± 0,275 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighAdvancedLevel thrpt 6 2,044 ± 0,020 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighDisabledLevel thrpt 6 2,278 ± 0,159 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighParanoidLevel thrpt 6 0,174 ± 0,004 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighSimpleLevel thrpt 6 2,370 ± 0,065 ops/ms In short, using big file transfers, this is about 7 times faster with new code, while using high number of HttpData, this is about 4 times faster with new code when using Simple Level. When using Paranoid Level, using big file transfers, this is about 800 times faster with new code, while using high number of HttpData, this is about 170 times faster with new code.
I try to rewrite from the beginning the previous proposal. |
I see that I should have used this PR benchmark to test the effectiveness of #10737 in a more realistic use case too :) |
@franz1981 In addition, we could implement a specific method within ByteBuf that allows to search for a "serie of bytes", but I think it might be too specific in my case so I decide to let this within HttpCodec as a static helper method. |
If you will use bytesBefore or indexOf you will likely get an additional speedup thanks to the PR I have linked ;) |
@fredericBregier does this replace #10982 ? |
@normanmaurer yes, I forgot to close the previous one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had some comments.
codec-http/src/main/java/io/netty/handler/codec/http/multipart/HttpPostBodyUtil.java
Outdated
Show resolved
Hide resolved
codec-http/src/main/java/io/netty/handler/codec/http/multipart/HttpPostBodyUtil.java
Outdated
Show resolved
Hide resolved
codec-http/src/main/java/io/netty/handler/codec/http/multipart/HttpPostBodyUtil.java
Outdated
Show resolved
Hide resolved
...ttp/src/main/java/io/netty/handler/codec/http/multipart/HttpPostMultipartRequestDecoder.java
Show resolved
Hide resolved
...ttp/src/main/java/io/netty/handler/codec/http/multipart/HttpPostMultipartRequestDecoder.java
Outdated
Show resolved
Hide resolved
...ttp/src/main/java/io/netty/handler/codec/http/multipart/HttpPostMultipartRequestDecoder.java
Outdated
Show resolved
Hide resolved
...ttp/src/main/java/io/netty/handler/codec/http/multipart/HttpPostMultipartRequestDecoder.java
Show resolved
Hide resolved
codec-http/src/test/java/io/netty/handler/codec/http/multipart/HttpPostRequestDecoderTest.java
Outdated
Show resolved
Hide resolved
codec-http/src/test/java/io/netty/handler/codec/http/multipart/HttpPostRequestDecoderTest.java
Show resolved
Hide resolved
codec-http/src/test/java/io/netty/handler/codec/http/multipart/HttpPostRequestDecoderTest.java
Outdated
Show resolved
Hide resolved
I forgot to mention: the benchmarks look great, and I like how many parts of the code became simpler. Nice work! |
codec-http/src/main/java/io/netty/handler/codec/http/multipart/MixedAttribute.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had another comment, but otherwise I think this looks good.
Fix typo and naming, Charset usage, while to if, improve comments, smaller tests, factorized them and split tests according to factory type, optimize buffer allocation when given buffer is empty, improve findDelimiter, remove unuseful check in MixedAttribute Still pending AbstractSearchProcessorFactory usage to check
b2757ae
to
6daeb0c
Compare
@franz1981 Done, thank you !! |
@fredericBregier Thanks! (sorry about the delay) |
Motivation: - Underlying buffer usages might be erroneous when releasing them internaly in HttpPostMultipartRequestDecoder. 2 bugs occurs: 1) Final File upload seems not to be of the right size. 2) Memory, even in Disk mode, is increasing continuously, while it shouldn't. - Method `getByte(position)` is too often called within the current implementation of the HttpPostMultipartRequestDecoder. This implies too much activities which is visible when PARANOID mode is active. This is also true in standard mode. Apply the same fix on buffer from HttpPostMultipartRequestDecoder to HttpPostStandardRequestDecoder made previously. Finally in order to ensure we do not rewrite already decoded HttpData when decoding next ones within multipart, we must ensure the buffers are copied and not a retained slice. Modification: - Add some tests to check consistency for HttpPostMultipartRequestDecoder. Add a package protected method for testing purpose only. - Use the `bytesBefore(...)` method instead of `getByte(pos)` in order to limit the external access to the underlying buffer by retrieving iteratively the beginning of a correct start position. It is used to find both LF/CRLF and delimiter. 2 methods in HttpPostBodyUtil were created for that. The undecodedChunk is copied when adding a chunk to a DataMultipart is loaded. The same buffer is also rewritten in order to release the copied memory part. Result: Just for note, for both Memory or Disk or Mixed mode factories, the release has to be done as: for (InterfaceHttpData httpData: decoder.getBodyHttpDatas()) { httpData.release(); factory.removeHttpDataFromClean(request, httpData); } factory.cleanAllHttpData(); decoder.destroy(); The memory used is minimal in Disk or Mixed mode. In Memory mode, a big file is still in memory but not more in the undecodedChunk but its own buffer (copied). In terms of benchmarking, the results are: Original code Benchmark Mode Cnt Score Error Units HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigAdvancedLevel thrpt 6 0,152 ± 0,100 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigDisabledLevel thrpt 6 0,543 ± 0,218 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigParanoidLevel thrpt 6 0,001 ± 0,001 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigSimpleLevel thrpt 6 0,615 ± 0,070 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighAdvancedLevel thrpt 6 0,114 ± 0,063 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighDisabledLevel thrpt 6 0,664 ± 0,034 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighParanoidLevel thrpt 6 0,001 ± 0,001 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighSimpleLevel thrpt 6 0,620 ± 0,140 ops/ms New code Benchmark Mode Cnt Score Error Units HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigAdvancedLevel thrpt 6 4,037 ± 0,358 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigDisabledLevel thrpt 6 4,226 ± 0,471 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigParanoidLevel thrpt 6 0,875 ± 0,029 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigSimpleLevel thrpt 6 4,346 ± 0,275 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighAdvancedLevel thrpt 6 2,044 ± 0,020 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighDisabledLevel thrpt 6 2,278 ± 0,159 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighParanoidLevel thrpt 6 0,174 ± 0,004 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighSimpleLevel thrpt 6 2,370 ± 0,065 ops/ms In short, using big file transfers, this is about 7 times faster with new code, while using high number of HttpData, this is about 4 times faster with new code when using Simple Level. When using Paranoid Level, using big file transfers, this is about 800 times faster with new code, while using high number of HttpData, this is about 170 times faster with new code.
This change broke the semantics of the decoder API. Will file an issue |
Motivation: - Underlying buffer usages might be erroneous when releasing them internaly in HttpPostMultipartRequestDecoder. 2 bugs occurs: 1) Final File upload seems not to be of the right size. 2) Memory, even in Disk mode, is increasing continuously, while it shouldn't. - Method `getByte(position)` is too often called within the current implementation of the HttpPostMultipartRequestDecoder. This implies too much activities which is visible when PARANOID mode is active. This is also true in standard mode. Apply the same fix on buffer from HttpPostMultipartRequestDecoder to HttpPostStandardRequestDecoder made previously. Finally in order to ensure we do not rewrite already decoded HttpData when decoding next ones within multipart, we must ensure the buffers are copied and not a retained slice. Modification: - Add some tests to check consistency for HttpPostMultipartRequestDecoder. Add a package protected method for testing purpose only. - Use the `bytesBefore(...)` method instead of `getByte(pos)` in order to limit the external access to the underlying buffer by retrieving iteratively the beginning of a correct start position. It is used to find both LF/CRLF and delimiter. 2 methods in HttpPostBodyUtil were created for that. The undecodedChunk is copied when adding a chunk to a DataMultipart is loaded. The same buffer is also rewritten in order to release the copied memory part. Result: Just for note, for both Memory or Disk or Mixed mode factories, the release has to be done as: for (InterfaceHttpData httpData: decoder.getBodyHttpDatas()) { httpData.release(); factory.removeHttpDataFromClean(request, httpData); } factory.cleanAllHttpData(); decoder.destroy(); The memory used is minimal in Disk or Mixed mode. In Memory mode, a big file is still in memory but not more in the undecodedChunk but its own buffer (copied). In terms of benchmarking, the results are: Original code Benchmark Mode Cnt Score Error Units HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigAdvancedLevel thrpt 6 0,152 ± 0,100 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigDisabledLevel thrpt 6 0,543 ± 0,218 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigParanoidLevel thrpt 6 0,001 ± 0,001 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigSimpleLevel thrpt 6 0,615 ± 0,070 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighAdvancedLevel thrpt 6 0,114 ± 0,063 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighDisabledLevel thrpt 6 0,664 ± 0,034 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighParanoidLevel thrpt 6 0,001 ± 0,001 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighSimpleLevel thrpt 6 0,620 ± 0,140 ops/ms New code Benchmark Mode Cnt Score Error Units HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigAdvancedLevel thrpt 6 4,037 ± 0,358 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigDisabledLevel thrpt 6 4,226 ± 0,471 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigParanoidLevel thrpt 6 0,875 ± 0,029 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigSimpleLevel thrpt 6 4,346 ± 0,275 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighAdvancedLevel thrpt 6 2,044 ± 0,020 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighDisabledLevel thrpt 6 2,278 ± 0,159 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighParanoidLevel thrpt 6 0,174 ± 0,004 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighSimpleLevel thrpt 6 2,370 ± 0,065 ops/ms In short, using big file transfers, this is about 7 times faster with new code, while using high number of HttpData, this is about 4 times faster with new code when using Simple Level. When using Paranoid Level, using big file transfers, this is about 800 times faster with new code, while using high number of HttpData, this is about 170 times faster with new code.
Fix HttpPostMultipartRequestDecoder buffer usages and allocations
Motivation:
Method
getByte(position)
is too often called within the current implementationof the HttpPostMultipartRequestDecoder.
This implies too much activities which is visible when PARANOID mode is active.
This is also true in standard mode.
Apply the same fix on buffer from HttpPostMultipartRequestDecoder to HttpPostStandardRequestDecoder
made previously.
Finally in order to ensure we do not rewrite already decoded HttpData when decoding
next ones within multipart, we must ensure the buffers are copied and not a retained slice.
Modifications:
Use the
bytesBefore(...)
method instead ofgetByte(pos)
in order to limit the externalaccess to the underlying buffer by retrieving iteratively the beginning of a correct start
position.
It is used to find both LF/CRLF and delimiter.
2 methods in HttpPostBodyUtil were created for that.
The undecodedChunk is copied when adding a chunk to a DataMultipart is loaded.
The same buffer is also rewritten in order to release the copied memory part.
Result:
Just for note, for both Memory or Disk or Mixed mode factories, the release has to be done as:
The memory used is minimal in Disk or Mixed mode. In Memory mode, a big file is still
in memory but not more in the undecodedChunk but its own buffer (copied).
In terms of benchmarking, the results are:
In short, using big file transfers, this is about 7 times faster with new code, while
using high number of HttpData, this is about 4 times faster with new code when using Simple Level.
When using Paranoid Level, using big file transfers, this is about 800 times faster with new code, while
using high number of HttpData, this is about 170 times faster with new code.
Small improvements are also added, concerning buffer allocation and unsueful check within MixedAttribute and charset extended usage.
Add some tests to check consistency in Http Codec Multipart
Motivation:
Underlying buffer usages might be erroneous when releasing them internaly
in HttpPostMultipartRequestDecoder.
2 bugs occurs:
Modification:
Add some tests to check consistency for HttpPostMultipartRequestDecoder.
Add a package protected method for testing purpose only.
Results:
Without fixes, those tests failed. With the fixes, they passed.