Speed-up HTTP 1.1 header and line parsing #12321

Merged
6 commits merged into netty:4.1 on Jan 27, 2023

Conversation

franz1981
Contributor

Motivation:

HTTP content parsing makes use of bi-morphic inheritance and loop fusion to improve performance; neither is working as expected.

Modifications:

Splitting (CR)LF parsing into separate SWAR and copy loops (i.e., loop fission)
and sparing LineParser from inheriting HeaderParser

Result:
Parsing speedup
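
For illustration, the SWAR (SIMD Within A Register) part scans 8 bytes at a time for the LF delimiter instead of testing one byte per iteration. The snippet below is only a minimal sketch of that technique, assuming the 8 bytes are loaded into a big-endian long; it is not the code in this PR and the class/method names are made up:

    final class SwarLf {
        // Illustrative only: returns the index (0-7) of the first LF in the 8 bytes
        // packed into a big-endian long, or 8 if no LF is present.
        static int firstLf(long word) {
            long masked = word ^ 0x0A0A0A0A0A0A0A0AL;                        // bytes equal to LF ('\n') become 0x00
            long tmp = (masked & 0x7F7F7F7F7F7F7F7FL) + 0x7F7F7F7F7F7F7F7FL; // bit 7 set where the low 7 bits are non-zero
            tmp = ~(tmp | masked | 0x7F7F7F7F7F7F7F7FL);                     // exactly the zero (i.e. LF) bytes keep bit 7
            return Long.numberOfLeadingZeros(tmp) >>> 3;                     // index of the first match in big-endian order
        }
    }

Once the delimiter index is known, copying the preceding bytes can run as its own tight loop, which is the loop fission referred to above.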

@franz1981
Contributor Author

These are my early results, but feel free to point me to any other HTTP-header-parsing-intensive benchmark.
4.1:

Benchmark                                                                              (step)   Mode  Cnt      Score      Error  Units
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters            2  thrpt   10   7417.600 ±  585.303  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters:·asm       2  thrpt             NaN               ---
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters            4  thrpt   10  11726.992 ±  385.804  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters:·asm       4  thrpt             NaN               ---
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters            8  thrpt   10  15545.752 ±  454.380  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters:·asm       8  thrpt             NaN               ---
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters           16  thrpt   10  18464.836 ±  699.593  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters:·asm      16  thrpt             NaN               ---
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters           32  thrpt   10  19669.667 ± 3289.623  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters:·asm      32  thrpt             NaN               ---

this PR:

Benchmark                                                                              (step)   Mode  Cnt      Score      Error  Units
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters            2  thrpt   10   8596.119 ±  340.640  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters:·asm       2  thrpt             NaN               ---
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters            4  thrpt   10  12566.280 ±  383.398  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters:·asm       4  thrpt             NaN               ---
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters            8  thrpt   10  17125.243 ±  158.794  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters:·asm       8  thrpt             NaN               ---
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters           16  thrpt   10  19737.088 ± 1065.066  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters:·asm      16  thrpt             NaN               ---
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters           32  thrpt   10  21936.390 ±  438.841  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters:·asm      32  thrpt             NaN               ---

@franz1981
Contributor Author

I still need to check the ASM of the new methods (including the SWAR ones); that's why this is still a DRAFT

@chrisvest
Contributor

Re-running builds. Autobahn got stuck and was killed after 5 hours. (Which may or may not be a bad sign… I haven't seen Autobahn get stuck like that on other PRs that I recall.)

@normanmaurer
Member

Got stuck again... @franz1981 seems like there is something wrong?

@franz1981
Contributor Author

franz1981 commented Apr 21, 2022

> Got stuck again... @franz1981 seems like there is something wrong?

Investigating right now: I've tried hard to make the new code work exactly the same as the previous one, but found weird things related to the size change, and I'm not familiar with the purpose of the code itself :/
Will try to get to the bottom of it, but not having any existing test fail means that there's something we're not covering (explicitly)

@franz1981 franz1981 changed the title Speed-up HT P 1.1 header parsing Speed-up HTTP 1.1 header parsing Apr 21, 2022
@franz1981
Contributor Author

@normanmaurer

The last build failed with JDK 11 and got cancelled here:

2022-04-21T00:30:38.9418372Z 00:30:38.940 [main] INFO  i.n.c.e.EpollSocketShutdownOutputBySelfTest - Running: testShutdownOutput 1 of 2 with UnpooledByteBufAllocator
2022-04-21T00:30:38.9428094Z 00:30:38.942 [testsuite-epoll-worker-11-1] DEBUG i.n.channel.DefaultChannelPipeline - Discarded inbound message AdvancedLeakAwareByteBuf(PooledUnsafeDirectByteBuf(ridx: 0, widx: 1, cap: 64)) that reached at the tail of the pipeline. Please check your pipeline configuration.
2022-04-21T00:30:38.9429628Z 00:30:38.942 [testsuite-epoll-worker-11-1] DEBUG i.n.channel.DefaultChannelPipeline - Discarded message pipeline : [ChannelInboundHandlerAdapter#0, DefaultChannelPipeline$TailContext#0]. Channel : [id: 0x64cf054c, L:/127.0.0.1:35470 ! R:127.0.0.1/127.0.0.1:36221].
2022-04-21T00:30:40.9054161Z 00:30:40.904 [epollEventLoopGroup-34-1] DEBUG io.netty.buffer.PoolThreadCache - Freed 1 thread-local buffer(s) from thread: epollEventLoopGroup-34-1
2022-04-21T00:30:40.9057481Z 00:30:40.904 [epollEventLoopGroup-36-2] DEBUG io.netty.buffer.PoolThreadCache - Freed 1 thread-local buffer(s) from thread: epollEventLoopGroup-36-2
2022-04-21T00:30:40.9058821Z 00:30:40.904 [epollEventLoopGroup-36-1] DEBUG io.netty.buffer.PoolThreadCache - Freed 1 thread-local buffer(s) from thread: epollEventLoopGroup-36-1
2022-04-21T00:30:40.9060084Z 00:30:40.905 [epollEventLoopGroup-34-2] DEBUG io.netty.buffer.PoolThreadCache - Freed 1 thread-local buffer(s) from thread: epollEventLoopGroup-34-2
2022-04-21T00:30:40.9083363Z 00:30:40.907 [epollEventLoopGroup-38-1] DEBUG io.netty.buffer.PoolThreadCache - Freed 1 thread-local buffer(s) from thread: epollEventLoopGroup-38-1
2022-04-21T00:30:40.9159906Z 00:30:40.915 [epollEventLoopGroup-40-1] DEBUG io.netty.buffer.PoolThreadCache - Freed 1 thread-local buffer(s) from thread: epollEventLoopGroup-40-1
2022-04-21T05:22:29.3824396Z ##[error]The operation was canceled.

Let's see if I can reproduce the failure by doing the same

@franz1981
Contributor Author

franz1981 commented Apr 21, 2022

@normanmaurer running it from my branch (without leak detection) -> https://github.com/franz1981/netty/runs/6108265337?check_suite_focus=true

It has worked fine with JDK 11, but I've found some weird thing in the original code, and maybe you can help me understand whether I should do the same for this PR; for HeaderParser:

        @Override
        public boolean process(byte value) throws Exception {
            char nextByte = (char) (value & 0xFF);
            if (nextByte == HttpConstants.LF) {
                int len = seq.length();
                // Drop CR if we had a CRLF pair
                if (len >= 1 && seq.charAtUnsafe(len - 1) == HttpConstants.CR) {
                    -- size;
                    seq.setLength(len - 1);
                }
                return false;
            }

            increaseCount();

            seq.append(nextByte);
            return true;
        }
        
        protected final void increaseCount() {
            if (++ size > maxLength) {
                // TODO: Respond with Bad Request and discard the traffic
                //    or close the connection.
                //       No need to notify the upstream handlers - just log.
                //       If decoding a response, just throw an exception.
                throw newException(maxLength);
            }
        }

Due to the check for LF happening before the count is incremented, combined with the (++ size > maxLength) check, if the readable bytes in a buffer are 4 and maxLength is 3, but the last byte (buffer offset 3) is LF, the parsing won't fail with TooLongHttpHeaderException :O
Meaning that we've parsed 4 bytes, exceeding the configured limit of 3: it looks like a subtle (maybe innocuous) bug, but am I supposed to do the same? :O
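
For concreteness, a hand trace of the process() calls above for that scenario (maxLength = 3, the 4 readable bytes being 'a', 'b', 'c', LF) — this is just an illustration of the described behavior, not an actual test:

    process('a')  ->  increaseCount(): size = 1 (<= maxLength, ok), append 'a'
    process('b')  ->  increaseCount(): size = 2 (<= maxLength, ok), append 'b'
    process('c')  ->  increaseCount(): size = 3 (<= maxLength, ok), append 'c'
    process(LF)   ->  returns false before increaseCount(), so size stays at 3

So all 4 bytes get consumed and TooLongHttpHeaderException is never thrown.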

@franz1981
Contributor Author

Summoning @trustin as well for #12321 (comment)

@trustin
Member

trustin commented Apr 21, 2022

Hey, thanks for summoning, @franz1981. It indeed looks like a bug to me and we should count CR.

@normanmaurer
Member

@franz1981 agree with @trustin here... Can you do an extra PR to fix this on the existing code for visibility ?

@franz1981
Contributor Author

franz1981 commented Apr 21, 2022

> @franz1981 agree with @trustin here... Can you do an extra PR to fix this on the existing code for visibility ?

@normanmaurer
Yep, so let me summarize in case I got it wrong: maxLength shouldn't be evaluated against the number of read bytes (i.e., including the trailing delimiters), but against the resulting body (after removing any trailing \n or \r\n).

e.g.

 maxLength = 4096
 <body of 4096 bytes> \r\n = total 4098 bytes

and

 maxLength = 4096
 <body of 4096 bytes> \n = total 4097 bytes

should both work without throwing any exception

is it correct?

@franz1981 franz1981 marked this pull request as ready for review April 21, 2022 15:34
return seq;
}
int size = this.size + newSize;
if (size > maxLength) {
@franz1981
Contributor Author

@trustin @normanmaurer this is the part to look at

// No need to notify the upstream handlers - just log.
// If decoding a response, just throw an exception.
throw new TooLongHttpHeaderException("HTTP header is larger than " + maxLength + " bytes.");
}
return null;
@franz1981
Contributor Author

franz1981 commented Apr 21, 2022

@trustin @normanmaurer if we have read maxLength bytes (because readableBytes is less than or equal to maxLength) we haven't read anything relevant

@franz1981
Contributor Author

This is still performing as well as #12321 (comment), but I would like to run other benchmark(s) on it (end to end, possibly). @idelpivnitskiy anything in mind I can try?

@franz1981
Contributor Author

Let's see @chrisvest if it's still failing on jdk 11 build :(

@franz1981 franz1981 changed the title Speed-up HTTP 1.1 header parsing Speed-up HTTP 1.1 header and line parsing Apr 22, 2022
@franz1981 franz1981 marked this pull request as draft April 22, 2022 08:11
@franz1981
Contributor Author

franz1981 commented Apr 22, 2022

Adding the same treatment for line parsing as well; results:

Benchmark                                                                                (step)   Mode  Cnt      Score     Error  Units
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters              2  thrpt   20   8811.375 ± 262.989  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters              4  thrpt   20  13051.612 ±  61.193  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters              8  thrpt   20  17298.027 ±  78.159  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters             16  thrpt   20  20127.898 ± 655.783  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters             32  thrpt   20  22062.810 ± 516.576  ops/s

Slightly improved over just the header-parsing improvement(s).
Now running other frameworks (based on Netty, e.g. Vert.x @vietj ) to see if I get better/similar improvements

@franz1981 franz1981 marked this pull request as ready for review April 22, 2022 09:16
@franz1981
Contributor Author

franz1981 commented Apr 22, 2022

@normanmaurer I've decided to both fix and improve performance directly in this one; given that the logic is now slightly more complex, it would have been a waste of energy to fix it twice IMHO

Now looking at the assembly and LineParser to check if there's anything left that could further improve performance

i.e., why use an appendable char sequence when we could:

  1. just create a slice of the origin buffer and use it temporarily to split into parts (as String): the encoding is ASCII, so a byte-holder structure is enough -> no need for char sequences!
  2. create an ASCII appendable structure backed by a byte[] to turn the per-byte byte -> char copy into a memcpy (see the sketch after this list)

Let me know wdyt, these could be addressed in follow-up PRs because the new logic enables a clear separation between scanning and copying/encoding
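
A minimal sketch of idea 2, assuming ASCII-only header bytes; the class name and shape are purely illustrative, not a proposed Netty API:

    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;

    // Illustrative sketch: an ASCII accumulator backed by a byte[] so header bytes
    // can be bulk-copied instead of appended char by char.
    final class AsciiAccumulator {
        private byte[] bytes = new byte[256];
        private int length;

        // Bulk-copies 'count' ASCII bytes from 'src' starting at 'offset'.
        void append(byte[] src, int offset, int count) {
            if (length + count > bytes.length) {
                bytes = Arrays.copyOf(bytes, Math.max(bytes.length << 1, length + count));
            }
            System.arraycopy(src, offset, bytes, length, count); // the "memcpy"
            length += count;
        }

        void reset() {
            length = 0;
        }

        @Override
        public String toString() {
            // ISO-8859-1 maps every byte to the char with the same value, which covers ASCII
            return new String(bytes, 0, length, StandardCharsets.ISO_8859_1);
        }
    }

With such a structure, the scan loop only needs to find the delimiter and then hand the whole [start, end) range to append(), turning the per-byte byte -> char conversion into a single bulk copy.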

}
// TODO size should be increased as well in order to let super::parse
@franz1981
Contributor Author

TODO here is valid: @trustin @normanmaurer

size is used to track the length of the body AND the read bytes, in order to throw an exception when maxLength is exceeded.
In this PR I'm not increasing size by the amount of (already) skipped control chars, meaning that I give the next parse (starting from the first non-control char) more room to read body data.

Not sure if it's correct, but given that I've slightly changed the original behavior without getting any test error (until now), it means this is uncovered by tests

@chrisvest
Contributor

chrisvest left a comment

Looks good. Up to you if you want to change the byte indexing or not.

@franz1981
Contributor Author

franz1981 commented Jan 20, 2023

Today I'm still travelling, but I will try hard to complete tests on Vert.x (and its HTTP test suite) on Monday, adding a few tests to preserve the header-name pooling too.

@chrisvest @normanmaurer are the Netty CI tests happy in the current state?

Re Netty 5: I will port it there too :)

@franz1981
Contributor Author

@chrisvest I will likely send other changes (just a microbenchmark for the pooling part) in a follow-up PR; looking at the assembly, I see that the validation using the offset seems to work the same, hence I haven't decided whether to change it or not: feel free to merge it and/or wait till the others complete their review round (@idelpivnitskiy and/or @normanmaurer and/or @vietj )

@normanmaurer
Member

@franz1981 is "vert.x" happy as well ?

@franz1981
Contributor Author

Waiting for @vietj to run the Vert.x CI too, thanks @normanmaurer

@normanmaurer
Member

@franz1981 @vietj ping me once done

@franz1981
Contributor Author

@normanmaurer we're ready to go, the Vert.x CI is happy.
I'll send a follow-up PR with an additional test, bench and improvement on the header-names pooling part, but this PR is already pretty big as it is IMO

@normanmaurer normanmaurer merged commit 0daaec3 into netty:4.1 Jan 27, 2023
@normanmaurer
Member

@franz1981 thanks a lot for all the work you put into this!

@normanmaurer
Member

@franz1981 can you please open a PR for main as well ?

@normanmaurer normanmaurer added this to the 4.1.88.Final milestone Jan 27, 2023
@franz1981
Contributor Author

Yep @normanmaurer will do it today ☺️
Thanks a lot for the patience bud

normanmaurer added a commit that referenced this pull request Feb 13, 2023
Motivation:

This commit introduced a regression while parsing headers that could lead to IndexOutOfBoundsException.

Modifications:

Reverts commit 0daaec3.

Result:

Fixes #13213
normanmaurer pushed a commit to franz1981/netty that referenced this pull request Feb 13, 2023
normanmaurer pushed a commit that referenced this pull request Feb 13, 2023
yawkat added a commit to yawkat/netty that referenced this pull request Feb 14, 2023
Motivation:

With netty#12321, the HttpHeaders object returned by the standard HTTP/1 pipeline now contains AsciiStrings, not Strings. This uncovered a bug where HttpHeaders.names() did not behave properly as a Set<String> when the backing HttpHeaders contained a CharSequence type other than String. This leads to .contains incorrectly returning false when the header is present.

Modification:

Replace CharSequenceDelegatingStringSet with a class that delegates directly to the Headers object, instead of Headers.names(), for the .contains call.

Result:

- The .contains call works properly even when the backing Headers aren't Strings.
- Mutation methods were removed. This is an improvement imo, names() previously returned a copy, so changes would not be reflected in the Headers anyway.
- .contains is now case-insensitive.
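
To illustrate the mismatch described above (illustrative only, not the actual fix): a plain Set holding AsciiString entries will not answer contains(String) with true, because String and AsciiString neither equal each other nor, in general, hash the same way, whereas a content-based comparison does match:

    import io.netty.util.AsciiString;

    import java.util.HashSet;
    import java.util.Set;

    public class ContainsMismatch {
        public static void main(String[] args) {
            Set<CharSequence> names = new HashSet<>();
            names.add(new AsciiString("content-length")); // what the decoder now stores

            // false: String and AsciiString have different equals()/hashCode()
            System.out.println(names.contains("content-length"));

            // true: comparing by content instead of by type
            System.out.println(AsciiString.contentEquals(new AsciiString("content-length"), "content-length"));
        }
    }
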
normanmaurer pushed a commit that referenced this pull request Feb 15, 2023
franz1981 added a commit to franz1981/netty that referenced this pull request Mar 13, 2023
Fixes netty#13273

Motivation:

netty#12321 introduced the assumption that whitespace/ctrl chars are already stripped out of the hex chars of the HTTP chunk length.
That was a wrong assumption.

Modifications:

Correctly check for whitespace and ctrl chars and re-introduce the leading-whitespace trim

Result:

Fixes netty#13273
normanmaurer added a commit that referenced this pull request Mar 14, 2023
Fixes #13273 (#13274)

Co-authored-by: Norman Maurer <norman_maurer@apple.com>