Speed-up HTTP 1.1 header and line parsing #12321

Merged
6 commits merged into netty:4.1 on Jan 27, 2023

Conversation

franz1981
Contributor

Motivation:

HTTP content parsing makes use of bi-morphic inheritance and loop fusion to improve performance; neither is working as expected.

Modifications:

Splitting (CR)LF parsing into separate SWAR and copy loops (i.e., loop fission)
and sparing LineParser from inheriting HeaderParser

Result:
Parsing speedup
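
For illustration, the SWAR (SIMD Within A Register) part scans 8 bytes at a time for the LF delimiter instead of testing one byte per iteration. The snippet below is only a minimal sketch of that technique, assuming the 8 bytes are loaded into a big-endian long; it is not the code in this PR and the class/method names are made up:

    final class SwarLf {
        // Illustrative only: returns the index (0-7) of the first LF in the 8 bytes
        // packed into a big-endian long, or 8 if no LF is present.
        static int firstLf(long word) {
            long masked = word ^ 0x0A0A0A0A0A0A0A0AL;                        // bytes equal to LF ('\n') become 0x00
            long tmp = (masked & 0x7F7F7F7F7F7F7F7FL) + 0x7F7F7F7F7F7F7F7FL; // bit 7 set where the low 7 bits are non-zero
            tmp = ~(tmp | masked | 0x7F7F7F7F7F7F7F7FL);                     // exactly the zero (i.e. LF) bytes keep bit 7
            return Long.numberOfLeadingZeros(tmp) >>> 3;                     // index of the first match in big-endian order
        }
    }

Once the delimiter index is known, copying the preceding bytes can run as its own tight loop, which is the loop fission referred to above.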

@franz1981
Contributor Author

These are my early results, but feel free to point me to any other HTTP-header-parsing-intensive benchmark.
4.1:

Benchmark                                                                              (step)   Mode  Cnt      Score      Error  Units
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters            2  thrpt   10   7417.600 ±  585.303  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters:·asm       2  thrpt             NaN               ---
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters            4  thrpt   10  11726.992 ±  385.804  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters:·asm       4  thrpt             NaN               ---
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters            8  thrpt   10  15545.752 ±  454.380  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters:·asm       8  thrpt             NaN               ---
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters           16  thrpt   10  18464.836 ±  699.593  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters:·asm      16  thrpt             NaN               ---
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters           32  thrpt   10  19669.667 ± 3289.623  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters:·asm      32  thrpt             NaN               ---

this PR:

Benchmark                                                                              (step)   Mode  Cnt      Score      Error  Units
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters            2  thrpt   10   8596.119 ±  340.640  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters:·asm       2  thrpt             NaN               ---
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters            4  thrpt   10  12566.280 ±  383.398  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters:·asm       4  thrpt             NaN               ---
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters            8  thrpt   10  17125.243 ±  158.794  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters:·asm       8  thrpt             NaN               ---
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters           16  thrpt   10  19737.088 ± 1065.066  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters:·asm      16  thrpt             NaN               ---
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters           32  thrpt   10  21936.390 ±  438.841  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters:·asm      32  thrpt             NaN               ---

@franz1981
Contributor Author

I still need to check the ASM of the new methods (including the SWAR ones); that's why this is still a DRAFT

@chrisvest
Contributor

Re-running builds. Autobahn got stuck and was killed after 5 hours. (Which may or may not be a bad sign… I haven't seen Autobahn get stuck like that on other PRs that I recall.)

@normanmaurer
Member

Got stuck again... @franz1981 seems like there is something wrong?

@franz1981
Contributor Author

franz1981 commented Apr 21, 2022

> Got stuck again... @franz1981 seems like there is something wrong?

Investigating right now: I've tried hard to make the new code work exactly the same as the previous one, but found weird things related to the size change, and I'm not familiar with the purpose of the code itself :/
Will try to get to the bottom of it, but not having any existing test fail means that there's something we're not covering (explicitly)

@franz1981 franz1981 changed the title Speed-up HT P 1.1 header parsing Speed-up HTTP 1.1 header parsing Apr 21, 2022
@franz1981
Contributor Author

@normanmaurer

The last build failed with JDK 11 and got cancelled here:

2022-04-21T00:30:38.9418372Z 00:30:38.940 [main] INFO  i.n.c.e.EpollSocketShutdownOutputBySelfTest - Running: testShutdownOutput 1 of 2 with UnpooledByteBufAllocator
2022-04-21T00:30:38.9428094Z 00:30:38.942 [testsuite-epoll-worker-11-1] DEBUG i.n.channel.DefaultChannelPipeline - Discarded inbound message AdvancedLeakAwareByteBuf(PooledUnsafeDirectByteBuf(ridx: 0, widx: 1, cap: 64)) that reached at the tail of the pipeline. Please check your pipeline configuration.
2022-04-21T00:30:38.9429628Z 00:30:38.942 [testsuite-epoll-worker-11-1] DEBUG i.n.channel.DefaultChannelPipeline - Discarded message pipeline : [ChannelInboundHandlerAdapter#0, DefaultChannelPipeline$TailContext#0]. Channel : [id: 0x64cf054c, L:/127.0.0.1:35470 ! R:127.0.0.1/127.0.0.1:36221].
2022-04-21T00:30:40.9054161Z 00:30:40.904 [epollEventLoopGroup-34-1] DEBUG io.netty.buffer.PoolThreadCache - Freed 1 thread-local buffer(s) from thread: epollEventLoopGroup-34-1
2022-04-21T00:30:40.9057481Z 00:30:40.904 [epollEventLoopGroup-36-2] DEBUG io.netty.buffer.PoolThreadCache - Freed 1 thread-local buffer(s) from thread: epollEventLoopGroup-36-2
2022-04-21T00:30:40.9058821Z 00:30:40.904 [epollEventLoopGroup-36-1] DEBUG io.netty.buffer.PoolThreadCache - Freed 1 thread-local buffer(s) from thread: epollEventLoopGroup-36-1
2022-04-21T00:30:40.9060084Z 00:30:40.905 [epollEventLoopGroup-34-2] DEBUG io.netty.buffer.PoolThreadCache - Freed 1 thread-local buffer(s) from thread: epollEventLoopGroup-34-2
2022-04-21T00:30:40.9083363Z 00:30:40.907 [epollEventLoopGroup-38-1] DEBUG io.netty.buffer.PoolThreadCache - Freed 1 thread-local buffer(s) from thread: epollEventLoopGroup-38-1
2022-04-21T00:30:40.9159906Z 00:30:40.915 [epollEventLoopGroup-40-1] DEBUG io.netty.buffer.PoolThreadCache - Freed 1 thread-local buffer(s) from thread: epollEventLoopGroup-40-1
2022-04-21T05:22:29.3824396Z ##[error]The operation was canceled.

Let's see if I can reproduce the failure by doing the same

@franz1981
Contributor Author

franz1981 commented Apr 21, 2022

@normanmaurer running it from my branch (without leak detection) -> https://github.com/franz1981/netty/runs/6108265337?check_suite_focus=true

It has worked fine with JDK 11, but I've found some weird thing in the original code, and maybe you can help me understand whether I should do the same for this PR; for HeaderParser:

        @Override
        public boolean process(byte value) throws Exception {
            char nextByte = (char) (value & 0xFF);
            if (nextByte == HttpConstants.LF) {
                int len = seq.length();
                // Drop CR if we had a CRLF pair
                if (len >= 1 && seq.charAtUnsafe(len - 1) == HttpConstants.CR) {
                    -- size;
                    seq.setLength(len - 1);
                }
                return false;
            }

            increaseCount();

            seq.append(nextByte);
            return true;
        }
        
        protected final void increaseCount() {
            if (++ size > maxLength) {
                // TODO: Respond with Bad Request and discard the traffic
                //    or close the connection.
                //       No need to notify the upstream handlers - just log.
                //       If decoding a response, just throw an exception.
                throw newException(maxLength);
            }
        }

Due to the check for LF happening before the count is incremented, combined with the (++ size > maxLength) check, if the readable bytes in a buffer are 4 and maxLength is 3, but the last byte (buffer offset 3) is LF, the parsing won't fail with TooLongHttpHeaderException :O
Meaning that we've parsed 4 bytes, exceeding the configured limit of 3: it looks like a subtle (maybe innocuous) bug, but am I supposed to do the same? :O
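
For concreteness, a hand trace of the process() calls above for that scenario (maxLength = 3, the 4 readable bytes being 'a', 'b', 'c', LF) — this is just an illustration of the described behavior, not an actual test:

    process('a')  ->  increaseCount(): size = 1 (<= maxLength, ok), append 'a'
    process('b')  ->  increaseCount(): size = 2 (<= maxLength, ok), append 'b'
    process('c')  ->  increaseCount(): size = 3 (<= maxLength, ok), append 'c'
    process(LF)   ->  returns false before increaseCount(), so size stays at 3

So all 4 bytes get consumed and TooLongHttpHeaderException is never thrown.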

@franz1981
Contributor Author

Summoning @trustin as well for #12321 (comment)

@trustin
Member

trustin commented Apr 21, 2022

Hey, thanks for summoning, @franz1981. It indeed looks like a bug to me and we should count CR.

@normanmaurer
Member

@franz1981 agree with @trustin here... Can you do an extra PR to fix this on the existing code for visibility ?

@franz1981
Contributor Author

franz1981 commented Apr 21, 2022

> @franz1981 agree with @trustin here... Can you do an extra PR to fix this on the existing code for visibility ?

@normanmaurer
Yep, so let me summarize in case I got it wrong: maxLength shouldn't be evaluated against the number of read bytes (i.e., including the trailing delimiters), but against the resulting body (after removing any trailing \n or \r\n).

e.g.

 maxLength = 4096
 <body of 4096 bytes> \r\n = total 4098 bytes

and

 maxLength = 4096
 <body of 4096 bytes> \n = total 4097 bytes

should both work without throwing any exception

is it correct?

@franz1981 franz1981 marked this pull request as ready for review April 21, 2022 15:34
return seq;
}
int size = this.size + newSize;
if (size > maxLength) {
@franz1981
Contributor Author

@trustin @normanmaurer this is the part to look at

// No need to notify the upstream handlers - just log.
// If decoding a response, just throw an exception.
throw new TooLongHttpHeaderException("HTTP header is larger than " + maxLength + " bytes.");
}
return null;
@franz1981
Contributor Author

franz1981 commented Apr 21, 2022

@trustin @normanmaurer if we have read maxLength bytes (because readableBytes is less than or equal to maxLength) we haven't read anything relevant

@franz1981
Contributor Author

This is still performing as well as #12321 (comment), but I would like to run other benchmark(s) on it (end to end, possibly). @idelpivnitskiy anything in mind I can try?

@franz1981
Contributor Author

Let's see @chrisvest if it's still failing on jdk 11 build :(

@franz1981 franz1981 changed the title Speed-up HTTP 1.1 header parsing Speed-up HTTP 1.1 header and line parsing Apr 22, 2022
@franz1981 franz1981 marked this pull request as draft April 22, 2022 08:11
@franz1981
Contributor Author

franz1981 commented Apr 22, 2022

Adding the same treatment for line parsing as well; results:

Benchmark                                                                                (step)   Mode  Cnt      Score     Error  Units
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters              2  thrpt   20   8811.375 ± 262.989  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters              4  thrpt   20  13051.612 ±  61.193  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters              8  thrpt   20  17298.027 ±  78.159  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters             16  thrpt   20  20127.898 ± 655.783  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters             32  thrpt   20  22062.810 ± 516.576  ops/s

Slightly improved over just the header-parsing improvement(s).
Now running other frameworks (based on Netty, e.g. Vert.x @vietj ) to see if I get better/similar improvements

@franz1981 franz1981 marked this pull request as ready for review April 22, 2022 09:16
@franz1981
Contributor Author

franz1981 commented Apr 22, 2022

@normanmaurer I've decided to both fix and improve performance directly in this one; given that the logic is now slightly more complex, it would have been a waste of energy to fix it twice IMHO

Now looking at the assembly and LineParser to check if there's anything left that could further improve performance

i.e., why use an appendable char sequence when we could:

  1. just create a slice of the origin buffer and use it temporarily to split into parts (as String): the encoding is ASCII, so a byte-holder structure is enough -> no need for char sequences!
  2. create an ASCII appendable structure backed by a byte[] to turn the per-byte byte -> char copy into a memcpy (see the sketch after this list)

Let me know wdyt, these could be addressed in follow-up PRs because the new logic enables a clear separation between scanning and copying/encoding
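
A minimal sketch of idea 2, assuming ASCII-only header bytes; the class name and shape are purely illustrative, not a proposed Netty API:

    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;

    // Illustrative sketch: an ASCII accumulator backed by a byte[] so header bytes
    // can be bulk-copied instead of appended char by char.
    final class AsciiAccumulator {
        private byte[] bytes = new byte[256];
        private int length;

        // Bulk-copies 'count' ASCII bytes from 'src' starting at 'offset'.
        void append(byte[] src, int offset, int count) {
            if (length + count > bytes.length) {
                bytes = Arrays.copyOf(bytes, Math.max(bytes.length << 1, length + count));
            }
            System.arraycopy(src, offset, bytes, length, count); // the "memcpy"
            length += count;
        }

        void reset() {
            length = 0;
        }

        @Override
        public String toString() {
            // ISO-8859-1 maps every byte to the char with the same value, which covers ASCII
            return new String(bytes, 0, length, StandardCharsets.ISO_8859_1);
        }
    }

With such a structure, the scan loop only needs to find the delimiter and then hand the whole [start, end) range to append(), turning the per-byte byte -> char conversion into a single bulk copy.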

}
// TODO size should be increased as well in order to let super::parse
@franz1981
Contributor Author

TODO here is valid: @trustin @normanmaurer

size is used to track the length of the body AND the read bytes, in order to throw an exception when maxLength is exceeded.
In this PR I'm not increasing size by the amount of (already) skipped control chars, meaning that I give the next parse (starting from the first non-control char) more room to read body data.

Not sure if it's correct, but given that I've slightly changed the original behavior without getting any test error (until now), it means this is uncovered by tests

@chrisvest
Contributor

chrisvest left a comment

Looks good. Up to you if you want to change the byte indexing or not.

@franz1981
Contributor Author

franz1981 commented Jan 20, 2023

Today I'm still travelling, but I will try hard to complete tests on Vert.x (and its HTTP test suite) on Monday, adding a few tests to preserve the header-name pooling too.

@chrisvest @normanmaurer are the Netty CI tests happy in the current state?

Re Netty 5: I will port it there too :)

@franz1981
Contributor Author

@chrisvest I will likely send other changes (just a microbenchmark for the pooling part) in a follow-up PR; looking at the assembly, I see that the validation using the offset seems to work the same, hence I haven't decided whether to change it or not: feel free to merge it and/or wait till the others complete their review round (@idelpivnitskiy and/or @normanmaurer and/or @vietj )

@normanmaurer
Member

@franz1981 is "vert.x" happy as well ?

@franz1981
Contributor Author

Waiting for @vietj to run the Vert.x CI too, thanks @normanmaurer

@normanmaurer
Member

@franz1981 @vietj ping me once done

@franz1981
Contributor Author

@normanmaurer we're ready to go, the Vert.x CI is happy.
I'll send a follow-up PR with an additional test, bench and improvement on the header-names pooling part, but this PR is already pretty big as it is IMO

@normanmaurer normanmaurer merged commit 0daaec3 into netty:4.1 Jan 27, 2023
@normanmaurer
Member

@franz1981 thanks a lot for all the work you put into this!

@normanmaurer
Member

@franz1981 can you please open a PR for main as well ?

@normanmaurer normanmaurer added this to the 4.1.88.Final milestone Jan 27, 2023
@franz1981
Contributor Author

Yep @normanmaurer will do it today ☺️
Thanks a lot for the patience bud

normanmaurer added a commit that referenced this pull request Feb 13, 2023
Motivation:

This commit introduced a regression while parsing headers that could lead to IndexOutOfBoundsException.

Modifications:

Reverts commit 0daaec3.

Result:

Fixes #13213
normanmaurer pushed a commit to franz1981/netty that referenced this pull request Feb 13, 2023
normanmaurer pushed a commit that referenced this pull request Feb 13, 2023
yawkat added a commit to yawkat/netty that referenced this pull request Feb 14, 2023
Motivation:

With netty#12321, the HttpHeaders object returned by the standard HTTP/1 pipeline now contains AsciiStrings, not Strings. This uncovered a bug where HttpHeaders.names() did not behave properly as a Set<String> when the backing HttpHeaders contained a CharSequence type other than String. This leads to .contains incorrectly returning false when the header is present.

Modification:

Replace CharSequenceDelegatingStringSet with a class that delegates directly to the Headers object, instead of Headers.names(), for the .contains call.

Result:

- The .contains call works properly even when the backing Headers aren't Strings.
- Mutation methods were removed. This is an improvement imo, names() previously returned a copy, so changes would not be reflected in the Headers anyway.
- .contains is now case-insensitive.
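
To illustrate the mismatch described above (illustrative only, not the actual fix): a plain Set holding AsciiString entries will not answer contains(String) with true, because String and AsciiString neither equal each other nor, in general, hash the same way, whereas a content-based comparison does match:

    import io.netty.util.AsciiString;

    import java.util.HashSet;
    import java.util.Set;

    public class ContainsMismatch {
        public static void main(String[] args) {
            Set<CharSequence> names = new HashSet<>();
            names.add(new AsciiString("content-length")); // what the decoder now stores

            // false: String and AsciiString have different equals()/hashCode()
            System.out.println(names.contains("content-length"));

            // true: comparing by content instead of by type
            System.out.println(AsciiString.contentEquals(new AsciiString("content-length"), "content-length"));
        }
    }
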
normanmaurer pushed a commit that referenced this pull request Feb 15, 2023
franz1981 added a commit to franz1981/netty that referenced this pull request Mar 13, 2023
Fixes netty#13273

Motivation:

netty#12321 introduced the assumption that whitespace/ctrl chars are already stripped out of the hex chars of the HTTP chunk length.
That was a wrong assumption.

Modifications:

Correctly check for whitespace and ctrl chars and re-introduce the leading-whitespace trim

Result:

Fixes netty#13273
normanmaurer added a commit that referenced this pull request Mar 14, 2023
Fixes #13273 (#13274)

Co-authored-by: Norman Maurer <norman_maurer@apple.com>