
HttpObjectEncoder scalability issue due to instanceof checks (Fixes #12708) #12709

Merged
merged 1 commit into netty:4.1 from netty-4.1.74.Final on Aug 25, 2022

Conversation

franz1981
Contributor

@franz1981 franz1981 commented Aug 17, 2022

Motivation:

The current HTTP encoder logic causes contention on the klass cache field that the JIT uses to speed up instanceof checks against interfaces,
preventing scaling on multi-core machines.
See https://bugs.openjdk.org/browse/JDK-8180450 for more info.

Modifications:

Duplicated code to reduce the arity of the morphism of the instanceof checks in the HTTP encoder.
Stopped using inherited encoder methods, to take control of the arity of the
call-site morphism while releasing ref-counted HTTP message types.

Result:

Scalable HTTP encoding
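
For illustration, here is a rough, self-contained sketch (not Netty code; all class and interface names are made up) of the pattern behind JDK-8180450: several concrete types are checked with instanceof against different interfaces from different threads, so each check keeps rewriting the single secondary_super_cache slot of the message's class and the cores contend on the same cache line. A real measurement would need JMH; this only shows the shape of the problem.

interface HeaderTrait {}
interface ContentTrait {}

// Several concrete types so the instanceof call sites stay megamorphic
// and the JIT has to fall back to the generic subtype check.
final class FullMsg implements HeaderTrait, ContentTrait {}
final class ChunkMsg implements HeaderTrait, ContentTrait {}
final class LastChunkMsg implements HeaderTrait, ContentTrait {}

public final class SecondarySuperCacheContention {
    static volatile boolean sink;

    static void spin(Object[] msgs, boolean checkHeader) {
        for (int i = 0; i < 200_000_000; i++) {
            Object msg = msgs[i % msgs.length];
            // The generic subtype check consults and rewrites the single
            // secondary_super_cache slot of msg's class; two threads checking
            // *different* interfaces keep overwriting each other's entry.
            sink = checkHeader ? msg instanceof HeaderTrait
                               : msg instanceof ContentTrait;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Object[] msgs = { new FullMsg(), new ChunkMsg(), new LastChunkMsg() };
        Thread t1 = new Thread(() -> spin(msgs, true));
        Thread t2 = new Thread(() -> spin(msgs, false));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}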

@normanmaurer
Member

@franz1981 can you share any numbers that prove the gain?

@franz1981
Contributor Author

franz1981 commented Aug 17, 2022

Yep @normanmaurer, I'm going to publish both an end-to-end benchmark result and the micro-benchmark I've modified in Netty, but I'm also trying a different fix that won't require holding any Class field nor using specific concrete types (at the cost of a few more checks and some duplication)

@franz1981
Contributor Author

franz1981 commented Aug 17, 2022

@normanmaurer
this is the TechEmpower plaintext benchmark, with the encode method "polluted" by sending some chunked responses before the test; blue is 4.1 while green is this PR, with both using full HTTP responses at steady state (meaning they measure the same thing/same application code).

[chart: TechEmpower plaintext throughput, 4.1 (blue) vs this PR (green)]

The machine is a 32-core one; if we run the same tests with a single core there's no slowdown (as expected, given that it's a contention issue) - well, not exactly: the "type pollution" still causes the instanceof checks to happen for real, making them slower, but without any scalability impact.

@franz1981
Contributor Author

I am trying hard to create a version of this fix that doesn't require any hint about the "full concrete http type" used, making it more transparent for existing users, so please hold on :)

@franz1981
Contributor Author

franz1981 commented Aug 18, 2022

This is the result of the microbenchmark with type pollution
(note: although the original benchmark doesn't use black holes, I've already verified in the assembly that dead-code elimination isn't happening):

This PR 1 thread:

Benchmark                                   (pooledAllocator)  (typePollution)  (voidPromise)   Mode  Cnt        Score        Error  Units
HttpRequestEncoderBenchmark.chunked                      true            false           true  thrpt   10  1036698.667 ±   8107.867  ops/s
HttpRequestEncoderBenchmark.chunked                      true             true           true  thrpt   10   905807.417 ±  13772.428  ops/s
HttpRequestEncoderBenchmark.contentLength                true            false           true  thrpt   10  1561759.238 ±  10901.522  ops/s
HttpRequestEncoderBenchmark.contentLength                true             true           true  thrpt   10  1299413.850 ±  26776.253  ops/s
HttpRequestEncoderBenchmark.differentTypes               true            false           true  thrpt   10   440180.570 ±  14614.238  ops/s
HttpRequestEncoderBenchmark.differentTypes               true             true           true  thrpt   10   443385.537 ±  20332.729  ops/s
HttpRequestEncoderBenchmark.fullMessage                  true            false           true  thrpt   10  2642338.894 ± 110718.503  ops/s
HttpRequestEncoderBenchmark.fullMessage                  true             true           true  thrpt   10  2324594.524 ±  22424.194  ops/s

This PR 6 threads:

Benchmark                                   (pooledAllocator)  (typePollution)  (voidPromise)   Mode  Cnt         Score        Error  Units
HttpRequestEncoderBenchmark.chunked                      true            false           true  thrpt   10   4769551.449 ± 142854.608  ops/s
HttpRequestEncoderBenchmark.chunked                      true             true           true  thrpt   10   4477651.779 ± 419351.979  ops/s
HttpRequestEncoderBenchmark.contentLength                true            false           true  thrpt   10   7776615.466 ± 225903.978  ops/s
HttpRequestEncoderBenchmark.contentLength                true             true           true  thrpt   10   6278694.334 ± 515945.343  ops/s
HttpRequestEncoderBenchmark.differentTypes               true            false           true  thrpt   10   1964309.534 ±  69229.190  ops/s
HttpRequestEncoderBenchmark.differentTypes               true             true           true  thrpt   10   2041805.064 ±  61923.201  ops/s
HttpRequestEncoderBenchmark.fullMessage                  true            false           true  thrpt   10  10271948.355 ± 558919.801  ops/s
HttpRequestEncoderBenchmark.fullMessage                  true             true           true  thrpt   10  10328800.426 ± 754618.592  ops/s

4.1 1 thread:

Benchmark                                   (pooledAllocator)  (typePollution)  (voidPromise)   Mode  Cnt        Score       Error  Units
HttpRequestEncoderBenchmark.chunked                      true            false           true  thrpt   10   968151.152 ± 52872.773  ops/s
HttpRequestEncoderBenchmark.chunked                      true             true           true  thrpt   10   887903.816 ± 17644.514  ops/s
HttpRequestEncoderBenchmark.contentLength                true            false           true  thrpt   10  1445556.948 ± 46657.318  ops/s
HttpRequestEncoderBenchmark.contentLength                true             true           true  thrpt   10  1174235.130 ± 39772.138  ops/s
HttpRequestEncoderBenchmark.differentTypes               true            false           true  thrpt   10   377020.952 ±  5622.663  ops/s
HttpRequestEncoderBenchmark.differentTypes               true             true           true  thrpt   10   367732.657 ±  2173.944  ops/s
HttpRequestEncoderBenchmark.fullMessage                  true            false           true  thrpt   10  2467141.212 ± 99167.623  ops/s
HttpRequestEncoderBenchmark.fullMessage                  true             true           true  thrpt   10  1500541.983 ± 18486.408  ops/s

4.1 6 threads:

Benchmark                                   (pooledAllocator)  (typePollution)  (voidPromise)   Mode  Cnt         Score         Error  Units
HttpRequestEncoderBenchmark.chunked                      true            false           true  thrpt   10   4246375.439 ±  238607.376  ops/s
HttpRequestEncoderBenchmark.chunked                      true             true           true  thrpt   10   2948854.203 ±  123797.386  ops/s
HttpRequestEncoderBenchmark.contentLength                true            false           true  thrpt   10   5507903.362 ±  194094.645  ops/s
HttpRequestEncoderBenchmark.contentLength                true             true           true  thrpt   10   3505617.988 ±   77168.404  ops/s
HttpRequestEncoderBenchmark.differentTypes               true            false           true  thrpt   10   1344402.158 ±  113718.579  ops/s
HttpRequestEncoderBenchmark.differentTypes               true             true           true  thrpt   10   1376816.001 ±   49867.836  ops/s
HttpRequestEncoderBenchmark.fullMessage                  true            false           true  thrpt   10  10582142.810 ± 1443464.906  ops/s
HttpRequestEncoderBenchmark.fullMessage                  true             true           true  thrpt   10   2765103.925 ±  148623.873  ops/s

A few notes:

  • the PR improves the single-thread case too
  • scalability is greatly improved, especially with type pollution

There's still some work to do for HTTP, but this one already improves things dramatically; see fullMessage with type pollution:

  1. on 4.1, type pollution barely reaches x1.8 the single-threaded throughput when using 6 cores
  2. on the new PR, the single-threaded baseline is higher and 6 cores achieve x4.5 the single-threaded throughput (numbers below)
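
Those ratios come straight from the fullMessage rows with typePollution=true above: on 4.1 the throughput goes from ~1.50M ops/s with 1 thread to ~2.77M ops/s with 6 threads (≈ x1.8), while with this PR it goes from ~2.32M ops/s to ~10.33M ops/s (≈ x4.4-4.5).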

Code under review (diff excerpt):

    return ((FileRegion) msg).count();
    }
    throw new IllegalStateException("unexpected message type: " + StringUtil.simpleClassName(msg));
    return msg instanceof FullHttpMessage ||
Contributor Author

@franz1981 franz1981 Aug 18, 2022

@normanmaurer
We can add checks for the existing known concrete types too (default, default full, etc.), but TBH users can do that themselves by extending the response/request encoders based on how they use them, i.e. whether they have custom types or not.

note:
The sequence of checks here matches the one in encode, so that concrete types don't get their Klass cache field invalidated (and refreshed) - this will likely cause a perf hit for the last types checked in the chain (ByteBuf and FileRegion), because they are evaluated only after all the previous checks, always hitting the JIT's O(n) slow path.
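
Roughly, the idea looks like this (a hypothetical sketch of the ordering only, not the actual method bodies in this PR):

import io.netty.buffer.ByteBuf;
import io.netty.channel.FileRegion;
import io.netty.handler.codec.http.FullHttpMessage;
import io.netty.handler.codec.http.HttpContent;
import io.netty.handler.codec.http.HttpMessage;
import io.netty.handler.codec.http.LastHttpContent;
import io.netty.util.internal.StringUtil;

final class CheckOrderSketch {

    // Accept check: the interfaces are tested in one fixed order...
    static boolean acceptOutboundMessage(Object msg) {
        return msg instanceof FullHttpMessage
                || msg instanceof HttpMessage
                || msg instanceof LastHttpContent
                || msg instanceof HttpContent
                || msg instanceof ByteBuf
                || msg instanceof FileRegion;
    }

    // ...and the encode path tests them in the same order, so the message
    // class's secondary_super_cache is still pointing at the interface that
    // matched above and the first check here is a cache hit. Only the tail
    // of the chain (ByteBuf, FileRegion) always pays for all the earlier
    // misses via the O(n) secondary-supers scan.
    static void encodeSketch(Object msg) {
        if (msg instanceof FullHttpMessage) {
            // encode headers and content in one go
        } else if (msg instanceof HttpMessage) {
            // encode headers only
        } else if (msg instanceof LastHttpContent) {
            // encode the last chunk (and trailers)
        } else if (msg instanceof HttpContent) {
            // encode a content chunk
        } else if (msg instanceof ByteBuf || msg instanceof FileRegion) {
            // pass the raw bytes/region through
        } else {
            throw new IllegalStateException(
                    "unexpected message type: " + StringUtil.simpleClassName(msg));
        }
    }
}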

@franz1981
Contributor Author

@chrisvest ready to go :)
Both the end-to-end and the micro benchmarks agree that the scalability issue is solved :)

@franz1981 franz1981 force-pushed the netty-4.1.74.Final branch 3 times, most recently from 8381793 to 88e2c30 Compare August 19, 2022 09:09
@franz1981
Contributor Author

weird failure on

Build PR / windows-x86_64-java11-boringssl (pull_request)

@normanmaurer @chrisvest how can I re-run the validations / just that validation?

Contributor

@chrisvest chrisvest left a comment

Had a couple of comments.

It's unfortunate we have to play tricks like this, as the control flow was already hard to follow, and now that part is even worse.

@franz1981
Contributor Author

franz1981 commented Aug 19, 2022

It's unfortunate we have to play tricks like this, as the control flow was already hard to follow, and now that part is even worse.

Agree, and it's unfortunate that such a JDK "feature" is unlikely to be fixed, but still - there's a lesson to learn here too, i.e. using traits to drive a state machine isn't a good idea, because it relies on how well the runtime can resolve them. It's similar with polymorphism, although maybe a double-dispatch/visitor pattern would avoid triggering the klass caching mechanism.

Probably a plain old int/byte field in a single common type, acting as a bitset of known "traits", plus some switches to choose what to do for each implemented trait, is more deterministic regardless of what the JIT decides (meaning its performance characteristics survive over time/JDK bugs).
Even if I've solved (alleviated - more on this next week) the scalability issue, we still get a plain O(n) scan whenever we check against a missing trait, and that's not ideal.
I would love Netty 5 to not make use of this mechanism, and I can help (@vietj too, I believe) to do things differently there
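
For the record, a rough sketch of that alternative (hypothetical names, not what this PR does): a single int field acts as a bitset of traits and the encoder branches on bits instead of interface instanceof checks, so the cost no longer depends on how the JIT caches secondary supers.

// All names here are hypothetical; this is not what the PR implements.
final class Traits {
    static final int HTTP_MESSAGE = 1;      // has a start line + headers
    static final int HTTP_CONTENT = 1 << 1; // carries a content buffer
    static final int LAST_CONTENT = 1 << 2; // terminates the message
}

abstract class EncodableMessage {
    final int traits; // bitset of Traits.* flags, set once at construction

    EncodableMessage(int traits) {
        this.traits = traits;
    }
}

final class FullMessage extends EncodableMessage {
    FullMessage() {
        super(Traits.HTTP_MESSAGE | Traits.HTTP_CONTENT | Traits.LAST_CONTENT);
    }
}

final class TraitBasedEncoder {
    void encode(EncodableMessage msg) {
        final int t = msg.traits;
        // Plain bit tests: no interface subtype checks, so no
        // secondary_super_cache traffic and no O(n) scan on a miss.
        if ((t & Traits.HTTP_MESSAGE) != 0) {
            // encode the start line + headers
        }
        if ((t & Traits.HTTP_CONTENT) != 0) {
            // encode the body bytes
        }
        if ((t & Traits.LAST_CONTENT) != 0) {
            // encode trailers / finish the message
        }
    }
}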

@franz1981 franz1981 force-pushed the netty-4.1.74.Final branch 2 times, most recently from 97e1c1e to 200cbd2 Compare August 24, 2022 12:30
@franz1981
Contributor Author

@normanmaurer this is ready to go, unless there are other concerns - I can add more comments to help

Member

@normanmaurer normanmaurer left a comment

Can you just add a link to the PR / OpenJDK issue somewhere, so we don't revert this by mistake at some point?

@franz1981
Contributor Author

Can you just add a link to the PR / OpenJDK issue somewhere, so we don't revert this by mistake at some point?

Let me add some comments on the PR

…etty#12708)

Motivation:

The current HTTP encoder logic causes contention on the klass cache field that the JIT uses to speed up instanceof checks against interfaces,
preventing scaling on multi-core machines.
See https://bugs.openjdk.org/browse/JDK-8180450 for more info.

Modifications:

Duplicated code to reduce the arity of the morphism of the instanceof checks in the HTTP encoder.
Stopped using inherited encoder methods, to take control of the arity of the
call-site morphism while releasing ref-counted HTTP message types.

Result:

Scalable HTTP encoding
@franz1981
Contributor Author

@normanmaurer comment added, and the issue is now more verbose too (with an example that shows what's going on)

@franz1981
Contributor Author

The benchmark I've made is naive, but it's good enough for our purpose, I believe - injecting type pollution upfront just assumes a LOT of things about the JIT and its configuration (and which optimizations it should perform)

@normanmaurer normanmaurer merged commit 423a385 into netty:4.1 Aug 25, 2022
@normanmaurer
Member

@franz1981 thanks a lot for all the hard work. I learned something new ❤️

@normanmaurer
Member

@franz1981 can you also do a PR for main?

@normanmaurer normanmaurer added this to the 4.1.80.Final milestone Aug 25, 2022
@franz1981
Contributor Author

franz1981 commented Aug 25, 2022

@normanmaurer doing it for main too ;)
Although I would later change it there TBH

Let's chat on the pr

@normanmaurer
Member

@normanmaurer doing it for main too ;)

Although I would later change it there TBH

Let's chat on the pr

Agree... but better to be consistent until we have a better way in main

franz1981 added a commit to franz1981/netty that referenced this pull request Aug 31, 2022
…tly (Fixes netty#12750)

Motivation:

Changes due to netty#12709 have added a code path.

Modifications:

Restore the "just http message" case as separated by the http content ones

Result:

Same encoding as original code
yawkat added a commit to yawkat/netty that referenced this pull request Sep 1, 2022
Motivation:

netty#12709 changed HttpObjectEncoder to override the write method of MessageToMessageEncoder, with slightly changed semantics: the `msg` argument to `encode` is not released anymore. To accommodate this change, netty#12709 also updated `HttpObjectEncoder.encode` to release the `msg`. However, `HttpClientCodec.Encoder` overrides `encode` and simply forwards the message if an HTTP upgrade has been completed. This code path was not updated to release the input message. This leads to a memory leak.

Modifications:

Changed the `encode` implementation to not retain the message that is forwarded. Added a test case to verify that the refCnt of the data passed through is unchanged.

Result:

The buffer retains its correct refCnt and will be released properly.
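
For context, a hypothetical sketch of the contract change described above (simplified names, not the real Netty classes): under the old MessageToMessageEncoder contract the framework released the input after encode(), so a pass-through encoder had to retain what it forwarded; once the encoder releases or hands off the input itself, that extra retain() becomes the leak.

import io.netty.util.ReferenceCountUtil;
import java.util.List;

// Hypothetical pass-through encoder body, not the real HttpClientCodec.Encoder.
final class PassThroughSketch {

    // Old MessageToMessageEncoder contract: the framework releases the input
    // after encode() returns, so anything forwarded must be retained first.
    static void encodeUnderOldContract(Object msg, List<Object> out) {
        out.add(ReferenceCountUtil.retain(msg));
    }

    // New contract after #12709: the encoder owns the input itself, so a
    // forwarding encode() must pass the message along without the extra
    // retain(), otherwise the refCnt stays one too high and the buffer leaks.
    static void encodeUnderNewContract(Object msg, List<Object> out) {
        out.add(msg);
    }
}
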
normanmaurer pushed a commit that referenced this pull request Sep 1, 2022
normanmaurer pushed a commit that referenced this pull request Sep 1, 2022
…tly (Fixes #12750) (#12751)




Co-authored-by: Chris Vest <christianvest_hansen@apple.com>