Properly debounce wakeups #9191

Merged
merged 1 commit into from Jun 4, 2019

7 participants
@carl-mastrangelo
Member

commented May 27, 2019

Motivation:
The wakeup logic in EpollEventLoop is overly complex.

Modification:

  • Simplify the race to wake up the loop.
  • Don't let the event loop wake itself up (it's already awake!).
  • Make the event loop check whether there are any more tasks after preparing
    to sleep. There is a small window in which non-event-loop writers can still
    issue eventfd writes here, but that is okay.
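
The three points above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code in this PR: the class `DebouncedWakeupSketch`, its methods, and the `wakeupWrites` counter (a stand-in for eventfd writes) are hypothetical names, and the real EpollEventLoop blocks in epoll_wait rather than returning a boolean.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of a debounced wakeup. Hypothetical; not Netty's EpollEventLoop. */
class DebouncedWakeupSketch {
    private final Queue<Runnable> tasks = new ConcurrentLinkedQueue<>();

    // true while the loop is awake or a wakeup is already pending;
    // false only once the loop has announced that it is about to sleep.
    private final AtomicBoolean wakenUp = new AtomicBoolean(true);

    // Stand-in for the number of eventfd writes (assumption for illustration).
    final AtomicInteger wakeupWrites = new AtomicInteger();

    /** Enqueue a task; signal the loop at most once per sleep. */
    void execute(Runnable task, boolean inEventLoop) {
        tasks.add(task);
        // The loop never wakes itself, and only the producer that wins the
        // false -> true race performs the (expensive) wakeup write.
        if (!inEventLoop && wakenUp.compareAndSet(false, true)) {
            wakeupWrites.incrementAndGet();
        }
    }

    /** Called by the loop before blocking. Returns true if it may block. */
    boolean prepareToSleep() {
        wakenUp.set(false);
        // Re-check after clearing the flag: a task enqueued before the flag
        // was cleared would otherwise be missed. A producer that slips in
        // between the two statements does a redundant write, which is harmless.
        if (!tasks.isEmpty()) {
            wakenUp.set(true);
            return false;
        }
        return true;
    }

    /** Drain the queue on the loop thread; returns the number of tasks run. */
    int runAllTasks() {
        int ran = 0;
        for (Runnable r; (r = tasks.poll()) != null; ran++) {
            r.run();
        }
        return ran;
    }
}
```

The ordering is what matters here: producers enqueue first and then test the flag, while the loop clears the flag first and then checks the queue, so at least one side always notices the other.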

Result:
Cleaner wakeup logic.

Benchmarks:

BEFORE
Benchmark                                   Mode  Cnt       Score      Error  Units
EpollSocketChannelBenchmark.executeMulti   thrpt   20  408381.411 ± 2857.498  ops/s
EpollSocketChannelBenchmark.executeSingle  thrpt   20  157022.360 ± 1240.573  ops/s
EpollSocketChannelBenchmark.pingPong       thrpt   20   60571.704 ±  331.125  ops/s

AFTER
Benchmark                                   Mode  Cnt       Score      Error  Units
EpollSocketChannelBenchmark.executeMulti   thrpt   20  440546.953 ± 1652.823  ops/s
EpollSocketChannelBenchmark.executeSingle  thrpt   20  168114.751 ± 1176.609  ops/s
EpollSocketChannelBenchmark.pingPong       thrpt   20   61231.878 ±  520.108  ops/s
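
To put the two tables in relative terms, here is a small sketch that computes the percentage change in throughput. It assumes the second table is the run with this change applied; the class and method names are made up for illustration, and the scores are copied from the tables above.

```java
/** Hypothetical helper for comparing the two throughput tables. */
class ThroughputDelta {
    /** Relative change from before to after, in percent. */
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        String[] names = {"executeMulti", "executeSingle", "pingPong"};
        double[] before = {408381.411, 157022.360, 60571.704};
        double[] after = {440546.953, 168114.751, 61231.878};
        for (int i = 0; i < names.length; i++) {
            // Prints one line per benchmark with a signed percentage.
            System.out.printf("%-14s %+.2f%%%n", names[i],
                    percentChange(before[i], after[i]));
        }
    }
}
```

Under that assumption, executeMulti and executeSingle improve by roughly 7 to 8 percent, while pingPong moves by about 1 percent, which is comparable in size to the reported error bounds.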
@carl-mastrangelo

Member Author

commented May 27, 2019

cc: @johnou

@carl-mastrangelo

Member Author

commented May 27, 2019

Oops, sorry about the formatting change. Let me see if I can coax intellij into not screwing with things.

@netty-bot


commented May 27, 2019

Can one of the admins verify this patch?

@carl-mastrangelo carl-mastrangelo force-pushed the carl-mastrangelo:debounceread branch from 1441ec9 to 64de461 May 27, 2019

@normanmaurer

Member

commented May 27, 2019

@netty-bot test this please

@normanmaurer normanmaurer requested review from ejona86, trustin and njhill May 27, 2019

@johnou

Contributor

commented May 27, 2019

@njhill

njhill approved these changes May 28, 2019

Member

left a comment

@carl-mastrangelo this is awesome, I had suspected this logic was unnecessarily convoluted but never looked at it closely enough.

We can/should apply the same simplification to NioEventLoop and KQueueEventLoop.

@carl-mastrangelo carl-mastrangelo force-pushed the carl-mastrangelo:debounceread branch from 64de461 to 437f91a May 28, 2019

@carl-mastrangelo

Member Author

commented May 28, 2019

@njhill I took a crack at NioEventLoop, but it was more complicated, and I couldn't convince myself the change was correct. I figured I would tackle this easier case first. I can do the KQueue version, but alas I don't have a mac to test it on.

We should really have a JCStress test for these classes.

@normanmaurer

Member

commented May 28, 2019

@netty-bot test this please

@normanmaurer

Member

commented May 28, 2019

@carl-mastrangelo I think it is fine to just do it for epoll first and then port it.

@franz1981

Contributor

commented May 28, 2019

I need some time to look at the code, but based on some of the things I've found in the wake-up mechanism, I'm already +1 on this :)
Just curious... do we now have 2 different benchmarks to measure ping pong?
If so, it may be worth dropping one.

@carl-mastrangelo

Member Author

commented May 28, 2019

@franz1981 what other ping pong benchmark?

@franz1981

Contributor

commented May 29, 2019

@carl-mastrangelo My bad, I confused it with https://github.com/netty/netty/blob/cd3254df88b60476dc04b39915d3d70c200eb6f4/microbench/src/main/java/io/netty/microbench/concurrent/BurstCostExecutorsBenchmark.java

Anyway, I think it could help to measure the impact of these changes from a different perspective, if you want to give it a try ;)

@franz1981
Contributor

left a comment

LGTM, but I would appreciate a run of the BurstCostExecutorsBenchmark to check the impact, if possible 👍

@ejona86
Member

left a comment

Very nice.

@carl-mastrangelo

Member Author

commented May 30, 2019

@normanmaurer friendly ping on this

@johnou

Contributor

commented May 30, 2019

@carl-mastrangelo did you get a chance to run BurstCostExecutorsBenchmark?

@carl-mastrangelo

Member Author

commented May 30, 2019

@franz1981 I did try to run BurstCostExecutorsBenchmark several times, but was stymied by Maven. The benchmark seemingly doesn't have a dependency on epoll, which makes the benchmark fail. I don't know enough about Maven to fix this.

The default warmup time is about 10s, with 10 iterations, meaning the whole benchmark takes 8 hours to run (sorry, I don't have that much CPU time to spend on this). It doesn't seem possible to pass an include argument to JMH on the command line using the Maven plugin. Also, running from IntelliJ doesn't allow setting the benchmark/timeout args, because it counts as a JUnit test. Nothing is easy.

@carl-mastrangelo

Member Author

commented May 30, 2019

@johnou & @franz1981 I was able to get a run of the test. My change is better for all cases except the last one.

AFTER

Benchmark                                                          (burstLength)  (executorType)  (work)    Mode      Cnt         Score    Error  Units
BurstCostExecutorsBenchmark.test1Producer                                      1  epollEventLoop       0  sample   485920      2579.534 ±  2.758  ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.00                  1  epollEventLoop       0  sample                372.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.50                  1  epollEventLoop       0  sample               2488.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.90                  1  epollEventLoop       0  sample               2680.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.95                  1  epollEventLoop       0  sample               2712.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.99                  1  epollEventLoop       0  sample               3152.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.999                 1  epollEventLoop       0  sample              13344.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.9999                1  epollEventLoop       0  sample              20224.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p1.00                  1  epollEventLoop       0  sample              51968.000           ns/op
BurstCostExecutorsBenchmark.test1Producer                                      1  epollEventLoop      10  sample   495725      2544.146 ±  2.950  ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.00                  1  epollEventLoop      10  sample                602.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.50                  1  epollEventLoop      10  sample               2480.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.90                  1  epollEventLoop      10  sample               2636.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.95                  1  epollEventLoop      10  sample               2680.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.99                  1  epollEventLoop      10  sample               3112.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.999                 1  epollEventLoop      10  sample              13456.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.9999                1  epollEventLoop      10  sample              18656.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p1.00                  1  epollEventLoop      10  sample              48960.000           ns/op
BurstCostExecutorsBenchmark.test1Producer                                     10  epollEventLoop       0  sample   474050      2686.776 ±  4.424  ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.00                 10  epollEventLoop       0  sample               1036.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.50                 10  epollEventLoop       0  sample               2608.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.90                 10  epollEventLoop       0  sample               2760.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.95                 10  epollEventLoop       0  sample               2796.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.99                 10  epollEventLoop       0  sample               4488.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.999                10  epollEventLoop       0  sample              17440.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.9999               10  epollEventLoop       0  sample              22425.110           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p1.00                 10  epollEventLoop       0  sample             146432.000           ns/op
BurstCostExecutorsBenchmark.test1Producer                                     10  epollEventLoop      10  sample   460969      2724.390 ±  4.301  ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.00                 10  epollEventLoop      10  sample                804.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.50                 10  epollEventLoop      10  sample               2652.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.90                 10  epollEventLoop      10  sample               2788.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.95                 10  epollEventLoop      10  sample               2872.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.99                 10  epollEventLoop      10  sample               3520.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.999                10  epollEventLoop      10  sample              13760.480           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.9999               10  epollEventLoop      10  sample              19633.440           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p1.00                 10  epollEventLoop      10  sample             244992.000           ns/op
BurstCostExecutorsBenchmark.test2Producers                                     1  epollEventLoop       0  sample   954512      2659.576 ±  1.989  ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.00                1  epollEventLoop       0  sample                330.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.50                1  epollEventLoop       0  sample               2608.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.90                1  epollEventLoop       0  sample               2764.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.95                1  epollEventLoop       0  sample               2796.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.99                1  epollEventLoop       0  sample               3500.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.999               1  epollEventLoop       0  sample              11056.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.9999              1  epollEventLoop       0  sample              16800.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p1.00                1  epollEventLoop       0  sample             148480.000           ns/op
BurstCostExecutorsBenchmark.test2Producers                                     1  epollEventLoop      10  sample   967439      2649.760 ±  7.318  ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.00                1  epollEventLoop      10  sample                293.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.50                1  epollEventLoop      10  sample               2572.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.90                1  epollEventLoop      10  sample               2740.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.95                1  epollEventLoop      10  sample               2820.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.99                1  epollEventLoop      10  sample               3726.400           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.999               1  epollEventLoop      10  sample              12512.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.9999              1  epollEventLoop      10  sample              27048.192           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p1.00                1  epollEventLoop      10  sample            1044480.000           ns/op
BurstCostExecutorsBenchmark.test2Producers                                    10  epollEventLoop       0  sample   949819      2586.630 ±  2.768  ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.00               10  epollEventLoop       0  sample                551.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.50               10  epollEventLoop       0  sample               2696.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.90               10  epollEventLoop       0  sample               3196.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.95               10  epollEventLoop       0  sample               3680.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.99               10  epollEventLoop       0  sample               4424.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.999              10  epollEventLoop       0  sample              10018.880           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.9999             10  epollEventLoop       0  sample              15728.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p1.00               10  epollEventLoop       0  sample              79616.000           ns/op
BurstCostExecutorsBenchmark.test2Producers                                    10  epollEventLoop      10  sample  1007533      2488.526 ±  4.237  ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.00               10  epollEventLoop      10  sample                438.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.50               10  epollEventLoop      10  sample               2464.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.90               10  epollEventLoop      10  sample               3108.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.95               10  epollEventLoop      10  sample               3396.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.99               10  epollEventLoop      10  sample               4376.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.999              10  epollEventLoop      10  sample              10343.456           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.9999             10  epollEventLoop      10  sample              18951.891           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p1.00               10  epollEventLoop      10  sample             823296.000           ns/op
BurstCostExecutorsBenchmark.test3Producers                                     1  epollEventLoop       0  sample  1641703      2500.147 ± 11.419  ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.00                1  epollEventLoop       0  sample                156.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.50                1  epollEventLoop       0  sample               2600.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.90                1  epollEventLoop       0  sample               2832.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.95                1  epollEventLoop       0  sample               2872.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.99                1  epollEventLoop       0  sample               3364.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.999               1  epollEventLoop       0  sample               9520.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.9999              1  epollEventLoop       0  sample              14557.274           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p1.00                1  epollEventLoop       0  sample            5603328.000           ns/op
BurstCostExecutorsBenchmark.test3Producers                                     1  epollEventLoop      10  sample  2001874      2043.121 ±  2.405  ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.00                1  epollEventLoop      10  sample                119.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.50                1  epollEventLoop      10  sample               2376.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.90                1  epollEventLoop      10  sample               2752.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.95                1  epollEventLoop      10  sample               2804.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.99                1  epollEventLoop      10  sample               3212.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.999               1  epollEventLoop      10  sample               9984.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.9999              1  epollEventLoop      10  sample              18690.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p1.00                1  epollEventLoop      10  sample             712704.000           ns/op
BurstCostExecutorsBenchmark.test3Producers                                    10  epollEventLoop       0  sample  2006863      3458.703 ±  5.628  ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.00               10  epollEventLoop       0  sample                487.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.50               10  epollEventLoop       0  sample               3396.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.90               10  epollEventLoop       0  sample               4392.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.95               10  epollEventLoop       0  sample               4784.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.99               10  epollEventLoop       0  sample               5760.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.999              10  epollEventLoop       0  sample              10944.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.9999             10  epollEventLoop       0  sample              25404.211           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p1.00               10  epollEventLoop       0  sample            1980416.000           ns/op
BurstCostExecutorsBenchmark.test3Producers                                    10  epollEventLoop      10  sample  1735135      3534.582 ± 46.035  ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.00               10  epollEventLoop      10  sample                431.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.50               10  epollEventLoop      10  sample               3532.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.90               10  epollEventLoop      10  sample               4200.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.95               10  epollEventLoop      10  sample               4456.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.99               10  epollEventLoop      10  sample               5560.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.999              10  epollEventLoop      10  sample              12304.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.9999             10  epollEventLoop      10  sample              20383.130           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p1.00               10  epollEventLoop      10  sample           12517376.000           ns/op


BEFORE:



Benchmark                                                          (burstLength)  (executorType)  (work)    Mode      Cnt        Score    Error  Units
BurstCostExecutorsBenchmark.test1Producer                                      1  epollEventLoop       0  sample   745106     3370.669 ±  3.309  ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.00                  1  epollEventLoop       0  sample               384.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.50                  1  epollEventLoop       0  sample              3324.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.90                  1  epollEventLoop       0  sample              3376.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.95                  1  epollEventLoop       0  sample              3392.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.99                  1  epollEventLoop       0  sample              4104.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.999                 1  epollEventLoop       0  sample             16768.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.9999                1  epollEventLoop       0  sample             21056.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p1.00                  1  epollEventLoop       0  sample            505856.000           ns/op
BurstCostExecutorsBenchmark.test1Producer                                      1  epollEventLoop      10  sample   523081     3255.676 ±  3.375  ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.00                  1  epollEventLoop      10  sample               530.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.50                  1  epollEventLoop      10  sample              3276.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.90                  1  epollEventLoop      10  sample              3372.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.95                  1  epollEventLoop      10  sample              3392.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.99                  1  epollEventLoop      10  sample              4208.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.999                 1  epollEventLoop      10  sample             17184.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.9999                1  epollEventLoop      10  sample             21420.275           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p1.00                  1  epollEventLoop      10  sample             71680.000           ns/op
BurstCostExecutorsBenchmark.test1Producer                                     10  epollEventLoop       0  sample   416887     3233.760 ±  4.017  ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.00                 10  epollEventLoop       0  sample              1008.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.50                 10  epollEventLoop       0  sample              3172.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.90                 10  epollEventLoop       0  sample              3352.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.95                 10  epollEventLoop       0  sample              3432.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.99                 10  epollEventLoop       0  sample              3968.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.999                10  epollEventLoop       0  sample             17571.584           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.9999               10  epollEventLoop       0  sample             22131.917           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p1.00                 10  epollEventLoop       0  sample            219136.000           ns/op
BurstCostExecutorsBenchmark.test1Producer                                     10  epollEventLoop      10  sample   470314     3417.301 ±  3.763  ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.00                 10  epollEventLoop      10  sample               977.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.50                 10  epollEventLoop      10  sample              3308.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.90                 10  epollEventLoop      10  sample              3604.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.95                 10  epollEventLoop      10  sample              3632.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.99                 10  epollEventLoop      10  sample              4432.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.999                10  epollEventLoop      10  sample             17408.000           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p0.9999               10  epollEventLoop      10  sample             22334.992           ns/op
BurstCostExecutorsBenchmark.test1Producer:test1Producer·p1.00                 10  epollEventLoop      10  sample            109056.000           ns/op
BurstCostExecutorsBenchmark.test2Producers                                     1  epollEventLoop       0  sample   801320     3150.514 ±  2.074  ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.00                1  epollEventLoop       0  sample               161.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.50                1  epollEventLoop       0  sample              3108.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.90                1  epollEventLoop       0  sample              3172.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.95                1  epollEventLoop       0  sample              3204.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.99                1  epollEventLoop       0  sample              3916.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.999               1  epollEventLoop       0  sample             10848.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.9999              1  epollEventLoop       0  sample             17147.773           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p1.00                1  epollEventLoop       0  sample            149504.000           ns/op
BurstCostExecutorsBenchmark.test2Producers                                     1  epollEventLoop      10  sample   907114     3241.515 ±  4.921  ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.00                1  epollEventLoop      10  sample               318.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.50                1  epollEventLoop      10  sample              3128.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.90                1  epollEventLoop      10  sample              3396.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.95                1  epollEventLoop      10  sample              3428.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.99                1  epollEventLoop      10  sample              4264.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.999               1  epollEventLoop      10  sample             12048.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.9999              1  epollEventLoop      10  sample             25001.232           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p1.00                1  epollEventLoop      10  sample            687104.000           ns/op
BurstCostExecutorsBenchmark.test2Producers                                    10  epollEventLoop       0  sample   892967     2871.753 ±  3.487  ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.00               10  epollEventLoop       0  sample               560.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.50               10  epollEventLoop       0  sample              3120.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.90               10  epollEventLoop       0  sample              3416.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.95               10  epollEventLoop       0  sample              3500.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.99               10  epollEventLoop       0  sample              4088.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.999              10  epollEventLoop       0  sample             11664.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.9999             10  epollEventLoop       0  sample             19962.010           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p1.00               10  epollEventLoop       0  sample            220928.000           ns/op
BurstCostExecutorsBenchmark.test2Producers                                    10  epollEventLoop      10  sample   974951     2609.336 ±  6.262  ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.00               10  epollEventLoop      10  sample               447.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.50               10  epollEventLoop      10  sample              2460.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.90               10  epollEventLoop      10  sample              3524.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.95               10  epollEventLoop      10  sample              3692.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.99               10  epollEventLoop      10  sample              4320.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.999              10  epollEventLoop      10  sample             10656.000           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p0.9999             10  epollEventLoop      10  sample             17952.307           ns/op
BurstCostExecutorsBenchmark.test2Producers:test2Producers·p1.00               10  epollEventLoop      10  sample           1650688.000           ns/op
BurstCostExecutorsBenchmark.test3Producers                                     1  epollEventLoop       0  sample  1281593     3127.447 ± 15.492  ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.00                1  epollEventLoop       0  sample               160.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.50                1  epollEventLoop       0  sample              3188.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.90                1  epollEventLoop       0  sample              3492.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.95                1  epollEventLoop       0  sample              3544.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.99                1  epollEventLoop       0  sample              4240.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.999               1  epollEventLoop       0  sample             10736.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.9999              1  epollEventLoop       0  sample             19872.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p1.00                1  epollEventLoop       0  sample           4173824.000           ns/op
BurstCostExecutorsBenchmark.test3Producers                                     1  epollEventLoop      10  sample  1613861     2599.290 ±  4.250  ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.00                1  epollEventLoop      10  sample               119.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.50                1  epollEventLoop      10  sample              3140.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.90                1  epollEventLoop      10  sample              3508.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.95                1  epollEventLoop      10  sample              3568.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.99                1  epollEventLoop      10  sample              4056.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.999               1  epollEventLoop      10  sample             11136.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.9999              1  epollEventLoop      10  sample             57895.283           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p1.00                1  epollEventLoop      10  sample            230656.000           ns/op
BurstCostExecutorsBenchmark.test3Producers                                    10  epollEventLoop       0  sample  1892544     3704.901 ±  2.611  ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.00               10  epollEventLoop       0  sample               371.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.50               10  epollEventLoop       0  sample              3736.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.90               10  epollEventLoop       0  sample              4552.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.95               10  epollEventLoop       0  sample              4848.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.99               10  epollEventLoop       0  sample              5848.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.999              10  epollEventLoop       0  sample             11936.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.9999             10  epollEventLoop       0  sample             20160.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p1.00               10  epollEventLoop       0  sample            390656.000           ns/op
BurstCostExecutorsBenchmark.test3Producers                                    10  epollEventLoop      10  sample  1342565     3226.210 ±  7.763  ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.00               10  epollEventLoop      10  sample               493.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.50               10  epollEventLoop      10  sample              3164.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.90               10  epollEventLoop      10  sample              4076.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.95               10  epollEventLoop      10  sample              4392.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.99               10  epollEventLoop      10  sample              5392.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.999              10  epollEventLoop      10  sample             11552.000           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p0.9999             10  epollEventLoop      10  sample             29733.677           ns/op
BurstCostExecutorsBenchmark.test3Producers:test3Producers·p1.00               10  epollEventLoop      10  sample            775168.000           ns/op

@normanmaurer
Member

left a comment

@carl-mastrangelo just did another review... please check my comment as I have some concerns here.

@carl-mastrangelo carl-mastrangelo force-pushed the carl-mastrangelo:debounceread branch from 437f91a to b71d4cd May 31, 2019

@johnou

johnou approved these changes May 31, 2019

@normanmaurer

Member

commented May 31, 2019

@carl-mastrangelo sorry for being a PITA, but could you re-run the benchmark?

@carl-mastrangelo carl-mastrangelo force-pushed the carl-mastrangelo:debounceread branch from b71d4cd to 799c9a9 May 31, 2019

@carl-mastrangelo

Member Author

commented May 31, 2019

@normanmaurer I re-ran the EpollSocketChannelBenchmark and updated my comment (and commit). I don't have access to the original machine, so I can't run the EpollSocketChannelBenchmark again.

Overall stats are a little worse, but it should avoid the starvation.

Properly debounce wakeups
Motivation:
The wakeup logic in EpollEventLoop is overly complex

Modification:
* Simplify the race to wake up the loop
* Don't let the event loop wake itself up (it's already awake!)
* Make the event loop check whether there are any more tasks after preparing to
sleep.  There is a small window where non-event-loop writers can issue
eventfd writes here, but that is okay.

Result:
Cleaner wakeup logic.

Benchmarks:

```
BEFORE
Benchmark                                   Mode  Cnt       Score      Error  Units
EpollSocketChannelBenchmark.executeMulti   thrpt   20  408381.411 ± 2857.498  ops/s
EpollSocketChannelBenchmark.executeSingle  thrpt   20  157022.360 ± 1240.573  ops/s
EpollSocketChannelBenchmark.pingPong       thrpt   20   60571.704 ±  331.125  ops/s

AFTER
Benchmark                                   Mode  Cnt       Score      Error  Units
EpollSocketChannelBenchmark.executeMulti   thrpt   20  440546.953 ± 1652.823  ops/s
EpollSocketChannelBenchmark.executeSingle  thrpt   20  168114.751 ± 1176.609  ops/s
EpollSocketChannelBenchmark.pingPong       thrpt   20   61231.878 ±  520.108  ops/s
```
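
For readers following the Modification list above, the debouncing idea can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in this PR: the class `DebouncedWakeupSketch`, its method names, and the `eventFdWrites` counter (which stands in for a real eventfd write) are all hypothetical. Producers race on a single flag so that only the first one issues a wakeup, the event loop never signals itself, and the loop re-checks the queue after clearing the flag before it would block.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of wakeup debouncing; not Netty's EpollEventLoop. */
final class DebouncedWakeupSketch {
    private final Queue<Runnable> tasks = new ConcurrentLinkedQueue<>();
    // true while a wakeup has already been requested for the current sleep cycle
    private final AtomicBoolean wakenUp = new AtomicBoolean(false);
    // Assumption: counts simulated eventfd writes instead of touching a real fd
    private final AtomicInteger eventFdWrites = new AtomicInteger();

    /** Enqueue first, then request a wakeup, so the loop cannot miss the task. */
    void execute(Runnable task, boolean inEventLoop) {
        tasks.add(task);
        wakeup(inEventLoop);
    }

    /** Only the first outside caller per sleep cycle "writes" to the eventfd. */
    void wakeup(boolean inEventLoop) {
        if (!inEventLoop && wakenUp.compareAndSet(false, true)) {
            eventFdWrites.incrementAndGet();
        }
    }

    /**
     * Called by the loop before it would block. Clears the flag first and
     * checks the queue second; a task added in between causes a harmless
     * extra write. Returns true only if it is safe to block.
     */
    boolean prepareToSleep() {
        wakenUp.set(false);
        return tasks.isEmpty();
    }

    /** Drains the queue and returns how many tasks ran. */
    int runAllTasks() {
        int ran = 0;
        Runnable task;
        while ((task = tasks.poll()) != null) {
            task.run();
            ran++;
        }
        return ran;
    }

    int eventFdWriteCount() {
        return eventFdWrites.get();
    }

    public static void main(String[] args) {
        DebouncedWakeupSketch loop = new DebouncedWakeupSketch();
        for (int i = 0; i < 3; i++) {
            loop.execute(() -> { }, false); // three submissions from outside the loop
        }
        System.out.println("eventfd writes: " + loop.eventFdWriteCount());
        System.out.println("safe to block: " + loop.prepareToSleep());
    }
}
```

The ordering in `prepareToSleep()` is what makes the small window benign: a producer that enqueues after the flag is cleared always wins the compare-and-set and signals, so the worst case is one redundant write rather than a missed wakeup.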

@carl-mastrangelo carl-mastrangelo force-pushed the carl-mastrangelo:debounceread branch from 799c9a9 to 8f6affe Jun 3, 2019

@normanmaurer

Member

commented Jun 3, 2019

@netty-bot test this please

@normanmaurer normanmaurer added this to the 4.1.37.Final milestone Jun 4, 2019

@normanmaurer normanmaurer merged commit 9abeaf1 into netty:4.1 Jun 4, 2019

3 checks passed

pull request validation (centos6-java11) Build finished.
pull request validation (centos6-java12) Build finished.
pull request validation (centos6-java8) Build finished.
@normanmaurer

Member

commented Jun 4, 2019

normanmaurer added a commit that referenced this pull request Jun 4, 2019

Properly debounce wakeups (#9191)

@carl-mastrangelo carl-mastrangelo deleted the carl-mastrangelo:debounceread branch Jun 4, 2019

njhill added a commit to njhill/netty that referenced this pull request Jun 4, 2019

Minimize task queue access in epoll event loop
Motivation

This is a follow-on change to netty#9191 which simplified the event loop task
wakeup coordination. There are still redundant checks for task
existence, which aren't free since they each involve 4 separate volatile
reads, at least one of which is possibly contended.

Modifications

Add a variant of SingleThreadEventExecutor#runAllTasks(long) whose
return value indicates whether any tasks still remain, rather than
whether any were run. This is used to avoid immediately re-checking the
queue after it's just been processed. Add a local hasTasks variable in
the event loop run() method to keep track of whether or not outstanding
tasks are known to be in the queue.

Result

Less coordination overhead in epoll event loop
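
To illustrate the follow-on change described above, here is a minimal sketch of a time-budgeted task runner whose return value reports whether tasks remain, so the caller can carry that answer in a local flag instead of querying the queue again. The class `RemainingTasksSketch` and the method `runTasksReportRemaining` are hypothetical names chosen for this example; it uses a plain single-threaded `ArrayDeque` and does not reproduce Netty's `SingleThreadEventExecutor`.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical sketch; not Netty's SingleThreadEventExecutor. */
final class RemainingTasksSketch {
    // Assumption: single-threaded queue, enough to show the control flow
    private final Queue<Runnable> taskQueue = new ArrayDeque<>();

    void add(Runnable task) {
        taskQueue.add(task);
    }

    /**
     * Runs queued tasks until the queue is empty or the time budget is used up.
     * Returns true if tasks are known to remain, false if the queue was drained.
     */
    boolean runTasksReportRemaining(long timeoutNanos) {
        final long deadline = System.nanoTime() + timeoutNanos;
        for (;;) {
            Runnable task = taskQueue.poll();
            if (task == null) {
                return false; // drained: nothing left to report
            }
            task.run();
            // Overflow-safe comparison of nanoTime values
            if (System.nanoTime() - deadline >= 0) {
                return !taskQueue.isEmpty();
            }
        }
    }

    public static void main(String[] args) {
        RemainingTasksSketch executor = new RemainingTasksSketch();
        for (int i = 0; i < 3; i++) {
            executor.add(() -> { });
        }
        // A zero budget runs one task and then reports on the rest.
        boolean hasTasks = executor.runTasksReportRemaining(0L);
        System.out.println("tasks remain after zero budget: " + hasTasks);
        if (hasTasks) {
            // The loop would skip blocking here without re-checking the queue.
            hasTasks = executor.runTasksReportRemaining(1_000_000_000L);
        }
        System.out.println("tasks remain after generous budget: " + hasTasks);
    }
}
```

In an event loop, the returned value would feed a local `hasTasks` variable as the commit message describes, which decides whether the next iteration may block.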