Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid unnecessary event loop wakeups #9605

Merged
merged 1 commit into from Oct 12, 2019

Conversation

njhill
Copy link
Member

@njhill njhill commented Sep 26, 2019

Motivation

The recently-introduced event loop scheduling hooks can be exploited by
the epoll transport to avoid waking the event loop when scheduling
future tasks if there is a timer already set to wake up sooner.

There is also a "default" timeout which will wake the event
loop after 1 second if there are no pending future tasks. The
performance impact of these wakeups themselves is likely negligible but
there's significant overhead in having to re-arm the timer every time
the event loop goes to sleep (see #7816). It's not 100% clear why this
timeout was there originally but we're sure it's no longer needed.

Modification

Combine the existing volatile wakenUp and non-volatile prevDeadlineNanos
fields into a single AtomicLong that stores the next scheduled wakeup
time while the event loop is in epoll_wait, and is -1 while it is awake.

Use this as a guard to debounce wakeups from both immediate scheduled
tasks and future scheduled tasks, the latter using the new
before/afterScheduledTaskSubmitted overrides and based on whether the
new deadline occurs prior to an already-scheduled timer.

A similar optimization was already added to NioEventLoop, but it still
uses two separate volatiles. We should consider similar streamlining of
that in a future update.

Result

Fewer event loop wakeups when scheduling future tasks, greatly reduced
overhead when no future tasks are scheduled.

@njhill
Copy link
Member Author

njhill commented Sep 26, 2019

In hindsight would have been nice to have had this in the last release!

@netty-bot
Copy link

Can one of the admins verify this patch?

@normanmaurer
Copy link
Member

@netty-bot test this please

@normanmaurer normanmaurer added this to the 4.1.43.Final milestone Sep 26, 2019
@njhill
Copy link
Member Author

njhill commented Sep 26, 2019

@normanmaurer I noticed a mistake in the original commit, the task queue must be checked after the volatile deadline is updated to avoid possible race condition, now fixed. This was careless of me, apologies. I also verified that it's not a problem in the NioEventLoop case.

@normanmaurer
Copy link
Member

@netty-bot test this please

@njhill
Copy link
Member Author

njhill commented Sep 26, 2019

@normanmaurer sorry for the churn, I just pushed another commit to eliminate the infamous 1 second default delay when there's no future tasks scheduled. I looked through the history of this and am fairly certain it serves no purpose now. Let me know if you're not convinced and I can try to convince you :)

In fact the way it was implemented meant that if scheduled tasks were not being used, the timerfd was being re-armed every time the EL went to sleep, which is a significant overhead as was demonstrated last year in the context of @carl-mastrangelo's re-arm skipping optimization.

@njhill
Copy link
Member Author

njhill commented Sep 27, 2019

Benchmark results to illustrate the above:

Before:

Benchmark                                               Mode  Cnt      Score     Error  Units
EpollSocketChannelBenchmark.pingPong  (no sched tasks) thrpt   20  23927.536 ± 510.134  ops/s
EpollSocketChannelBenchmark.pingPong  (1 sched task)   thrpt   20  26706.498 ± 251.527  ops/s

After:

Benchmark                                               Mode  Cnt      Score     Error  Units
EpollSocketChannelBenchmark.pingPong  (no sched tasks) thrpt   20  26793.824 ± 337.671  ops/s
EpollSocketChannelBenchmark.pingPong  (1 sched task)   thrpt   20  26885.536 ± 177.165  ops/s

@normanmaurer
Copy link
Member

@njhill so here comes the "funny thing" ... With this PR applied I see the same problem in our internal test suite as with #9590

@njhill
Copy link
Member Author

njhill commented Sep 27, 2019

@normanmaurer interestingly while playing with some benchmarks of this I also saw some hangs at the end of the test i.e. during shutdown. Digging further I found it was due to a kind of indirectly related issue ... opened #9616 with fix for this. Hope it's the same root cause as what you're encountering... (though not holding my breath!)

@normanmaurer
Copy link
Member

@njhill was just about to write that it is related to shutdown as I the future returned by shutdownGracefully never completed in some cases (noticed while taking some jstacks). Let me apply your fix as well and see how it goes :)

@normanmaurer
Copy link
Member

@njhill yes this fixes the issue ... Good job 🎉

@njhill
Copy link
Member Author

njhill commented Sep 27, 2019

@normanmaurer I pushed another commit which indirectly addresses your comments :) Realized that it's straightforward to unify the deadline and wakenUp volatiles, and the end result is cleaner and more elegant imho.

We might want to reevaluate the net benefit of allowing timerfd updates from outside of the EL (i.e. #9590), since this PR provides a lot of the benefit with less synchronization complexity/cost. In any case this seems like it would be a nicer incremental change to make first, WDYT? (there are some orthogonal tweaks in the other PR that would still be nice to include regardless, e.g. processing of scheduled task queue directly rather than copying to the main taskQueue first)

@normanmaurer
Copy link
Member

@njhill can you rebase please ?

@njhill
Copy link
Member Author

njhill commented Oct 7, 2019

@normanmaurer done

@normanmaurer
Copy link
Member

@netty-bot test this please

1 similar comment
@normanmaurer
Copy link
Member

@netty-bot test this please

@normanmaurer
Copy link
Member

@njhill sorry but please rebase one last time :)

@njhill
Copy link
Member Author

njhill commented Oct 10, 2019

@normanmaurer done

@normanmaurer
Copy link
Member

@netty-bot test this please

Copy link
Member

@normanmaurer normanmaurer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me test this one today

@normanmaurer
Copy link
Member

@njhill want to squash and adjust commit message ?

@normanmaurer
Copy link
Member

@njhill all tests pass in our internal test suite... This is good to go. Good work!

Motivation

The recently-introduced event loop scheduling hooks can be exploited by
the epoll transport to avoid waking the event loop when scheduling
future tasks if there is a timer already set to wake up sooner.

There is also a "default" timeout which will wake the event
loop after 1 second if there are no pending future tasks. The
performance impact of these wakeups themselves is likely negligible but
there's significant overhead in having to re-arm the timer every time
the event loop goes to sleep (see netty#7816). It's not 100% clear why this
timeout was there originally but we're sure it's no longer needed.

Modification

Combine the existing volatile wakenUp and non-volatile prevDeadlineNanos
fields into a single AtomicLong that stores the next scheduled wakeup
time while the event loop is in epoll_wait, and is -1 while it is awake.

Use this as a guard to debounce wakeups from both immediate scheduled
tasks and future scheduled tasks, the latter using the new
before/afterScheduledTaskSubmitted overrides and based on whether the
new deadline occurs prior to an already-scheduled timer.

A similar optimization was already added to NioEventLoop, but it still
uses two separate volatiles. We should consider similar streamlining of
that in a future update.

Result

Fewer event loop wakeups when scheduling future tasks, greatly reduced
overhead when no future tasks are scheduled.
@njhill njhill changed the title Avoid unnecessary epoll event loop wakeups for scheduled tasks Avoid unnecessary event loop wakeups Oct 12, 2019
@njhill
Copy link
Member Author

njhill commented Oct 12, 2019

@normanmaurer I squashed and reworded the commit. I will update the PR wording to match when next in front of my comp.

@normanmaurer
Copy link
Member

@netty-bot test this please

@normanmaurer normanmaurer merged commit 166caf9 into netty:4.1 Oct 12, 2019
@normanmaurer
Copy link
Member

@njhill thanks a lot! Can you please take a look how to port this to master ?

@njhill njhill deleted the epoll-avoid-sched-wake branch October 13, 2019 02:13
@njhill njhill mentioned this pull request Nov 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants