New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EPOLL - decouple schedule tasks from epoll_wait life cycle #7834
Conversation
more testing / analysis is required, but figured I would push this upstream to see if everyone is on board. @carl-mastrangelo - would you mind running the same benchmarks you ran for PR #7816? |
/** | ||
* Represents a task that is scheduled to expire some time in the future. | ||
*/ | ||
public interface ScheduledTask { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we just use Delayed
?
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Delayed.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to get the deadline
, not the delay
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Scottmitch that said should we just make this part of the Abstract base class and mark as unstable ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What abstract base class? The issue is we need it to be accessible outside the package (PromiseTask
and ScheduledFutureTask
are package private). I considered putting a protected interface in AbstractScheduledEventExecutor
and can do this to reduce the exposure of this API. I will push this change and we can discuss more if necessary.
On a related note ... the relationship between ScheduledFutureTask
and AbstractScheduledEventExecutor
is very tightly coupled. I would like to tease these two apart in general as there a assumptions needed about using the "correct" nanoTime
that may be made less "leaky" for the next major release (I added something to the doc).
transport/src/main/java/io/netty/channel/SingleThreadEventLoop.java
Outdated
Show resolved
Hide resolved
@@ -102,23 +102,23 @@ public static FileDescriptor newTimerFd() { | |||
public static native void eventFdWrite(int fd, long value); | |||
public static native void eventFdRead(int fd); | |||
static native void timerFdRead(int fd); | |||
static native void timerFdSetTime(int fd, int sec, int nsec); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have a throws
as it may throw a ClosedChannelException
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch (its actually ChannelException
at the moment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However this is a runtime exception ... would you prefer a IOException
using the errno
like in epollWait
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I think this would be better (using NativeIOException)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just went with IOException
for now ... avoids additional conditional checking and the string description of the error is already included in the IoExecution
thrown by the JNI code.
* with the time source (e.g. calling System.nanoTime()) which can be expensive. | ||
*/ | ||
private final AtomicLong nextDeadlineNanos = new AtomicLong(MAXIMUM_DEADLINE); | ||
private final AtomicInteger wakenUp = new AtomicInteger(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just use an AtomicBoolean
as we use it as a boolean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered this, however it involves additional conditional behind the scenes. Since this is internal I think we can just use AtomicInteger
.
I also moved away from the AtomicFieldUpdater because:
- the excepted number of instances for this class is low
- the atomic field updaters have an additional "instance check" which can be avoided.
@Scottmitch Thanks for this... in general this looks ok. That said I would be interested to see how it compares to #7816 and if it worth it compared to the "easier" solution that was used in #7816 |
Agreed more analysis is necessary. This PR does a lot of things PR #7816 doesn't do, so lets see if I can quantify it some how :) |
} | ||
|
||
static long deadlineToDelayNanos(long deadlineNanos) { | ||
return Math.max(0, deadlineNanos - nanoTime()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think you need to allow negative values here. If you were sorting task based on their delay rather than their deadline, you still need to keep track of which task is most tardy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this could likely be changed, but is pre-existing behavior and doesn't impact this PR. Note that we are sorting, and determining "next to expire", based upon deadline (not delay).
Lets discuss in a followup issue: #7840
common/src/main/java/io/netty/util/concurrent/SingleThreadEventExecutor.java
Outdated
Show resolved
Hide resolved
common/src/main/java/io/netty/util/concurrent/SingleThreadEventExecutor.java
Outdated
Show resolved
Hide resolved
transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, no other concerns from me.
@carl-mastrangelo - thanks for review! would you be able to run the same benchmark you did in PR #7816? I'm also planning on doing some analysis, but the more data points the better. |
659ab6c
to
b363996
Compare
Sorry I didn't have a chance to review your PR yet, @Scottmitch. Will find some time soon. |
How can we get this artifact jar or artificatId for a maven dependency? |
@M-AJ I can build one for you if you are interested otherwise you would just apply the patch and build it yourself. |
@normanmaurer If you can build one, that'd be great! Really appreciate it! I've tried building it locally but the process takes a while for me. |
b363996
to
5e74a0d
Compare
I just rebased to make getting recent fixes easier ... sorry super busy at the moment ... this is still on the backlog and hopefully will get to this soon. However it would be much appreciated if others could also try this out though 👍 |
@M-AJ you want a netty-all jar or just the epoll jar ? |
@normanmaurer netty-all would be perfect. Appreciate it! |
5e74a0d
to
9c380c4
Compare
Hey @normanmaurer, any update on the build? Can't wait to try it out and see how it performs! |
Sorry it took me some time :( Try this one: |
@normanmaurer - I updated the commit message. let me know if it makes more sense now. |
// is being shutdown, so the timerFd and associated polling mechanism will be destroyed anyways. | ||
} | ||
|
||
protected final ScheduledTask peekScheduledTaskDelayNanos() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing Javadoc?
* Get the time stamp relative to some monotonically increasing time source for which this {@link ScheduledTask} | ||
* expires. | ||
* @return the time stamp relative to some monotonically increasing time source for which this | ||
* {@link ScheduledTask} expires. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: indentation
protected interface ScheduledTaskRunnable extends Runnable, ScheduledTask { | ||
/** | ||
* Determine if this object is the result of adding or removing a scheduled task. | ||
* @return {@code true} if this object is the result of adding or removing a scheduled task. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this method return false
on removal?
* occur on the event loop thread do not interact with this method. | ||
* @param runnable The {@link ScheduledTaskRunnable} to execute. | ||
*/ | ||
protected void executeScheduledRunnable(ScheduledTaskRunnable runnable) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious if the parameter type of this method can be widen to ScheduledTask
or even Runnable
. If so, we could hide ScheduledTaskRunnable
(and even ScheduledTask
) completely from the public API, i.e. less maintenance burden.
If a subclass needs the information like deadline, we could just add it as an additional parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way this suggestion was implemented makes things more complex IMHO. We are trying to communicate from here that the type of task is for scheduled execution, and this information is used conditionally at higher layers (e.g. should we wake up). This has been changed to introduce 3 methods (e.g. beforeScheduledTaskSubmitted
, afterScheduledTaskSubmitted
, wakesUpForScheduledRunnable
) and 1 additional type is exposed NonWakeupRunnable
publicly which may require conditional allocation/wrapping at higher layers. IOW we are losing information by just allocating a Runnable
here and have to compensate by complicating method signatures and type hierarchy at other layers.
I'll submit a followup PR (since this plus an additional PR was recently reverted) so we can take a look. /cc @normanmaurer @njhill
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Scottmitch. I gave this quite a lot of thought so would be good to discuss more. A few initial responses though:
- The prior communication depended on the notion of an intermediate task which happens to involve enqueing some other scheduled task onto the internal scheduled-task queue (i.e. not the scheduled task itself), which is something we should not be exposing at all imho... that seems like an internal detail which EL impls shouldn't be aware of.
- I don't see
wakesUpForScheduledRunnable
that you referenced, do you meanwakesUpForRunnable
orexecuteScheduledRunnable
? - The only reason I left
wakesUpForRunnable
there is that it's protected and so would be a breaking change to remove it, I'd suggest that we deprecate it. A publicNonWakeupRunnable
interface serves a similar purpose but makes much more sense imho. Whether a given task is required to be run immediately is in general a property of that task and should be independent of the executor/EL implementation. We could consider alazyExecute(Runnable)
method in addition to this (might be be nice to have on theEventExecutor
interface in netty 5). - I actually think we should remove
executeScheduledRunnable
altogether (possible since it's package-private), and instead just overrideschedule(...)
inSingleThreadEventExecutor
. This would further simplify things and avoid the doubleRunnable
wrapping. Not sure it's also what you had in mind but it's a small change, I'll also push an example of that - Only the changes to
EpollEventLoop
itself were reverted for the release, the reworked superclass abstractions are still there. I'm still very interested to find out what the issues were ... @normanmaurer any luck with a reproducer or more clues? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Scottmitch I made the changes alluded to above here 3923176 but took things a bit further still, uplifting the before/after hooks to AbstractScheduledEventExecutor
and introducing higher level idea of lazyExecute
/ LazyRunnable
which could have general applicability.
I really think this is both simpler to exploit and more generally useful for subclasses (i.e. event loop impls).
/** | ||
* The next task to expire (e.g. minimum delay) has been removed from the scheduled priority queue. | ||
*/ | ||
protected void minimumDelayScheduledTaskRemoved(@SuppressWarnings("unused") ScheduledTask task) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto - can ScheduledTask
be widen to Runnable
?
@netty-bot test this please |
d23b852
to
dc2cddf
Compare
Motivation: EPOLL supports decoupling the timed wakeup mechanism from the selector call. The EPOLL transport takes advantage of this in order to offer more fine grained timer resolution. However we are current calling timerfd_settime on each call to epoll_wait and this is expensive. We don't have to re-arm the timer on every call to epoll_wait and instead only have to arm the timer when a task is scheduled with an earlier expiration than any other existing scheduled task. Modifications: - Before scheduled tasks are added to the task queue, we determine if the new duration is the soonest to expire, and if so update with timerfd_settime. We also drain all the tasks at the end of the event loop to make sure we service any expired tasks and get an accurate next time delay. - EpollEventLoop maintains a volatile variable which represents the next deadline to expire. This variable is modified inside the event loop thread (before calling epoll_wait) and out side the event loop thread (immediately to ensure proper wakeup time). - Execute the task queue before the schedule task priority queue. This means we may delay the processing of scheduled tasks but it ensures we transfer all pending tasks from the task queue to the scheduled priority queue to run the soonest to expire scheduled task first. - Deprecate IORatio on EpollEventLoop, and drain the executor and scheduled queue on each event loop wakeup. Coupling the amount of time we are allowed to drain the executor queue to a proportion of time we process inbound IO may lead to unbounded queue sizes and unpredictable latency. Result: Fixes netty#7829 - In most cases this results in less calls to timerfd_settime - Less event loop wakeups just to check for scheduled tasks executed outside the event loop - More predictable executor queue and scheduled task queue draining - More accurate and responsive scheduled task execution
dc4ceba
to
834763e
Compare
This is sitting here for a long time now... Let me merge this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Scottmitch @normanmaurer apologies for the belated review of this. I think the general change is great, as well as much of the impl.
I made a few comments inline but in general have some reservations regarding the new abstractions exposed by the superclasses to allow EventLoop impls to integrate with the common timed-task scheduling logic.
I'll put more detail as a comment directly in the PR but my immediate concern is that once these APIs ship we can't easily change them, and I think they could be simpler.
// Just ignore as we use ET mode for the eventfd and timerfd. | ||
// | ||
// See also https://stackoverflow.com/a/12492308/1074097 | ||
} else if (fd == timerFd.intValue()) { | ||
// consume wakeup event, necessary because the timer is added with ET mode. | ||
Native.timerFdRead(fd); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be my misunderstanding, but wouldn't ET imply this read isn't needed?
* @return {@code true} if at least {@link Runnable#run()} was called. | ||
*/ | ||
private boolean runExistingTasksFrom(Queue<Runnable> taskQueue) { | ||
return taskQueue.offer(BOOKEND_TASK) ? runExistingTasksUntilBookend(taskQueue) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking the same as @carl-mastrangelo's earlier comment here - that the bookend could be easily avoided by just polling size()
times. It would be cheaper and simpler, e.g. there's no need to deal specially with the bounded/full case. The JCTools impls won't involve any iteration when called from the consumer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea! calling from consumer thread for MPSC shouldn't require looping.
@@ -186,6 +186,31 @@ static jint netty_epoll_native_epollCreate(JNIEnv* env, jclass clazz) { | |||
return efd; | |||
} | |||
|
|||
static void netty_epoll_native_timerFdSetTime(JNIEnv* env, jclass clazz, jint timerFd, jint tvSec, jint tvNsec) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you ever consider changing to use TFD_TIMER_ABSTIME
? We could just measure/store the offset of System.nanoTime()
from the system clock during initialization and then use deadlines rather than delays across the board to further avoid continual absolute/relative conversion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I didn't consider this.
// lower value. | ||
nextDeadline = nextDeadlineNanos.get(); | ||
if (nextDeadline - candidateNextDeadline < 0) { | ||
setTimerFd(deadlineToDelayNanos(nextDeadline)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that this fully addresses the race condition. I think access to setTimerFd
probably needs to be synchronized.
@Override | ||
public void run() { | ||
try { | ||
assertEquals(1, Native.epollWait(epoll, eventArray, timerFd, -1, -1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of removing this test why not just adjust it to use the new equivalent Native.epollWaitNoTimeout
method?
Want to do a PR to show the ideas ?
… Am 15.08.2019 um 22:48 schrieb Nick Hill ***@***.***>:
@njhill commented on this pull request.
@Scottmitch @normanmaurer apologies for the belated review of this. I think the general change is great, as well as much of the impl.
I made a few comments inline but in general have some reservations regarding the new abstractions exposed by the superclasses to allow EventLoop impls to integrate with the common timed-task scheduling logic.
I'll put more detail as a comment directly in the PR but my immediate concern is that once these APIs ship we can't easily change them, and I think they could be simpler.
In transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java:
> // Just ignore as we use ET mode for the eventfd and timerfd.
//
// See also https://stackoverflow.com/a/12492308/1074097
+ } else if (fd == timerFd.intValue()) {
+ // consume wakeup event, necessary because the timer is added with ET mode.
+ Native.timerFdRead(fd);
May be my misunderstanding, but wouldn't ET imply this read isn't needed?
In common/src/main/java/io/netty/util/concurrent/SingleThreadEventExecutor.java:
> @@ -397,6 +455,55 @@ protected final boolean runAllTasksFrom(Queue<Runnable> taskQueue) {
}
}
+ /**
+ * What ever tasks are present in ***@***.*** taskQueue} when this method is invoked will be ***@***.*** Runnable#run()}.
+ * @param taskQueue the task queue to drain.
+ * @return ***@***.*** true} if at least ***@***.*** Runnable#run()} was called.
+ */
+ private boolean runExistingTasksFrom(Queue<Runnable> taskQueue) {
+ return taskQueue.offer(BOOKEND_TASK) ? runExistingTasksUntilBookend(taskQueue)
I was thinking the same as @carl-mastrangelo's earlier comment here - that the bookend could be easily avoided by just polling size() times. It would be cheaper and simpler, e.g. there's no need to deal specially with the bounded/full case. The JCTools impls won't involve any iteration when called from the consumer.
In transport-native-epoll/src/main/c/netty_epoll_native.c:
> @@ -186,6 +186,31 @@ static jint netty_epoll_native_epollCreate(JNIEnv* env, jclass clazz) {
return efd;
}
+static void netty_epoll_native_timerFdSetTime(JNIEnv* env, jclass clazz, jint timerFd, jint tvSec, jint tvNsec) {
Did you ever consider changing to use TFD_TIMER_ABSTIME? We could just measure/store the offset of System.nanoTime() from the system clock during initialization and then use deadlines rather than delays across the board to further avoid continual absolute/relative conversion.
In transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java:
> + }
+
+ private void trySetTimerFd(long candidateNextDeadline) throws IOException {
+ for (;;) {
+ long nextDeadline = nextDeadlineNanos.get();
+ if (nextDeadline - candidateNextDeadline <= 0) {
+ break;
+ }
+ if (nextDeadlineNanos.compareAndSet(nextDeadline, candidateNextDeadline)) {
+ setTimerFd(deadlineToDelayNanos(candidateNextDeadline));
+ // We are setting the timerFd outside of the EventLoop so it is possible that we raced with another call
+ // to set the timer and temporarily increased the value, in which case we should set it back to the
+ // lower value.
+ nextDeadline = nextDeadlineNanos.get();
+ if (nextDeadline - candidateNextDeadline < 0) {
+ setTimerFd(deadlineToDelayNanos(nextDeadline));
I'm not sure that this fully addresses the race condition. I think access to setTimerFd probably needs to be synchronized.
In transport-native-epoll/src/test/java/io/netty/channel/epoll/EpollTest.java:
> - @test(timeout = 5000)
- public void testEpollWaitWithTimeOutMinusOne() throws Exception {
- final EpollEventArray eventArray = new EpollEventArray(8);
- try {
- final FileDescriptor epoll = Native.newEpollCreate();
- final FileDescriptor timerFd = Native.newTimerFd();
- final FileDescriptor eventfd = Native.newEventFd();
- Native.epollCtlAdd(epoll.intValue(), timerFd.intValue(), Native.EPOLLIN);
- Native.epollCtlAdd(epoll.intValue(), eventfd.intValue(), Native.EPOLLIN);
-
- final AtomicReference<Throwable> ref = new AtomicReference<Throwable>();
- Thread t = new Thread(new Runnable() {
- @OverRide
- public void run() {
- try {
- assertEquals(1, Native.epollWait(epoll, eventArray, timerFd, -1, -1));
Instead of removing this test why not just adjust it to use the new equivalent Native.epollWaitNoTimeout method?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
@normanmaurer sure, stand by :) |
@normanmaurer see #9470 |
if (scheduledTask.deadlineNanos() <= nanoTime) { | ||
scheduledTaskQueue.remove(); | ||
ScheduledFutureTask<?> scheduledTask = scheduledTaskQueue.peek(); | ||
if (scheduledTask != null && scheduledTask.deadlineNanos() <= nanoTime) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace direct comparison using nanoTime with differences ie scheduledTask.deadlineNanos() - nanoTime <= 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Netty's ScheduledFutureTask offsets all time stamps and they are generally compared using equality for deadlines while giving up half of the domain (e.g. overflow earlier). I agree it is confusing relative to typical use of System.nanoTime()
and we may want to revisit how time duration/expiration is done in Netty 5 in this area.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"earlier" being after only 300 years, rather than 600 :)
Motivation The epoll transport was updated in netty#7834 to decouple setting of the timerFd from the event loop, so that scheduling delayed tasks does not require waking up epoll_wait. To achieve this, new overridable hooks were added in the AbstractScheduledEventExecutor and SingleThreadEventExecutor superclasses. However, the minimumDelayScheduledTaskRemoved hook has no current purpose and I can't envisage a _practical_ need for it. Removing it would reduce complexity and avoid supporting this specific API indefinitely. We can add something similar later if needed but the opposite is not true. There also isn't a _nice_ way to use the abstractions for wakeup-avoidance optimizations in other EventLoops that don't have a decoupled timer. This PR replaces executeScheduledRunnable and wakesUpForScheduledRunnable with two new methods before/afterFutureTaskScheduled that have slightly different semantics: - They only apply to additions; given the current internals there's no practical use for removals - They allow per-submission wakeup decisions via a boolean return val, which makes them easier to exploit from other existing EL impls (e.g. NIO/KQueue) - They are subjectively "cleaner", taking just the deadline parameter and not exposing Runnables - For current EL/queue impls, only the "after" hook is really needed, but specialized blocking queue impls can conditionally wake on task submission (I have one lined up) Also included are further optimization/simplification/fixes to the timerFd manipulation logic. Modifications - Remove AbstractScheduledEventExecutor#minimumDelayScheduledTaskRemoved() and supporting methods - Uplift NonWakeupRunnable and corresponding default wakesUpForTask() impl from SingleThreadEventLoop to SingleThreadEventExecutor - Change executeScheduledRunnable() to be package-private, and have a final impl in SingleThreadEventExecutor which triggers new overridable hooks before/afterFutureTaskScheduled() - Remove unnecessary use of bookend tasks while draining the task queue - Use new hooks to add simpler wake-up avoidance optimization to NioEventLoop (primarily to demonstrate utility/simplicity) - Reinstate removed EpollTest class In EpollEventLoop: - Refactor to use only the new afterFutureTaskScheduled() hook for updating timerFd - Fix setTimerFd race condition using a monitor - Set nextDeadlineNanos to a negative value while the EL is awake and use this to block timer changes from outside the EL. Restore the known-set value prior to sleeping, updating timerFd first if necessary - Don't read from timerFd when processing expiry event Result - Cleaner API for integrating with different EL/queue timing impls - Fixed race condition to avoid missing scheduled wakeups - Eliminate unnecessary timerFd updates while EL is awake, and unnecessary expired timerFd reads - Avoid unnecessary scheduled-task wakeups when using NIO transport I did not yet further explore the suggestion of using TFD_TIMER_ABSTIME for the timerFd.
Motivation The epoll transport was updated in #7834 to decouple setting of the timerFd from the event loop, so that scheduling delayed tasks does not require waking up epoll_wait. To achieve this, new overridable hooks were added in the AbstractScheduledEventExecutor and SingleThreadEventExecutor superclasses. However, the minimumDelayScheduledTaskRemoved hook has no current purpose and I can't envisage a _practical_ need for it. Removing it would reduce complexity and avoid supporting this specific API indefinitely. We can add something similar later if needed but the opposite is not true. There also isn't a _nice_ way to use the abstractions for wakeup-avoidance optimizations in other EventLoops that don't have a decoupled timer. This PR replaces executeScheduledRunnable and wakesUpForScheduledRunnable with two new methods before/afterFutureTaskScheduled that have slightly different semantics: - They only apply to additions; given the current internals there's no practical use for removals - They allow per-submission wakeup decisions via a boolean return val, which makes them easier to exploit from other existing EL impls (e.g. NIO/KQueue) - They are subjectively "cleaner", taking just the deadline parameter and not exposing Runnables - For current EL/queue impls, only the "after" hook is really needed, but specialized blocking queue impls can conditionally wake on task submission (I have one lined up) Also included are further optimization/simplification/fixes to the timerFd manipulation logic. Modifications - Remove AbstractScheduledEventExecutor#minimumDelayScheduledTaskRemoved() and supporting methods - Uplift NonWakeupRunnable and corresponding default wakesUpForTask() impl from SingleThreadEventLoop to SingleThreadEventExecutor - Change executeScheduledRunnable() to be package-private, and have a final impl in SingleThreadEventExecutor which triggers new overridable hooks before/afterFutureTaskScheduled() - Remove unnecessary use of bookend tasks while draining the task queue - Use new hooks to add simpler wake-up avoidance optimization to NioEventLoop (primarily to demonstrate utility/simplicity) - Reinstate removed EpollTest class In EpollEventLoop: - Refactor to use only the new afterFutureTaskScheduled() hook for updating timerFd - Fix setTimerFd race condition using a monitor - Set nextDeadlineNanos to a negative value while the EL is awake and use this to block timer changes from outside the EL. Restore the known-set value prior to sleeping, updating timerFd first if necessary - Don't read from timerFd when processing expiry event Result - Cleaner API for integrating with different EL/queue timing impls - Fixed race condition to avoid missing scheduled wakeups - Eliminate unnecessary timerFd updates while EL is awake, and unnecessary expired timerFd reads - Avoid unnecessary scheduled-task wakeups when using NIO transport I did not yet further explore the suggestion of using TFD_TIMER_ABSTIME for the timerFd.
Motivation:
EPOLL supports decoupling the timed wakeup mechanism from the selector call. The EPOLL transport takes advantage of this in order to offer more fine grained timer resolution. However we are current calling timerfd_settime on each call to epoll_wait and this is expensive. We don't have to re-arm the timer on every call to epoll_wait and instead only have to arm the timer when a task is scheduled with an earlier expiration than any other existing scheduled task.
Modifications:
duration is the soonest to expire, and if so update with timerfd_settime. We
also drain all the tasks at the end of the event loop to make sure we service
any expired tasks and get an accurate next time delay.
may delay the processing of scheduled tasks but it ensures we transfer all
pending tasks from the task queue to the scheduled priority queue to run the
soonest to expire scheduled task first.
Result:
Fixes #7829