EPOLL - decouple schedule tasks from epoll_wait life cycle #7834

Scottmitch · 2018-04-03T03:02:40Z

Motivation:
EPOLL supports decoupling the timed wakeup mechanism from the selector call. The EPOLL transport takes advantage of this in order to offer more fine-grained timer resolution. However, we are currently calling timerfd_settime on each call to epoll_wait, and this is expensive. We don't have to re-arm the timer on every call to epoll_wait; we only have to arm it when a task is scheduled with an earlier expiration than any other existing scheduled task.

Modifications:

- Before scheduled tasks are added to the task queue, we determine if the new duration is the soonest to expire, and if so update with timerfd_settime. We also drain all the tasks at the end of the event loop to make sure we service any expired tasks and get an accurate next time delay.
- EpollEventLoop maintains a volatile variable which represents the next deadline to expire. This variable is modified inside the event loop thread (before calling epoll_wait) and outside the event loop thread (immediately, to ensure proper wakeup time).
- Execute the task queue before the scheduled task priority queue. This means we may delay the processing of scheduled tasks, but it ensures we transfer all pending tasks from the task queue to the scheduled priority queue so that the soonest-to-expire scheduled task runs first.
- Deprecate IORatio on EpollEventLoop, and drain the executor and scheduled queue on each event loop wakeup. Coupling the amount of time we are allowed to drain the executor queue to a proportion of the time we spend processing inbound IO may lead to unbounded queue sizes and unpredictable latency.
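The first two modifications above can be sketched roughly as follows. This is a minimal illustration, not the EpollEventLoop code: `EarliestDeadlineTracker`, `offerDeadline` and `armTimer` are hypothetical names, and `armTimer` only counts calls where the real transport would call timerfd_settime. Deadlines are assumed to be non-negative so a plain comparison is enough.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: re-arm a timer only when a strictly earlier deadline is scheduled. */
final class EarliestDeadlineTracker {
    // Earliest known deadline; Long.MAX_VALUE means "no scheduled task".
    private final AtomicLong nextDeadlineNanos = new AtomicLong(Long.MAX_VALUE);
    // Counts stand-in timer arms so the saving is observable.
    private final AtomicInteger armCalls = new AtomicInteger();

    /** Returns true if the candidate became the earliest deadline and the timer was re-armed. */
    boolean offerDeadline(long candidateDeadlineNanos) {
        for (;;) {
            long current = nextDeadlineNanos.get();
            if (current <= candidateDeadlineNanos) {
                return false; // an earlier (or equal) deadline is already armed
            }
            if (nextDeadlineNanos.compareAndSet(current, candidateDeadlineNanos)) {
                armTimer(candidateDeadlineNanos);
                return true;
            }
        }
    }

    // Assumption: stands in for the native timerfd_settime call.
    private void armTimer(long deadlineNanos) {
        armCalls.incrementAndGet();
    }

    int armCalls() {
        return armCalls.get();
    }

    long nextDeadlineNanos() {
        return nextDeadlineNanos.get();
    }

    public static void main(String[] args) {
        EarliestDeadlineTracker tracker = new EarliestDeadlineTracker();
        tracker.offerDeadline(100L);
        tracker.offerDeadline(200L); // later than 100, so no re-arm
        tracker.offerDeadline(50L);
        System.out.println("arm calls: " + tracker.armCalls());
    }
}
```

The sketch does not order concurrent `armTimer` calls relative to each other, so two threads can still arm the timer in the "wrong" order.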

Result:
Fixes #7829

- In most cases this results in fewer calls to timerfd_settime
- Fewer event loop wakeups just to check for scheduled tasks executed outside the event loop
- More predictable executor queue and scheduled task queue draining
- More accurate and responsive scheduled task execution

Scottmitch · 2018-04-03T03:06:05Z

more testing / analysis is required, but figured I would push this upstream to see if everyone is on board.

@carl-mastrangelo - would you mind running the same benchmarks you ran for PR #7816?

normanmaurer · 2018-04-03T06:25:01Z

common/src/main/java/io/netty/util/concurrent/ScheduledTask.java

+/**
+ * Represents a task that is scheduled to expire some time in the future.
+ */
+public interface ScheduledTask {


Can't we just use Delayed ?

https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Delayed.html

We want to get the deadline, not the delay.
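To illustrate the difference being discussed, here is a small sketch, assuming a hypothetical `Task` class that exposes both views. `Delayed#getDelay` has to consult the time source on every call, while a deadline is a fixed value that can be compared without reading the clock.

```java
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch contrasting a fixed deadline with java.util.concurrent.Delayed. */
final class DeadlineVsDelay {
    static final class Task implements Delayed {
        private final long deadlineNanos; // fixed point on the System.nanoTime() timeline

        Task(long deadlineNanos) {
            this.deadlineNanos = deadlineNanos;
        }

        /** Constant for the life of the task; no clock read needed. */
        long deadlineNanos() {
            return deadlineNanos;
        }

        /** Shrinks as time passes; every call reads System.nanoTime(). */
        @Override
        public long getDelay(TimeUnit unit) {
            return unit.convert(deadlineNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            if (other instanceof Task) {
                // Compare deadlines by difference; no clock read required.
                return Long.signum(deadlineNanos - ((Task) other).deadlineNanos);
            }
            return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
        }
    }

    public static void main(String[] args) {
        Task task = new Task(System.nanoTime() + TimeUnit.SECONDS.toNanos(1));
        System.out.println("deadline: " + task.deadlineNanos());
        System.out.println("delay now: " + task.getDelay(TimeUnit.NANOSECONDS));
    }
}
```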

@Scottmitch that said should we just make this part of the Abstract base class and mark as unstable ?

What abstract base class? The issue is we need it to be accessible outside the package (PromiseTask and ScheduledFutureTask are package private). I considered putting a protected interface in AbstractScheduledEventExecutor and can do this to reduce the exposure of this API. I will push this change and we can discuss more if necessary.

On a related note ... the relationship between ScheduledFutureTask and AbstractScheduledEventExecutor is very tightly coupled. I would like to tease these two apart in general, as there are assumptions needed about using the "correct" nanoTime that may be made less "leaky" for the next major release (I added something to the doc).

transport/src/main/java/io/netty/channel/SingleThreadEventLoop.java

normanmaurer · 2018-04-03T06:32:22Z

transport-native-epoll/src/main/java/io/netty/channel/epoll/Native.java

@@ -102,23 +102,23 @@ public static FileDescriptor newTimerFd() {
    public static native void eventFdWrite(int fd, long value);
    public static native void eventFdRead(int fd);
    static native void timerFdRead(int fd);
+    static native void timerFdSetTime(int fd, int sec, int nsec);


This should have a throws as it may throw a ClosedChannelException

Good catch (it's actually ChannelException at the moment).

However, this is a runtime exception ... would you prefer an IOException using the errno like in epollWait?

Yeah, I think this would be better (using NativeIOException).

I just went with IOException for now ... it avoids additional conditional checking, and the string description of the error is already included in the IOException thrown by the JNI code.
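For context, the timerFdSetTime signature in the hunk above takes the delay as separate `sec` and `nsec` ints. A minimal sketch of how a nanosecond delay might be split into that pair is shown below; `TimerFdDelay` and its methods are hypothetical helpers, and the clamping choices are assumptions rather than what the PR does.

```java
/** Hypothetical helper: split a nanosecond delay into the (sec, nsec) pair a timerfd expects. */
final class TimerFdDelay {
    private static final long NANOS_PER_SECOND = 1_000_000_000L;

    /** Whole seconds of the delay, clamped to [0, Integer.MAX_VALUE]. */
    static int seconds(long delayNanos) {
        long nonNegative = Math.max(0L, delayNanos);
        return (int) Math.min(nonNegative / NANOS_PER_SECOND, Integer.MAX_VALUE);
    }

    /** Sub-second remainder in nanoseconds, always in [0, 999_999_999]. */
    static int nanosRemainder(long delayNanos) {
        long nonNegative = Math.max(0L, delayNanos);
        return (int) (nonNegative % NANOS_PER_SECOND);
    }

    public static void main(String[] args) {
        long delay = 1_500_000_000L; // 1.5 seconds
        System.out.println(seconds(delay) + "s " + nanosRemainder(delay) + "ns");
    }
}
```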

normanmaurer · 2018-04-03T06:41:25Z

transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java

+     * with the time source (e.g. calling System.nanoTime()) which can be expensive.
+     */
+    private final AtomicLong nextDeadlineNanos = new AtomicLong(MAXIMUM_DEADLINE);
+    private final AtomicInteger wakenUp = new AtomicInteger();


Just use an AtomicBoolean as we use it as a boolean.

I considered this; however, it involves an additional conditional behind the scenes. Since this is internal, I think we can just use AtomicInteger.

I also moved away from the AtomicFieldUpdater because:

- the expected number of instances for this class is low
- the atomic field updaters have an additional "instance check" which can be avoided.
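A small sketch of the pattern under discussion, assuming a hypothetical `WakeupFlag` class: an AtomicInteger is used as a 0/1 flag so that only the caller that flips it from 0 to 1 needs to perform the (comparatively expensive) wakeup.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: AtomicInteger used as a boolean wakeup flag. */
final class WakeupFlag {
    // 0 = no wakeup pending, 1 = wakeup already requested.
    private final AtomicInteger wakenUp = new AtomicInteger();

    /** Returns true only for the single caller that transitions the flag from 0 to 1. */
    boolean requestWakeup() {
        return wakenUp.compareAndSet(0, 1);
    }

    /** Called by the loop thread before it goes back to sleep. */
    void reset() {
        wakenUp.set(0);
    }

    public static void main(String[] args) {
        WakeupFlag flag = new WakeupFlag();
        System.out.println(flag.requestWakeup()); // first request wins
        System.out.println(flag.requestWakeup()); // already pending
    }
}
```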

normanmaurer · 2018-04-03T06:44:51Z

@Scottmitch Thanks for this... in general this looks ok. That said, I would be interested to see how it compares to #7816 and whether it is worth it compared to the "easier" solution that was used in #7816.

Scottmitch · 2018-04-03T16:31:26Z

Agreed, more analysis is necessary. This PR does a lot of things PR #7816 doesn't do, so let's see if I can quantify it somehow :)

carl-mastrangelo · 2018-04-03T22:48:47Z

common/src/main/java/io/netty/util/concurrent/ScheduledFutureTask.java

+    }
+
+    static long deadlineToDelayNanos(long deadlineNanos) {
+        return Math.max(0, deadlineNanos - nanoTime());


nit: I think you need to allow negative values here. If you were sorting task based on their delay rather than their deadline, you still need to keep track of which task is most tardy.

I agree this could likely be changed, but is pre-existing behavior and doesn't impact this PR. Note that we are sorting, and determining "next to expire", based upon deadline (not delay).

Let's discuss in a followup issue: #7840
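The point about negative values can be made concrete with a short sketch. `DelayMath` and its method names are hypothetical, and the current time is passed in explicitly so the arithmetic is easy to follow: once clamped to zero, two overdue tasks become indistinguishable by delay, whereas the signed difference still shows which one is most tardy.

```java
/** Hypothetical sketch: clamped versus signed delay for overdue tasks. */
final class DelayMath {
    /** Mirrors the clamped form in the hunk above, with "now" passed in explicitly. */
    static long clampedDelayNanos(long deadlineNanos, long nowNanos) {
        return Math.max(0, deadlineNanos - nowNanos);
    }

    /** Signed variant: negative values say how late the task already is. */
    static long signedDelayNanos(long deadlineNanos, long nowNanos) {
        return deadlineNanos - nowNanos;
    }

    public static void main(String[] args) {
        long now = 1_000L;
        System.out.println(clampedDelayNanos(900L, now) + " vs " + clampedDelayNanos(400L, now));
        System.out.println(signedDelayNanos(900L, now) + " vs " + signedDelayNanos(400L, now));
    }
}
```

Sorting by deadline, as this PR does, sidesteps the issue because deadlines are never clamped.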

common/src/main/java/io/netty/util/concurrent/SingleThreadEventExecutor.java

transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java

carl-mastrangelo

LGTM, no other concerns from me.

Scottmitch · 2018-04-05T16:31:26Z

@carl-mastrangelo - thanks for review! would you be able to run the same benchmark you did in PR #7816? I'm also planning on doing some analysis, but the more data points the better.

trustin · 2018-04-19T06:43:20Z

Sorry I didn't have a chance to review your PR yet, @Scottmitch. Will find some time soon.

M-AJ · 2018-04-19T23:50:10Z

How can we get this artifact jar or artifactId for a Maven dependency?

normanmaurer · 2018-04-20T05:48:51Z

@M-AJ I can build one for you if you are interested otherwise you would just apply the patch and build it yourself.

M-AJ · 2018-04-20T15:16:51Z

@normanmaurer If you can build one, that'd be great! Really appreciate it! I've tried building it locally but the process takes a while for me.

Scottmitch · 2018-04-21T17:41:52Z

I just rebased to make getting recent fixes easier ... sorry, super busy at the moment ... this is still on the backlog and hopefully I will get to it soon. It would be much appreciated if others could also try this out 👍

normanmaurer · 2018-04-23T07:28:41Z

@M-AJ you want a netty-all jar or just the epoll jar ?

M-AJ · 2018-04-23T14:35:23Z

@normanmaurer netty-all would be perfect. Appreciate it!

M-AJ · 2018-04-28T17:20:36Z

Hey @normanmaurer, any update on the build? Can't wait to try it out and see how it performs!

normanmaurer · 2018-05-03T13:11:51Z

Sorry it took me some time :(

Try this one:
https://drive.google.com/open?id=1S4Ro2xJu0Mf2NepbaFR4zM75wKXdMhDx

Scottmitch · 2019-04-11T20:16:54Z

@normanmaurer - I updated the commit message. let me know if it makes more sense now.

trustin · 2019-04-15T05:23:07Z

common/src/main/java/io/netty/util/concurrent/AbstractScheduledEventExecutor.java

+        // is being shutdown, so the timerFd and associated polling mechanism will be destroyed anyways.
+    }
+
+    protected final ScheduledTask peekScheduledTaskDelayNanos() {


Missing Javadoc?

trustin · 2019-04-15T05:24:34Z

common/src/main/java/io/netty/util/concurrent/AbstractScheduledEventExecutor.java

+         * Get the time stamp relative to some monotonically increasing time source for which this {@link ScheduledTask}
+         * expires.
+         * @return the time stamp relative to some monotonically increasing time source for which this
+         * {@link ScheduledTask} expires.


nit: indentation

trustin · 2019-04-15T05:25:39Z

common/src/main/java/io/netty/util/concurrent/AbstractScheduledEventExecutor.java

+    protected interface ScheduledTaskRunnable extends Runnable, ScheduledTask {
+        /**
+         * Determine if this object is the result of adding or removing a scheduled task.
+         * @return {@code true} if this object is the result of adding or removing a scheduled task.


Doesn't this method return false on removal?

trustin · 2019-04-15T05:30:00Z

common/src/main/java/io/netty/util/concurrent/AbstractScheduledEventExecutor.java

+     * occur on the event loop thread do not interact with this method.
+     * @param runnable The {@link ScheduledTaskRunnable} to execute.
+     */
+    protected void executeScheduledRunnable(ScheduledTaskRunnable runnable) {


I'm curious if the parameter type of this method can be widened to ScheduledTask or even Runnable. If so, we could hide ScheduledTaskRunnable (and even ScheduledTask) completely from the public API, i.e. less maintenance burden.

If a subclass needs the information like deadline, we could just add it as an additional parameter.

The way this suggestion was implemented makes things more complex IMHO. We are trying to communicate from here that the type of task is for scheduled execution, and this information is used conditionally at higher layers (e.g. should we wake up). This has been changed to introduce 3 methods (e.g. beforeScheduledTaskSubmitted, afterScheduledTaskSubmitted, wakesUpForScheduledRunnable) and 1 additional type is exposed NonWakeupRunnable publicly which may require conditional allocation/wrapping at higher layers. IOW we are losing information by just allocating a Runnable here and have to compensate by complicating method signatures and type hierarchy at other layers.

I'll submit a followup PR (since this plus an additional PR was recently reverted) so we can take a look. /cc @normanmaurer @njhill

Thanks @Scottmitch. I gave this quite a lot of thought so would be good to discuss more. A few initial responses though:

The prior communication depended on the notion of an intermediate task which happens to involve enqueuing some other scheduled task onto the internal scheduled-task queue (i.e. not the scheduled task itself), which is something we should not be exposing at all imho... that seems like an internal detail which EL impls shouldn't be aware of.

I don't see wakesUpForScheduledRunnable that you referenced, do you mean wakesUpForRunnable or executeScheduledRunnable?

The only reason I left wakesUpForRunnable there is that it's protected and so would be a breaking change to remove it; I'd suggest that we deprecate it. A public NonWakeupRunnable interface serves a similar purpose but makes much more sense imho. Whether a given task is required to be run immediately is in general a property of that task and should be independent of the executor/EL implementation. We could consider a lazyExecute(Runnable) method in addition to this (might be nice to have on the EventExecutor interface in netty 5).

I actually think we should remove executeScheduledRunnable altogether (possible since it's package-private), and instead just override schedule(...) in SingleThreadEventExecutor. This would further simplify things and avoid the double Runnable wrapping. Not sure it's also what you had in mind but it's a small change, I'll also push an example of that

Only the changes to EpollEventLoop itself were reverted for the release, the reworked superclass abstractions are still there. I'm still very interested to find out what the issues were ... @normanmaurer any luck with a reproducer or more clues? :)

@Scottmitch I made the changes alluded to above here 3923176 but took things a bit further still, uplifting the before/after hooks to AbstractScheduledEventExecutor and introducing higher level idea of lazyExecute / LazyRunnable which could have general applicability.

I really think this is both simpler to exploit and more generally useful for subclasses (i.e. event loop impls).

trustin · 2019-04-15T05:30:26Z

common/src/main/java/io/netty/util/concurrent/AbstractScheduledEventExecutor.java

+    /**
+     * The next task to expire (e.g. minimum delay) has been removed from the scheduled priority queue.
+     */
+    protected void minimumDelayScheduledTaskRemoved(@SuppressWarnings("unused") ScheduledTask task) {


Ditto - can ScheduledTask be widened to Runnable?

normanmaurer · 2019-07-02T10:25:14Z

@netty-bot test this please


normanmaurer · 2019-08-14T08:10:43Z

This has been sitting here for a long time now... Let me merge this.

njhill

@Scottmitch @normanmaurer apologies for the belated review of this. I think the general change is great, as well as much of the impl.

I made a few comments inline but in general have some reservations regarding the new abstractions exposed by the superclasses to allow EventLoop impls to integrate with the common timed-task scheduling logic.

I'll put more detail as a comment directly in the PR but my immediate concern is that once these APIs ship we can't easily change them, and I think they could be simpler.

njhill · 2019-07-02T21:48:17Z

transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java

                // Just ignore as we use ET mode for the eventfd and timerfd.
                //
                // See also https://stackoverflow.com/a/12492308/1074097
+            } else if (fd == timerFd.intValue()) {
+                // consume wakeup event, necessary because the timer is added with ET mode.
+                Native.timerFdRead(fd);


Maybe this is my misunderstanding, but wouldn't ET imply this read isn't needed?

njhill · 2019-07-03T18:47:18Z

common/src/main/java/io/netty/util/concurrent/SingleThreadEventExecutor.java

+     * @return {@code true} if at least {@link Runnable#run()} was called.
+     */
+    private boolean runExistingTasksFrom(Queue<Runnable> taskQueue) {
+        return taskQueue.offer(BOOKEND_TASK) ? runExistingTasksUntilBookend(taskQueue)


I was thinking the same as @carl-mastrangelo's earlier comment here - that the bookend could be easily avoided by just polling size() times. It would be cheaper and simpler, e.g. there's no need to deal specially with the bounded/full case. The JCTools impls won't involve any iteration when called from the consumer.

good idea! calling from consumer thread for MPSC shouldn't require looping.
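A minimal sketch of the "poll size() times" idea, assuming a hypothetical `ExistingTaskDrainer` helper: the number of tasks present on entry is captured once, and exactly that many are polled, so tasks enqueued while draining are left for the next pass without needing a bookend marker. The cost of `size()` depends on the queue implementation; ConcurrentLinkedQueue is used here only because it ships with the JDK.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical sketch: run only the tasks that were queued when draining started. */
final class ExistingTaskDrainer {
    /** Returns how many tasks were run. */
    static int runExistingTasks(Queue<Runnable> taskQueue) {
        int ran = 0;
        // Snapshot the size once; anything added later is not counted.
        for (int remaining = taskQueue.size(); remaining > 0; remaining--) {
            Runnable task = taskQueue.poll();
            if (task == null) {
                break; // defensive: queue drained sooner than expected
            }
            task.run();
            ran++;
        }
        return ran;
    }

    public static void main(String[] args) {
        Queue<Runnable> queue = new ConcurrentLinkedQueue<>();
        queue.add(() -> queue.add(() -> { })); // re-enqueues a follow-up task while running
        queue.add(() -> { });
        System.out.println("ran: " + runExistingTasks(queue) + ", left: " + queue.size());
    }
}
```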

njhill · 2019-07-03T19:24:05Z

transport-native-epoll/src/main/c/netty_epoll_native.c

@@ -186,6 +186,31 @@ static jint netty_epoll_native_epollCreate(JNIEnv* env, jclass clazz) {
    return efd;
 }

+static void netty_epoll_native_timerFdSetTime(JNIEnv* env, jclass clazz, jint timerFd, jint tvSec, jint tvNsec) {


Did you ever consider changing to use TFD_TIMER_ABSTIME? We could just measure/store the offset of System.nanoTime() from the system clock during initialization and then use deadlines rather than delays across the board to further avoid continual absolute/relative conversion.

No I didn't consider this.
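The offset idea in the comment above could look roughly like this. It is only a sketch of the arithmetic: `AbsoluteDeadline` and its methods are hypothetical, the offset is sampled against `System.currentTimeMillis()` as an assumption, and which clock an absolute timerfd expiration is measured against depends on how the timerfd was created, which is not shown here.

```java
/** Hypothetical sketch: convert System.nanoTime()-based deadlines to absolute clock time. */
final class AbsoluteDeadline {
    /** Sampled once at initialization: wall-clock nanos minus System.nanoTime(). */
    static long sampleOffsetNanos() {
        return System.currentTimeMillis() * 1_000_000L - System.nanoTime();
    }

    /** Shifts a nanoTime-based deadline onto the absolute timeline using the stored offset. */
    static long toAbsoluteNanos(long deadlineNanoTime, long offsetNanos) {
        return deadlineNanoTime + offsetNanos;
    }

    public static void main(String[] args) {
        long offset = sampleOffsetNanos();
        long deadline = System.nanoTime() + 1_000_000_000L; // one second from now
        System.out.println("absolute deadline (ns): " + toAbsoluteNanos(deadline, offset));
    }
}
```

With the offset stored, callers would pass deadlines through unchanged apart from one addition, instead of converting each deadline back into a relative delay.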

njhill · 2019-07-11T01:44:21Z

transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java

+                // lower value.
+                nextDeadline = nextDeadlineNanos.get();
+                if (nextDeadline - candidateNextDeadline < 0) {
+                    setTimerFd(deadlineToDelayNanos(nextDeadline));


I'm not sure that this fully addresses the race condition. I think access to setTimerFd probably needs to be synchronized.
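One way to read the suggestion is to make "check whether the candidate is earlier" and "arm the timer" a single atomic step under a monitor. The sketch below assumes a hypothetical `SynchronizedTimer` class; the `armCalls` counter stands in for the native setTimerFd call and is not part of the PR.

```java
/** Hypothetical sketch: serialize timer updates so the armed value is always the minimum. */
final class SynchronizedTimer {
    private final Object timerLock = new Object();
    private long armedDeadlineNanos = Long.MAX_VALUE; // guarded by timerLock
    private int armCalls;                             // guarded by timerLock

    /** Arms the timer only if the candidate is strictly earlier than what is armed. */
    boolean armIfEarlier(long candidateDeadlineNanos) {
        synchronized (timerLock) {
            if (candidateDeadlineNanos >= armedDeadlineNanos) {
                return false;
            }
            armedDeadlineNanos = candidateDeadlineNanos;
            armCalls++; // assumption: the native timer would be set here, inside the lock
            return true;
        }
    }

    long armedDeadlineNanos() {
        synchronized (timerLock) {
            return armedDeadlineNanos;
        }
    }

    int armCalls() {
        synchronized (timerLock) {
            return armCalls;
        }
    }

    public static void main(String[] args) {
        SynchronizedTimer timer = new SynchronizedTimer();
        timer.armIfEarlier(300L);
        timer.armIfEarlier(500L); // ignored: later than 300
        timer.armIfEarlier(100L);
        System.out.println("armed: " + timer.armedDeadlineNanos());
    }
}
```

Because the comparison and the arm happen under the same lock, a thread holding a later deadline cannot overwrite an earlier one that was armed concurrently.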

njhill · 2019-08-15T19:03:14Z

transport-native-epoll/src/test/java/io/netty/channel/epoll/EpollTest.java

-                @Override
-                public void run() {
-                    try {
-                        assertEquals(1, Native.epollWait(epoll, eventArray, timerFd, -1, -1));


Instead of removing this test why not just adjust it to use the new equivalent Native.epollWaitNoTimeout method?

normanmaurer · 2019-08-15T20:54:41Z

Want to do a PR to show the ideas ?


njhill · 2019-08-15T20:56:06Z

@normanmaurer sure, stand by :)

njhill · 2019-08-16T01:14:44Z

@normanmaurer see #9470

franz1981 · 2019-08-16T07:46:21Z

common/src/main/java/io/netty/util/concurrent/AbstractScheduledEventExecutor.java

-        if (scheduledTask.deadlineNanos() <= nanoTime) {
-            scheduledTaskQueue.remove();
+        ScheduledFutureTask<?> scheduledTask = scheduledTaskQueue.peek();
+        if (scheduledTask != null && scheduledTask.deadlineNanos() <= nanoTime) {


Replace the direct comparison using nanoTime with a difference, i.e. scheduledTask.deadlineNanos() - nanoTime <= 0

Netty's ScheduledFutureTask offsets all time stamps and they are generally compared using equality for deadlines while giving up half of the domain (e.g. overflow earlier). I agree it is confusing relative to typical use of System.nanoTime() and we may want to revisit how time duration/expiration is done in Netty 5 in this area.

"earlier" being after only 300 years, rather than 600 :)
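For readers unfamiliar with the issue, the sketch below shows why the two forms differ for raw `System.nanoTime()` values. `NanoTimeCompare` is a hypothetical helper: when a deadline computed as `now + delay` wraps past `Long.MAX_VALUE`, the direct comparison reports it as already expired, while the difference form still gives the right answer as long as the two values are less than about 292 years apart.

```java
/** Hypothetical sketch: direct comparison versus difference-based comparison of nano timestamps. */
final class NanoTimeCompare {
    /** Breaks when the deadline has wrapped around Long.MAX_VALUE. */
    static boolean isExpiredDirect(long deadlineNanos, long nowNanos) {
        return deadlineNanos <= nowNanos;
    }

    /** Overflow-tolerant: relies on two's-complement wraparound of the subtraction. */
    static boolean isExpiredByDifference(long deadlineNanos, long nowNanos) {
        return deadlineNanos - nowNanos <= 0;
    }

    public static void main(String[] args) {
        long now = Long.MAX_VALUE - 10;
        long deadline = now + 20; // wraps to a negative value
        System.out.println("direct: " + isExpiredDirect(deadline, now));
        System.out.println("difference: " + isExpiredByDifference(deadline, now));
    }
}
```

As noted in the reply above, Netty offsets its time stamps, so the direct form does not hit this wraparound in practice until much later.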

Motivation

The epoll transport was updated in netty#7834 to decouple setting of the timerFd from the event loop, so that scheduling delayed tasks does not require waking up epoll_wait. To achieve this, new overridable hooks were added in the AbstractScheduledEventExecutor and SingleThreadEventExecutor superclasses.

However, the minimumDelayScheduledTaskRemoved hook has no current purpose and I can't envisage a _practical_ need for it. Removing it would reduce complexity and avoid supporting this specific API indefinitely. We can add something similar later if needed but the opposite is not true.

There also isn't a _nice_ way to use the abstractions for wakeup-avoidance optimizations in other EventLoops that don't have a decoupled timer.

This PR replaces executeScheduledRunnable and wakesUpForScheduledRunnable with two new methods before/afterFutureTaskScheduled that have slightly different semantics:

- They only apply to additions; given the current internals there's no practical use for removals
- They allow per-submission wakeup decisions via a boolean return val, which makes them easier to exploit from other existing EL impls (e.g. NIO/KQueue)
- They are subjectively "cleaner", taking just the deadline parameter and not exposing Runnables
- For current EL/queue impls, only the "after" hook is really needed, but specialized blocking queue impls can conditionally wake on task submission (I have one lined up)

Also included are further optimization/simplification/fixes to the timerFd manipulation logic.

Modifications

- Remove AbstractScheduledEventExecutor#minimumDelayScheduledTaskRemoved() and supporting methods
- Uplift NonWakeupRunnable and corresponding default wakesUpForTask() impl from SingleThreadEventLoop to SingleThreadEventExecutor
- Change executeScheduledRunnable() to be package-private, and have a final impl in SingleThreadEventExecutor which triggers new overridable hooks before/afterFutureTaskScheduled()
- Remove unnecessary use of bookend tasks while draining the task queue
- Use new hooks to add simpler wake-up avoidance optimization to NioEventLoop (primarily to demonstrate utility/simplicity)
- Reinstate removed EpollTest class

In EpollEventLoop:

- Refactor to use only the new afterFutureTaskScheduled() hook for updating timerFd
- Fix setTimerFd race condition using a monitor
- Set nextDeadlineNanos to a negative value while the EL is awake and use this to block timer changes from outside the EL. Restore the known-set value prior to sleeping, updating timerFd first if necessary
- Don't read from timerFd when processing expiry event

Result

- Cleaner API for integrating with different EL/queue timing impls
- Fixed race condition to avoid missing scheduled wakeups
- Eliminate unnecessary timerFd updates while EL is awake, and unnecessary expired timerFd reads
- Avoid unnecessary scheduled-task wakeups when using NIO transport

I did not yet further explore the suggestion of using TFD_TIMER_ABSTIME for the timerFd.
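The hook-based design described above (a per-submission callback that receives only the deadline and returns whether a wakeup is needed) might be sketched like this. All class names are hypothetical and the hook signature is an assumption based on the prose; it is not copied from the Netty source.

```java
import java.util.PriorityQueue;

/** Hypothetical base executor: asks a hook whether each scheduled submission needs a wakeup. */
class SketchScheduledExecutor {
    private final PriorityQueue<Long> deadlines = new PriorityQueue<>();
    private int wakeups;

    /** Assumed hook shape: return true if the loop must be woken for this deadline. */
    protected boolean afterScheduledTaskSubmitted(long deadlineNanos) {
        return true; // default: no decoupled timer, so always wake the loop
    }

    final void schedule(long deadlineNanos) {
        deadlines.add(deadlineNanos);
        if (afterScheduledTaskSubmitted(deadlineNanos)) {
            wakeups++; // stands in for an actual wakeup of the blocked loop
        }
    }

    final int wakeups() {
        return wakeups;
    }
}

/** Hypothetical subclass with a decoupled timer: re-arms instead of waking the loop. */
final class DecoupledTimerExecutor extends SketchScheduledExecutor {
    private long armedDeadlineNanos = Long.MAX_VALUE;

    @Override
    protected boolean afterScheduledTaskSubmitted(long deadlineNanos) {
        if (deadlineNanos < armedDeadlineNanos) {
            armedDeadlineNanos = deadlineNanos; // stand-in for updating a timerfd
        }
        return false; // the timer will fire on its own; no explicit wakeup needed
    }

    long armedDeadlineNanos() {
        return armedDeadlineNanos;
    }
}
```

Passing only the deadline keeps the task type out of the subclass API, which is the trade-off debated earlier in this thread.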


Scottmitch self-assigned this Apr 3, 2018

Scottmitch requested review from trustin, normanmaurer and carl-mastrangelo April 3, 2018 03:02

Scottmitch mentioned this pull request Apr 3, 2018

Improve Timer Wakeup for Native Transports #7829

Closed

normanmaurer reviewed Apr 3, 2018

normanmaurer reviewed Apr 3, 2018

carl-mastrangelo reviewed Apr 3, 2018

Scottmitch mentioned this pull request Apr 4, 2018

ScheduledFutureTask#delayNanos doesn't return negative values #7840

Open

Scottmitch requested review from normanmaurer and carl-mastrangelo April 4, 2018 02:22

carl-mastrangelo approved these changes Apr 5, 2018

Scottmitch force-pushed the decouple_timer_and_poll branch from 659ab6c to b363996 Compare April 5, 2018 16:34

Scottmitch force-pushed the decouple_timer_and_poll branch from b363996 to 5e74a0d Compare April 21, 2018 17:39

Scottmitch force-pushed the decouple_timer_and_poll branch from 5e74a0d to 9c380c4 Compare April 24, 2018 14:41

trustin reviewed Apr 15, 2019

Scottmitch mentioned this pull request Apr 29, 2019

Avoid unnecessary compareAndSet in EventLoop.wakeup implementations #9109

Closed

njhill mentioned this pull request Jun 21, 2019

POC: Tag-team event loop #9265

Closed

normanmaurer force-pushed the decouple_timer_and_poll branch from d23b852 to dc2cddf Compare July 2, 2019 12:14

Scottmitch added 2 commits July 2, 2019 14:15

add comments regarding null / BOOKEND_TASK

834763e

normanmaurer force-pushed the decouple_timer_and_poll branch 2 times, most recently from dc4ceba to 834763e Compare July 2, 2019 13:34

Address comments of trustin

60a519a

normanmaurer merged commit 1fa7a5e into netty:4.1 Aug 14, 2019

normanmaurer added this to the 4.1.40.Final milestone Aug 14, 2019

njhill reviewed Aug 15, 2019

njhill mentioned this pull request Aug 16, 2019

Simplify EventLoop abstractions for timed scheduled tasks #9470

Merged

franz1981 reviewed Aug 16, 2019

normanmaurer removed this from the 4.1.40.Final milestone Sep 12, 2019

njhill mentioned this pull request Sep 21, 2019

Externalize lazy execution semantic for EventExecutors #9587

Merged

njhill mentioned this pull request Oct 15, 2019

Reinstate decoupled timerfd logic in EpollEventLoop #9590

Closed

Scottmitch deleted the decouple_timer_and_poll branch January 9, 2020 23:12

lawnight mentioned this pull request Jul 19, 2021

epollEventLoop no way to set ioRatio #11493

Closed
