Allow EmbeddedChannel users to control the scheduler time #12433

yawkat · 2022-05-30T12:14:20Z

It would be useful if EmbeddedChannel provided a way to manage the scheduler time (e.g. freeze or advance the nanoTime used internally). This would allow race-free testing of time-based algorithms.

I want to try working on this, this issue is to make sure it's not a stupid idea that will be rejected :)

Constraints are:

performance of normal (non-embedded) EventExecutor should not be affected.
compatibility with current EmbeddedEventLoop behavior must be retained.

Expected behavior

Tests using EmbeddedChannel should be able to (1) stop scheduled tasks from executing too early even if they're scheduled very tightly, and (2) execute future tasks earlier than their nominal deadline.

Actual behavior

Tests have to use Thread.sleep to wait for scheduled tasks to reach their deadline.

Netty version

4.1.77.Final

normanmaurer · 2022-06-09T12:54:24Z

sounds good... Do you want to contribute something ? I think this is very similar that what SwiftNIO does:

https://github.com/apple/swift-nio/blob/main/Sources/NIOEmbedded/Embedded.swift#L146

yawkat · 2022-06-09T14:27:29Z

Yep, I'll try implementing it. The swift-nio thing indeed sounds similar

Motivation: Tests using EmbeddedEventLoop can run faster if they can "advance time" so that scheduled tasks (e.g. timeouts) run earlier. Additionally, "freeze time" functionality can improve reliability of such tests. Modification: - Introduce a protected method `AbstractScheduledEventExecutor.getCurrentTimeNanos` that replaces the previous static `nanoTime` method (now deprecated). Replace usages of `nanoTime` with the new method. - Override `getCurrentTimeNanos` with the new time control (freeze, unfreeze, advanceBy) features in `EmbeddedEventLoop`. - Add a microbenchmark that tests one of the sites that seemed most likely to see negative performance impact by the change (`ScheduledFutureTask.delayNanos`). Result: Fixes #12433. Local runs of the `ScheduleFutureTaskBenchmark` microbenchmark shows no evidence for performance impact (within error bounds of each other): ``` before: Benchmark (num) Mode Cnt Score Error Units ScheduleFutureTaskBenchmark.scheduleCancelLotsOutsideLoop 100000 thrpt 20 132.437 ± 15.116 ops/s ScheduleFutureTaskBenchmark.scheduleLots 100000 thrpt 20 694.475 ± 8.184 ops/s ScheduleFutureTaskBenchmark.scheduleLotsOutsideLoop 100000 thrpt 20 88.037 ± 4.013 ops/s after: Benchmark (num) Mode Cnt Score Error Units ScheduleFutureTaskBenchmark.scheduleCancelLotsOutsideLoop 100000 thrpt 20 149.629 ± 7.514 ops/s ScheduleFutureTaskBenchmark.scheduleLots 100000 thrpt 20 688.954 ± 7.831 ops/s ScheduleFutureTaskBenchmark.scheduleLotsOutsideLoop 100000 thrpt 20 85.426 ± 1.104 ops/s ``` The new `ScheduleFutureTaskDeadlineBenchmark` shows some performance degradation: ``` before: Benchmark Mode Cnt Score Error Units ScheduleFutureTaskDeadlineBenchmark.requestDeadline thrpt 20 60726336.795 ± 280054.533 ops/s after: Benchmark Mode Cnt Score Error Units ScheduleFutureTaskDeadlineBenchmark.requestDeadline thrpt 20 56948231.480 ± 188264.092 ops/s ``` The difference is small, but it's there, so I investigated this further using jitwatch. Looking at the generated assembly, the call to `getCurrentTimeNanos` is devirtualized and inlined in the absence of `EmbeddedEventLoop`, so the code is mostly identical. However there is the added getfield and checkcast for the executor, which probably explains the discrepancy. In my opinion this is acceptable, because the performance impact is not severe, this use is likely the worst case (virtual call through `scheduledExecutorService()`), and it is never as hot as in this benchmark. Note that if an `EmbeddedEventLoop` is present in the application, the performance impact is likely substantially higher, because this would necessitate a virtual call. However this is not an issue for production applications, and the affected code is still not very hot. Co-authored-by: Norman Maurer <norman_maurer@apple.com>

Motivation: Tests using EmbeddedEventLoop can run faster if they can "advance time" so that scheduled tasks (e.g. timeouts) run earlier. Additionally, "freeze time" functionality can improve reliability of such tests. Modification: - Introduce a protected method `AbstractScheduledEventExecutor.getCurrentTimeNanos` that replaces the previous static `nanoTime` method (now deprecated). Replace usages of `nanoTime` with the new method. - Override `getCurrentTimeNanos` with the new time control (freeze, unfreeze, advanceBy) features in `EmbeddedEventLoop`. - Add a microbenchmark that tests one of the sites that seemed most likely to see negative performance impact by the change (`ScheduledFutureTask.delayNanos`). Result: Fixes #12433. Local runs of the `ScheduleFutureTaskBenchmark` microbenchmark shows no evidence for performance impact (within error bounds of each other): ``` before: Benchmark (num) Mode Cnt Score Error Units ScheduleFutureTaskBenchmark.scheduleCancelLotsOutsideLoop 100000 thrpt 20 132.437 ± 15.116 ops/s ScheduleFutureTaskBenchmark.scheduleLots 100000 thrpt 20 694.475 ± 8.184 ops/s ScheduleFutureTaskBenchmark.scheduleLotsOutsideLoop 100000 thrpt 20 88.037 ± 4.013 ops/s after: Benchmark (num) Mode Cnt Score Error Units ScheduleFutureTaskBenchmark.scheduleCancelLotsOutsideLoop 100000 thrpt 20 149.629 ± 7.514 ops/s ScheduleFutureTaskBenchmark.scheduleLots 100000 thrpt 20 688.954 ± 7.831 ops/s ScheduleFutureTaskBenchmark.scheduleLotsOutsideLoop 100000 thrpt 20 85.426 ± 1.104 ops/s ``` The new `ScheduleFutureTaskDeadlineBenchmark` shows some performance degradation: ``` before: Benchmark Mode Cnt Score Error Units ScheduleFutureTaskDeadlineBenchmark.requestDeadline thrpt 20 60726336.795 ± 280054.533 ops/s after: Benchmark Mode Cnt Score Error Units ScheduleFutureTaskDeadlineBenchmark.requestDeadline thrpt 20 56948231.480 ± 188264.092 ops/s ``` The difference is small, but it's there, so I investigated this further using jitwatch. Looking at the generated assembly, the call to `getCurrentTimeNanos` is devirtualized and inlined in the absence of `EmbeddedEventLoop`, so the code is mostly identical. However there is the added getfield and checkcast for the executor, which probably explains the discrepancy. In my opinion this is acceptable, because the performance impact is not severe, this use is likely the worst case (virtual call through `scheduledExecutorService()`), and it is never as hot as in this benchmark. Note that if an `EmbeddedEventLoop` is present in the application, the performance impact is likely substantially higher, because this would necessitate a virtual call. However this is not an issue for production applications, and the affected code is still not very hot. Co-authored-by: Norman Maurer <norman_maurer@apple.com> Co-authored-by: Jonas Konrad <me@yawk.at>

Motivation: Tests using EmbeddedEventLoop can run faster if they can "advance time" so that scheduled tasks (e.g. timeouts) run earlier. Additionally, "freeze time" functionality can improve reliability of such tests. Modification: - Introduce a protected method `AbstractScheduledEventExecutor.getCurrentTimeNanos` that replaces the previous static `nanoTime` method (now deprecated). Replace usages of `nanoTime` with the new method. - Override `getCurrentTimeNanos` with the new time control (freeze, unfreeze, advanceBy) features in `EmbeddedEventLoop`. - Add a microbenchmark that tests one of the sites that seemed most likely to see negative performance impact by the change (`ScheduledFutureTask.delayNanos`). Result: Fixes netty#12433. Local runs of the `ScheduleFutureTaskBenchmark` microbenchmark shows no evidence for performance impact (within error bounds of each other): ``` before: Benchmark (num) Mode Cnt Score Error Units ScheduleFutureTaskBenchmark.scheduleCancelLotsOutsideLoop 100000 thrpt 20 132.437 ± 15.116 ops/s ScheduleFutureTaskBenchmark.scheduleLots 100000 thrpt 20 694.475 ± 8.184 ops/s ScheduleFutureTaskBenchmark.scheduleLotsOutsideLoop 100000 thrpt 20 88.037 ± 4.013 ops/s after: Benchmark (num) Mode Cnt Score Error Units ScheduleFutureTaskBenchmark.scheduleCancelLotsOutsideLoop 100000 thrpt 20 149.629 ± 7.514 ops/s ScheduleFutureTaskBenchmark.scheduleLots 100000 thrpt 20 688.954 ± 7.831 ops/s ScheduleFutureTaskBenchmark.scheduleLotsOutsideLoop 100000 thrpt 20 85.426 ± 1.104 ops/s ``` The new `ScheduleFutureTaskDeadlineBenchmark` shows some performance degradation: ``` before: Benchmark Mode Cnt Score Error Units ScheduleFutureTaskDeadlineBenchmark.requestDeadline thrpt 20 60726336.795 ± 280054.533 ops/s after: Benchmark Mode Cnt Score Error Units ScheduleFutureTaskDeadlineBenchmark.requestDeadline thrpt 20 56948231.480 ± 188264.092 ops/s ``` The difference is small, but it's there, so I investigated this further using jitwatch. Looking at the generated assembly, the call to `getCurrentTimeNanos` is devirtualized and inlined in the absence of `EmbeddedEventLoop`, so the code is mostly identical. However there is the added getfield and checkcast for the executor, which probably explains the discrepancy. In my opinion this is acceptable, because the performance impact is not severe, this use is likely the worst case (virtual call through `scheduledExecutorService()`), and it is never as hot as in this benchmark. Note that if an `EmbeddedEventLoop` is present in the application, the performance impact is likely substantially higher, because this would necessitate a virtual call. However this is not an issue for production applications, and the affected code is still not very hot. Co-authored-by: Norman Maurer <norman_maurer@apple.com>

yawkat mentioned this issue Jun 10, 2022

Allow controlling time flow for EmbeddedEventLoop #12459

Merged

normanmaurer closed this as completed in #12459 Jun 13, 2022

normanmaurer added this to the 4.1.78.Final milestone Jun 13, 2022

normanmaurer mentioned this issue Jun 13, 2022

Allow controlling time flow for EmbeddedEventLoop (#12459) #12464

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow EmbeddedChannel users to control the scheduler time #12433

Allow EmbeddedChannel users to control the scheduler time #12433

yawkat commented May 30, 2022

normanmaurer commented Jun 9, 2022

yawkat commented Jun 9, 2022

Allow EmbeddedChannel users to control the scheduler time #12433

Allow EmbeddedChannel users to control the scheduler time #12433

Comments

yawkat commented May 30, 2022

Expected behavior

Actual behavior

Netty version

normanmaurer commented Jun 9, 2022

yawkat commented Jun 9, 2022