Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow EmbeddedChannel users to control the scheduler time #12433

Closed
yawkat opened this issue May 30, 2022 · 2 comments · Fixed by #12459
Closed

Allow EmbeddedChannel users to control the scheduler time #12433

yawkat opened this issue May 30, 2022 · 2 comments · Fixed by #12459
Milestone

Comments

@yawkat
Copy link
Contributor

yawkat commented May 30, 2022

It would be useful if EmbeddedChannel provided a way to manage the scheduler time (e.g. freeze or advance the nanoTime used internally). This would allow race-free testing of time-based algorithms.

I want to try working on this, this issue is to make sure it's not a stupid idea that will be rejected :)

Constraints are:

  • performance of normal (non-embedded) EventExecutor should not be affected.
  • compatibility with current EmbeddedEventLoop behavior must be retained.

Expected behavior

Tests using EmbeddedChannel should be able to (1) stop scheduled tasks from executing too early even if they're scheduled very tightly, and (2) execute future tasks earlier than their nominal deadline.

Actual behavior

Tests have to use Thread.sleep to wait for scheduled tasks to reach their deadline.

Netty version

4.1.77.Final

@normanmaurer
Copy link
Member

sounds good... Do you want to contribute something ? I think this is very similar that what SwiftNIO does:

https://github.com/apple/swift-nio/blob/main/Sources/NIOEmbedded/Embedded.swift#L146

@yawkat
Copy link
Contributor Author

yawkat commented Jun 9, 2022

Yep, I'll try implementing it. The swift-nio thing indeed sounds similar

@normanmaurer normanmaurer added this to the 4.1.78.Final milestone Jun 13, 2022
normanmaurer added a commit that referenced this issue Jun 13, 2022
Motivation:

Tests using EmbeddedEventLoop can run faster if they can "advance time" so that scheduled tasks (e.g. timeouts) run earlier. Additionally, "freeze time" functionality can improve reliability of such tests.

Modification:

- Introduce a protected method `AbstractScheduledEventExecutor.getCurrentTimeNanos` that replaces the previous static `nanoTime` method (now deprecated). Replace usages of `nanoTime` with the new method.
- Override `getCurrentTimeNanos` with the new time control (freeze, unfreeze, advanceBy) features in `EmbeddedEventLoop`.
- Add a microbenchmark that tests one of the sites that seemed most likely to see negative performance impact by the change (`ScheduledFutureTask.delayNanos`).

Result:

Fixes  #12433. 

Local runs of the `ScheduleFutureTaskBenchmark` microbenchmark shows no evidence for performance impact (within error bounds of each other):

```
before:
Benchmark                                                   (num)   Mode  Cnt    Score    Error  Units
ScheduleFutureTaskBenchmark.scheduleCancelLotsOutsideLoop  100000  thrpt   20  132.437 ± 15.116  ops/s
ScheduleFutureTaskBenchmark.scheduleLots                   100000  thrpt   20  694.475 ±  8.184  ops/s
ScheduleFutureTaskBenchmark.scheduleLotsOutsideLoop        100000  thrpt   20   88.037 ±  4.013  ops/s
after:
Benchmark                                                   (num)   Mode  Cnt    Score   Error  Units
ScheduleFutureTaskBenchmark.scheduleCancelLotsOutsideLoop  100000  thrpt   20  149.629 ± 7.514  ops/s
ScheduleFutureTaskBenchmark.scheduleLots                   100000  thrpt   20  688.954 ± 7.831  ops/s
ScheduleFutureTaskBenchmark.scheduleLotsOutsideLoop        100000  thrpt   20   85.426 ± 1.104  ops/s
```

The new `ScheduleFutureTaskDeadlineBenchmark` shows some performance degradation:

```
before:
Benchmark                                             Mode  Cnt         Score        Error  Units
ScheduleFutureTaskDeadlineBenchmark.requestDeadline  thrpt   20  60726336.795 ± 280054.533  ops/s
after:
Benchmark                                             Mode  Cnt         Score        Error  Units
ScheduleFutureTaskDeadlineBenchmark.requestDeadline  thrpt   20  56948231.480 ± 188264.092  ops/s
```

The difference is small, but it's there, so I investigated this further using jitwatch. Looking at the generated assembly, the call to `getCurrentTimeNanos` is devirtualized and inlined in the absence of `EmbeddedEventLoop`, so the code is mostly identical. However there is the added getfield and checkcast for the executor, which probably explains the discrepancy.

In my opinion this is acceptable, because the performance impact is not severe, this use is likely the worst case (virtual call through `scheduledExecutorService()`), and it is never as hot as in this benchmark.

Note that if an `EmbeddedEventLoop` is present in the application, the performance impact is likely substantially higher, because this would necessitate a virtual call. However this is not an issue for production applications, and the affected code is still not very hot.

Co-authored-by: Norman Maurer <norman_maurer@apple.com>
normanmaurer added a commit that referenced this issue Jun 13, 2022
Motivation:

Tests using EmbeddedEventLoop can run faster if they can "advance time" so that scheduled tasks (e.g. timeouts) run earlier. Additionally, "freeze time" functionality can improve reliability of such tests.

Modification:

- Introduce a protected method `AbstractScheduledEventExecutor.getCurrentTimeNanos` that replaces the previous static `nanoTime` method (now deprecated). Replace usages of `nanoTime` with the new method.
- Override `getCurrentTimeNanos` with the new time control (freeze, unfreeze, advanceBy) features in `EmbeddedEventLoop`.
- Add a microbenchmark that tests one of the sites that seemed most likely to see negative performance impact by the change (`ScheduledFutureTask.delayNanos`).

Result:

Fixes  #12433.

Local runs of the `ScheduleFutureTaskBenchmark` microbenchmark shows no evidence for performance impact (within error bounds of each other):

```
before:
Benchmark                                                   (num)   Mode  Cnt    Score    Error  Units
ScheduleFutureTaskBenchmark.scheduleCancelLotsOutsideLoop  100000  thrpt   20  132.437 ± 15.116  ops/s
ScheduleFutureTaskBenchmark.scheduleLots                   100000  thrpt   20  694.475 ±  8.184  ops/s
ScheduleFutureTaskBenchmark.scheduleLotsOutsideLoop        100000  thrpt   20   88.037 ±  4.013  ops/s
after:
Benchmark                                                   (num)   Mode  Cnt    Score   Error  Units
ScheduleFutureTaskBenchmark.scheduleCancelLotsOutsideLoop  100000  thrpt   20  149.629 ± 7.514  ops/s
ScheduleFutureTaskBenchmark.scheduleLots                   100000  thrpt   20  688.954 ± 7.831  ops/s
ScheduleFutureTaskBenchmark.scheduleLotsOutsideLoop        100000  thrpt   20   85.426 ± 1.104  ops/s
```

The new `ScheduleFutureTaskDeadlineBenchmark` shows some performance degradation:

```
before:
Benchmark                                             Mode  Cnt         Score        Error  Units
ScheduleFutureTaskDeadlineBenchmark.requestDeadline  thrpt   20  60726336.795 ± 280054.533  ops/s
after:
Benchmark                                             Mode  Cnt         Score        Error  Units
ScheduleFutureTaskDeadlineBenchmark.requestDeadline  thrpt   20  56948231.480 ± 188264.092  ops/s
```

The difference is small, but it's there, so I investigated this further using jitwatch. Looking at the generated assembly, the call to `getCurrentTimeNanos` is devirtualized and inlined in the absence of `EmbeddedEventLoop`, so the code is mostly identical. However there is the added getfield and checkcast for the executor, which probably explains the discrepancy.

In my opinion this is acceptable, because the performance impact is not severe, this use is likely the worst case (virtual call through `scheduledExecutorService()`), and it is never as hot as in this benchmark.

Note that if an `EmbeddedEventLoop` is present in the application, the performance impact is likely substantially higher, because this would necessitate a virtual call. However this is not an issue for production applications, and the affected code is still not very hot.

Co-authored-by: Norman Maurer <norman_maurer@apple.com>
normanmaurer added a commit that referenced this issue Jun 13, 2022
Motivation:

Tests using EmbeddedEventLoop can run faster if they can "advance time" so that scheduled tasks (e.g. timeouts) run earlier. Additionally, "freeze time" functionality can improve reliability of such tests.

Modification:

- Introduce a protected method `AbstractScheduledEventExecutor.getCurrentTimeNanos` that replaces the previous static `nanoTime` method (now deprecated). Replace usages of `nanoTime` with the new method.
- Override `getCurrentTimeNanos` with the new time control (freeze, unfreeze, advanceBy) features in `EmbeddedEventLoop`.
- Add a microbenchmark that tests one of the sites that seemed most likely to see negative performance impact by the change (`ScheduledFutureTask.delayNanos`).

Result:

Fixes  #12433.

Local runs of the `ScheduleFutureTaskBenchmark` microbenchmark shows no evidence for performance impact (within error bounds of each other):

```
before:
Benchmark                                                   (num)   Mode  Cnt    Score    Error  Units
ScheduleFutureTaskBenchmark.scheduleCancelLotsOutsideLoop  100000  thrpt   20  132.437 ± 15.116  ops/s
ScheduleFutureTaskBenchmark.scheduleLots                   100000  thrpt   20  694.475 ±  8.184  ops/s
ScheduleFutureTaskBenchmark.scheduleLotsOutsideLoop        100000  thrpt   20   88.037 ±  4.013  ops/s
after:
Benchmark                                                   (num)   Mode  Cnt    Score   Error  Units
ScheduleFutureTaskBenchmark.scheduleCancelLotsOutsideLoop  100000  thrpt   20  149.629 ± 7.514  ops/s
ScheduleFutureTaskBenchmark.scheduleLots                   100000  thrpt   20  688.954 ± 7.831  ops/s
ScheduleFutureTaskBenchmark.scheduleLotsOutsideLoop        100000  thrpt   20   85.426 ± 1.104  ops/s
```

The new `ScheduleFutureTaskDeadlineBenchmark` shows some performance degradation:

```
before:
Benchmark                                             Mode  Cnt         Score        Error  Units
ScheduleFutureTaskDeadlineBenchmark.requestDeadline  thrpt   20  60726336.795 ± 280054.533  ops/s
after:
Benchmark                                             Mode  Cnt         Score        Error  Units
ScheduleFutureTaskDeadlineBenchmark.requestDeadline  thrpt   20  56948231.480 ± 188264.092  ops/s
```

The difference is small, but it's there, so I investigated this further using jitwatch. Looking at the generated assembly, the call to `getCurrentTimeNanos` is devirtualized and inlined in the absence of `EmbeddedEventLoop`, so the code is mostly identical. However there is the added getfield and checkcast for the executor, which probably explains the discrepancy.

In my opinion this is acceptable, because the performance impact is not severe, this use is likely the worst case (virtual call through `scheduledExecutorService()`), and it is never as hot as in this benchmark.

Note that if an `EmbeddedEventLoop` is present in the application, the performance impact is likely substantially higher, because this would necessitate a virtual call. However this is not an issue for production applications, and the affected code is still not very hot.

Co-authored-by: Norman Maurer <norman_maurer@apple.com>
Co-authored-by: Jonas Konrad <me@yawk.at>
raidyue pushed a commit to raidyue/netty that referenced this issue Jul 8, 2022
Motivation:

Tests using EmbeddedEventLoop can run faster if they can "advance time" so that scheduled tasks (e.g. timeouts) run earlier. Additionally, "freeze time" functionality can improve reliability of such tests.

Modification:

- Introduce a protected method `AbstractScheduledEventExecutor.getCurrentTimeNanos` that replaces the previous static `nanoTime` method (now deprecated). Replace usages of `nanoTime` with the new method.
- Override `getCurrentTimeNanos` with the new time control (freeze, unfreeze, advanceBy) features in `EmbeddedEventLoop`.
- Add a microbenchmark that tests one of the sites that seemed most likely to see negative performance impact by the change (`ScheduledFutureTask.delayNanos`).

Result:

Fixes  netty#12433. 

Local runs of the `ScheduleFutureTaskBenchmark` microbenchmark shows no evidence for performance impact (within error bounds of each other):

```
before:
Benchmark                                                   (num)   Mode  Cnt    Score    Error  Units
ScheduleFutureTaskBenchmark.scheduleCancelLotsOutsideLoop  100000  thrpt   20  132.437 ± 15.116  ops/s
ScheduleFutureTaskBenchmark.scheduleLots                   100000  thrpt   20  694.475 ±  8.184  ops/s
ScheduleFutureTaskBenchmark.scheduleLotsOutsideLoop        100000  thrpt   20   88.037 ±  4.013  ops/s
after:
Benchmark                                                   (num)   Mode  Cnt    Score   Error  Units
ScheduleFutureTaskBenchmark.scheduleCancelLotsOutsideLoop  100000  thrpt   20  149.629 ± 7.514  ops/s
ScheduleFutureTaskBenchmark.scheduleLots                   100000  thrpt   20  688.954 ± 7.831  ops/s
ScheduleFutureTaskBenchmark.scheduleLotsOutsideLoop        100000  thrpt   20   85.426 ± 1.104  ops/s
```

The new `ScheduleFutureTaskDeadlineBenchmark` shows some performance degradation:

```
before:
Benchmark                                             Mode  Cnt         Score        Error  Units
ScheduleFutureTaskDeadlineBenchmark.requestDeadline  thrpt   20  60726336.795 ± 280054.533  ops/s
after:
Benchmark                                             Mode  Cnt         Score        Error  Units
ScheduleFutureTaskDeadlineBenchmark.requestDeadline  thrpt   20  56948231.480 ± 188264.092  ops/s
```

The difference is small, but it's there, so I investigated this further using jitwatch. Looking at the generated assembly, the call to `getCurrentTimeNanos` is devirtualized and inlined in the absence of `EmbeddedEventLoop`, so the code is mostly identical. However there is the added getfield and checkcast for the executor, which probably explains the discrepancy.

In my opinion this is acceptable, because the performance impact is not severe, this use is likely the worst case (virtual call through `scheduledExecutorService()`), and it is never as hot as in this benchmark.

Note that if an `EmbeddedEventLoop` is present in the application, the performance impact is likely substantially higher, because this would necessitate a virtual call. However this is not an issue for production applications, and the affected code is still not very hot.

Co-authored-by: Norman Maurer <norman_maurer@apple.com>
franz1981 pushed a commit to franz1981/netty that referenced this issue Aug 22, 2022
Motivation:

Tests using EmbeddedEventLoop can run faster if they can "advance time" so that scheduled tasks (e.g. timeouts) run earlier. Additionally, "freeze time" functionality can improve reliability of such tests.

Modification:

- Introduce a protected method `AbstractScheduledEventExecutor.getCurrentTimeNanos` that replaces the previous static `nanoTime` method (now deprecated). Replace usages of `nanoTime` with the new method.
- Override `getCurrentTimeNanos` with the new time control (freeze, unfreeze, advanceBy) features in `EmbeddedEventLoop`.
- Add a microbenchmark that tests one of the sites that seemed most likely to see negative performance impact by the change (`ScheduledFutureTask.delayNanos`).

Result:

Fixes  netty#12433. 

Local runs of the `ScheduleFutureTaskBenchmark` microbenchmark shows no evidence for performance impact (within error bounds of each other):

```
before:
Benchmark                                                   (num)   Mode  Cnt    Score    Error  Units
ScheduleFutureTaskBenchmark.scheduleCancelLotsOutsideLoop  100000  thrpt   20  132.437 ± 15.116  ops/s
ScheduleFutureTaskBenchmark.scheduleLots                   100000  thrpt   20  694.475 ±  8.184  ops/s
ScheduleFutureTaskBenchmark.scheduleLotsOutsideLoop        100000  thrpt   20   88.037 ±  4.013  ops/s
after:
Benchmark                                                   (num)   Mode  Cnt    Score   Error  Units
ScheduleFutureTaskBenchmark.scheduleCancelLotsOutsideLoop  100000  thrpt   20  149.629 ± 7.514  ops/s
ScheduleFutureTaskBenchmark.scheduleLots                   100000  thrpt   20  688.954 ± 7.831  ops/s
ScheduleFutureTaskBenchmark.scheduleLotsOutsideLoop        100000  thrpt   20   85.426 ± 1.104  ops/s
```

The new `ScheduleFutureTaskDeadlineBenchmark` shows some performance degradation:

```
before:
Benchmark                                             Mode  Cnt         Score        Error  Units
ScheduleFutureTaskDeadlineBenchmark.requestDeadline  thrpt   20  60726336.795 ± 280054.533  ops/s
after:
Benchmark                                             Mode  Cnt         Score        Error  Units
ScheduleFutureTaskDeadlineBenchmark.requestDeadline  thrpt   20  56948231.480 ± 188264.092  ops/s
```

The difference is small, but it's there, so I investigated this further using jitwatch. Looking at the generated assembly, the call to `getCurrentTimeNanos` is devirtualized and inlined in the absence of `EmbeddedEventLoop`, so the code is mostly identical. However there is the added getfield and checkcast for the executor, which probably explains the discrepancy.

In my opinion this is acceptable, because the performance impact is not severe, this use is likely the worst case (virtual call through `scheduledExecutorService()`), and it is never as hot as in this benchmark.

Note that if an `EmbeddedEventLoop` is present in the application, the performance impact is likely substantially higher, because this would necessitate a virtual call. However this is not an issue for production applications, and the affected code is still not very hot.

Co-authored-by: Norman Maurer <norman_maurer@apple.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants