Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8307766: Linux: Provide the option to override the timer slack #13889

Closed
wants to merge 13 commits into from

Conversation

shipilev
Copy link
Member

@shipilev shipilev commented May 9, 2023

See the description in the RFE.

On c6.8xlarge, the sleep accuracy is improved the following way:

% CONF=linux-x86_64-server-release make test TEST=micro:ThreadSleep.millisNanos

Benchmark                   (sleep)  Mode  Cnt           Score      Error  Units

# Default
ThreadSleep.millisNanos           0  avgt   15         473.727 ±    3.542  ns/op
ThreadSleep.millisNanos           1  avgt   15         585.868 ±    2.010  ns/op
ThreadSleep.millisNanos          10  avgt   15         583.475 ±    1.426  ns/op
ThreadSleep.millisNanos         100  avgt   15       57833.073 ±  146.406  ns/op
ThreadSleep.millisNanos        1000  avgt   15       59324.901 ±  620.924  ns/op
ThreadSleep.millisNanos       10000  avgt   15       68071.990 ±  537.880  ns/op
ThreadSleep.millisNanos      100000  avgt   15      158069.170 ±  329.991  ns/op
ThreadSleep.millisNanos     1000000  avgt   15     1059044.770 ±  370.982  ns/op
ThreadSleep.millisNanos    10000000  avgt   15    10060017.354 ±  399.309  ns/op
ThreadSleep.millisNanos   100000000  avgt   15   100062969.073 ± 2399.053  ns/op
ThreadSleep.millisNanos  1000000000  avgt   15  1000068474.467 ± 2669.980  ns/op

# -XX:TimerSlack=1
ThreadSleep.millisNanos           0  avgt   15         469.632 ±    1.500  ns/op
ThreadSleep.millisNanos           1  avgt   15         587.562 ±    8.966  ns/op
ThreadSleep.millisNanos          10  avgt   15         584.241 ±    1.997  ns/op
ThreadSleep.millisNanos         100  avgt   15        7907.160 ±   12.476  ns/op
ThreadSleep.millisNanos        1000  avgt   15        7913.984 ±   56.996  ns/op
ThreadSleep.millisNanos       10000  avgt   15       17929.107 ±  432.083  ns/op
ThreadSleep.millisNanos      100000  avgt   15      108154.851 ±  403.717  ns/op
ThreadSleep.millisNanos     1000000  avgt   15     1008854.095 ±  418.444  ns/op
ThreadSleep.millisNanos    10000000  avgt   15    10009902.829 ±  279.617  ns/op
ThreadSleep.millisNanos   100000000  avgt   15   100014000.020 ± 2143.820  ns/op
ThreadSleep.millisNanos  1000000000  avgt   15  1000020734.333 ± 4846.754  ns/op

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8307766: Linux: Provide the option to override the timer slack (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/13889/head:pull/13889
$ git checkout pull/13889

Update a local copy of the PR:
$ git checkout pull/13889
$ git pull https://git.openjdk.org/jdk.git pull/13889/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 13889

View PR using the GUI difftool:
$ git pr show -t 13889

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/13889.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented May 9, 2023

👋 Welcome back shade! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented May 9, 2023

@shipilev The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label May 9, 2023
@shipilev shipilev force-pushed the JDK-8307766-linux-timer-slack branch from 7c048d0 to 4251128 Compare May 19, 2023 07:34
@shipilev shipilev marked this pull request as ready for review May 19, 2023 07:43
@openjdk openjdk bot added the rfr Pull request is ready for review label May 19, 2023
@mlbridge
Copy link

mlbridge bot commented May 19, 2023

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is really of use to people then I expect it would also be useful to know what the current setting is for a thread - perhaps either logged at thread creation, or perhaps included in one of the thread print_on functions?

@dholmes-ora
Copy link
Member

dholmes-ora commented May 22, 2023

ThreadSleep.millisNanos          10  avgt   15         583.475 ±    1.426  ns/op
ThreadSleep.millisNanos         100  avgt   15       57833.073 ±  146.406  ns/op
ThreadSleep.millisNanos        1000  avgt   15       59324.901 ±  620.924  ns/op
ThreadSleep.millisNanos       10000  avgt   15       68071.990 ±  537.880  ns/op

Not sure what is being measured here but this looks odd. The first 10x increase to 100 results in a 100x increase. But then the next 10x is almost no difference, and the 10x after that not that much difference.

@shipilev
Copy link
Member Author

ThreadSleep.millisNanos          10  avgt   15         583.475 ±    1.426  ns/op
ThreadSleep.millisNanos         100  avgt   15       57833.073 ±  146.406  ns/op
ThreadSleep.millisNanos        1000  avgt   15       59324.901 ±  620.924  ns/op
ThreadSleep.millisNanos       10000  avgt   15       68071.990 ±  537.880  ns/op

Not sure what is being measured here but this looks odd. The first 10x increase to 100 results in a 100x increase. But then the next 10x is almost no difference, and the 10x after that not that much difference.

Odd, right? This is how the 50us default timer slack value introduces jitter: you can turn around fast, or you can get stalled up to the timer slack. And this is precisely (pun intended) why I'd like to have some control over the timer slack :)

@shipilev
Copy link
Member Author

If this is really of use to people then I expect it would also be useful to know what the current setting is for a thread - perhaps either logged at thread creation, or perhaps included in one of the thread print_on functions?

Yes, that would be a good thing to add, see new commit. I tried to limit the fan-out for this change by printing the timer slack in Linux-specific thread init block, near the stack printing.

@dholmes-ora
Copy link
Member

And this is precisely (pun intended) why I'd like to have some control over the timer slack

I'm still not really understanding what this timer slack actually means in practice and how it relates to the sleep benchmark.

@shipilev
Copy link
Member Author

shipilev commented May 23, 2023

And this is precisely (pun intended) why I'd like to have some control over the timer slack

I'm still not really understanding what this timer slack actually means in practice and how it relates to the sleep benchmark.

Perhaps this LWN article would help: https://lwn.net/Articles/369549/

Arjan van de Ven has proposed an enhancement to the timer API - called 
[timer slack](http://lwn.net/Articles/369361/) - which should make it easier to coalesce
timer events. In essence, it adds a certain amount of fuzziness to timer 
expiration times, giving the kernel some flexibility in how the timers are scheduled.
That fuzziness is set with:

    void set_timer_slack(struct timer_list *timer, int slack_hz);

In essence, this call says that any timeout scheduled with the given timer can be
delayed by up to slack_hz jiffies. By default, the slack is set to 0.4% of the total 
timeout period - a very conservative value. When the timer is queued, the actual 
expiration time is determined by means of a simple algorithm to choose a well-defined
time within the slack period.

Larger timer slack => coarser timer expiration window => more opportunities to merge timers / keep CPUs at low power state => less accuracy / more efficiency.

Timer slack affects Thread.sleep, because we effectively use hi-res timers that are subject to timer slack to perform the sleep.

@dholmes-ora
Copy link
Member

Perhaps this LWN article would help ...

Thanks. So are timer events only coalesced within a process, or is this system wide? I'm guessing the basic operation is that if a thread requests a "timed wait" that would require a timer event at time T1, but there already exists a timer event for time T2, (T2 > T1) then if T2-T1 < timer-slack then an event for T1 is not created and we piggy-back on the event for T2 instead. So the lower the slack the more likely we get our own timer-event and so get a resolution closer to that of the h/w, but it also means we have more timer-events to process so that could lower the resolution again.

@shipilev
Copy link
Member Author

shipilev commented May 23, 2023

Perhaps this LWN article would help ...

Thanks. So are timer events only coalesced within a process, or is this system wide? I'm guessing the basic operation is that if a thread requests a "timed wait" that would require a timer event at time T1, but there already exists a timer event for time T2, (T2 > T1) then if T2-T1 < timer-slack then an event for T1 is not created and we piggy-back on the event for T2 instead. So the lower the slack the more likely we get our own timer-event and so get a resolution closer to that of the h/w, but it also means we have more timer-events to process so that could lower the resolution again.

I have not read the kernel sources for this, but my recollection is that timer lists are maintained per-CPU + there is a migration mechanics, which effectively makes timers system-wide. So timer slack allows piggy-backing on hi-res timers already pending on the system, which improves efficiency at the accuracy cost. I suspect that even if we are the only timer currently pending, the kernel would still apply the timer slack to it, hoping that some other timer would be able to piggy-back on us. Which explains why a single-threaded Thread.sleep workload is sensitive to timer slack even on a rather quiet system.

Yes, I do wonder if we can blow out the kernel time subsystem with very low timer slack value, but that is something that needed to be checked for the concrete workload. Linux helpfully provides us the API to tune the heuristics either way, so this PR gives us the handy ability to experiment with it :)

@bridgekeeper
Copy link

bridgekeeper bot commented Jun 20, 2023

@shipilev This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@shipilev
Copy link
Member Author

(Trying to revive this PR) Let me know if there are more comments to this.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is experimental and seems non-intrusive I can approve it. But how will this be tested? Can we at least sanity check setting it and reading back the value we set?

Thanks.

@shipilev
Copy link
Member Author

As this is experimental and seems non-intrusive I can approve it. But how will this be tested? Can we at least sanity check setting it and reading back the value we set?

Yes, and even more, see the new commit with a test.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all seems reasonable to me.

Thanks.

@openjdk
Copy link

openjdk bot commented Jul 6, 2023

@shipilev This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8307766: Linux: Provide the option to override the timer slack

Reviewed-by: dholmes, stuefe

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 10 new commits pushed to the master branch:

  • edb2be1: 8311279: TestStressIGVNAndCCP.java failed with different IGVN traces for the same seed
  • d072c40: 8311183: Remove unused mapping test files
  • 66d2736: 8307526: [JFR] Better handling of tampered JFR repository
  • 0616648: 8311035: CDS should not use dump time JVM narrow Klass encoding to pre-compute Klass ids
  • 6eba096: 8310999: Add @SInCE info in jdk.jsobject files
  • 6ebb0e3: 8311023: assert(false) failed: EA: missing memory path
  • 2cffef2: 8311290: Improve java.lang.ref.Cleaner rendered documentation
  • 22e17c2: 8311180: Remove unused unneeded definitions from globalDefinitions
  • cf82e31: 8311077: Fix -Wconversion warnings in jvmti code
  • 00ac46c: 8310645: CancelledResponse.java does not use HTTP/2 when testing the HttpClient

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jul 6, 2023
Copy link
Member

@tstuefe tstuefe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay.

@shipilev
Copy link
Member Author

shipilev commented Jul 6, 2023

RISC-V build failure looks like an infrastructure issue.

I re-tested current patch on ThreadSleep bench with various TimerSlack values, and they still make sense.

/integrate

@openjdk
Copy link

openjdk bot commented Jul 6, 2023

Going to push as commit 7173c30.
Since your change was applied there have been 13 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jul 6, 2023
@openjdk openjdk bot closed this Jul 6, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jul 6, 2023
@openjdk
Copy link

openjdk bot commented Jul 6, 2023

@shipilev Pushed as commit 7173c30.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@shipilev shipilev deleted the JDK-8307766-linux-timer-slack branch August 10, 2023 08:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

3 participants