New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8255984: Shenandoah: "adaptive" heuristic is prone to missing load spikes #1099
8255984: Shenandoah: "adaptive" heuristic is prone to missing load spikes #1099
Conversation
The reactive heuristic maintains a margin of error and a spike detection mechanism to trigger cycles somewhat more aggressively than the 'adaptive' heuristic. This heuristic 'reacts' to the outcome of GC cycles by adjusting the sensitivity of the triggers.
Hi @earthling-amzn, welcome to this OpenJDK project and thanks for contributing! We do not recognize you as Contributor and need to ensure you have signed the Oracle Contributor Agreement (OCA). If you have not signed the OCA, please follow the instructions. Please fill in your GitHub username in the "Username" field of the application. Once you have signed the OCA, please let us know by writing If you already are an OpenJDK Author, Committer or Reviewer, please click here to open a new issue so that we can record that fact. Please use "Add GitHub user earthling-amzn" as summary for the issue. If you are contributing this work on behalf of your employer and your employer has signed the OCA, please let us know by writing |
@earthling-amzn The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command. |
/covered |
Thank you! Please allow for a few business days to verify that your employer has signed the OCA. Also, please note that pull requests that are pending an OCA check will not usually be evaluated, so your patience is appreciated! |
Hi, Please contact me at Dalibor.topic@oracle.com so that I can verify your GitHub account. Thanks! |
This looks interesting. I have to look closer at this, but maybe we should instead fold this to |
I think folding it in to |
While waiting for merge to "adaptive", here is a round of minor things from the first read.
src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp
Outdated
Show resolved
Hide resolved
src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp
Outdated
Show resolved
Hide resolved
src/hotspot/share/gc/shenandoah/heuristics/shenandoahReactiveHeuristics.cpp
Outdated
Show resolved
Hide resolved
src/hotspot/share/gc/shenandoah/heuristics/shenandoahReactiveHeuristics.cpp
Outdated
Show resolved
Hide resolved
src/hotspot/share/gc/shenandoah/heuristics/shenandoahReactiveHeuristics.cpp
Outdated
Show resolved
Hide resolved
src/hotspot/share/gc/shenandoah/heuristics/shenandoahReactiveHeuristics.cpp
Outdated
Show resolved
Hide resolved
- Remove unnecessary log level checks - More idiomatic style
Thanks for updates. I would like to dive in with some testing. Second read nits below, meanwhile.
src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp
Outdated
Show resolved
Hide resolved
src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp
Outdated
Show resolved
Hide resolved
Oh, and change PR title to "8255984: Shenandoah: "adaptive" heuristic is prone to missing load spikes", so that bots know what issue it refers to. |
Webrevs
|
This lets the heuristics update state without const_casts.
Though we tested this with several benchmarks and internal services, these changes were primarily developed using the HyperAlloc benchmark. The attached chart shows stacked histograms for the heuristics that triggered a collection cycle during a run of HyperAlloc. Each test run was given a 3g heap. HyperAlloc targeted a 1g/s allocation rate (-a 1000) with a max object size of 1000 bytes (-x 1000). It was configured to maintain 1gb of live objects on the heap (-s 1024) and it ran for two minutes (-d 120). Each row in the chart is for a different allocation smoothness factor. The top row has an allocation smoothness of zero (i.e., most spiky), the bottom row has an allocation smoothness of 0.4, still spiky, but manageable. Each column in the chart is a different heuristic. On the left is the original |
The more I study this patch, the more I like it, good job. Another round of changes below. I fixed a few mentioned issues in my private working copy while doing initial performance testing. Would be good to have an updated PR for testing too.
src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp
Outdated
Show resolved
Hide resolved
src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp
Outdated
Show resolved
Hide resolved
src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp
Outdated
Show resolved
Hide resolved
src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp
Outdated
Show resolved
Hide resolved
src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp
Outdated
Show resolved
Hide resolved
src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp
Outdated
Show resolved
Hide resolved
src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp
Outdated
Show resolved
Hide resolved
src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp
Outdated
Show resolved
Hide resolved
src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp
Outdated
Show resolved
Hide resolved
src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp
Outdated
Show resolved
Hide resolved
Yes, it mostly matches what I would expect from this heuristic improvement. As it stands right now, "adaptive" balances way too closely to the cliff, and so it is expected that improvement would make GC cycles more frequent to move GC triggers away from it. On the chart, "Headroom" probably means "Average allocation rate" trigger? It still a bit sad we cannot handle |
I don't understand why GH actions do not test your patch :/ Can you go to https://github.com/earthling-amzn/jdk/settings/ and check that both "Actions" and "Secrets" do not have anything special defined? "Actions" should say "Allow all actions", "Secrets" should not have JDK_SUBMIT_PLATFORMS defined. |
"Allow all actions" is checked, nothing defined for secrets. Is it because the title didn't have the JBS issue number in it when I opened the PR? |
Unlikely. If you go to https://github.com/earthling-amzn/jdk/actions, can you trigger the workflow by hand? |
Ah, it says this at the actions link:
I've re-enabled them and the pre-submit action is running now. |
I'm also running my own test suite (heapotheysis/HyperAlloc) with different permutations for the margin of error, spike threshold and sample frequency. I'll run the whole thing again without any headroom buffer. Will have the results by end of day. |
Finished running more tests. My preference is still to keep the headroom/penalties in place. It can run without these, but there is a modest increase in degenerated cycles. If we did remove these safeguards, I'd recommend increasing the default sample frequency to 100hz. |
Okay then! Time to integrate and do anything else in follow-ups.
@earthling-amzn This change now passes all automated pre-integration checks. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 114 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@shipilev) but any other Committer may sponsor as well.
|
Without this, the average allocation rate will be much higher than it should be.
This causes too many false positives and triggers unnecessary cycles.
Sorry, added one bug fix and another small change. The bug caused the average allocation rate to be much higher than it should be. The other change is to only consider the values which are actually sampled as a "spike" (as opposed to previously when it would recompute the instantaneous rate on every call to |
Okay, I have a minor suggestion that seems still preserve the semantics. It is generally looks good, the testing passes, and benchmarks look okay (some regressions, but expected). Feel free to integrate.
src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp
Outdated
Show resolved
Hide resolved
/integrate |
@earthling-amzn |
/sponsor |
Congratulations! If you have time, please work on removing old alloc spike threshold and penalties handling. |
@shipilev @earthling-amzn Since your change was applied there have been 114 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit aac5c2a. |
This change adds a "reactive" heuristic for triggering concurrent GC cycles.
The reactive heuristic maintains a margin of error and an allocation spike detection mechanism to trigger cycles somewhat more aggressively than the 'adaptive' heuristic. This heuristic 'reacts' to the outcome of GC cycles by adjusting the sensitivity of the triggers.
JBS ticket is here: https://bugs.openjdk.java.net/browse/JDK-8255984
The "adaptive" heuristic remains the default.
Steps to reproduce and test will follow shortly (there are no new jtreg test failures for Shenandoah with this change).
Progress
Testing
Issue
Reviewers
Download
$ git fetch https://git.openjdk.java.net/jdk pull/1099/head:pull/1099
$ git checkout pull/1099