-
Notifications
You must be signed in to change notification settings - Fork 38
8306334: Handle preemption of old cycle between filling and bootstrap phases #262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Welcome back wkemper! A progress list of the required criteria for merging this PR into |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this fixes JDK-8306334 (which you are re-enabling), wouldn't it make sense to use that bug id for this PR?
Yes! I'll update the PR title. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't say I fully understand the interactions of states and transitions to do justice to an actual review, but the changes look good to me. Thanks for the documentation comment noting the reason for the check and short-circuit without entering the bootstrap cycle, which may be resumed later.
I left a few comments that might be useful, but they are more in the nature of questions from this newbie reader of this code, rather than any suggestions.
@@ -533,11 +533,31 @@ void ShenandoahControlThread::service_concurrent_old_cycle(const ShenandoahHeap* | |||
assert(old_generation->state() == ShenandoahOldGeneration::BOOTSTRAPPING, "Finished with filling, should be bootstrapping"); | |||
} | |||
case ShenandoahOldGeneration::BOOTSTRAPPING: { | |||
// It is possible for a young generation request to preempt this nascent old |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is helpful.
There is an ASCII art of an intended state transition diagram for the old generation states in shenandoahOldGeneration.cpp
. It would be a good idea to update that diagram with possible premptible states and also the state transitions that might occur as a result of cancellations that might occur outside of the pre-emptible states. (I assume that cancellations can only interrupt and reset us out of pre-emptible states, not from any other, i.e. non-pre-emptible state.)
Similarly, at lines 444-458 above is another, presumably for Controller thread states. Does that figure get new transitions as a result of this change?
I see this documentation comment further below at lines 607-610:
// We can only tolerate being cancelled during concurrent marking or during preparation for mixed
// evacuation. This flag here (passed by reference) is used to control precisely where the regulator
// is allowed to cancel a GC.
ShenandoahOldGC gc(generation, _allow_old_preemption);
The comment says "cancel", but uses a flag _allow_old_preemption
. When I first saw this I was a bit confused, until I realized that cancellations will be see only in the pre-emptible states, and ignored in all other states until we have completed the work for that state and transition out of it, or if we are otherwise in a pre-emptible state.
Your description below hints at these interactions, I think, but it would be great if this is documented clearly somewhere (may be it is, somewhere in either old generation or in the control thread). It would also be nice to explain, in case this makes sense, if there any possible correspondences between states of the control thread and those of the old generation state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the diagram depicts preemption as the "YOUNG GC" boxes associated with "FILLING" and "MARKING". The comment on the diagram explains that an allocation failure may interrupt any state and that they all lead to degenerated/full GCs. These were omitted from the diagram to reduce clutter.
The new state in the control thread isn't strictly necessary for this fix, but it prevents the regulator from trying to start a concurrent young cycle during the bootstrap cycle (which is a concurrent young cycle).
The implementation of all this could certainly stand some readability improvement. Essentially, there are two reasons an old cycle may be cancelled:
- Allocation failure. This can happen at any phase of the old cycle and always results in at least a degenerated cycle.
- Preemption. If the regulator thread detects a need to run a young generation it may "preempt" the old cycle and run a young cycle. This does not result in a degenerated cycle, but it pauses the old cycle. This requires some coordination with the old collection and so is only supported during the filling and marking phases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your description & responses above; I am slowly getting the hang of the use of the terms and the supporting code.
It would be good to use the terms "cancellation" and "preemption" uniformly everywhere. Cancellibility and Pre-emptibility seem, to me, to be properties of a phase (or state) of an old collection. Here is my understanding of these two terms. Cancellation and Pre-emption are actions that may be taken on phases (states) that are respectively cancellable or pre-emptible.
Cancellation is stronger than pre-emption in that we transition out of the old generation state where cancellation happened, and the old generation collection is commandeered because of that cancellation (in some cases, resulting in a degenerate collection and in other cases a full collection).
Pre-emption, on the other hand, merely "interrupts" (or suspends) an ongoing old generation collection while leaving it in the same state, for young collection to do its work. Persumably in the happy case, the old generation collection continues from its interruption point.
In the odd case, a pre-emption may be followed by a cancellation.
Pre-emption can happen in only a subset of the states, viz. "filling" and "marking". In the non-pre-emptible cases, the pre-emption happens when the work in the state is completed and the old generation collection has proceeded to a pre-emption point, which can be at the point of transition into the next state (either pre-emptible or not).
Cancellation must be, and is, supported in any state of the old generation collection, and subsequent actions reset the state of the old generation collection to an initial (idle) state.
The term "degeneration" applies to a collection that follows an allocation failure (either by a mutator allocating in the young generation, or a mutator or GC thread allocating in the survivor space or in the old generation but not finding any space to do that allocation), which results in the cancellation of an ongoing collection cycle which would have normally produced that space (usually an old collection cycle when the allocation failure was while servicing a young collection, but could also be a young collection cycle when a mutator was unable to allocate in the young generation), and is followed by work done to complete that cycle with the failed allocation suspended until space has been freed up by that collection.
If I nailed down my understanding of the use of these terms a little, I would probably understand their relationship more easily. :-)
I'll follow up offline with y'all to make sure these are clear in my head.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an accurate description of preemption and cancellation. The two concepts share some code paths their representation is not as distinct as it could be. There is one other nuance to consider: a degenerated young cycle does not necessarily result in the cancellation of the old cycle. Really, it is only global collections (be it concurrent, degenerated or a full gc) that cause the old collection to be entirely abandoned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, that makes sense; thanks @earthling-amzn !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I have observed in various performance test workloads that the bootstrap GC takes "significantly longer" than a normal young GC. Part of this may be the requirement to coalesce and fill. Another part is probably that we are not ignoring references to old-gen during the concurrent mark effort.
In any case, one of the "heuristics" I have implemented in the expand-old-on-demand is to preceded every bootstrap GC with a young GC which is immediately followed by the bootstrap GC. This reduces the chance that we'll run out of mutator memory while doing the bootstrap and also reduces the chance that a young-gc will "trigger" while we're doing c&f.
@earthling-amzn This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 1 new commit pushed to the
Please see this link for an up-to-date comparison between the source branch of this pull request and the ➡️ To integrate this PR with the above commit message to the |
/integrate |
@earthling-amzn Pushed as commit 64d14f3. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
In the case when a request to run a young cycle arrives after making the old generation parseable but before disallowing preemption, the preemption request would cause the young bootstrap cycle to be preempted (not cancelled). This is not expected and would result in the old marking phase beginning with stale oops in the young mark queues after the mark bitmaps were cleared. This ultimately triggers assertions that such objects should be marked.
With this change if we detect a cancellation between filling and bootstrapping we make a distinction between:
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/shenandoah.git pull/262/head:pull/262
$ git checkout pull/262
Update a local copy of the PR:
$ git checkout pull/262
$ git pull https://git.openjdk.org/shenandoah.git pull/262/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 262
View PR using the GUI difftool:
$ git pr show -t 262
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/shenandoah/pull/262.diff
Webrev
Link to Webrev Comment