8308387: CLD created and unloading list sharing _next node pointer leads to concurrent YC missing CLD roots #14241
👋 Welcome back aboldtch! A progress list of the required criteria for merging this PR into
I think the patch looks good. I've added a suggestion for a rewording of the comment about the two lists. Feel free to skip it if you like your comment better.
@xmas92 This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 59 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the
Co-authored-by: Stefan Karlsson <stefan.karlsson@oracle.com>
Looks good. Is there a way to add a test?
@@ -268,14 +269,6 @@ inline void assert_is_safepoint_or_gc() {
         "Must be called by safepoint or GC");
}

void ClassLoaderDataGraph::cld_unloading_do(CLDClosure* cl) {
Thanks for noticing that we're not using this anymore and cleaning it up. JFR used to use this.
This logic is much clearer. Thanks for fixing the bug and cleaning it up.
Is the YC in the synopsis meant to be GC? If so, please fix it in JBS and the PR. Thanks.
The timing window here is really narrow. It requires the young root scan iteration to reach a dead CLD before it is unlinked by the old collection, and to hold on to it until after it is linked onto the unloading list. Because the young root scan filters out dead CLDs, this window is just a couple of instructions. It is possible to make the race more reproducible with a couple of sleeps in the VM while running frequent concurrent young collections and class unloading. It might also be possible to abstract the linked list implementation and make it mockable in gtests, and then have gtests that stress concurrent reads and unlinks and assert invariants. However, that would require a rewrite that does not feel worth it.
YC is supposed to abbreviate Young Collection in a generational collector. But maybe YC/OC are not common enough terms to be used in a general bug report.
Looks good. Sorry for missing this in the first round.
Better not to add a test if it's sensitive to timing. Thanks.
Thanks for all the reviews.
Going to push as commit 7b0a336.
Your commit was automatically rebased without conflicts.
JDK-8307106 introduced the ability to walk the created CLDs list in the CLDG without a lock. This was primarily introduced to allow lockless concurrent CLD roots scanning for young collections in generational ZGC. However, because the CLD _next node pointer is shared between the two lists (created and unloading), a concurrent iteration of the created CLDs list can miss list entries.
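The failure mode can be illustrated with a deterministic, single-threaded sketch. This is not HotSpot code: `Node`, `SharedDemoResult`, and `shared_next_demo` are hypothetical names, and the "concurrent" reader is simulated by remembering its position before the unlink and resuming the walk afterwards.

```cpp
#include <vector>

// Hypothetical, simplified stand-in for a CLD (not the HotSpot class).
struct Node {
  explicit Node(int i) : id(i) {}
  int id;
  bool unloading = false;
  Node* next = nullptr;  // one pointer shared by the created and unloading lists
};

struct SharedDemoResult {
  std::vector<int> reader_saw;      // live nodes the resumed reader visited
  std::vector<int> unloading_list;  // contents of the unloading list
};

SharedDemoResult shared_next_demo() {
  Node a(1), b(2), c(3);
  a.next = &b;
  b.next = &c;             // created list: a -> b -> c
  b.unloading = true;      // b is dead

  // The reader has reached b just before b is unlinked, then is "paused".
  Node* reader_pos = &b;

  // The unloader unlinks b from the created list: a -> c.
  a.next = b.next;
  // It then pushes b onto the unloading list through the same pointer,
  // which overwrites b's link to c.
  Node* unloading_head = nullptr;
  b.next = unloading_head;
  unloading_head = &b;

  SharedDemoResult r;
  // The reader resumes from b and follows next, skipping dead nodes.
  for (Node* n = reader_pos->next; n != nullptr; n = n->next) {
    if (!n->unloading) r.reader_saw.push_back(n->id);
  }
  for (Node* n = unloading_head; n != nullptr; n = n->next) {
    r.unloading_list.push_back(n->id);
  }
  return r;
}
```

In this sketch the resumed reader never visits node 3, even though it is still live and still on the created list.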
This change introduces a second _unloading_next node pointer, which is used for the unloading CLDs list. set_next now maintains the invariant that it only ever unlinks is_unloading() CLDs, and it keeps a consistent view of the tail of the list for anyone reading the list concurrently.
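A minimal sketch of the two-pointer idea follows. It is an illustration under assumptions, not the patch itself: `Cld`, `SeparateDemoResult`, and `separate_next_demo` are hypothetical names, `std::atomic` stands in for HotSpot's own atomic primitives, and the check inside `set_next` is only one possible way to express the "only unlinks unloading nodes" invariant. The reader is again simulated by resuming a walk from a remembered position.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for a CLD (not the HotSpot class).
struct Cld {
  explicit Cld(int i) : id(i) {}
  int id;
  bool unloading = false;
  std::atomic<Cld*> next{nullptr};  // created list; followed by lock-free readers
  Cld* unloading_next = nullptr;    // unloading list; touched only by the unloader

  bool is_unloading() const { return unloading; }

  // Relink this node to n. Every node skipped over must be unloading, so a
  // reader standing on a skipped node still sees an intact tail via its next.
  void set_next(Cld* n) {
    for (Cld* cur = next.load(std::memory_order_relaxed); cur != n;
         cur = cur->next.load(std::memory_order_relaxed)) {
      assert(cur != nullptr && cur->is_unloading());
    }
    next.store(n, std::memory_order_release);
  }
};

struct SeparateDemoResult {
  std::vector<int> reader_saw;      // live nodes the resumed reader visited
  std::vector<int> unloading_list;  // contents of the unloading list
};

SeparateDemoResult separate_next_demo() {
  Cld a(1), b(2), c(3);
  a.next.store(&b);
  b.next.store(&c);        // created list: a -> b -> c
  b.unloading = true;      // b is dead

  Cld* reader_pos = &b;    // reader reached b, then is "paused"

  a.set_next(&c);          // unlink b; b.next is left untouched and still -> c
  Cld* unloading_head = nullptr;
  b.unloading_next = unloading_head;  // link via the separate pointer
  unloading_head = &b;

  SeparateDemoResult r;
  for (Cld* n = reader_pos->next.load(std::memory_order_acquire); n != nullptr;
       n = n->next.load(std::memory_order_acquire)) {
    if (!n->is_unloading()) r.reader_saw.push_back(n->id);
  }
  for (Cld* n = unloading_head; n != nullptr; n = n->unloading_next) {
    r.unloading_list.push_back(n->id);
  }
  return r;
}
```

Because the unlinked node keeps its original `next`, a reader that still holds it continues into the live tail and visits node 3, while the unloading list is built entirely through `unloading_next`.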
Testing: tier1-3 and tier1-7 with Generational ZGC
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/14241/head:pull/14241
$ git checkout pull/14241
Update a local copy of the PR:
$ git checkout pull/14241
$ git pull https://git.openjdk.org/jdk.git pull/14241/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 14241
View PR using the GUI difftool:
$ git pr show -t 14241
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/14241.diff