Skip to content

8308387: CLD created and unloading list sharing _next node pointer leads to concurrent YC missing CLD roots #14241

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 12 commits into from

Conversation

xmas92
Copy link
Member

@xmas92 xmas92 commented May 31, 2023

JDK-8307106 introduced the ability to walk the created CLDs list in the CLDG without a lock. This was primarily introduced to allow lockless concurrent CLD roots scanning for young collections in generational ZGC. However because the CLD _next node pointer is shared between the two list this can lead to a concurrent iteration of the created CLDs missing list entries.

This change introduces a second _unloading_next node pointer which is used for the unloading CLDs list. The set_next is now maintains the invariant that it only ever unlinks is_unloading() CLDs and maintains a consistent view of the tail list for anyone reading the list concurrently.

Testing: tier1-3 and tier1-7 with Generational ZGC


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8308387: CLD created and unloading list sharing _next node pointer leads to concurrent YC missing CLD roots

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/14241/head:pull/14241
$ git checkout pull/14241

Update a local copy of the PR:
$ git checkout pull/14241
$ git pull https://git.openjdk.org/jdk.git pull/14241/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 14241

View PR using the GUI difftool:
$ git pr show -t 14241

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/14241.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented May 31, 2023

👋 Welcome back aboldtch! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label May 31, 2023
@openjdk
Copy link

openjdk bot commented May 31, 2023

@xmas92 The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label May 31, 2023
@mlbridge
Copy link

mlbridge bot commented May 31, 2023

Webrevs

Copy link
Member

@stefank stefank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the patch looks good. I've added a suggestion for a rewording of the comment about the two list. Feel free to skip it if you liked your comment better.

@openjdk
Copy link

openjdk bot commented May 31, 2023

@xmas92 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8308387: CLD created and unloading list sharing _next node pointer leads to concurrent YC missing CLD roots

Reviewed-by: stefank, coleenp, dholmes, eosterlund

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 59 new commits pushed to the master branch:

  • 2bb1972: 8308954: [JVMCI] code installation increments decompile_count for call_site_target_value failures
  • 0ab0963: 8308469: [PPC64] Implement alternative fast-locking scheme
  • ec55539: 8309138: Fix container tests for jdks with symlinked conf dir
  • e827164: 8309146: extend JDI StackFrame.setValue() and JDWP StackFrame.setValues minimal support for virtual threads
  • be36096: 8302822: Method/Field/Constructor/RecordComponent::getGenericInfo() is not thread safe
  • c6f20db: 8308232: nsk/jdb tests don't pass -verbose flag to the debuggee
  • d987176: 8307794: Test for HSS/LMS Signature Verification
  • 050425b: 8298127: HSS/LMS Signature Verification
  • a6109bf: 8308856: jdk.internal.classfile.impl.EntryMap::nextPowerOfTwo math problem
  • 6adc242: 8308943: jdk.internal.le build fails on AIX
  • ... and 49 more: https://git.openjdk.org/jdk/compare/78aac241b8a3f29111e2901e8b7fbadd502a31a9...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label May 31, 2023
Co-authored-by: Stefan Karlsson <stefan.karlsson@oracle.com>
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels May 31, 2023
@openjdk openjdk bot added ready Pull request is ready to be integrated rfr Pull request is ready for review labels May 31, 2023
Copy link
Contributor

@coleenp coleenp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Is there a way to add a test?

@@ -268,14 +269,6 @@ inline void assert_is_safepoint_or_gc() {
"Must be called by safepoint or GC");
}

void ClassLoaderDataGraph::cld_unloading_do(CLDClosure* cl) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for noticing that we're not using this anymore and cleaning it up. JFR used to use this.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is much clearer. Thanks for fixing the bug and cleaning it up.

@dholmes-ora
Copy link
Member

Is the YC in the synopsis meant to be GC? If so please fix in JBS and the PR. Thanks

@xmas92
Copy link
Member Author

xmas92 commented Jun 1, 2023

Looks good. Is there a way to add a test?

The timing window here is really narrow. It requires the young root scan iteration to reach a dead CLD before it is unlinked by the old collection and hold on to it until after it is linked onto the unloading list, the young root scan filters out dead CLDs so this is just a couple of instructions. It is possible to make the the race more reproducible with a couple of sleeps in the VM and running frequent concurrent young collections and class unloading.

There might be a possibility to abstract the linked list implementation and make it mockable in gtests, and then have gtests which stresses concurrent reads and unlinks and asserts invariants. However that would require a rewrite that does not feel worth it.

Is the YC in the synopsis meant to be GC? If so please fix in JBS and the PR. Thanks

YC is suppose to abbreviate Young Collection in a generational collector. But maybe YC/OC are not common enough terms to be used in a general bug report.
I will at least extend the description on JBS to reflect the bug, not just the symptom (the crash).

Copy link
Contributor

@fisk fisk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Sorry for missing this the first round.

@coleenp
Copy link
Contributor

coleenp commented Jun 1, 2023

Looks good. Is there a way to add a test?

Better not to add a test if it's sensitive to timing. thanks.

@xmas92
Copy link
Member Author

xmas92 commented Jun 2, 2023

Thanks for all the reviews.
/integrate

@openjdk
Copy link

openjdk bot commented Jun 2, 2023

Going to push as commit 7b0a336.
Since your change was applied there have been 64 commits pushed to the master branch:

  • 60f3b87: 8309295: C2: MaxNode::signed_min() returns nullptr for int operands
  • 8007599: 8309093: Underscore with brackets
  • 5bd2af2: 8307478: Implementation of Prepare to Restrict The Dynamic Loading of Agents
  • 325940b: 8307105: JFileChooser InvalidPathException when selecting some system folders on Windows
  • 101bf22: 8308891: TestCDSVMCrash.java needs @requires vm.cds
  • 2bb1972: 8308954: [JVMCI] code installation increments decompile_count for call_site_target_value failures
  • 0ab0963: 8308469: [PPC64] Implement alternative fast-locking scheme
  • ec55539: 8309138: Fix container tests for jdks with symlinked conf dir
  • e827164: 8309146: extend JDI StackFrame.setValue() and JDWP StackFrame.setValues minimal support for virtual threads
  • be36096: 8302822: Method/Field/Constructor/RecordComponent::getGenericInfo() is not thread safe
  • ... and 54 more: https://git.openjdk.org/jdk/compare/78aac241b8a3f29111e2901e8b7fbadd502a31a9...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jun 2, 2023
@openjdk openjdk bot closed this Jun 2, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jun 2, 2023
@openjdk
Copy link

openjdk bot commented Jun 2, 2023

@xmas92 Pushed as commit 7b0a336.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

5 participants