Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8274338: com/sun/jdi/RedefineCrossEvent.java failed "assert(m != __null) failed: NULL mirror" #5935

Closed
wants to merge 3 commits into from

Conversation

coleenp
Copy link
Contributor

@coleenp coleenp commented Oct 13, 2021

The MultiArray_lock isn't held when restoring shared arrays, so an observer can find one that isn't completely restored.
Tested with tier1-3 in progress, and CDS tests locally. Not really reproduceable otherwise even with sleeps.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8274338: com/sun/jdi/RedefineCrossEvent.java failed "assert(m != __null) failed: NULL mirror"

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/5935/head:pull/5935
$ git checkout pull/5935

Update a local copy of the PR:
$ git checkout pull/5935
$ git pull https://git.openjdk.java.net/jdk pull/5935/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 5935

View PR using the GUI difftool:
$ git pr show -t 5935

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/5935.diff

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Oct 13, 2021

👋 Welcome back coleenp! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr label Oct 13, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Oct 13, 2021

@coleenp The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot label Oct 13, 2021
@mlbridge
Copy link

@mlbridge mlbridge bot commented Oct 13, 2021

Webrevs

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Hi Coleen,

Given the array_klasses() is accessed lock-free (using acquire/release semantics) I don't see how adding this lock prevents a partially restored array from being observed? The lock prevents concurrent restoration operations, and prevents restoration concurrently with creation (if that is even possible), but readers are still lock-free AFAICS.

Thanks,
David

@@ -2584,6 +2584,9 @@ void InstanceKlass::restore_unshareable_info(ClassLoaderData* loader_data, Handl
constants()->restore_unshareable_info(CHECK);

if (array_klasses() != NULL) {
// To get a consistent list of classes we need MultiArray_lock to ensure
// array classes aren't observed while they are being created.
Copy link
Member

@dholmes-ora dholmes-ora Oct 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/created/restored

Copy link
Contributor Author

@coleenp coleenp Oct 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@coleenp
Copy link
Contributor Author

@coleenp coleenp commented Oct 13, 2021

If you look at everywhere array classes are created, there's a MultiArray_lock protecting them until the mirror is created so that it seems atomic when added to the ClassLoaderData::_klasses and the mirror is created and added to the klass. The JVMTI code locks MultiArray_lock to iterate through the classes (with a similar comment, actually copied).
The restore_unshareable_info code doesn't lock restoring the mirror and adding the array Klass to the ClassLoaderData, but it should. The lock could be lower down but just taking it once seemed just as good and in a better place with less conditionals.

Copy link
Member

@iklam iklam left a comment

LGTM

@dholmes-ora
Copy link
Member

@dholmes-ora dholmes-ora commented Oct 14, 2021

If the JVMTI code already acquires the lock to iterate through the classes then I can see why taking the lock in restore_unshareable_info might fix the observed problem. But that begs the question as to why JVMTI, as an observer, needs to take the lock in the first place, when these objects should not be observable until after they have been safely published? If the objects are not being safely published then other observers may also need to take out this lock. The real bug here seems to be that instanceKlasses can be found and inspected before InstanceKlass::restore_unshareable_info has been applied to that instanceKlass! Looking at Klass::restore_unshareable_info it has this:

    // Add to class loader list first before creating the mirror
    // (same order as class file parsing)
    loader_data->add_class(this);

And that to me seems a bug as now the class can be found through the CLD before it is ready to be observed.

You fixed the current problem where JVMTI stumbles over a null mirror related to an array class, but what other inappropriately set data in the instanceKlass might it also look at?

@iklam
Copy link
Member

@iklam iklam commented Oct 14, 2021

If the JVMTI code already acquires the lock to iterate through the classes then I can see why taking the lock in restore_unshareable_info might fix the observed problem. But that begs the question as to why JVMTI, as an observer, needs to take the lock in the first place, when these objects should not be observable until after they have been safely published? If the objects are not being safely published then other observers may also need to take out this lock. The real bug here seems to be that instanceKlasses can be found and inspected before InstanceKlass::restore_unshareable_info has been applied to that instanceKlass! Looking at Klass::restore_unshareable_info it has this:

    // Add to class loader list first before creating the mirror
    // (same order as class file parsing)
    loader_data->add_class(this);

And that to me seems a bug as now the class can be found through the CLD before it is ready to be observed.

You fixed the current problem where JVMTI stumbles over a null mirror related to an array class, but what other inappropriately set data in the instanceKlass might it also look at?

Yes, I agree that's fishy. But I think the bug may not be the call to add_class. If you look at ClassFileParser::fill_instance_klass(), it also does this very early, before many fields of the class are initialized:

  const bool publicize = !is_internal();
  _loader_data->add_class(ik, publicize);

whereas Klass::restore_unshareable_info always uses publicize==true. Could that be an issue?

Also, when a class is parsed, the mirror is created almost at the very end. However, for archived classes, the mirror is created when only the Klass portion are initialized. Maybe we should delay the mirror creation to a later time?

@dholmes-ora
Copy link
Member

@dholmes-ora dholmes-ora commented Oct 14, 2021

whereas Klass::restore_unshareable_info always uses publicize==true. Could that be an issue?

The definition of publicize is just:

publicize = !is_internal();

and I can't see how that would change the visibility in terms of the JVMTI code.

Maybe we should delay the mirror creation to a later time?

I think there should be a lot of similarity between the process for regular loading and the loading from the archive. But ultimately both should have the same property: no publishing of the klass until after it is fully "initialized". But neither process seems to ensure that, which is puzzling ... unless I'm not looking high enough up for some locking in the process in both cases.

Thinking further on the fix ... without knowing exactly how the assertion failure arises, it seems to me that the fix cannot be complete. If the JVMTI code is racing with the execution of restore_unshareable_info and the setting of the mirror, then even with the lock added the JVMTI code could still get there first and find the mirror not set.

@dholmes-ora
Copy link
Member

@dholmes-ora dholmes-ora commented Oct 14, 2021

AFAICS publicize only affects logging in add_class.

@dholmes-ora
Copy link
Member

@dholmes-ora dholmes-ora commented Oct 14, 2021

> This is the assert it gets in ClassLoaderData::loaded_classes_do, where this is the NULL CLD.
> 359 if (k->is_array_klass() || (k->is_instance_klass() && InstanceKlass::cast(k)->is_loaded())) {
> 360 #ifdef ASSERT
> 361 oop m = k->java_mirror();
> 362 assert(m != NULL, "NULL mirror"); 

Maybe it is the assertion that is wrong? If we are actually adding incomplete classes to the CLD such that they are visible to traversal, then the code should not be assuming they are complete and a NULL mirror may be perfectly acceptable?

@coleenp
Copy link
Contributor Author

@coleenp coleenp commented Oct 14, 2021

Thanks Ioi for the review and for answering David's questions. Let me try to answer further.

In ClassLoaderData::loaded_classes_do, the InstanceKlass is added to the CLD but it's not marked as 'loaded' until the mirror is created, so it won't be followed. The array case has no such initialization state field, so both adding to CLD and updating the mirror must be as-if atomic. This is accomplished by locking the MultiArray_lock.

In the creation case, the ArrayKlass is added to the CLD after the mirror is created, still with the lock. In the restore_unshareable_info, the class is added to the CLD first because it shares code with InstanceKlass (it's in Klass::restore_unshareable_info). We could add some 'if' statements in Klass::restore_unshareable_info to fix the order of the ArrayKlass case, but taking out the lock will still be needed to be consistent with the creation case, and is sufficient here also.

@coleenp
Copy link
Contributor Author

@coleenp coleenp commented Oct 14, 2021

Also, I would not like to take out the null check for this case. Having code find a null mirror further down will likely cause problems in whatever the do_klass() callback wants (in this case, likely wants the mirror).

@dholmes-ora
Copy link
Member

@dholmes-ora dholmes-ora commented Oct 14, 2021

I will state again: If the JVMTI code is racing with the execution of restore_unshareable_info and the setting of the mirror, then even with the lock added the JVMTI code could still get there first and find the mirror not set.

The added lock simply ensures that the JVMTI code and the restore_unshareable_info code cannot execute concurrently - it does not guarantee that the JVMTI code can't execute until after restore_unshareable_info.

@iklam
Copy link
Member

@iklam iklam commented Oct 14, 2021

I will state again: If the JVMTI code is racing with the execution of restore_unshareable_info and the setting of the mirror, then even with the lock added the JVMTI code could still get there first and find the mirror not set.

The added lock simply ensures that the JVMTI code and the restore_unshareable_info code cannot execute concurrently - it does not guarantee that the JVMTI code can't execute until after restore_unshareable_info.

For an InstanceKlass k to be iterated on by ClassLoaderData::loaded_classes_do(), k->is_loaded() must be true. The class enters the "loaded" state when SystemDictionay::add_to_hierarchy(k) is called, which happens after k->restore_unshareable_info() has completed. So the current implementation is safe w.r.t. InstanceKlasses.

For ArrayKlasses, they would be kind of safe after this PR, as the JVMTI code will not see an ArrayKlasses that's in the middle of restore_unshareable_info(). However, it will still be possible for the JVMTI code to see the [LFooBar; class without seeing the FooBar; class. This may produce unexpected results.

@dholmes-ora
Copy link
Member

@dholmes-ora dholmes-ora commented Oct 14, 2021

We hit an assertion failure because execution of loaded_klasses_do by JVMTI encountered a shared array class that had a NULL mirror. So two things need to be known:

  1. At what point does that shared array class become visible to loaded_klasses_do? and
  2. At what point does the shared array class have its mirror set?

Either we must guarantee that the mirror is set before the array class is visible, or the two actions must be atomic with respect to the execution of loaded_klasses_do. The latter implies that any locking must cover both actions.

@coleenp
Copy link
Contributor Author

@coleenp coleenp commented Oct 15, 2021

Yes:

  1. the shared array class becomes visible to loaded_klasses_do when ClassLoaderData::add_class() is called in Klass::restore_unshareable_info
  2. the shared array class has its mirror set afterward in that same function: Klass::restore_unshareable_info

The guarantee that the two actions are atomic wrt to loaded_classes_do are because the MultiArray_lock is held while both 1 and 2 are executed. The lock keeps 1 and 2 together. The lock keeps Jvmti from iterating through the CLDG while the ArrayKlass is in an inconsistent state.

@iklam you're right - we'd see the ObjArrayKlass first since the InstanceKlass will be added to the CLD before its marked as is_loaded(). I don't know what the result of that would be. You could open a new bug for that to investigate that and try to write a test case (using JVMTI that would show if it got confused).

@dholmes-ora
Copy link
Member

@dholmes-ora dholmes-ora commented Oct 15, 2021

@coleenp , @iklam thanks for your patience on this. My confusion/concern was around whether loaded_classes_do could find the ik and from that use ik->array_klasses() to get to the ak and then find it has no mirror. But Coleen assures me that we never do that and will only access the ak via loaded_classes_do directly, which is safe due to the additional locking added.

Thanks,
David

@openjdk
Copy link

@openjdk openjdk bot commented Oct 15, 2021

@coleenp This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8274338: com/sun/jdi/RedefineCrossEvent.java failed "assert(m != __null) failed: NULL mirror"

Reviewed-by: dholmes, iklam

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 26 new commits pushed to the master branch:

  • c355704: 8041125: ColorConvertOp filter much slower in JDK 8 compared to JDK7
  • 9623d5b: 8275265: java/nio/channels tests needing large heap sizes fail on x86_32
  • 21df412: 8275240: Change nested classes in jdk.attach to static nested classes
  • a16f2d0: 8272908: Missing coverage for certain classes in com.sun.org.apache.xml.internal.security
  • ede3f4e: 8274795: AArch64: avoid spilling and restoring r18 in macro assembler
  • 40d69f0: 8254267: javax/xml/crypto/dsig/LogParameters.java failed with "RuntimeException: Unexpected log output:"
  • 54b8870: 8275035: Clean up worker thread infrastructure
  • 3b0b6ad: 8275226: Shenandoah: Relax memory constraint for worker claiming tasks/ranges
  • 8d9004b: 8274781: Use monospace font for enclosing interface
  • 333c469: 8275262: [BACKOUT] AArch64: Implement string_compare intrinsic in SVE
  • ... and 16 more: https://git.openjdk.java.net/jdk/compare/124f82377ba93359bc59118ee315ba194080fa92...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Oct 15, 2021
@iklam
Copy link
Member

@iklam iklam commented Oct 15, 2021

@iklam you're right - we'd see the ObjArrayKlass first since the InstanceKlass will be added to the CLD before its marked as is_loaded(). I don't know what the result of that would be. You could open a new bug for that to investigate that and try to write a test case (using JVMTI that would show if it got confused).

I filed https://bugs.openjdk.java.net/browse/JDK-8275318 "loaded_classes_do may see ArrayKlass before InstanceKlass is loaded"

iklam
iklam approved these changes Oct 15, 2021
@coleenp
Copy link
Contributor Author

@coleenp coleenp commented Oct 15, 2021

Thank you David and Ioi.
/integrate

@openjdk
Copy link

@openjdk openjdk bot commented Oct 15, 2021

Going to push as commit 172aed1.
Since your change was applied there have been 30 commits pushed to the master branch:

  • ced7909: 8275286: Check current thread when calling JRT methods that expect it
  • c0f3e1d: 8271071: accessibility of a table on macOS lacks cell navigation
  • 4cb7124: 8262912: ciReplay: replay does not simulate unresolved classes
  • 322b130: 8275106: Cleanup Iterator usages in java.desktop
  • c355704: 8041125: ColorConvertOp filter much slower in JDK 8 compared to JDK7
  • 9623d5b: 8275265: java/nio/channels tests needing large heap sizes fail on x86_32
  • 21df412: 8275240: Change nested classes in jdk.attach to static nested classes
  • a16f2d0: 8272908: Missing coverage for certain classes in com.sun.org.apache.xml.internal.security
  • ede3f4e: 8274795: AArch64: avoid spilling and restoring r18 in macro assembler
  • 40d69f0: 8254267: javax/xml/crypto/dsig/LogParameters.java failed with "RuntimeException: Unexpected log output:"
  • ... and 20 more: https://git.openjdk.java.net/jdk/compare/124f82377ba93359bc59118ee315ba194080fa92...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Oct 15, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels Oct 15, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Oct 15, 2021

@coleenp Pushed as commit 172aed1.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@coleenp coleenp deleted the unshareable branch Oct 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot integrated
3 participants