8270085: Suspend during block transition may deadlock if lock held #4828

Closed


@pchilano pchilano commented Jul 19, 2021

Hi all,

The following patch fixes deadlock issues that could occur when checking for suspension while holding VM locks. See the bug description for a concrete case. The solution is to avoid checking for suspend requests when using the TBIVM wrapper. The original patch was actually written by @robehn (now on vacation) and I only made small changes.

I ran tiers 1-6 in mach5 and am currently running tier 7. I also verified that the newly added test SuspendBlocked.java deadlocks with the current bits and passes with this patch.

Thanks,

Patricio


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8270085: Suspend during block transition may deadlock if lock held

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/4828/head:pull/4828
$ git checkout pull/4828

Update a local copy of the PR:
$ git checkout pull/4828
$ git pull https://git.openjdk.java.net/jdk pull/4828/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 4828

View PR using the GUI difftool:
$ git pr show -t 4828

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/4828.diff

bridgekeeper bot commented Jul 19, 2021

👋 Welcome back pchilanomate! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Jul 19, 2021

@pchilano The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Jul 19, 2021
@pchilano
Contributor Author

/label add hotspot-runtime

@pchilano
Contributor Author

/label remove hotspot

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Jul 19, 2021

openjdk bot commented Jul 19, 2021

@pchilano
The hotspot-runtime label was successfully added.

@openjdk openjdk bot removed the hotspot hotspot-dev@openjdk.org label Jul 19, 2021

openjdk bot commented Jul 19, 2021

@pchilano
The hotspot label was successfully removed.

@pchilano pchilano marked this pull request as ready for review July 19, 2021 15:51
@openjdk openjdk bot added the rfr Pull request is ready for review label Jul 19, 2021

mlbridge bot commented Jul 19, 2021

Webrevs

@dholmes-ora dholmes-ora left a comment

Hi Patricio,

Thanks for taking over this fix from @robehn! And thanks Robbin for the initial fix.

So in simple terms we must never suspend whilst holding a VM internal lock (or any critical resource). So the ThreadBlockInVM destructor, which runs after we acquired a contended lock, must not suspend, but the ThreadBlockInVMPreprocess that we use for ObjectMonitors must suspend (and will release the lock).
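The rule above can be illustrated with a small standalone sketch. This is not HotSpot code: `SketchThread`, `BlockGuardSketch`, and `acquire_contended` are hypothetical names, and `std::mutex` stands in for a VM internal lock. The point it tries to show is only that the guard's destructor runs while the contended lock is already held, so a guard created with `allow_suspend` set to `false` leaves a pending suspend request untouched.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical stand-in for a thread with a pending suspend request.
struct SketchThread {
  std::atomic<bool> suspend_requested{false};
  int suspend_count = 0;  // how many times a suspend request was honored
};

// Hypothetical RAII guard: its destructor runs when the thread leaves the
// "blocked" state and honors a suspend request only if allow_suspend is true.
class BlockGuardSketch {
  SketchThread& _thread;
  const bool _allow_suspend;
 public:
  BlockGuardSketch(SketchThread& thread, bool allow_suspend)
    : _thread(thread), _allow_suspend(allow_suspend) {}
  ~BlockGuardSketch() {
    if (_allow_suspend && _thread.suspend_requested.exchange(false)) {
      _thread.suspend_count++;  // a real VM would park the thread here
    }
  }
};

// Acquire a lock inside the guard's scope. The guard is destroyed after the
// lock is taken, so anything its destructor does happens with the lock held.
inline int acquire_contended(SketchThread& thread, std::mutex& vm_lock,
                             bool allow_suspend) {
  {
    BlockGuardSketch guard(thread, allow_suspend);
    vm_lock.lock();
  }  // guard destructor runs here, with vm_lock held
  int suspends_while_locked = thread.suspend_count;
  vm_lock.unlock();
  return suspends_while_locked;
}
```

With `allow_suspend` set to `false` the request stays pending until some later point where no lock is held; with `true` the sketch "suspends" while still owning the lock, which is the situation the fix is meant to rule out.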

I took a look at the other uses for ThreadBlockInVM. There are quite a few places where ThreadBlockInVM is used, some of which relate to threads that can't be suspended, but others where it looks very suspicious that we might have suspended before this change - so I think this is addressing a number of potential bugs.

I do wonder (perhaps separate RFE as it touches a number of files) whether ThreadBlockInVM should now be renamed ThreadBlockInVMNoSuspend to make it very clear the thread can't suspend? Though that might also mislead people as some of the logic is executed by threads that don't allow for suspension (not visible to Java code in general).

Overall I think this fix looks good, but I can't help but wonder whether the switch to using handshakes for suspend is still causing suspension checks in far more places than we used to check, and in potentially inappropriate (or at best unnecessary) places ... I'll take that thought up offline.

Thanks,
David

@@ -289,7 +291,7 @@ class ThreadBlockInVM {
   ThreadBlockInVMPreprocess<InFlightMutexRelease> _tbivmpp;
  public:
   ThreadBlockInVM(JavaThread* thread, Mutex** in_flight_mutex_addr = NULL)
-    : _ifmr(in_flight_mutex_addr), _tbivmpp(thread, _ifmr) {}
+    : _ifmr(in_flight_mutex_addr), _tbivmpp(thread, _ifmr, false) {}
Member

Please add comment:

false /* no suspend */) { }

Contributor Author

Done.

@@ -0,0 +1,58 @@
/*
* Copyright (c) 2020, 2021, Oracle and/or its affiliates. All rights reserved.
Member

New file so should only have one copyright year.

Contributor Author

Fixed.

public static void run_loop() {
WhiteBox wb = WhiteBox.getWhiteBox();
for (int i = 0; i < 100; i++) {
wb.lockAndBlock(false /* suspender */);
Member

There are two ways to document an argument like this:

  1. What it logically means e.g. (false /* not suspender */)
  2. What parameter it refers to - in which case, to avoid confusion with comments of type 1, I suggest we use a convention (already in use) (/* suspender= */ false)
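The two commenting styles can be put side by side in a short sketch. The function and enum below are hypothetical stand-ins (the reviewed call is the Java `wb.lockAndBlock(...)`, while this sketch uses C++ only to show the comment placement); the two calls pass the same value and differ only in how the bare `false` is documented.

```cpp
// Hypothetical stand-in for a call that takes a bare boolean argument.
enum class Role { Blocker, Suspender };

inline Role lock_and_block(bool suspender) {
  return suspender ? Role::Suspender : Role::Blocker;
}

// Style 1: the comment says what the value logically means.
inline Role style_logical_meaning() {
  return lock_and_block(false /* not suspender */);
}

// Style 2: the comment names the parameter the value is bound to.
inline Role style_parameter_name() {
  return lock_and_block(/* suspender= */ false);
}
```

Putting the comment before the value and ending it with `=` is what keeps style 2 from being misread as a style 1 comment.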

Contributor Author

Fixed.

openjdk bot commented Jul 20, 2021

@pchilano This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8270085: Suspend during block transition may deadlock if lock held

Reviewed-by: dholmes, coleenp

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 12 new commits pushed to the master branch:

  • 38694aa: 8270939: ProblemList java/lang/invoke/RicochetTest.java until JDK-8251969 is fixed
  • 754352f: 8270340: Base64 decodeBlock intrinsic for Power64 needs cleanup
  • 8cd0769: 8270875: Deprecate the FilterSpuriousWakeups flag so it can be removed
  • 534f005: 8268284: javax/swing/JComponent/7154030/bug7154030.java fails with "Exception: Failed to hide opaque button"
  • 00195b8: 8265604: Support unlinked classes in dynamic CDS archive
  • 7f35e5b: 8270869: G1ServiceThread may not terminate
  • c3519c3: Merge
  • c130451: 8269752: C2: assert(false) failed: Bad graph detected in build_loop_late
  • f644365: 8269689: Update --release 17 symbol information for JDK 17 build 31
  • 3fc761d: 8269032: Stringdedup tests are failing if the ergonomically select GC does not support it
  • ... and 2 more: https://git.openjdk.java.net/jdk/compare/e7cdfebbeebb274b28495b469f39d5874af45e65...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jul 20, 2021
@pchilano
Contributor Author

Hi David,

In general I think the details of what gets processed when transitioning from one state to another should not be exposed to users unless it is necessary. In this case I think most users of TBIVM don't really care whether suspend requests are processed or not, other than that TBIVM should do whatever is right, which here is to never process them. For the special cases of Java monitors and JVMTI raw monitors, where we specifically want to honor suspend requests, we have the special ThreadBlockInVMPreprocess. I actually looked into having a single TBIVM with an additional parameter to allow for suspend in those special cases. It wasn't straightforward because of the template, but I'll try to revisit that.

Thanks for reviewing this David!

Patricio

@dcubed-ojdk
Member

This fix appears to be baselined on https://github.com/openjdk/jdk
instead of JDK17. The bug is targeted at JDK17 so you probably need
to reparent the PR (if possible) or recreate it (ouch).

I'm going to go ahead and start my review anyway.

@pchilano
Contributor Author

pchilano commented Jul 20, 2021

Ok, I thought I was supposed to integrate it in 18 first and then backport it (?).

@dcubed-ojdk
Member

It's a P2 so you should request permission to integrate it in JDK17 (follow
the RDP2 rules) and integrate there first. It will be automatically forward
ported when @JesperIRL syncs from JDK17 -> JDK18.

// suspended in ~ThreadBlockInVM. This verifies we only suspend
// at the right place.
while (Atomic::cmpxchg(&_emulated_lock, 0, 1) != 0) {}
assert(_emulated_lock == 1, "Not locked");
Member

s/"Not locked"/"Must be locked"/

Contributor Author

Fixed.
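For readers unfamiliar with the emulated-lock loop in this thread, here is a minimal standalone sketch of the same spin pattern. It assumes `std::atomic` in place of HotSpot's `Atomic` class, and the names `emulated_lock`, `emulated_lock_acquire`, and `emulated_lock_release` are hypothetical. The loop retries a compare-and-swap from 0 to 1 until it succeeds, after which the caller owns the lock.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the test's emulated lock: 0 = free, 1 = held.
static std::atomic<int> emulated_lock{0};

// Spin until the value is changed from 0 to 1 by this caller.
inline void emulated_lock_acquire() {
  int expected = 0;
  while (!emulated_lock.compare_exchange_strong(expected, 1)) {
    // On failure, expected was overwritten with the observed value; reset it.
    expected = 0;
  }
  assert(emulated_lock.load() == 1 && "Must be locked");
}

// Release the lock by storing 0 again.
inline void emulated_lock_release() {
  emulated_lock.store(0);
}
```

A thread that is suspended between acquire and release keeps every other caller spinning, which is why the test can detect a suspend that happens at the wrong place.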

}
Atomic::store(&_emulated_lock, 0);
WB_END

//Some convenience methods to deal with objects from java
Member

nit typo: please add a space before 'Some' (not your bug, but since you are here...)

Contributor Author

Fixed.

Comment on lines 2641 to 2643

{CC"lockAndBlock", CC"(Z)V", (void*)&WB_LockAndBlock},

Member

Not sure why you added a blank line above and below...

Contributor Author

Removed.

@@ -441,6 +451,11 @@ bool HandshakeState::have_non_self_executable_operation() {
return _queue.contains(non_self_queue_filter);
}

bool HandshakeState::has_none_suspend_operation() {
Member

This name is bugging me: has_none_suspend_operation()

I think the function is supposed to return true if the queue contains
at least one non-suspend operation. If that's the case, then perhaps
it should be named:

has_a_non_suspend_operation()
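A small sketch of the semantics the proposed name implies, assuming the reviewer's reading is right. `OpSketch` and the `std::deque` queue are hypothetical; the real function is a member of `HandshakeState` and works on its own queue with a filter.

```cpp
#include <algorithm>
#include <deque>

// Hypothetical operation record; only the suspend/non-suspend distinction
// matters for this sketch.
struct OpSketch {
  bool is_suspend;
};

// Returns true if the queue holds at least one operation that is not a
// suspend request. An empty queue, or one with only suspend requests,
// yields false.
inline bool has_a_non_suspend_operation(const std::deque<OpSketch>& queue) {
  return std::any_of(queue.begin(), queue.end(),
                     [](const OpSketch& op) { return !op.is_suspend; });
}
```

Read this way, "has none suspend operation" could be taken to mean the opposite (no suspend operations at all), which is the ambiguity the rename avoids.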

Contributor Author

Fixed.

Contributor

Me too.

@@ -289,7 +291,7 @@ class ThreadBlockInVM {
   ThreadBlockInVMPreprocess<InFlightMutexRelease> _tbivmpp;
  public:
   ThreadBlockInVM(JavaThread* thread, Mutex** in_flight_mutex_addr = NULL)
-    : _ifmr(in_flight_mutex_addr), _tbivmpp(thread, _ifmr) {}
+    : _ifmr(in_flight_mutex_addr), _tbivmpp(thread, _ifmr, false /* no suspend */) {}
Member

@dholmes-ora asked in a different place for a change like this:

/* allow_suspend= */ false

I think that makes sense here too.

Contributor Author

Fixed.

Member

No I said there were two styles for doing this kind of commenting. I like this style and suggested the comment.

@coleenp coleenp left a comment

Just some comments and questions. Looks good!

}
// Handshakes cannot safely safepoint.
// The exception to this rule is the asynchronous suspension handshake.
// It by-passes the NSV by manually doing the transition.
Contributor

I have no idea what this comment means in this context. Below we take out a lock with _no_safepoint_check which is essentially a NSV. You just moved this comment so I don't suggest changing it at this time.

// actual suspend since Handshake::execute() above only installed
// the asynchronous handshake.
SafepointMechanism::process_if_requested(self);
}
Contributor

Is this an optimization? Or can the thread escape the suspend request?

/*
* @test SuspendBlocked
* @bug 8270085
* @library /testlibrary /test/lib
Contributor

I think this is referring to /testlibrary that I removed. Can you see if it's still needed?

@pchilano
Contributor Author

jdk17 PR: openjdk/jdk17#257

@pchilano
Contributor Author

Closing and using jdk17 PR: openjdk/jdk17#257

@pchilano pchilano closed this Jul 20, 2021
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org ready Pull request is ready to be integrated rfr Pull request is ready for review
4 participants