-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8270085: Suspend during block transition may deadlock if lock held #4828
Conversation
👋 Welcome back pchilanomate! A progress list of the required criteria for merging this PR into |
/label add hotspot-runtime |
/label remove hotspot |
@pchilano |
@pchilano |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Patricio,
Thanks for taking over this fix from @robehn! And thanks Robbin for the initial fix.
So in simple terms we must never suspend whilst holding a VM internal lock (or any critical resource). So the ThreadBlockInVM destructor, which runs after we acquired a contended lock, must not suspend, but the ThreadBlockInVMPreProcess that we use for ObjectMonitors must suspend (and will release the lock).
I took a look at the other uses for ThreadBlockInVM. There are quite a few places where ThreadBlockInVM is used, some of which relate to threads that can't be suspended, but others where it looks very suspicious that we might have suspended before this change - so I think this is addressing a number of potential bugs.
I do wonder (perhaps separate RFE as it touches a number of files) whether ThreadBlockInVM should now be renamed ThreadBlockInVMNoSuspend to make it very clear the thread can't suspend? Though that might also mislead people as some of the logic is executed by threads that don't allow for suspension (not visible to Java code in general).
Overall I think this fix looks good, but I can't help wonder if the switch to using handshakes for suspend, is not still causing suspension checks in far more places than we used to check, and in potentially inappropriate (or at best unnecessary) places ... I'll take that thought up off line.
Thanks,
David
@@ -289,7 +291,7 @@ class ThreadBlockInVM { | |||
ThreadBlockInVMPreprocess<InFlightMutexRelease> _tbivmpp; | |||
public: | |||
ThreadBlockInVM(JavaThread* thread, Mutex** in_flight_mutex_addr = NULL) | |||
: _ifmr(in_flight_mutex_addr), _tbivmpp(thread, _ifmr) {} | |||
: _ifmr(in_flight_mutex_addr), _tbivmpp(thread, _ifmr, false) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add comment:
false /* no suspend */) { }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -0,0 +1,58 @@ | |||
/* | |||
* Copyright (c) 2020, 2021, Oracle and/or its affiliates. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New file so should only have one copyright year.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
public static void run_loop() { | ||
WhiteBox wb = WhiteBox.getWhiteBox(); | ||
for (int i = 0; i < 100; i++) { | ||
wb.lockAndBlock(false /* suspender */); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two ways to document an argument like this:
- What it logically means e.g.
(false /* not suspender */)
- What parameter it refers to - in which case, to avoid confusion with comments of type 1, I suggest we use a convention (already in use)
(/* suspender= */ false)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
@pchilano This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 12 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
Hi David,
In general I think this details of what things get processed when transitioning from one state to another should not be exposed to the users unless is necessary. In this case I think most users of TBIVM don't really care about processing suspend requests or not other than TBIVM should do whatever is right, which here is just never process them. For the special cases of java monitors and jvmti raw monitors where we specifically want to honor suspend requests we have the special ThreadBlockInVMPreprocess. I actually looked into having a single TBIVM with an additional parameter to allow for suspend in those special cases. It wasn't straightforward because of the template but I'll try to revisit that. Thanks for reviewing this David! Patricio |
This fix appears to be baselined on https://github.com/openjdk/jdk I'm going to go ahead and start my review anyway. |
Ok, I thought I was supposed to integrate it in 18 first and then backport it (?). |
It's a P2 so you should request permission to integrate it in JDK17 (follow |
src/hotspot/share/prims/whitebox.cpp
Outdated
// suspended in ~ThreadBlockInVM. This verifies we only suspend | ||
// at the right place. | ||
while (Atomic::cmpxchg(&_emulated_lock, 0, 1) != 0) {} | ||
assert(_emulated_lock == 1, "Not locked"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/"Not locked"/"Must be locked"/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
src/hotspot/share/prims/whitebox.cpp
Outdated
} | ||
Atomic::store(&_emulated_lock, 0); | ||
WB_END | ||
|
||
//Some convenience methods to deal with objects from java |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit typo: please add a space before 'Some' (not your bug, but since you are here...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
src/hotspot/share/prims/whitebox.cpp
Outdated
|
||
{CC"lockAndBlock", CC"(Z)V", (void*)&WB_LockAndBlock}, | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why you added a blank line above and below...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
@@ -441,6 +451,11 @@ bool HandshakeState::have_non_self_executable_operation() { | |||
return _queue.contains(non_self_queue_filter); | |||
} | |||
|
|||
bool HandshakeState::has_none_suspend_operation() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name is bugging me: has_none_suspend_operation()
I think the function is supposed to return true if the queue contains
at least one non-suspend operation. If that's the case, then perhaps
it should be named:
has_a_non_suspend_operation()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Me too.
@@ -289,7 +291,7 @@ class ThreadBlockInVM { | |||
ThreadBlockInVMPreprocess<InFlightMutexRelease> _tbivmpp; | |||
public: | |||
ThreadBlockInVM(JavaThread* thread, Mutex** in_flight_mutex_addr = NULL) | |||
: _ifmr(in_flight_mutex_addr), _tbivmpp(thread, _ifmr) {} | |||
: _ifmr(in_flight_mutex_addr), _tbivmpp(thread, _ifmr, false /* no suspend */) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dholmes-ora asked in a different place for a change like this:
/* allow_suspend= */ false
I think that makes sense here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I said there were two styles for doing this kind of commenting. I like this style and suggested the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some comments and questions. Looks good!
} | ||
// Handshakes cannot safely safepoint. | ||
// The exception to this rule is the asynchronous suspension handshake. | ||
// It by-passes the NSV by manually doing the transition. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea what this comment means in this context. Below we take out a lock with _no_safepoint_check which is essentially a NSV. You just moved this comment so I don't suggest changing it at this time.
// actual suspend since Handshake::execute() above only installed | ||
// the asynchronous handshake. | ||
SafepointMechanism::process_if_requested(self); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an optimization? Or can the thread escape the suspend request?
/* | ||
* @test SuspendBlocked | ||
* @bug 8270085 | ||
* @library /testlibrary /test/lib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is referring to /testlibrary that I removed. Can you see if it's still needed?
jdk17 PR: openjdk/jdk17#257 |
Closing and using jdk17 PR: openjdk/jdk17#257 |
Hi all,
The following patch fixes deadlocks issues that could occur when checking for suspension while holding VM locks. See the bug description for a concrete case. The solution is to avoid checking for suspend requests when using the TBIVM wrapper. The original patch was actually written by @robehn(now on vacations) and I only made small changes.
Run tiers1-6 in mach5 and currently running tier7. I also verified that the new added test SuspendBlocked.java deadlocks with the current bits and passes with this patch.
Thanks,
Patricio
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/4828/head:pull/4828
$ git checkout pull/4828
Update a local copy of the PR:
$ git checkout pull/4828
$ git pull https://git.openjdk.java.net/jdk pull/4828/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 4828
View PR using the GUI difftool:
$ git pr show -t 4828
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/4828.diff