-
Notifications
You must be signed in to change notification settings - Fork 145
8270085: Suspend during block transition may deadlock if lock held #257
Conversation
👋 Welcome back pchilanomate! A progress list of the required criteria for merging this PR into |
/label add hotspot-runtime |
/label remove hotspot |
@pchilano |
@pchilano |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thumbs up.
@pchilano This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 23 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
Thanks for the review Dan! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
Sorry I missed the fact this should have been based on 17 to begin with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some comments and questions. Looks good!
// Handshakes cannot safely safepoint. | ||
// The exception to this rule is the asynchronous suspension handshake. | ||
// It by-passes the NSV by manually doing the transition. | ||
NoSafepointVerifier nsv; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea what this comment means in this context. Below we take out a lock with _no_safepoint_check which is essentially a NSV. You just moved this comment so I don't suggest changing it at this time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, it's created with _allow_vm_block = true so the NSV is redundant. The _no_safepoint_check actually means we don't check for safepoints if there is contention while acquiring the monitor but we don't do inc_no_safepoint_count() to enforce no safepoints later though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A side effect of the no_safepoint_check locking is that it has an implicit NSV though. We do call inc_no_safepoint_count when setting the owner, but I remembered this wrong. It's not unconditional:
// NSV implied with locking allow_vm_block flag.
// The tty_lock is special because it is released for the safepoint by
// the safepoint mechanism.
if (new_owner->is_Java_thread() && _allow_vm_block && this != tty_lock) {
JavaThread::cast(new_owner)->inc_no_safepoint_count();
}
// actual suspend since Handshake::execute() above only installed | ||
// the asynchronous handshake. | ||
SafepointMechanism::process_if_requested(self); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an optimization? Or can the thread escape the suspend request?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For self-suspend requests coming from Java we will suspend when going back to Java. In the case of JVMTI self-suspend requests I think Robbin found some JVMTI tests that were expecting to not return back to native without suspending.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should document the order of events why this code is needed in a future RFE, because this comment doesn't really explain it to me and knowing where to put all these process_if_requested() and process() calls is far from obvious. I believe you and Robbin, but there are too many mysteries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I commented the line and run all the serviceability/ tests to look for issues. The only test that failed was SuspendWithCurrentThread.java due to the recent 8264663 change (openjdk/jdk@b5c6351). The patch aimed to verify that the self-suspending thread actually suspended and didn't exited, but the check is too restrictive because it doesn't consider that the JT will still suspend when going back to Java. That check can be moved to Java to verify the suspender didn't return from suspendTestedThreads().
Maybe I should do that in another RFE along with removing this code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that would be good to clear up in a future RFE.
/* | ||
* @test SuspendBlocked | ||
* @bug 8270085 | ||
* @library /testlibrary /test/lib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is referring to /testlibrary that I removed. Can you see if it's still needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed, removed.
Thanks for the review David! |
Thanks for the review Coleen! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still good.
// Handshakes cannot safely safepoint. | ||
// The exception to this rule is the asynchronous suspension handshake. | ||
// It by-passes the NSV by manually doing the transition. | ||
NoSafepointVerifier nsv; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A side effect of the no_safepoint_check locking is that it has an implicit NSV though. We do call inc_no_safepoint_count when setting the owner, but I remembered this wrong. It's not unconditional:
// NSV implied with locking allow_vm_block flag.
// The tty_lock is special because it is released for the safepoint by
// the safepoint mechanism.
if (new_owner->is_Java_thread() && _allow_vm_block && this != tty_lock) {
JavaThread::cast(new_owner)->inc_no_safepoint_count();
}
// actual suspend since Handshake::execute() above only installed | ||
// the asynchronous handshake. | ||
SafepointMechanism::process_if_requested(self); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should document the order of events why this code is needed in a future RFE, because this comment doesn't really explain it to me and knowing where to put all these process_if_requested() and process() calls is far from obvious. I believe you and Robbin, but there are too many mysteries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still good.
While chatting with Coleen I realized the comment in should_process_no_suspend() wasn't accurate so I fixed it. Not necessarily there will be a suspend request when reaching that path. The poll could be armed just because a safepoint or a non-suspend handshake operation was executed while the thread was safepoint safe. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adjusting the comment. Still good.
@pchilano Could not parse
|
/contributor add @robehn |
/contributor add @pchilano |
@pchilano |
@pchilano |
Thanks @dholmes-ora, @dcubed-ojdk and @coleenp for the reviews! |
/integrate |
Going to push as commit e7f9009.
Your commit was automatically rebased without conflicts. |
Hi all,
The following patch fixes deadlocks issues that could occur when checking for suspension while holding VM locks. See the bug description for a concrete case. The solution is to avoid checking for suspend requests when using the TBIVM wrapper. The original patch was actually written by @robehn(now on vacations) and I only made small changes.
Tested in mach5 tiers1-7. I also verified that the new added test SuspendBlocked.java deadlocks with the current bits and passes with this patch.
Thanks,
Patricio
Progress
Issue
Reviewers
Contributors
<rehn@openjdk.org>
<pchilanomate@openjdk.org>
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk17 pull/257/head:pull/257
$ git checkout pull/257
Update a local copy of the PR:
$ git checkout pull/257
$ git pull https://git.openjdk.java.net/jdk17 pull/257/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 257
View PR using the GUI difftool:
$ git pr show -t 257
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk17/pull/257.diff