This repository has been archived by the owner on Sep 2, 2022. It is now read-only.
/ jdk17 Public archive

8270085: Suspend during block transition may deadlock if lock held #257

Closed
wants to merge 3 commits into from

Conversation

pchilano
Contributor

@pchilano pchilano commented Jul 20, 2021

Hi all,

The following patch fixes deadlock issues that could occur when checking for suspension while holding VM locks. See the bug description for a concrete case. The solution is to avoid checking for suspend requests when using the TBIVM (ThreadBlockInVM) wrapper. The original patch was actually written by @robehn (now on vacation) and I only made small changes.
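
As a rough illustration of the deadlock described above, the sketch below models a thread that polls for suspension when it leaves a blocked state. It is not HotSpot code: `ModelThread`, `leave_blocked_state`, `resumer_would_deadlock` and `check_suspend_model` are hypothetical names, and the `allow_suspend` flag only stands in for the behaviour this patch gives the TBIVM wrapper.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical model of a Java thread inside the VM; not HotSpot code.
struct ModelThread {
    std::set<std::string> held_locks;  // VM locks currently owned
    bool suspend_requested = false;    // a suspend request is pending
    bool suspended = false;            // parked until someone resumes it
};

// Models the poll done when a thread leaves a blocked state.
// allow_suspend == false stands in for the TBIVM wrapper after the patch.
void leave_blocked_state(ModelThread& t, bool allow_suspend) {
    if (allow_suspend && t.suspend_requested) {
        t.suspended = true;  // parks here while still owning held_locks
    }
    // Otherwise the request stays pending for a later poll.
}

// A resumer that first needs `lock` cannot make progress if the owner of
// that lock is parked waiting for that very resume.
bool resumer_would_deadlock(const ModelThread& t, const std::string& lock) {
    return t.suspended && t.held_locks.count(lock) > 0;
}

// Small self-check exercising both behaviours.
void check_suspend_model() {
    ModelThread old_behaviour;
    old_behaviour.held_locks.insert("vm_lock");
    old_behaviour.suspend_requested = true;
    leave_blocked_state(old_behaviour, /*allow_suspend=*/true);
    assert(resumer_would_deadlock(old_behaviour, "vm_lock"));

    ModelThread patched;
    patched.held_locks.insert("vm_lock");
    patched.suspend_requested = true;
    leave_blocked_state(patched, /*allow_suspend=*/false);
    assert(!resumer_would_deadlock(patched, "vm_lock"));
    assert(patched.suspend_requested);  // request is deferred, not lost
}
```

In this model the suspend request is not dropped when `allow_suspend` is false; it simply remains pending until a poll that runs without the lock held.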

Tested in mach5 tiers1-7. I also verified that the newly added test SuspendBlocked.java deadlocks with the current bits and passes with this patch.

Thanks,

Patricio


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8270085: Suspend during block transition may deadlock if lock held

Reviewers

Contributors

  • Robbin Ehn <rehn@openjdk.org>
  • Patricio Chilano Mateo <pchilanomate@openjdk.org>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk17 pull/257/head:pull/257
$ git checkout pull/257

Update a local copy of the PR:
$ git checkout pull/257
$ git pull https://git.openjdk.java.net/jdk17 pull/257/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 257

View PR using the GUI difftool:
$ git pr show -t 257

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk17/pull/257.diff

@bridgekeeper

bridgekeeper bot commented Jul 20, 2021

👋 Welcome back pchilanomate! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Jul 20, 2021

@pchilano The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.java.net label Jul 20, 2021
@pchilano
Contributor Author

/label add hotspot-runtime

@pchilano
Contributor Author

/label remove hotspot

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.java.net label Jul 20, 2021
@openjdk

openjdk bot commented Jul 20, 2021

@pchilano
The hotspot-runtime label was successfully added.

@openjdk openjdk bot removed the hotspot hotspot-dev@openjdk.java.net label Jul 20, 2021
@openjdk

openjdk bot commented Jul 20, 2021

@pchilano
The hotspot label was successfully removed.

@pchilano pchilano marked this pull request as ready for review July 20, 2021 21:28
@openjdk openjdk bot added the rfr Pull request is ready for review label Jul 20, 2021
@mlbridge

mlbridge bot commented Jul 20, 2021

Webrevs

Member

@dcubed-ojdk dcubed-ojdk left a comment

Thumbs up.

@openjdk

openjdk bot commented Jul 20, 2021

@pchilano This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8270085: Suspend during block transition may deadlock if lock held

Co-authored-by: Robbin Ehn <rehn@openjdk.org>
Co-authored-by: Patricio Chilano Mateo <pchilanomate@openjdk.org>
Reviewed-by: dcubed, dholmes, coleenp

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 23 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jul 20, 2021
@pchilano
Contributor Author

> Thumbs up.

Thanks for the review Dan!

Member

@dholmes-ora dholmes-ora left a comment

Looks good.

Sorry I missed the fact this should have been based on 17 to begin with.

Contributor

@coleenp coleenp left a comment

Just some comments and questions. Looks good!

// Handshakes cannot safely safepoint.
// The exception to this rule is the asynchronous suspension handshake.
// It by-passes the NSV by manually doing the transition.
NoSafepointVerifier nsv;
Contributor

I have no idea what this comment means in this context. Below we take out a lock with _no_safepoint_check which is essentially a NSV. You just moved this comment so I don't suggest changing it at this time.

Contributor Author

Right, it's created with _allow_vm_block = true, so the NSV is redundant. The _no_safepoint_check actually means we don't check for safepoints if there is contention while acquiring the monitor, but we don't call inc_no_safepoint_count() to enforce no safepoints later.

Contributor

A side effect of the no_safepoint_check locking is that it has an implicit NSV though. We do call inc_no_safepoint_count when setting the owner, but I remembered this wrong. It's not unconditional:

// NSV implied with locking allow_vm_block flag.
// The tty_lock is special because it is released for the safepoint by
// the safepoint mechanism.
if (new_owner->is_Java_thread() && _allow_vm_block && this != tty_lock) {
  JavaThread::cast(new_owner)->inc_no_safepoint_count();
}
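
To make the "implicit NSV" point concrete, here is a minimal sketch of how a per-thread counter can be bumped both by an explicit scoped verifier and by taking a lock created with an allow-VM-block flag. All names (`ThreadModel`, `NoSafepointScope`, `LockModel`, `check_nsv_model`) are hypothetical, and the model ignores the `is_Java_thread()` and `tty_lock` conditions shown in the snippet above.

```cpp
#include <cassert>

// Hypothetical stand-in for the per-thread no-safepoint counter.
struct ThreadModel {
    int no_safepoint_count = 0;
    bool can_safepoint() const { return no_safepoint_count == 0; }
};

// RAII guard in the spirit of NoSafepointVerifier: forbids safepoints
// for the lifetime of the object.
class NoSafepointScope {
    ThreadModel& t_;
public:
    explicit NoSafepointScope(ThreadModel& t) : t_(t) { ++t_.no_safepoint_count; }
    ~NoSafepointScope() { --t_.no_safepoint_count; }
    NoSafepointScope(const NoSafepointScope&) = delete;
    NoSafepointScope& operator=(const NoSafepointScope&) = delete;
};

// Lock model: only locks created with allow_vm_block bump the counter,
// mirroring the conditional (not unconditional) increment quoted above.
class LockModel {
    bool allow_vm_block_;
    ThreadModel* owner_ = nullptr;
public:
    explicit LockModel(bool allow_vm_block) : allow_vm_block_(allow_vm_block) {}
    void lock(ThreadModel& t) {
        owner_ = &t;
        if (allow_vm_block_) ++t.no_safepoint_count;
    }
    void unlock() {
        if (owner_ != nullptr && allow_vm_block_) --owner_->no_safepoint_count;
        owner_ = nullptr;
    }
};

// Small self-check: an explicit scope on top of such a lock is redundant.
void check_nsv_model() {
    ThreadModel t;
    LockModel plain(false);
    LockModel vm_block(true);

    plain.lock(t);
    assert(t.can_safepoint());      // no implicit verifier for this lock
    plain.unlock();

    vm_block.lock(t);
    assert(!t.can_safepoint());     // implicit verifier from the lock
    {
        NoSafepointScope nsv(t);    // explicit verifier adds nothing new
        assert(t.no_safepoint_count == 2);
    }
    vm_block.unlock();
    assert(t.can_safepoint());
}
```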

// actual suspend since Handshake::execute() above only installed
// the asynchronous handshake.
SafepointMechanism::process_if_requested(self);
}
Contributor

Is this an optimization? Or can the thread escape the suspend request?

Contributor Author

For self-suspend requests coming from Java we will suspend when going back to Java. In the case of JVMTI self-suspend requests, I think Robbin found some JVMTI tests that expected the thread not to return to native without suspending.
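
The distinction between installing an asynchronous operation and actually running it can be sketched as follows. This is a simplified model under assumed names (`SelfSuspendModel`, `install_suspend`, and a `process_if_requested` member that only borrows the name from the snippet above); it is not how HotSpot implements handshakes.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical model: operations are queued first and only take effect
// when the target thread polls.
struct SelfSuspendModel {
    std::deque<std::function<void()>> pending;  // installed, not yet run
    bool suspended = false;

    // Stand-in for installing an asynchronous suspend: queue it only.
    void install_suspend() {
        pending.push_back([this] { suspended = true; });
    }

    // Stand-in for an explicit poll: run everything queued so far.
    void process_if_requested() {
        while (!pending.empty()) {
            std::function<void()> op = std::move(pending.front());
            pending.pop_front();
            op();
        }
    }
};

// Small self-check: without the explicit poll, the thread would carry on
// unsuspended until its next poll (for example when returning to Java).
void check_self_suspend_model() {
    SelfSuspendModel self;
    self.install_suspend();
    assert(!self.suspended);
    self.process_if_requested();
    assert(self.suspended);
}
```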

Contributor

We should document the order of events and why this code is needed in a future RFE, because this comment doesn't really explain it to me and knowing where to put all these process_if_requested() and process() calls is far from obvious. I believe you and Robbin, but there are too many mysteries.

Contributor Author

I commented out the line and ran all the serviceability/ tests to look for issues. The only test that failed was SuspendWithCurrentThread.java due to the recent 8264663 change (openjdk/jdk@b5c6351). That patch aimed to verify that the self-suspending thread actually suspended and didn't exit, but the check is too restrictive because it doesn't consider that the JT will still suspend when going back to Java. That check can be moved to Java to verify the suspender didn't return from suspendTestedThreads().
Maybe I should do that in another RFE along with removing this code?

Contributor

Yes, that would be good to clear up in a future RFE.

/*
* @test SuspendBlocked
* @bug 8270085
* @library /testlibrary /test/lib
Contributor

I think this is referring to /testlibrary that I removed. Can you see if it's still needed?

Contributor Author

Not needed, removed.

@pchilano
Contributor Author

> Looks good.
>
> Sorry I missed the fact this should have been based on 17 to begin with.

Thanks for the review David!

@pchilano
Contributor Author

> Just some comments and questions. Looks good!

Thanks for the review Coleen!

Contributor

@coleenp coleenp left a comment

Still good.

Member

@dcubed-ojdk dcubed-ojdk left a comment

Still good.

@pchilano
Contributor Author

While chatting with Coleen I realized the comment in should_process_no_suspend() wasn't accurate, so I fixed it. There won't necessarily be a suspend request when reaching that path. The poll could be armed just because a safepoint or a non-suspend handshake operation was executed while the thread was safepoint safe.
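
The point about a poll being armed without any suspend request can be sketched with a small model. Everything here is an assumption for illustration: `PollState`, its fields, and `should_process_no_suspend_model` are hypothetical and only loosely mirror the function named above.

```cpp
#include <cassert>

// Hypothetical per-thread poll state. The poll can stay armed after
// another thread already ran a safepoint or handshake on our behalf.
struct PollState {
    bool poll_armed = false;
    bool safepoint_in_progress = false;
    bool non_suspend_op_pending = false;
    bool suspend_pending = false;
};

// Returns true when there is non-suspend work to process. Otherwise the
// poll is recomputed: it stays armed only for a still-pending suspend,
// which this path deliberately leaves for a later poll.
bool should_process_no_suspend_model(PollState& p) {
    if (p.safepoint_in_progress || p.non_suspend_op_pending) {
        return true;
    }
    p.poll_armed = p.suspend_pending;
    return false;
}

// Small self-check covering the stale-poll and suspend-only cases.
void check_poll_model() {
    PollState stale;
    stale.poll_armed = true;  // armed, but nothing is pending any more
    assert(!should_process_no_suspend_model(stale));
    assert(!stale.poll_armed);

    PollState suspend_only;
    suspend_only.poll_armed = true;
    suspend_only.suspend_pending = true;
    assert(!should_process_no_suspend_model(suspend_only));
    assert(suspend_only.poll_armed);  // request kept for a later poll
}
```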

Member

@dcubed-ojdk dcubed-ojdk left a comment

Thanks for adjusting the comment. Still good.

@openjdk

openjdk bot commented Jul 22, 2021

@pchilano Could not parse @robehn @pchilano as a valid contributor.
Syntax: /contributor (add|remove) [@user | openjdk-user | Full Name <email@address>]. For example:

  • /contributor add @openjdk-bot
  • /contributor add duke
  • /contributor add J. Duke <duke@openjdk.org>

@pchilano
Contributor Author

/contributor add @robehn

@pchilano
Contributor Author

/contributor add @pchilano

@openjdk

openjdk bot commented Jul 22, 2021

@pchilano
Contributor Robbin Ehn <rehn@openjdk.org> successfully added.

@openjdk

openjdk bot commented Jul 22, 2021

@pchilano
Contributor Patricio Chilano Mateo <pchilanomate@openjdk.org> successfully added.

@pchilano
Contributor Author

Thanks @dholmes-ora, @dcubed-ojdk and @coleenp for the reviews!

@pchilano
Contributor Author

/integrate

@openjdk

openjdk bot commented Jul 22, 2021

Going to push as commit e7f9009.
Since your change was applied there have been 23 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Jul 22, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jul 22, 2021
@openjdk

openjdk bot commented Jul 22, 2021

@pchilano Pushed as commit e7f9009.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
hotspot-runtime hotspot-runtime-dev@openjdk.java.net integrated Pull request has been integrated

4 participants