Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8274282: Clarify special wait assert #5684

Closed
wants to merge 4 commits into from
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
@@ -384,14 +384,20 @@ void Mutex::check_rank(Thread* thread) {
if (owned_by_self()) {
// wait() case
Mutex* least = get_least_ranked_lock_besides_this(locks_owned);
// We enforce not holding locks of rank nosafepoint or lower while waiting.
// We enforce not holding locks of rank nosafepoint or lower while waiting for JavaThreads,
// because the held lock has a NoSafepointVerifier so waiting on a lock will block out safepoints.
Copy link
Member

@dholmes-ora dholmes-ora Sep 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The NSV is not the reason, it is a mechanism to detect violation of the rules. The reason we must not block in a JavaThread is because it will block without a safepoint check and without becoming safepoint-safe, thus blocking out safepoints while the wait is in progress.

Also note that this applies to any wait() on a nosafepoint lock, not just one done while holding a safepoint lock.

Copy link
Contributor

@pchilano pchilano Oct 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can probably shed some light on the origin of this comment since it came from discussions I had with Coleen. We were looking at cases like the Service_lock, where the ServiceThread waits on a lock with rank <= nosafepoint but within the context of a ThreadBlockInVM. If a JT would call wait() on a nosafepoint lock and we detect it holds some other nosafepoint lock, then that would imply there was no immediate TBIVM, like with the Service_lock case, because that would have asserted due to the NSV and so we know the call will block safepoints (assuming no manual changes to _thread_blocked that would bypass the check_possible_safepoint() call or some weird use of the TBIVM like: TBIVM, lock(a), lock(b), wait(b), where a and b are nosafepoint locks). But as you pointed out the issue of this JT blocking safepoints is a consequence of waiting on nosafepoint locks, regardless of which other locks the JT is holding. It just happens that if the JT is holding other nosafepoint locks then we already know this will block out safepoints (ruling out those behaviors described above).

I also just realized that waiting while holding a nosafepoint lock could also block out safepoints due to other JT's being blocked in a safepoint-unsafe state while trying to lock that monitor. So even if the waiting JT did a manual transition to _thread_blocked or used the TBIVM in some weird way, safepoints could still be blocked out due to those other JT's.

But in any case, if we want to verify if a wait() call might block safepoints, we should verify all cases where this JT will wait in an unsafe state, and that means checking whether the current lock is nosafepoint and the current state of the JT: "if thread->is_Java_thread() && this->rank <= Mutex::nosafepoint && (thread->thread_state() != _thread_blocked)". Today that would assert since we have cases where a JT waits on a nosafepoint lock without being _thread_blocked, mainly os::create_thread() and JfrThreadSampler::on_javathread_suspend() from some tests I run. VMThread::wait_for_vm_thread_exit() and ThreadsSMRSupport::wait_until_not_protected() also assert but in those cases the JT was already removed from the JT list so it wouldn't be blocking safepoints.

Copy link
Member

@dholmes-ora dholmes-ora Oct 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think that a JT should ever switch to _thread_blocked when waiting on a non-safepoint monitor! That is totally confusing/conflating the very notions of having safepoint checking and non-safepoint checking mutexes/monitors. If you have to make yourself appear safepoint-safe when waiting on a non-safepoint monitor then that says to me that the monitor should not be a non-safepoint one! I need to look at the history of how we have defined and used the Service_lock to see why we do this.

The thread creation situation is somewhat complex because the newly running thread is not initially able to participate in safepoints and so although a JT it has to wait without a safepoint check. But the creating thread could theoretically wait with safepoint checks. But that would require the monitor to be safepoint-check-sometimes which we removed. Alternatively we may need to do something different with how _thread_new state is treated.

Anyway this is digressing somewhat. :)

I still dislike the wording of the comment - sorry.

Copy link
Contributor Author

@coleenp coleenp Oct 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the explanation and further experiments @pchilano .

VMThread::wait_for_vm_thread_exit() and ThreadsSMRSupport::wait_until_not_protected() also assert but in those cases the JT was already removed from the JT list so it wouldn't be blocking safepoints.

I wonder if the check should be is_active_Java_thread instead?

The Service_lock, EscapeBarrier_lock, MonitorDeflation_lock, Notification_lock, and CodeSweeper_lock are held with a TBIVM (same mechanism). These locks are fairly low level locks and don't check for safepoints.

I think the problematic scenario that Patricio is referring to is:
Thread1: TBIVM (state == _thread_blocked), comes out of wait so owns Service_lock, does things while safepoint safe
Thread2: waiting on Service_lock, so blocking safepoint
But then Thread1 will come back to wait when done with things, Thread2 will proceed then get to it's safepoint check when done with eg. Service_lock.

Copy link
Contributor Author

@coleenp coleenp Oct 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also trying the experiment that Patricio refers to above:

+    } else if (thread->is_active_Java_thread() && rank() <= Mutex::nosafepoint) {
+      // Don't allow JavaThread to wait on non-safepoint checking lock if not in a safe state.
+      JavaThread* current = JavaThread::cast(thread);
+      assert(current->thread_state() == _thread_blocked || current->thread_state() == _thread_in_native,
+             "Cannot wait on active Java thread in unsafe state");

but the os::create_thread() case is an active Java thread.

Copy link
Contributor Author

@coleenp coleenp Oct 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't easily add this new assert. It's going to take a lot more code so should be done in a separate RFE.

// For NonJavaThreads, we enforce not waiting on the tty lock since that could block progress also.
coleenp marked this conversation as resolved.
Show resolved Hide resolved
// Also "this" should be the monitor with lowest rank owned by this thread.
if (least != NULL && (least->rank() <= nosafepoint || least->rank() <= this->rank())) {
if (least != NULL && ((least->rank() <= Mutex::nosafepoint && thread->is_Java_thread()) ||
least->rank() <= Mutex::tty ||
least->rank() <= this->rank())) {
assert(false, "Attempting to wait on monitor %s/%d while holding lock %s/%d -- "
"possible deadlock. %s", name(), rank(), least->name(), least->rank(),
least->rank() <= this->rank() ?
"Should wait on the least ranked monitor from all owned locks." :
"Should not block(wait) while holding a lock of rank nosafepoint or below.");
thread->is_Java_thread() ?
"Should not block(wait) while holding a lock of rank nosafepoint or below." :
"Should not block(wait) while holding a lock of rank tty or below.");
}
} else {
// lock()/lock_without_safepoint_check()/try_lock() case
@@ -220,20 +220,27 @@ TEST_VM_ASSERT_MSG(MutexRank, monitor_wait_rank_nosafepoint,
monitor_rank_nosafepoint->unlock();
}

// NonJavaThreads can't wait while holding tty lock or below.
class VM_MutexWaitTTY : public VM_GTestExecuteAtSafepoint {
public:
void doit() {
Monitor* monitor_rank_tty = new Monitor(Mutex::tty, "monitor_rank_tty", Mutex::_safepoint_check_never);
Monitor* monitor_rank_event = new Monitor(Mutex::event, "monitor_rank_event", Mutex::_safepoint_check_never);

monitor_rank_tty->lock_without_safepoint_check();
monitor_rank_event->lock_without_safepoint_check();
monitor_rank_event->wait_without_safepoint_check(1);
monitor_rank_event->unlock();
monitor_rank_tty->unlock();
}
};

TEST_VM_ASSERT_MSG(MutexRank, monitor_wait_event_tty,
".* Attempting to wait on monitor monitor_rank_event/0 while holding lock monitor_rank_tty/.*"
"-- possible deadlock. Should not block\\(wait\\) while holding a lock of rank nosafepoint or below.") {
JavaThread* THREAD = JavaThread::current();
ThreadInVMfromNative invm(THREAD);

Monitor* monitor_rank_tty = new Monitor(Mutex::tty, "monitor_rank_tty", Mutex::_safepoint_check_never);
Monitor* monitor_rank_event = new Monitor(Mutex::event, "monitor_rank_event", Mutex::_safepoint_check_never);

monitor_rank_tty->lock_without_safepoint_check();
monitor_rank_event->lock_without_safepoint_check();
monitor_rank_event->wait_without_safepoint_check(1);
monitor_rank_event->unlock();
monitor_rank_tty->unlock();
"-- possible deadlock. Should not block\\(wait\\) while holding a lock of rank tty or below.") {
VM_MutexWaitTTY op;
ThreadInVMfromNative invm(JavaThread::current());
VMThread::execute(&op);
}

TEST_VM_ASSERT_MSG(MutexRank, monitor_wait_tty_nosafepoint,