Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8249004: Reduce ThreadsListHandle overhead in relation to direct handshakes #4677

Closed
wants to merge 14 commits into from
Closed
Changes from 8 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
@@ -1560,7 +1560,7 @@ JvmtiEnv::GetThreadListStackTraces(jint thread_count, const jthread* thread_list
}

GetSingleStackTraceClosure op(this, current_thread, *thread_list, max_frame_count);
Handshake::execute(&op, java_thread);
Handshake::execute(&op, &tlh, java_thread);
err = op.result();
if (err == JVMTI_ERROR_NONE) {
*stack_info_ptr = op.stack_info();
@@ -616,8 +616,14 @@ JvmtiEventControllerPrivate::recompute_enabled() {
}

// compute and set thread-filtered events
for (JvmtiThreadState *state = JvmtiThreadState::first(); state != NULL; state = state->next()) {
any_env_thread_enabled |= recompute_thread_enabled(state);
JvmtiThreadState *state = JvmtiThreadState::first();
if (state != nullptr) {
// If we have a JvmtiThreadState, then we've reached the point where
// threads can exist so create a ThreadsListHandle to protect them.
ThreadsListHandle tlh;
Copy link
Member

@dholmes-ora dholmes-ora Oct 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch on the missing TLH for this code.

Copy link
Member Author

@dcubed-ojdk dcubed-ojdk Oct 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't quite missing from the baseline code. This version of execute():

Handshake::execute(HandshakeClosure* hs_cl, JavaThread* target)

used to always create a ThreadsListHandle. I added a ThreadsListHandle*
parameter to that version and created a wrapper with the existing signature
to pass nullptr to the execute() version with the ThreadsListHandle*
parameter. What that means is that all existing callers of:

Handshake::execute(HandshakeClosure* hs_cl, JavaThread* target)

no longer had a ThreadsListHandle created for them. With the new sanity
check in place, I shook the trees to make sure that we had explicit
ThreadsListHandles in place for the locations that needed them.

JvmtiEventControllerPrivate::recompute_enabled() happened to be
one of the places where the ThreadsListHandle created by execute()
was hiding the fact that recompute_enabled() needed one.

Copy link
Contributor

@coleenp coleenp Nov 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the ThreadsListHandle protect JvmtiThreadState::first also? If state->next() needs it why doesn't the first entry need this? There's no atomic load on the _head field.

Copy link
Member Author

@dcubed-ojdk dcubed-ojdk Nov 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ThreadsListHandle protects JavaThread objects not JvmtiThreadState objects.
JvmtiThreadState::first() returns the head of the global list of JvmtiThreadState
objects for the system. Each JvmtiThreadState object contains a JavaThread* and
we have to protect use of the JavaThread* which can happen in the
recompute_thread_enabled(state) call below.

Copy link
Contributor

@coleenp coleenp Nov 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JvmtiThreadState objects point to JavaThread and vice versa, so I still don't see why you don't protect the first element.

Copy link
Member Author

@dcubed-ojdk dcubed-ojdk Nov 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've written up a rather long analysis about how the use of JvmtiThreadState_lock
in JvmtiEventControllerPrivate::recompute_enabled() means that we can safely
traverse the JvmtiThreadState list returned by JvmtiThreadState::first() without
racing with an exiting JavaThread. I've sent it to you via direct email.

I could amend the comment above the ThreadsListHandle like this:

    // If we have a JvmtiThreadState, then we've reached the point where
    // threads can exist so create a ThreadsListHandle to protect them.
    // The held JvmtiThreadState_lock prevents exiting JavaThreads from
    // being removed from the JvmtiThreadState list we're about to walk
    // so this ThreadsListHandle exists just to satisfy the lower level sanity
    // checks that the target JavaThreads are protected.
    ThreadsListHandle tlh;

Copy link
Contributor

@coleenp coleenp Nov 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this comment is good and it does explain why it's safe and why the TLH is there. Thanks for doing the deep dive and explanation.

Copy link
Member Author

@dcubed-ojdk dcubed-ojdk Nov 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated comment has been pushed to this PR.

for (; state != nullptr; state = state->next()) {
any_env_thread_enabled |= recompute_thread_enabled(state);
}
}

// set universal state (across all envs and threads)
@@ -2072,8 +2072,9 @@ WB_ENTRY(jboolean, WB_HandshakeReadMonitors(JNIEnv* env, jobject wb, jobject thr
ReadMonitorsClosure rmc;
oop thread_oop = JNIHandles::resolve(thread_handle);
if (thread_oop != NULL) {
ThreadsListHandle tlh;
JavaThread* target = java_lang_Thread::thread(thread_oop);
Handshake::execute(&rmc, target);
Handshake::execute(&rmc, &tlh, target);
}
return rmc.executed();
WB_END
@@ -2104,8 +2105,9 @@ WB_ENTRY(jint, WB_HandshakeWalkStack(JNIEnv* env, jobject wb, jobject thread_han
} else {
oop thread_oop = JNIHandles::resolve(thread_handle);
if (thread_oop != NULL) {
ThreadsListHandle tlh;
JavaThread* target = java_lang_Thread::thread(thread_oop);
Handshake::execute(&tsc, target);
Handshake::execute(&tsc, &tlh, target);
}
}
return tsc.num_threads_completed();
@@ -2131,7 +2133,10 @@ WB_ENTRY(void, WB_AsyncHandshakeWalkStack(JNIEnv* env, jobject wb, jobject threa
};
oop thread_oop = JNIHandles::resolve(thread_handle);
if (thread_oop != NULL) {
ThreadsListHandle tlh;
JavaThread* target = java_lang_Thread::thread(thread_oop);
// Sometimes 'target' is NULL and this test code expects
// Handshake::execute(AsyncHandshakeClosure,...) to handle it.
TraceSelfClosure* tsc = new TraceSelfClosure(target);
Handshake::execute(tsc, target);
}
@@ -342,15 +342,32 @@ void Handshake::execute(HandshakeClosure* hs_cl) {
}

void Handshake::execute(HandshakeClosure* hs_cl, JavaThread* target) {
// tlh_p == nullptr means we rely on a ThreadsListHandle somewhere
// in the caller's context (and we sanity check for that).
Handshake::execute(hs_cl, nullptr, target);
}

void Handshake::execute(HandshakeClosure* hs_cl, ThreadsListHandle* tlh_p, JavaThread* target) {
Copy link
Member

@dholmes-ora dholmes-ora Nov 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can we drop the _p part of tlh_p please.

Copy link
Contributor

@robehn robehn Nov 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please.

Copy link
Member Author

@dcubed-ojdk dcubed-ojdk Nov 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in handshake.[ch]pp.

JavaThread* self = JavaThread::current();
HandshakeOperation op(hs_cl, target, Thread::current());

jlong start_time_ns = os::javaTimeNanos();

ThreadsListHandle tlh;
if (tlh.includes(target)) {
target->handshake_state()->add_operation(&op);
bool target_is_dead = false;
if (target == nullptr) {
target_is_dead = true;
Copy link
Contributor

@coleenp coleenp Oct 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would you pass a NULL target thread to Handshake::execute? Why would the caller not check if the target is dead?

Copy link
Member Author

@dcubed-ojdk dcubed-ojdk Oct 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The NULL target thread being passed in is actually handled by the baseline code:

  ThreadsListHandle tlh;
  if (tlh.includes(target)) {

tlh.includes(target) returns false when target is NULL/nullptr.
I just made the already handled situation more explicit.

Why would the caller not check if the target is dead?

Hmmm... It's hard for me to answer that question since I didn't write
the original code. The test code that calls WB_HandshakeWalkStack()
or WB_AsyncHandshakeWalkStack() can call those functions with
a thread_handle that translates into a thread_oop that returns a
NULL JavaThread*.

See the comment that I added to WB_AsyncHandshakeWalkStack() above.

Copy link
Member Author

@dcubed-ojdk dcubed-ojdk Oct 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: I've added comments to WB_HandshakeReadMonitors() and
WB_HandshakeWalkStack() to clarify their expectations.

Copy link
Member Author

@dcubed-ojdk dcubed-ojdk Nov 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update again: I took a closer look at WB_AsyncHandshakeWalkStack(),
WB_HandshakeReadMonitors() and WB_HandshakeWalkStack() and
I realized that those functions were not properly converting a jobject into
a protected JavaThread*. I've updated them to call the proper conversion
function and I've changed this code to guarantee() that the target is not
nullptr.

Copy link
Contributor

@coleenp coleenp Nov 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great.

} else {
if (tlh_p == nullptr) {
guarantee(Thread::is_JavaThread_protected_by_my_ThreadsList(target),
Copy link
Member

@dholmes-ora dholmes-ora Oct 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be an assert once this has had some bake time.

Copy link
Member Author

@dcubed-ojdk dcubed-ojdk Oct 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. All of the guarantee(Thread::is_JavaThread_protected_by_my_ThreadsList(),...
calls should be changed to asserts down the road.

"missing ThreadsListHandle in calling context.");
target->handshake_state()->add_operation(&op);
} else if (tlh_p->includes(target)) {
target->handshake_state()->add_operation(&op);
} else {
target_is_dead = true;
}
}
if (target_is_dead) {
char buf[128];
jio_snprintf(buf, sizeof(buf), "(thread= " INTPTR_FORMAT " dead)", p2i(target));
log_handshake_info(start_time_ns, op.name(), 0, 0, buf);
@@ -396,13 +413,24 @@ void Handshake::execute(AsyncHandshakeClosure* hs_cl, JavaThread* target) {
jlong start_time_ns = os::javaTimeNanos();
AsyncHandshakeOperation* op = new AsyncHandshakeOperation(hs_cl, target, start_time_ns);

ThreadsListHandle tlh;
if (tlh.includes(target)) {
target->handshake_state()->add_operation(op);
} else {
if (target == nullptr) {
// Stress test that calls WB_AsyncHandshakeWalkStack() can get here:
log_handshake_info(start_time_ns, op->name(), 0, 0, "(thread dead)");
delete op;
return;
}

Thread* current = Thread::current();
if (current != target) {
// Another thread is handling the request and it must be protecting
// the target.
guarantee(Thread::is_JavaThread_protected_by_my_ThreadsList(target),
"missing ThreadsListHandle in calling context.");
}
// Implied else:
// The target is handling the request itself so it can't be dead.

target->handshake_state()->add_operation(op);
}

HandshakeState::HandshakeState(JavaThread* target) :
@@ -36,6 +36,7 @@ class HandshakeOperation;
class JavaThread;
class SuspendThreadHandshake;
class ThreadSelfSuspensionHandshake;
class ThreadsListHandle;

// A handshake closure is a callback that is executed for a JavaThread
// while it is in a safepoint/handshake-safe state. Depending on the
@@ -65,6 +66,7 @@ class Handshake : public AllStatic {
// Execution of handshake operation
static void execute(HandshakeClosure* hs_cl);
static void execute(HandshakeClosure* hs_cl, JavaThread* target);
static void execute(HandshakeClosure* hs_cl, ThreadsListHandle* tlh_p, JavaThread* target);
static void execute(AsyncHandshakeClosure* hs_cl, JavaThread* target);
};

@@ -486,6 +486,29 @@ bool Thread::is_JavaThread_protected(const JavaThread* p) {
return false;
}

// Is the target JavaThread protected by a ThreadsList associated
// with the calling Thread.
//
// Thread::is_JavaThread_protected() above is the more general check.
// This function ONLY checks the ThreadsLists (if any) associated with
// the calling thread in order to verify proper ThreadsListHandle
// placement somewhere in the calling context.
bool Thread::is_JavaThread_protected_by_my_ThreadsList(const JavaThread* p) {
Thread* current_thread = Thread::current();
Copy link
Contributor

@coleenp coleenp Oct 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't you call this on the current thread as "this" argument?

Copy link
Member Author

@dcubed-ojdk dcubed-ojdk Oct 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modeled the new check after the existing:

bool Thread::is_JavaThread_protected(const JavaThread* p) {

which is also a static function.

Copy link
Member

@dholmes-ora dholmes-ora Oct 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the name is somewhat ungainly - and unnecessarily detailed given is_JavaThread_protected has a similar constraint - it should be a static function as given because it must only be called on the current thread, and an instance method would give the false impression that it could be called on any thread.

That said it should be possible to write that code block only once and reuse it. And the name as I said is somewhat ungainly. You could even have:

static bool is_JavaThread_protected(const JavaThread* p, bool checkTLHOnly = false) {

Copy link
Member Author

@dcubed-ojdk dcubed-ojdk Oct 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm checking out adding the checkTLHOnly param to is_JavaThread_protected()
and replacing is_JavaThread_protected_by_my_ThreadsList() calls with calls
to the updated function.

Copy link
Member Author

@dcubed-ojdk dcubed-ojdk Nov 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing went well so the above change will be in the next change that I push.

// Check the ThreadsLists associated with the calling thread (if any)
// to see if one of them protects the target JavaThread:
for (SafeThreadsListPtr* stlp = current_thread->_threads_list_ptr;
stlp != NULL; stlp = stlp->previous()) {
if (stlp->list()->includes(p)) {
// The target JavaThread is protected by this ThreadsList:
return true;
}
}

// The target JavaThread is not protected.
return false;
}

ThreadPriority Thread::get_priority(const Thread* const thread) {
ThreadPriority priority;
// Can return an error!
@@ -1743,18 +1766,20 @@ void JavaThread::send_thread_stop(oop java_throwable) {
// - Target thread will not enter any new monitors.
//
bool JavaThread::java_suspend() {
ThreadsListHandle tlh;
if (!tlh.includes(this)) {
log_trace(thread, suspend)("JavaThread:" INTPTR_FORMAT " not on ThreadsList, no suspension", p2i(this));
guarantee(Thread::is_JavaThread_protected_by_my_ThreadsList(this),
"missing ThreadsListHandle in calling context.");
if (is_exiting()) {
Copy link
Member

@dholmes-ora dholmes-ora Oct 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems an unrelated change in behaviour ??

Copy link
Contributor

@coleenp coleenp Oct 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So suspend_thread and resume thread's caller already takes a ThreadsListHandle so this is unnecessary and never happens.

Copy link
Member Author

@dcubed-ojdk dcubed-ojdk Oct 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems an unrelated change in behaviour ??

Actually this is equivalent code. The baseline code does this:

  ThreadsListHandle tlh;
  if (!tlh.includes(this)) {
    log_trace(thread, suspend)("JavaThread:" INTPTR_FORMAT " not on ThreadsList, no suspension", p2i(this));
    return false;
  }

What that code means is: if this thread does not appear in the newly created
ThreadsListHandle's list, then this thread has called java_suspend() after
this thread has exited. That's the only way that this thread could be missing
from a newly created ThreadsListHandle's list.

All I've done here is get rid of the ThreadsListHandle, verify that the calling
context is protecting this and switch to an is_exiting() call.

So suspend_thread and resume thread's caller already takes a ThreadsListHandle
so this is unnecessary and never happens.

I presume @coleenp is saying that the is_exiting() check is not necessary.

That might be true, it might not be true. I was trying to create equivalent
code without creating a nested ThreadsListHandle here and I think I've
done that. I actually think it might be possible for a JVM/TI event handler
to fire after the thread has set is_exiting() and for that event handler to
call SuspendThread() which could get us to this point.

Copy link
Member Author

@dcubed-ojdk dcubed-ojdk Oct 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On rereading all of these comments and the current baseline code, I have
to clarify one thing:

There is a minor change in behavior caused by switching from a
ThreadsListHandle::includes() check to a JavaThread::is_exiting()
check. The JavaThread::is_exiting() will return true before the target
JavaThread is removed from the system's current ThreadsList. So the
JavaThread::is_exiting() check will return true faster and the
ThreadsListHandle::includes() check.

However, that change in behavior does not result in a change in
behavior for a suspend thread request. Here's the relevant code:

src/hotspot/share/runtime/thread.cpp:
void JavaThread::exit(bool destroy_vm, ExitType exit_type) {
<snip>

    // The careful dance between thread suspension and exit is handled here.
    // Since we are in thread_in_vm state and suspension is done with handshakes,
    // we can just put in the exiting state and it will be correctly handled.
    set_terminated(_thread_exiting);

<snip>

  // Remove from list of active threads list, and notify VM thread if we are the last non-daemon thread
  Threads::remove(this, daemon);

<snip>
}

When we changed the suspend thread mechanism to use handshakes, we
made it so that once the JavaThread called set_terminated(_thread_exiting)
it no longer had to honor a suspend thread request.

Summary: Yes, the is_exiting() call will result in detecting the exiting
JavaThread sooner than the ThreadsListHandle::includes() check.
However, it will not result in a change in suspend thread behavior.

Copy link
Member

@dholmes-ora dholmes-ora Oct 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The is_exiting check seems unnecessary as the handshake code will not handshake with an exiting thread. The nested TLH was unnecessary too AFAICS.

Copy link
Member Author

@dcubed-ojdk dcubed-ojdk Oct 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ummmm... The purpose of the new is_exiting() check and the baseline's
ThreadsListHandle::includes() check is to avoid making this call:

  return this->handshake_state()->suspend();

The call we are avoiding is the one that makes the synchronous
SuspendThreadHandshake request (for threads other than self)
so by detecting the exiting thread early, we are avoiding entering
the handshake machinery.

Copy link
Member

@dholmes-ora dholmes-ora Nov 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why do we care about that? The low-level mechanism handles exiting threads so there is no need for the higher-level code to do that. The current logic gives the appearance that we must filter out exiting threads at this level because the lower-level code does not handle them.

Copy link
Member Author

@dcubed-ojdk dcubed-ojdk Nov 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay you've convinced me. I'm merging my latest patch with master
and then I'll retest with the is_exiting() check removed.

log_trace(thread, suspend)("JavaThread:" INTPTR_FORMAT " is exiting, no suspension", p2i(this));
return false;
}
return this->handshake_state()->suspend();
}

bool JavaThread::java_resume() {
ThreadsListHandle tlh;
if (!tlh.includes(this)) {
log_trace(thread, suspend)("JavaThread:" INTPTR_FORMAT " not on ThreadsList, nothing to resume", p2i(this));
guarantee(Thread::is_JavaThread_protected_by_my_ThreadsList(this),
"missing ThreadsListHandle in calling context.");
if (is_exiting()) {
Copy link
Member

@dholmes-ora dholmes-ora Nov 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we remove this the same as we did for java_suspend()?

Copy link
Contributor

@robehn robehn Nov 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please

Copy link
Member Author

@dcubed-ojdk dcubed-ojdk Nov 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rationale for removing the is_exiting() check from java_suspend() was that it
was redundant because the handshake code detected and handled the is_exiting()
case so we didn't need to do that work twice.

If we look at HandshakeState::resume() there is no logic for detecting or handling
the possibility of an exiting thread. That being said, we have to look closer at what
HandshakeState::resume() does and whether that logic can be harmful if executed
on an exiting thread.

Here's the code:

bool HandshakeState::resume() {
  if (!is_suspended()) {
    return false;
  }
  MutexLocker ml(&_lock, Mutex::_no_safepoint_check_flag);
  if (!is_suspended()) {
    assert(!_handshakee->is_suspended(), "cannot be suspended without a suspend request");
    return false;
  }
  // Resume the thread.
  set_suspended(false);
  _lock.notify();
  return true;
}

I'm not seeing anything in HandshakeState::resume() that
worries me with respect to an exiting thread. Of course, the
proof is in the testing so I'll rerun the usual testing after
deleting that code.

Copy link
Member

@dholmes-ora dholmes-ora Nov 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A suspended thread cannot be exiting - else the suspend logic is broken. So, given you can call resume() on a not-suspended thread, as long as the handshake code checks for is_supended() (which it does) then no explicit is_exiting check is needed.

Copy link
Member Author

@dcubed-ojdk dcubed-ojdk Nov 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed! I have to keep reminding myself that with handshake based suspend
and resume, we just don't have the same races with exiting threads that we
used to have.

log_trace(thread, suspend)("JavaThread:" INTPTR_FORMAT " is exiting, nothing to resume", p2i(this));
return false;
}
return this->handshake_state()->resume();
@@ -203,6 +203,9 @@ class Thread: public ThreadShadow {
// Is the target JavaThread protected by the calling Thread
// or by some other mechanism:
static bool is_JavaThread_protected(const JavaThread* p);
// Is the target JavaThread protected by a ThreadsList associated
// with the calling Thread.
static bool is_JavaThread_protected_by_my_ThreadsList(const JavaThread* p);

void* operator new(size_t size) throw() { return allocate(size, true); }
void* operator new(size_t size, const std::nothrow_t& nothrow_constant) throw() {