Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8257831: Suspend with handshakes (Preview) #2625

Closed
wants to merge 33 commits into from

Conversation

robehn
Copy link
Contributor

@robehn robehn commented Feb 18, 2021

A suspend request is done by handshaking thread target thread(s).
When executing the handshake operation we know the target mutator
thread is in a dormant state (as in safepoint safe state).
We have a guarantee that it will check it's poll before leaving the
dormant state.
To stop the thread from leaving the the dormant state we install a
second asynchronous handshake to be executed by the targeted thread.
The asynchronous handshake will wait on a monitor while the thread is
suspended.
The target thread cannot not leave the dormant state without a resume
request.

Per thread suspend requests are naturally serialized by the per
thread HandshakeState lock (we can only execute one handshake at a
time per thread).
Instead of having a separate lock we use this to our advantage and use
HandshakeState lock for serializing access to the suspend flag and for
wait/notify.

Suspend:
Requesting thread -> synchronous handshake -> target thread
Inside synchronus handshake (HandshakeState lock is locked while
executing any handshake):
- Set suspended flag
- Install asynchronous handshake

Target thread -> tries to leave dormant state -> Executes handshakes
Target only executes asynchronous handshake:
- While suspended
- Go to blocked
- Wait on HandshakeState lock

Resume:
Resuming thread:
- Lock HandshakeState lock
- Clear suspended flag
- Notify HandshakeState lock
- Unlock HandshakeState lock

The "suspend requested" flag is an optimization, without it a dormant thread
could be suspended and resumed many times and each would add a new
asynchronous handshake.
Suspend requested flag means there is already an asynchronous suspend
handshake in queue which can be re-used, only the suspend flag needs to be
set.


Some code can be simplified or done in a smarter way but I refrained
from doing such changes instead tried to keep existing code as is as far
as possible.
This concerns especially raw monitors.


Regarding the changed test, the documentation says:
"If the calling thread is specified in the request_list array, this
function will not return until some other thread resumes it."

But the code:
LOG("suspendTestedThreads: before JVMTI SuspendThreadList");
err = jvmti->SuspendThreadList(threads_count, threads, results);
...
// Allow the Main thread to inspect the result of tested threads
suspension
agent_unlock(jni);

The thread will never return from SuspendThreadList until resumed, so
it cannot unlock with agent_unlock().
Thus main thread is stuck forever on:
// Block until the suspender thread competes the tested threads
suspension
agent_lock(jni);

And never checks and reumes the threads.
So I removed that lock instead just sleep and check until all thread
have the expected suspended state.


Testing t1-t8, nsk_jdi/nsk_jvmti/jdk_jdi/tck, KS24, RunThese and
combinations like running with -XX:ZCollectionInterval=0.01 -
XX:ZFragmentationLimit=0.
Running above some of above concurrently (load ~240), slow debug,
etc...


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

Download

To checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/2625/head:pull/2625
$ git checkout pull/2625

To update a local copy of the PR:
$ git checkout pull/2625
$ git pull https://git.openjdk.java.net/jdk pull/2625/head

@bridgekeeper
Copy link

bridgekeeper bot commented Feb 18, 2021

👋 Welcome back rehn! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Feb 18, 2021

@robehn The following labels will be automatically applied to this pull request:

  • hotspot
  • serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added serviceability serviceability-dev@openjdk.org hotspot hotspot-dev@openjdk.org labels Feb 18, 2021
Copy link
Contributor

@coleenp coleenp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some typos

src/hotspot/share/prims/jvmtiRawMonitor.cpp Outdated Show resolved Hide resolved
src/hotspot/share/prims/jvmtiRawMonitor.cpp Outdated Show resolved Hide resolved
src/hotspot/share/runtime/handshake.cpp Outdated Show resolved Hide resolved
src/hotspot/share/runtime/handshake.cpp Outdated Show resolved Hide resolved
@robehn
Copy link
Contributor Author

robehn commented Mar 11, 2021

So I have an update, but I still need todo some more testing.
It seem like I could just remove the UtilLock, since the reason to have it is gone.
I also kept getting collisions with 8262910, so to only deal with that once I rebased on top of that instead of merging.

Hopefully I'll be pushing the update later today.

Copy link
Member

@dcubed-ojdk dcubed-ojdk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An amazing cleanup of the suspend/resume mechanism. This gets rid
of so much ancient code and the new Handshake based mechanism
looks so very promising for my correctness worries.

With this large of a change, there some minor details that need to be
looked up. Overall this is outstanding work!

Comment on lines 1584 to 1592
// On some systems we have seen signal delivery get "stuck" until the signal
// mask is changed as part of thread termination. Check that the current thread
// has not already terminated (via SR_lock()) - else the following assertion
// has not already terminated - else the following assertion
// will fail because the thread is no longer a JavaThread as the ~JavaThread
// destructor has completed.

if (thread->SR_lock() == NULL) {
if (thread->is_Java_thread() && thread->as_Java_thread()->is_terminated()) {
return;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't equivalent code. The old code was relying on the fact that a
JavaThread whose ~JavaThread destructor had run can be checked
for being terminated by seeing if the SR_lock() had been set to NULL.
NULL SR_lock means the now destructed JavaThread is terminated and
the assert that follows:

assert(thread->is_VM_thread() || thread->is_Java_thread(), "Must be VMThread or JavaThread");

won't fire because we bail out early.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If JavaThread destructor have been run the JavaThread pointer is invalid, so we may not dereference it.
Thus the old code is invalid if SR_lock is NULL, because you just dereferenced a pointer to a destroyed object.

If the user of OS Suspend/Resume (e.g. JFR) have it on a ThreadsList it cannot become terminated.
If it's terminated it is not on any newly created ThreadsList, it cannot be seem, anyways.
Assuming the user uses a ThreadsList the code is valid, a user without a ThreadsList this code is still dereferencing a pointer to a destroyed object, but there is no easy fix here for this.
If the user promise to use a ThreadsList we can remove this check.

So instead of trying to make something equivalent I tried to make it 'better' instead.

Sorry I don't see why the assert won't fire?
We used to bailout for any Thread, now we only bailout for JavaTread (if terminated), so the scope have increased.
We now fire always for a non Java/VM thread, which we did not do before.

src/hotspot/share/prims/jvmtiEnv.cpp Outdated Show resolved Hide resolved
assert(!_handshakee->has_last_Java_frame() || _handshakee->frame_anchor()->walkable(), "should have walkable stack");
JavaThreadState jts = _handshakee->thread_state();
while (_handshakee->_suspended) {
_handshakee->set_thread_state(_thread_blocked);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This set of the thread_state to _thread_blocked can be done above the while-loop
since you don't reset the thread_state at the bottom of the loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it can, but I did not want to go to blocked if we are not suspended.
You think we should go to block even if we are no longer suspened?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, you need to move the _handshakee->set_thread_state(jts)
into the while-loop so that you're matching up the set to blocked with
the restore to previous.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want the state to 'flicker', it should be stable blocked while we are suspended.
Therefore I set when we have left the while loop and know we are leaving the suspension.

src/hotspot/share/runtime/mutex.cpp Outdated Show resolved Hide resolved
src/hotspot/share/runtime/mutex.cpp Outdated Show resolved Hide resolved
src/hotspot/share/runtime/thread.cpp Outdated Show resolved Hide resolved
src/hotspot/share/runtime/thread.hpp Outdated Show resolved Hide resolved
src/hotspot/share/runtime/thread.hpp Outdated Show resolved Hide resolved
src/hotspot/share/runtime/thread.hpp Show resolved Hide resolved
src/hotspot/share/runtime/handshake.cpp Show resolved Hide resolved
robehn and others added 17 commits March 16, 2021 12:12
Reviewed-by: coleenp, sspitsyn
Reviewed-by: iklam, coleenp, kbarrett
…adAllBytes and Files.copy

Reviewed-by: mcimadamore, alanb
Reviewed-by: neliasso, thartmann
Reviewed-by: asemenyuk, almatvee, kizune, kcr
Reviewed-by: vlivanov, kvn
Reviewed-by: almatvee, iklam, herrick
@robehn
Copy link
Contributor Author

robehn commented Mar 19, 2021

An amazing cleanup of the suspend/resume mechanism. This gets rid
of so much ancient code and the new Handshake based mechanism
looks so very promising for my correctness worries.

With this large of a change, there some minor details that need to be
looked up. Overall this is outstanding work!

Thank you Dan!

@dcubed-ojdk
Copy link
Member

dcubed-ojdk commented Mar 22, 2021

@robehn - I've re-reviewed this one: 31eaf1a
and I'm good with those changes.

I'm looking at the other newer commits, but I'm finding the history to be a bit confused/confusing.
Other folk's changesets are showing up as commits in your PR...

@dcubed-ojdk
Copy link
Member

@robehn - I've reviewed this one: e3ff536
and I'm good with those changes.

Copy link
Member

@dcubed-ojdk dcubed-ojdk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robehn - I reviewed this one: acee10a

I'm mostly good with these changes and I like the migration of
the suspend stuff over to handshake source files. You haven't
said whether you're done with my code crawl comments yet.
I haven't see anything to deals with my comments about the
older code that was depending on the SR_lock, but maybe you
haven't gotten there yet.

I'll plan to do another crawl thru review when you think you're
ready for a non-preview review.

assert(!_handshakee->has_last_Java_frame() || _handshakee->frame_anchor()->walkable(), "should have walkable stack");
JavaThreadState jts = _handshakee->thread_state();
while (_handshakee->_suspended) {
_handshakee->set_thread_state(_thread_blocked);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, you need to move the _handshakee->set_thread_state(jts)
into the while-loop so that you're matching up the set to blocked with
the restore to previous.

@@ -768,9 +768,6 @@ typedef HashtableEntry<InstanceKlass*, mtClass> KlassHashtableEntry;
/* JavaThread (NOTE: incomplete) */ \
/*********************************/ \
\
volatile_nonstatic_field(JavaThread, _suspended, bool) \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you checked whether any code in Serviceabiity Agent is depending on this field?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have looked but not find any, also the tests passes.

@robehn
Copy link
Contributor Author

robehn commented Mar 23, 2021

Great @dcubed-ojdk,

Yes, I was done/waiting for response with/for your crawl!

I haven't see anything to deals with my comments about the
older code that was depending on the SR_lock, but maybe you
haven't gotten there yet.

I added this comment:
"If JavaThread destructor have been run the JavaThread pointer is invalid, so we may not dereference it.
Thus the old code is invalid if SR_lock is NULL, because you just dereferenced a pointer to a destroyed object.

If the user of OS Suspend/Resume (e.g. JFR) have it on a ThreadsList it cannot become terminated.
If it's terminated it is not on any newly created ThreadsList, it cannot be seen, anyways.
Assuming the user uses a ThreadsList the code is valid, a user without a ThreadsList this code is still dereferencing a pointer to a destroyed object, but there is no easy fix here for this.
If the user promise to use a ThreadsList we can remove this check.

So instead of trying to make something equivalent I tried to make it 'better' instead.

Sorry I don't see why the assert won't fire?
We used to bailout for any Thread, now we only bailout for JavaTread (if terminated), so the scope have increased.
We now fire always for a non Java/VM thread, which we did not do before."

I'll plan to do another crawl thru review when you think you're
ready for a non-preview review.

Great!

@robehn robehn closed this Mar 25, 2021
@robehn robehn changed the title 8257831: Suspend with handshakes 8257831: Suspend with handshakes (Preview) Mar 25, 2021
@robehn robehn mentioned this pull request Mar 25, 2021
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-dev@openjdk.org serviceability serviceability-dev@openjdk.org