-
Notifications
You must be signed in to change notification settings - Fork 5.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8257831: Suspend with handshakes (Preview) #2625
Conversation
👋 Welcome back rehn! A progress list of the required criteria for merging this PR into |
ed65144
to
503448f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some typos
So I have an update, but I still need todo some more testing. Hopefully I'll be pushing the update later today. |
503448f
to
1f05abc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An amazing cleanup of the suspend/resume mechanism. This gets rid
of so much ancient code and the new Handshake based mechanism
looks so very promising for my correctness worries.
With this large of a change, there some minor details that need to be
looked up. Overall this is outstanding work!
// On some systems we have seen signal delivery get "stuck" until the signal | ||
// mask is changed as part of thread termination. Check that the current thread | ||
// has not already terminated (via SR_lock()) - else the following assertion | ||
// has not already terminated - else the following assertion | ||
// will fail because the thread is no longer a JavaThread as the ~JavaThread | ||
// destructor has completed. | ||
|
||
if (thread->SR_lock() == NULL) { | ||
if (thread->is_Java_thread() && thread->as_Java_thread()->is_terminated()) { | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't equivalent code. The old code was relying on the fact that a
JavaThread whose ~JavaThread destructor had run can be checked
for being terminated by seeing if the SR_lock() had been set to NULL.
NULL SR_lock means the now destructed JavaThread is terminated and
the assert that follows:
assert(thread->is_VM_thread() || thread->is_Java_thread(), "Must be VMThread or JavaThread");
won't fire because we bail out early.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If JavaThread destructor have been run the JavaThread pointer is invalid, so we may not dereference it.
Thus the old code is invalid if SR_lock is NULL, because you just dereferenced a pointer to a destroyed object.
If the user of OS Suspend/Resume (e.g. JFR) have it on a ThreadsList it cannot become terminated.
If it's terminated it is not on any newly created ThreadsList, it cannot be seem, anyways.
Assuming the user uses a ThreadsList the code is valid, a user without a ThreadsList this code is still dereferencing a pointer to a destroyed object, but there is no easy fix here for this.
If the user promise to use a ThreadsList we can remove this check.
So instead of trying to make something equivalent I tried to make it 'better' instead.
Sorry I don't see why the assert won't fire?
We used to bailout for any Thread, now we only bailout for JavaTread (if terminated), so the scope have increased.
We now fire always for a non Java/VM thread, which we did not do before.
assert(!_handshakee->has_last_Java_frame() || _handshakee->frame_anchor()->walkable(), "should have walkable stack"); | ||
JavaThreadState jts = _handshakee->thread_state(); | ||
while (_handshakee->_suspended) { | ||
_handshakee->set_thread_state(_thread_blocked); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This set of the thread_state to _thread_blocked
can be done above the while-loop
since you don't reset the thread_state at the bottom of the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it can, but I did not want to go to blocked if we are not suspended.
You think we should go to block even if we are no longer suspened?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, you need to move the _handshakee->set_thread_state(jts)
into the while-loop so that you're matching up the set to blocked with
the restore to previous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want the state to 'flicker', it should be stable blocked while we are suspended.
Therefore I set when we have left the while loop and know we are leaving the suspension.
…ptimizations Reviewed-by: vlivanov, neliasso
Reviewed-by: coleenp, sspitsyn
Reviewed-by: iklam, coleenp, kbarrett
…adAllBytes and Files.copy Reviewed-by: mcimadamore, alanb
Reviewed-by: goetz, vlivanov
Reviewed-by: stuefe, aefimov
Reviewed-by: neliasso, thartmann
Reviewed-by: shade, zgu
Reviewed-by: rriggs
Reviewed-by: dholmes
Reviewed-by: asemenyuk, almatvee, kizune, kcr
Reviewed-by: vlivanov, kvn
Reviewed-by: ccheung, iklam
Reviewed-by: almatvee, iklam, herrick
Reviewed-by: serb, ihse, azvegint
Reviewed-by: jlahoda
…seIterGVN::optimize" Reviewed-by: thartmann, kvn
…ector while calling StoreVectorNode::memory_size() Reviewed-by: kvn, vlivanov
Reviewed-by: iignatyev
Reviewed-by: iklam
Reviewed-by: thartmann, kvn
…ATION Reviewed-by: dcubed, sspitsyn
Reviewed-by: azvegint, pbansal, aivanov
Thank you Dan! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@robehn - I reviewed this one: acee10a
I'm mostly good with these changes and I like the migration of
the suspend stuff over to handshake source files. You haven't
said whether you're done with my code crawl comments yet.
I haven't see anything to deals with my comments about the
older code that was depending on the SR_lock, but maybe you
haven't gotten there yet.
I'll plan to do another crawl thru review when you think you're
ready for a non-preview review.
assert(!_handshakee->has_last_Java_frame() || _handshakee->frame_anchor()->walkable(), "should have walkable stack"); | ||
JavaThreadState jts = _handshakee->thread_state(); | ||
while (_handshakee->_suspended) { | ||
_handshakee->set_thread_state(_thread_blocked); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, you need to move the _handshakee->set_thread_state(jts)
into the while-loop so that you're matching up the set to blocked with
the restore to previous.
@@ -768,9 +768,6 @@ typedef HashtableEntry<InstanceKlass*, mtClass> KlassHashtableEntry; | |||
/* JavaThread (NOTE: incomplete) */ \ | |||
/*********************************/ \ | |||
\ | |||
volatile_nonstatic_field(JavaThread, _suspended, bool) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you checked whether any code in Serviceabiity Agent is depending on this field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have looked but not find any, also the tests passes.
Great @dcubed-ojdk, Yes, I was done/waiting for response with/for your crawl!
I added this comment: If the user of OS Suspend/Resume (e.g. JFR) have it on a ThreadsList it cannot become terminated. So instead of trying to make something equivalent I tried to make it 'better' instead. Sorry I don't see why the assert won't fire?
Great! |
A suspend request is done by handshaking thread target thread(s).
When executing the handshake operation we know the target mutator
thread is in a dormant state (as in safepoint safe state).
We have a guarantee that it will check it's poll before leaving the
dormant state.
To stop the thread from leaving the the dormant state we install a
second asynchronous handshake to be executed by the targeted thread.
The asynchronous handshake will wait on a monitor while the thread is
suspended.
The target thread cannot not leave the dormant state without a resume
request.
Per thread suspend requests are naturally serialized by the per
thread HandshakeState lock (we can only execute one handshake at a
time per thread).
Instead of having a separate lock we use this to our advantage and use
HandshakeState lock for serializing access to the suspend flag and for
wait/notify.
Suspend:
Requesting thread -> synchronous handshake -> target thread
Inside synchronus handshake (HandshakeState lock is locked while
executing any handshake):
- Set suspended flag
- Install asynchronous handshake
Target thread -> tries to leave dormant state -> Executes handshakes
Target only executes asynchronous handshake:
- While suspended
- Go to blocked
- Wait on HandshakeState lock
Resume:
Resuming thread:
- Lock HandshakeState lock
- Clear suspended flag
- Notify HandshakeState lock
- Unlock HandshakeState lock
The "suspend requested" flag is an optimization, without it a dormant thread
could be suspended and resumed many times and each would add a new
asynchronous handshake.
Suspend requested flag means there is already an asynchronous suspend
handshake in queue which can be re-used, only the suspend flag needs to be
set.
Some code can be simplified or done in a smarter way but I refrained
from doing such changes instead tried to keep existing code as is as far
as possible.
This concerns especially raw monitors.
Regarding the changed test, the documentation says:
"If the calling thread is specified in the request_list array, this
function will not return until some other thread resumes it."
But the code:
LOG("suspendTestedThreads: before JVMTI SuspendThreadList");
err = jvmti->SuspendThreadList(threads_count, threads, results);
...
// Allow the Main thread to inspect the result of tested threads
suspension
agent_unlock(jni);
The thread will never return from SuspendThreadList until resumed, so
it cannot unlock with agent_unlock().
Thus main thread is stuck forever on:
// Block until the suspender thread competes the tested threads
suspension
agent_lock(jni);
And never checks and reumes the threads.
So I removed that lock instead just sleep and check until all thread
have the expected suspended state.
Testing t1-t8, nsk_jdi/nsk_jvmti/jdk_jdi/tck, KS24, RunThese and
combinations like running with -XX:ZCollectionInterval=0.01 -
XX:ZFragmentationLimit=0.
Running above some of above concurrently (load ~240), slow debug,
etc...
Progress
Issue
Download
To checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/2625/head:pull/2625
$ git checkout pull/2625
To update a local copy of the PR:
$ git checkout pull/2625
$ git pull https://git.openjdk.java.net/jdk pull/2625/head