New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8254263: Remove special_runtime_exit_condition() check from ~ThreadInVMForHandshake() #577
Conversation
👋 Welcome back pchilanomate! A progress list of the required criteria for merging this PR into |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Patricio, the change looks good.
Thanks, Richard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify my own understanding. Looking at the original problem scenario in JDK-8223572 we have:
- T calls _semaphore.wait_with_safepoint_check()
- T transitions from _thread_in_vm to _thread_blocked
- The VM thread completes H and calls _semaphore.signal()
- Before T can transition back to _thread_in_vm, the VM thread schedules another safepoint and
executes VM_ThreadSuspend on behalf of a JVMTI agent that wants to suspend T
The key change in JDK-8238761 is that we now call:
MutexLocker ml(&_lock, Mutex::_no_safepoint_check_flag);
so there is longer the possibility of T becoming handshake-safe and so the VM_ThreadSuspend cannot happen.
@@ -158,7 +143,8 @@ class ThreadInVMForHandshake : public ThreadStateTransition { | |||
} | |||
|
|||
~ThreadInVMForHandshake() { | |||
transition_back(); | |||
assert(_thread->thread_state() == _thread_in_vm, "should only call when leaving VM after handshake"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also assert !_thread->has_special_runtime_exit_condition()
? Or at least that the value of this has not changed between the TIVFH constructor and destructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
has_special_runtime_exit_condition() checks for async exceptions, external_suspend and JFR suspend. The last two can actually be set while we are in the handshake. It doesn't mean that the caller will consider the target suspended, but the _suspend_flags might be different at construction and destruction of TIVFH. As for async exceptions, I realized we have a similar issue as the one described in 8223572 when calling handle_polling_page_exception() on a return poll:
- T calls ThreadSafepointState::handle_polling_page_exception() because another thread called T.stop() (which we implement with a handshake)
- T calls SafepointMechanism::process_if_requested(thread())
- T calls HandshakeState::process_self_inner() and processes the handshake which calls set_pending_async_exception()
- T returns from SafepointMechanism::process_if_requested(thread()) and goes back to java without calling check_and_handle_async_exceptions() to throw the ThreadDeath() exception
I'm not sure if delaying the throw of ThreadDeath is allowed, so if we need to keep the current behaviour I can add a call to check_and_handle_async_exceptions():
@@ -987,6 +987,7 @@ void ThreadSafepointState::handle_polling_page_exception() {
// Process pending operation
SafepointMechanism::process_if_requested(thread());
+ thread()->check_and_handle_async_exceptions();
// restore oop result, if any
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading that comment I realized that this change would break in flight pr #119
To refresh memories: there I'm adding a new suspend flag currently required for object reallocation. I'm using a handshake to synchronize with the suspendee thread S. After the handshake it is guaranteed that the stack of S is walkable and S cannot push/pop frames. This would not work anymore.
I could replace the handshakes (in escapeBarrier.cpp line 172, 218) with vm operations though. Should be ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. You should still be able to use handshakes. I think the issue is that in handle_polling_page_exception(), instead of calling SafepointMechanism::process_if_requested(thread()), we should be doing { ThreadInVMfromJava __tiv(thread());} and { ThreadInVMfromJavaNoAsyncException __tiv(thread());} for the polling on return and not on return respectively. That will process a safepoint/handshake and then check the special conditions before going back to java.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, looks like this is possible.
Wouldn't this also make the block
if (state != _thread_blocked_trans &&
state != _thread_in_vm_trans &&
thread->has_special_runtime_exit_condition()) {
thread->handle_special_runtime_exit_condition(
!thread->is_at_poll_safepoint() && (state != _thread_in_native_trans));
}
in SafepointSynchronize::block(JavaThread *thread)
redundant? This would eliminate the recursion you mentioned in the PR synopsis, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant the block could be removed if we put
ThreadInVMfromJava/ThreadInVMfromJava (TIVFJ) into
handle_polling_page_exception(). In that case the thread would notice and honor
the suspend when leaving the TIVFJ block. This would not be too late, would it?
Yes, it will honor the first suspend in the TIVFJ destructor by calling handle_special_runtime_exit_condition(), but even in that case, we would still need the check in SS:block() because of the last call to SafepointMechanism::process_if_requested(thread()) in java_suspend_self_with_safepoint_check(). That could be a new VM_ThreadSuspend() operation to suspend the thread again. So that check in SS:block() is actually needed for all the transitions to java not only the one in handle_polling_page_exception(). But I think we can remove that recursion if we move that call to SafepointMechanism::process_if_requested(thread()) inside the do-while().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, I'm thinking the call to SafepointMechanism::process_if_requested(this)
in java_suspend_self_with_safepoint_check() should be moved inside the the
do-while(is_external_suspend()). That way we guarantee after returning from
java_suspend_self_with_safepoint_check() that no one will be assuming the
thread is suspended. And in that case I think that check in SS::block can be
removed.Yes I agree.
Great. I'll test it out.
Well, I don't want to spam your PR with too many questions / comments. The
change still looks good to me.
No problem, it's actually good to have more eyes looking to make sure I don't miss anything. : )
Thanks Richard!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant the block could be removed if we put
ThreadInVMfromJava/ThreadInVMfromJava (TIVFJ) into
handle_polling_page_exception(). In that case the thread would notice and honor
the suspend when leaving the TIVFJ block. This would not be too late, would it?
Yes, it will honor the first suspend in the TIVFJ destructor by calling
handle_special_runtime_exit_condition(), but even in that case, we would still
need the check in SS:block() because of the last call to
SafepointMechanism::process_if_requested(thread()) in
java_suspend_self_with_safepoint_check(). That could be a new
VM_ThreadSuspend() operation to suspend the thread again.
Sorry, I have to ask again to really understand this myself. If it is a new
VM_ThreadSuspend() that started while the target thread T was still
_thread_blocked in java_suspend_self_with_safepoint_check(), then T can not
leave the do-while-loop because is_external_suspend() will return true as
setting the suspend flag for T cannot be reordered with the begin of the
safepoint.
Also if T could leave the loop then the safepoint could be ended before T
reaches SS::block(). It would escape then to java even though it was suspended.
See also review thread for JDK-8252521:
http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-September/041632.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant the block could be removed if we put
ThreadInVMfromJava/ThreadInVMfromJava (TIVFJ) into
handle_polling_page_exception(). In that case the thread would notice and honor
the suspend when leaving the TIVFJ block. This would not be too late, would it?Yes, it will honor the first suspend in the TIVFJ destructor by calling
handle_special_runtime_exit_condition(), but even in that case, we would still
need the check in SS:block() because of the last call to
SafepointMechanism::process_if_requested(thread()) in
java_suspend_self_with_safepoint_check(). That could be a new
VM_ThreadSuspend() operation to suspend the thread again.Sorry, I have to ask again to really understand this myself. If it is a new
VM_ThreadSuspend() that started while the target thread T was still
_thread_blocked in java_suspend_self_with_safepoint_check(), then T can not
leave the do-while-loop because is_external_suspend() will return true as
setting the suspend flag for T cannot be reordered with the begin of the
safepoint.
The new JVM_SuspendThread() call (up to VM_ThreadSuspend()) is started after the thread sets its state back to _thread_in_Java and exits the do-while() loop but before it calls SafepointMechanism::process_if_requested(this). The thread will see there is a safepoint pending and will block in SS::block(). If there is no check in SS::block(), after the safepoint finishes the thread will return to java_suspend_self_with_safepoint_check(), back to ~ TIVFJ and then back to java.
Also if T could leave the loop then the safepoint could be ended before T
reaches SS::block(). It would escape then to java even though it was suspended.
If the VMThread doesn't read the thread's state before it is restored back to _thread_in_Java then the safepoint won't progress since the thread is unsafe (_thread_in_Java). So it will have to wait for the thread to call SS::block(). If the VMThread did read a safe state (_thread_blocked) then it means we will see we were suspended in the do-while() loop and suspend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood now. Thanks!
@pchilano This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 57 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
Exactly. |
Mailing list message from David Holmes on hotspot-runtime-dev: Hi Patricio, On 12/10/2020 2:59 pm, Patricio Chilano Mateo wrote:
Ah right! Oh well :)
To some extent, given it is a race, a delay is permissible, but I find
Why directly in handle_polling_page_exception() rather than restoring it Thanks, |
Mailing list message from Patricio Chilano on hotspot-runtime-dev: Hi David, On 10/12/20 11:35 PM, David Holmes wrote:
Having the check in ~TIVFH saves us from missing the async exception for Thanks,
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took me a while to re-read JDK-8223572 and JDK-8238761
and their associated review threads. It's complicated stuff and,
every time I thought I might have an issue, I subsequently
convinced myself all was good.
Thumbs up on this change!
Thanks Dan! Please see the update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I good with this change, but I do have some questions
buried in the code that you changed.
@@ -943,19 +943,21 @@ void ThreadSafepointState::print_on(outputStream *st) const { | |||
|
|||
// Process pending operation. | |||
void ThreadSafepointState::handle_polling_page_exception() { | |||
JavaThread* self = thread(); | |||
assert(self == Thread::current()->as_Java_thread(), "wrong thread"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps: s/wrong thread/must be self/ or s/wrong thread/must be calling thread/
|
||
// Process pending operation | ||
SafepointMechanism::process_if_requested(thread()); | ||
SafepointMechanism::process_if_requested(self); | ||
self->check_and_handle_async_exceptions(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm good with the addition of this call to check_and_handle_async_exceptions().
In particular because of this comment block in:
src/hotspot/share/runtime/thread.cpp:
void JavaThread::check_and_handle_async_exceptions(bool check_unsafe_error) {
if (has_last_Java_frame() && has_async_condition()) {
// If we are at a polling page safepoint (not a poll return)
// then we must defer async exception because live registers
// will be clobbered by the exception path. Poll return is
// ok because the call we a returning from already collides
// with exception handling registers and so there is no issue.
// (The exception handling path kills call result registers but
// this is ok since the exception kills the result anyway).
so check_and_handle_async_exceptions() clearly was expecting to
be called when at a " poll return" which is the block that contains
this new call. That's all good!
What I can't quite figure out is how we "lost" the call to
check_and_handle_async_exceptions() that must have been
made somehow before. Clearly that function was expecting
to be called from a "poll return" site and that's what you're
putting back, but how was that call made before and what
change got rid of it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same problematic execution sequence detailed in 8223572 for the external_suspend case was there for async exceptions too. So, while the thread was blocked in the handshake a ThreadStop() operation could have been executed. After returning from the handshake there was no async exception check. We probably never noticed it because the throw would happen anyways on some later transition back to java (vm->java or native->java).
As to how was that call made before and what change got rid of it: this code was there before handshakes existed. ThreadStop() was implemented as a safepoint and we were probably relying on the special_runtime_exit_condition() check at the end of SS::block() (exactly like the one I'm removing from ~TIVFH and which I'll try to get rid of to eliminate recursions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not to belabour this but I don't understand what this sentence:
// If we are at a polling page safepoint (not a poll return)
is actually making a distinction between. What is a "poll return" here versus a "polling page safepoint"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"poll return" would be a poll immediately before a return (also found out by reading it from the comments in ThreadSafepointState::handle_polling_page_exception()). So unless the poll in the nmethod was done right before a return, we defer the async exception. There is also a comment before the has_special_runtime_exit_condition() check on SS::block() that explains the reason:
// Note: we never deliver an async exception at a polling point as the
// compiler may not have an exception handler for it. The polling
// code will notice the async and deoptimize and the exception will
// be delivered. (Polling at a return point is ok though). Sure is
// a lot of bother for a deprecated feature...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Patricio. Good getting rid of the recursive case.
Thanks @reinrich, @dholmes-ora, @dcubed-ojdk and @robehn for the reviews! |
/integrate |
@pchilano Since your change was applied there have been 77 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit 55d760d. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
Hi all,
Please review the following patch that removes the call to handle_special_runtime_exit_condition() in ~ThreadInVMForHandshake(). This check was added by 8223572 to prevent a JavaThread that was suspended while trying to process a handshake to transition back to java without noticing it was suspended. Since 8238761, a JavaThread executing HandshakeState::process_by_self() will never become safe. It comes from an unsafe state and remains unsafe, so it cannot be suspended during that time. Removing this check also removes one of the reasons SafepointMechanism::process_if_requested() is recursive (the other one remains SafepointSynchronize::block()).
Tested in mach5, tiers 1-7.
Thanks,
Patricio
Progress
Testing
Failed test tasks
Issue
Reviewers
Download
$ git fetch https://git.openjdk.java.net/jdk pull/577/head:pull/577
$ git checkout pull/577