-
Notifications
You must be signed in to change notification settings - Fork 5.8k
8316397: StackTrace/Suspended/GetStackTraceSuspendedStressTest.java failed with: SingleStep event is NOT expected #23490
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ailed with: SingleStep event is NOT expected
👋 Welcome back sspitsyn! A progress list of the required criteria for merging this PR into |
@sspitsyn This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 108 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not at all sure about this. Why are virtual threads different to platform threads here?
My recollection is that the handshake API deliberately does not wait for the suspension to occur, and that there is a separate mechanism to do that for code that needs it - in the old API we had JvmtiEnv::is_thread_fully_suspended
MutexLocker ml(&_lock, Mutex::_no_safepoint_check_flag); | ||
_lock.wait_without_safepoint_check(1); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would normally be incorrectly coded as you do not hold the lock around the state-change and so you may miss the notification. However, it is possible in this case that the overall handshake protocol prevents that from happening, but I cannot easily determine that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the comment, David.
You are right. It is why waiting is with the timeout: _lock.wait_without_safepoint_check(1);
But this is not fully correct either.
I see, Patricio also disagreed with my hack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Serguei,
I investigated this issue but don’t agree with the patch. I added comments below.
Thanks,
Patricio
// Thread suspension works under JvmtiVTMSTransitionDisabler protection, so we need to wait | ||
// for virtual thread to reach a safe state before the JVMTI SuspendThread is returned. | ||
while (_handshakee->thread_state() != _thread_blocked && | ||
_handshakee->thread_state() != _thread_in_native) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already wait for the target to reach a handshake-safe state when executing the SuspendThreadHandshake
. It’s just that we don’t wait until the target suspends in do_self_suspend()
since that can take arbitrarily long (e.g. target just waiting in native).
Now, I was able to reproduce the crash and found the problem. The target is being suspended while creating the JvmtiThreadState in I thought we can add a suspend check before making JVMTI callbacks. But although that would fix this issue, there is still always a race due to the |
Thank you for the comment and nice analysis, Patricio! It helps. Will try to find a right fix for this. |
I attached a patch in JBS that makes the crash easier to reproduce. Give it a try. Running the test with -Xint crashes for me in very few attempts. |
Thank you, Patricio! Will give it a try. |
I've pushed updated which rolls back the initial fix and implements a right approach. |
Right, this is the fix I was thinking about initially. We can always create a test where a suspended target posts an event though, because the target can be seen as handshake-safe after switching to native in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't say that I understand the fix at all. Why would we suspend a platform thread in the middle of post_single_step
? And what has this to do with the original proposal that only affected virtual threads???
If we don't want a suspended thread to be able to post events we should have probably detected that at a higher-level where it makes sense for the thread to actually suspend.
We have not observed this issue for platform threads yet. It is why I did not want this fix to impact the platform threads behavior. Patricio shared his opinion (and I agree with that) that this issue is common for platform and virtual threads. It is possible to take one of two paths:
Not sure what do you mean by "detected at a higher level". Do you mean: detect it in the |
Right. It is important to prevent posting events enabled after suspension.
My understanding is that the event posting is intentionally racy. We can treat it as a thread suspension request has been processed in the middle of an event callback execution. We do not need to fix this as it is a part of the design.
Yes, the target should not execute new Java bytecodes after suspension.
I treat it as VM/JVMTI bug as it looks silly when events are posted event though they were enabled after suspension. Yes, it is an attempt to do he saner thing. Thank you for good questions! |
Thanks Serguei, some comments below.
Ok, makes sense.
For this particular event I don’t think the crash can happen with platform threads because get_jvmti_thread_state() will only block for virtual threads. And I don’t see any other blocking points in InterpreterRuntime::at_safepoint() other than that one. But for other events it can happen since the thread might have been blocked and suspended at some point before (looking at post_class_prepare/post_class_load for example). This leads to another question, should we add this suspend check for all JvmtiExport::post_xxx methods? |
Thank you for the comment.
Right. I wanted to address this separately as you mentioned before that the same issue should exist for other event types. |
if (thread->is_suspended()) { | ||
// suspend here if there is a suspend request | ||
ThreadBlockInVM tbivm(thread, true /* allow suspend */); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This positioning seems better than the previous one, but it still causes me concern as a get function should not really have a side-effect of blocking due to suspension, in my view. What does the actual call-stack look like when we get here? We are obviously missing a suspension check for the current thread, but I still feel that should be higher-up the call-stack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This positioning seems better than the previous one, but it still causes me concern as a get function should not really have a side-effect of blocking due to suspension, in my view.
Maybe we should just add a side function to check this and call it from all JvmtiExport::post_xxx
methods.
What does the actual call-stack look like when we get here? We are obviously missing a suspension check for the current thread, but I still feel that should be higher-up the call-stack.
In general we delay suspension until going back to Java or coming out of native, and avoid suspending in the transitions out of a blocked state because we could be holding VM monitors (JDK-8270085). This suspend check aims to address these cases where the thread was suspended sometime before while in a blocked state but it didn’t suspend when transitioning back to vm. So I don’t think there is an obvious better place to have it. Now of course we should check that we are not holding VM monitors in here, otherwise we are still in the same boat, but I see we already have a !thread->owns_locks()
assert as part of JvmtiJavaThreadEventTransition
due to ThreadToNativeFromVM
. Some events use JvmtiThreadEventTransition
though which does manual transitions and we don’t have a check there. We should move this !thread->owns_locks()
check to ThreadStateTransition
to cover all transitions to/from java and native but I’ll investigate that as a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general we delay suspension until going back to Java or coming out of native, and avoid suspending in the transitions out of a blocked state because we could be holding VM monitors
That sounds reasonable but we must also not post any events if we are suspended, which means we have to have checks that there are no locks held if are going to do this deferred suspension. But I would expect many blocked transitions to relate to something that will causes events to be posted, so do we end up effectively doing the suspension check when coming out of the blocked state? I'm still missing the complete big picture here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or should we simply be passing allow_suspend=true
into more ThreadBlockInVM
transitions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And again what is the call stack for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me see if I have the sequence of events correct:
- we have a thread leaving Java to enter the VM, in this case to post a single-step event, which is not suspended
- when trying to acquire the JvmtiThreadState_lock the thread blocks and becomes handshake/safepoint-safe and is then suspended
- the thread acquires the lock and goes to post the event but because it has now been suspended it is an error to do so
This is almost right but let me add a couple of details.
A suspend request was processed/registered/accepted at the handshake/safepoint-safe point but physical suspension in the ThreadSelfSuspensionHandshake
has not happened yet which created a problem. Also, the problem is not that the thread goes to post the event. Suspension and event posting are racy by design. The problem is that the SingleStep
events for this thread are also enabled at the handshake/safepoint-safe point after the suspend request was processed/registered/accepted.
So what we need is a way to disable suspension so that the transient/temporary switch to _thread_blocked does not allow a suspend request to be processed. I think that is preferable to trying to find where to add a place to check for suspension and actually suspend it.
I'm also thinking toward this direction. The best way to fix this would be to disallow processing/accepting new suspend requests in MutexLocker mu(JvmtiThreadState_lock)
which is executed in context of get_jvmti_thread_state()
call:
inline JvmtiThreadState* JvmtiThreadState::state_for(JavaThread *thread, Handle thread_handle) {
// In a case of unmounted virtual thread the thread can be null.
JvmtiThreadState* state = thread_handle == nullptr ? thread->jvmti_thread_state() :
java_lang_Thread::jvmti_thread_state(thread_handle());
if (state == nullptr) {
MutexLocker mu(JvmtiThreadState_lock);
. . .
Unfortunately, it looks like it is tricky and risky to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
David, we had an internal Zoom meeting with Patricio and agreed on a couple of points (I hope, Patricio will fix me if needed).
#1:
The main (biggest) problem is the call to JvmtiEventController::thread_started()
in the function get_jvmti_thread_state()
. The function JvmtiThreadState::state_for()
is called there if a mounted virtual thread is the current thread and any thread-filtered event is enabled globally. The function JvmtiThreadState::state_for()
has this line: MutexLocker mu(JvmtiThreadState_lock);
The JvmtiThreadState_lock
allows safepointing. It is where a JVMTI suspend request for this thread has been processed/registered. The SingleStep
events for this thread are also enabled after the suspend request was processed/registered. However, suspends are not allowed by this MutexLocker
.
This is very common code path for any thread-filtered event (e.g. the event can be enabled for a specific thread).
#2:
This PR fixes this exact issue with this update:
JvmtiThreadState* JvmtiExport::get_jvmti_thread_state(JavaThread *thread) {
assert(thread == JavaThread::current(), "must be current thread");
if (thread->is_vthread_mounted() && thread->jvmti_thread_state() == nullptr) {
JvmtiEventController::thread_started(thread);
+ if (thread->is_suspended()) {
+ // suspend here if there is a suspend request
+ ThreadBlockInVM tbivm(thread);
+ }
}
. . .
But we also agreed to add a tweak for the function post_dynamic_code_generated_while_holding_locks()
to disallow suspension in this context. Then it will look like this:
+JvmtiThreadState* JvmtiExport::get_jvmti_thread_state(JavaThread *thread, bool allow_suspend) {
assert(thread == JavaThread::current(), "must be current thread");
if (thread->is_vthread_mounted() && thread->jvmti_thread_state() == nullptr) {
JvmtiEventController::thread_started(thread);
+ if (allow_suspend && thread->is_suspended()) {
+ // suspend here if there is a suspend request
+ ThreadBlockInVM tbivm(thread, true /* allow suspend */);
+ }
}
. . .
void JvmtiExport::post_dynamic_code_generated_while_holding_locks(const char* name,
address code_begin, address code_end)
{
. . .
JvmtiThreadState *state = get_jvmti_thread_state(thread, false /* allow_suspend */);
. . .
I have already pushed the minor tweak above.
#3:
We also agreed to investigate more case where a similar situation can occur withing a context/scope of all theJvmtiExport::post_*
function calls. One interesting case was found:
bool InstanceKlass::link_class_impl(TRAPS) {
. . .
if (Universe::is_fully_initialized()) {
DeoptimizationScope deopt_scope;
{
// Now mark all code that assumes the class is not linked.
// Set state under the Compile_lock also.
MutexLocker ml(THREAD, Compile_lock); <== !!!
set_init_state(linked);
CodeCache::mark_dependents_on(&deopt_scope, this);
}
// Perform the deopt handshake outside Compile_lock.
deopt_scope.deoptimize_marked();
} else {
set_init_state(linked);
}
if (JvmtiExport::should_post_class_prepare()) {
JvmtiExport::post_class_prepare(THREAD, this); <== !!!
}
}
}
return true;
}
It feels like more thoughts are needed here.
I'm thinking it would be okay to file a separate bug for this additional investigation.
Most likely we do not have a good test coverage for this, and so, no tests are failing but we may want to add it in the future. At the moment, I'm not sure of it is worth it or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the summary Serguei. I would only add that IMO, given that it’s already clear there could be many suspend points before even reaching one of this JvmtiExport::post_*
methods (like the one shown in #3), I would just address all these cases here too my moving the suspend check outside of the thread->is_vthread_mounted() && thread->jvmti_thread_state() == nullptr
conditional. But since this PR has already been blocked for too long I’m okay if we only want to address the specific suspend point in #1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we already guarded/handled suspension whilst acquiring a mutex, as a general case. ??? Now I am getting really confused about the state of the world.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess, you've already figured this out even though it pretty confusing.
One more explanation would not hurt.
We may get a new suspend request down the call to thread_started()=>state_for()
in the MutexLocker mu(JvmtiThreadState_lock)
but the the suspending is not allowed in this MutexLocker
. So, we need to add a suspend point after this call:
JvmtiEventController::thread_started(thread);
+ if (allow_suspend && thread->is_suspended()) {
+ // suspend here if there is a suspend request
+ ThreadBlockInVM tbivm(thread, true /* allow suspend */);
+ }
I'm not sure, if it is possible to allow a suspend in this MutexLocker
.
It would simplify the code if there is a way to do it.
@sspitsyn This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay as a point fix what has been proposed seems reasonable. I would like some addtional comments please.
But I would really like to see some high-level design documents for all the interactions here because I can't get complete sense of the big picture here. It seems we are forever playing whack-a-mole when it comes to virtual threads and JVMTI.
Thanks
JvmtiEventController::thread_started(thread); | ||
if (allow_suspend && thread->is_suspended()) { | ||
// suspend here if there is a suspend request | ||
ThreadBlockInVM tbivm(thread, true /* allow suspend */); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need some kind of comment before thread_started
to make it clear why we need to check suspension only on this path. Something like:
// Within thread_started we could block on a VM mutex which would allow this thread to be marked as suspended,
// so in some cases we need to honor that suspend request before procedding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This a reasonable suggestion, thanks! Will add a better comment. In fact, I forgot to add such a comment myself.
@@ -309,7 +309,7 @@ class JvmtiExport : public AllStatic { | |||
// If the jvmti_thread_state is absent and any thread filtered event | |||
// is enabled globally then it is created. | |||
// Otherwise, the thread->jvmti_thread_state() is returned. | |||
static JvmtiThreadState* get_jvmti_thread_state(JavaThread *thread); | |||
static JvmtiThreadState* get_jvmti_thread_state(JavaThread *thread, bool allow_suspend = true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a detailed comment on what the allow_suspend
parameter means and when it should be true and false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion, thanks! Will add a comment.
Thank you for looking at it again and your passion, David.
What kind of a design document can it be? The complexity comes from the fact that virtual threads can be unmounted so we can't walk all of them when JVMTI |
@@ -309,6 +309,10 @@ class JvmtiExport : public AllStatic { | |||
// If the jvmti_thread_state is absent and any thread filtered event | |||
// is enabled globally then it is created. | |||
// Otherwise, the thread->jvmti_thread_state() is returned. | |||
// The 'allow_suspend' parameter is passed as 'true' by default which work for almost all call sites. | |||
// It means that a suspend point need to be organized by this function for virtual threads if the call |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// It means that a suspend point need to be organized by this function for virtual threads if the call | |
// It means that a suspend point needs to be organized by this function for virtual threads if the call |
Not sure given my confusion about the various states that are possible. We basically have a situation where we are doing something and we should not hit a safepoint check where a suspend request could be processed or a single-steo event be accepted, but we unintentionally do hit it due to blocking on an internal Mutex. This still says to me that somewhere on this code path there should be something that says "any safepoints/handshakes here should be deferred" and that would be honored by the thread-state transition code used in TBIVM. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing further from me in terms of this PR.
Thanks for your patience and perseverance on this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad to see this one resolved after all, thanks!
David and Patricio, thank you for review! |
/integrate |
Going to push as commit 04ad59d.
Your commit was automatically rebased without conflicts. |
The
get_jvmti_thread_state()
function is called fromJvmtiExport::at_single_stepping_point()
. It can block for virtual threads. Then theSingleStep
events can be enabled at that point. The incorrect behavior is that theSingleStep
events will be posted even though the virtual thread has been suspended with the JVMTISuspendThread
,SuspendThreadList
, orSuspendAllVirtualThreads
. The fix is to add a suspend point for virtual threads to theget_jvmti_thread_state()
function.Testing:
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/23490/head:pull/23490
$ git checkout pull/23490
Update a local copy of the PR:
$ git checkout pull/23490
$ git pull https://git.openjdk.org/jdk.git pull/23490/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 23490
View PR using the GUI difftool:
$ git pr show -t 23490
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/23490.diff
Using Webrev
Link to Webrev Comment