Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be" #6440

Closed
wants to merge 4 commits into from

Conversation

sspitsyn
Copy link
Contributor

@sspitsyn sspitsyn commented Nov 17, 2021

The test fails when the target JavaThread has is_exiting() status. In such a case the JvmtiExport::cleanup_thread(this) has already made a clean up of its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa.
The fix is to add a check for is_exiting() status into handshake closure do_thread() early.
There following handshake closures are fixed by this update:

  • UpdateForPopTopFrameClosure
  • SetForceEarlyReturn
  • SetFramePopClosure

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be"

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6440/head:pull/6440
$ git checkout pull/6440

Update a local copy of the PR:
$ git checkout pull/6440
$ git pull https://git.openjdk.java.net/jdk pull/6440/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 6440

View PR using the GUI difftool:
$ git pr show -t 6440

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6440.diff

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Nov 17, 2021

👋 Welcome back sspitsyn! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

@openjdk openjdk bot commented Nov 17, 2021

@sspitsyn The following labels will be automatically applied to this pull request:

  • hotspot
  • serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added serviceability hotspot labels Nov 17, 2021
@openjdk openjdk bot added the rfr label Nov 17, 2021
@mlbridge
Copy link

@mlbridge mlbridge bot commented Nov 17, 2021

Webrevs

Copy link
Contributor

@TheRealMDoerr TheRealMDoerr left a comment

LGTM. Thanks for fixing it!

@openjdk
Copy link

@openjdk openjdk bot commented Nov 17, 2021

@sspitsyn This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be"

Reviewed-by: mdoerr, lmesnik, dcubed

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 46 new commits pushed to the master branch:

  • 0a9e76c: 8277485: Zero: Fix fast{i,f}access_0 bytecodes handling
  • 1c215f3: 8272773: Configurable card table card size
  • 1d7cef3: 8276662: Scalability bottleneck in SymbolTable::lookup_common()
  • c79a485: 8277494: [BACKOUT] JDK-8276150 Quarantined jpackage apps are labeled as "damaged"
  • 2ab43ec: 8273544: Increase test coverage for snippets
  • 2d4af22: 8277370: configure script cannot distinguish WSL version
  • a3406a1: 8277092: TestMetaspaceAllocationMT2.java#ndebug-default fails with "RuntimeException: Committed seems high: NNNN expected at most MMMM"
  • e47cc81: 8275386: Change nested classes in jdk.jlink to static nested classes
  • f609b8f: 8274946: Cleanup unnecessary calls to Throwable.initCause() in java.rmi
  • 36b59f3: 8274333: Redundant null comparison after Pattern.split
  • ... and 36 more: https://git.openjdk.java.net/jdk/compare/91607436b79126ccb999deaf38a98209dbfe6ec1...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Nov 17, 2021
Copy link
Member

@dholmes-ora dholmes-ora left a comment

Hi Serguei, (sorry for initially addressing to Leonid)

Something seems amiss to me.

First the checks for java_thread->threadObj() == NULL should not be necessary as the threadObj can never be NULL once it has been started and a non-started thread should not be possible by the time you reach the code doing the checks. Even if we nulled out threadObj for a terminated thread the is_exiting check would already handle that case.

Second, if the target thread is exiting then surely the suspension check should return false and so we would already give a JVMTI_ERROR_THREAD_NOT_SUSPENDED error?

Thanks,
David

@sspitsyn
Copy link
Contributor Author

@sspitsyn sspitsyn commented Nov 18, 2021

Martin and Leonid, thank you for quick review!

@sspitsyn
Copy link
Contributor Author

@sspitsyn sspitsyn commented Nov 18, 2021

Hi David,

Thank you for reviewing this!
I was also thinking about getting rid of the check java_thread->threadObj() == NULL.
Then I've decided it is safe to keep it as it was in the original UpdateForPopTopFrameClosure implementation (but later in the code). I will remove it and retest the fix.

Second, if the target thread is exiting then surely the suspension check should return
false and so we would already give a JVMTI_ERROR_THREAD_NOT_SUSPENDED error?

The assert
assert(java_thread == _state->get_thread(), "Must be");
is fired one line before the JVMTI_ERROR_THREAD_NOT_SUSPENDED code is returned.

@dholmes-ora
Copy link
Member

@dholmes-ora dholmes-ora commented Nov 18, 2021

Wouldn't it suffice to just move the assert then?

@sspitsyn
Copy link
Contributor Author

@sspitsyn sspitsyn commented Nov 18, 2021

It does not look right to check other conditions if the JavaThread is exiting.
So, I think, the java_thread->is_exiting() has to be checked first.
Please, let me know if I miss anything.

@dholmes-ora
Copy link
Member

@dholmes-ora dholmes-ora commented Nov 18, 2021

IIUC these cases all require that the target is suspended else it is an error. If the target is_exiting then it is not suspended and therefore there is an error. The suspension check should already handle an exiting thread and so there is no need to explicitly add an is_exiting check.

@sspitsyn
Copy link
Contributor Author

@sspitsyn sspitsyn commented Nov 18, 2021

It is not correct.
At least, there is this case:

/* non suspended and exiting thread */
    case 6:
        set_watch_ev(1); /* watch JVMTI events */
        popframe_err = (jvmti->PopFrame(frameThr)); /* explode the bomb */
        set_watch_ev(0); /* ignore again JVMTI events */
        if (popframe_err != JVMTI_ERROR_THREAD_NOT_SUSPENDED &&
            popframe_err != JVMTI_ERROR_THREAD_NOT_ALIVE) {
            printf("TEST FAILED: the function PopFrame() returned the error %d: %s\n",
                popframe_err, TranslateError(popframe_err));
            printf("\tBut it should return the error JVMTI_ERROR_THREAD_NOT_SUSPENDED or JVMTI_ERROR_THREAD_NOT_ALIVE.\n");
            return STATUS_FAILED;
        }
        break;
    }

In other cases, the test constructs cases so that the tested thread is alive when expected.
The test was easily failing before in 10th of runs but now it does not fail in 100 runs.
I'll try to run this test 1000 times on all platforms.

@sspitsyn
Copy link
Contributor Author

@sspitsyn sspitsyn commented Nov 18, 2021

Also, if the target thread is exiting then the PopFrame should return error code JVMTI_ERROR_THREAD_NOT_ALIVE, but not JVMTI_ERROR_THREAD_NOT_SUSPENDED. It does not matter what this test is expecting.

@dholmes-ora
Copy link
Member

@dholmes-ora dholmes-ora commented Nov 18, 2021

It seems somewhat subjective whether a thread that is exiting and thus still on its way to becoming "not alive" needs to report "not alive" versus "not suspended". As there appears to be no synchronization with the target in this case what stops it from transitioning to "is_exiting" the moment after the "is_exiting" check returns false, but before you hit the assertion?

@dholmes-ora
Copy link
Member

@dholmes-ora dholmes-ora commented Nov 18, 2021

Ignore that last question - the target is in a handshake so can't change state.


if (java_thread->is_exiting()) {
return; /* JVMTI_ERROR_THREAD_NOT_ALIVE (default) */
}
assert(java_thread == _state->get_thread(), "Must be");
Copy link
Member

@dcubed-ojdk dcubed-ojdk Nov 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assert() is the site of the original test failure. I haven't yet
looked at the locations of the other changes.

The is_exiting() check is made under the protection of the
JvmtiThreadState_lock so an unsuspended target thread that is
exiting cannot reach the point where the _state is updated to
clear the JavaThread* so we can't fail the assert() if the
is_exiting() check has returned false.

Copy link
Contributor Author

@sspitsyn sspitsyn Nov 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dan,
Thank you for reviewing this!
I'm not sure, I correctly understand you here.
Are you saying that you agree with this change?
In fact, the thread state can not be changed (and the assert fired) after the is_exiting() check is made even without JvmtiThreadState_lock protection because it is inside of a handshake execution.

Copy link
Member

@dcubed-ojdk dcubed-ojdk Nov 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the is_exiting() check addition.

I forgot that we're executing a Handshake doit() function. So we have a couple
of reasons why an unsuspended target thread can't change from !is_exiting()
to is_exiting() while we are in this function.

Copy link
Member

@dholmes-ora dholmes-ora Nov 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again this introduces a more precise state check but also changes the behaviour by now reporting NOT_ALIVE instead of NOT_SUSPENDED. The assertion failure can be fixed by simply moving the assertion to after the suspension check.

Copy link
Member

@dcubed-ojdk dcubed-ojdk left a comment

I don't see a reason for the change in SetForceEarlyReturn::doit(),
but I'm okay with the other changes.

if (java_thread->is_exiting()) {
return; /* JVMTI_ERROR_THREAD_NOT_ALIVE (default) */
}
if (!self) {
if (!java_thread->is_suspended()) {
_result = JVMTI_ERROR_THREAD_NOT_SUSPENDED;
Copy link
Member

@dcubed-ojdk dcubed-ojdk Nov 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see an obvious reason for this is_exiting() check.

Copy link
Contributor Author

@sspitsyn sspitsyn Nov 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I see similar check in the force_early_return() function:

  if (state == NULL) {
    return JVMTI_ERROR_THREAD_NOT_ALIVE;
  }

Would it better to replace it with this check instead? :

  if (java_thread->is_exiting()) {
    return JVMTI_ERROR_THREAD_NOT_ALIVE;
  }

Removing this check and keep the one inside the handshake would be even better.

I would also add this line for symmetry with two other cases:

+  MutexLocker mu(JvmtiThreadState_lock);
  SetForceEarlyReturn op(state, value, tos);

Copy link
Member

@dcubed-ojdk dcubed-ojdk Nov 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that I don't see why you added the is_exiting() check
since I don't see a race in that function, i.e., there's no assert() in
this function that you need to protect.

As for adding the MutexLocker mu(JvmtiThreadState_lock), you'll
have to analyze and justify why you would need to add that lock grab
independent of this fix. I'm not seeing a bug there, but I haven't looked
very closely.

Copy link
Member

@dholmes-ora dholmes-ora Nov 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The is_exiting check changes the behaviour from reporting JVMTI_ERROR_THREAD_NOT_SUSPENDED to JVMTI_ERROR_THREAD_NOT_ALIVE. Arguably it is a more precise answer, but it is somewhat splitting hairs. To me it might be clearer to the developer what their logic error is if they get NOT_SUSPENDED rather than NOT_ALIVE. Either way this change is not needed to fix any known bug and the change is behaviour seems questionable.

if (java_thread->is_exiting()) {
return; /* JVMTI_ERROR_THREAD_NOT_ALIVE (default) */
}
assert(_state->get_thread() == java_thread, "Must be");
Copy link
Member

@dcubed-ojdk dcubed-ojdk Nov 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assert() on L1625 is subject to the same race as the original site.
This is_exiting() check is made under the protection of the
JvmtiThreadState_lock so it is sufficient to protect that assert().

Copy link
Contributor Author

@sspitsyn sspitsyn Nov 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, thanks!

Copy link
Member

@dholmes-ora dholmes-ora Nov 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above.

if (java_thread->is_exiting()) {
return; /* JVMTI_ERROR_THREAD_NOT_ALIVE (default) */
}
if (!self) {
if (!java_thread->is_suspended()) {
_result = JVMTI_ERROR_THREAD_NOT_SUSPENDED;
Copy link
Member

@dcubed-ojdk dcubed-ojdk Nov 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that I don't see why you added the is_exiting() check
since I don't see a race in that function, i.e., there's no assert() in
this function that you need to protect.

As for adding the MutexLocker mu(JvmtiThreadState_lock), you'll
have to analyze and justify why you would need to add that lock grab
independent of this fix. I'm not seeing a bug there, but I haven't looked
very closely.


if (java_thread->is_exiting()) {
return; /* JVMTI_ERROR_THREAD_NOT_ALIVE (default) */
}
assert(java_thread == _state->get_thread(), "Must be");
Copy link
Member

@dcubed-ojdk dcubed-ojdk Nov 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the is_exiting() check addition.

I forgot that we're executing a Handshake doit() function. So we have a couple
of reasons why an unsuspended target thread can't change from !is_exiting()
to is_exiting() while we are in this function.

@sspitsyn
Copy link
Contributor Author

@sspitsyn sspitsyn commented Nov 19, 2021

Dan, thank you for review!

@sspitsyn
Copy link
Contributor Author

@sspitsyn sspitsyn commented Nov 19, 2021

David,
Thank you for your questions.
I'm not sure if all of them are resolved though. :)
Please, let me know if it is the case.

Copy link
Contributor

@TheRealMDoerr TheRealMDoerr left a comment

Still good. Thumbs up from my side.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Hi Serguei,

I still feel the bug here can be fixed simply by moving assertions, rather than by introducing a change in behaviour as to what error code would be returned.

But I'll leave to serviceability folk to decide.

Thanks,
David

if (java_thread->is_exiting()) {
return; /* JVMTI_ERROR_THREAD_NOT_ALIVE (default) */
}
if (!self) {
if (!java_thread->is_suspended()) {
_result = JVMTI_ERROR_THREAD_NOT_SUSPENDED;
Copy link
Member

@dholmes-ora dholmes-ora Nov 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The is_exiting check changes the behaviour from reporting JVMTI_ERROR_THREAD_NOT_SUSPENDED to JVMTI_ERROR_THREAD_NOT_ALIVE. Arguably it is a more precise answer, but it is somewhat splitting hairs. To me it might be clearer to the developer what their logic error is if they get NOT_SUSPENDED rather than NOT_ALIVE. Either way this change is not needed to fix any known bug and the change is behaviour seems questionable.


if (java_thread->is_exiting()) {
return; /* JVMTI_ERROR_THREAD_NOT_ALIVE (default) */
}
assert(java_thread == _state->get_thread(), "Must be");
Copy link
Member

@dholmes-ora dholmes-ora Nov 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again this introduces a more precise state check but also changes the behaviour by now reporting NOT_ALIVE instead of NOT_SUSPENDED. The assertion failure can be fixed by simply moving the assertion to after the suspension check.

if (java_thread->is_exiting()) {
return; /* JVMTI_ERROR_THREAD_NOT_ALIVE (default) */
}
assert(_state->get_thread() == java_thread, "Must be");
Copy link
Member

@dholmes-ora dholmes-ora Nov 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above.

@sspitsyn
Copy link
Contributor Author

@sspitsyn sspitsyn commented Nov 22, 2021

Hi David,
Thank you for looking at this and your comments.
Exiting thread should not be in suspended state.
If PopFrame reports THREAD_NOT_SUSPENDED instead of THREAD_NOT_ALIVE then it will never report THREAD_NOT_ALIVE as the spec says.
So, I'm pretty sure that the THREAD_NOT_ALIVE error code should normally take priority.
It is why I prefer current fix over moving the assert.
But I kind of understand your concern. Thank you for sharing it!
Thanks,
Serguei

@sspitsyn
Copy link
Contributor Author

@sspitsyn sspitsyn commented Nov 22, 2021

/integrate

@openjdk
Copy link

@openjdk openjdk bot commented Nov 22, 2021

Going to push as commit 32839ba.
Since your change was applied there have been 50 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Nov 22, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels Nov 22, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Nov 22, 2021

@sspitsyn Pushed as commit 32839ba.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@sspitsyn sspitsyn deleted the br3 branch Nov 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot integrated serviceability
5 participants