This repository has been archived by the owner on Sep 19, 2023. It is now read-only.

8289091: move oop safety check from SharedRuntime::get_java_tid() to JavaThread::threadObj() #69

Closed
wants to merge 5 commits into from

Conversation

dcubed-ojdk
Member

@dcubed-ojdk dcubed-ojdk commented Jun 24, 2022

A trivial move of the oop safety check from SharedRuntime::get_java_tid() to
JavaThread::threadObj(). Also made adjustments to the threadObj() calls in
JavaThread::print_on_error() and JavaThread::get_thread_name_string() so
that we don't get secondary crashes when a JavaThread crashes after it has
detached the GC barrier.

Tested with Mach5 Tier[1-7]. A Mach5 Tier8 will be started this weekend.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8289091: move oop safety check from SharedRuntime::get_java_tid() to JavaThread::threadObj()

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk19 pull/69/head:pull/69
$ git checkout pull/69

Update a local copy of the PR:
$ git checkout pull/69
$ git pull https://git.openjdk.org/jdk19 pull/69/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 69

View PR using the GUI difftool:
$ git pr show -t 69

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk19/pull/69.diff

@bridgekeeper

bridgekeeper bot commented Jun 24, 2022

👋 Welcome back dcubed! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@dcubed-ojdk
Member Author

/label add hotspot-runtime

@dcubed-ojdk dcubed-ojdk marked this pull request as ready for review June 24, 2022 19:53
@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 24, 2022
@dcubed-ojdk
Member Author

@dholmes-ora, @fisk, @pchilano, and @robehn - This is a followup from
JDK-8288139 JavaThread touches oop after GC barrier is detached

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Jun 24, 2022
@openjdk

openjdk bot commented Jun 24, 2022

@dcubed-ojdk
The hotspot-runtime label was successfully added.

@mlbridge

mlbridge bot commented Jun 24, 2022

Webrevs

Contributor

@robehn robehn left a comment

Looks fine to me, thanks!

@openjdk

openjdk bot commented Jun 26, 2022

@dcubed-ojdk This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8289091: move oop safety check from SharedRuntime::get_java_tid() to JavaThread::threadObj()

Reviewed-by: rehn, dholmes

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 41 new commits pushed to the master branch:

  • dc4edd3: 8289183: jdk.jfr.consumer.RecordedThread.getId references Thread::getId, should be Thread::threadId
  • c4dcce4: 8289619: JVMTI SelfSuspendDisablerTest.java failed with RuntimeException: Test FAILED: Unexpected thread state
  • f5cdaba: 8245268: -Xcomp is missing from java launcher documentation
  • 9515560: 8288703: GetThreadState returns 0 for virtual thread that has terminated
  • cfc9a88: 8288854: getLocalGraphicsEnvironment() on for multi-screen setups throws exception NPE
  • 9925014: 8280320: C2: Loop opts are missing during OSR compilation
  • 8e01ffb: 8289570: SegmentAllocator:allocateUtf8String(String str) default behavior mismatch to spec
  • 20124ac: 8289585: ProblemList sun/tools/jhsdb/JStackStressTest.java on linux-aarch64
  • 604ea90: 8289549: ISO 4217 Amendment 172 Update
  • 9549777: 8284358: Unreachable loop is not removed from C2 IR, leading to a broken graph
  • ... and 31 more: https://git.openjdk.org/jdk19/compare/6458ebc8e4cb11d99f7447e01f890ba36ad41664...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 26, 2022
Member

@dholmes-ora dholmes-ora left a comment

Main change is fine but the "other adjustments" are not correct/appropriate. The state of the target thread is not the issue.

Thanks.

oop thread_obj = threadObj();
if (thread_obj != NULL) {
if (java_lang_Thread::is_daemon(thread_obj)) st->print(" daemon");
if (is_oop_safe()) {
Member

This is wrong - is_oop_safe only has relevance when called on the current JavaThread. It is the current thread that must be oop_safe, not the target thread we are printing.
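
To make the distinction concrete, here is a minimal, self-contained C++ sketch. `ModelThread` and both helper functions are hypothetical stand-ins, not HotSpot code. The only thing taken from the comment above is that the oop-safety question applies to the thread performing the access (the current thread), not to the thread being printed.

```cpp
#include <cassert>

// Hypothetical stand-in for a JavaThread; not the real HotSpot class.
struct ModelThread {
    bool oop_safe;  // false once this thread has detached its GC barrier
};

// The check as first proposed: it asks whether the *target* thread
// (the one being printed) is oop-safe.
bool may_touch_oops_checking_target(const ModelThread& /*current*/,
                                    const ModelThread& target) {
    return target.oop_safe;
}

// The check as requested in review: it asks whether the *current* thread
// (the one that would actually dereference the oop) is oop-safe.
bool may_touch_oops_checking_current(const ModelThread& current,
                                     const ModelThread& /*target*/) {
    return current.oop_safe;
}
```

In this model, an exiting thread that prints a healthy thread passes the target-based check even though the exiting thread is the one that would touch the oop; the current-based check rejects it.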

Member Author

Agreed. I had that nagging feeling when I thought about this fix
while working on chores on Sunday...

if (name != NULL) {
if (buf == NULL) {
name_str = java_lang_String::as_utf8_string(name);
if (is_oop_safe()) {
Member

Same comment - it is the current thread that must be oop_safe. If the target thread has exited then we will detect that via the null threadObj.

Member Author

Agreed.

@dcubed-ojdk
Member Author

@robehn - Thanks for the review.
@dholmes-ora - Thanks for the review. I need to rework the "other adjustments"
portion of the fix.

@dcubed-ojdk
Member Author

Testing the V01 patch with Mach5 Tier1 now; it's 2/3 of the way done and so
far is looking good. Will follow with additional Mach5 Tiers.

@dcubed-ojdk
Member Author

Also tested by temporarily reintroducing the bug fixed by:
JDK-8288139 JavaThread touches oop after GC barrier is detached
and verifying that the bad code is still detected and that the
hs_err_pid file looks good.

Member

@dholmes-ora dholmes-ora left a comment

Hi Dan,
The additional checks to avoid secondary crashes seem like a lot of effort for little benefit; after all, the window during which they would be applicable is quite small on the thread termination path. Somewhat ironically(?) the primary need for these additional checks would be when the current thread has already performed an unsafe oop access and so hit the guarantee and is now doing the thread dump for the hs_err file. Because of that I'm going to approve this, but in general I don't like us making the code jump through hoops for these kinds of secondary failure avoidance issues. The current thread is the primary interest during a crash.

Thanks.

Comment on lines 2150 to 2151
Thread* current = Thread::current_or_null();
if (current != nullptr && (!current->is_Java_thread() || JavaThread::cast(current)->is_oop_safe())) {
Member

Just realized, we can't have a null current thread if we are calling this as that is checked at a higher level. But we should be using current_or_null_safe() here as we could be in a signal-handling context.

Member Author

I used Thread::current_or_null() just to make the code more bullet proof. In all of
my testing I never ran into a case where Thread::current() returned nullptr.
I'm going to switch back to Thread::current() and remove the extra handling for
the current == nullptr case.

Member

If the code can be invoked in the context of a signal handler, as can happen for the error reporting path, then you must use Thread::current_or_null_safe() or risk introducing a deadlock or secondary crash.
It occurs to me that this may mean the existing safety-check is not correct because it may end up being checked in that context too.

Member

@dholmes-ora dholmes-ora left a comment

Sorry Dan have to revoke my approval. There are still some issues here that need fixing and I don't like the impact on the code.

Comment on lines 2218 to 2220
if (current == nullptr) {
// Current thread is not attached so it can't safely determine this
// JavaThread's name so use the default thread name.
Member

This should not be possible, but again current_or_null_safe should be used.

Though this code is used a lot in normal operation so the additional overhead of this is more significant than the print_on_error case.

I think this check should be moved to the caller if needed (ie the print_on_error code), as in normal use it is not possible to fail this check.

Member Author

I'm going to switch back to Thread::current() and remove the extra handling for
the current == nullptr case.

@dcubed-ojdk
Member Author

@dholmes-ora - Thanks for the re-review. I'm going to update the fix again and
switch back to using Thread::current() instead of Thread::current_or_null().

The reason that this particular handling of a secondary failure is important is
that a secondary failure in printing the name of the failing thread will prevent
entries for following threads from being printed in the thread list in the hs_err_pid
file.

Here's an example to show placement:

Java Threads: ( => current thread )
  0x00007fd64d808a00 JavaThread "Unknown thread" [_thread_blocked, id=7427, stack(0x0000700003c18000,0x0000700003d18000)]
  0x00007fd64e80d400 JavaThread "Unknown thread" [_thread_blocked, id=43011, stack(0x0000700004433000,0x0000700004533000)]
  0x00007fd64e80e200 JavaThread "Unknown thread" [_thread_blocked, id=42755, stack(0x0000700004536000,0x0000700004636000)]
  0x00007fd64e80e800 JavaThread "Unknown thread" [_thread_blocked, id=42243, stack(0x0000700004639000,0x0000700004739000)]
  0x00007fd64e80ee00 JavaThread "Unknown thread" [_thread_blocked, id=22787, stack(0x000070000473c000,0x000070000483c000)]
  0x00007fd64e80f400 JavaThread "Unknown thread" [_thread_blocked, id=41731, stack(0x000070000483f000,0x000070000493f000)]
  0x00007fd64e81e800 JavaThread "Unknown thread" [_thread_blocked, id=41219, stack(0x0000700004942000,0x0000700004a42000)]
  0x00007fd64e81ee00 JavaThread "Unknown thread" [_thread_blocked, id=23299, stack(0x0000700004a45000,0x0000700004b45000)]
  0x00007fd650819000 JavaThread "Unknown thread" [_thread_blocked, id=40451, stack(0x0000700004b48000,0x0000700004c48000)]
  0x00007fd64e851400 JavaThread "Unknown thread" [_thread_blocked, id=23811, stack(0x0000700004c4b000,0x0000700004d4b000)]
  0x00007fd64d84e200 JavaThread "Unknown thread" [_thread_blocked, id=24067, stack(0x0000700004e51000,0x0000700004f51000)]
  0x00007fd64e81c600 JavaThread "Unknown thread" [_thread_blocked, id=39171, stack(0x0000700004f54000,0x0000700005054000)] _threads_hazard_ptr=0x00007fd64e0041b0
  0x00007fd64d80d000 JavaThread "Unknown thread" [_thread_blocked, id=38915, stack(0x0000700005057000,0x0000700005157000)]
=>0x00007fd650812000 JavaThread "<no-name - current JavaThread has exited>" [_thread_in_vm, id=24835, stack(0x000070000515a000,0x000070000525a000)]

The entry prefixed with "=>" is the crashing thread that is past the GC barrier
detach point. In this example, it happens to be the last thread in the Java Threads: ( => current thread ) section of the hs_err_pid file. However, if the failing thread
happened to be earlier in the list and we didn't prevent the secondary error, then
we would be missing entries from the section.
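
The effect described above can be illustrated with a small, self-contained C++ model. Everything here is hypothetical (`ModelEntry`, `lookup_name`, both `dump_*` functions), and the C++ exception only stands in for a secondary crash; this is a sketch of the argument, not of the real hs_err_pid reporting code.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of one line in the "Java Threads" section.
struct ModelEntry {
    std::string name;
    bool name_lookup_crashes;  // true for the thread whose name cannot be read safely
};

// Stand-in for fetching a thread's name; the exception models a secondary crash.
std::string lookup_name(const ModelEntry& e) {
    if (e.name_lookup_crashes) throw std::runtime_error("secondary failure");
    return e.name;
}

// Unguarded listing: the first failure ends the section, so every
// entry after the failing thread is lost.
std::vector<std::string> dump_unguarded(const std::vector<ModelEntry>& threads) {
    std::vector<std::string> out;
    try {
        for (const auto& e : threads) out.push_back(lookup_name(e));
    } catch (const std::runtime_error&) {
        // The listing stops at the point of the secondary failure.
    }
    return out;
}

// Guarded listing: a placeholder name is used instead of the risky lookup,
// so the loop reaches the remaining threads.
std::vector<std::string> dump_guarded(const std::vector<ModelEntry>& threads) {
    std::vector<std::string> out;
    for (const auto& e : threads) {
        out.push_back(e.name_lookup_crashes ? "<no-name>" : lookup_name(e));
    }
    return out;
}
```

With three entries where the second one fails, the unguarded listing keeps only the first entry, while the guarded listing keeps all three with a placeholder in the middle.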

@dcubed-ojdk
Member Author

dcubed-ojdk commented Jun 28, 2022

Testing the V02 patch with Mach5 Tier[1-7] now.

Also tested by temporarily reintroducing the bug fixed by:
JDK-8288139 JavaThread touches oop after GC barrier is detached
and verifying that the bad code is still detected and that the
hs_err_pid file looks good.

@dcubed-ojdk
Member Author

@dholmes-ora - please re-review when you get the chance.
Mach5 Tier[1-4] has passed and Tier[5-7] are running along nicely.

@dholmes-ora
Member

The reason that this particular handling of a secondary failure is important is
that a secondary failure in printing the name of the failing thread will prevent
entries for following threads from being printed in the thread list in the hs_err_pid
file.

Yes I realise that a secondary crash loses some information, but my contention is that:

a) the likelihood of crashing after detaching the GC barrier is very, very small; and
b) in such a crash it is only the crashing thread that is really of interest, not the other threads in the system

So to me trying to make secondary crash handling more robust in the current case is not worth the cost of the extra checks. If the checks were only in the crash reporting path then that would be okay, but not when they impact normal code execution. Sorry.

Comment on lines +2152 to +2153
// Only access threadObj() if current thread is not a JavaThread
// or if it is a JavaThread that can safely access oops.
Member

How can a non-JavaThread safely access the oop? Is the only safe case the VMThread at a safepoint?

Member Author

The purpose of the if (!current->is_Java_thread() || part of the if-statement
is to allow the code to work as it did before for the non-JavaThread case. Before
this fix, if a non-JavaThread called into this code, then it was allowed to execute
this code. I've preserved that behavior and I've seen no failures that indicate that
this is a problem.

Do I know what non-JavaThreads might wander in here? No I don't.

Member

I realise this is pre-existing behaviour. It seems odd that a non-JavaThread can touch an oop when an exiting JavaThread cannot - how are they different? Or do we just cross our fingers and hope for the best with a non-JavaThread because it is rare? Perhaps @fisk can explain?

if (thread_obj != NULL) {
if (java_lang_Thread::is_daemon(thread_obj)) st->print(" daemon");
Thread* current = Thread::current();
if (!current->is_Java_thread() || JavaThread::cast(current)->is_oop_safe()) {

This is really tricky. Why not have threadObj() return null if this is happening? Then you can say why in that function.

Member Author

The purpose of the guarantee() in threadObj() is to catch bad calls to
threadObj() that a thread makes after it has passed the GC barrier detach
point. Returning nullptr from threadObj() would defeat this purpose.
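
A rough C++ sketch of the trade-off, using hypothetical names only (`ModelOop` and both accessor functions are not HotSpot code, and the thrown exception merely models guarantee() stopping the VM):

```cpp
#include <stdexcept>

// Hypothetical stand-in for the java.lang.Thread oop.
struct ModelOop {
    int id;
};

// Variant suggested in review: quietly return null when the caller is
// past the GC barrier detach point. The bad call goes unnoticed.
ModelOop* thread_obj_returning_null(ModelOop* obj, bool caller_is_oop_safe) {
    return caller_is_oop_safe ? obj : nullptr;
}

// Variant kept by the author: fail loudly so the bad call is detected.
ModelOop* thread_obj_with_guarantee(ModelOop* obj, bool caller_is_oop_safe) {
    if (!caller_is_oop_safe) {
        throw std::logic_error("threadObj() called after GC barrier detach");
    }
    return obj;
}
```

In the first variant an unsafe caller simply sees a null result, which callers already treat as an ordinary outcome; in the second the same call is reported as an error.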

@dcubed-ojdk
Member Author

@dholmes-ora - I'm very glad that I resisted putting the check in threadObj()
when I made the changes for:

JDK-8288139 JavaThread touches oop after GC barrier is detached

The review for that PR took too long and clearly this discussion would have made
it take longer. JDK-8288139 is fixed and Zhengyu can make progress on his issue.

I don't see a good way to make forward progress in this PR. It doesn't look like you
and I will reach a solution that's acceptable to both of us.

The only good news is that my many rounds of testing on this fix have convinced me
threadObj() is not being abused in the current code base except for the very narrow
case where we crash (for whatever reason) after the GC barrier is detached. Should
the GC team choose to put in an appropriate trap in their code that catches a thread
accessing an oop after the GC barrier is detached, then threadObj() will be automatically
sanity checked by that code.

Thanks for your time in doing several rounds of review.

@dcubed-ojdk
Member Author

Just to be clear:
@dholmes-ora wrote above:

So to me trying to make secondary crash handling more robust in the current case
is not worth the cost of the extra checks. If the checks were only in the crash
reporting path then that would be okay, but not when they impact normal code
execution. Sorry.

The code we're arguing about is in JavaThread::get_thread_name_string() and
when you look at the PR with white space changes disabled, you'll see this new
code which is executed in the "normal code" case:

  Thread* current = Thread::current();
  if (!current->is_Java_thread() || JavaThread::cast(current)->is_oop_safe()) {
    // Only access threadObj() if current thread is not a JavaThread
    // or if it is a JavaThread that can safely access oops.

which is one Thread::current() call and one if-statement.

There is also this new code block that's only executed in the error case:

  } else {
    // Current JavaThread has exited...
    if (current == this) {
      // ... and is asking about itself:
      name_str = "<no-name - current JavaThread has exited>";
    } else {
      // ... and it can't safely determine this JavaThread's name so
      // use the default thread name.
      name_str = Thread::name();
    }
  }

so none of the else-block matters in the "normal code" case.
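
For readers who want to trace the two paths without the HotSpot sources, the selection logic quoted above can be modeled in standalone C++ roughly as follows. `ModelThread` and `model_thread_name` are hypothetical; the real function also handles buffers and threads without a name, which this sketch leaves out. The "Unknown thread" default is an assumption taken from the sample hs_err_pid output earlier in this conversation.

```cpp
#include <string>

// Hypothetical stand-in for a HotSpot thread; not the real class.
struct ModelThread {
    bool is_java_thread;    // false models a non-JavaThread caller
    bool oop_safe;          // false once the GC barrier has been detached
    std::string java_name;  // models the name stored in the thread object
};

// Models which name string is chosen for `target` when `current` asks for it.
std::string model_thread_name(const ModelThread& current, const ModelThread& target) {
    if (!current.is_java_thread || current.oop_safe) {
        // "Normal code" case: the caller may read the name from the thread object.
        return target.java_name;
    }
    // Error case only: the current JavaThread has exited.
    if (&current == &target) {
        return "<no-name - current JavaThread has exited>";
    }
    // It cannot safely read another thread's name, so use the default.
    return "Unknown thread";
}
```

In the normal case only the first branch runs, which matches the point that the else-block adds no cost there.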

@dholmes-ora
Member

I'm very glad that I resisted putting the check in threadObj() when I made the changes for ...

The problem has not been (until now) what you put in threadObj but the other checks you decided to put in.

But now the issue has been raised, any code that can be executed in the context of a signal handler must use Thread::current_or_null_safe(). I don't know if the sharedRuntime code falls into that category but a call to threadObj() certainly can.

Sorry this has not been smooth sailing.

@dcubed-ojdk
Member Author

I've re-read the history behind:

JDK-8132510 Replace ThreadLocalStorage with compiler/language-based thread-local variables

which is the fix that introduced Thread::current_or_null_safe(). Wow does that fix
and the code review process bring back memories. I remember the struggle to get
the fix in before JDK9 FC... 10 releases ago... yikes!

I'm mulling and researching on what to do...

…rrent() since threadObj() can be called by a signal handler.
@dcubed-ojdk
Member Author

This latest version has only been tested with Mach5 Tier1 so far.

@dcubed-ojdk
Member Author

In JavaThread::get_thread_name_string() and when you look at the PR with
white space changes disabled, you'll see this new code which is executed
in the "normal code" case:

  Thread* current = Thread::current_or_null_safe();
  assert(current != nullptr, "cannot be called by a detached thread");
  if (!current->is_Java_thread() || JavaThread::cast(current)->is_oop_safe()) {
    // Only access threadObj() if current thread is not a JavaThread
    // or if it is a JavaThread that can safely access oops.

so I've switched to Thread::current_or_null_safe and added an
assert() for the current value not being nullptr.

@dcubed-ojdk
Member Author

dcubed-ojdk commented Jul 1, 2022

@dholmes-ora - please re-review when you get the chance. I've started a
new round of testing in Mach5.

Mach5 Tier[1-8] testing is done and there are no related failures.

Contributor

@robehn robehn left a comment

Still good.

Member

@dholmes-ora dholmes-ora left a comment

Thanks for the changes Dan. I can live with this version. :)

Thanks.

@dcubed-ojdk
Member Author

@robehn and @dholmes-ora - Thanks for the re-reviews.

/integrate

@openjdk

openjdk bot commented Jul 5, 2022

Going to push as commit 30e134e.
Since your change was applied there have been 46 commits pushed to the master branch:

  • 29ea642: 8287847: Fatal Error when suspending virtual thread after it has terminated
  • f640fc5: 8067757: Incorrect HTML generation for copied javadoc with multiple @throws tags
  • 0dff327: 8289569: [test] java/lang/ProcessBuilder/Basic.java fails on Alpine/musl
  • 1a27164: 8287851: C2 crash: assert(t->meet(t0) == t) failed: Not monotonic
  • 5b5bc6c: 8287672: jtreg test com/sun/jndi/ldap/LdapPoolTimeoutTest.java fails intermittently in nightly run
  • dc4edd3: 8289183: jdk.jfr.consumer.RecordedThread.getId references Thread::getId, should be Thread::threadId
  • c4dcce4: 8289619: JVMTI SelfSuspendDisablerTest.java failed with RuntimeException: Test FAILED: Unexpected thread state
  • f5cdaba: 8245268: -Xcomp is missing from java launcher documentation
  • 9515560: 8288703: GetThreadState returns 0 for virtual thread that has terminated
  • cfc9a88: 8288854: getLocalGraphicsEnvironment() on for multi-screen setups throws exception NPE
  • ... and 36 more: https://git.openjdk.org/jdk19/compare/6458ebc8e4cb11d99f7447e01f890ba36ad41664...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jul 5, 2022
@openjdk openjdk bot closed this Jul 5, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jul 5, 2022
@openjdk

openjdk bot commented Jul 5, 2022

@dcubed-ojdk Pushed as commit 30e134e.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@dcubed-ojdk dcubed-ojdk deleted the JDK-8289091 branch July 6, 2022 19:03
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated
4 participants