8289091: move oop safety check from SharedRuntime::get_java_tid() to JavaThread::threadObj() #69
Conversation
👋 Welcome back dcubed! A progress list of the required criteria for merging this PR into |
/label add hotspot-runtime |
@dholmes-ora, @fisk, @pchilano, and @robehn - This is a followup from |
@dcubed-ojdk |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine to me, thanks!
@dcubed-ojdk This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 41 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Main change is fine but the "other adjustments" are not correct/appropriate. The state of the target thread is not the issue.
Thanks.
src/hotspot/share/runtime/thread.cpp
Outdated
oop thread_obj = threadObj(); | ||
if (thread_obj != NULL) { | ||
if (java_lang_Thread::is_daemon(thread_obj)) st->print(" daemon"); | ||
if (is_oop_safe()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is wrong - is_oop_safe
only has relevance when called on the current JavaThread. It is the current thread that must be oop_safe, not the target thread we are printing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I had that nagging feeling when I thought about this fix
while working on chores on Sunday...
src/hotspot/share/runtime/thread.cpp
Outdated
if (name != NULL) { | ||
if (buf == NULL) { | ||
name_str = java_lang_String::as_utf8_string(name); | ||
if (is_oop_safe()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment - it is the current thread that must be oop_safe. If the target thread has exited then we will detect that via the null threadObj.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
@robehn - Thanks for the review. |
Testing the V01 patch with Mach5 Tier1 now; it's 2/3 of the way done and so |
Also tested by temporarily reintroducing the bug fixed by: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Dan,
The additional checks to avoid secondary crashes seem like a lot of effort for little benefit, afterall the windows during which they would be applicable is quite small on the thread termination path. Somewhat ironically(?) the primary need for these additional checks would be when the current thread has already performed an unsafe oop access and so hit the guarantee and is now doing the thread dump for the hs_err file. Because of that I'm going to approve this, but in general I don't like us making the code jump through hoops for these kind of secondary failure avoidance issues. The current thread is the primary interest during a crash.
Thanks.
src/hotspot/share/runtime/thread.cpp
Outdated
Thread* current = Thread::current_or_null(); | ||
if (current != nullptr && (!current->is_Java_thread() || JavaThread::cast(current)->is_oop_safe())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just realized, we can't have a null current thread if we are calling this as that is checked at a higher level. But we should be using current_or_null_safe()
here as we could be in a signal-handling context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used Thread::current_or_null()
just to make the code more bullet proof. In all of
my testing I never ran into a case where Thread::current()
returned nullptr
.
I'm going to switch back to Thread::current()
and remove the extra handling for
the current == nullptr
case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the code can be invoked in the context of a signal handler, as can happen for the error reporting path, then you must use Thread::current_or_null_safe()
or risk introducing a deadlock or secondary crash.
It occurs to me that this may mean the existing safety-check is not correct because it may end up being checked in that context too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry Dan have to revoke my approval. There are still some issues here that need fixing and I don't like the impact on the code.
src/hotspot/share/runtime/thread.cpp
Outdated
if (current == nullptr) { | ||
// Current thread is not attached so it can't safely determine this | ||
// JavaThread's name so use the default thread name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be possible, but again current_or_null_safe
should be used.
Though this code is used a lot in normal operation so the additional overhead of this is more significant than the print_on_error case.
I think this check should be moved to the caller if needed (ie the print_on_error code), as in normal use it is not possible to fail this check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to switch back to Thread::current()
and remove the extra handling for
the current == nullptr
case.
@dholmes-ora - Thanks for the re-review. I'm going to update the fix again and The reason that this particular handling of a secondary failure is important is Here's an example to show placement:
The entry prefixed with "=>" is the crashing thread that is past the GC barrier |
Testing the V02 patch with Mach5 Tier[1-7] now. Also tested by temporarily reintroducing the bug fixed by: |
@dholmes-ora - please re-review when you get the chance. |
Yes I realise that a secondary crash loses some information, but my contention is that: a) the likelihood of crashing after detaching the GC barrier is very, very small; and So to me trying to make secondary crash handling more robust in the current case is not worth the cost of the extra checks. If the checks were only in the crash reporting path then that would be okay, but not when they impact normal code execution. Sorry. |
// Only access threadObj() if current thread is not a JavaThread | ||
// or if it is a JavaThread that can safely access oops. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can a non-JavaThread safely access the oop? Is the only safe case the VMThread at a safepoint?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of the if (!current->is_Java_thread() ||
part of the if-statement
is to allow the code to work as it did before for the non-JavaThread case. Before
this fix, if a non-JavaThread called into this code, then it was allowed to execute
this code. I've preserved that behavior and I've see no failures that indicate that
this is a problem.
Do I know what non-JavaThreads might wander in here? No I don't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realise this is pre-existing behaviour. It seems odd that a non-JavaThread can touch an oop when an exiting JavaThread cannot - how are they different? Or do we just cross our fingers and hope for the best with a non-JavaThread because it is rare? Perhaps @fisk can explain?
if (thread_obj != NULL) { | ||
if (java_lang_Thread::is_daemon(thread_obj)) st->print(" daemon"); | ||
Thread* current = Thread::current(); | ||
if (!current->is_Java_thread() || JavaThread::cast(current)->is_oop_safe()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really tricky. Why not have threadObj() return null if this is happening. Then you can say why in that function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of the guarantee() in threadObj() is catch bad calls to
threadObj() that a thread makes after it has passed the GC barrier detach
point. Returning nullptr from threadObj() would defeat this purpose.
@dholmes-ora - I'm very glad that I resisted putting the check in JDK-8288139 JavaThread touches oop after GC barrier is detached The review for that PR took too long and clearly this discussion would have made I don't see a good way to make forward progress in this PR. It doesn't look like you The only good news is that my many rounds of testing on this fix have convinced me Thanks for your time in doing several rounds of review. |
Just to be clear:
The code we're arguing about is in
which is one Thread::current() call and one if-statement. There is also this new code block that's only executed in the error case:
so none of the else-block matters in the "normal code" case. |
The problem has not been (until now) what you put in threadObj but the other checks you decided to put in. But now the issue has been raised, any code that can be executed in the context of a signal handler must use Sorry this has not been smooth sailing. |
I've re-read the history behind: JDK-8132510 Replace ThreadLocalStorage with compiler/language-based thread-local variables which is the fix that introduced I'm mulling and researching on what to do... |
…rrent() since threadObj() can be called by a signal handler.
This latest version has only been tested with Mach5 Tier1 so far. |
In JavaThread::get_thread_name_string() and when you look at the PR with
so I've switched to |
…n non-release builds.
@dholmes-ora - please re-review when you get the chance. I've started a Mach5 Tier[1-8] testing is done and there are no related failures. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes Dan. I can live with this version. :)
Thanks.
@robehn and @dholmes-ora - Thanks for the re-reviews. /integrate |
Going to push as commit 30e134e.
Your commit was automatically rebased without conflicts. |
@dcubed-ojdk Pushed as commit 30e134e. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
A trivial move of the oop safety check from SharedRuntime::get_java_tid() to
JavaThread::threadObj(). Also made adjustments to the threadObj() calls in
JavaThread::print_on_error() and JavaThread::get_thread_name_string() so
that we don't get secondary crashes when a JavaThread crashes after it has
detached the GC barrier.
Tested with Mach5 Tier[1-7]. A Mach5 Tier8 will be started this weekend.
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk19 pull/69/head:pull/69
$ git checkout pull/69
Update a local copy of the PR:
$ git checkout pull/69
$ git pull https://git.openjdk.org/jdk19 pull/69/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 69
View PR using the GUI difftool:
$ git pr show -t 69
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk19/pull/69.diff