-
Notifications
You must be signed in to change notification settings - Fork 5.8k
8267842: SIGSEGV in get_current_contended_monitor #4224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8267842: SIGSEGV in get_current_contended_monitor #4224
Conversation
👋 Welcome back mdoerr! A progress list of the required criteria for merging this PR into |
@TheRealMDoerr The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Martin,
your fix looks good but I'm a little concerned because there are other call sites which us a similar pattern. E.g. in jvmtiEnvBase.cpp
:
vmtiEnvBase::get_current_contended_monitor(JavaThread *calling_thread, JavaThread *java_thread, jobject *monitor_ptr) {
Thread *current_thread = Thread::current();
assert(java_thread->is_handshake_safe_for(current_thread),
"call by myself or at handshake");
oop obj = NULL;
// The ObjectMonitor* can't be async deflated since we are either
// at a safepoint or the calling thread is operating on itself so
// it cannot leave the underlying wait()/enter() call.
ObjectMonitor *mon = java_thread->current_waiting_monitor();
if (mon == NULL) {
// thread is not doing an Object.wait() call
mon = java_thread->current_pending_monitor();
if (mon != NULL) {
// The thread is trying to enter() an ObjectMonitor.
obj = mon->object();
assert(obj != NULL, "ObjectMonitor should have a valid object!");
}
// implied else: no contended ObjectMonitor
} else {
// thread is doing an Object.wait() call
obj = mon->object();
assert(obj != NULL, "Object.wait() should have an object");
}
So I wonder if we shouldn't make current_waiting_monitor()
/current_pending_monitor()
return volatile pointers to make it clear to the callers that these pointers can change at any time?
I'm also not that deep into ThreadService
& al. to understand what happens after your fix. Now you don't reload the waiting monitor but you might use it although it has already been cleared out from the thread (in the case where you previously crashed). Is that still OK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this should be changed to perform the read once by using Atomic::load? That's the guidance we've given the last few years. Some background for this: JDK-8234192.
Hi Volker, I had seen the JVMTIEnvBase version of it. The comment says: The ThreadService version's comment says: So the affected code is a special usage. I don't know if a more generic fix would be desirable. |
@TheRealMDoerr - You should also add the Serviceability group for this reivew. |
/label add serviceability |
@TheRealMDoerr |
Hi Stefan, thanks for looking at it and thanks for the pointer. I do remember that Atomic::load should be preferred. But I would have to use it in thread.hpp and I don't know if I should change it in current_waiting_monitor() and current_pending_monitor(). Would it be acceptable to change the general code for this special usage? I'm not against doing it, I just want to double-check. It may improve reliability in general. |
The code that you're fixing is indeed a special case and I wrote the new test The similar usage in JVM/TI is safe for exactly the reasons explained in the comment so a I think your solution of adding Using either |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thumbs up.
ObjectMonitor *wait_obj = thread->current_waiting_monitor(); | ||
ObjectMonitor* volatile wait_obj = thread->current_waiting_monitor(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment above this declaration. Something like:
// Using 'volatile' to prevent the compiler from generating code that
// reloads 'wait_obj' from memory when used below.
ObjectMonitor *enter_obj = thread->current_pending_monitor(); | ||
ObjectMonitor* volatile enter_obj = thread->current_pending_monitor(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment above this declaration. Something like:
// Using 'volatile' to prevent the compiler from generating code that
// reloads 'enter_obj' from memory when used below.
@TheRealMDoerr This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 23 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
Thanks for the review! Comment added. We have only seen the crash on s390. It depends on the platform if the compiler tends to reload values more or less. |
Hi Martin, I'm with @stefank. This special case is a race condition and the solution for that is Atomic::load/store. Given Atomic::load/store don't actually need to do anything in practice (other than act as marker of a race) I don't have any qualms about using them all the time. Seperately, I'm unclear why we allow this race to exist. I thought we took snapshots when threads were known to be safe and stable. But that is a separate issue. Cheers, |
I agree with @stefank and @dholmes-ora . It is nature to happen the change in thread.hpp to fix this problem.
Now we have Thread-Local handshake. I think we should use it at here. |
Thanks, folks, for reviewing! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we add volatile
to both _current_pending_monitor
and _current_waiting_monitor
?
Right. Done. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thumbs up. Just a couple of suggestions about the comments.
src/hotspot/share/runtime/thread.hpp
Outdated
// Using atomic load to prevent compilers from reloading (ThreadService::get_current_contended_monitor). | ||
// In case of concurrent modification, reloading pointer after NULL check must be prevented. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps rewrite the comment like this:
// Use Atomic::load() to prevent data race between concurrent modification and
// concurrent readers, e.g., ThreadService::get_current_contended_monitor().
src/hotspot/share/runtime/thread.hpp
Outdated
@@ -765,10 +767,11 @@ class JavaThread: public Thread { | |||
return _current_pending_monitor_is_from_java; | |||
} | |||
ObjectMonitor* current_waiting_monitor() { | |||
return _current_waiting_monitor; | |||
// Using atomic load as in current_pending_monitor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps:
// See the comment in current_pending_monitor() above.
When we ask for snapshots with stack traces, we use a safepoint to get all of The "other" code path is when we don't ask for stack traces and that path has This duality in code paths is why the new test I wrote: |
Thanks for the review! I have improved the comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thumbs up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
Thanks,
David
Thanks for the reviews! |
@TheRealMDoerr Since your change was applied there have been 27 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit 1e29005. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
We need a fix for crashes in get_current_contended_monitor due to concurrent modification of memory locations which are not declared volatile. See bug for details.
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/4224/head:pull/4224
$ git checkout pull/4224
Update a local copy of the PR:
$ git checkout pull/4224
$ git pull https://git.openjdk.java.net/jdk pull/4224/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 4224
View PR using the GUI difftool:
$ git pr show -t 4224
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/4224.diff