New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8274196: Crashes in VM_HeapDumper::work after JDK-8252842 #5681
Conversation
/issue 8274245 |
|
@linzang |
Webrevs
|
@@ -748,7 +748,7 @@ class ParDumpWriter : public AbstractDumpWriter { | |||
|
|||
static void before_work() { | |||
assert(_lock == NULL, "ParDumpWriter lock must be initialized only once"); | |||
_lock = new (std::nothrow) PaddedMonitor(Mutex::leaf, "Parallel HProf writer lock", Mutex::_safepoint_check_never); | |||
_lock = new (std::nothrow) PaddedMonitor(Mutex::leaf, "ParallelHProfWriter_lock", Mutex::_safepoint_check_always); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you change these locks to _safepoint_check_always, you have to acquire them without the Mutex::_no_safepoint_check flags so I don't know why you don't get that assert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it may be because this is actually not a JavaThread. So the assert in Mutex::check_no_safepoint_state
would pass.
Moreover, I have tried to use PaddedMonitor(Mutex::nosafepoint, "ParallelHProfWriter_lock", Mutex::_safepoint_check_never);
here, but the slowdebug would report errors as you mentioned in JDK-8274245.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree the flag here and at the place of lock acquiring seems problematic. I will try to see whether I can use Mutex::_safepoint_check_never
here and get rid of the assert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
void Mutex::check_no_safepoint_state(Thread* thread) {
check_block_state(thread);
assert(!thread->is_active_Java_thread() || _safepoint_check_required != _safepoint_check_always,
"This lock should always have a safepoint check for Java threads: %s",
name());
}
yes, we exclude the check for a non-java thread, which I thought was an odd exclusion last time I looked. I pass the tests in sun/tools/jmap/BasicJMapTest.java so maybe leave it for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, maybe a new issue could be created for tracking this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/label add serviceability |
@linzang |
The lock stuff looks ok, but please have at least one of the original reviewers review the change.
@@ -44,6 +44,3 @@ serviceability/sa/TestJmapCoreMetaspace.java 8268722,8268636 | |||
serviceability/sa/TestJhsdbJstackMixed.java 8248912 generic-all | |||
serviceability/sa/ClhsdbPstack.java#process 8248912 generic-all | |||
serviceability/sa/ClhsdbPstack.java#core 8248912 generic-all | |||
|
|||
serviceability/dcmd/gc/HeapDumpAllTest.java 8274196 linux-all,windows-all | |||
serviceability/dcmd/gc/HeapDumpTest.java 8274196 linux-all,windows-all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before you push, you'll need to do a merge from mainline and should un-ProblemList sun/tools/jmap/BasicJMapTest.java.
@linzang This change now passes all automated pre-integration checks. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 49 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.
|
The root cause for crash in ZGC is that the JNIHandles are processed before object iteration. And ZGC would update the JNIHandles at object iteration with read barrier. So the crash is cause by accessing the invalid address which can be dummy info after zgc, and hence crash.
The fix here should not be to change the order of stuff, so that heap iteration happens first, that will just hide the real bug. The real bug is that the JNIGlobalsDumper::do_oop()
is missing a load barrier. In other words, keep the order and just make sure to add a load barrier, like this:
void JNIGlobalsDumper::do_oop(oop* obj_p) {
oop o = NativeAccess<AS_NO_KEEPALIVE>::oop_load(obj_p);
...
Hi Per @pliden , BRs, |
Thanks @pliden for help review and approve. Dear Chris(@plummercj) and Ralf(@schmelter-sap), may I ask your help to review this fix of JDK-8252842? Thanks! |
Yes, I will have a look at it. |
The dumper threads related fix looks fine. I don't know enough to verify the GC fix, but I think Per has that covered sufficiently. Likewise for the lock rank issue which Coleen has reviewed. Also, I tested your changes with our tier2 and tier3 CI runs, which is where the failures initially turned up, and they are passing now.
Thanks all for your help reviewing this patch. I will integrate it. |
/integrate |
Going to push as commit bfd6163.
Your commit was automatically rebased without conflicts. |
The root cause for crash in ZGC is that the JNIHandles are processed before object iteration. And ZGC would update the JNIHandles at object iteration with read barrier. So the crash is cause by accessing the invalid address which can be dummy info after zgc, and hence crash.
The lock rank issue can be fixed because the related mutexes are acquired in safepoint. so the safepoint_check_required could be safepoint_check_always.
The Epsilon issue is caused by wrong _num_dumper_thread calculated when the gang==NULL.
Progress
Issues
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/5681/head:pull/5681
$ git checkout pull/5681
Update a local copy of the PR:
$ git checkout pull/5681
$ git pull https://git.openjdk.java.net/jdk pull/5681/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 5681
View PR using the GUI difftool:
$ git pr show -t 5681
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/5681.diff