New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8264760: JVM crashes when two threads encounter the same resolution error #3392
Conversation
|
/contributor add Wang Huang whuang@openjdk.org |
@Wanghuang-Huawei |
@Wanghuang-Huawei |
@Wanghuang-Huawei The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Wang,
Thanks for reporting this and offering to fix it. I have a comment on the actual fix and a few issues with the testcase - though thanks for adding one. With a race condition like this it is hard to write a test that will fail reliably without the fix.
Thanks,
David
@@ -0,0 +1,114 @@ | |||
class HostNoNestMember { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new file needs a copyright notice and GPL header.
Also please add a comment stating what this jcod file describes - see other jcod files in the directory for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I will fix that.
|
||
import java.util.concurrent.CountDownLatch; | ||
|
||
class HostNoNestMember { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment describing how this class is replaced with the jcode version, and the difference in that version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I split this class into a single file HostNoNestMember.java
as @hseigel suggested and add a comment.
TestNestHostErrorWithMultiThread t = new TestNestHostErrorWithMultiThread(); | ||
t.test(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need to create an instance - just use a static test method.
The test code could be placed directly in main in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. It will be deleted in next commit.
private int value; | ||
} | ||
|
||
public int foo() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please call this "test".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Thank you.
new Thread(() -> { | ||
try { | ||
latch.await(); | ||
HostNoNestMember h = new HostNoNestMember(); | ||
h.foo(); | ||
} catch (IllegalAccessError expected) { | ||
System.out.println("OK - got expected exception: " + expected); | ||
} catch (InterruptedException e) {} | ||
}).start(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please extract the logic to a Runnable so that both threads can use that one runnable rather than duplicating the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your review. It will be more clear after extracting the logic. I will refact that.
|
||
public void test() throws Throwable { | ||
|
||
CountDownLatch latch = new CountDownLatch(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of CDL doesn't guarantee that both threads have reached the await() before the main thread does the countDown(). If you use a CyclicBarrier instead it will trip when the second thread arrives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tested CyclicBarrier
here and found that CountDownLatch
could trigger this bug fast while CyclicBarrier
couldn't always trigger this bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of CDL doesn't guarantee that both threads have reached the await() before the main thread does the countDown(). If you use a CyclicBarrier instead it will trip when the second thread arrives.
I have tested a small case the CyclicBarrier
version and CountDownLatch
version on JDK17
& JDK11
. It seems that :
- For
JDK17
, the wake-up time between two threads in theCyclicBarrier
version is bigger than theCountDownLatch
version. In other words, theCountDownLatch
version can trigger this bug more easily . - the
CyclicBarrier
version and theCountDownLatch
version running underJDK11
have the same performance.
// CountDownLatch version
public class TestCountDownLatch {
public static void main(String args[]) {
CountDownLatch latch1 = new CountDownLatch(1);
CountDownLatch latch2 = new CountDownLatch(2);
MyThread t1 = new MyThread(latch1, latch2);
MyThread t2 = new MyThread(latch1, latch2);
t1.start();
t2.start();
try {
// waiting thread creation
latch2.await();
latch1.countDown();
t1.join();
t2.join();
} catch (InterruptedException e) {}
System.out.println(Math.abs(t1.getAwakeTime() - t2.getAwakeTime()));
}
static class MyThread extends Thread {
private CountDownLatch latch1;
private CountDownLatch latch2;
private long awaketime;
MyThread(CountDownLatch latch1, CountDownLatch latch2) {
this.latch1 = latch1;
this.latch2 = latch2;
}
@Override
public void run() {
try {
latch2.countDown();
// Try to have all threads trigger the nesthost check at the same time
latch1.await();
awaketime = System.nanoTime();
} catch (InterruptedException e) {}
}
public long getAwakeTime() {
return awaketime;
}
}
}
// CyclicBarrier version
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.CyclicBarrier;
public class TestCyclicBarrier {
public static void main(String args[]) {
CyclicBarrier barrier = new CyclicBarrier(2);
MyThread t1 = new MyThread(barrier);
MyThread t2 = new MyThread(barrier);
t1.start();
t2.start();
try {
t1.join();
t2.join();
} catch (InterruptedException e) {}
System.out.println(Math.abs(t1.getAwakeTime() - t2.getAwakeTime()));
}
static class MyThread extends Thread {
private CyclicBarrier barrier;
private long awaketime;
MyThread(CyclicBarrier barrier) {
this.barrier = barrier;
}
@Override
public void run() {
try {
// Try to have all threads trigger the nesthost check at the same time
barrier.await();
awaketime = System.nanoTime();
} catch (Exception e) {}
}
public long getAwakeTime() {
return awaketime;
}
}
}
- results:
CountDownLatch
againstCyclicBarrier
CountDownLatch: 3960
CyclicBarrier: 3361280
CountDownLatch: 6120
CyclicBarrier: 3337540
CountDownLatch: 6160
CyclicBarrier: 3462920
CountDownLatch: 9150
CyclicBarrier: 3328090
CountDownLatch: 6580
CyclicBarrier: 3345450
CountDownLatch: 6340
CyclicBarrier: 3342900
CountDownLatch: 1330
CyclicBarrier: 3379210
CountDownLatch: 7780
CyclicBarrier: 3219020
CountDownLatch: 2460
CyclicBarrier: 3297020
CountDownLatch: 7320
CyclicBarrier: 3332770
JDK17
againstCyclicBarrier
CyclicBarrier jdk17: 3188590
CyclicBarrier jdk11: 49090
CyclicBarrier jdk17: 3123340
CyclicBarrier jdk11: 14680
CyclicBarrier jdk17: 3107910
CyclicBarrier jdk11: 780
CyclicBarrier jdk17: 3072600
CyclicBarrier jdk11: 1720
CyclicBarrier jdk17: 3164340
CyclicBarrier jdk11: 41020
CyclicBarrier jdk17: 3098490
CyclicBarrier jdk11: 7060
CyclicBarrier jdk17: 3058220
CyclicBarrier jdk11: 14750
CyclicBarrier jdk17: 3052460
CyclicBarrier jdk11: 660
CyclicBarrier jdk17: 3083650
CyclicBarrier jdk11: 14670
CyclicBarrier jdk17: 3116260
CyclicBarrier jdk11: 850
|
||
new Thread(() -> { | ||
try { | ||
latch.await(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment:
// Try to have all threads trigger the nesthost check at the same time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. I will add that in next commit.
try { | ||
latch.await(); | ||
HostNoNestMember h = new HostNoNestMember(); | ||
h.foo(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please follow this by:
throw new RuntimeException("Did not get expected IllegalAccessError");
@@ -48,12 +48,12 @@ public int foo() { | |||
|
|||
public class TestNestHostErrorWithMultiThread { | |||
|
|||
public static void main(String args[]) { | |||
public static void main(String args[]) throws Throwable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"throws Throwable" is only necessary if you want unlisted checked exceptions to propagate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your review. throws Throwable
will be removed in my next commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Class HostNoNestMember could be removed entirely from class TestNestHostErrorWithMultiThread by doing the following:
Generate a jcod representation for class HostNoNestMember$Member and add it to HostNoNestMember.jcod.
Delete class HostNoNestMember from TestNestHostErrorWithMultiThread.java.
Change the test to compile HostNoNestMember.jcod before compiling TestNestHostErrorWithMultiThread.java.
Doing this removes the confusion of class HostNoNestMember being compiled twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the code you have is correct. I've suggested some additional commentary. There are a couple of races possible here, but I think the remaining ones are benign and we do not need to try and impose any stricter synchronization.
Thanks,
David
@@ -1935,8 +1935,7 @@ void SystemDictionary::add_nest_host_error(const constantPoolHandle& pool, | |||
{ | |||
MutexLocker ml(Thread::current(), SystemDictionary_lock); | |||
ResolutionErrorEntry* entry = resolution_errors()->find_entry(index, hash, pool, which); | |||
if (entry != NULL) { | |||
assert(entry->nest_host_error() == NULL, "Nest host error message already set!"); | |||
if (entry != NULL && entry->nest_host_error() == NULL) { | |||
entry->set_nest_host_error(message); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment before line 1939:
// An existing entry means we had a true resolution failure (LinkageError) with our nest host, but we
// still want to add the error message for the higher-level access checks to report. We should
// only reach here under the same error condition, so we can ignore the potential race with setting
// the message. If we see it is already set then we can ignore it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Thank you for your advice. I will add that.
Mailing list message from David Holmes on hotspot-runtime-dev: On 17/04/2021 5:42 pm, Wang Huang wrote:
Not sure what commentary you are looking for. CyclicBarrier should work David |
Mailing list message from David Holmes on hotspot-runtime-dev: On 19/04/2021 2:38 pm, Wang Huang wrote:
So ... thinking about what can happen here Case 1: countdown latch The thread that does the countdown() could do that before either thread Case 2: cyclic barrier The first thread to arrive has to block - no choice. The last thread to So while logically the cyclic barrier releases all threads "at the time Conversely the CDL is not guaranteed to show better bug reproducability Okay - the CDL version is what we should go with. Sorry to have taken up so much time on this. I didn't expect you to Thanks, |
1 similar comment
Mailing list message from David Holmes on hotspot-runtime-dev: On 19/04/2021 2:38 pm, Wang Huang wrote:
So ... thinking about what can happen here Case 1: countdown latch The thread that does the countdown() could do that before either thread Case 2: cyclic barrier The first thread to arrive has to block - no choice. The last thread to So while logically the cyclic barrier releases all threads "at the time Conversely the CDL is not guaranteed to show better bug reproducability Okay - the CDL version is what we should go with. Sorry to have taken up so much time on this. I didn't expect you to Thanks, |
Thank you for your review.
I think a java file is more clear?
I split
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi,
Sorry for the delayed re-review. There are still a bunch of nits with the test logic that I'd like to see addressed. Sorry I should have flagged them earlier.
Thanks,
David
CountDownLatch latch1 = new CountDownLatch(1); | ||
CountDownLatch latch2 = new CountDownLatch(2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rename latch1 as run, and latch2 as started. The names should aid in understanding the logic of the synchronization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. It will make the code more clear. I will do that.Thank you.
new Thread(new Test(latch1, latch2)).start(); | ||
new Thread(new Test(latch1, latch2)).start(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: you only need a single Runnable instance as it is stateless. The Runnable could even be defined inline rather than as a separate named class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's my fault. I will adjust that. Thank you.
// waiting thread creation | ||
latch2.await(); | ||
latch1.countDown(); | ||
} catch (InterruptedException e) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Catch block should have:
throw new Error("Unexpected interrupt");
msg + "\"", expected); | ||
} | ||
System.out.println("OK - got expected exception: " + expected); | ||
} catch (InterruptedException e) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Catch block should have:
throw new Error("Unexpected interrupt");
// Try to have all threads trigger the nesthost check at the same time | ||
latch1.await(); | ||
HostNoNestMember h = new HostNoNestMember(); | ||
h.test(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After the call to test you should have:
throw new Error("IllegalAccessError was not thrown as expected");
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. I will change that.
* @bug 8264760 | ||
* @summary JVM crashes when two threads encounter the same resolution error | ||
* | ||
* @library /test/lib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't use the test library
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Thank you for your review.
* @compile HostNoNestMember.java | ||
* @compile HostNoNestMember.jcod | ||
* | ||
* @run main/othervm TestNestHostErrorWithMultiThread |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for othervm in this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I will do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Thanks for your patience here.
David
@Wanghuang-Huawei This change now passes all automated pre-integration checks. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 497 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@dholmes-ora, @hseigel) but any other Committer may sponsor as well.
|
The new test is failing for a reason that is not clear to me yet. |
I was unable to reproduce the test failure locally. |
I will run regression and check this thing. |
Thank you for your feedback. I have reproduced this problem (~ 10% ) and fixed it. Could you give me a favor to review new commit ? Thank you very much. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see exactly why the lack of join() would lead to the test failure, but doing the join() is cleaner and consistent with what other tests do.
Thanks,
David
The reason is that the main thread needs to exit and wait for the other two threads to exit. However, the |
/integrate |
@Wanghuang-Huawei |
We actually need a second review here (hotspot group rules). Thanks, |
Sure. It's my fault. Is there any way to revoke |
No need to revoke it, the sponsor just needs to hold off until there is a second review. |
* | ||
* @run main TestNestHostErrorWithMultiThread | ||
*/ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment here that HostNoNestMember.jcod must be compiled after HostNoNestMember.java because the class file from the jcod file must replace the HostNoNestMember class file generated from HostNoNestMember.java.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your review. I will add that.
* NestMembers attribute to make it empty. The class | ||
* HostNoNestMember$Member has a class HostNoNestMember as its | ||
* NestHost, which will trigger an error when resolving . | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you modify the above comment to something such as:
* This class was used to produce a jcod file in which the
* NestMembers attribute was modified to make it empty. Class
* HostNoNestMember$Member has class HostNoNestMember as its
* NestHost, which will trigger an error when resolving.
*
* When compiled, this generates a HostNoNestMember class and
* a HostNoNestMember$Menber class. The former class, HostNoNestMember,
* gets overwritten when HostNoNestMember.jcod gets compiled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I will fix that. Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good. Thanks for fixing this.
Harold
Thank you for your approval. @dholmes-ora @hseigel |
/integrate |
@Wanghuang-Huawei |
/sponsor |
@hseigel @Wanghuang-Huawei Since your change was applied there have been 509 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit 9a19a0c. |
As shown in JDK-8264760, I changed notes with @dholmes-ora and only fixed this issue by deleting the assert.
The other whole bigger issue will be fixed in the other issue.Progress
Issue
Reviewers
Contributors
<whuang@openjdk.org>
<wuyan34@huawei.com>
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3392/head:pull/3392
$ git checkout pull/3392
Update a local copy of the PR:
$ git checkout pull/3392
$ git pull https://git.openjdk.java.net/jdk pull/3392/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 3392
View PR using the GUI difftool:
$ git pr show -t 3392
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3392.diff