New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8260366: ExtendedSocketOptions <clinit> can deadlock in some circumstances #2601
Conversation
|
Webrevs
|
Hi Jaikiran,
Thanks for taking a go at this one.
At a glance - the proposed fix seems reasonable to me. It would be good to have another pair of eyes (and analysis) on this though. Some issues with the test.
* @bug 8260366 | ||
* @summary Verify that concurrent classloading of sun.net.ext.ExtendedSocketOptions and | ||
* jdk.net.ExtendedSocketOptions doesn't lead to a deadlock | ||
* @run testng/othervm --add-exports=java.base/sun.net.ext=ALL-UNNAMED ExtendedSocketOptionsTest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The --add-exports should not be needed - have you tried simply adding:
@modules java.base/sun.net.ext:+open
jdk.net
before @run
? This also has the additional benefit to declare which modules are required to run the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the test should be fast, I'd also advise to stick several identical @run lines (maybe e.g. 5 of them) to increase the probability of the deadlock to happen...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the test should be fast, I'd also advise to stick several identical @run lines (maybe e.g. 5 of them) to increase the probability of the deadlock to happen...
Done. The PR has been updated to run this test multiple times now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The --add-exports should not be needed - have you tried simply adding:
@modules java.base/sun.net.ext:+open jdk.net
before
@run
? This also has the additional benefit to declare which modules are required to run the test.
I had forgotten about the @modules
directive. I have now updated the PR use that instead. The only minor difference between what you suggested and my updated PR is that I decided to use :open
instead of :+open
for the sun.net.ext
package, since I don't use the types in that package at compile time, in that test.
* published by the Free Software Foundation. Oracle designates this | ||
* particular file as subject to the "Classpath" exception as provided | ||
* by Oracle in the LICENSE file that accompanied this code. | ||
* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests do not need - nor should they have - the "Classpath" exception - please see Copyright notice of other tests in the vicinity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for catching this. I've fixed it in the latest update of the PR.
…ports option while launching the test
…mprove the chances of reproducing any potential deadlock
Hello Daniel, thank you for the review. I have updated this PR to incorporate your review suggestions for the test. |
Changes looks OK to me, any specific reason for removing "final" specifier from "getInstance" & "register" methods ?. |
@@ -167,27 +167,34 @@ protected ExtendedSocketOptions(Set<SocketOption<?>> options) { | |||
|
|||
private static volatile ExtendedSocketOptions instance; | |||
|
|||
public static final ExtendedSocketOptions getInstance() { return instance; } | |||
public static ExtendedSocketOptions getInstance() { | |||
if (instance != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be it's worth to avoid reading volatile
field twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @turbanoff, do you mean why read it twice - once here and once inside the synchronized
block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Once here and once inside synchronized
block.
Reading volatile
fields cost something on some architectures, so I think we could optimize it a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @turbanoff, the double read is necessary to correctly avoid any race conditions and is a typical strategy used in cases like these. I am not aware of any other alternate more performant strategy, for code like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me be more clear: I think that it's enough to have only 2 volatile
field reads in worst case. We can use local variable to avoid more than needed reads.
Current code can perform read 4 times in worst case: twice outside synchronized
block and twice inside synchronized
block.
There are many examples of similar code in the JDK:
https://github.com/openjdk/jdk/blob/master/src/jdk.crypto.cryptoki/share/classes/sun/security/pkcs11/P11Util.java#L48
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/StackFrameInfo.java#L125
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! of course, I now see what you mean. I misunderstood your previous comment. I'll update this PR shortly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've now updated this PR which this suggested change.
Hello Vyom, thank you for the review. Since those two methods are |
I think it's OK for me. What about improving the test little bit, your test wants to load both classes at the same time. Please have a look on modified test. /*
import java.util.concurrent.Callable; /**
|
…hreads as concurrently as possible
Hello Vyom, I think that's a good suggestion to use a latch for deciding when to trigger the classloading. I've taken your input and have made some relatively minor change to the way that latch gets used and updated my PR with that change. The latch now waits for both the tasks to reach the point where they are going to do a |
I think below change will address Andrey's concern |
instance = new NoExtendedSocketOptions(); | ||
} | ||
} | ||
return instance; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest changing these two lines as well:
// the jdk.net module is not present => no extended socket options
ext = instance = new NoExtendedSocketOptions();
}
}
return ext;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello Daniel, I had thought about it in my previous commit. But this won't work, since in the normal case where the ClassNotFoundException
doesn't get thrown, the instance
is actually set in the register
method which gets triggered due to the class load on jdk.net.ExtendedSocketOptions
. As a result, returning the local ext
variable won't work in that case, unless of course I do ext = instance
in both the catch block and outside of it, which would, IMO, defeat the purpose of optimization in that last commit.
I decided to "prove" it with some test case and while doing so I just uncovered that my whole patch has just moved the deadlock to a new location - thread T1 calling sun.net.ext.ExtendedSocketOptions#getInstance()
and thread T2 calling Class.forName("jdk.net.ExtendedSocketOptions")
ends up in a deadlock. It's clears why that happens.
I am going to take a step back and come back with a different fix for this one. Thank you everyone for these reviews - they definitely helped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Jaikiran,
I tested with my suggested change and i did not see any deadlock at my local Linux environment. I just ran test in loop and it worked as expected.
Thanks,
Vyom
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello Vyom,
The trick is to change the test case to have something like this in the "task":
diff --git a/test/jdk/sun/net/ext/ExtendedSocketOptionsTest.java b/test/jdk/sun/net/ext/ExtendedSocketOptionsTest.java
index 0702abf5279..26c8a1384a2 100644
--- a/test/jdk/sun/net/ext/ExtendedSocketOptionsTest.java
+++ b/test/jdk/sun/net/ext/ExtendedSocketOptionsTest.java
@@ -96,7 +96,11 @@ public class ExtendedSocketOptionsTest {
classLoadingTriggerLatch.countDown();
// wait for the other task to let us know it's ready too, to load the class
classLoadingTriggerLatch.await();
- return Class.forName(this.className);
+ final Class<?> c = Class.forName(this.className);
+ // let's call getInstance on sun.net.ext.ExtendedSocketOptions
+ final Class<?> k = Class.forName("sun.net.ext.ExtendedSocketOptions");
+ final Object extSocketOptions = k.getDeclaredMethod("getInstance").invoke(null);
+ return c;
} catch (Exception e) {
System.err.println("Failed to load " + this.className);
throw new RuntimeException(e);
Essentially, trigger a call to sun.net.ext.ExtendedSocketOptions#getInstance
and a classload of jdk.net.ExtendedSocketOptions
simultaneously from different threads. That will end up with:
"pool-1-thread-1" #16 prio=5 os_prio=31 cpu=18.25ms elapsed=120.03s tid=0x00007ff9008c8a00 nid=0x6203 waiting for monitor entry [0x0000700010c8b000]
java.lang.Thread.State: BLOCKED (on object monitor)
at sun.net.ext.ExtendedSocketOptions.register(java.base@17-internal/ExtendedSocketOptions.java:197)
- waiting to lock <0x000000070f9f8e68> (a java.lang.Class for sun.net.ext.ExtendedSocketOptions)
at jdk.net.ExtendedSocketOptions.<clinit>(jdk.net@17-internal/ExtendedSocketOptions.java:234)
at java.lang.Class.forName0(java.base@17-internal/Native Method)
at java.lang.Class.forName(java.base@17-internal/Class.java:375)
at ExtendedSocketOptionsTest$Task.call(ExtendedSocketOptionsTest.java:100)
at ExtendedSocketOptionsTest$Task.call(ExtendedSocketOptionsTest.java:84)
at java.util.concurrent.FutureTask.run(java.base@17-internal/FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17-internal/ThreadPoolExecutor.java:1135)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17-internal/ThreadPoolExecutor.java:635)
at java.lang.Thread.run(java.base@17-internal/Thread.java:831)
"pool-1-thread-2" #17 prio=5 os_prio=31 cpu=8.76ms elapsed=120.03s tid=0x00007ff90102be00 nid=0x6403 in Object.wait() [0x0000700010d8e000]
java.lang.Thread.State: RUNNABLE
at java.lang.Class.forName0(java.base@17-internal/Native Method)
- waiting on the Class initialization monitor for jdk.net.ExtendedSocketOptions
at java.lang.Class.forName(java.base@17-internal/Class.java:375)
at sun.net.ext.ExtendedSocketOptions.getInstance(java.base@17-internal/ExtendedSocketOptions.java:185)
- locked <0x000000070f9f8e68> (a java.lang.Class for sun.net.ext.ExtendedSocketOptions)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@17-internal/Native Method)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@17-internal/NativeMethodAccessorImpl.java:78)
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@17-internal/DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(java.base@17-internal/Method.java:566)
at ExtendedSocketOptionsTest$Task.call(ExtendedSocketOptionsTest.java:102)
at ExtendedSocketOptionsTest$Task.call(ExtendedSocketOptionsTest.java:84)
at java.util.concurrent.FutureTask.run(java.base@17-internal/FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17-internal/ThreadPoolExecutor.java:1135)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17-internal/ThreadPoolExecutor.java:635)
at java.lang.Thread.run(java.base@17-internal/Thread.java:831)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello Daniel, I had thought about it in my previous commit. But this won't work, since in the normal case where the
ClassNotFoundException
doesn't get thrown, theinstance
is actually set in theregister
method which gets triggered due to the class load onjdk.net.ExtendedSocketOptions
. As a result, returning the localext
variable won't work in that case, unless of course I doext = instance
in both the catch block and outside of it, which would, IMO, defeat the purpose of optimization in that last commit.
Oh - right - still - if you'd stick with that fix - I believe you should do that:
try {
// If the class is present, it will be initialized which
// triggers registration of the extended socket options.
Class<?> c = Class.forName("jdk.net.ExtendedSocketOptions");
ext = instance;
} catch (ClassNotFoundException e) {
// the jdk.net module is not present => no extended socket options
ext = instance = new NoExtendedSocketOptions();
}
}
return ext;
Anyway: waiting for the next fix :-)
I'm closing this for now, for the reason stated here #2601 (comment). |
…ce the testcase to be more robust in catching the deadlocks
After thinking a bit more about this issue and the patch I had proposed, I realized what I did wrong in this patch. The I've now updated and reopened this PR for further reviews. |
Hi Jaikiran, I looked into your previous patch and i believe why you were observing the deadlock because you make "sun.net.ext.ExtendedSocketOption.register" thread safe(synchronize). When you are loading both the classes simultaneously one thread is calling getInstance() and other thread while loading the class(jdk.net.ExtendedSocketOptions) calls register and you are observing the deadlock. Although sun.net.ext.ExtendedSocketOptions.register is public method but it will be called only from jdk.net.ExtendedSocketOption static block, so we don't need it to be thread safe. Let's wait for what others people say. |
Any reviews/comments, in addition to what Vyom mentioned, to the updated PR please? |
public static synchronized void register(ExtendedSocketOptions extOptions) { | ||
if (instance != null) | ||
throw new InternalError("Attempting to reregister extended options"); | ||
|
||
instance = extOptions; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arguably, because instance
is volatile, you could also use the double locking mechanism here to throw before synchronizing if instance is already set, and check again after synhronizing. However, it's probably not worth it since this method is expected to be called (and should always be called) only once - so I believe what you have here should be enough.
@jaikiran This change now passes all automated pre-integration checks. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 109 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.
|
Thank you for the reviews, Daniel. I'll integrate this shortly. |
/integrate |
@jaikiran Since your change was applied there have been 138 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit 240f2a1. |
Can I please get a review for this change which proposes to fix the issue reported in https://bugs.openjdk.java.net/browse/JDK-8260366?
The issue relates to the concurrent classloading of
sun.net.ext.ExtendedSocketOptions
andjdk.net.ExtendedSocketOptions
leading to a deadlock. This is because thesun.net.ext.ExtendedSocketOptions
in its static block does aClass.forName("jdk.net.ExtendedSocketOptions")
. Thejdk.net.ExtendedSocketOptions
in its own static block calls theregister
method onsun.net.ext.ExtendedSocketOptions
. If 2 threads concurrently try loading these classes (one loading thesun.net.ext.ExtendedSocketOptions
and the other loadingjdk.net.ExtendedSocketOptions
), it can so happen that each one ends up holding a classloading lock in the static block of the respective class, while waiting for the other thread to release the lock, thus leading to a deadlock. The stacktrace posted in the linked JBS issue has the necessary details on the deadlock.The commit here breaks this deadlock by moving out the
Class.forName("jdk.net.ExtendedSocketOptions")
call from the static block ofsun.net.ext.ExtendedSocketOptions
to thegetInstance()
method, thus lazily loading (on first call togetInstance()
) and registering thejdk.net.ExtendedSocketOptions
.Extra attention needs to be given to the
sun.net.ext.ExtendedSocketOptions#register(ExtendedSocketOptions extOptions)
method. Before the change in this PR, when thesun.net.ext.ExtendedSocketOptions
would successfully complete loading, it was guaranteed that the registeredExtendedSocketOptions
would either be the one registered from thejdk.net.ExtendedSocketOptions
or aNoExtendedSocketOptions
. There wasn't any window of chance for any code (be it in the JDK or in application code) to call thesun.net.ext.ExtendedSocketOptions#register
to register any different/other implementation/instance of theExtendedSocketOptions
. However, with this change in the PR, there is now a window of chance where any code in the JDK (or even application code?) can potentially call thesun.net.ext.ExtendedSocketOptions#register
before either thejdk.net.ExtendedSocketOptions
is loaded or thesun.net.ext.ExtendedSocketOptions#getInstance()
method is called, thus allowing for some other implementation of theExtendedSocketOptions
to be registered. However, I'm not sure if it's a practical scenario - although thesun.net.ext.ExtendedSocketOptions#register
is markedpublic
, the comment on that method and the fact that it resides in an internal, not exposed by default class/module, makes me believe that thisregister
method isn't supposed to be called by anyone other than thejdk.net.ExtendedSocketOptions
. If at all thisregister
method is allowed to be called from other places, then due to the change in this PR, additional work needs to be probably done in its implementation to allow for thejdk.net.ExtendedSocketOptions
to be given first priority(?) to be registered first. I'll need input on whether I should worry about this case or if it's fine in its current form in this PR.This PR also contains a jtreg test which reproduces the issue and verifies the fix.
Progress
Issue
Reviewers
Download
$ git fetch https://git.openjdk.java.net/jdk pull/2601/head:pull/2601
$ git checkout pull/2601