Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8266310: deadlock between System.loadLibrary and JNI FindClass loading another class #3976

Closed
wants to merge 9 commits into from

Conversation

voitylov
Copy link

@voitylov voitylov commented May 11, 2021

Please review this PR which fixes the deadlock in ClassLoader between the two lock objects - a lock object associated with the class being loaded, and the ClassLoader.loadedLibraryNames hash map, locked during the native library load operation.

Problem being fixed:

The initial reproducer demonstrated a deadlock between the JarFile/ZipFile and the hash map. That deadlock exists even when the ZipFile/JarFile lock is removed because there's another lock object in the class loader, associated with the name of the class being loaded. Such objects are stored in ClassLoader.parallelLockMap. The deadlock occurs when JNI_OnLoad() loads exactly the same class, whose signature is being verified in another thread.

Proposed fix:

The proposed patch suggests to get rid of locking loadedLibraryNames hash map and synchronize on each entry name, as it's done with class names in see ClassLoader.getClassLoadingLock(name) method.

The patch introduces nativeLibraryLockMap which holds the lock objects for each library name, and the getNativeLibraryLock() private method is used to lazily initialize the corresponding lock object. nativeLibraryContext was changed to ThreadLocal, so that in any concurrent thread it would have a NativeLibrary object on top of the stack, that's being currently loaded/unloaded in that thread. nativeLibraryLockMap accumulates the names of all native libraries loaded - in line with class loading code, it is not explicitly cleared.

Testing: jtreg and jck testing with no regressions. A new regression test was developed.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8266310: deadlock between System.loadLibrary and JNI FindClass loading another class

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3976/head:pull/3976
$ git checkout pull/3976

Update a local copy of the PR:
$ git checkout pull/3976
$ git pull https://git.openjdk.java.net/jdk pull/3976/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 3976

View PR using the GUI difftool:
$ git pr show -t 3976

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3976.diff

@bridgekeeper
Copy link

bridgekeeper bot commented May 11, 2021

👋 Welcome back avoitylov! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label May 11, 2021
@openjdk
Copy link

openjdk bot commented May 11, 2021

@voitylov The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs core-libs-dev@openjdk.org label May 11, 2021
@mlbridge
Copy link

mlbridge bot commented May 11, 2021

}

// thread local native libraries stack
private static final ThreadLocal<Deque<NativeLibraryImpl>> nativeLibraryThreadContext =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer not see thread locals introduced here. We've put a lot of effort into recent releases to eliminate the use of TLs from java.base.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will update the PR to avoid ThreadLocal and have ArrayDeque deallocated once its last element has been popped out.

@AlanBateman
Copy link
Contributor

/reviewers 2

@openjdk
Copy link

openjdk bot commented May 11, 2021

@AlanBateman
The number of required reviews for this PR is now set to 2 (with at least 1 of role reviewers).

@plevart
Copy link
Contributor

plevart commented May 12, 2021

Hi Aleksei,
If I understand David Holmes correctly (on the mailing list), there is a concern that using a per-library-name exclusive lock might not be enough to prevent deadlock with class loading lock. You say: "The deadlock occurs when JNI_OnLoad() loads exactly the same class, whose signature is being verified in another thread." ... Could you explain a little bit more about this scenario? As I understand the JNI_OnLoad() is called as part of native library loading and at that time the thread that is loading native library holds a nativeLibraryLock(libraryName). Now if JNI_OnLoad() triggers loading of a class, whose signature is being verified in another thread (do you mean: which is already being loaded in another thread?) then JNI_OnLoad() thread will block on classLoadingLock(className). But that is not a dead-lock yet. The loading of a class in another thread now has to attempt to load the same native library that is being loaded in the JNI_OnLoad() thread.
So before the proposed patch it was enough for the other thread which was loading the class to trigger loading of any native library, while after the proposed patch, the deadlock would occur only if it triggered loading of the same native library. Am I correctly assessing the problem?

@voitylov
Copy link
Author

Peter, that's right. It is still possible to have the same pair of objects being locked with per-library lock, namely when a library being loaded explicitly loads a class that requires the same library to be loaded. I think this type of design flaw cannot be solved by changing locking algorithm. It's clearly not the goal of this patch.

@ChrisHegarty
Copy link
Member

Hi Aleksei,

As you may know, I looked into a similar issue recently and put together a reproducer [1] (which is probably similar to what you have). The reproducer, run at the time against Oracle 11u, demonstrates the issue on the mainline too, but the deadlock is slightly different. The reason I mention it here is that the reproducer encounters the issue whether there is an attempt to load the same class or another class ( from the same jar file ). In fact, the issue is even more general, the problem is with trying to load a class, not already loaded, from a jar further down on the class path ( the class may not even exist, just that it causes the loader to walk the class path up to the jar being verified ).

I filed an issue for this, which may need to be closed as a duplicate depending on the outcome of this PR [2].

[1] https://github.com/ChrisHegarty/deadlock
[2] https://bugs.openjdk.java.net/browse/JDK-8266350

@mlbridge
Copy link

mlbridge bot commented May 14, 2021

Mailing list message from David Holmes on core-libs-dev:

Hi Aleksei,

On 11/05/2021 11:19 pm, Aleksei Voitylov wrote:

Please review this PR which fixes the deadlock in ClassLoader between the two lock objects - a lock object associated with the class being loaded, and the ClassLoader.loadedLibraryNames hash map, locked during the native library load operation.

Problem being fixed:

The initial reproducer demonstrated a deadlock between the JarFile/ZipFile and the hash map. That deadlock exists even when the ZipFile/JarFile lock is removed because there's another lock object in the class loader, associated with the name of the class being loaded. Such objects are stored in ClassLoader.parallelLockMap. The deadlock occurs when JNI_OnLoad() loads exactly the same class, whose signature is being verified in another thread.

Proposed fix:

The proposed patch suggests to get rid of locking loadedLibraryNames hash map and synchronize on each entry name, as it's done with class names in see ClassLoader.getClassLoadingLock(name) method.

The ClassLoader per-name locking map has a number of problems so I'm not
sure using it as a model is a good idea. In particular your new map only
grows, with entries never being removed AFAICS.

This is a difficult deadlock to solve. Even using a per-entry lock it
isn't obvious to me that you can't still get a deadlock.

My own thoughts on this problem were that we should not be calling
JNI_OnLoad with any lock held. But that risks use of a library in a
separate thread before the JNI+OnLoad has completed. As I said this is a
difficult deadlock to solve - and may not have a complete solution.

David
-----

@mlbridge
Copy link

mlbridge bot commented May 14, 2021

Mailing list message from Aleksei Voitylov on core-libs-dev:

Hi David,

On 12/05/2021 10:56, David Holmes wrote:

Hi Aleksei,

On 11/05/2021 11:19 pm, Aleksei Voitylov wrote:

Please review this PR which fixes the deadlock in ClassLoader between
the two lock objects - a lock object associated with the class being
loaded, and the ClassLoader.loadedLibraryNames hash map, locked
during the native library load operation.

Problem being fixed:

The initial reproducer demonstrated a deadlock between the
JarFile/ZipFile and the hash map. That deadlock exists even when the
ZipFile/JarFile lock is removed because there's another lock object
in the class loader, associated with the name of the class being
loaded. Such objects are stored in ClassLoader.parallelLockMap. The
deadlock occurs when JNI_OnLoad() loads exactly the same class, whose
signature is being verified in another thread.

Proposed fix:

The proposed patch suggests to get rid of locking loadedLibraryNames
hash map and synchronize on each entry name, as it's done with class
names in see ClassLoader.getClassLoadingLock(name) method.

The ClassLoader per-name locking map has a number of problems so I'm
not sure using it as a model is a good idea. In particular your new
map only grows, with entries never being removed AFAICS.

OK. I'm working on a version of a patch which addresses this concern.

This is a difficult deadlock to solve. Even using a per-entry lock it
isn't obvious to me that you can't still get a deadlock.

I believe the patch solves the deadlock when different libraries are in
play. Yes, it's still possible to get a deadlock when a library with a
JNI_Onload instantiates a class that requires the same library to be
loaded. Is this a valid use case to begin with for future use and why
such a recursion should be allowed? If not, I'd like to file a CSR for
JNI specification with the intent to forbid such behavior. It's obvious
there are no current users of such behavior because it deadlocks.

My own thoughts on this problem were that we should not be calling
JNI_OnLoad with any lock held. But that risks use of a library in a
separate thread before the JNI+OnLoad has completed. As I said this is
a difficult deadlock to solve - and may not have a complete solution.

Thanks, this is a very appealing idea. While it's true dlopen on Linux
and Mac OS are explicitly marked thread safe (and LoadLibrary on Windows
can probably be assumed so as well, though is not marked so in Microsoft
documentation), all implementations use reference counting techniques,
and the number of calls to open has to be equal to the number of calls
to close. I don't know how to achieve that without locks.

-Aleksei

David
-----

@mlbridge
Copy link

mlbridge bot commented May 14, 2021

Mailing list message from David Holmes on core-libs-dev:

Hi Aleksei,

On 13/05/2021 9:54 pm, Aleksei Voitylov wrote:

Hi David,

On 12/05/2021 10:56, David Holmes wrote:

Hi Aleksei,

On 11/05/2021 11:19 pm, Aleksei Voitylov wrote:

Please review this PR which fixes the deadlock in ClassLoader between
the two lock objects - a lock object associated with the class being
loaded, and the ClassLoader.loadedLibraryNames hash map, locked
during the native library load operation.

Problem being fixed:

The initial reproducer demonstrated a deadlock between the
JarFile/ZipFile and the hash map. That deadlock exists even when the
ZipFile/JarFile lock is removed because there's another lock object
in the class loader, associated with the name of the class being
loaded. Such objects are stored in ClassLoader.parallelLockMap. The
deadlock occurs when JNI_OnLoad() loads exactly the same class, whose
signature is being verified in another thread.

Proposed fix:

The proposed patch suggests to get rid of locking loadedLibraryNames
hash map and synchronize on each entry name, as it's done with class
names in see ClassLoader.getClassLoadingLock(name) method.

The ClassLoader per-name locking map has a number of problems so I'm
not sure using it as a model is a good idea. In particular your new
map only grows, with entries never being removed AFAICS.
OK. I'm working on a version of a patch which addresses this concern.

This is a difficult deadlock to solve. Even using a per-entry lock it
isn't obvious to me that you can't still get a deadlock.

I believe the patch solves the deadlock when different libraries are in
play. Yes, it's still possible to get a deadlock when a library with a
JNI_Onload instantiates a class that requires the same library to be
loaded. Is this a valid use case to begin with for future use and why
such a recursion should be allowed? If not, I'd like to file a CSR for
JNI specification with the intent to forbid such behavior. It's obvious
there are no current users of such behavior because it deadlocks.

So in the current case we have:

Thread 1:
- loads native lib and acquires load-library lock
- JNI_Onload tries to initialize class X but can't get the class init lock

Thread 2:
- initializes class X and has the class init lock
- <clinit> tries to load a native library and can't get the
load-library lock

If the library lock is finer granularity then we can reduce the risk of
deadlock but not remove it.

I don't think we can try to forbid anything through the JNI
specification in such a case. This is really just one of a number of
class initialization deadlocks that are possible. At most the spec could
caution against such things, but there is no way to detect it and so
"forbid" it. And you can construct similar deadlocks without class
initialization being involved.

My own thoughts on this problem were that we should not be calling
JNI_OnLoad with any lock held. But that risks use of a library in a
separate thread before the JNI+OnLoad has completed. As I said this is
a difficult deadlock to solve - and may not have a complete solution.

Thanks, this is a very appealing idea. While it's true dlopen on Linux
and Mac OS are explicitly marked thread safe (and LoadLibrary on Windows
can probably be assumed so as well, though is not marked so in Microsoft
documentation), all implementations use reference counting techniques,
and the number of calls to open has to be equal to the number of calls
to close. I don't know how to achieve that without locks.

Not every problem has a solution :) If JNI_OnLoad has to execute
atomically with respect to loading a library then there will always be a
deadlock potential. The only complete solution would not hold a lock
while JNI_OnLoad is executed, but that completely changes the semantics
of native library loading.

Cheers,
David

-Aleksei

David
-----

@mlbridge
Copy link

mlbridge bot commented May 14, 2021

Mailing list message from Mandy Chung on core-libs-dev:

On 5/13/21 6:05 AM, David Holmes wrote:

Not every problem has a solution :) If JNI_OnLoad has to execute
atomically with respect to loading a library then there will always be
a deadlock potential. The only complete solution would not hold a lock
while JNI_OnLoad is executed, but that completely changes the
semantics of native library loading.

I have been giving some thought on this issue.? I agree with David that
we should look into a solution not holding a lock while JNI_OnLoad is
executed.? It might be possible to put all threads that trying to load
the same native library that is being loaded to wait and only one thread
is loading it and executing JNI_OnLoad (e.g. add a state in
NativeLibrary to indicate it is being loaded, already loaded, being
unloaded).? I haven't had the cycle to think through it in details though.

Mandy

@mlbridge
Copy link

mlbridge bot commented May 14, 2021

Mailing list message from David Holmes on core-libs-dev:

On 14/05/2021 7:20 am, Mandy Chung wrote:

On 5/13/21 6:05 AM, David Holmes wrote:

Not every problem has a solution :) If JNI_OnLoad has to execute
atomically with respect to loading a library then there will always be
a deadlock potential. The only complete solution would not hold a lock
while JNI_OnLoad is executed, but that completely changes the
semantics of native library loading.

I have been giving some thought on this issue.? I agree with David that
we should look into a solution not holding a lock while JNI_OnLoad is
executed.? It might be possible to put all threads that trying to load
the same native library that is being loaded to wait and only one thread
is loading it and executing JNI_OnLoad (e.g. add a state in
NativeLibrary to indicate it is being loaded, already loaded, being
unloaded).? I haven't had the cycle to think through it in details though.

Any mechanism that blocks threads from accessing the library while
another thread is performing JNI_OnLoad, suffers from the same deadlock
problem.

David

Mandy

@openjdk openjdk bot removed the rfr Pull request is ready for review label May 19, 2021
@openjdk openjdk bot added the rfr Pull request is ready for review label May 19, 2021
@voitylov
Copy link
Author

Dear colleagues,

The updated PR addresses review comment regarding ThreadLocal as well as David' concern around the lock being held during JNI_OnLoad/JNI_OnUnload calls, and ensures all lock objects are deallocated. Multiple threads are allowed to enter NativeLibrary.load() to prevent any thread from locking while another thread loads a library. Before the update, there could be a class loading lock held by a parallel capable class loader, which can deadlock with the library loading lock. As proposed by David Holmes, the library loading lock was removed because dlopen/LoadLibrary are thread safe and they maintain internal reference counters on libraries. There's still a lock being held while a pair of containers are read/updated. It's not going to deadlock as there's no lock/wait operation performed while that lock is held. Multiple threads may create their own copies of NativeLibrary object and register it for auto unloading.

Tests for auto unloading were added along with the PR update. There are now 3 jtreg tests:

  • one checks for deadlock, similar to the one proposed by Chris Hegarty
  • two other tests are for library unload.

The major side effect of that multiple threads are allowed to enter is that JNI_OnLoad/JNI_OnUnload may be called multiple (but same) number of times from concurrent threads. In particular, the number of calls to JNI_OnLoad must be equal to the number of calls to JNI_OnUnload after the relevant class loader is garbage collected. This may affect the behaviour that relies on specific order or the number of JNI_OnLoad/JNI_OnUnload calls. The current JNI specification does not mandate how many times JNI_OnLoad/JNI_OnUnload are called. Also, we could not locate tests in jck/jtreg/vmTestbase that would rely on the specific order or number of calls to JNI_OnLoad/JNI_OnUnload.

Thank you Alan Bateman, David Holmes and Chris Hegarty for your valuable input.

@mlbridge
Copy link

mlbridge bot commented May 19, 2021

Mailing list message from Aleksei Voitylov on core-libs-dev:

Hi David, Mandy,

In the updated PR I removed the lock held during the load/unload calls.
Our testing confirmed that without that removal the deadlock can easily
be reproduced, even without signed jars. Now the lock is only held to
prevent races during access to "libraries" and "loadedLibraryNames".

Thank you,
-Aleksei

On 14/05/2021 03:28, David Holmes wrote:

On 14/05/2021 7:20 am, Mandy Chung wrote:

On 5/13/21 6:05 AM, David Holmes wrote:

Not every problem has a solution :) If JNI_OnLoad has to execute
atomically with respect to loading a library then there will always
be a deadlock potential. The only complete solution would not hold a
lock while JNI_OnLoad is executed, but that completely changes the
semantics of native library loading.

I have been giving some thought on this issue.? I agree with David
that we should look into a solution not holding a lock while
JNI_OnLoad is executed.? It might be possible to put all threads that
trying to load the same native library that is being loaded to wait
and only one thread is loading it and executing JNI_OnLoad (e.g. add
a state in NativeLibrary to indicate it is being loaded, already
loaded, being unloaded).? I haven't had the cycle to think through it
in details though.

Any mechanism that blocks threads from accessing the library while
another thread is performing JNI_OnLoad, suffers from the same
deadlock problem.

David

Mandy

1 similar comment
@mlbridge
Copy link

mlbridge bot commented May 19, 2021

Mailing list message from Aleksei Voitylov on core-libs-dev:

Hi David, Mandy,

In the updated PR I removed the lock held during the load/unload calls.
Our testing confirmed that without that removal the deadlock can easily
be reproduced, even without signed jars. Now the lock is only held to
prevent races during access to "libraries" and "loadedLibraryNames".

Thank you,
-Aleksei

On 14/05/2021 03:28, David Holmes wrote:

On 14/05/2021 7:20 am, Mandy Chung wrote:

On 5/13/21 6:05 AM, David Holmes wrote:

Not every problem has a solution :) If JNI_OnLoad has to execute
atomically with respect to loading a library then there will always
be a deadlock potential. The only complete solution would not hold a
lock while JNI_OnLoad is executed, but that completely changes the
semantics of native library loading.

I have been giving some thought on this issue.? I agree with David
that we should look into a solution not holding a lock while
JNI_OnLoad is executed.? It might be possible to put all threads that
trying to load the same native library that is being loaded to wait
and only one thread is loading it and executing JNI_OnLoad (e.g. add
a state in NativeLibrary to indicate it is being loaded, already
loaded, being unloaded).? I haven't had the cycle to think through it
in details though.

Any mechanism that blocks threads from accessing the library while
another thread is performing JNI_OnLoad, suffers from the same
deadlock problem.

David

Mandy

@mlbridge
Copy link

mlbridge bot commented May 19, 2021

Mailing list message from David Holmes on core-libs-dev:

On 20/05/2021 2:29 am, Aleksei Voitylov wrote:

On Wed, 19 May 2021 16:21:41 GMT, Aleksei Voitylov <avoitylov at openjdk.org> wrote:

Please review this PR which fixes the deadlock in ClassLoader between the two lock objects - a lock object associated with the class being loaded, and the ClassLoader.loadedLibraryNames hash map, locked during the native library load operation.

Problem being fixed:

The initial reproducer demonstrated a deadlock between the JarFile/ZipFile and the hash map. That deadlock exists even when the ZipFile/JarFile lock is removed because there's another lock object in the class loader, associated with the name of the class being loaded. Such objects are stored in ClassLoader.parallelLockMap. The deadlock occurs when JNI_OnLoad() loads exactly the same class, whose signature is being verified in another thread.

Proposed fix:

The proposed patch suggests to get rid of locking loadedLibraryNames hash map and synchronize on each entry name, as it's done with class names in see ClassLoader.getClassLoadingLock(name) method.

The patch introduces nativeLibraryLockMap which holds the lock objects for each library name, and the getNativeLibraryLock() private method is used to lazily initialize the corresponding lock object. nativeLibraryContext was changed to ThreadLocal, so that in any concurrent thread it would have a NativeLibrary object on top of the stack, that's being currently loaded/unloaded in that thread. nativeLibraryLockMap accumulates the names of all native libraries loaded - in line with class loading code, it is not explicitly cleared.

Testing: jtreg and jck testing with no regressions. A new regression test was developed.

Aleksei Voitylov has updated the pull request incrementally with one additional commit since the last revision:

address review comments, add tests

Dear colleagues,

The updated PR addresses review comment regarding ThreadLocal as well as David' concern around the lock being held during JNI_OnLoad/JNI_OnUnload calls, and ensures all lock objects are deallocated. Multiple threads are allowed to enter NativeLibrary.load() to prevent any thread from locking while another thread loads a library. Before the update, there could be a class loading lock held by a parallel capable class loader, which can deadlock with the library loading lock. As proposed by David Holmes, the library loading lock was removed because dlopen/LoadLibrary are thread safe and they maintain internal reference counters on libraries. There's still a lock being held while a pair of containers are read/updated. It's not going to deadlock as there's no lock/wait operation performed while that lock is held. Multiple threads may create their own copies of NativeLibrary object and register it for auto unloading.

Tests for auto unloading were added along with the PR update. There are now 3 jtreg tests:
- one checks for deadlock, similar to the one proposed by Chris Hegarty
- two other tests are for library unload.

The major side effect of that multiple threads are allowed to enter is that JNI_OnLoad/JNI_OnUnload may be called multiple (but same) number of times from concurrent threads. In particular, the number of calls to JNI_OnLoad must be equal to the number of calls to JNI_OnUnload after the relevant class loader is garbage collected. This may affect the behaviour that relies on specific order or the number of JNI_OnLoad/JNI_OnUnload calls. The current JNI specification does not mandate how many times JNI_OnLoad/JNI_OnUnload are called. Also, we could not locate tests in jck/jtreg/vmTestbase that would rely on the specific order or number of calls to JNI_OnLoad/JNI_OnUnload.

But you can't make such a change! That was my point. To fix the deadlock
we must not hold a lock. But we must ensure only a single call to
JNI_OnLoad is possible. It is an unsolvable problem with those
constraints. You can't just change the behaviour of JNI_OnLoad like that.

David
-----

1 similar comment
@mlbridge
Copy link

mlbridge bot commented May 19, 2021

Mailing list message from David Holmes on core-libs-dev:

On 20/05/2021 2:29 am, Aleksei Voitylov wrote:

On Wed, 19 May 2021 16:21:41 GMT, Aleksei Voitylov <avoitylov at openjdk.org> wrote:

Please review this PR which fixes the deadlock in ClassLoader between the two lock objects - a lock object associated with the class being loaded, and the ClassLoader.loadedLibraryNames hash map, locked during the native library load operation.

Problem being fixed:

The initial reproducer demonstrated a deadlock between the JarFile/ZipFile and the hash map. That deadlock exists even when the ZipFile/JarFile lock is removed because there's another lock object in the class loader, associated with the name of the class being loaded. Such objects are stored in ClassLoader.parallelLockMap. The deadlock occurs when JNI_OnLoad() loads exactly the same class, whose signature is being verified in another thread.

Proposed fix:

The proposed patch suggests to get rid of locking loadedLibraryNames hash map and synchronize on each entry name, as it's done with class names in see ClassLoader.getClassLoadingLock(name) method.

The patch introduces nativeLibraryLockMap which holds the lock objects for each library name, and the getNativeLibraryLock() private method is used to lazily initialize the corresponding lock object. nativeLibraryContext was changed to ThreadLocal, so that in any concurrent thread it would have a NativeLibrary object on top of the stack, that's being currently loaded/unloaded in that thread. nativeLibraryLockMap accumulates the names of all native libraries loaded - in line with class loading code, it is not explicitly cleared.

Testing: jtreg and jck testing with no regressions. A new regression test was developed.

Aleksei Voitylov has updated the pull request incrementally with one additional commit since the last revision:

address review comments, add tests

Dear colleagues,

The updated PR addresses review comment regarding ThreadLocal as well as David' concern around the lock being held during JNI_OnLoad/JNI_OnUnload calls, and ensures all lock objects are deallocated. Multiple threads are allowed to enter NativeLibrary.load() to prevent any thread from locking while another thread loads a library. Before the update, there could be a class loading lock held by a parallel capable class loader, which can deadlock with the library loading lock. As proposed by David Holmes, the library loading lock was removed because dlopen/LoadLibrary are thread safe and they maintain internal reference counters on libraries. There's still a lock being held while a pair of containers are read/updated. It's not going to deadlock as there's no lock/wait operation performed while that lock is held. Multiple threads may create their own copies of NativeLibrary object and register it for auto unloading.

Tests for auto unloading were added along with the PR update. There are now 3 jtreg tests:
- one checks for deadlock, similar to the one proposed by Chris Hegarty
- two other tests are for library unload.

The major side effect of that multiple threads are allowed to enter is that JNI_OnLoad/JNI_OnUnload may be called multiple (but same) number of times from concurrent threads. In particular, the number of calls to JNI_OnLoad must be equal to the number of calls to JNI_OnUnload after the relevant class loader is garbage collected. This may affect the behaviour that relies on specific order or the number of JNI_OnLoad/JNI_OnUnload calls. The current JNI specification does not mandate how many times JNI_OnLoad/JNI_OnUnload are called. Also, we could not locate tests in jck/jtreg/vmTestbase that would rely on the specific order or number of calls to JNI_OnLoad/JNI_OnUnload.

But you can't make such a change! That was my point. To fix the deadlock
we must not hold a lock. But we must ensure only a single call to
JNI_OnLoad is possible. It is an unsolvable problem with those
constraints. You can't just change the behaviour of JNI_OnLoad like that.

David
-----

@voitylov
Copy link
Author

Sorry, let me get back to this after I talk with the chalkboard.

@voitylov voitylov marked this pull request as draft May 20, 2021 12:40
Copy link
Member

@mlchung mlchung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good in general with a minor suggestion to rename NativeLibraryContext::get to NativeLibraryContext::current which would be clearer as it returns the current native library context (of the current thread).

@@ -202,14 +204,13 @@ private NativeLibrary loadLibrary(Class<?> fromClass, String name, boolean isBui
* another loadLibrary invocation that should succeed.
*
* We use a static stack to hold the list of libraries we are
* loading because this can happen only when called by the
* same thread because this block is synchronous.
* loading, so that each thread maintains its own stack.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 206: no longer a static stack. line 206-207 can be replaced with:

             * Each thread maintains its own stack to hold the list of libraries it is loading.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

*/
for (NativeLibraryImpl lib : nativeLibraryContext) {
for (NativeLibraryImpl lib : NativeLibraryContext.get()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to rename the get method of NativeLibraryContext to current that returns the current thread's context.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

} catch (ClassNotFoundException |
InstantiationException |
IllegalAccessException ignore) {
System.out.println("Class Class1 not found.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general a test should let the exception to propagate. In this case, the threads (both t1 and t2) can warp the exception and throw RuntimeException as it's an error.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Class c2 = Class.forName("p.Class2");
System.out.println("Signed jar loaded.");
} catch (ClassNotFoundException ignore) {
System.out.println("Class Class2 not found.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

});
try {
return future.get(1000, TimeUnit.MILLISECONDS);
} catch (Exception ignoreAll) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is an unexpected error case, it should throw an exception.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this threw an exception, it wouldn't be possible to collect all available data from the process which is not yet completed. Throwing an exception would defeat the purpose of readAvailable() method.

/*
* @test
* @bug 8266310
* @summary deadlock while loading the JNI code
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update @summary with a description of the test (rather than the synposis of the bug)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

/*
* @test
* @bug 8266310
* @summary deadlock while loading the JNI code
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please update @summary with a description of the test (rather than the synposis of the bug).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

/*
* @test
* @bug 8266310
* @summary deadlock while loading the JNI code
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please update @summary with a description of the test (rather than the synposis of the bug).

}

private static OutputAnalyzer genKey() throws Throwable {
runCommandInTestClassPath("rm", "-f", KEYSTORE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use jdk.test.lib.util.FileUtils instead of exec rm -f.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

private static OutputAnalyzer createJar(String outputJar, String... classes) throws Throwable {
String jar = JDKToolFinder.getJDKTool("jar");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can consider using jdk.test.lib.util.JarUtils to reduce the test execution time so that it can create the jar without exec'ing another process.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried, but this doesn't work that well unfortunately with JarUtils.createJar() as it only produces a valid jar-file capable of triggering the issue when the compiled class files (jtreg @build output) are in the same directory as the current directory of the process calling JarUtils.createJar(). Jtreg @build outputs the compiled classes to a separate directory. Creation of a new jar process allows to change the current directory so that the relative paths to the class files are such that it would form the valid output jar-file. Alternatively we could copy the class files to the current directory prior to creating the jar-file, but that would introduce an extra step. Would you mind me keeping the createJar() in the test?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is jar -C <dir> option what you want? FYI. jar tool is a tool provider which is a simple way to run jar tool in process (lookup the tool provider by calling ToolProvider.findFirst("jar")).

Yes I am fine with what you have if this does not work.

@voitylov
Copy link
Author

Mandy, thank you for review! I used all of your suggestions except for two places. Please see a responses in the conversation.

Copy link
Member

@mlchung mlchung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the change. I have a few minor comments and looking good.

I think the synopsis of the JBS issue should be updated to reflect this problem. What about "deadlock between System::loadLibrary and JNI FindClass loading another class"?

}

private static OutputAnalyzer createJar(String outputJar, String... classes) throws Throwable {
String jar = JDKToolFinder.getJDKTool("jar");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is jar -C <dir> option what you want? FYI. jar tool is a tool provider which is a simple way to run jar tool in process (lookup the tool provider by calling ToolProvider.findFirst("jar")).

Yes I am fine with what you have if this does not work.

public void run() {
try {
// an instance of unsigned class that loads a native library
Class c1 = Class.forName("Class1");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/Class/Class<?>/ avoid raw type (same in line 58)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed as suggested.

@Override
public Class<?> loadClass(String name) throws ClassNotFoundException {
synchronized (getClassLoadingLock(name)) {
Class clazz = findLoadedClass(name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/Class/Class<?>/

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed as suggested.

try {
System.loadLibrary("loadLibraryUnload");
System.out.println("Native library loaded from Class1.");
} catch (Exception ignore) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this exception just be thrown?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Removed try-catch.

@voitylov
Copy link
Author

Thanks Mandy. I used your suggestions in the updated version.

List<String> args = new ArrayList<>();
Collections.addAll(args, "cvf", Paths.get(testClassPath, outputJar).toString());
for (String c : classes) {
Collections.addAll(args, "-C", testClassPath, c);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe only one single -C option is good enough for this case since the classes are all in one single directory.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contrary to the man which says "-C DIR Changes the specified directory and includes the files specified at the end of the command line.", here is how it works:

bash-3.2$ find .
.
./dir
./dir/B.class
./dir/A.class
bash-3.2$ jar cvf a.jar -C dir A.class B.class
B.class : no such file or directory
added manifest
adding: A.class(in = 176) (out= 149)(deflated 15%)

and no jar is produced. Lines 614 and 647 of src/jdk.jartool/share/classes/sun/tools/jar/Main.java explain why. IIRC the intent was to mimic tar -C behaviour where -C applies to all files that follow, but jar does not work that way currently.

It's probably worth reaching consensus that it's a problem worth fixing first (a scary one to fix by the way, provided how many tools could rely on the current behaviour). Alternatively, man could be adjusted to reflect the current order. I could work on that as a follow up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are probably right. I have no more comment.

@mlchung
Copy link
Member

mlchung commented Jun 16, 2021

Looks good. Do you want to fix this in JDK 17? You will need to create a new PR for jdk17 repo and withdraw this one.

@voitylov
Copy link
Author

Thanks Mandy for the thorough review, I'll definitely do that. Shall we wait for Alan opinion as well?

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Jun 17, 2021
@mlchung
Copy link
Member

mlchung commented Jun 17, 2021

Thanks Mandy for the thorough review, I'll definitely do that. Shall we wait for Alan opinion as well?

I think you can proceed posting a PR for jdk17 as the new version has addressed Alan's concern of using ThreadLocal. Alan can review the new PR.

@voitylov
Copy link
Author

/integrate

@openjdk
Copy link

openjdk bot commented Jun 25, 2021

@voitylov This PR has not yet been marked as ready for integration.

@voitylov voitylov changed the title 8266310: deadlock while loading the JNI code 8266310: deadlock between System.loadLibrary and JNI FindClass loading another class Jun 25, 2021
@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 25, 2021
@voitylov
Copy link
Author

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Jun 26, 2021
@openjdk
Copy link

openjdk bot commented Jun 26, 2021

@voitylov
Your change (at version b6ceda3) is now ready to be sponsored by a Committer.

@AlexanderScherbatiy
Copy link

/sponsor

@openjdk
Copy link

openjdk bot commented Jul 6, 2021

Going to push as commit e47803a.
Since your change was applied there have been 820 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Jul 6, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jul 6, 2021
@openjdk
Copy link

openjdk bot commented Jul 6, 2021

@AlexanderScherbatiy @voitylov Pushed as commit e47803a.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@openjdk openjdk bot removed the sponsor Pull request is ready to be sponsored label Jul 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core-libs core-libs-dev@openjdk.org integrated Pull request has been integrated
8 participants