Dead Lock in class initialization #652

Closed
bschwert opened this issue May 3, 2016 · 23 comments

@bschwert

bschwert commented May 3, 2016

I have observed a deadlock in class initialization when two threads try to initialize a COM connection. My analysis is based on version 4.1.0. I have also observed a longer delay (~6s) in Thread-1 before the lock, while the application is found in the RunningObjectTable. This is because of doors, which registers in the ROT first but then delays until the login is shown.

Thread-1 is in Native.loadLibraryInstance and holds the lock on libraries, but waits for the initialization of OleAuto, the class that contains DISPPARAMS:
sun.misc.Unsafe.ensureClassInitialized(Class) Unsafe.java (native)
sun.reflect.UnsafeFieldAccessorFactory.newFieldAccessor(Field, boolean) UnsafeFieldAccessorFactory.java:43
sun.reflect.ReflectionFactory.newFieldAccessor(Field, boolean) ReflectionFactory.java:140
java.lang.reflect.Field.acquireFieldAccessor(boolean) Field.java:1057
java.lang.reflect.Field.getFieldAccessor(Object) Field.java:1038
java.lang.reflect.Field.get(Object) Field.java:379
com.sun.jna.Native.loadLibraryInstance(Class) Native.java:447
com.sun.jna.Native.getLibraryOptions(Class) Native.java:508
com.sun.jna.Native.getStructureAlignment(Class) Native.java:596
com.sun.jna.Structure.setAlignType(int) Structure.java:251
com.sun.jna.Structure.<init>(Pointer, int, TypeMapper) Structure.java:176
com.sun.jna.Structure.<init>(Pointer, int) Structure.java:172
com.sun.jna.Structure.<init>(int) Structure.java:159
com.sun.jna.Structure.<init>() Structure.java:151
com.sun.jna.platform.win32.OleAuto$DISPPARAMS.<init>() OleAuto.java:379
com.sun.jna.platform.win32.OleAuto$DISPPARAMS$ByReference.<init>() OleAuto.java:359
com.sun.jna.platform.win32.COM.util.ProxyObject.oleMethod(int, Variant$VARIANT$ByReference, IDispatch, OaIdl$DISPID, Variant$VARIANT[]) ProxyObject.java:596
com.sun.jna.platform.win32.COM.util.ProxyObject.oleMethod(int, Variant$VARIANT$ByReference, IDispatch, String, Variant$VARIANT[]) ProxyObject.java:574
com.sun.jna.platform.win32.COM.util.ProxyObject.getProperty(Class, String, Object[]) ProxyObject.java:381
com.sun.jna.platform.win32.COM.util.ProxyObject.invokeSynchronised(Object, Method, Object[]) ProxyObject.java:260
com.sun.jna.platform.win32.COM.util.ProxyObject.invoke(Object, Method, Object[]) ProxyObject.java:220
[…]

Thread-2 tries to initialize the class OleAuto, but this requires the libraries lock:
com.sun.jna.Native.cacheOptions(Class, Map, Object) Native.java:1547
com.sun.jna.Native.loadLibrary(String, Class, Map) Native.java:428
com.sun.jna.platform.win32.OleAuto.<clinit>() OleAuto.java:102
com.sun.jna.platform.win32.Variant$VARIANT.<init>(String) Variant.java:198
com.sun.jna.platform.win32.COM.util.Convert.toVariant(Object) Convert.java:41
com.sun.jna.platform.win32.COM.util.ProxyObject.invokeMethod(Class, String, Object[]) ProxyObject.java:407
com.sun.jna.platform.win32.COM.util.ProxyObject.invokeSynchronised(Object, Method, Object[]) ProxyObject.java:267
com.sun.jna.platform.win32.COM.util.ProxyObject.invoke(Object, Method, Object[]) ProxyObject.java:220

@neilcsmith-net
Contributor

We've had similar reports of a deadlock in the GStreamer bindings. Calling field.get(...) inside the synchronized block is really problematic in a large JNA codebase as it can set off a massive train of class initializations that can lead to deadlock with other threads. Is it possible to bring class initialization out of synchronized blocks at all?

@twall
Contributor

twall commented May 9, 2016

Perhaps invokeSynchronized needs to push its synchronization block down to a more primitive level, after it's converted all of its arguments into their most primitive form?

On May 7, 2016, at 11:57 AM, Neil C Smith notifications@github.com wrote:
We've had similar reports of a deadlock in the GStreamer bindings. Calling field.get(...) inside the synchronized block is really problematic in a large JNA codebase as it can set off a massive train of class initializations that can lead to deadlock with other threads. Is it possible to bring class initialization out of synchronized blocks at all?

@neilcsmith-net
Contributor

Perhaps invokeSynchronized needs to push its synchronization block down to a more primitive level, after it's converted all of its arguments into their most primitive form?

In my opinion, that's not the issue. It would possibly solve @bschwert's immediate problem, but the root cause is forcing class initialization while holding the libraries lock in Native. Incidentally, we don't actually use the ProxyObject code; our deadlock involves something else.

@matthiasblaesing
Member

I'm still trying to understand the need for the synchronized block, but assuming it is "only" there to protect the libraries map, I see this option:

    private static void loadLibraryInstance(Class<?> cls) {
        if(cls == null) {
            return;
        }
        synchronized(libraries) {
            if(libraries.containsKey(cls)) {
                return;
            }
        }

        Object staticFieldValue = null;

        try {
            Field[] fields = cls.getFields();
            for (int i = 0; i < fields.length; i++) {
                Field field = fields[i];
                if (field.getType() == cls
                        && Modifier.isStatic(field.getModifiers())) {
                    // Ensure the field gets initialized by reading it
                    staticFieldValue = field.get(null);
                    break;
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not access instance of "
                    + cls + " (" + e + ")");
        }

        synchronized (libraries) {
            if (!libraries.containsKey(cls)) {
                libraries.put(cls, new WeakReference<Object>(staticFieldValue));
            }
        }
    }

At least this method should now be safe. I did not offer this as a PR, as I'm new to the code and wonder whether this lock is really only held to protect the libraries map in this case, or whether there are side effects I'm overlooking.

@SevenOf9Sleeper
Contributor

@matthiasblaesing Is this really a good idea? You separate the "contains" and the "put" calls into two different synchronized blocks. So two threads can now race: both assume that the class isn't initialized, both initialize the class, and the last one wins with its entry in the libraries map... or doesn't the duplicate initialization matter at all?

@neilcsmith-net
Contributor

neilcsmith-net commented May 9, 2016

On 9 May 2016 21:15, "Mathias Mehrmann" notifications@github.com wrote:
So two threads can have a race condition now and both assume that the class isn't initialized and are both initialize the class and the last one wins with his entry in the libraries-map... or doesn't the duplicate initialization matter at all?

a) in that code, first one wins, although it wouldn't matter ...
b) class initialisation is guaranteed to only happen once by the VM.

So, if that is the only reason for the lock, maybe a ConcurrentHashMap
would be better?
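
For illustration, a minimal sketch of that idea, assuming the lock really is only there to protect the map. `LibraryRegistry`, `instanceFor` and `Sample` are hypothetical names and not part of JNA. The static field is read without holding any lock, and `putIfAbsent` keeps the first entry if two threads race. `computeIfAbsent` is deliberately avoided, because it can block other updates to the map while the mapping function runs, which would put class initialization under a lock again.

```java
import java.lang.ref.WeakReference;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical registry, not JNA code: caches the static singleton of a class.
class LibraryRegistry {
    private static final ConcurrentMap<Class<?>, WeakReference<Object>> LIBRARIES =
            new ConcurrentHashMap<>();

    // Stand-in for a library holder class such as OleAuto (assumption).
    static class Sample {
        public static final Sample INSTANCE = new Sample();
    }

    static Object instanceFor(Class<?> cls) {
        WeakReference<Object> cached = LIBRARIES.get(cls);
        if (cached != null) {
            return cached.get();
        }
        // No lock is held here, so any class initialization triggered by
        // field.get(null) does not run under a map lock.
        Object value = readSingleton(cls);
        WeakReference<Object> fresh = new WeakReference<>(value);
        // First writer wins; a losing thread reuses the existing entry.
        WeakReference<Object> previous = LIBRARIES.putIfAbsent(cls, fresh);
        return (previous != null ? previous : fresh).get();
    }

    // Reads the first public static field whose type is the class itself.
    private static Object readSingleton(Class<?> cls) {
        try {
            for (Field field : cls.getFields()) {
                if (field.getType() == cls && Modifier.isStatic(field.getModifiers())) {
                    return field.get(null);
                }
            }
            return null;
        } catch (IllegalAccessException e) {
            throw new IllegalArgumentException("Could not access instance of " + cls, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(instanceFor(Sample.class) == Sample.INSTANCE);
    }
}
```

Whether this is sufficient depends on whether other code relies on `libraries` and `typeOptions` being updated together under one monitor.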

@SevenOf9Sleeper
Contributor

You are right. The VM initializes static members only once... :-]
So both threads should read the same value when a race condition would occur.

As for your suggestion of a ConcurrentHashMap: I have not known the JNA code for long, but it seems that there are situations where 'libraries' and 'typeOptions' are written together and are protected jointly by the one monitor 'libraries'. I don't know whether that is required, but replacing the monitor object with a ConcurrentHashMap would lose this consistency.

@SevenOf9Sleeper
Contributor

I have done some experiments. With the following class you can reproduce the deadlock of @bschwert quite reliably (tested with JDK1.7.0_79, 64 Bit, JDK1.8.0_66, 32 Bit and 64 Bit on Windows 7 Home Premium Service Pack 1):

package sogrades.jnatest;

import com.sun.jna.platform.win32.OleAuto;
import com.sun.jna.platform.win32.OleAuto.DISPPARAMS;

public class TestJNA {

    public static void main(String[] args) {
        Runnable runnable1 = getRunnable1();
        Runnable runnable2 = getRunnable2();

        Thread t1 = new Thread(runnable1);
        Thread t2 = new Thread(runnable2);
        t1.start();
        t2.start();
        System.out.println("main-method ends...");
    }

    private static Runnable getRunnable1() {
        return new Runnable() {
            @Override
            public void run() {
                OleAuto oleAuto = OleAuto.INSTANCE;
                System.out.println("OleAuto init ends...");
            }
        };
    }

    private static Runnable getRunnable2() {
        return new Runnable() {
            @Override
            public void run() {
                DISPPARAMS dispParams = new OleAuto.DISPPARAMS.ByReference();
                System.out.println("DISPPARAMS init ends...");
            }
        };
    }
}

So you get these thread dumps:

Stack Trace
Thread-0 [9] (BLOCKED)
   com.sun.jna.Native.getLibraryOptions line: 606 
   com.sun.jna.Native.getStructureAlignment line: 708 
   com.sun.jna.Structure.setAlignType line: 258 
   com.sun.jna.Structure.<init> line: 175 
   com.sun.jna.Structure.<init> line: 171 
   com.sun.jna.Structure.<init> line: 158 
   com.sun.jna.Structure.<init> line: 150 
   com.sun.jna.Union.<init> line: 35 
   com.sun.jna.platform.win32.Variant$VARIANT.<init> line: 158 
   com.sun.jna.platform.win32.Variant$VARIANT.<clinit> line: 150 
   java.lang.Class.forName0 line: not available [native method]
   java.lang.Class.forName line: 191 
   com.sun.proxy.$Proxy0.<clinit> line: not available 
   sun.reflect.NativeConstructorAccessorImpl.newInstance0 line: not available [native method]
   sun.reflect.NativeConstructorAccessorImpl.newInstance line: 57 
   sun.reflect.DelegatingConstructorAccessorImpl.newInstance line: 45 
   java.lang.reflect.Constructor.newInstance line: 526 
   java.lang.reflect.Proxy.newInstance line: 764 
   java.lang.reflect.Proxy.newProxyInstance line: 755 
   com.sun.jna.Native.loadLibrary line: 519 
   com.sun.jna.platform.win32.OleAuto.<clinit> line: 48 
   sogrades.jnatest.TestJNA$1.run line: 25 
   java.lang.Thread.run line: 745 

Stack Trace
Thread-1 [10] (RUNNABLE)
   sun.misc.Unsafe.ensureClassInitialized line: not available [native method]
   sun.reflect.UnsafeFieldAccessorFactory.newFieldAccessor line: 43 
   sun.reflect.ReflectionFactory.newFieldAccessor line: 140 
   java.lang.reflect.Field.acquireFieldAccessor line: 1057 
   java.lang.reflect.Field.getFieldAccessor line: 1038 
   java.lang.reflect.Field.get line: 379 
   com.sun.jna.Native.loadLibraryInstance line: 539 
   com.sun.jna.Native.getLibraryOptions line: 614 
   com.sun.jna.Native.getStructureAlignment line: 708 
   com.sun.jna.Structure.setAlignType line: 258 
   com.sun.jna.Structure.<init> line: 175 
   com.sun.jna.Structure.<init> line: 171 
   com.sun.jna.Structure.<init> line: 158 
   com.sun.jna.Structure.<init> line: 150 
   com.sun.jna.platform.win32.OleAuto$DISPPARAMS.<init> line: 579 
   com.sun.jna.platform.win32.OleAuto$DISPPARAMS$ByReference.<init> line: 558 
   sogrades.jnatest.TestJNA$2.run line: 35 
   java.lang.Thread.run line: 745 

I have then implemented the proposal of @matthiasblaesing. The deadlock shown above is gone, but there seems to be a new one:

Stack Trace
Thread-0 [11] (RUNNABLE)
   com.sun.jna.Native.initIDs line: not available [native method]
   com.sun.jna.Native.<clinit> line: 154 
   com.sun.jna.platform.win32.OleAuto.<clinit> line: 48 
   sogrades.jnatest.TestJNA$1.run line: 12 
   java.lang.Thread.run line: 745 

Stack Trace
Thread-1 [12] (RUNNABLE)
   com.sun.jna.Pointer.<clinit> line: 43 
   com.sun.jna.Structure.<clinit> line: 2120 
   sogrades.jnatest.TestJNA$2.run line: 20 
   java.lang.Thread.run line: 745

So Pointer is waiting for "Native" to be loaded. But "Native" class initialization does not end because of... what? Does initIDs() want to access the class Pointer as well? Why is this a problem? (OK, the class Pointer isn't initialized at this point, but in Java source code such bidirectional dependencies are not a problem, are they?)

@twall
Contributor

twall commented May 10, 2016

In general the synchronization is only there to protect the libraries map;
it's important that a given library mapping only get initialized once, and
that the native-side initialization only happens once, and consistently,
mostly so that any options associated with the library are consistent
across structures and callbacks.

Then, of course, it's important that the Native class is consistently
initialized. All native functions were moved into that class to make it
easier to reason about its consistency (for loading as well as unloading).

@rdicroce

rdicroce commented Dec 5, 2016

I've gotten hit by this issue as well. After some research, I have a theory about what's happening, though I have no idea how to verify it.

Section 5.5 of the JVM specification for Java 8 states:

For each class or interface C, there is a unique initialization lock LC. The mapping from C to LC is left to the discretion of the Java Virtual Machine implementation. For example, LC could be the Class object for C, or the monitor associated with that Class object. The procedure for initializing C is then as follows:

  1. Synchronize on the initialization lock, LC, for C. This involves waiting until the current thread can acquire LC.
  2. If the Class object for C indicates that initialization is in progress for C by some other thread, then release LC and block the current thread until informed that the in-progress initialization has completed, at which time repeat this procedure. Thread interrupt status is unaffected by execution of the initialization procedure.
  3. If the Class object for C indicates that initialization is in progress for C by the current thread, then this must be a recursive request for initialization. Release LC and complete normally.

In light of the above, consider the following sequence:

  1. Thread A starts initializing the Pointer class, but has not yet reached the point where it needs Native.
  2. Thread B starts initializing the Native class, but has not yet reached the call to initIDs().
  3. Both threads now proceed.

If I'm interpreting the JVM spec correctly, this results in a deadlock. Thread B is initializing Native and tries to initialize Pointer (because of LOAD_CREF for Pointer in initIDs()), but discovers that thread A is already initializing Pointer, so it waits for thread A to complete. Meanwhile, thread A is initializing Pointer and tries to initialize Native (because it needs to read the POINTER_SIZE field), but discovers that thread B is already initializing Native, so it waits for thread B to complete.

This would also explain why there's no problem when there's only one thread in play, because point 3 from the JVM spec handles that case.

Again, this is mostly guesswork, but it fits the available evidence. Anyone have an idea how we would find out for sure?
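
One way to check the theory without JNA is a small stand-alone reproduction. The sketch below is an assumption-laden illustration, not JNA code: the hypothetical classes `A` and `B` stand in for Native and Pointer, and a latch forces the interleaving from steps 1 to 3 so that each thread is inside its own static initializer before it touches the other class.

```java
import java.util.concurrent.CountDownLatch;

// Self-contained demo, unrelated to JNA: A and B stand in for Native and Pointer.
class InitCycleDemo {
    // Opens once both threads are inside a static initializer.
    static final CountDownLatch BOTH_STARTED = new CountDownLatch(2);

    static void rendezvous() {
        BOTH_STARTED.countDown();
        try {
            BOTH_STARTED.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static class A {
        static final int VALUE;
        static {
            rendezvous();         // wait until B's initializer is running too
            VALUE = B.VALUE + 1;  // blocks: B is being initialized by the other thread
        }
    }

    static class B {
        static final int VALUE;
        static {
            rendezvous();         // wait until A's initializer is running too
            VALUE = A.VALUE + 1;  // blocks: A is being initialized by the other thread
        }
    }

    // Returns true if both threads are still stuck after the timeout.
    static boolean deadlocks(long timeoutMillis) {
        Thread t1 = new Thread(() -> System.out.println("A.VALUE = " + A.VALUE));
        Thread t2 = new Thread(() -> System.out.println("B.VALUE = " + B.VALUE));
        t1.setDaemon(true);  // daemon threads let the JVM exit despite the deadlock
        t2.setDaemon(true);
        t1.start();
        t2.start();
        try {
            t1.join(timeoutMillis);
            t2.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return t1.isAlive() && t2.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("deadlocked: " + deadlocks(1000));
    }
}
```

If the interpretation of the spec is right, this should report that both threads are still blocked after the timeout. Touching `A.VALUE` from a single thread first would complete normally, since the recursive request falls under point 3.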

@pjdarton

pjdarton commented Apr 7, 2017

Hi folks.
I've been encountering this issue when running Jenkins and I believe that it's the underlying cause of https://issues.jenkins-ci.org/browse/JENKINS-39179 and https://issues.jenkins-ci.org/browse/JENKINS-16070 .
Whilst Jenkins has its own fork of JNA (https://github.com/jenkinsci/jna), it's not so different that it doesn't suffer from the same issues, so we share a common interest in getting this fixed...

  1. Jenkins class hudson.util.jna.Kernel32Utils depends on Jenkins class hudson.util.jna.Kernel32 depends on JNA class com.sun.jna.Native which depends on com.sun.jna.Pointer (which depends on com.sun.jna.Native)
  2. Jenkins class hudson.node_monitors.SwapSpaceMonitor depends on Jenkins class org.jvnet.hudson.Windows depends on JNA class com.sun.jna.Structure which depends on com.sun.jna.Pointer (which depends on com.sun.jna.Native)

What I'm seeing is that I have two separate threads causing classloading of these two independently (see stacktraces below), where the first one ("pool-1-thread-3") has started initializing Native and not got as far as Pointer, and the second thread ("pool-1-thread-9") has started initializing Pointer and not got as far as Native, then they'll deadlock waiting for the other thread to finish classloading.

"pool-1-thread-3 for Channel to jenkins.mydomain.com/1.2.3.4 id=3616289" Id=17 Group=main WAITING on java.lang.J9VMInternals$ClassInitializationLock@1ae67f08 (in native)
    at java.lang.Object.wait(Native Method)
    -  waiting on java.lang.J9VMInternals$ClassInitializationLock@1ae67f08
    at java.lang.Object.wait(Object.java:167)
    at java.lang.J9VMInternals.initialize(J9VMInternals.java:274)
    -  locked java.lang.J9VMInternals$ClassInitializationLock@1ae67f08
    at com.sun.jna.Native.initIDs(Native Method)
    at com.sun.jna.Native.<clinit>(Native.java:148)
    at java.lang.J9VMInternals.initializeImpl(Native Method)
    at java.lang.J9VMInternals.initialize(J9VMInternals.java:237)
    at hudson.util.jna.Kernel32Utils.load(Kernel32Utils.java:112)
    at hudson.util.jna.Kernel32.<clinit>(Kernel32.java:37)
    at java.lang.J9VMInternals.initializeImpl(Native Method)
    at java.lang.J9VMInternals.initialize(J9VMInternals.java:237)
    at hudson.util.jna.Kernel32Utils.getWin32FileAttributes(Kernel32Utils.java:77)
    at hudson.util.jna.Kernel32Utils.isJunctionOrSymlink(Kernel32Utils.java:98)
    at hudson.Util.isSymlink(Util.java:507)
    at hudson.FilePath.deleteRecursive(FilePath.java:1199)
    at hudson.FilePath.access$1000(FilePath.java:195)
    at hudson.FilePath$14.invoke(FilePath.java:1179)
    at hudson.FilePath$14.invoke(FilePath.java:1176)
    at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2731)
    at hudson.remoting.UserRequest.perform(UserRequest.java:153)
    at hudson.remoting.UserRequest.perform(UserRequest.java:50)
    at hudson.remoting.Request$2.run(Request.java:336)
    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
    at java.util.concurrent.FutureTask.run(FutureTask.java:273)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1156)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:626)
    at hudson.remoting.Engine$1$1.run(Engine.java:94)
    at java.lang.Thread.run(Thread.java:804)

    Number of locked synchronizers = 1
    - java.util.concurrent.ThreadPoolExecutor$Worker@819d87b4
"pool-1-thread-9 for Channel to jenkins.mydomain.com/1.2.3.4 id=3616789" Id=24 Group=main WAITING on java.lang.J9VMInternals$ClassInitializationLock@fe8f4030 (in native)
    at java.lang.Object.wait(Native Method)
    -  waiting on java.lang.J9VMInternals$ClassInitializationLock@fe8f4030
    at java.lang.Object.wait(Object.java:167)
    at java.lang.J9VMInternals.initialize(J9VMInternals.java:274)
    -  locked java.lang.J9VMInternals$ClassInitializationLock@fe8f4030
    at com.sun.jna.Pointer.<clinit>(Pointer.java:41)
    at java.lang.J9VMInternals.initializeImpl(Native Method)
    at java.lang.J9VMInternals.initialize(J9VMInternals.java:237)
    at java.lang.J9VMInternals.initialize(J9VMInternals.java:204)
    at com.sun.jna.Structure.<clinit>(Structure.java:2078)
    at java.lang.J9VMInternals.initializeImpl(Native Method)
    at java.lang.J9VMInternals.initialize(J9VMInternals.java:237)
    at java.lang.J9VMInternals.initialize(J9VMInternals.java:204)
    at org.jvnet.hudson.Windows.monitor(Windows.java:42)
    at hudson.node_monitors.SwapSpaceMonitor$MonitorTask.call(SwapSpaceMonitor.java:124)
    at hudson.node_monitors.SwapSpaceMonitor$MonitorTask.call(SwapSpaceMonitor.java:114)
    at hudson.remoting.UserRequest.perform(UserRequest.java:153)
    at hudson.remoting.UserRequest.perform(UserRequest.java:50)
    at hudson.remoting.Request$2.run(Request.java:336)
    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
    at java.util.concurrent.FutureTask.run(FutureTask.java:273)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1156)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:626)
    at hudson.remoting.Engine$1$1.run(Engine.java:94)
    at java.lang.Thread.run(Thread.java:804)

    Number of locked synchronizers = 1
    - java.util.concurrent.ThreadPoolExecutor$Worker@bf193c54

I think that what's needed is to eliminate circular dependencies that exist at classloading time.
i.e. either Native's static initialization functionality must stop depending on Pointer, or Pointer's static initialization must stop depending on Native.

@rdicroce

rdicroce commented Apr 7, 2017

I agree this should be fixed properly. Until that happens, I have been using a workaround and haven't seen any deadlocks since. The trick is to do something once, at a point before anything tries to use JNA, that causes these two classes to get initialized. In my case, I'm calling Native#getDefaultStringEncoding() in the main() of my application. This causes both classes to get initialized on the same thread before any other threads can compete to initialize them, thus avoiding the deadlock.
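
A generic variant of this workaround can be sketched without a compile-time dependency on JNA. `EagerInit` and `preload` are hypothetical names, and the assumption (based on this thread) is that initializing Native and Pointer on one thread before any worker thread starts is enough to avoid the race. `Class.forName(name, true, loader)` is the standard call that loads a class and runs its static initializers.

```java
// Hypothetical helper, not part of JNA: forces class initialization up front.
class EagerInit {
    // Initializes the named classes one after another on the calling thread
    // and returns how many of them were found and initialized.
    static int preload(String... classNames) {
        int initialized = 0;
        ClassLoader loader = EagerInit.class.getClassLoader();
        for (String name : classNames) {
            try {
                Class.forName(name, true, loader); // true = run static initializers
                initialized++;
            } catch (ClassNotFoundException e) {
                System.err.println("Not on the classpath: " + name);
            }
        }
        return initialized;
    }

    public static void main(String[] args) {
        // Call this first thing, before any worker thread is started.
        int count = preload("com.sun.jna.Native", "com.sun.jna.Pointer");
        System.out.println("Initialized " + count + " class(es)");
    }
}
```

Errors raised by a static initializer itself, for example when the native library cannot be loaded, are not caught here and will propagate to the caller.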

@pjdarton

pjdarton commented Apr 7, 2017

One can do a similar workaround in Jenkins by passing -Dhudson.remoting.RemoteClassLoader.force=com.sun.jna.Native to the JRE when starting the Jenkins slave processes, which should result in much the same effect (I'm in the process of rolling out that change right now...).
However, this kind of workaround is messy and should not be required - the code should be thread-safe without having to do this kind of underhand trickery.

@neilcsmith-net
Contributor

Isn't the Native and Pointer deadlock a different issue? The OP's issue (and mine) is that class initialization of inter-dependent classes that use JNA happens under the libraries lock, causing deadlock. This is well after the internal Native and Pointer classes have been initialized?!

@pjdarton

It's possible...

I arrived here because I found the similarity between what I was seeing and @rdicroce's comment of 5 Dec 2016; that comment perfectly describes what I had. If that comment is inapplicable to the OP then my comment is similarly misplaced...

That said, my inclination would be that resolving the circular dependencies such that there's a very clear hierarchy of classes & initialization would likely make it far more apparent where the runtime deadlock came from - circular dependencies make for much confusion.

@twall
Contributor

twall commented Apr 12, 2017 via email

@matthiasblaesing
Member

I'd like you to have a look at this branch:

https://github.com/matthiasblaesing/jna/tree/unlockedLoad

This branch is a proposal, to reduce the problem surface. There are two vectors addressed:

  • Pointer and Native classes are prone to deadlocks when loaded from different Threads. The cycle can be broken by removing the Pointer#SIZE attribute and instead using the Native#POINTER_SIZE value. From the commit message:

Remove Pointer#SIZE replaced by Native#POINTER_SIZE to prevent classloading deadlock

If Pointer and Native class are concurrently initialized, a deadlock
results:

  • The static initializer block of Pointer uses Native#POINTER_SIZE
    to initialize its SIZE value
  • The Native#initIDs method loads the Pointer class via JNI to find
    its ClassID and MethodIDs

Both threads enter the initialization lock for their corresponding class.
For initialization to succeed, both classes also need an initialized
version of the counterpart. Thus each thread needs to acquire the
other's initialization lock.

  • The synchronized blocks in Native that synchronize on Native#libraries are too broad and prone to deadlocks, as further class loading is triggered while the lock is held. From the commit message:

Use a synchronizedMap for typeOptions and libraries instead of separate lock

The currently used lock is too broad and can cause a deadlock when
loadLibrary is called from different threads that need
recursive initialization of dependent classes.

The codepaths hold the libraries lock to prevent duplicate
execution of initialization code. The lock can be dropped in favor
of a Collections#synchronizedMap construct that only protects the
Map structure.

In detail:

  • loadLibraryInstance: The lock on libraries prevents a duplicate put of
    WeakReference to the static singleton instance of the library.
    This is not critical, as the References are equal from the outside view.

    The library won't be initialized twice, as the JLS defines that
    class initialization must happen under a lock.

    The codepath can be entered multiple times without negative consequences.

  • findEnclosingLibrary: The lock on libraries prevents recursive calls
    to Native#findEnclosingLibrary and CallbackReference#findCallbackClass.

    These paths have no side effects apart from caching, so can be
    entered multiple times.

  • getLibraryOptions: The lock on libraries prevents building the library
    options map multiple times. As there are no side effects, locking is
    not necessary.

  • cacheOptions: The lock on libraries prevents multiple puts of the
    options map into the typeOptions and libraries maps.
    As there are no side effects, locking is not necessary.

This causes an API break, removing the Pointer#SIZE member. Any thoughts on this?

@rdicroce

I haven't tested the changes but they look fine to me. Removing Pointer#SIZE is an API break but doesn't seem like a big deal to me since Native#POINTER_SIZE has existed since 4.0 according to Git. So anyone using Pointer#SIZE can just do search and replace and still be compatible with any 4.x version of JNA.

@mmitche

mmitche commented Sep 8, 2017

What's the status of that potential fix? Has anyone attempted to use it?

@matthiasblaesing
Member

After the release of 4.5.0 (just happened) I just merged the JNA-5.0.0 branch into master. As I did not hear any complaints, I'll assume this fixes the issue. If not, please reopen.

I don't want to break API in each version, so this would be a good point in time to raise problems.

facebook-github-bot pushed a commit to facebook/buck that referenced this issue May 26, 2018
Summary: context: java-native-access/jna#652

Reviewed By: jtorkkola

fbshipit-source-id: 7e34248
@OlegYch

OlegYch commented Sep 30, 2018

is there a release planned with this?
still happening with 4.5.0 and 4.5.2 here sbt/io#152

@matthiasblaesing
Member

@OlegYch

OlegYch commented Sep 30, 2018

thanks, it seems to work

jasonnam added a commit to jasonnam/buck that referenced this issue Jan 10, 2021
java-native-access/jna#652

This issue was resolved and property SIZE does not exist with JNA version 5.6.0.
rajyengi pushed a commit to facebook/buck that referenced this issue Jan 12, 2021
* Update JNA to resolve framework loading issue on Big Sur

java-native-access/jna#1215

* Remove workaround initializing JNA early

java-native-access/jna#652

This issue was resolved and property SIZE does not exist with JNA version 5.6.0.

* Update NuProcess to 2.0.1 for compatibility with JNA

* Build NuProcess with openjdk version "1.8.0_275"

jetty/jetty.project#3244

* Cherry pick a2912b9

a2912b9
Bencodes pushed a commit to lyft/buck that referenced this issue Feb 10, 2021
shepting pushed a commit to airbnb/buck that referenced this issue Nov 11, 2021
shepting pushed a commit to airbnb/buck that referenced this issue Nov 11, 2021
shepting pushed a commit to airbnb/buck that referenced this issue Nov 12, 2021