Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-19453] Interrupted class loading can lead to NoClassDefFoundError #19

Closed
wants to merge 3 commits into from

Conversation

jglick
Copy link
Member

@jglick jglick commented Nov 8, 2013

JENKINS-19453: interrupting class loading irrecoverably breaks the class being loaded. Filing for evaluation; so far just have a failing test.

@jglick
Copy link
Member Author

jglick commented Nov 8, 2013

I get what I think is the expected stack trace when the new test fails:

Remote call on north failed
java.io.IOException
    at hudson.remoting.Channel.call(Channel.java:731)
    at hudson.remoting.ClassRemotingTest.testInterruption(ClassRemotingTest.java:148)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at junit.framework.TestCase.runTest(TestCase.java:176)
    at junit.framework.TestCase.runBare(TestCase.java:141)
    at junit.framework.TestResult$1.protect(TestResult.java:122)
    at junit.framework.TestResult.runProtected(TestResult.java:142)
    at junit.framework.TestResult.run(TestResult.java:125)
    at junit.framework.TestCase.run(TestCase.java:129)
    at junit.framework.TestSuite.runTest(TestSuite.java:255)
    at junit.framework.TestSuite.run(TestSuite.java:250)
    at junit.framework.TestSuite.runTest(TestSuite.java:255)
    at junit.framework.TestSuite.run(TestSuite.java:250)
    at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
    at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164)
    at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:110)
    at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:172)
    at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenForked(SurefireStarter.java:104)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:70)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class hudson.rem0ting.TestLinkage$A
    at hudson.rem0ting.TestLinkage.call(TestLinkage.java:30)
    at hudson.remoting.UserRequest.perform(UserRequest.java:118)
    at hudson.remoting.UserRequest.perform(UserRequest.java:48)
    at hudson.remoting.Request$2.run(Request.java:328)
    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

@jglick
Copy link
Member Author

jglick commented Nov 8, 2013

I feel like the right solution would be that when we catch InterruptedException we also unexport this RemoteClassLoader from Channel.exportedObjects. Unfortunately it seems that the code catching the exception is running on the wrong side of the channel, or at least the channel object is the wrong one.

@jglick
Copy link
Member Author

jglick commented Nov 8, 2013

Tried

diff --git a/src/main/java/hudson/remoting/RemoteClassLoader.java b/src/main/java/hudson/remoting/RemoteClassLoader.java
index 1b11ac4..fcba3f4 100644
--- a/src/main/java/hudson/remoting/RemoteClassLoader.java
+++ b/src/main/java/hudson/remoting/RemoteClassLoader.java
@@ -236,6 +236,7 @@ final class RemoteClassLoader extends URLClassLoader {
                         } catch (IOException x) {
                             throw new ClassNotFoundException(name,x);
                         } catch (InterruptedException x) {
+                            channel.importedClassLoaders.classLoaders.values().remove(this);
                             throw new ClassNotFoundException(name,x);
                         }
                     }

without success.

}
}
});
Thread.sleep(2500);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so that the first two class loads succeed but the third fails. A better test would use semaphores rather than timing (cf. the test before this one).

@jenkinsadmin
Copy link
Member

Thank you for a pull request! Please check this document for how the Jenkins project handles pull requests

@kohsuke
Copy link
Member

kohsuke commented Nov 9, 2013

So the problem here is that if the thread gets interrupted while RemoteClassLoader is loading a class from somewhere else, then it throws ClassNotFoundException, which marks the class as "failed to load" once and for all.

I think the crux of the issue is that, in general, a failed class loading
will never get retried. It is certainly the case when another class loading has triggered a
recurisve class loading as in this test case.

In #19 Jesse proposed to fix
this by dropping the whole RemoteClassLoader. But this will not work
since the said RemoteClassLoader cloud have loaded other classes that
might be already running. Dropping a classloader will not drop these
classes, and we end up just creating another classloader that loads
incompatible classes. That is a disaster waiting to happen.

In this fix, we simply make findClass non-interruptible. That is, if a
blocking remote operation gets interrupted, we catch that and simply
retry and refuse to give up. We'll remember to set the interrupt flag
back on, however, so that the interrupt signal doesn't disappear.

The net effect is as if the delivery of the interrupt was bit late than
usual. The way I see it, this is the only way to fix this.

The nesting of control structures here is horrible, but I can't think of
any better way to write this.
Future<byte[]> img = cr.classImage.resolve(channel, name.replace('.', '/') + ".class");
if (img.isDone()) {
boolean interrupted = false;
try {//
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to add a comment here?

@jglick
Copy link
Member Author

jglick commented Nov 11, 2013

I think your approach is right. Since unfortunately Java class loading is not atomic with respect to static initializers—i.e. the JVM fails to cleanly roll back to the prior state if some classes are loaded but then an initializer blocks loading of one—probably the best we can do is to just doggedly force the class to be loaded.

kohsuke added a commit that referenced this pull request Nov 11, 2013
I think the crux of the issue is that, in general, a failed class loading
will never get retried. It is certainly the case when another class loading has triggered a
recurisve class loading as in this test case.

In #19 Jesse proposed to fix
this by dropping the whole RemoteClassLoader. But this will not work
since the said RemoteClassLoader cloud have loaded other classes that
might be already running. Dropping a classloader will not drop these
classes, and we end up just creating another classloader that loads
incompatible classes. That is a disaster waiting to happen.

In this fix, we simply make findClass non-interruptible. That is, if a
blocking remote operation gets interrupted, we catch that and simply
retry and refuse to give up. We'll remember to set the interrupt flag
back on, however, so that the interrupt signal doesn't disappear.

The net effect is as if the delivery of the interrupt was bit late than
usual. The way I see it, this is the only way to fix this.

The nesting of control structures here is horrible, but I can't think of
any better way to write this.
@kohsuke
Copy link
Member

kohsuke commented Nov 11, 2013

Rebased and integrated toward 2.33.

@kohsuke kohsuke closed this Nov 11, 2013
@jeffret-b jeffret-b deleted the interrupted-loading-JENKINS-19453 branch January 27, 2020 22:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants