Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-28844] Fix memory leak #54

Merged
merged 12 commits into from Jul 13, 2015
Merged

[JENKINS-28844] Fix memory leak #54

merged 12 commits into from Jul 13, 2015

Conversation

@stephenc
Copy link
Member

stephenc commented Jun 10, 2015

See JENKINS-28844

  • In larger installations of Jenkins the complex cycles of object references containing classloaders containing references to the channel which contains references back to the classloaders can confuse the garbage collector.
  • The confusion arrises because classloaders are considered live while there are objects from the classloader live
    and the RemoteClassLoader and RemoteInvocationHandler's proxy both need a reference to the Channel
    which typically can be storing either some of the objects from the RemoteClassLoader or
    the RemoteInvocationHandler's proxy instances in the Channel properties. Thus although the entire subgraph
    is no longer live, the cycle is not detected as ClassLoaders are treated separately. The issue is worse in
    Java 7 where the ClassLoaders are on the PermGen heap though it is also present in Java 8.
  • Typically multiple out of memory errors can cause the cycle to ultimately get cleaned up as one half
    gets cleared out breaking the cycle enough so that a subsequent out of memory removes the second half leaving
    a third out of memory to clear out the ClassLoaders... but after three out of memory errors in a row, Jenkins
    itself is typically well dead.

This PR (I believe) fixes these issues:

  • We apply Wheeler's theorem to
    the Channel references. This allows us to clear out the reference instance as soon as the Channel is terminated
    thus simplifying the work of the garbage collector.
  • We clear out some unnecessary Channel references that were hiding in ThreadLocals
  • We move from using an Object.finalize() implementation to a PhantomReference implementation of the
    unexporting operation. In my experiments the call to Object.finalize() on a running Jenkins instance would
    very often get postponed to almost never, and more critically, the presence of a finalizer means that all
    instances of RemoteInvocationHandler were put in the finalizer queue as opposed to only those that
    need to be tracked for auto unexporting. We can (and do) proactively remove the references from the
    reference queue as soon as the channel is closed, further reducing the work that is required by the
    garbage collector.
  • Some other findbugs related fixes and adding a thread name since I copied the naming thread factory to provide a name for the unexporter thread

@reviewbybees (who may want to know that this references ZD-26018)

stephenc added 8 commits May 7, 2015
- In larger installations of Jenkins the complex cycles of object references containing classloaders containing references to the channel which contains references back to the classloaders can confuse the garbage collector.
- The confusion arrises because classloaders are considered live while there are objects from the classloader live
  and the RemoteClassLoader and RemoteInvocationHandler's proxy both need a reference to the Channel
  which typically can be storing either some of the objects from the RemoteClassLoader or
  the RemoteInvocationHandler's proxy instances in the Channel properties. Thus although the entire subgraph
  is no longer live, the cycle is not detected as ClassLoaders are treated separately. The issue is worse in
  Java 7 where the ClassLoaders are on the PermGen heap though it is also present in Java 8.
- Typically multiple out of memory errors can cause the cycle to ultimately get cleaned up as one half
  gets cleared out breaking the cycle enough so that a subsequent out of memory removes the second half leaving
  a third out of memory to clear out the ClassLoaders... but after three out of memory errors in a row, Jenkins
  itself is typically well dead.

This commit (I believe) fixes these issues:
- We apply [Wheeler's theorem](http://en.wikipedia.org/wiki/Fundamental_theorem_of_software_engineering) to
  the Channel references. This allows us to clear out the reference instance as soon as the Channel is terminated
  thus simplifying the work of the garbage collector.
- We move from using an Object.finalize() implementation to a PhantomReference implementation of the
  unexporting operation. In my experiments the call to Object.finalize() on a running Jenkins instance would
  very often get postponed to almost never, and more critically, the presence of a finalizer means that all
  instances of RemoteInvocationHandler were put in the finalizer queue as opposed to only those that
  need to be tracked for auto unexporting. We can (and do) proactively remove the references from the
  reference queue as soon as the channel is closed, further reducing the work that is required by the
  garbage collector.
@jenkinsadmin

This comment has been minimized.

Copy link
Member

jenkinsadmin commented Jun 10, 2015

Thank you for a pull request! Please check this document for how the Jenkins project handles pull requests

@jglick

This comment has been minimized.

Copy link
Member

jglick commented Jun 11, 2015

The test build got aborted for some reason. Timeout? Other cancellation?

@jglick jglick changed the title [FIXED JENKINS-28844] Fix memory leak [JENKINS-28844] Fix memory leak Jun 11, 2015
@jglick

This comment has been minimized.

Copy link
Member

jglick commented Jun 11, 2015

BTW removing FIXED annotation (which may not be honored by the JIRA link daemon anyway) since an actual fix involves integrating a new remoting release to core, so the issue should stay open until that happens.

* these typically arrise when an {@link #export(Class, Object)} is {@link #setProperty(Object, Object)}
* (a supported and intended use case), the {@link Ref} allows us to break the object cycle on channel
* termination and simplify the circles into chains which can then be collected easily by the garbage collector.
* @since FIXME after merge

This comment has been minimized.

Copy link
@jglick

jglick Jun 11, 2015

Member

No @since needed on non-API members. Anyone doing code archaeology should just git-blame.

@@ -814,6 +829,10 @@ public void terminate(IOException e) {
synchronized (this) {
if (e == null) throw new IllegalArgumentException();
outClosed = inClosed = e;
// we need to clear these out early in order to ensure that a GC operation while
// proceding with the close does not result in a batch of UnexportCommand instances

This comment has been minimized.

Copy link
@jglick
@@ -1484,4 +1516,77 @@ public static Channel current() {
// to avoid situations like this, create proxy classes that we need during the classloading
jarLoaderProxy=RemoteInvocationHandler.getProxyClass(JarLoader.class);
}

private static class ThreadLastIoId extends ThreadLocal<int[]> {

This comment has been minimized.

Copy link
@jglick

jglick Jun 11, 2015

Member

IIRC this refactoring was to avoid a $this reference from an anonymous inner class, right? Might be worth it to mention that in Javadoc.

* @since FIXME after merge
* @see #reference
*/
/*package*/ static final class Ref {

This comment has been minimized.

Copy link
@jglick

jglick Jun 11, 2015

Member

BTW this whole class could probably be replaced with AtomicReference<Channel>, at the loss of a little clarity in naming.

* Thread factory that sets thread name so we know who is responsible for so many threads being created.
* @since FIXME after merge
*/
public class NamingThreadFactory implements ThreadFactory {

This comment has been minimized.

Copy link
@jglick

jglick Jun 11, 2015

Member

I wonder if we should create a utilities module that can be shared with Jenkins core.

This comment has been minimized.

Copy link
@stephenc

stephenc Jun 15, 2015

Author Member

Yep... or maybe just have Jenkins use this one directly now that it exists

This comment has been minimized.

Copy link
@kohsuke

kohsuke Jul 13, 2015

Member

I'm more inclined just to hide it so that Jenkins core wouldn't start accidentally relying on this. Minor point.

@@ -108,15 +133,43 @@
// if the type is a JDK-defined type, classloader should be for IReadResolve
if(cl==null || cl==ClassLoader.getSystemClassLoader())
cl = IReadResolve.class.getClassLoader();
return type.cast(Proxy.newProxyInstance(cl, new Class[]{type,IReadResolve.class},
new RemoteInvocationHandler(channel,id,userProxy,autoUnexportByCaller,type)));
RemoteInvocationHandler handler = new RemoteInvocationHandler(channel, id, userProxy, autoUnexportByCaller, type);

This comment has been minimized.

Copy link
@jglick

jglick Jun 11, 2015

Member

It is legal to pass a null channel to this constructor?

This comment has been minimized.

Copy link
@stephenc

stephenc Jun 15, 2015

Author Member

I couldn't find a case to prove it wasn't. We are dealing with remoting which will involve a network round-trip, so we are not CPU cycle critical and safer is better

if (async) channel.callAsync(req);
else return channel.call(req);
if (async) channel().callAsync(req);
else return channel().call(req);

This comment has been minimized.

Copy link
@jglick

jglick Jun 11, 2015

Member

And it it is null? Maybe there should be a channelOrFail method throwing IOException?

/**
* Our executor service, we use at most one thread for all {@link Channel} instances in the current classloader.
*/
private final ExecutorService svc = new AtmostOneThreadExecutor(

This comment has been minimized.

Copy link
@jglick

jglick Jun 11, 2015

Member

Should this be static?

This comment has been minimized.

Copy link
@stephenc

stephenc Jun 15, 2015

Author Member

No need because UNEXPORTER is static. This leaves us with just the one static to track

}
inQueue.set(false); // we have started execution and running is true, so queued can be reset
try {
while (!referenceLists.isEmpty()) {

This comment has been minimized.

Copy link
@jglick

jglick Jun 11, 2015

Member

Too tricky to follow easily, so just trusting this is all correct…

This comment has been minimized.

Copy link
@stephenc

stephenc Jun 15, 2015

Author Member

395-426 gets one reference from the queue and cleans it up. 429-439 scans for any channels that have closed and removes their objects from even the phantom reference queue thus making it super easy for the GC to collect them... though having looked at that code in the cold light of day I now want to wrap the second one in a periodic run basis rather than every loop in case there are a large number of references cleared when there are many channels open

@jglick

This comment has been minimized.

Copy link
Member

jglick commented Jun 11, 2015

Minor comments only, 👍 nice work!

@@ -158,7 +158,7 @@ public void setConnectTo(String target) {
System.err.println("Illegal parameter: "+target);
System.exit(1);
}
connectionTarget = new InetSocketAddress(tokens[0],Integer.valueOf(tokens[1]));
connectionTarget = new InetSocketAddress(tokens[0],Integer.parseInt(tokens[1]));

This comment has been minimized.

Copy link
@oleg-nenashev

oleg-nenashev Jul 13, 2015

Member

Makes sense to process NumberFormatException and exit as well

@oleg-nenashev

This comment has been minimized.

Copy link
Member

oleg-nenashev commented Jul 13, 2015

The PR description has @reviewbyees, but should contain @reviewbybees
🐝 BTW

@reviewbybees

This comment has been minimized.

Copy link

reviewbybees commented Jul 13, 2015

This pull request originates from a CloudBees employee. At CloudBees, we require that all pull requests be reviewed by other CloudBees employees before we seek to have the change accepted. If you want to learn more about our process please see this explanation.

kohsuke added a commit that referenced this pull request Jul 13, 2015
[JENKINS-28844] Fix memory leak
@kohsuke kohsuke merged commit bf81838 into master Jul 13, 2015
1 check passed
1 check passed
Jenkins This pull request looks good
Details
@kohsuke kohsuke deleted the fix-memory-leak branch Jul 13, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.