Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

FinalizableReferenceQueue keeps ClassLoaders around. #92

Closed
gissuebot opened this Issue Oct 31, 2014 · 110 comments

Comments

Projects
None yet
3 participants
Collaborator

gissuebot commented Oct 31, 2014

Original issue created by nairb774 on 2008-07-28 at 05:39 PM


If the class loader which loaded the google-collections jar is let go
(consider redeploying a war to Apache Geronimo) the class loader will not
be able to be garbage collected. This is caused by the thread started by
FinalizableReferenceQueue never ending. In other words the garbage
collection (for the class loader) can only happen when the thread stops.
The solution is to stop the thread when there are no more objects on the
queue. This will happen eventually if the rest of the app behaves
correctly by not placing state outside of its class loader's environment.
Attached is a version of the class which I have used to this effect.

This is also a problem in the OSGi space where every jar gets its own class
loader and they can be refreshed easily at runtime to do rolling
upgrades/bug fixes.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by kevinb9n on 2008-07-28 at 07:00 PM


Thanks very much! Bob has agreed to fix this issue. He's decided not to use your
patch, and will fix it a slightly different way, but thanks for going to the extra
effort anyway.


Owner: crazyboblee

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by nairb774 on 2008-07-28 at 07:43 PM


A fix is a fix is a fix... :) There is always more then one way - I just took the
fastest and least complicated way I could think of.

Thanks.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by mcculls on 2008-11-26 at 05:24 PM


FYI: I just tested the new version of FinalizableReferenceQueue with Guice on an OSGi
container, and I can confirm it now allows the classloader to be garbage collected :)

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by gili.tzabari on 2008-11-27 at 02:31 AM


This isn't solved for me (under Glassfish) but I'm not 100% sure I am interpreting
the results correctly. See
http://groups.google.com/group/google-guice/msg/ff53f95f422edabf

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by mcculls on 2008-11-27 at 03:01 AM


I'd need to know more about the way you're loading Guice into the classloader
hierarchy to comment, and also if you're seeing any warnings or exceptions relating
to the Finalizer thread. What I can say is that it does unload OK under OSGi, which
suggests that something else is keeping the classloader alive in your case.

Note that the Finalizer thread won't shutdown until the Guice classloader is eligible
for GC, so it will appear as a GC root until that happens - you may think it looks
like a leak, but actually it will clear itself once other references to the Guice
classloader are cleared. So I'd dig deeper into the heapdump for other causes.

You may also need to force finalization and GC a few times using the System API.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by mcculls on 2008-11-28 at 03:14 PM


I did some digging and found the problem - when running under JEE it seems there are
several additional references from the Finalizer thread to the container classloader
via access control contexts (presumably because of additional protection domains).

Here's the complete list of references I found and some possible solutions:

  1. Finalizer -> Thread.inheritedAccessControlContext -> ... -> Web CL

   Set in Thread constructor to 'AccessController.getContext()' ie. the
   context of the calling thread, this may refer to the Web classloader.

   Solution: wrap creation of Finalizer thread in doPrivileged block

  1. URLClassLoader -> URLClassLoader.acc -> ... -> Web CL

   As above URLClassLoader constructor sets 'acc' field to current context.

   Solution: doPrivileged block doesn't help here, the only way I found to
   avoid this reference was to use reflection to null the 'acc' field which
   is yucky, anyone know a better option? Perhaps we could grab defineClass
   from the system classloader (again with reflection) and use that to load
   the Finalizer into the system classloader, but this is also yucky :(

  1. Finalizer -> Thread.inheritableThreadLocals -> ... -> Web CL

   Each thread inherits an inheritableThreadLocals map from its parent, and
   this could include a reference back to the container classloader (depends
   on the app) - again no easy fix for this one, had to resort to reflection
   to null out the field :(

  1. Finalizer -> Thread.contextClassLoader -> Web CL

   Yet another setting inherited from the calling thread, but at least here
   there's a public API 'setContextClassLoader()' we can use to clear it :)

With these references removed I can completely unload Guice from GlassFish v2.

I've attached a patch in case people are interested - but because of the hacky
use of reflection to clear certain private fields, I'm not too happy with it.

I'm wondering instead whether it would be better to provide a public API people
could use to shutdown the thread, which would then make a lot of this code moot.

Anyway, Happy Thanksgiving!

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by crazyboblee on 2008-11-30 at 02:38 AM


Nice detective work, mcculls!

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by mcculls on 2008-12-02 at 09:55 AM


FYI here are the safe fixes that I think should be used in google-collections, such
as the use of doPrivileged() when creating the classloader and the Finalizer thread.
The TCCL should also be cleared, just in case.

The two other references (possible inherited thread-local and access control context)
are best fixed in the web container - certainly the thread-local issue is fixable in
GlassFish. However, still not sure about the remaining access control context issue.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by crazyboblee on 2009-03-17 at 09:48 PM


We'll address all of this in time for 1.0. I'm in favor of the reflection hacks over
an explicit shutdown hook.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by kevinb9n on 2009-03-18 at 02:21 AM


(No comment entered for this change.)


Labels: 1.0

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by kevinb9n on 2009-09-17 at 05:45 PM


(No comment entered for this change.)


Labels: Milestone-1.0

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by kevinb9n on 2009-09-17 at 05:46 PM


(No comment entered for this change.)


Labels: -1.0

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by kevinb9n on 2009-10-16 at 09:01 PM


Seems this won't be in 1.0 after all.


Labels: -Milestone-1.0, Milestone-Post1.0

Collaborator

gissuebot commented Oct 31, 2014 edited by cgdecker

Original comment posted by syva...@yahoo.com on 2009-10-26 at 02:06 PM


Could you please reconsider this? Resource leak like this leads to frequent
PermGenErrors and thus to app server restarts, causing developers to need to wait for
several minutes on each restart. Additionally, there are other redeployment issues at
least under Windows because of file-locking (jar files can't be deleted or moved).
Unfortunately there aren't any (usable) workarounds, so it would be really great if
this could be included in 1.0.

Collaborator

gissuebot commented Oct 31, 2014 edited by cgdecker

Original comment posted by alen_vre...@yahoo.com on 2009-11-06 at 06:13 PM


As for workarounds: I've experimented with a ServletContextListener that finds any
Finalizer Threads and clears all the fields that Stuart mentioned.
http://pastie.org/686765 . I've tested for Guice Finalizer under Jetty and Tomcat and
it works.

For Servlet environments afaik ServletContextListener is the only place you're not
discouraged or explicitly forbidden to spawn threads. Therefore I am using a custom
Finalizer that creates just one Finalizer in SCL and makes sure it is disposed at the
end.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by pseudonymorama on 2009-11-12 at 12:26 AM


Will this cause a resource leak merely by including the jar on the classpath? Or
will it only cause a problem if FinalizableReferenceQueue or classes using it (e.g.
MapMaker) is used/referenced? If the latter is the case, one could presumably work
around the issue by making sure that the class isn't referenced.

Some sort of fix or workaround seems like a fairly critical thing to have, since I
imagine many (most?) people who would want to use the google collections library are
running under JEE. The PermGen issue won't necessarily occur immediately as they
may not redeploy right away, so users may not even know why PermGen errors suddenly
started happening when they redeploy their war files.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by terciofilho on 2010-04-11 at 09:49 PM


Any news on this issue?

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by kevinb@google.com on 2010-04-12 at 11:43 PM


No. Bob was going to do it but... then he didn't. Bob?

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by pawjanssen on 2010-05-29 at 09:48 PM


Today I received a PermGen error from my beloved Tomcat, for wich the
JreMemoryLeakPreventionListener logged the error message "SEVERE: A web application
appears to have started a thread named [com.google.inject.internal.Finalizer] but has
failed to stop it. This is very likely to create a memory leak.".

After some research, I figured this has to do with the issue addressed here, so I'm
wondering, is it about to be fixed, or is there some API to close the thread started by
Guice?

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by kevinb@google.com on 2010-07-30 at 03:53 AM


(No comment entered for this change.)


Labels: -Milestone-Post1.0

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by kevinb@google.com on 2010-07-30 at 03:56 AM


(No comment entered for this change.)


Labels: -Priority-Medium

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by kevinb@google.com on 2010-09-08 at 07:00 PM


Issue #382 has been merged into this issue.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by terciofilho on 2010-09-09 at 08:14 PM


No news about this issue? Does anybody have a workaround?

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by kevinb@google.com on 2010-09-10 at 06:36 AM


I talked to Bob about it today. We're considering reading a system property -- shocking, I know -- that would tell MapMaker not to start any background thread (but perform cleanup in the user thread). Using that would make this problem go away for you.


Status: Accepted
Owner: kev...@google.com

Collaborator

gissuebot commented Oct 31, 2014 edited by cgdecker

Original comment posted by syva...@yahoo.com on 2010-09-10 at 06:57 AM


I might be missing something, but couldn't you just add a shutdown method?

The problem is most severe in app servers and often you can't add system properties there, so the workaround wouldn't be very widely applicable.

You could just document the shutdown method as unsupported and not part of public API. Perhaps even advice that it's existence should be checked by reflection (and thus also called by reflection).

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by holger.hoffstaette on 2010-09-10 at 09:08 AM


The suggested solution of using a System property is unfortunately useless for a vast number of deployment scenarios and does nothing to fix the problem. +1 for just not creating a Finalizer thread in a library and instead piggybacking expiry on user-level operations every <hopefully customizable n> operations instead.

Here's another idea that builds on this. How about requiring a user-supplied Executor in which to run instead, as additional argument to the expiration() builder method or the public FinalizerQueue class? This decouples enabled expiry from having to create a physical thread at all, and allows proper lifecycle control by the caller, instead of forcing the library to fly blind and make guesses about what to do. The caller then is not only responsible, but also now actually in a position (!) to do environment-specific lifecycle control: quiesced waiting, forced shutdown, Bundle.stop(), ServletContextListener, whatever.

While we're at it, the same Executor could also be used to invoke the listener, if added by the builder.

One new caveat would be that a user can now share an Executor between multiple CCHMs. If the Executor had really only one or any other fixed number of threads, then too many continuously running-and-blocking-without-timeout Finalizer.run() in their current form might start clogging the Executor. If reference cleaning were batched per CCHM, the run() loop could terminate after doing whatever is necessary.

As far as I can tell this solves the Finalizer thread lifecycle problem (caused really by the nondeterministic behaviour of its ref queue polling and therefore non-termination) and also the thread-per-CCHM overhead (n CCHMs use n Finalizer threads for no really good reason).

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by terciofilho on 2010-09-10 at 01:00 PM


+1 for a shutdown method, this is much better as in my deployment scenario I already have a shutdown for my stuffs and I think that the ContextDestroyed could do this job.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by kevinb@google.com on 2010-09-10 at 01:48 PM


I'm sorry, if it's true that our intended solution is "useless" and "does nothing to fix the problem" would it be too much trouble to explain why that is?

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by terciofilho on 2010-09-10 at 02:13 PM


Well, I don't think that this solution is useless nor does nothing to fix the problem, I just think that isn't the best way to solve it, just that.

Maybe people upon can explain their reasons about that...

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by holger.hoffstaette on 2010-09-10 at 02:19 PM


Piggybacking on access is not useless (which is what I wrote), System properties are - for the reasons that other people have written as well. You don't know whether the System properties are accessible at all (security), up to date, or have just changed underneath you on the fly. Different applications or even different client classes in the same VM can have different opinions which setting is appropriate for them.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by eclipseguru on 2010-09-10 at 05:07 PM


Comment #30 made a good point. A system property provides less flexibility if two or more (web) apps are running on the same VM (server). One app might rely on the background processing wheres another is ok with running in the user thread.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by kevinb@google.com on 2011-01-12 at 08:36 PM


Issue #517 has been merged into this issue.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by fry@google.com on 2011-01-13 at 07:39 PM


Hi everyone, I know it's been a long time getting this resolved, and I'm committed to dealing with this over the next few months.

One question though: looking at FinalizableReferenceQueue leads me to believe that this whole problem can be avoided by loading Guava in the system class loader. Is this an option for you?

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by yrfselrahc on 2011-01-13 at 08:11 PM


(No comment entered for this change.)


Owner: yrfselrahc

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by cowwoc030 on 2011-01-13 at 08:23 PM


Not really, each webapp should run in its own ClassLoader. You might end up with two different webapps depending on different versions of Guice.

I find the history of this bug quite silly. We could have added a shutdown method back in 2008 and been done with it. If you magically came up with a cleaner solution at a later time you could have deprecated and removed this method. Seeing as this method will be invoked at most once in every application, removing it (when the time comes) should cause minimal disturbance.

Collaborator

gissuebot commented Oct 31, 2014 edited by cgdecker

Original comment posted by blair-ol...@orcaware.com on 2011-01-16 at 07:28 AM


Agreed that Guava should be loaded at the webapp level and not the system class loader.

I'm guessing that the scale that Google operates, you may have one process or web application per system. Smaller deployments will may have many web applications loaded in a single application server, e.g. Tomcat, so here you would want each one to have its own copy of Guava. Or even Google Collections, if they are very old and haven't been worked on to be updated to use Guava. You wouldn't want to be constrained in one web application which Guava version to use in another web application.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by mcculls on 2011-01-17 at 02:04 PM


FWIW I've put together a potential patch for the related Guice Finalizer issue:

  http://code.google.com/p/google-guice/issues/detail?id=288#c30

It uses an Executor approach instead of a shutdown method because that provides the container with much more flexibility in how it schedules the background work. It also avoids the potential pitfalls of a shutdown method - namely who's responsible for calling it, what happens if more work comes in after the shutdown begins, what happens if the shutdown call blocks, etc. (i.e. same reasons why Thread.stop() is deprecated).

The patch currently uses a system property to configure the executor - while this is not ideal, it does allow people to experiment with the feature without (yet) committing to an API (as this is an implementation detail for Guice it's harder to come up with something as neat as a builder method on MapMaker...).

Experimenting with the various options I think the Executor approach is definitely the way to go - how this is configured is another question. Using something like a system property might be a reasonable compromise if people are loathe to commit to a particular API at the moment.

Collaborator

gissuebot commented Oct 31, 2014 edited by cgdecker

Original comment posted by blair-ol...@orcaware.com on 2011-01-19 at 12:53 AM


We don't use Guice, but Spring for dependency injection, throughout our webapps, so as a little background for us, how would the patch you're working on impact Guava? I saw a mention that Guice includes a copy of portions of Guava?

Since we use Spring, I wouldn't mind a public shutdown method that I can use to properly clean everything up in Guava.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by mcculls on 2011-01-19 at 09:38 PM


As Holger mentioned back in comment #26, Guava could add selection of the Executor to the MapMaker API which would be better than using System.properties - the basic implementation behind the scenes would be similar to the proposed Guice patch and would work regardless of whether you're using Guice, Spring, or new.

Executors are better than a single static shutdown method for the same reason that Thread.stop is deprecated.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by kevinb@google.com on 2011-01-27 at 01:33 PM


(No comment entered for this change.)


Labels: Milestone-Release09

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by yrfselrahc on 2011-01-27 at 03:11 PM


(No comment entered for this change.)


Owner: ---

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by fry@google.com on 2011-03-22 at 06:27 PM


(No comment entered for this change.)


Labels: -Milestone-Release09, Milestone-Release10

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by fry@google.com on 2011-03-23 at 01:49 AM


(No comment entered for this change.)


Labels: -Milestone-Release10

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by tj.rothwell on 2011-03-23 at 02:53 AM


@fry does this mean there isn't a solution in sight?

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by fry@google.com on 2011-03-23 at 10:38 AM


It means that I'm tired of changing the milestone. :-)

In practice I have a working solution, but it breaks many internal tests so it will take a while to work through the carnage. I have high hopes for release 10.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by holger.hoffstaette on 2011-03-23 at 10:54 AM


May I ask that you please share the approach (or the code) with us so that we can complain in advance, not just after it has been committed? :-)

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by fry@google.com on 2011-03-23 at 10:58 AM


Sure, the idea is to have a ReferenceQueue inside of each MapMaker (instead of globally), and to drain it on write operations (or on occasional reads in the absence of writes).

The internal testing issue is that we have tests which assume that simply allowing time to pass will result in cleanup, which will no longer be the case.

Does this approach also sound problematic in your use cases?

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by cowwoc030 on 2011-03-23 at 12:09 PM


Sounds good to me :)

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by holger.hoffstaette on 2011-03-23 at 12:22 PM


Hitching a ride on other operations (just like in lockfree algorithms) should work as long as the Map is occasionally used, and getting rid of the threads & shared queues should fix all the stale/cross classloader issues like web app undepoy leaking. Very nice! Like it.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by fry@google.com on 2011-05-27 at 11:50 AM


I know it's been a long time coming, but we currently plan to fix this for release 10.

One question, for those involved here: would the cleanup thread as implemented in http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/extra166y/CustomConcurrentHashMap.java?revision=1.17&view=markup pose the same problem described here? I don't have a good sense of the boundaries of this problem.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by holger.hoffstaette on 2011-05-27 at 12:32 PM


The jsr116y CCHM ReclamationThread itself is not held in a static (good) and instead just seems to float around until the static refqueue is no longer reachable (which will never happen since the JDK has no lifecycles). It also seems to be unstoppable.
This might work OK if it is part of the JDK, giving all CCHMs the same shared refqueue to plug into. However, as far as I can tell this would suffer from the same problems as before when e.g. part of a webapp or bundle because the lifecycle of the thread cannot be controlled, references the classloader and hence keeps it and everything it ever touched around.
This problem is inherently impossible to implement correcty unless either there is no extra thread (which is fine for drain on accumulated read/write but making idle-expiry impossible) or the thread/Executor is injected, with a controllable lifecycle from the outside. The latter solution would allow a client to get idle-expiry either via a client-supplied timer calling expireNow() or periodically doing "dummy reads".

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by cowwoc030 on 2011-05-27 at 01:46 PM


I'm in favor of the following:

No cleanup thread by default, but the users may specify an Executor at Map construction time if he wants idle-expiry. The key point here is that an Executor should be passed into the constructor, not injected statically across the entire JVM (otherwise you get problems in web containers).

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by fry@google.com on 2011-06-08 at 06:16 PM


The fix for this has been submitted. Please let me know if you experience any problems with it.


Status: Fixed
Labels: Milestone-Release10

Collaborator

gissuebot commented Oct 31, 2014 edited by cgdecker

Original comment posted by cow...@bbs.darktech.org on 2011-06-08 at 06:47 PM


Fry,

Can you please explain what you ended up doing for the fix? Also, it's not clear what specific commit revision we should be looking at.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by fry@google.com on 2011-06-08 at 07:11 PM


The change is http://code.google.com/p/guava-libraries/source/detail?r=445 and it involves doing reference cleanup on user threads rather than relying on a global background thread.

Collaborator

gissuebot commented Oct 31, 2014 edited by cgdecker

Original comment posted by cow...@bbs.darktech.org on 2011-06-08 at 07:36 PM


If I understand the code correctly, you check for unused references when executing map methods (similar to WeakHashMap). An idle map would not clean up its references. Is that correct? If so, I'm okay with this fix.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by fry@google.com on 2011-06-08 at 07:52 PM


Correct.

Collaborator

gissuebot commented Oct 31, 2014 edited by cgdecker

Original comment posted by d...@greywallsoftware.com on 2011-09-22 at 08:39 PM


I'm still seeing this issue with version 10.0-rc2. Undeploying a webapp in Tomcat still results in the following:

SEVERE: The web application [/app] appears to have started a thread named [com.google.common.base.internal.Finalizer] but has failed to stop it. This is very likely to create a memory leak.

The link in comment #56 is a 404, so I can't see what the changes were. Are others still noticing this problem with Tomcat webapps?

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by fry@google.com on 2011-09-22 at 09:35 PM


Perhaps you are still using the deprecated FinalizableReferenceQueue or Finalizable*Reference?

We no longer use them internally, but if you do you'll pay a price for it...

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by cheriomushnikoff on 2011-10-05 at 10:06 PM


Still noticing with tomcat 7.0.21. I actually explicitly clean the maps when my Spring container stops. So by the time tomcat performs memory leaks validations the maps are empty.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by yrfselrahc on 2011-10-06 at 01:58 AM


Are you using Guava release 10.0? Because we don't have code that uses FinalizableReferenceQueue any more.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by joe.j.kearney on 2011-10-07 at 08:12 AM


Be aware that FRQ is no longer deprecated:

http://groups.google.com/group/guava-discuss/browse_thread/thread/fbf9ed43fe3e19ac

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by mcculls on 2011-10-07 at 10:18 AM


Just to clarify: while FRQ is being un-deprecated, it is no longer used in MapMaker or any other collection class.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by stevenkadams on 2011-11-25 at 03:16 AM


We've started using guava 10.0.1 in a new project, and are specifically using FRQ to help manage certain object reference counts in an application server environment where guava is included as part of application code.

After digging through heap logs in search of classloader memory leaks that occur on application undeployment I've noted that FRQ still leaks because of reasons (1) and (2) given in comment #6 above (posted 3 years ago I might add).

Attached is a patch I wrote against the current source tree, partly based on the patch given in comment #6. With this patch, those stray references from the Finalizer thread to the classloader containing the FRQ class no longer exist, and the classloader becomes eligible for garbage collection.

Incidentally we're also using guice-3.0 which contains a similar (but not identical) implementation of FRQ, and had to patch that in the same way at the same time. Issue 288 in google-guice appears to be pretty much a duplicate of this thread.

Hopefully we can get this thing sorted out once and for all!

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by kevinb@google.com on 2011-11-29 at 03:58 PM


Over to you, Bob.


Status: New
Owner: crazyboblee
Labels: -Milestone-Release10

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by fry@google.com on 2011-12-10 at 03:34 PM


Bob, can you please take a look at this patch? Even better, would you be willing to apply the patch to a git clone that so we could suck it in from there once you were happy with it?


Labels: Package-Base

Collaborator

gissuebot commented Oct 31, 2014 edited by cgdecker

Original comment posted by dpr...@vast.com on 2011-12-27 at 07:58 PM


Is there any activity on this? We have a project using guava that we'd really like to be able to redeploy into tomcat, and this is the last blocking issue for us.

I'm not so much interested in having it out ASAP, but rather just an estimate on when you guys think it'll make it into mainline guava - if it's more than a month or so away, I'm just going to patch guava myself and use that binary.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by fry@google.com on 2012-01-03 at 01:40 PM


Bob, can you please provide an update on this?

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by kevinb@google.com on 2012-02-16 at 06:32 PM


I don't think Bob is going to fix this. Bob, reopen if you really are going to.


Status: WontFix

Collaborator

gissuebot commented Oct 31, 2014 edited by cgdecker

Original comment posted by blair-ol...@orcaware.com on 2012-02-16 at 07:01 PM


Is closing the ticket the appropriate action, even if Bob will not fix it? The issue is still valid and keeping it open will hopefully inspire somebody to work on it ;)

Collaborator

gissuebot commented Oct 31, 2014 edited by cgdecker

Original comment posted by cow...@bbs.darktech.org on 2012-02-16 at 07:24 PM


I agree. This issue should remain open unless it is fixed or there is a good reason for not fixing it.

Collaborator

gissuebot commented Oct 31, 2014 edited by cgdecker

Original comment posted by dpr...@vast.com on 2012-02-16 at 07:25 PM


I agree - this is a fairly large issue for anybody that uses Gauva in a managed environment.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by aki.kaivola on 2012-02-16 at 07:35 PM


After following this issue for the last six months we had enough and switched to sisu-guice where this issue has been fixed a long time ago.

Collaborator

gissuebot commented Oct 31, 2014 edited by cgdecker

Original comment posted by dpr...@vast.com on 2012-02-16 at 08:08 PM


I've done this for several years myself, so I know that this is likely never true, but it seems that the fix for this issue is both quite small and already complete. Are there any blockers to just putting it in the next release, and if there are, what can the community to do help alleviate them?

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by fry@google.com on 2012-02-16 at 09:38 PM


Our dilemma here is that we don't have anyone to evaluate and test the patch.

But since there seems to be so much community support for this, let me turn it over to you. If the three of you who just spoke up can confirm that stevenkadams's patch of Nov 24, 2011 will fix this then we'd be glad to push that patch into release 12.


Status: Accepted

Collaborator

gissuebot commented Oct 31, 2014 edited by cgdecker

Original comment posted by dpr...@vast.com on 2012-03-06 at 04:09 PM


I've tested this in our alpha and beta environments, and it works.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by emcmanus@google.com on 2012-03-09 at 11:11 PM


I'm rather uncomfortable with the proposed patch. In particular calling AccessController.doPrivileged should always give pause for thought and I don't think it is justified in this case. Additionally, the dependency on JDK implementation details of URLClassLoader seems gratuitous. A couple of alternatives seem preferable to me:

(1) Make FinalizableReferenceQueue implement Closeable and have its close() method dispose of the thread. This seems preferable to having a hidden and unstoppable thread that only goes away when exactly the right reference magic happens.

(2) Instead of bashing a private field of URLClassLoader, supply an explicit AccessControlContext argument to AccessController.doPrivileged that excludes the ProtectionDomains whose references are causing the problem. That would at least remove the JDK dependency.

Option (1) seems greatly preferable if it does address the problems people have been seeing.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by kevinb@google.com on 2012-03-11 at 11:11 PM


One way or another, we should really resolve this for 12.0, so if everyone can get behind (1) or (2), fantastic. It's also possible we may have to go with the patch if that's the solution everyone still wants to converge on.


Owner: ---
Labels: Milestone-Release12

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by terciofilho on 2012-03-12 at 12:05 PM


I'd rather go for (1), the patch seems more like a hack than a solution itself.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by joe.j.kearney on 2012-03-12 at 12:19 PM


I'd support (1) at least. If you get to the point of worrying that the FRQ won't go away by itself then there should be a way to do it manually. If the present reference magic works in most (all?) other cases then that's great.

I suppose there will be issues with early shutdown of the FRQ racing with subsequently released objects that then won't be finalised. It seems to me that that's not something that can be fixed in this context - rather manage these objects' lifecycles explicitly. So I wouldn't consider that a reason not to provide close().

Collaborator

gissuebot commented Oct 31, 2014 edited by cgdecker

Original comment posted by cow...@bbs.darktech.org on 2012-03-12 at 01:29 PM


The safest approach would be to add (1) and continue working on automatic cleanup. If you ever get it working then you could deprecate and remove (1) in the future. Instead we ended up waiting over six months for the automatic cleanup to materialize and it never really did.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by zeratul021 on 2012-03-12 at 02:26 PM


I' propose (1) on the same basis as cow, it shouldn't hurt nobody. Automatic cleanup may be resolved at later point but I'd prefer even bullet-dodge solution for now.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by emcmanus@google.com on 2012-03-12 at 05:20 PM


OK, I will investigate this further. The tough part is writing a test that checks that the FinalizableReferenceQueue class can be unloaded after all of its instances have been closed. That means doing the test in a separate ClassLoader world that is accessed through reflection. A variant of the test should be able to show the problem that the patch we were talking about was trying to fix.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by emcmanus@google.com on 2012-03-21 at 06:44 AM


I have spent some time investigating the problem and here is what I found.

I wrote the unit test I described, which creates a ClassLoader with a parallel copy of the Guava classes, makes a parallel FRQ, queues a FinalizableWeakReference against that queue, then drops the references to everything and waits for the ClassLoader to be GC'd. What I see is that the ClassLoader is successfully GC'd if there is no SecurityManager, but not if there is. Does this correspond to what people are seeing?

I abandoned the idea of adding a close() method because it did not help. Even in the SecurityManager case, I see that the FRQ instance is successfully GC'd and its associated Finalizer thread correctly terminates. I am not seeing a problem with instances of anything, but only with classes. There are two problems, in fact. One is the problem that the earlier patch fixes through reflection, namely that the URLClassLoader used to load the Finalizer class has a reference back to its creator ClassLoader through its captured AccessControlContext. But given that the FRQ instance has successfully been GC'd and its associated Finalizer thread has been terminated, this should not prevent class unloading because there are no longer any instances of classes from the Finalizer loader and no GC root should be referencing the classes themselves or their ClassLoader. So if nothing outside is referencing either the Guava loader or the Finalizer loader, it doesn't matter if the latter references the former. And that is exactly what I see when there is no SecurityManager.

The second problem I see occurs when there is a SecurityManager. java.lang.Thread has a field subclassAudits that is a "SoftCache". There have been memory leaks associated with this SoftCache before, for example with ResourceBundle and ObjectOutputStream. These have been fixed, but the Thread one has not. The SoftCache remembers the result of reflection on subclasses of Thread, to determine whether they override certain security-sensitive methods. It could and should be a WeakHashMap but it is not. So its entries are only cleared when the GC is strongly motivated to do so under memory pressure. That is typically not going to happen at the point where you want your webapp to be unloaded.

The subclassAudits business only happens when there is a SecurityManager, which explains why my test fails only in that case. Since Finalizer is a subclass of Thread, it is being remembered in the subclassAudits cache, and that is what is preventing FInalizer's ClassLoader from being GC'd. Because of the AccessControlContext problem that also prevents Guava's ClassLoader from being GC'd. Notice that even if we hack away the AccessControlContext problem, the Finalizer ClassLoader will still leak.

Fortunately there is a very simple fix to the subclassAudits problem. There is no reason why Finalizer has to be a subclass of Thread. It can just be a Runnable, and make a plain java.lang.Thread. (That is recommended practice anyway.) Plain java.lang.Thread does not have to be audited for method overrides so nothing is added to subclassAudits.

With that change my test passes even with a SecurityManager.

I'm attaching a patch with the changes to FRQ, Finalizer, and FRQTest. I would be very interested in hearing whether it solves the problems people have been seeing.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by mcculls on 2012-03-21 at 07:31 AM


Another benefit of Runnable (over extending Thread) is that it enables the use of custom Executors:

http://code.google.com/p/google-guice/issues/attachmentText?id=288&aid=-559927296892959590&name=GUICE_ISSUE_288_20110119.patch&token=Jv9dV_1_UgCI2Zb_7oQVxj46gp8%3A1332314277994

which lets you truly decouple the execution of the background thread from the library.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by emcmanus@google.com on 2012-03-21 at 11:19 PM


Interesting! Although I think my patch is the Simplest Thing That Could Possibly Work for this issue, we could later consider extending it, perhaps with an Executor parameter to FRQ rather than a system property. Were you able to observe whether your change alone was enough to fix the issue here?

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by fry@google.com on 2012-03-22 at 03:42 PM


Wow Eamonn, terrific detective work!

We're trying to cut the first RC for Guava 12, so unless we receive more feedback today I'd favor getting your patch submitted.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by mcculls on 2012-03-22 at 04:08 PM


Following up #c88 ... I also prefer simple solutions, and while I suspect there may be a few rare corner-cases where even just using "new Thread(...)" can cause the spawned thread to inherit/leak context from the calling application/container, implementing runnable instead of extending thread should solve all the common issues people experience with FRQ. So +1 from me.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by cpovirk@google.com on 2012-03-24 at 03:16 AM


b5b4b40


Status: Fixed

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by kevinb@google.com on 2012-03-26 at 09:17 PM


To all: confirmation that this will solve your problem would be appreciated before the 12.0 release.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by archie.cobbs on 2012-05-17 at 08:05 PM


Sorry this report is not before the 12.0 release, but I am still seeing this problem in 12.0:

  May 17, 2012 3:02:54 PM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
  SEVERE: The web application [/demo3] appears to have started a thread named [com.google.common.base.internal.Finalizer] but has failed to stop it. This is very likely to create a memory leak.

I'm using tomcat 7.0.21.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by jens.nehlmeier on 2012-05-23 at 04:14 PM


We are using Jetty 8.1.3 along with BoneCP which requires Guava. Even with Guava 12.0 the Finalizer thread does not stop running and prevents the garbage collection of the WebAppClassloader.

Collaborator

gissuebot commented Oct 31, 2014

Original comment posted by fry@google.com on 2012-05-23 at 06:41 PM


(No comment entered for this change.)


Status: Accepted
Owner: emcmanus@google.com

Collaborator

gissuebot commented Nov 1, 2014

Original comment posted by techsy730 on 2012-07-09 at 06:42 PM


I can confirm this in Guava 12.0 on Sun's JDK 1.7.0_04 on a Windows 7 box.

I think I have traced this down to the thread that runs the com.google.common.base.internal.Finalizer.

Despite great pains to make sure that the Finalizer and its spawned thread use independant classloaders from the app's classloaders , it's contextClassLoader is still holding a reference to the module's classloader.

Collaborator

gissuebot commented Nov 1, 2014

Original comment posted by emcmanus@google.com on 2012-07-10 at 08:04 PM


I've spent some more time investigating what happens with Tomcat specifically. If the FinalizableReferenceQueue is in a static field of some class in a webapp, then indeed the Finalizer thread never dies and the ClassLoader of the webapp is never garbage-collected. The reason is that the separate ClassLoader that is used to load the Finalizer class contains a reference to the original webapp ClassLoader in its saved AccessControlContext. That ClassLoader then keeps all the static fields of all of its classes alive, and in particular the FRQ. The Finalizer thread is set up to die as soon as its FRQ is garbage-collected but that never happens.

This can be fixed as previously suggested, by bashing the private URLClassLoader.acc field to null, but I think the cure is worse than the disease, and in particular could open security holes in environments with untrusted code.

In my tests, it is enough for the webapp just to set its static FRQ field to null and then the Finalizer thread will die when the no-longer-referenced FRQ is collected, and the webapp ClassLoader becomes collectible at that point. (In Tomcat, you can still get the alarming log message about the Finalizer thread still being present, but that is only because the FRQ hasn't been collected yet.)

I would be interested to know whether people who have run into this problem are able to work around it in a similar way.

I think we should probably add, as suggested earlier, a FinalizableReferenceQueue.close() instance method that explicitly stops the Finalizer thread, along with a test that this is enough to make the ClassLoader collectible.

Collaborator

gissuebot commented Nov 1, 2014

Original comment posted by archie.cobbs on 2012-07-10 at 08:14 PM


Hmm... in my case, I don't think I have any such static fields, at least none that directly reference a FRQ. There may be some indirect reference being created somehow.

FYI, here are all the guava classes that I'm explicitly importing:

import com.google.common.base.Function;
import com.google.common.base.Predicate;
import com.google.common.base.Predicates;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.collect.BiMap;
import com.google.common.collect.Collections2;
import com.google.common.collect.DiscreteDomains;
import com.google.common.collect.ImmutableBiMap;
import com.google.common.collect.Iterators;
import com.google.common.collect.MapMaker;
import com.google.common.collect.Ranges;
import com.google.common.util.concurrent.UncheckedExecutionException;

Also, I'm using BoneCP as well.. perhaps that is where the reference originates.

Collaborator

gissuebot commented Nov 1, 2014 edited by cgdecker

Original comment posted by cow...@bbs.darktech.org on 2012-07-10 at 08:15 PM


Consider how much time has been wasted on this issue (by developers and end-users) when we could have just added a close() method to begin with. Please end the pain :)

Collaborator

gissuebot commented Nov 1, 2014

Original comment posted by techsy730 on 2012-07-10 at 08:22 PM


I worked around it by making my static reference to the FinalizibleReferenceQueue a weak reference. Then when I create each of my FinalizibleReferences, I have them keep a strong reference to the FinalizibleReferenceQueue that they are registerered with. When finalizeReferent() is called, that reference is nulled out. Thus, once all finalizible references are finalized, the FinalizibleReferenceQueue only has a weak reference, thus can be GCed, and thus the thread stopped.

I also have a method to fetch the FinalizibleReferenceQueue, where if the previous reference one collected (and thus, the reference cleared), it will create a new FinalizibleReferenceQueue, give that to the newly created FinalizibleReference, and then create a new WeakReference to that FinalizibleReferenceQueue and update my static reference to that. Moderately elegant. ;)

Something like that could probably be moved to inside the FinalizibleReferenceQueue (which the Finalizer thread will "listen" for its collection, as opposed to the FinalizibleReferenceQueue itself like it is now), and have each of the three FinalizibleReference reference implementations keep a hard reference, that is cleared upon the Finalizer "finalizing" it.

Collaborator

gissuebot commented Nov 1, 2014

Original comment posted by emcmanus@google.com on 2012-07-10 at 09:05 PM


The suggestion from techsy730 is very interesting. The idea would be that the Finalizer thread would die, not when the FRQ is no longer referenced as today, but when there are no longer any FinalizableWeakReferences (etc) that reference the FRQ. Perhaps with some amount of hysteresis to avoid continually creating and destroying Finalizer threads (we would get that more or less for free using ThreadPoolExecutor). Making this work in a race-proof way could be challenging but I think the end result would be substantially better than what we have today.

Collaborator

gissuebot commented Nov 1, 2014

Original comment posted by techsy730 on 2012-07-10 at 09:29 PM


Using a SoftReference instead of a WeakReference for the "handle" in the FinalizibleReferenceQueue may help with excessive amounts of stopping the thread just to have to create a new one again, but it carries its own risks. According to the API, a perfectly valid implementation of SoftReference would be to not clear it at all until the absolute latest the API mandates it be considered, in the final "Last ditch" GC before a OutOfMemoryError is thrown. Of course no sane JVM implementation will always wait that long (most use some sort of "timeout" after the last reference was used), but we have to consider that possibility.

Maybe a better idea would be an "expiring" reference. An object that holds a strong and a weak reference until time X passes, after which is nulls out the strong reference and only holds the weak reference. In fact, that would be a pretty nifty wrapper for existing references. (Contemplates making a new feature suggestion)

As for the thread safety, I did it by just having a private static lock in the class holding the static WeakReference, and all fetches and initilizations of that WeakReference were synchronized on that. For each of my FinalizibleReferences, I made the field holding the strong reference volatile. I'm pretty reasonably sure this covers all angles.

Now, since in my workaround, I am creating whole new FinalizibleReferenceQueue's, I don't have to care about the thread safety of spawning the new thread using the fancy ClassLoader and reflection magic that is done, as it's a whole new instance. If this sort of reinitializable weak handle approach was brought into FinalizibleReferenceQueue, then the fancy Java magic would have to be made thread safe. But I agree that it would be worth it.

(Actually, a reinitializable weak reference and a reinitializable soft reference are also cool idea, maybe another for the feature suggestions)

Collaborator

gissuebot commented Nov 1, 2014

Original comment posted by emcmanus@google.com on 2012-08-01 at 03:15 PM


I've added a close() method for now (and made FRQ implement Closeable). Even if we later add a mechanism that closes automatically when there are no FinalizableReferences left, the close() method will still be useful for when it is not practical to dispose of all the referenced objects explicitly.

Collaborator

gissuebot commented Nov 1, 2014

Original comment posted by jens.nehlmeier on 2012-08-01 at 03:52 PM


So does this mean that every library that uses Guava's FinalizableReferenceQueue has to expose a close() method itself because the library may not know that it is used inside a web application and have no notion about redeployments?
For example would the close() method help to fix Google Guice's issue #288 (http://code.google.com/p/google-guice/issues/detail?id=288) which is pretty much the same as this one?

Collaborator

gissuebot commented Nov 1, 2014

Original comment posted by archie.cobbs on 2012-08-01 at 03:57 PM


To address a WAR memory leak caused by this bug, just add a ServletContextListener that uses reflection to invoke the close() method (if found by reflection).

See Spring's IntrospectorCleanupListener for example of this idea applied to the Introspector class: http://static.springsource.org/spring/docs/3.1.x/javadoc-api/org/springframework/web/util/IntrospectorCleanupListener.html

Collaborator

gissuebot commented Nov 1, 2014

Original comment posted by mcculls on 2012-08-01 at 04:18 PM


Wrt. Guice issue 288 the only fix required is to update it to use Guava 10.0 or later, which is when the weak/soft maps created by MapMaker stopped using the FinalizableReferenceQueue. The change to use a more recent build of Guava has already been committed to Guice trunk (it's currently using 11.0.1) and will be in the next release.

Collaborator

gissuebot commented Nov 1, 2014

Original comment posted by emcmanus@google.com on 2012-08-01 at 04:41 PM


You don't need to use reflection to invoke the close() method even if you are concerned about compatibility with earlier Guava versions. Just write
  if (frq instanceof Closeable) {
    ((Closeable) frq).close();
  }

Collaborator

gissuebot commented Nov 1, 2014

Original comment posted by jerith666 on 2013-08-16 at 05:57 PM


issue 1505 may or may not be relevant here

Collaborator

gissuebot commented Nov 1, 2014

Original comment posted by emcmanus@google.com on 2013-11-20 at 11:23 PM


Marking this as Fixed since it is possible to avoid the problem by calling close() at the appropriate time. We do not currently plan to implement a mechanism that closes automatically when there are no FinalizableReferences left.


Status: Fixed

Collaborator

gissuebot commented Nov 1, 2014

Original comment posted by Karsten.Ohme on 2014-03-28 at 02:33 PM


For Guava 13.0.1 (which is a requirement for BoneCP) I had to modify the fix from comment #15:

http://pastie.org/8976099
Insteas of thread.getClass().getName(); I had to use thread.getName();

Also important is, that this must not be proxied by aspectj. The method was not invoke here if this was the case.

Collaborator

gissuebot commented Nov 1, 2014 edited by cgdecker

Original comment posted by cow...@bbs.darktech.org on 2014-03-28 at 02:37 PM


I suggest opening a new bug report that references this one because most committers will ignore comments on closed issues.

Collaborator

gissuebot commented Nov 1, 2014

Original comment posted by emcmanus@google.com on 2014-03-28 at 04:43 PM


I believe comment #110 is a note about a workaround that is needed if you have to use an old version of Guava, correct? Obviously we can't modify old Guava versions retroactively. If there is something that needs to be done to the current version then by all means open a new issue.

@gissuebot gissuebot closed this Nov 1, 2014

@cgdecker cgdecker modified the milestone: 12.0 Nov 1, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment