
Object cleaner implementation detail leaks #8017

Closed
jasontedor opened this issue Jun 12, 2018 · 13 comments

@jasontedor
Contributor

jasontedor commented Jun 12, 2018

This is a follow-up to the discussion in #8014. The problem here is that the object cleaner thread is considered an internal implementation detail, yet its presence can leak.

In Elasticsearch, we have a thread leak control that detects threads created by a test suite but not stopped by the end of it. This is part of broader detection in our test framework for leaked resources of any kind.

The problem here is that because the object cleaner thread cannot be stopped or otherwise controlled, it fails our thread leak control, as it will always be running in nearly any test suite that interacts with Netty.

We tried to upgrade to Netty 4.1.25 and it was quite a mess because of this thread: elastic/elasticsearch#31232. We tried putting a filter in place on the relevant test suites, but we ultimately decided the surface area here is too large and this is gross anyway, especially without any public API guarantees about the thread. We have ultimately decided to revert this upgrade (elastic/elasticsearch#31282) and now consider ourselves blocked on upgrading until a better solution to the object cleaner thread problem can be found.
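For context, the suite-level filtering we tried looks roughly like the following sketch, assuming the carrotsearch randomizedtesting framework our test infrastructure is built on; the filter class and the hard-coded thread name are illustrative only, which is part of the problem, since the thread name is an undocumented internal detail.

    import com.carrotsearch.randomizedtesting.ThreadFilter;
    import com.carrotsearch.randomizedtesting.annotations.ThreadLeakFilters;

    // Sketch of a suite-level filter that tells the leak checker to ignore Netty's cleaner thread.
    // The thread name is an internal detail and could change between Netty releases.
    public final class ObjectCleanerThreadFilter implements ThreadFilter {
        @Override
        public boolean reject(Thread t) {
            return "ObjectCleanerThread".equals(t.getName());
        }
    }

    // Every affected suite would then need the annotation applied, e.g.:
    // @ThreadLeakFilters(filters = ObjectCleanerThreadFilter.class)
    // public class MyNettyTests extends ESTestCase { ... }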

We are looking for a way to shut down the cleaner, ideally by adding something to our shutdown code (HTTP, TCP) so that all tests pick it up and it is not as messy as filtering the thread out of leak control everywhere.

Alternatively, a way to avoid using the object cleaner at all would be acceptable too.

@normanmaurer
Member

@mosesn @bryce-anderson @ejona86 @carl-mastrangelo @trustin I would be interested in your take on this one...

@carl-mastrangelo
Member

Some ideas: maybe the ThreadFactory for the cleaner can be provided by the user?
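To make that suggestion concrete, one hypothetical shape for it (nothing like this exists in ObjectCleaner today) would be a settable factory that a test framework or application could swap in, so the cleaner thread is created by something the environment already knows how to track:

    import java.util.concurrent.ThreadFactory;
    import java.util.concurrent.atomic.AtomicReference;

    // Hypothetical API sketch only; ObjectCleaner currently offers no such hook.
    public final class ObjectCleanerThreadFactoryHolder {
        private static final AtomicReference<ThreadFactory> FACTORY =
                new AtomicReference<>(task -> {
                    Thread t = new Thread(task, "ObjectCleanerThread");
                    t.setDaemon(true);
                    return t;
                });

        // A user or test framework could install a factory whose threads it tracks and can stop.
        public static void setThreadFactory(ThreadFactory factory) {
            FACTORY.set(factory);
        }

        // The cleaner would ask the holder for its thread instead of constructing one directly.
        static Thread newCleanerThread(Runnable cleanupLoop) {
            return FACTORY.get().newThread(cleanupLoop);
        }
    }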

@ejona86
Member

ejona86 commented Jun 12, 2018

Has this become an issue now because Recycler was converted to use ObjectCleaner to remove the finalizer? It doesn't sound like this is a case of FastThreadLocal needing

Unfortunately there is no limit to the amount of time the thread may remain alive, because it requires a GC to be able to exit.

Brainstorm:

  • Use Java's Cleaner when it is available. This avoids the thread creation. Unfortunately, Elasticsearch and others experiencing this problem would need to run their tests on Java 9+ (see the sketch after this list).
  • Continue using the finalizer in Recycler. Maybe with the release of Java 11 we could do a Cleaner + ObjectCleaner fallback.
  • (Ignoring the Recycler usage) Make ObjectCleaner Thread-aware or Thread-specific. Have an API, or choose "good times", to loop through the list of threads and prune any that are stopped but have not been GC'ed. Because there may still be live references, there is potential for it to interact strangely, but for the FastThreadLocal usage it seems fine.
  • (Handwavy) Change the approach of Recycler to avoid the need for ObjectCleaner. For example, it could allow all the objects to be collected on each GC but reuse any created objects between GCs. Or maybe Recycler becomes disabled for non-Netty threads and... that helps us somehow... (I haven't quite worked out why Recycler does all the things it does.)
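A minimal sketch of the first bullet, using the standard java.lang.ref.Cleaner API available since Java 9 (the Resource/ReleaseAction names are only for illustration):

    import java.lang.ref.Cleaner;

    public final class CleanerSketch {
        // One shared Cleaner; the JDK manages its daemon thread.
        private static final Cleaner CLEANER = Cleaner.create();

        static final class Resource implements AutoCloseable {
            // The cleanup action must not hold a reference back to the Resource,
            // otherwise the Resource could never become phantom reachable.
            private static final class ReleaseAction implements Runnable {
                @Override
                public void run() {
                    // Release direct memory / native resources here.
                }
            }

            private final Cleaner.Cleanable cleanable;

            Resource() {
                this.cleanable = CLEANER.register(this, new ReleaseAction());
            }

            @Override
            public void close() {
                // Explicit release; the Cleaner only acts as a safety net after GC.
                cleanable.clean();
            }
        }
    }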

@jasontedor
Contributor Author

It is not the recycler; we disable that in our default configuration, and even with it disabled in our tests, thread leak control still fails. Instead, I think it is coming from the fast thread locals.

@re-thc

re-thc commented Jun 14, 2018

FastThreadLocal does use the cleaner. I'm not sure I understand it all correctly, but isn't there a cyclic dependency between FastThreadLocal / InternalThreadLocalMap and the cleaner? Meaning the cleaner task/thread for FastThreadLocal / InternalThreadLocalMap will never actually complete.

@ejona86
Member

ejona86 commented Jun 21, 2018

I just got bitten by something similar. In our internal tests we prevent certain types of tests from creating threads, to make them more reliable and ensure they run quickly. But using EmbeddedChannel creates a Thread, because DefaultChannelPipeline uses a FastThreadLocal for generating a name. This specific problem could be handled by deferring the ObjectCleaner.register() call until the FastThreadLocal actually needs the notification (a rough sketch of that idea follows the stack trace below), but it's not clear that's the best approach. I can split out a separate issue if need be, but it seems to show how hard it is to avoid ObjectCleaner in tests.

Seen with Netty 4.1.24.Final:

...snip...
	at java.lang.Thread.<init>(Thread.java:464)
	at io.netty.util.concurrent.FastThreadLocalThread.<init>(FastThreadLocalThread.java:35)
	at io.netty.util.internal.ObjectCleaner.register(ObjectCleaner.java:103)
	at io.netty.util.concurrent.FastThreadLocal.registerCleaner(FastThreadLocal.java:163)
	at io.netty.util.concurrent.FastThreadLocal.get(FastThreadLocal.java:147)
	at io.netty.channel.DefaultChannelPipeline.generateName(DefaultChannelPipeline.java:416)
	at io.netty.channel.DefaultChannelPipeline.filterName(DefaultChannelPipeline.java:300)
	at io.netty.channel.DefaultChannelPipeline.addLast(DefaultChannelPipeline.java:210)
	at io.netty.channel.DefaultChannelPipeline.addLast(DefaultChannelPipeline.java:409)
	at io.netty.channel.DefaultChannelPipeline.addLast(DefaultChannelPipeline.java:396)
	at io.netty.channel.embedded.EmbeddedChannel.setup(EmbeddedChannel.java:195)
	at io.netty.channel.embedded.EmbeddedChannel.<init>(EmbeddedChannel.java:167)
	at io.netty.channel.embedded.EmbeddedChannel.<init>(EmbeddedChannel.java:148)
	at io.netty.channel.embedded.EmbeddedChannel.<init>(EmbeddedChannel.java:135)
	at io.netty.channel.embedded.EmbeddedChannel.<init>(EmbeddedChannel.java:100)
...snip...
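As a rough illustration of the "defer registration" idea above (this is not Netty's actual implementation, just a sketch of the pattern), the cleaner would only be touched once a value that genuinely needs a removal notification is installed:

    // Sketch of lazily registering with a cleaner: callers that never need removal
    // notifications never cause the cleaner (and its thread) to be created at all.
    final class LazilyCleanedLocal<V> {
        private final ThreadLocal<V> delegate = new ThreadLocal<>();
        private volatile boolean cleanerRegistered;

        void set(V value, boolean needsRemovalNotification) {
            delegate.set(value);
            if (needsRemovalNotification && !cleanerRegistered) {
                synchronized (this) {
                    if (!cleanerRegistered) {
                        registerCleanup(); // pay for the cleaner only on first real need
                        cleanerRegistered = true;
                    }
                }
            }
        }

        V get() {
            return delegate.get();
        }

        private void registerCleanup() {
            // Placeholder for ObjectCleaner.register(...) or an equivalent mechanism.
        }
    }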

@vkostyukov
Contributor

If (explicitly) shutting down a cleaner thread is a reasonable short-term solution to mitigate this, maybe just storing a static reference to a "currently running thread" from within ObjectCleaner is enough?

I totally agree, though, that a long-term solution should probably allow users to opt out of running cleaner threads in their tests/apps.
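A hypothetical version of that short-term mitigation (no such API exists in ObjectCleaner today): keep a static handle to the running cleaner thread so that test teardown can stop and join it, instead of filtering it from leak checks everywhere.

    import java.util.concurrent.atomic.AtomicReference;

    // Hypothetical sketch; ObjectCleaner exposes nothing like this.
    final class CleanerThreadHandle {
        private static final AtomicReference<Thread> RUNNING = new AtomicReference<>();

        // The cleaner would record its thread here when it starts.
        static void onCleanerStarted(Thread cleanerThread) {
            RUNNING.set(cleanerThread);
        }

        // Test teardown (or application shutdown code) could call this to stop the thread.
        static void shutdownForTests() throws InterruptedException {
            Thread t = RUNNING.getAndSet(null);
            if (t != null) {
                t.interrupt();
                t.join();
            }
        }
    }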

@normanmaurer
Member

Sorry, I was busy with some other things at work, so I have had no time to think about this yet. I will think a bit about it over the next few days and hopefully find a good solution next week (fingers crossed). That said, I am not sure yet how to handle this at the moment, as without the cleaner we may leak direct memory.

ejona86 added a commit to ejona86/netty that referenced this issue Jun 25, 2018
Motivation:

As discussed in netty#8017, ObjectCleaner leaks into the rest of the environment and
interacts with checks in some testing environments.

Modifications:

Avoid using ObjectCleaner in FastThreadLocal when the extending class is
ignoring removal notifications.

Result:

Reduced usage of ObjectCleaner, and when combined with disabling Recycler, may
allow tests to run without ObjectCleaner being used.
normanmaurer added a commit that referenced this issue Jun 27, 2018
Motivation:

ObjectCleaner starts a Thread to handle the cleaning of resources, which leaks into the user's application. We should not use it in Netty itself, to make things more predictable.

Modifications:

- Remove usage of ObjectCleaner and use finalize() as a replacement where possible.
- Clarify the javadocs for FastThreadLocal.onRemoval(...) to make it clear that remove() is not guaranteed to be called when the Thread completes, so this method alone is not enough to guarantee cleanup in that case.

Result:

Fixes #8017.
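The javadoc clarification above matters in practice: since onRemoval(...) is not guaranteed to fire when a thread simply terminates, code that must release resources should trigger removal explicitly. A minimal sketch against Netty's public FastThreadLocal API (the byte[] value is only illustrative):

    import io.netty.util.concurrent.FastThreadLocal;

    public final class ExplicitCleanupSketch {
        private static final FastThreadLocal<byte[]> BUFFER = new FastThreadLocal<byte[]>() {
            @Override
            protected byte[] initialValue() {
                return new byte[4096];
            }

            @Override
            protected void onRemoval(byte[] value) {
                // Invoked on explicit removal (e.g. removeAll()); not guaranteed to run
                // just because the owning thread completes.
            }
        };

        public static void runTask(Runnable task) {
            BUFFER.get(); // forces initialValue() on first use in this thread
            try {
                task.run();
            } finally {
                // Explicitly remove all FastThreadLocal values for this thread,
                // which also delivers the onRemoval(...) notifications.
                FastThreadLocal.removeAll();
            }
        }
    }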
normanmaurer added two further commits with the same message that referenced this issue on Jun 27 and Jun 28, 2018.
@normanmaurer
Member

@jasontedor this should be resolved by #8064 and #8062. To make it short: we do not use the ObjectCleaner anymore. We may look into using Cleaner once we require Java 9+ as the minimum (whenever that happens is another question, and it will not happen before a major version bump).

normanmaurer added this to the 4.1.26.Final milestone on Jun 28, 2018
@jasontedor
Contributor Author

Thanks @normanmaurer. We have successfully upgraded to 4.1.28.Final with none of the issues we had previously. 🙏

@jesty

jesty commented Jan 30, 2019

Hello, we are using 4.0.56 and, analyzing the threads with the Eclipse Memory Analyzer Tool, we see the scenario in the attached screenshot. Is it the same issue?
[screenshot: thread_from_mat]

@normanmaurer
Member

@jesty no... this class only existed in 4.1.

@jesty

jesty commented Jan 30, 2019

@jesty no... this class only existed in 4.1.

ok, thanks.
