This repository has been archived by the owner on Aug 31, 2018. It is now read-only.

Memory leak in Hk2ThreadLocal #43

Closed
swenson opened this issue Dec 15, 2016 · 14 comments

Comments

swenson commented Dec 15, 2016

We (@laazik, others, and I) are tracking down a memory leak, and it seems to be related to the switch from ThreadLocal to Hk2ThreadLocal in PerLocatorUtilities: 4fa6c59

Hk2ThreadLocal appears to lack some of ThreadLocal's protections, such as the weakly keyed ThreadLocalMap that ThreadLocal uses.

Would it be possible to switch back to ThreadLocal while we work on a more complete Hk2ThreadLocal?

Here is an example heap histogram from one of our crashed instances. You can see that the counts of WeakHashMap, WeakHashMap.Entry, HashMap.Node, and Long instances are highly correlated and massive.

[Screenshot: heap histogram, 2016-12-15 11:37:17]

I would be happy to work with you on this. Please advise.
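
For anyone following along, here is a minimal sketch of the kind of per-thread store we suspect is at fault; the class and member names are illustrative only, not the actual Hk2ThreadLocal source:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Illustrative only -- not the actual Hk2ThreadLocal source. A per-thread value
// store backed by an ordinary map keyed on the thread id: nothing removes an
// entry when its thread dies, so values pile up for every thread that has ever
// touched the store.
public final class MapBackedThreadLocal<V> {
    private final Map<Long, V> valuesByThreadId = new ConcurrentHashMap<>();
    private final Supplier<V> initialValue;

    public MapBackedThreadLocal(Supplier<V> initialValue) {
        this.initialValue = initialValue;
    }

    public V get() {
        return valuesByThreadId.computeIfAbsent(
                Thread.currentThread().getId(), id -> initialValue.get());
    }

    public void remove() {
        valuesByThreadId.remove(Thread.currentThread().getId());
    }
}
```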

@jwells131313 (Contributor)

As I recall, the reason we moved to Hk2ThreadLocal was that it solved at least one class of memory leak: removeAll can be called from any thread and wipes out every thread's memory. Do you have a simple test somewhere that demonstrates the memory leak you are seeing?
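
To illustrate the limitation (a sketch of standard java.lang.ThreadLocal behavior, not HK2 code): remove() only clears the calling thread's copy, so wiping every thread's value requires keeping the values in a centrally managed map.

```java
public class ThreadLocalRemoveSketch {
    public static void main(String[] args) {
        // java.lang.ThreadLocal keeps each copy inside the owning thread, so
        // remove() can only clear the calling thread's value.
        ThreadLocal<byte[]> cache = ThreadLocal.withInitial(() -> new byte[1024]);

        cache.get();     // creates this thread's copy
        cache.remove();  // clears ONLY this thread's copy; other threads keep theirs

        // A removeAll() that clears every thread's copy from any thread needs
        // central bookkeeping of all values, which is what Hk2ThreadLocal adds.
    }
}
```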

swenson commented Dec 15, 2016

I don't have a simple test at the moment -- this typically causes our applications to crash after hours of heavy usage.

@jwells131313 (Contributor)

I assume the GC roots of all of those HashMaps go back to the Hk2ThreadLocal. Are you using many, many threads? I'm just trying to get a handle on the circumstances under which this leak manifests itself. Can you describe why you think this might be happening?

swenson commented Dec 15, 2016

This happens after processing very heavy load on a Jersey server: over hours or days, many, many threads are used to serve requests, and their entries accumulate in the Hk2ThreadLocal's hash map.

Since the Hk2ThreadLocal uses a plain HashMap instead of a ThreadLocalMap (which uses weak references, I believe), I would guess that the Longs and values accumulate over time.
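
For contrast, a rough sketch of the standard ThreadLocal lifetime as I understand it: each Thread owns its own ThreadLocalMap, so when the thread dies its values become unreachable without anyone calling remove().

```java
public class ThreadLocalLifetimeSketch {
    // With java.lang.ThreadLocal the values live in a ThreadLocalMap owned by
    // each Thread (with weakly referenced keys), not in one shared map.
    static final ThreadLocal<byte[]> PER_THREAD =
            ThreadLocal.withInitial(() -> new byte[1024 * 1024]);

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> PER_THREAD.get()); // worker gets its own 1 MB copy
        worker.start();
        worker.join();
        // The worker is dead: its ThreadLocalMap, and the 1 MB value in it, are
        // now unreachable and eligible for GC with no explicit remove() call.
        // A shared HashMap keyed by thread id would still be holding that value.
    }
}
```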

@jwells131313 (Contributor)

I'm not getting it. There would be one or two of these maps, each with a single entry per thread. Unless you actually have 7.7 million threads, I don't understand where these things come from. Or that many ServiceLocators (which is where the WeakHashMaps could be coming from). How many ServiceLocators do you see? Can you find one and print out its ID? That would give some information about how many there are.
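
Something like the following hypothetical helper (assuming you can get hold of a ServiceLocator instance) would at least show how many distinct locators are in play:

```java
import org.glassfish.hk2.api.ServiceLocator;

// Hypothetical diagnostic helper: every ServiceLocator carries a unique numeric
// id, so logging it from a few injection points gives a feel for how many
// locators actually exist in the process.
public final class LocatorIdProbe {
    public static void log(ServiceLocator locator) {
        System.out.println("locator name=" + locator.getName()
                + " id=" + locator.getLocatorId());
    }
}
```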

@jwells131313 (Contributor)

Also if there are 7.7 million threads... well... that sounds like a Jersey thread leak...

@jwells131313 (Contributor)

Unless... maybe if there HAVE been 7.7 million threads over time, the hk2 thread local might be holding on to dead threads' data. I'll at least look into that.

swenson commented Dec 15, 2016

I think there HAVE been 7.7 million threads over time. In one of our services, it takes weeks of high load to produce this behavior. (And after reverting to a version of hk2 with the old ThreadLocal, we do not see the same memory leak.)

laazik commented Dec 15, 2016

We handle requests using the async @Suspended AsyncResponse resp model, and the work inside the methods happens in CompletableFuture chains, which run in the default fork-join pool. Resumption of the work, resp.resume(), is also called from that chain.

To @swenson's note: indeed, the number of requests this service handles is in the tens of millions. We can also repro it under very heavy load in around 4-6 hours; I will try to re-run the crash test tomorrow to get exact numbers.

Also, yes, the heap graphs are night and day between the version where this change was enabled and the version where it was not. We have now had a stable run with the previous (non-Hk2ThreadLocal) version for around a month without restarts, and heap usage stays between 0.36 and 0.46 percent of the total.
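
For reference, the handler shape is roughly the following; the resource and method names are placeholders rather than our real service code:

```java
import java.util.concurrent.CompletableFuture;

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.container.AsyncResponse;
import javax.ws.rs.container.Suspended;

// Placeholder resource showing the pattern: the response is suspended and the
// work runs as a CompletableFuture chain on the common fork-join pool, which
// also calls resume() when the chain completes.
@Path("/example")
public class ExampleResource {

    @GET
    public void handle(@Suspended AsyncResponse response) {
        CompletableFuture
                .supplyAsync(ExampleResource::doWork)        // ForkJoinPool.commonPool()
                .thenApply(result -> "done: " + result)
                .whenComplete((result, error) -> {
                    if (error != null) {
                        response.resume(error);              // propagate the failure
                    } else {
                        response.resume(result);             // complete the request
                    }
                });
    }

    private static String doWork() {
        return "work";
    }
}
```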

@jwells131313 (Contributor)

Yes, this was a bug and a pretty bad memory leak. I just pushed a fix for it; it should be available on GitHub shortly. The fix will be in the next release of hk2.
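
As a rough sketch of the general technique only (illustrative; the actual change is the commit referenced at the bottom of this thread), one way a map-backed thread local can avoid retaining entries for dead threads is to key the values on weakly referenced Thread objects while still supporting removeAll():

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Supplier;

// Illustrative sketch, not the actual HK2 fix. Keying on the Thread itself via a
// WeakHashMap lets entries for dead, otherwise-unreferenced threads be reclaimed
// by GC, while removeAll() can still be called from any thread.
public final class WeaklyKeyedThreadLocal<V> {
    // Note: stored values must not strongly reference their key Thread,
    // otherwise the weak keys can never be cleared.
    private final Map<Thread, V> values =
            Collections.synchronizedMap(new WeakHashMap<>());
    private final Supplier<V> initialValue;

    public WeaklyKeyedThreadLocal(Supplier<V> initialValue) {
        this.initialValue = initialValue;
    }

    public V get() {
        return values.computeIfAbsent(Thread.currentThread(), t -> initialValue.get());
    }

    public void removeAll() {
        values.clear();
    }
}
```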

@jwells131313 (Contributor)

Also added a test, so I'm closing this issue, but do tell me whether it helps with your use case!

swenson commented Dec 19, 2016

Thank you so much!

@ChristerF

@jwells131313 do you know when the next release will be?

@jwells131313 (Contributor)

Hk2 version 2.5.0-b31 was released today; it has the memory leak fix. I don't know when Jersey will officially pick it up. It should just be a drop-in replacement for b30, which is what the latest Jersey is using today.

joschi pushed a commit to Graylog2/graylog2-server that referenced this issue Jan 3, 2017
Memory leak in versions before HK2 2.5.0-b31:

* javaee/hk2#43
* javaee/hk2@bb2621c
edmundoa pushed a commit to Graylog2/graylog2-server that referenced this issue Jan 10, 2017
Memory leak in versions before HK2 2.5.0-b31:

* javaee/hk2#43
* javaee/hk2@bb2621c