Sniffing in 2.2.2 causes memory leak & OOM #392

Closed
spuzon opened this issue Mar 14, 2016 · 5 comments
spuzon commented Mar 14, 2016

Hi,
I run a few instances of Logstash. Since upgrading to 2.2.2 I started to notice recurring crashes caused by OOM errors; a bigger heap only increases the interval between OOMs. I checked a heap dump and found an excessive number of org.apache.http.config.Registry objects.

Out of a 3GB heap, I had 887,000 Registries taking up 1.4GB of memory.
[screenshot]

I started to monitor tenured memory utilization and the number of Registry object instances on the heap for each Logstash instance.
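
For anyone reproducing this, the monitoring in the captions below can be run roughly as follows (using the standard jmap -histo form; <logstash_pid> is a placeholder):

jstat -gcutil <logstash_pid> 5000                                    # per-generation utilization, sampled every 5 s
jmap -histo <logstash_pid> | grep org.apache.http.config.Registry   # Registry instance count from the class histogram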

Number of Registry objects 1GB heap (jmap -gchisto)

[screenshot]

Tenured memory utilization 1GB heap (jstat -gcutil)
[screenshot]

Drops from 85% to a few percent indicate the watchdog killed the process, since there was no point wasting CPU on CMS cycles when the memory wasn't going to be reclaimed.

You may notice one data series on those screenshots that looks perfectly fine; that's the instance with sniffing set to false. All the others had sniffing => true.

Once sniffing was disabled on all instances, the problem was gone.
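
For reference, sniffing is toggled per output in the Logstash pipeline configuration; a minimal illustrative snippet (the hosts value is a placeholder):

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    sniffing => false   # true is what triggers the leak on 2.2.2
  }
}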

Disabled Sniffing: Number of Registry objects 1GB heap (jmap -gchisto)
[screenshot]

suyograo added the bug label Mar 14, 2016
spuzon changed the title from "Sniffinf in 2.2.2 causes memory leak & OOM" to "Sniffing in 2.2.2 causes memory leak & OOM" Mar 14, 2016
jsvd self-assigned this and unassigned andrewvc Mar 14, 2016
jsvd commented Mar 14, 2016

I've done some investigation and it's possible that the HttpClient objects from Manticore aren't being freed. My findings are in cheald/manticore#45.

@suyograo
Contributor

This has been fixed in version 2.5.3. Many thanks to @jsvd and @cheald.

To install this you can do:

bin/plugin install --version 2.5.3 logstash-output-elasticsearch
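
Running bin/plugin list --verbose afterwards should show logstash-output-elasticsearch at 2.5.3 (assuming the Logstash 2.x plugin manager).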

cheald commented Mar 15, 2016

FWIW, Manticore::Client instances are intended to be create-once, reuse-many. If you're frequently creating new instances, that's highly suboptimal; Clients are fully threadsafe and may be created once and then shared among arbitrarily many workers, with all the work managed by the Client's backing pool. This class of leak is likely due to frequent Manticore::Client instantiation. It needed to be fixed, but the fact that it occurred suggests to me that Manticore may not be able to take full advantage of all the HttpClient goodies if its backing pools are being regularly thrown away.
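
To illustrate the intended pattern, a minimal sketch (option values and the URL are illustrative, not taken from the plugin):

require 'manticore'

# Build one client up front; its backing connection pool handles concurrency.
CLIENT = Manticore::Client.new(
  pool_max: 50,           # total connections in the backing pool (example value)
  pool_max_per_route: 10, # connections per host (example value)
  request_timeout: 30
)

# Arbitrarily many workers can then share the same client safely.
threads = 8.times.map do
  Thread.new do
    response = CLIENT.get("http://localhost:9200/")
    response.code
  end
end
threads.each(&:join)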

I'll poke around a bit in the plugin and see how it's being used - might be that we can find a more efficient way to use Manticore :)

cheald commented Mar 15, 2016

It actually looks like this was recently changed: elastic/elasticsearch-ruby@067842d looks like it should sidestep the many-instances issue, which, combined with today's fixes, should make Manticore much more friendly to user resources!

@suyograo
Contributor

@cheald thanks. @andrewvc has been working to make our Manticore usage more efficient. See elastic/elasticsearch-ruby#281.

We would love any feedback though :)
