New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leak when monitoring is disabled (reproducible, steps are known) #228

Closed
flexoid opened this Issue Nov 15, 2018 · 3 comments

Comments

Projects
None yet
2 participants
@flexoid

flexoid commented Nov 15, 2018

Hi.

We discovered a serious bug which is almost failed our production setup by consuming all available memory, mostly by the background worker (DelayedJob) processes.

We have monitor: false (!!!) for that server as we didn't plan to trace it for that time, and a picture like that.

image

After some time in heap dump analyzing I found out that there are too many old objects like ScoutApm::MetricSet, ScoutApm::NumericHistogram and other related to ScoutApm.

A fast code review and further process memory analyzis revealed that ScoutApm::AgentContext#request_histograms_by_time contains a huge amount of objects (32k+).

As I understand, this line explicitly specifies that BackgroundWorker, which should collect all this old histograms, is not started if monitoring is disabled, but looks like metrics are collected anyway (installed here).

Please, contact me if you'll need additional information.

@cschneid

This comment has been minimized.

Contributor

cschneid commented Nov 15, 2018

I believe what's happening is that we treat the Sidekiq (and resque, and so on) instruments differently than library (ActiveRecord, HTTP, etc) instruments.

You pointed to the exact line that looks wrong. The library instruments are behind that check of should_load_instruments? which does the monitor check. Somehow the background instruments don't have that check.

So what's happening is that a bit too much is installed, and the sidekiq instruments are capturing some data, but since the rest of the agent is (correctly) not started, that data never moves forward and out from through normal means, so it slowly grows over the course of several days.

I'll have a new version out to you in a bit that will prevent Sidekiq from being instrumented at all when monitor = false.

Thank you for the excellent report, we'll have it fixed right away.

@cschneid

This comment has been minimized.

Contributor

cschneid commented Nov 19, 2018

#231 should fix this. I'm going to do some more testing before fully releasing it.

@cschneid

This comment has been minimized.

Contributor

cschneid commented Dec 3, 2018

@flexoid - I merged and deployed this in 2.4.21. Thanks again for the excellent report.

@cschneid cschneid closed this Dec 3, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment