
config reloads cause huge increase in memory #3858

Closed
jhorwit2 opened this Issue Feb 17, 2018 · 9 comments

jhorwit2 commented Feb 17, 2018

What did you do?

I updated the config in a way that should not noticeably impact CPU/memory (I changed the scrape timeout from 5s to 10s).
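
For illustration only (this snippet is not from the original report), the kind of prometheus.yml change being reloaded here would look roughly like this; the scrape_interval value is an assumption:

global:
  scrape_interval: 15s   # assumed value, not stated in the report
  scrape_timeout: 10s    # raised from 5s; the change that triggered the reload

Prometheus picks this up on a config reload (SIGHUP, or a POST to /-/reload when --web.enable-lifecycle is set), which is the code path discussed further down in this thread.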

What did you expect to see?

I expected the scrape timeout to increase, which it did, and I expected CPU/memory usage to remain the same.

What did you see instead? Under which circumstances?

Memory spikes at reload time

[graph: memory usage spiking at reload time]

I've noticed this happen with any config change, and the memory usage increased each time. The series created/removed and the appended samples per second showed no fluctuation.

Environment

  • System information:

Running the Prometheus 2.2.0-rc.0 Docker container from Docker Hub
Host is Linux 4.1.12-112.14.13.el7uek.x86_64 x86_64

  • Prometheus version:
prometheus, version 2.2.0-rc.0 (branch: HEAD, revision: 1fe05d40e4b2f4f7479048b1cc3c42865eb73bab)
  build user:       root@f7abb25edc70
  build date:       20180213-11:40:47
  go version:       go1.9.2
jhorwit2 commented Feb 17, 2018

I should add that I've noticed this on 2.1.0 as well. It causes OOMs once Prometheus has enough data.

brian-brazil commented Feb 17, 2018

This is expected when you change your target labels; we need to create new chunks for all the new metrics.

jhorwit2 commented Feb 17, 2018

@brian-brazil That's understandable, but I'm not changing any target labels in this scenario. The only thing I updated was the global scrape timeout (I had nothing timing out before; I just wanted to prove this is why it was increasing).

brian-brazil commented Feb 17, 2018

In that case this was probably coincidental; 15M new chunks are going to take ~15GB of RAM. That's what the graphs you originally had in this bug indicate to me.
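
(Working backwards from those numbers, as an inference that is not part of the original comment: 15,000,000 chunks × ~1 KB per chunk ≈ 15 GB, i.e. the estimate assumes roughly 1 KB of memory per new chunk.)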

jhorwit2 commented Feb 17, 2018

@brian-brazil The reason I removed that graph is that I've since run it a couple more times and don't see a correlation between memory chunks increasing and the memory increase during a reload.

jhorwit2 commented Feb 17, 2018

[graphs: memory usage and memory chunks around the reload]

I ran another test at ~11:25, which showed an increase in memory but no increase in memory chunks. The config change was decreasing the scrape timeout from 15s to 10s.

fabxc commented Feb 19, 2018

This is probably caused by re-creating all scrape loops on reload:

prometheus/scrape/scrape.go

Lines 223 to 239 in 404b306

for fp, oldLoop := range sp.loops {
	var (
		t = sp.targets[fp]
		s = &targetScraper{Target: t, client: sp.client, timeout: timeout}
		// A brand-new loop (with a brand-new scrape cache) is created
		// for every target on reload.
		newLoop = sp.newLoop(t, s)
	)
	wg.Add(1)
	go func(oldLoop, newLoop loop) {
		// The old loop (and its cache) is only stopped after the new
		// loop has been allocated, so both caches are briefly live.
		oldLoop.stop()
		wg.Done()
		go newLoop.run(interval, timeout, nil)
	}(oldLoop, newLoop)
	sp.loops[fp] = newLoop
}

In the past this was fairly non-invasive, as loops didn't hold any meaningful memory state. But now we actually keep rather big caches per loop for our scrape performance optimisations.
So I suppose one could say the spike on reload is still a lot less than what the general baseline would be without those caches – but ideally it still wouldn't happen, of course.

I think it should be safe to hand over the entire scrape cache object to the new loop and thus avoid the whole issue, but it needs some careful consideration.
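
A minimal, self-contained sketch of that hand-over idea, using made-up stand-in types rather than the real scrapePool/scrapeLoop, to show the new loop inheriting the old loop's cache instead of starting empty:

package main

import (
	"fmt"
	"sync"
)

// Illustrative stand-ins for the real scrape loop and its per-loop cache;
// these are not the actual Prometheus types.
type scrapeCache struct {
	series map[string]uint64 // e.g. exposed metric line -> series reference
}

type scrapeLoop struct {
	cache  *scrapeCache
	stopCh chan struct{}
}

func newScrapeLoop(cache *scrapeCache) *scrapeLoop {
	if cache == nil {
		cache = &scrapeCache{series: map[string]uint64{}}
	}
	return &scrapeLoop{cache: cache, stopCh: make(chan struct{})}
}

func (l *scrapeLoop) stop() { close(l.stopCh) }

// reload stops the old loop and starts a new one that inherits the old
// loop's cache, so the cache is never allocated twice during a reload.
func reload(old *scrapeLoop) *scrapeLoop {
	newLoop := newScrapeLoop(old.cache) // hand the cache over instead of rebuilding it

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		old.stop()
		wg.Done()
	}()
	wg.Wait()

	return newLoop
}

func main() {
	old := newScrapeLoop(nil)
	old.cache.series[`up{job="node"}`] = 1

	fresh := reload(old)
	fmt.Println(len(fresh.cache.series)) // 1: the cached entry survived the reload
}

In the real code this would mean constructing newLoop with the old loop's cache instead of a fresh one; whether that is race-free while the old loop is still shutting down is the "careful consideration" mentioned above.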

simonpasquier commented Sep 6, 2018

Closing it as a duplicate of #3112.

lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 22, 2019
