Prometheus OOM crash upon restart #3559

Closed
christinehylin opened this Issue Dec 7, 2017 · 12 comments

3 participants
christinehylin commented Dec 7, 2017

What did you do?
Started the Prometheus service after it had stopped due to an OOM.

What did you expect to see?
Prometheus starting normally.

What did you see instead? Under which circumstances?

Prometheus crashed with an OOM again, apparently due to the same memory issue that caused the original crash.

Environment
Linux

  • System information:

    Linux 4.4.0-101-generic x86_64

  • Prometheus version:

prometheus, version 2.0.0 (branch: HEAD, revision: 0a74f98)
build user: root@615b82cb36b6
build date: 20171108-07:11:59
go version: go1.9.2

Prometheus is running, though.

  • Logs:

prometheus_crash.txt

brian-brazil (Member) commented Dec 8, 2017

If you crashed initially due to an OOM, it's not surprising that the same issue recurs when you restart. I'd suggest checking what queries are running against the Prometheus; one of them might be using a lot of RAM.

christinehylin (Author) commented Dec 8, 2017

@brian-brazil There were no queries running on Prometheus at the time.

brian-brazil (Member) commented Dec 8, 2017

It seems to get to the scraping phase, so you're probably pulling in too much data. Try a machine with more RAM.
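Once Prometheus is back up (for example after trimming targets), one way to see which targets pull in the most data is to look at `scrape_samples_scraped`, the series Prometheus records for every target with the number of samples from its last scrape. The sketch below only uses the standard `/api/v1/query` HTTP endpoint; the address `http://localhost:9090` and the helper names are placeholders, not anything from this thread.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def samples_per_target(response_body: str) -> list[tuple[str, str, int]]:
    """Rank targets by the number of samples ingested in their last scrape.

    `response_body` is the JSON returned by /api/v1/query for the expression
    `scrape_samples_scraped`: an instant vector with one series per target,
    labelled by `job` and `instance`.
    """
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    rows = []
    for series in payload["data"]["result"]:
        labels = series["metric"]
        # Each instant-vector sample is [unix_timestamp, "value_as_string"].
        _timestamp, value = series["value"]
        rows.append((labels.get("job", ""), labels.get("instance", ""), int(float(value))))
    # Largest targets first: these are the first candidates to drop or split.
    return sorted(rows, key=lambda row: row[2], reverse=True)


def fetch_samples_per_target(base_url: str = "http://localhost:9090") -> list[tuple[str, str, int]]:
    # Hypothetical default address; point it at wherever Prometheus listens.
    query = urlencode({"query": "scrape_samples_scraped"})
    with urlopen(f"{base_url}/api/v1/query?{query}") as resp:
        return samples_per_target(resp.read().decode("utf-8"))
```

Parsing is kept separate from the HTTP call so the ranking logic can be checked against a saved response without a live server.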

christinehylin (Author) commented Dec 8, 2017

We've done that, thanks.

christinehylin (Author) commented Dec 14, 2017

@brian-brazil Was this the line you used to determine that the scraping phase had commenced?

    Nov 28 21:48:15 q2-gog-sh00-pm1 prometheus[25868]: level=info ts=2017-11-28T21:48:15.372378972Z caller=targetmanager.go:71 component="target manager" msg="Starting target manager..."

brian-brazil (Member) commented Dec 15, 2017

That, and there are scraping goroutines. Do you have additional information that indicates this is a bug?

christinehylin (Author) commented Dec 15, 2017

No, the behavior seems to be in line with what you suggest. However, is it sometimes expected that Prometheus runs out of memory on restart even when it didn't crash beforehand? I was restarting Prometheus to reload its configuration; it was fine before the restart, but then it refused to start up.

brian-brazil (Member) commented Dec 15, 2017

That can happen depending on query load. You don't need to restart Prometheus to load a new config; you can send it a SIGHUP instead.

christinehylin (Author) commented Dec 15, 2017

I learned that after the first time I tried restarting. We ended up reducing the number of scrape targets to get Prometheus back up; there were no queries running at the time, though.

simonpasquier (Member) commented Jul 24, 2018

@christinehylin are you still getting this issue? If yes, have you tried upgrading Prometheus to the latest stable version (v2.3.2)?

christinehylin (Author) commented Jul 24, 2018

@simonpasquier I think we're good now.

lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 22, 2019
