prometheus crashes #1355

Closed
zwopir opened this Issue Jan 28, 2016 · 14 comments

zwopir commented Jan 28, 2016

Prometheus crashes. Nothing was happening except normal scraping during that time:
full panicparse stacktrace: https://gist.github.com/zwopiR/686590a0c53fd52ee8c9
truncated full stacktrace: https://gist.github.com/zwopiR/d00c6c63ce5ba4085973

prometheus, version 0.16.1 (branch: stable, revision: 968ee35)
build user: @9f8f0f8d724a
build date: 20160118-19:00:26
go version: 1.5.1
running inside docker prom/prometheus.
I'll happily provide any further information you may need.

Member

brian-brazil commented Jan 28, 2016

Given that it failed creating a thread, the problem likely isn't on Prometheus's side. Did you run out of memory?

Author

zwopir commented Jan 28, 2016

node_memory_Active and node_memory_Cached compared to node_memory_MemTotal don't indicate so. Are there any metrics from node_exporter that may help track down this issue?

Member

brian-brazil commented Jan 28, 2016

What was node_memory_MemFree? If your Cached and MemFree are both low, you're out of memory.
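A minimal sketch of that check, for illustration only: it parses `/proc/meminfo`-style text (the source node_exporter reads for the `node_memory_*` metrics) and computes `(MemFree + Cached) / MemTotal`. The function names and the sample numbers are hypothetical, and what counts as "low" is left to the reader.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseMeminfo extracts the numeric (kB) values from /proc/meminfo-style text.
// Lines that do not look like "Key: value kB" are skipped.
func parseMeminfo(text string) map[string]uint64 {
	out := make(map[string]uint64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 {
			continue
		}
		v, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			continue
		}
		out[strings.TrimSuffix(fields[0], ":")] = v
	}
	return out
}

// reclaimableFraction returns (MemFree + Cached) / MemTotal,
// or 0 if MemTotal is missing.
func reclaimableFraction(m map[string]uint64) float64 {
	total := m["MemTotal"]
	if total == 0 {
		return 0
	}
	return float64(m["MemFree"]+m["Cached"]) / float64(total)
}

func main() {
	// Made-up sample values, not taken from the reporter's host.
	sample := "MemTotal: 16000000 kB\nMemFree: 200000 kB\nCached: 600000 kB\n"
	frac := reclaimableFraction(parseMeminfo(sample))
	fmt.Printf("free+cached fraction of total: %.3f\n", frac)
}
```

On a real host the input would come from reading `/proc/meminfo`; a small fraction here suggests the machine has little memory left to hand out.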

Member

beorn7 commented Jan 28, 2016

It could also be a rarely striking bug in Go 1.5.1 that makes it look as if you are out of memory. Try 0.16.2, which is compiled with Go 1.5.3, or build it yourself with a recent Go compiler.

This is almost certainly not a bug in Prometheus itself.

Author

zwopir commented Jan 28, 2016

Not sure about being out of memory. See the attached screenshot. The tooltip shows roughly the last scrape. Anyway, we'll try the new version.
[Screenshot: 2016-01-28 14:45:58]

Member

brian-brazil commented Jan 28, 2016

That looks fine memory-wise.

Author

zwopir commented Jan 28, 2016

So we might have hit the Go 1.5.1 bug? We'll give the latest version a try. Thank you very much!

Member

beorn7 commented Feb 2, 2016

@zwopir Any more crashes seen?

Author

zwopir commented Feb 3, 2016

we switched to 0.16.2 two days ago. No crashes so far, but this morning lots of
```
time="2016-02-03T08:15:00Z" level=warning msg="Sample ingestion resumed." source="storage.go:555"
time="2016-02-03T08:15:00Z" level=warning msg="2097152 chunks waiting for persistence, sample ingestion suspended." source="storage.go:551"
```

Member

fabxc commented Feb 3, 2016

This means that the load on your storage is somewhat high. You may want to consider switching to a larger server.

This sort of log-spam will be gone with 0.17.

Member

beorn7 commented Feb 3, 2016

The "sample ingestion suspended" message actually means you are in really bad shape. (It's not the "storage is now in degraded mode" message, which is less bad than it sounds. Both issues are addressed in 0.17.)

You definitely need to either reduce your ingestion rate or increase the number of chunks allowed to wait for persistence even more (looks like you already did – if you have enough RAM, you can do so even more, see http://prometheus.io/docs/operating/storage/ for details).

Note that the number of chunks waiting for persistence is overestimated in 0.16.2. This is also fixed in 0.17.
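As a rough illustration of the "if you have enough RAM" caveat above, the sketch below estimates the memory implied by a given chunk count. It assumes 1024-byte chunks and a rule-of-thumb overhead factor of 3; both constants are assumptions about this storage generation, not figures stated in this thread, so check them against the linked storage documentation. In this release line the relevant settings are the `-storage.local.memory-chunks` and `-storage.local.max-chunks-to-persist` flags.

```go
package main

import "fmt"

const (
	// Assumed size of a single chunk in bytes (not stated in this thread).
	chunkBytes = 1024
	// Assumed rule-of-thumb multiplier for total RAM versus raw chunk data.
	ramFactor = 3
)

// estimateRAMBytes returns a rough RAM requirement for holding the given
// number of chunks in memory, under the assumptions above.
func estimateRAMBytes(chunks uint64) uint64 {
	return chunks * chunkBytes * ramFactor
}

func main() {
	// 2097152 is the chunk count that appears in the log message above;
	// the other values are simply half and double of it.
	for _, n := range []uint64{1048576, 2097152, 4194304} {
		gib := float64(estimateRAMBytes(n)) / (1 << 30)
		fmt.Printf("%d chunks -> roughly %.1f GiB RAM\n", n, gib)
	}
}
```

The estimate is only meant to show the order of magnitude: doubling a chunk limit roughly doubles the memory that needs to be available.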

Member

beorn7 commented Feb 3, 2016

Closing since the original issue is resolved.

beorn7 closed this Feb 3, 2016

Author

zwopir commented Feb 3, 2016

Thanks for the advice on storage parameters. Looking forward to 0.17.


lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 24, 2019
