This repository has been archived by the owner on Aug 13, 2019. It is now read-only.

Currently exposed prometheus metrics are not enough for problem analysis #123

Closed
onorua opened this issue Aug 19, 2017 · 2 comments



onorua commented Aug 19, 2017

I tried to work out why the Prometheus server is using more than 128 GB of RAM (which gets it killed by the OOM killer), but I could not find the cause, because the time series count and samples/sec metrics are no longer exposed. samples/sec may be Prometheus-specific, but the number of series is definitely TSDB-specific.

I believe we need something like this:
https://github.com/prometheus/tsdb/blob/master/head.go#L219-L221
but on "global" level.

If you think the 1.x-era metrics are no longer applicable, could you please provide a list of metrics to pay attention to, along with some performance indicators?

Contributor

fabxc commented Aug 19, 2017

You can still get samples per second via rate(tsdb_samples_appended_total[5m]).
The total number of series in the DB is actually non-trivial to compute, but the currently active series can be queried via sum(scrape_samples_scraped).
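For reference, here is a minimal sketch of running the two expressions above against the Prometheus HTTP API instant-query endpoint (`/api/v1/query`). The base URL `http://localhost:9090` and the helper names `instant_query_url`, `first_value`, and `run_query` are assumptions made for illustration; they do not come from this thread.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def instant_query_url(base_url, promql):
    """Build the URL for an instant query against /api/v1/query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def first_value(response_body):
    """Return the first sample value of an instant-vector response, or None."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    result = payload["data"]["result"]
    if not result:
        return None
    # Each vector element carries "value": [<unix timestamp>, "<value as string>"].
    return float(result[0]["value"][1])


def run_query(base_url, promql):
    """Fetch and decode a query result (needs a reachable server)."""
    with urlopen(instant_query_url(base_url, promql)) as resp:
        return first_value(resp.read().decode("utf-8"))


# Expressions quoted in the comment above.
SAMPLES_PER_SECOND = "rate(tsdb_samples_appended_total[5m])"
ACTIVE_SERIES = "sum(scrape_samples_scraped)"
```

With a server reachable at the assumed address, `run_query("http://localhost:9090", ACTIVE_SERIES)` would return the approximate number of currently active series as a float.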

(I told you this out of band, but I am repeating it here for public reference.)

We should definitely have some docs on how to analyze 2.0.

Contributor

fabxc commented Sep 11, 2017

The number of metrics has increased significantly in master, giving detailed overviews of active series, in-memory chunks, and more. They are all available under tsdb_*.
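One way to see which of these metrics a given build exposes is to filter the text served on the `/metrics` endpoint by name prefix. The sketch below assumes the standard Prometheus text exposition format. The helper name `metric_names_with_prefix` is hypothetical, and the prefix is a parameter because the exact prefix may differ between builds.

```python
import re

# A metric name is the leading run of characters allowed by the exposition format.
_METRIC_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def metric_names_with_prefix(exposition_text, prefix="tsdb_"):
    """Return the sorted, de-duplicated metric names that start with prefix."""
    names = set()
    for line in exposition_text.splitlines():
        # Skip "# HELP" and "# TYPE" comment lines.
        if line.startswith("#"):
            continue
        match = _METRIC_NAME.match(line)
        if match and match.group(1).startswith(prefix):
            names.add(match.group(1))
    return sorted(names)
```

The input would be the body of a GET request to the server's `/metrics` endpoint; each returned name can then be graphed or queried on its own.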

fabxc closed this as completed Sep 11, 2017