
Add a gauge for "number of open chunks". #1710

Closed
discordianfish opened this Issue Jun 6, 2016 · 8 comments

discordianfish commented Jun 6, 2016

We have a metric for rushed mode, but none for throttled ingestion, which is at least as important.

In addition, it might also be useful to add a metric for the currently active time series, which, I think, is different from prometheus_local_storage_memory_series because that shows how many series Prometheus could fit into memory, not necessarily how many would be active if it could fit more.
Or maybe this doesn't make sense, but then the documentation could be a bit clearer in that regard.

beorn7 commented Jun 6, 2016

  • "Active timeseries" is somewhat vague and therefore cannot have a metric. (What would be the threshold for not considering a time series active anymore? See also prometheus/docs#442 – my proposed definition there would be exact, but excessively expensive to track.) What could be helpful to answer your question might be the number of open head chunks. These are chunks that are still mutable and cannot be persisted and thus cannot be evicted. Would that satisfy your use case?
  • There cannot be an on/off metric for throttled ingestion because it is not a state (unlike rushed mode). Each time a scrape or rule evaluation starts, the storage layer is asked whether it is OK to proceed; essentially, the answer is "no" if the (unbumped) persistence urgency score would be 1. As an approximation, you could use prometheus_local_storage_persistence_urgency_score. Note that it is calculated at different times and then sampled by scraping at intervals, but it is the best answer available: you are "in throttled mode" (which does not exist as a state) whenever the urgency score is 1.
  • WRT documentation: I have trouble going into all the detail in the docs because the sheer amount of text would render them useless for quick troubleshooting. I'm happy to review PRs that give it a try; see also prometheus/docs#442 as mentioned above (where the original author hasn't come back in a while).
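To make the "open head chunks" idea concrete: such a gauge would simply be incremented when a head chunk is created and decremented when it is completed. A minimal stdlib-only sketch, assuming a hypothetical headChunkTracker type (a real implementation would register a prometheus.Gauge from client_golang instead):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// headChunkTracker keeps a running count of head chunks that are still
// open (mutable) and therefore can be neither persisted nor evicted.
// The name is hypothetical; in Prometheus this would back a real gauge.
type headChunkTracker struct {
	open atomic.Int64
}

// chunkOpened is called when a new head chunk is created.
func (t *headChunkTracker) chunkOpened() { t.open.Add(1) }

// chunkClosed is called when a head chunk is completed and becomes
// eligible for persistence and eviction.
func (t *headChunkTracker) chunkClosed() { t.open.Add(-1) }

// openChunks returns the current gauge value.
func (t *headChunkTracker) openChunks() int64 { return t.open.Load() }

func main() {
	var t headChunkTracker
	t.chunkOpened()
	t.chunkOpened()
	t.chunkClosed()
	fmt.Println("open head chunks:", t.openChunks())
}
```

The atomic counter matters because chunks are opened and closed from concurrent ingestion goroutines.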
discordianfish commented Jun 7, 2016

What could be helpful to answer your question might be the number of open head chunks. These are chunks that are still mutable and cannot be persisted and thus cannot be evicted. Would that satisfy your use case?

Yes, that sounds like exactly what I'm looking for and would be super helpful.

Each time a scrape or rule evaluation starts, the storage layer is asked if it is OK to do so. Essentially, the answer is "no" if the (unbumped) persistence urgency score would be 1.

What do you think about an 'error' counter that would be incremented whenever the storage layer answers "no"? But I'm also fine with using prometheus_local_storage_persistence_urgency_score.
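The proposed counter could hang directly off the admission check described above. A stdlib-only sketch, assuming hypothetical names (needsThrottlingTotal, ingestAllowed) and the rule from the previous comment that ingestion is denied once the urgency score reaches 1:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// needsThrottlingTotal counts how often the storage layer answered "no"
// when asked whether a scrape or rule evaluation may start.
// Hypothetical; a real version would be a prometheus.Counter.
var needsThrottlingTotal atomic.Uint64

// ingestAllowed mimics the storage layer's admission check: ingestion
// is denied, and the counter bumped, when the (unbumped) persistence
// urgency score has reached 1.
func ingestAllowed(urgencyScore float64) bool {
	if urgencyScore >= 1 {
		needsThrottlingTotal.Add(1)
		return false
	}
	return true
}

func main() {
	for _, score := range []float64{0.2, 0.9, 1.0, 1.0} {
		fmt.Printf("score=%.1f allowed=%v\n", score, ingestAllowed(score))
	}
	fmt.Println("throttling needed:", needsThrottlingTotal.Load())
}
```

A counter here is preferable to a gauge because each denial is an event; consumers can take the rate of the counter rather than hoping to catch a transient state at scrape time.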

WRT to documentation..

Let's discuss that over in prometheus/docs#442, looks like this is about the same confusion.

beorn7 commented Jun 7, 2016

A "throttling_needed_total" counter sounds like a good idea. It is easy to implement and easy to reason about, and we can alert nicely on an increase.

Title adjusted.
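"Alerting on an increase" boils down to comparing successive scraped samples of the counter, much like a rate()-style alert expression would. A small illustrative sketch (the sample type and ratePerSecond are made up for this example, not Prometheus APIs):

```go
package main

import "fmt"

// sample is one scraped value of a counter with its timestamp in seconds.
type sample struct {
	value float64
	ts    float64
}

// ratePerSecond returns the increase per second between two counter
// samples, analogous to what a rate()-based alert rule would evaluate.
// Counter resets are ignored for simplicity.
func ratePerSecond(prev, cur sample) float64 {
	if cur.ts <= prev.ts {
		return 0
	}
	return (cur.value - prev.value) / (cur.ts - prev.ts)
}

func main() {
	prev := sample{value: 10, ts: 0}
	cur := sample{value: 25, ts: 60} // counter grew by 15 over one minute
	r := ratePerSecond(prev, cur)
	// A nonzero rate means throttling was needed during the window.
	fmt.Printf("rate=%.2f/s throttling=%v\n", r, r > 0)
}
```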

@beorn7 beorn7 changed the title Add throttled ingestion metrics Add a counter for "throttling needed" and a gauge for "number of open chunks". Jun 7, 2016

@beorn7 beorn7 self-assigned this Jun 7, 2016

@juliusv juliusv added the help wanted label Jul 23, 2016

juliusv commented Jul 28, 2016

@beorn7 Both throttled cases already have their own counters:

iterationsSkipped.Inc()

targetSkippedScrapes.WithLabelValues(interval.String()).Inc()

Can this issue be considered done?

beorn7 commented Jul 29, 2016

You are right about the counters for "throttling needed". But we still need a gauge for "open chunks". That seems to me a really useful metric to have.

ghost commented Oct 5, 2016

@beorn7 @juliusv @discordianfish Would this be a good first issue to take a stab at? Or is there something else that'd be better suited?

beorn7 commented Oct 5, 2016

@copyconstructor A metric for open chunks sounds like a really good starting point. Note that the chunking code was just moved to its own package, which made the metrics collection a bit more complicated. Just have a look at how it is done right now, and feel free to ask me or @juliusv if you have questions.

@beorn7 beorn7 changed the title Add a counter for "throttling needed" and a gauge for "number of open chunks". Add a gauge for "number of open chunks". Oct 5, 2016

jmeulemans added a commit to jmeulemans/prometheus that referenced this issue Feb 16, 2017

@beorn7 beorn7 closed this in #2435 Feb 17, 2017

lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 24, 2019
