Add real counter semantics #2473

Closed
bschick opened this Issue Mar 6, 2017 · 3 comments

bschick commented Mar 6, 2017

Would like to track the current value of counters over a long period of time.

For example, how many GET requests has my site ever received? Right now I don't see an easy way to do this with Prometheus, because counters reset when a process restarts. The only solution I can see is to save the actual counter values in some shared DB outside of both the processes and Prometheus.

This is particularly problematic for distributed applications that are best not tightly coupled through shared writable state and that restart often (which IMO is the modern version of a GC).

Issues like #1334 propose a solution, but it will not work because of limited retention times.

A solution could be server and client library support for counter semantics:

Rather than send the current value to the server, clients would send increment commands. The server would be responsible for caching (or loading) the last value, incrementing by the amount sent, and storing the result.
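A minimal sketch of the proposed increment protocol, in Python. `DeltaCounter` and `TotalStore` are hypothetical names, not part of any Prometheus client library or server; the sketch assumes one client process per series and ignores persistence and network failures.

```python
import threading
from typing import Dict


class DeltaCounter:
    """Hypothetical client-side counter that reports only the increment since the last scrape."""

    def __init__(self) -> None:
        self._pending = 0.0
        self._lock = threading.Lock()

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("counters can only increase")
        with self._lock:
            self._pending += amount

    def collect_delta(self) -> float:
        """Hand over the increments accumulated since the previous scrape and start again from 0."""
        with self._lock:
            delta, self._pending = self._pending, 0.0
        return delta


class TotalStore:
    """Hypothetical server side: keeps a running total per series and adds each reported delta."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = {}

    def apply(self, series: str, delta: float) -> float:
        self.totals[series] = self.totals.get(series, 0.0) + delta
        return self.totals[series]


server = TotalStore()
client = DeltaCounter()
for _ in range(3):
    client.inc()
server.apply("http_get_requests_total", client.collect_delta())  # total is now 3.0

client = DeltaCounter()  # simulated process restart: client-side state starts from zero
client.inc(2)
print(server.apply("http_get_requests_total", client.collect_delta()))  # 5.0, not reset
```

In this sketch a restart only loses increments that were counted but not yet collected; the server-side total itself never goes down.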

Client libraries would have to be changed to expose the increments since the last scrape. Yes, this gets a bit tricky, because a scrape could happen followed by a server crash (or whatever), resulting in the increment command being lost. A few ways to handle this:

  • Just acknowledge and ignore the problem. It is not any worse than today where the data for counters is discarded by design unless clients track it.
  • Add versions, acks, etc. (much like an MQ protocol) to provide more robust updates
  • I assume you'd still support the current semantics (counter v1) that would allow clients to track and report values directly. They could then use whatever means they want to ensure correctness (as they'd do today).
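The second option (versions and acks) could look roughly like the sketch below. Everything here is hypothetical: each batch of increments carries a sequence number, the client keeps resending a batch until it is acknowledged, and the server ignores sequence numbers it has already applied, so a lost scrape or a lost ack neither drops nor double-counts increments. It assumes one client per series, a server that persists `total` and `last_seq` together, and a client that does not restart; surviving client restarts would need something extra, such as a per-process identifier.

```python
from typing import Optional, Tuple


class AckedDeltaCounter:
    """Hypothetical client: increments are discarded only after the server acknowledges them."""

    def __init__(self) -> None:
        self._pending = 0.0                                  # counted, not yet in any batch
        self._in_flight: Optional[Tuple[int, float]] = None  # batch sent but not acknowledged
        self._next_seq = 1

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("counters can only increase")
        self._pending += amount

    def scrape(self) -> Tuple[int, float]:
        """Return (seq, delta); an unacknowledged batch is resent unchanged."""
        if self._in_flight is None:
            self._in_flight = (self._next_seq, self._pending)
            self._next_seq += 1
            self._pending = 0.0
        return self._in_flight

    def ack(self, seq: int) -> None:
        if self._in_flight is not None and self._in_flight[0] == seq:
            self._in_flight = None


class DedupStore:
    """Hypothetical server: applies each batch at most once, keyed by sequence number."""

    def __init__(self) -> None:
        self.total = 0.0
        self.last_seq = 0

    def apply(self, seq: int, delta: float) -> int:
        if seq > self.last_seq:  # new batch: add it; already-applied batches are ignored
            self.total += delta
            self.last_seq = seq
        return seq  # acknowledgement to send back to the client


client, server = AckedDeltaCounter(), DedupStore()
client.inc(3)
client.scrape()                             # (1, 3.0) is lost: the server crashes before applying it
server.apply(*client.scrape())              # resent (1, 3.0) is applied, but the ack is lost
client.ack(server.apply(*client.scrape()))  # duplicate (1, 3.0) is ignored, and this ack arrives
client.inc(2)
client.ack(server.apply(*client.scrape()))  # (2, 2.0) is applied
print(server.total)  # 5.0
```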

Expected result

Prometheus would get a new style of counter that could increment over any period of time. You might only have the last month of data points, but each counter's last value would still be useful. Today, counters are really only useful for rate unless the clients do a bunch of work or the counters are short-lived.

ekarak commented Mar 6, 2017

There are gauges for this genre of timeseries data; just use a recording rule to aggregate them on a fixed time interval, e.g. to sum them up on a 1-minute interval by instance and request type. Just align the evaluation interval (60s by default) to your sum_over_time interval and you'll be fine.
You shouldn't directly use sum(sum_over_time(<metric>[1m])) on a raw gauge metric to get the number of requests per minute, because with a 15-second scrape interval you'd get 4x the actual value.
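The 4x factor is just sample counting. A minimal sketch, assuming (as the comment implies) that the gauge holds the number of requests in the last minute and so repeats the same value on every 15-second scrape within that minute: a `[1m]` range then typically contains four such samples, so summing them counts each request about four times, while averaging them recovers the per-minute value. The request count is a made-up illustrative number.

```python
SCRAPE_INTERVAL_S = 15
RANGE_S = 60

# Hypothetical gauge value: requests handled in the last minute (illustrative number).
requests_last_minute = 120

# Each scrape inside the 1m range sees the same gauge value.
samples = [requests_last_minute] * (RANGE_S // SCRAPE_INTERVAL_S)

summed = sum(samples)                   # roughly what sum_over_time(<metric>[1m]) adds up
averaged = sum(samples) / len(samples)  # roughly what avg_over_time(<metric>[1m]) returns

print(len(samples), summed, averaged)  # 4 480 120.0
```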

brian-brazil (Member) commented Mar 6, 2017

I think you don't understand how Prometheus counters work, please see https://www.robustperception.io/how-does-a-prometheus-counter-work/ and my upcoming talk at Cloudnativecon Europe in a few weeks.

If you've further questions about counters, you can ask them at https://groups.google.com/forum/#!aboutgroup/prometheus-users
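Some background on the reset issue raised above: Prometheus counters are expected to reset when a process restarts, and PromQL's `rate()` and `increase()` treat any decrease in a counter's value as a reset rather than as a negative change. A minimal Python sketch of that reset handling over raw samples follows; the function name and sample values are hypothetical, and the real PromQL functions additionally extrapolate to the edges of the range window, which this sketch omits. Increments made after the last scrape before a restart are still lost, and a total can only be computed over samples that are still within retention.

```python
from typing import Sequence


def reset_adjusted_increase(samples: Sequence[float]) -> float:
    """Total increase across consecutive counter samples, treating any drop as a reset."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # The counter went down: the process restarted and counted up again from 0,
            # so everything counted since the restart is `cur`.
            total += cur
    return total


# Scraped values of a GET request counter; the process restarts between 130 and 8.
print(reset_adjusted_increase([100, 120, 130, 8, 20]))  # 50.0
```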

lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 23, 2019
