Implement a function that adjusts for counter resets #1334

Closed
gtopper opened this Issue Jan 22, 2016 · 12 comments

4 participants

gtopper commented Jan 22, 2016

The function should adjust for counter resets without transforming the counter into a rate, which is effectively what increase() does. Such a function should therefore always result in a monotonically increasing vector. This is useful when counters represent KPIs (e.g. money).

Member

beorn7 commented Jan 22, 2016

Would this function really be meaningful in the Prometheus context? It would require looking back until the beginning of history to add up all the counter resets. However, since we have limited retention, whenever a counter reset falls off the end of the retention time, the current value would change.

In a way, you already have this function by saying increase(some_metric[1000d]) assuming that your retention time is less than 1000d.

Author

gtopper commented Jan 22, 2016

The issue with increase(some_metric[1000d]) is that it's very inefficient, and does not yield a monotonic time series. As far as I can ascertain, increase does the same thing as rate, only without scaling down to per-second values. This is confirmed by reading increase's implementation, which comes down to simply scaling the values in the time series.

What I'd like to achieve is a graph where all time series start at zero, and one can compare the rise of different series over time, as well as at the end of the observed time frame. Here's what it should look like. This presentation form is quite useful for certain KPIs (e.g. revenue).

The example above is actually graphed using Graphite's integral() function, which inherently results in just what I've described: a monotonically increasing graph where all time series start at zero.

I now realize that, for my use case, the function suggested in this ticket would actually not be quite sufficient, as the resultant graph would not start at zero, but at whatever arbitrary value the counter had reached at the beginning of the observed time frame.

Member

beorn7 commented Jan 22, 2016

I think I've got it now.

So, you want to graph something that refers to a fixed point t0 in the past. The graph should show the (counter-reset corrected) value of a counter C at the variable time t minus the value at t0, i.e. f(t) = C(t) - C(t0).

The fundamental problem is that PromQL has no notion of that fixed point in the past. It's all relative and sliding window. Perhaps somebody else has an idea, but I believe that we would need to introduce a fundamentally new concept into PromQL to satisfy your request.
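
A minimal sketch of the definition f(t) = C(t) - C(t0), done outside PromQL on samples that are already available as (timestamp, value) pairs. The function name and the sample layout are illustrative assumptions, as is the reset rule: any drop in the raw value is treated as a counter reset that restarted from zero.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (unix timestamp, raw counter value)


def rebased_counter(samples: List[Sample]) -> List[Sample]:
    """Return f(t) = C(t) - C(t0) for each sample.

    C is the reset-corrected counter and t0 is the first sample's timestamp.
    Assumption: a drop in the raw value means the counter restarted at zero.
    """
    result: List[Sample] = []
    total = 0.0  # accumulated increase since t0
    previous: Optional[float] = None
    for timestamp, value in samples:
        if previous is not None:
            if value >= previous:
                # Normal case: add the difference to the running total.
                total += value - previous
            else:
                # Reset: the whole new value counts as increase since the reset.
                total += value
        result.append((timestamp, total))
        previous = value
    return result
```

The output starts at zero and never decreases, regardless of the value the raw counter had at t0 or how many resets occur in between.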

Author

gtopper commented Jan 22, 2016

Yup, you got it. It's worth stressing that I don't want to explicitly hard code a reference point. I'd like the graph drawn to always start at zero. Having to reformulate queries for that would be very unfriendly to my dashboard's users (I'm using Grafana - not sure if it matters).

I know that PromQL is a language for both graphing and alerting, and that the functionality I'm seeking is probably useless for alerts (as there is no concept of a graph being drawn, just time windows). Still, I think it's very valuable for graphing certain KPIs. For my use case, I'll probably have to keep Graphite on for this functionality, while using Prometheus everywhere else for its superior support for breakdowns and alerts (as an added bonus, graphs look sharper / more accurate too).

Member

juliusv commented Jan 22, 2016

Yeah, that's really impossible with Prometheus's query model at the moment. Prometheus just takes your expression and evaluates it at every resolution step in the graph, but each of those evaluations has no notion of the graph range, or that it is even shown in a graph at all.

This might be better solved on the visualization side, but I'd be surprised if e.g. Grafana would be up for supporting this kind of feature.
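
The evaluation model described here can be sketched roughly as follows. This is a simplified illustration, not Prometheus's implementation: `evaluate_range` and `window_increase` are hypothetical names, and the windowed increase ignores counter resets and extrapolation. Each step is evaluated independently over its own sliding window, with no knowledge of where the graph starts.

```python
from typing import Callable, List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp, counter value)


def window_increase(samples: List[Sample], t: float, window: float) -> Optional[float]:
    """Simplified increase over [t - window, t]: last value minus first value.

    Assumption: no reset handling and no extrapolation, unlike the real function.
    """
    in_window = [value for ts, value in samples if t - window <= ts <= t]
    if len(in_window) < 2:
        return None
    return in_window[-1] - in_window[0]


def evaluate_range(expr: Callable[[float], Optional[float]],
                   start: float, end: float, step: float) -> List[Tuple[float, Optional[float]]]:
    """Evaluate expr at every resolution step; each call sees only its own t."""
    points = []
    t = start
    while t <= end:
        points.append((t, expr(t)))
        t += step
    return points


samples = [(0, 0), (10, 50), (20, 55), (30, 60)]
points = evaluate_range(lambda t: window_increase(samples, t, 20), 20, 30, 10)
```

Because the window slides with each step, the result can fall from one step to the next even though the underlying counter only grows, which is why a windowed increase does not produce a monotonic graph.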

Author

gtopper commented Jan 22, 2016

I'm not familiar with PromDash, but perhaps it would be solvable there? I'd still need a noresets function, as I wouldn't expect the visualization platform to adjust for counter resets.

Member

brian-brazil commented Jan 22, 2016

This sounds like something better handled on the visualization side. I'm unsure that this is a useful graph, and suspect logs are better for this class of use case.

Author

gtopper commented Jan 23, 2016

@brian-brazil, could you elaborate on how logs can address my use case? I've found it very useful to be able to look at any timeframe and compare the KPI performance of competing algorithms over that timeframe, and very convenient to do so in graph form. Doing it this way is flexible, low latency, and easy to use. My main reason for switching from Graphite to Prometheus is wanting more flexibility to allow for multiple (5-10) orthogonal breakdowns. I might also consider Druid or InfluxDB, though I'm sure they come with their own caveats. In the end, I can keep using Graphite wherever integral is needed, but will not be able to take advantage of Prometheus' superior support for breakdowns.

Member

brian-brazil commented Jan 23, 2016

You seem to be trying to get accurate graphs of something in order to do a type of reporting-like statistical analysis that Prometheus is not designed to do. That usually means that logs are a better approach than metrics.

Author

gtopper commented Jan 23, 2016

Actually, I'm fine with some inaccuracy (let's say up to ±3% margin of error is fine). This isn't for billing or anything. I'm not sure I understand how logs address the use case. Are you suggesting I use something like Spark Streaming to generate periodic reports? That would be more accurate, but much less flexible, and with much higher latency. If I'm missing some obvious solution, please do help me see it. :)

Member

brian-brazil commented Feb 29, 2016

You can send queries to Prometheus with an appropriate interval and start time, and as mentioned the execution model has no notion of graphs. It looks like Prometheus already has the required support here, and this is more something that needs to be implemented client side. If this is incorrect and there's something server side that's needed, please let us know.
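
A rough sketch of the client-side half, assuming the HTTP API's `/api/v1/query_range` endpoint with its `query`, `start`, `end`, and `step` parameters. The helper names and the `localhost:9090` address are illustrative assumptions; the code only builds the request URL and converts one series' `values` array (pairs of timestamp and string value) into numeric pairs that a client could then post-process.

```python
from typing import List, Tuple
from urllib.parse import urlencode


def query_range_url(base_url: str, query: str, start: int, end: int, step: int) -> str:
    """Build a range-query URL with an explicit start time and interval."""
    params = {"query": query, "start": start, "end": end, "step": step}
    return f"{base_url}/api/v1/query_range?{urlencode(params)}"


def parse_matrix_values(values: List[list]) -> List[Tuple[float, float]]:
    """Convert [[timestamp, "value"], ...] into (timestamp, value) float pairs."""
    return [(float(ts), float(value)) for ts, value in values]


# Hypothetical server address and metric name, for illustration only.
url = query_range_url("http://localhost:9090", "some_metric", 1453420800, 1453424400, 60)
pairs = parse_matrix_values([[1453420800, "100"], [1453420860, "130"]])
```

Any adjustment for counter resets and any shift so that the graph starts at zero would then be applied to these pairs by the client before plotting.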

lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 24, 2019
