FR: Make rate() / increase() algorithmically more efficient by allowing precomputation of "counters-with-monotonicity-corrections-already-applied" #2655
Comments
This is not something we can solve within Prometheus; at the least, this would require perfect long-term storage. Precomputation via recording rules is the recommended way to solve this.
This is something you normally do anyway: you want a single range for all your rate functions inside one Prometheus, preferably across your entire organisation.
@brian-brazil I'm not sure that strongly suggesting (or weakly imposing) a necessary choice of fixed windows is a great idea, but regardless, precomputing functions of aggregations across windows of particular sizes requires access to historical data anyway, doesn't it? Given that we already need to support historical data access for that, I'm asking for a slightly more general type of support for historical data access alongside this, which enables an extremely powerful complexity optimization.
No, it only requires our current imperfect approach. The design you propose requires perfect historical data, which is not within the scope of our design (or possible in general), as it can't handle historical data being briefly unavailable.
Hmm, ok. What about the thought that @juliusv was leading towards: implementing this idea as an optimization strategy in query execution? Perhaps caching the results of one computation temporarily, to short-cut most of the computation of the inevitable next query (with a slight offset), would be doable? This could be transparent to the user, require no additional syntax, and be done purely as an optimization strategy by the query engine. It would definitely require the engine to know how to make use of partially precomputed results, but that seems reasonable. Or perhaps allowing the specification of a temporary precomputed series that exists for the duration of a transaction of other queries would be possible? I can think of lots of uses for something like this!
- This won't work due to our extrapolation behaviour, which Julius already postulated.
- This won't work for the same reason, plus you have to worry about cache invalidation.
- You can use recording rules for this.
This requires a preordained choice of what you'd want to do this for, which very much limits the power of it (and could be very expensive if not all possible choices are used often).
The only part of that which would currently create a problem with pre-corrected counters seems to be this: https://github.com/prometheus/prometheus/blob/master/promql/functions.go#L85-L97. I haven't stared at it for long enough to see if there's an easy workaround.
For a range query, we load the entire graph range for each series once from storage and then run the expression on it at different timesteps. Pre-correcting counter resets over the entire loaded series (the expensive part of rate() / increase()) could then be done once for the whole range rather than once per evaluation step.
Recording rules suck (need to be configured, don't backfill history, require you to query differently). I'm just saying, if there's an easy way to optimize this under the hood for rates/increases with long range windows over large graph windows, that would be good.
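A minimal sketch of the "pre-correct resets once over the loaded series" idea, using illustrative types rather than the real promql/storage structures and ignoring the extrapolation adjustment linked above:

```go
package main

import "fmt"

// Sample is a simplified (timestamp, value) pair standing in for the real
// storage types; the names here are illustrative, not Prometheus internals.
type Sample struct {
	T int64
	V float64
}

// correctResets walks a loaded series once and returns a copy in which every
// counter reset has been folded away by carrying the pre-reset value forward
// as an offset. Any window over the result is then monotone, so its increase
// is simply last.V - first.V, with no per-step reset handling left to do.
func correctResets(in []Sample) []Sample {
	out := make([]Sample, len(in))
	var offset float64
	for i, s := range in {
		if i > 0 && s.V < in[i-1].V { // value dropped: a counter reset
			offset += in[i-1].V
		}
		out[i] = Sample{T: s.T, V: s.V + offset}
	}
	return out
}

func main() {
	raw := []Sample{{0, 10}, {15, 20}, {30, 3}, {45, 9}} // reset between t=15 and t=30
	fixed := correctResets(raw)
	fmt.Println(fixed)                              // [{0 10} {15 20} {30 23} {45 29}]
	fmt.Println(fixed[len(fixed)-1].V - fixed[0].V) // 19, the increase over the whole range
}
```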
With new storage we iterate over data rather than pre-loading it, so there'd be memory implications here.
We're getting into fairly advanced query engine optimisation here. There's lower hanging fruit.
That'll make it more complicated, alright. In general I'm not saying that we should do this right away, but that it's a useful optimization we might want to consider sometime in the future. The efficiency gains at least for the mentioned use case would be very large.
As far as I can see, there is no difference in this regard. Only in the call at https://github.com/prometheus/prometheus/blob/master/promql/functions.go#L58 are we materializing the data from iterators. This works the same in the new storage. The problem is more that we materialize / decode the data again for each data point. With a long range in the range expression, each materialized matrix will have a large overlap, so there would be a large gain from caching. I think this touches the discussion we had some weeks ago in person with @fabxc about optimizations in the query layer.
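As a rough illustration of that overlap (assuming, say, a 15 s scrape interval and a 15 s resolution step, numbers chosen purely for the example): evaluating rate(x[1h]) touches about 240 samples per step, and adjacent steps share all but one of those samples, so without any caching each sample gets materialized and decoded on the order of 240 times over the course of the range query.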
Based on work in #3966, I wouldn't expect any benefit from this. rate() is already faster than changes().
brian-brazil closed this Mar 20, 2018
Here's a simpler alternative: for each counter remember when it was last reset. Past that timestamp you can use subtraction, without needing to iterate over intermediary samples. It doesn't even have to be very precise: a (reliable) pessimistic approximation would give most of the benefits. However, given the current TSDB implementation, it is unlikely to provide a performance improvement unless you're computing rates/increases over 120+ samples (quite the opposite actually, since you may end up decoding the same chunk twice). But with the addition of a cache in front of the TSDB, it may work quite nicely.
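A minimal sketch of that fast path, with illustrative names; it assumes lastReset is a pessimistic bound, i.e. no reset can have happened after it:

```go
package main

import "fmt"

// Sample is a simplified (timestamp, value) pair; the names are illustrative,
// not Prometheus internals.
type Sample struct {
	T int64
	V float64
}

// increaseFast computes the increase over a window using a remembered
// "last reset" timestamp. The threshold only has to be pessimistic (it may
// be later than the true last reset, never earlier): if the whole window
// lies after it, the counter is known to be monotone there and a single
// subtraction suffices; otherwise fall back to the usual per-sample scan.
func increaseFast(window []Sample, lastReset int64) float64 {
	if len(window) < 2 {
		return 0
	}
	if window[0].T >= lastReset {
		return window[len(window)-1].V - window[0].V // fast path: no resets possible here
	}
	var inc float64 // slow path: handle resets sample by sample, as today
	for i := 1; i < len(window); i++ {
		d := window[i].V - window[i-1].V
		if d < 0 { // counter reset: count the post-reset value itself
			d = window[i].V
		}
		inc += d
	}
	return inc
}

func main() {
	w := []Sample{{100, 5}, {115, 9}, {130, 14}}
	fmt.Println(increaseFast(w, 60)) // window starts after the last known reset: prints 9
}
```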
With OpenMetrics we'll actually gain that information, though how to integrate it sanely with PromQL is unclear and the data may not be 100% reliable due to races. With #3966 we should only be decoding each chunk once.
I couldn't find anything regarding that in the OpenMetrics repository, but I'm guessing it has to do with the scrape target providing this information. What I was referring to was for Prometheus (TSDB?) to maintain an in-memory threshold (doesn't even need to be persisted). The equivalent of a cache, if you want. You can always go back and do the expensive calculation regardless of whether the counter actually got reset or not, but after a few hours of Prometheus uptime you'll probably take the fast path 99.9% of the time (particularly with rules). But yeah, given #3966, unless you're doing binary search over the decoded data AND the range is large enough for this to be significantly faster than iteration, it's not a significant performance boost.
lock bot commented Mar 22, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
virusdave commented Apr 26, 2017 (edited)
While there are functions that are specialized for monotonic counters (rate, increase, etc.), they tend to be performance-pessimistic: to be of use, they require an ad-hoc analysis of the datapoints throughout the time range of interest.
It would be great if one could transform a monotonic counter being recorded into a post-corrected counter, perhaps as a recording-time transformation.
Basically, a recording rule that records a reset-corrected copy of a counter as a new series.
This doesn't seem possible right now, but if it were, then expensive calculations like increase() over larger windows would be reduced to a constant-time single subtraction (monotonic - (monotonic offset window)) rather than a linear operation over the window (which increase() seems to be now). This is obviously a significant (algorithmic) improvement. This could potentially also be done as a query-time optimization, completely invisible to the user.
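A minimal sketch of the complexity claim in plain Go, with illustrative names and ignoring increase()'s extrapolation behaviour (discussed elsewhere in this thread): the per-window scan is linear in the number of samples, while the subtraction over a pre-corrected series is constant time regardless of the window length.

```go
package main

import "fmt"

// increaseLinear is roughly what increase() does per window today (minus
// extrapolation): walk every sample, folding counter resets back in.
func increaseLinear(window []float64) float64 {
	var inc float64
	for i := 1; i < len(window); i++ {
		d := window[i] - window[i-1]
		if d < 0 { // counter reset: the post-reset value is the increase
			d = window[i]
		}
		inc += d
	}
	return inc
}

// increaseCorrected assumes the series already has reset corrections applied
// (it is monotonically non-decreasing), so the increase over any window is a
// single subtraction, independent of how many samples the window covers.
func increaseCorrected(first, last float64) float64 {
	return last - first
}

func main() {
	raw := []float64{10, 20, 3, 9}         // a reset occurs between the 2nd and 3rd sample
	corrected := []float64{10, 20, 23, 29} // the same series with the reset folded away
	fmt.Println(increaseLinear(raw))                                          // 19
	fmt.Println(increaseCorrected(corrected[0], corrected[len(corrected)-1])) // 19
}
```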
This real problem manifests itself most immediately to me when trying to have Grafana dashboards showing percentiles of various distributions (latency, sizes, etc.).
The problem can be partially mitigated (really, slightly masked) by precomputing the percentiles, but this requires choosing a discrete set of window durations ahead of time. It also doesn't actually solve the problem; it just amortizes it across each sample. In fact, since sampling potentially happens more often than a particular dashboard graph is rendered, this might even be substantially more expensive.
The discussion leading to this request (and ideas on how it could be handled) is at https://matrix.to/#/!HaYTjhTxVqshXFkNfu:matrix.org/$1493164205657354HHxeu:matrix.org