rate()/increase() extrapolation considered harmful #3746
This would be a violation of the PromQL execution model, a range only returns data from its range. Evaluation should also not look into the future.
This is the recommended way to deal with this. This is something to handle in Grafana.
I don't know much (if anything) about the PromQL execution model, but whatever it is I don't think it's a good reason for making up data instead of using real data. And if the execution model does in fact trump common sense, there is no good reason for having extrapolation. If my 10 second interval has 2 data points 5 seconds apart, then the increase is the difference between the 2 points and the rate is the difference divided by 5.
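The arithmetic described in that last sentence is simple enough to state as a standalone sketch (hypothetical types and values, not Prometheus code): the increase is the raw difference between two samples, and the rate is that difference divided by the actual distance between the samples rather than the query range.

```go
package main

import "fmt"

// point is a hypothetical (timestamp, value) sample pair; timestamps in seconds.
type point struct {
	t float64
	v float64
}

// naiveIncrease returns the raw difference between the last and first sample.
func naiveIncrease(first, last point) float64 {
	return last.v - first.v
}

// naiveRate divides that difference by the actual distance between the samples,
// not by the length of the query range.
func naiveRate(first, last point) float64 {
	return naiveIncrease(first, last) / (last.t - first.t)
}

func main() {
	// Two samples 5 seconds apart inside a 10 second range, as in the comment.
	first, last := point{t: 100, v: 40}, point{t: 105, v: 42}
	fmt.Println(naiveIncrease(first, last)) // 2
	fmt.Println(naiveRate(first, last))     // 0.4
}
```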
Could you please explain how I can get a graph of counter increases that works just as well over 15 minutes and 7 days using a fixed range? Oh, but that won't work, will it?
You can't, which is why this is something that needs to be handled in the thing sending the requests to Prometheus. Our current algorithm has been extensively debated and tested; it is very unlikely that any changes will be made.
There are a couple of problems with that. One, Grafana or whoever creates the dashboard must know the exact resolution of the underlying data in order to be able to produce an accurate workaround for Prometheus' extrapolation. And second, what Grafana would have to do to get that accurate value out of Prometheus is (assuming an original resolution of 5 seconds) something along the lines of:
Does that look like a reasonable solution to anyone?
Also, when you recommend Grafana as the dashboarding solution in your documentation, you don't get to pass the buck to "the thing sending the requests to Prometheus". Sorry.
All I can do is recommend watching https://www.youtube.com/watch?v=67Ulrq6DxwA Several of the things that you consider to be bugs are in fact features; non-integral values are correct, for example. If you want exact results you need to use a logs-based monitoring system, not a metrics-based one.
For reference: The discussion and implementation can be found in the following PRs:
I found and went through a couple of those before. But all I can see is a lot of discussion about extrapolating data and one quick dismissal of actually including the last value before the start of the range, on 2 accounts: that it wouldn't provide a lower bound anymore (which the current implementation doesn't either) and that with a range that is not a multiple of the scrape interval you'd get a sawtooth pattern.
So I'm asking you: what is the benefit of using extrapolated data when interpolated data is available? Or actual data, without interpolation, shifted by less than the scrape interval? I mean, if (ignoring counter resets for now) you take the difference between the last point in the range and the last point before the range and compute a rate from that (by dividing by the actual distance between the 2 points), then multiply by the range length you'll get a smooth rate, smooth increase and no aliasing. What the $#@% is wrong with that? How is it worse than guesstimating by extrapolation? You can still apply limits in special cases (e.g. if the series starts or ends in the middle of the range, then adjust the rate to reflect the ratio of the range that's actually covered by the data) but overall (i) you have fewer special cases, (ii) it's faster to compute and (iii) you're not replacing information with noise. Why, you could even get a graph of increases that you could add up and would match the actual increase in the counter.
To be more concrete, what specifically is wrong with this particular solution?

```go
func extrapolatedRate(ev *evaluator, arg Expr, isCounter bool, isRate bool) Value {
	ms := arg.(*MatrixSelector)
	// XXX: Also request the last point before the beginning of the range.
	ms.IncludeLastPointBeforeRange = true
	var (
		matrix       = ev.evalMatrix(ms)
		rangeStart   = ev.Timestamp - durationMilliseconds(ms.Range+ms.Offset)
		rangeEnd     = ev.Timestamp - durationMilliseconds(ms.Offset)
		resultVector = make(Vector, 0, len(matrix))
	)
	for _, samples := range matrix {
		if len(samples.Points) < 2 {
			continue
		}
		sampledInterval := float64(samples.Points[len(samples.Points)-1].T-samples.Points[0].T) / 1000
		averageDurationBetweenSamples := sampledInterval / float64(len(samples.Points)-1)
		sampleIntervalThreshold := averageDurationBetweenSamples * 1.1
		// XXX: If the last point before the range is too far from the range start, drop it.
		firstSample := 0
		if float64(rangeStart-samples.Points[0].T)/1000 > sampleIntervalThreshold {
			if len(samples.Points) < 3 {
				continue
			}
			firstSample = 1
			sampledInterval = float64(samples.Points[len(samples.Points)-1].T-samples.Points[1].T) / 1000
		}
		var (
			counterCorrection float64
			lastValue         float64
		)
		// XXX: Unrelated: only need to do the magic dance for counters.
		if isCounter {
			for i := firstSample; i < len(samples.Points); i++ {
				sample := samples.Points[i]
				if sample.V < lastValue {
					counterCorrection += lastValue
				}
				lastValue = sample.V
			}
		}
		resultValue := samples.Points[len(samples.Points)-1].V - samples.Points[firstSample].V + counterCorrection
		// Duration between last sample and boundary of range.
		durationToEnd := float64(rangeEnd-samples.Points[len(samples.Points)-1].T) / 1000
		if samples.Points[firstSample].T < rangeStart && durationToEnd < sampleIntervalThreshold {
			// XXX: Default case, the samples start just before the range start
			// and end just before the range end: the rate is the difference divided
			// by the sampled interval and the increase is the rate times the range length.
			if isRate {
				resultValue = resultValue / sampledInterval
			} else {
				resultValue = resultValue * (ms.Range.Seconds() / sampledInterval)
			}
		} else {
			// XXX: The samples don't cover the whole range: the increase is the
			// difference between the last and first values, the rate is the increase
			// divided by the range length.
			if isRate {
				resultValue = resultValue / ms.Range.Seconds()
			}
		}
		resultVector = append(resultVector, Sample{
			Metric: dropMetricName(samples.Metric),
			Point:  Point{V: resultValue, T: ev.Timestamp},
		})
	}
	return resultVector
}
```

What it does is it requests an extra point before the start of the range and (unless it's more than average interval * 1.1 away from the start of the range) uses the actual difference between the last point in the range and this point as the base increase. Then, if the samples approximately cover the whole range (within the same average interval * 1.1 of either end) the rate is that difference divided by the sampled interval and the increase is the rate times the range length. Else, if the samples only cover part of the range, the increase is the difference between the last and first values as-is and the rate is that increase divided by the range length.

There is no sawtooth pattern with any smoothly increasing counter; no jitter; no data dropped if you're computing a rate or increase over a range equal to the scrape interval. And the only downside AFAICT is that it's "a violation of the PromQL execution model", whatever that means.

Also, I understand if you want to keep the current behavior for the existing functions, in which case this could be exposed as new functions instead.

And finally, I get quite tired of the "if you want accurate numbers use logs" argument. I'm not asking for accurate metrics in the face of restarts, timeouts or sampling. Just to have PromQL produce an exact difference between 2 data points X seconds away from each other without having to jump through hoops. It feels literally like someone decided that every arithmetic operation should replace the least significant 90% of the result with garbage. Because that's what I get unless I fight Prometheus tooth and nail.
@free when I want that level of precision it often means that I can use irate or idelta. I am missing an equivalent for increase, however.
@free After more thinking: I think what we do with irate/idelta is really close to this; but you want to change it from "between the last 2 datapoints" to "between the initial value and the last datapoint". In that sense, besides irate/idelta we could add other prefixed functions like xrate/xdelta
The reason My data has 5 second resolution, but if I do
In other words, for what I need,
I'd be fine with either replacing the |
Oh, and by the way @roidelapluie, I like your proposed naming, the x-prefixed functions.
Maybe you can take that one step further and make a pull request, so we can continue the discussion around code?
Sure I can. I just wanted to get some buy-in before I put in any work, as I did on the TSDB, trying to work around it pushing data to disk and causing all of Prometheus to hang when disk writes block. I spent a couple of weeks working on that and the PR is still open 4 months later: prometheus-junkyard/tsdb#149 . However, if I do any work on this I'll try to avoid going into the selector/iterator and just extend the start of the interval by some fixed amount. Seeing how I have essentially already written the code for it a few comments up, it shouldn't be all that big an effort. It may take me a couple of days though, as I'm quite busy at work these days. |
OK, finally got an initial (but quite clean and complete) iteration done: #3760. There is one difference from my initial proposal (as also described in the PR), namely not to adjust the increase by The other thing I realized is that while Other than that, in my relatively limited testing I haven't found any major issues. You could see some jitter if your start/end time match "exactly" the scrape times (e.g. when scraping times are clustered around whole 5 second multiples and some fall just before, some just after and your start/end times are also multiples of 5 seconds) but that's essentially a given and no issue at all with smoothly increasing counters. |
I appreciate that you are looking at this issue. Unfortunately while considering your proposal I see multiple issues with it. We operate under a number of constraints, and as-is your solution cannot be adopted due to its knock-on effects and inconsistencies with the rest of PromQL. The constraints we are under include:
In addition, calling a function "exact" is misleading, as exact answers are not possible in metrics, only approximations. If you have suggestions for how to improve the existing rate() function, keeping in mind the above constraints, I'll be happy to consider them for Prometheus 3.0. I suspect though that you will find it more fruitful to try and tackle this in Grafana. The recommended way to use rate() is to provide a range that covers several scrape intervals. If, for example, you feel rate() should include data further in the past, you can expand the size of the range when writing your query. You also have the option of pulling the raw data via the API and performing whatever calculations you like. One of the reasons the idelta() function exists is to allow variants of rate() and friends to be implemented via recording rules without adding complexity to PromQL itself, so that's another approach to consider.
Wow, that was harsh... Let me address your concerns one by one, then. Even though, considering your reluctance to discuss the validity of my concerns or solution and the fact that you threw everything but the kitchen sink at me, I'm sure it will be in vain.
I thought that was what code (and design) reviews were for. To bring the code into a shape that is widely agreed upon, not as a rubber stamp for "adopting code as-is".
The only thing I can say to this is to then give functions that take range vectors a range vector that includes the last point before the range. Whichever functions don't need it, can skip it. Et voila!
My code never does that. I mentioned that in my initial post, but it was in neither my first pseudocode proposal nor the PR, and I explicitly and repeatedly pointed that out.
xrate() and friends behave exactly the same as rate() and friends (minus the extrapolation and possibly plus an extra point in the beginning). So to the extent that rate() is resilient to all of the above, so is xrate().
I don't know what to say to this except to suggest that one could replace the current rate() implementation; or replace irate(), because xrate() fully matches it over ranges of up to 2x the underlying resolution and in your very first reply you were arguing that hardcoding the range based on the underlying resolution is The Right Way™. Or deal with the fact that the current rate() implementation is broken and replacing it should probably not be done in one go.
I'm not sure what you mean by that, but again, xrate() and friends behave exactly the same as rate() and friends (minus A, plus B etc.). So xrate() deals with stale markers in the exact same way and to the exact same extent that rate() does.
"Maybe" and "so what"? This is indeed true for the current implementation of rate() and increase() but it doesn't mean it's a good/useful idea (see my comments above) or that this needs to hold for xrate() and xincrease(). Or, if this turns out to be the last standing strawman, I can definitely make it so, even though all it would do is introduce noise.
My code makes no changes to range vector semantics whatsoever. AFAICT I've extended the range of the iterators underlying range vectors, but the range vectors themselves work the exact same way as before. As proof of that, existing PromQL tests all pass without modifications. I did extend the range vectors used by the x***() functions, but that only applies to those functions (and is explicitly pointed out in the code as a hack that needs a proper solution). No breaking change.
I have no strong attachment to the name, I thought it was a good joke. I'd be perfectly fine with lame_rate() and lame_increase() as long as they did the right thing.
I feel like a broken record here, but the current rate() and particularly increase() functions cannot be fixed. As long as they are artificially limited to look only at points inside the range and attempt to extrapolate the data to the entire range, they are random number generators with the range and resolution of the underlying data as inputs. Also, when talking about Grafana (or common sense, for that matter) a rate()/increase() range query using the same value for the range and step will throw away data and replace it with estimates. In what world does that make sense? And the only 2 arguments I ever heard for that behavior are:
As for your suggestion to use idelta() and recorded rules (I'm guessing at the same resolution as scraped series), I'm sorry to break this to you, but you don't need idelta() for that. Or irate() or rate(). Pretty much all of Prometheus' functions can be reproduced with simple recorded rules, but (let me break another one) the whole point is to do it efficiently. Like I said, I can simply use |
I don't know enough about the Prometheus internals to comment on the particulars, but I do agree that having an "exact rate" function would be a very useful feature to have. [Edit - I misunderstood] I understand that a maintainer has to say "no" every so often, but I'd love to see this straightened out and merged!
To be completely fair, the "graphs change every time I refresh" problem should be fixed from the Grafana end, by optionally allowing you to align the start and end parameters of However, that won't change the fact that the rate() and increase() implementation introduces significant errors and (given Grafana's model of providing you with |
The Graphite issue is due to alignment if the sampling rate is higher than the evaluation rate, and what your merge request adds is the ability to compute "exact" rates without extrapolating, correct? We definitely need more docs, I'm confused :P |
I'm in the same boat as @leoluk as far as confusion/discussions go (with the difference that I still didn't grasp evaluation logic:)). |
Let's not use the term "exact". As Brian pointed out, it's still an approximation (evaluation is never exactly aligned with scraping, scraping is never precisely spaced etc.), except I would argue it's a more correct approximation because it doesn't do any extrapolation, indeed. Instead it only looks at actual data points and takes the difference between them as the increase, and that divided by the distance between the data points as the rate.

The current rate() implementation is way more complex, as it attempts to only look at datapoints that fall inside the specified range, but then it extrapolates them using various heuristics in order to smooth over the fact that it doesn't actually know what the increase over the requested range was.

As long as your rate/interval range is a multiple of the underlying timeseries' resolution, you will get jitter with any implementation as long as there is no alignment of the start and end times to multiples of the range, because the set of points that the rate/interval is computed over keeps shifting by one point. But with the proposed implementation no data is dropped. In order for your graph to be unchanging, just shifting to the left, you need to either align start and end or (with
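The "data moves between intervals but is never dropped" property can be sketched numerically (hypothetical helpers and made-up samples, not Prometheus code): if each window's increase is taken as the last sample in the window minus the last sample before the window, adjacent windows always tile the data, whatever their alignment relative to the scrape times.

```go
package main

import "fmt"

type sample struct{ t, v float64 }

// lastAtOrBefore returns the value of the last sample with timestamp <= t.
func lastAtOrBefore(samples []sample, t float64) float64 {
	v := samples[0].v
	for _, s := range samples {
		if s.t <= t {
			v = s.v
		}
	}
	return v
}

// windowIncrease computes the increase over (end-width, end] using the last
// sample before the window start as the baseline.
func windowIncrease(samples []sample, end, width float64) float64 {
	return lastAtOrBefore(samples, end) - lastAtOrBefore(samples, end-width)
}

func main() {
	// Counter scraped every 5s, increasing by 1 every 20s.
	var samples []sample
	for t := 0.0; t <= 120; t += 5 {
		samples = append(samples, sample{t, float64(int(t / 20))})
	}
	// Three adjacent 40s windows starting at an arbitrary, unaligned offset.
	total := 0.0
	for _, end := range []float64{47, 87, 127} {
		total += windowIncrease(samples, end, 40)
	}
	// The counter went from 0 to 6 over the period covered by the windows.
	fmt.Println(total) // 6
}
```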
I have explained the constraints that we are under; I'm afraid that I can only work with ideas that are in line with those constraints. I also believe that the claimed benefits of this proposal are overstated. For example it is claimed that it fixes aliasing, yet in various other places there are mentions of aliasing issues still arising (that's what that sawtooth is). This is unfortunately not the sort of problem that has a simple solution.
[ @beorn7, as the other half of the owners of this code, would you please be so kind and express an opinion? I understand you may not have the time, but you can definitely get the gist from the 80 lines of code in #3746 and Brian's comment from 3 days ago. If you have even more time, this one comment should cover my point, no need to go through the rest of the thread. ] OK, let's go back to the constraints you listed and go through them one by one. I'll leave the one you listed first for last, because I feel that is really the only point of contention.
Check.
Check. My aliasing (sawtooth) comment only applied to
Based on the multitude of issues raised both here and on Grafana, I would argue that any user who took a close look at the rate/increase values produced by Prometheus is way more confused than they would be by 3 rate functions. Or 10, for that matter. See the comment by @matejzero, above. Also, I am not insistent on whether it should be a new function or a replacement for the existing one. It's entirely up to the maintainers.
Check.
Can be if you insist, not a problem. Even though my preference is different and I don't really see why this is a constraint (other than that's how
Check. As mentioned above, my proposed change passes all existing PromQL tests without modification, meaning (among other things) the range vector semantics are the same. Or it could mean that none of the tests actually verifies vector semantics, which I would find hard to believe.
And we finally get to the one argument that actually merits a discussion. All I'm arguing for is that functions should be given one extra point before the beginning of the range of the interval. They may choose to use it or not. In effect, the signature of xrate should really be Looking at it from a different angle, the current implementation of rate cannot reliably produce a rate for anything shorter than 2x the resolution of the underlying data, precisely because of this one limitation. But
No, they are not. Let's go through each:
The code I'm currently proposing never produces fractions for increase() when the underlying counter is effectively an integer. I.e. you never ever get 1.5 requests over any time period.
There isn't.
There is none. What I understand by jitter is for spikes to suddenly appear or disappear because information that was ignored (the increase between points in 2 different intervals) is now suddenly taken into account. All data is always taken into account. It may move from one interval into the next and the corresponding increase will be subtracted from one and added to the other, but never appear and disappear. Just basic sampling. And it never happens with a range equal to 1x sampling resolution. Never ever. Not even when sampling failed a few times. It will with the current implementation, though.
Check. That was the whole point of the exercise.
Check. It won't produce a value when there are zero points in the range, but that could also be changed. You will get one value for every data point of the underlying.
Check.
See my comment above. That sawtooth I mentioned only occurred for
But it does. That's the whole point. It's not a perfect solution (simply because there are more ways of looking at what should take priority, e.g. accurate vs. smooth increase) but simple it definitely is. Both in terms of concept -- you can cover all increases in an interval with a step equal to the range -- and implementation -- take the value difference between last and first, adjusted for counter resets, and divide by the time difference. The current implementation is anything but.
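That calculation (last minus first, plus a correction at each counter reset, divided by the elapsed time) fits in a few lines; a sketch with made-up samples, not the actual Prometheus implementation:

```go
package main

import "fmt"

type sample struct{ t, v float64 }

// resetAdjustedIncrease returns the difference between the last and first
// sample values, adding back the pre-reset value every time the counter drops.
func resetAdjustedIncrease(samples []sample) float64 {
	correction := 0.0
	last := samples[0].v
	for _, s := range samples[1:] {
		if s.v < last {
			// Counter reset: the value reached before the reset is added back.
			correction += last
		}
		last = s.v
	}
	return samples[len(samples)-1].v - samples[0].v + correction
}

func main() {
	// Counter climbs to 7, resets to 0, climbs to 4: total increase is 9.
	s := []sample{{0, 2}, {5, 7}, {10, 0}, {15, 4}}
	increase := resetAdjustedIncrease(s)
	fmt.Println(increase)            // 9
	fmt.Println(increase / (15 - 0)) // the rate over the 15s spanned: 0.6
}
```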
We want functions that output results that make sense. In the real world you have me explaining every day to my ~200 colleagues who have their eyes glued to the corporate Grafana and who expect big round integers in the graphs.
What good do they do? If those "constraints" come into disagreement with real-world use cases, you'd think it's time to reconsider the "constraints".
FYI, I am actually maintaining a Prometheus fork at free/prometheus, which simply adds the x-prefixed functions discussed above.
I found this topic hard to parse before, but the examples above have convinced me. If the fork will continue to actively track mainline Prometheus, I plan to make the switch with our next infrastructure upgrade. Thanks for maintaining it! It would be quite sensible to see this additive enhancement in the mainline product, though. I urge the Prometheus maintainers to reconsider this, especially as in its xrate/xincrease form it does not affect any existing non-exact measurement logic.
Reading this thread is like if someone gave you an easel and paints and you're like "cool! now i can paint all the stuff" but sometimes you need to paint trees and there's no green, and so you're like "hey, could maybe we add some green, i made a green that works pretty good" and get told "no, green doesn't vibe with our rules about color that we made up please just mix yellow and blue or use a green canvas" So, yeah, we can do a bunch of stuff to make rates behave in a predictable, expected manner, but really prometheus should just have green. |
Just piling on to say that I'm switching over to the @free fork to address similar issues I've been encountering with historical data 'changing' between refreshes in grafana. I find it pretty disappointing that @brian-brazil and @beorn7 can't seem to bring themselves to accept that there is a legitimate problem that many users have, with an accepted fix that has to be maintained out-of-tree due to what appear to be solely philosophical issues. How many folks need to switch to using a fork before y'all accept that this is an actual problem that your users want solved?
Signed-off-by: Sylvain Rabot <s.rabot@lectra.com>
Wait... I will have to install a fork to get a reasonable rate-of-change computation? That's inconvenient and I probably will just put up with the current behavior... It makes good sense that you need N+1 samples to compute rates at N points spaced at the polling interval without losing information. While it's true that Prometheus is not about drawing graphs, it sounds like the current behavior makes even that basic use case unreliable. This design constraint about not including 1 more point outside the range should be reexamined, I think. What exactly does it protect?
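The N+1 observation is plain fencepost arithmetic; a minimal sketch (hypothetical helper, made-up values):

```go
package main

import "fmt"

// ratesFrom computes a rate at each of the N sample points after the first:
// N rates require N+1 samples, because each rate needs the sample one
// polling interval earlier as its baseline.
func ratesFrom(values []float64, interval float64) []float64 {
	rates := make([]float64, 0, len(values)-1)
	for i := 1; i < len(values); i++ {
		rates = append(rates, (values[i]-values[i-1])/interval)
	}
	return rates
}

func main() {
	// 5 samples at a 10s interval yield exactly 4 loss-free rates.
	values := []float64{0, 10, 25, 25, 40}
	fmt.Println(ratesFrom(values, 10)) // [1 1.5 0 1.5]
}
```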
The main issue is that a Prometheus range only returns the samples inside the range. We would need another notation to make this happen:

vs

but that does not feel good
@free 's patch proves that it does not have to. |
That is why that implementation is not correct. Because in its implementation, the range vector ends up including a sample from before the start of the range.
Understood, it still only does that in a context where it's needed. In any case, this has been an issue a lot of people have complained about for a long time now, and we did not see anyone else propose something acceptable to the maintainers' standards, let alone implement it. So you say it's not correct; we say it works and does what we expect it to do.
I do not think anyone proposed the inclusion in promql of a syntax |
You're right, but as you said |
And unless I'm mistaken, if we take the mathematics notation of intervals, the current equivalent of the promql range is the exact opposite of the mathematical notation, in addition to looking weird.
I do not think the latter necessarily follows from the former. You could make a special case for |
I would argue it's not.

Intervals, as currently defined, work perfectly for gauges: e.g. if I want the average memory usage over 5 minutes, then all relevant samples for that calculation are contained strictly within that 5 minute interval. But with counters (as pointed out in all Prometheus documentation, slides, blog posts and videos) the relevant information is not the absolute value, but the change in that value (i.e. I don't care how many seconds the CPU was in use since whenever the process started, I want to know how many seconds it was in use since the previous scrape; or over the previous 5 minutes). But because counters are (understandably) stored by Prometheus just the same as gauges (i.e. the absolute value is stored rather than the increase), it follows that computation using counters needs to handle the samples differently from computation using gauges.

So (to make a parallel to the average memory usage over 5 minutes example above) computing the average CPU usage over the previous 5 minutes requires all counter increases over the previous 5 minutes, not merely the counter's absolute values over those 5 minutes. And just as you can't possibly compute the last counter increase by looking at the last sample only, you can't compute the first counter increase in a 5 minute range by looking at the first sample in that 5 minute range only. So computing all counter increases over 5 minutes requires all samples in that 5 minute range plus the previous sample. Which I take to mean that an interval is currently defined consistently in terms of Go code (all samples within a given time range) but not consistently in terms of the information needed by gauge and counter calculations.

[I am aware that Prometheus does not internally keep track of which timeseries is a gauge and which is a counter. But fortunately Prometheus functions are clearly identified as "gauge functions" and "counter functions", so each function could be given the kind of range it needs.]
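The asymmetry argued above can be sketched as follows (toy helpers and numbers, not PromQL): a gauge aggregation over a window needs only the in-window samples, while a counter increase over the same window also needs the last sample before it.

```go
package main

import "fmt"

type sample struct{ t, v float64 }

// avgOverWindow averages the samples strictly inside (start, end]:
// gauge semantics, no sample outside the window is required.
func avgOverWindow(samples []sample, start, end float64) float64 {
	sum, n := 0.0, 0
	for _, s := range samples {
		if s.t > start && s.t <= end {
			sum += s.v
			n++
		}
	}
	return sum / float64(n)
}

// increaseOverWindow needs one sample from outside the window: the last
// sample at or before the window start, as the baseline for the first
// in-window increase.
func increaseOverWindow(samples []sample, start, end float64) float64 {
	baseline, last := 0.0, 0.0
	for _, s := range samples {
		if s.t <= start {
			baseline = s.v
		}
		if s.t <= end {
			last = s.v
		}
	}
	return last - baseline
}

func main() {
	s := []sample{{0, 10}, {5, 12}, {10, 15}, {15, 21}}
	// Gauge semantics: only the samples at t=10 and t=15 matter for (5, 15].
	fmt.Println(avgOverWindow(s, 5, 15)) // 18
	// Counter semantics: the t=5 sample outside the window is still needed.
	fmt.Println(increaseOverWindow(s, 5, 15)) // 9
}
```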
The problem
[Feel free to skip ahead to the proposed solution, as I'm guessing the problem is widely understood. Or just read the last couple of paragraphs for a real-life example of the problem and my current workaround.]
I've been trying to deal with the errors introduced by the `rate()` and `increase()` functions for so long that I keep going back every few months and spend a couple of hours tweaking Grafana dashboards before I remember that there is literally no way of getting an accurate result out of them short of using recorded rules for everything and, even in that case, working around and against Prometheus.

To clarify, let's assume I have a monitoring configuration with scrapes every 5 seconds and rule evaluation every 10 seconds. It collects among other things a `request_count` counter that increases by 1 every 20 seconds. Let's also assume that I would really like a Grafana graph that displays an accurate number of requests over time, on a dashboard with an adjustable time range. (I.e. I want to be able to look at 15 minutes, 1 hour and 7 days on the same dashboard, with Grafana automatically adjusting the resolution.) Not a tall order, right?

For the purpose of clarity, here is how the `request_count` samples would look (at a 5 second resolution):

The naive approach is you add a graph and put in an expression along the lines of:
and set Min step to `10s` (I would prefer a Min step of `5s`, since the resolution of the underlying data is 5 seconds, but let's not even go there yet). Ideally the graphed values (with a 10 second resolution) would look something like:

What I'm getting instead, depending on the exact time I reload the page, is one of:
or
This is because what `increase()` ends up doing is it calculates the increase between disjoint pairs of values (i.e. `(v0, v1)`, `(v2, v3)`, `(v4, v5)`), losing half of the data in the process, and extrapolates it to 10 seconds, ending up with double the actual increase. At 15 seconds resolution it only throws away a third of the data and the increases it doesn't throw away get multiplied by 1.5 due to the extrapolation. As the resolution decreases, less data gets thrown away and the multiplier gets closer to 1, but there is, as I said before, literally no way to get an accurate `rate()`/`increase()` value in a range query directly from a counter. `rate()` is not affected by the multiplier issue, but still suffers from thrown away data and noise.
is not affected by the multiplier issue, but still suffers from thrown away data and noise.Even with less discrete data, such as CPU usage, the graph jumps every 5 seconds between one shape and another. In particular, if I'm looking at Prometheus' own CPU usage, because my rule evaluation happens every 10 seconds but takes less than 5 seconds, the graph keeps jumping between an almost flat 10% (when the pairs picked don't include rule evaluation) and an almost flat 20% (when the pairs do include rule evaluation).
The workaround I've come up with is to compute a recorded rule `request_count_increase10s` that I can then `sum_over_time()`, which is actually computed as:

so all increases in `request_count` actually get included in the calculation (because every 15 second range includes 3 data points, i.e. 2 intervals). But even with this cumbersome solution, I still have to fight against the implementation (the `/ 1.5` part) and the extrapolation will introduce unnecessary noise whenever the underlying timeseries neither increases constantly nor looks like a staircase.

Proposed solution
In an ideal (but by no means unattainable) world, what would happen instead of extrapolation is ... wait for it ... interpolation. I.e. just before the matrix for the underlying samples is evaluated in `promql.functions.extrapolatedRate()`, the range could be expanded so it included one point before the start and one point after the end of the interval. This could be done e.g. in `MatrixSelector`/`Engine`/`SeriesIterator` (e.g. as a `SeekBefore(t)` or `Previous()` function).

With the points immediately outside the interval now available there are 2 possible approaches:
The latter has the advantage that there are no weird fractional values where there shouldn't be any, e.g. if you're computing the increase in the number of requests over an interval.
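The fractional-value point can be illustrated with toy numbers (the helper name is made up): extrapolating a raw difference of exactly 1 request to a range longer than the sampled interval yields a non-integral "increase", while the plain difference between real samples stays an integer.

```go
package main

import "fmt"

// extrapolatedIncrease scales a raw sample difference up to the query range,
// schematically mirroring the current increase() heuristic (toy version).
func extrapolatedIncrease(rawDelta, sampledInterval, rangeSeconds float64) float64 {
	return rawDelta * rangeSeconds / sampledInterval
}

func main() {
	// The counter rose by exactly one request; its samples span 10s but the
	// query range is 15s.
	rawDelta := 1.0
	fmt.Println(extrapolatedIncrease(rawDelta, 10, 15)) // 1.5 "requests"
	fmt.Println(rawDelta)                               // the plain difference: 1
}
```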
I am willing to put in the legwork if anyone thinks it makes sense. To me it makes a lot of sense to use actual data instead of fabricated (extrapolated) data when computing rates/increases (which, I don't have to point out but I will, is the whole reason for the existence of counters). But it's possible it's only me.
Additional thoughts
If you're not tired by now, there are a couple more things I should add.
For one, I am aware that one could hardcode the `increase` range into the Grafana dashboard and adjust the result accordingly, but over a high enough range this will result in even worse data sampling, as you'd now be seeing `increase(foo[15s])` with a 30 minute resolution. Sometimes this is acceptable, most of the time it's not.
with a 30 minute resolution. Sometimes this is acceptable, most of the time it's not.Second, there is another workaround that only works for
increase(foo[10s])
, namely usingfoo - foo offset 10s
. But there are a couple of problems with this approach: it doesn't handle counter resets and it's twice as slow because now Prometheus looks upfoo
twice. (Or, for whatever reason, it's twice as slow asincrease()
, even with a 5 minute range over samples with 5 second resolution.)There is yet another Grafana workaround, which is to define a min step (graph resolution) of e.g.
5s
and an interval template variable with a minimum value of10s
and try to guesstimate theStep count
for the template variable so that it stays at about 2x the graph resolution. But then you can't even apply the 1.5 or 2 factor, because it varies with resolution, not to mention the whole thing is screen resolution dependent.The text was updated successfully, but these errors were encountered: