Align queries to prometheus with the step #10434
Conversation
Huh, ok, I've got some tests to fix :)
Updated to fix the CI failures, which I'm confident will pass now (they did in my dev env).
Codecov Report
@@ Coverage Diff @@
## master #10434 +/- ##
=======================================
Coverage 49.79% 49.79%
=======================================
Files 312 312
Lines 22096 22096
Branches 1125 1125
=======================================
Hits 11003 11003
Misses 10452 10452
Partials 641 641
Hm.. maybe this should be done globally for all data sources, in https://github.com/grafana/grafana/blob/master/public/app/core/utils/kbn.ts#L160
Seems like a reasonable concept. Are you suggesting that calculateInterval should adjust range.from and range.to?
no, just align interval to be an even multiple of min interval
I'm afraid I don't understand how that would help. The problem that my patch fixes is not the interval size, it's whether the from/to are integer multiples of the interval.
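For concreteness, here is a minimal sketch of what "from/to as integer multiples of the interval" means. The PR's actual `clampRange` implementation differs in form (it shifts on jitter), but the effect on an unaligned range is the same:

```typescript
// Sketch only, not the PR's code: align both endpoints of [start, end]
// (in seconds) forward to the next integer multiple of `step`, so
// repeated refreshes hit the same sample boundaries.
function clampRange(start: number, end: number, step: number) {
  return {
    start: Math.ceil(start / step) * step,
    end: Math.ceil(end / step) * step,
  };
}
```

An already-aligned range passes through unchanged, which is why consecutive refreshes within the same step produce identical queries.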
My 2c (feel free to ignore): I believe there might be value in making start and end alignment default, but optional. I fully agree with the fact that rate/increase graphs jumping all over the place on refresh is a problem. (I'm actually trying to fix the other end of this, with little to no success: prometheus/prometheus#3746.)

That being said, I can think of a couple of reasons why one might not want aligned data. For one, the most recent value will always reflect a partial result. E.g. if you have a bar graph with 1 hour resolution (to take an extreme example), the last bar will always start at zero and fill up as the hour goes by. With a line graph (assuming an otherwise constant rate/increase) the line will be horizontal except for the last point, where it will go down, basically reflecting where in the middle of the current step we are. Second (and even more speculative), clamping the start and end points to a multiple of `step` means the query no longer covers exactly the range the user asked for.

Like I said, feel free to ignore though. It may not be worth the added UI clutter and code complexity.
Oh, I just realized that I wrote all that comment on the assumption that Prometheus uses my proposed rate/increase implementation. With the current implementation (which always throws away the increase between adjoining intervals, iff you're requesting a range equal to the step), aligned queries will consistently under-report the actual increase.

Not Grafana's or this PR's fault, but it should probably be taken into consideration as it's more serious than either of my rather philosophical points from the previous comment.
I'm cleaning up that merge (I introduced a variable conflict), and a failing test.
I was just reviewing this. I think this can be merged soon but I would like to clean up the code a bit. The clampRange function is called twice in the query function, which looks a bit clunky. Maybe this can be done in the createQuery function, with start/stop added to the query object. That would make it possible to reuse it in the response handling code.

Also, I'm not a big fan of how the transformerOptions object was changed by removing the label names. I think that makes it harder to read. https://github.com/grafana/grafana/pull/10434/files#diff-a59431ca1f1f94cdb3ae176c50a585b2L152
I think this needs some rebasing.
I would like to point out again that while this will indeed produce consistent results when used with `$__interval` as the rate interval, because of Prometheus' buggy implementation of `rate`/`increase` it will also consistently drop the increase between adjoining intervals.

Please consider making this optional, as it's not a solution to (Prometheus', not Grafana's) problem, but merely a workaround for the annoyance of graphs jumping around.
@free I'm curious where this would hide anything. Could you point me to the relevant lines in promql/engine.go or promql/engine_test.go with a concrete example? In general, we're committed to showing what's expected when a user wants to see the last n minutes of data of a time series. As a visual frontend we care less about the data that's there, but rather about its ease of interpretation in the most frequent use case, even if it means modifying date ranges to accommodate p8s' implementation "quirks". Wouldn't you be able to see the non-clamped data in the original Prometheus UI?
There are 2 Prometheus issues -- prometheus/prometheus#3806 and prometheus/prometheus#3746 -- and a Prometheus Developers mailing list thread -- https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!topic/prometheus-developers/B_CMEp40PHE -- on the subject. Unfortunately the Prometheus developers don't agree that it's an important issue to solve, so it doesn't look like it will get fixed anytime soon. Essentially the problem is that Prometheus only looks at the points falling in the specified range when computing a `rate` or `increase`, so any increase between the last point of one interval and the first point of the next is silently dropped.

And yes, you would be able to see the data in the Prometheus UI, but most people won't bother (or won't be aware that they can -- e.g. I did not consider that option and I've worked with Prometheus and Grafana quite a bit over the past year). That's why I would (personally) prefer if this was an option, but I fully understand that not everyone has the same priorities, so I'm merely asking nicely.
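To make that boundary loss concrete, here is some illustrative arithmetic (not Prometheus code): a counter sampled every 10s at an offset from the window boundaries, with per-window increases computed only from the samples inside each window, as Prometheus does.

```typescript
// Illustrative only: a steady counter sampled at t = 1, 11, 21, 31
// with values 0, 5, 10, 15.
const samples: Array<[number, number]> = [[1, 0], [11, 5], [21, 10], [31, 15]];

// Increase computed from only the samples inside [from, to], which is
// how Prometheus' range-vector window works.
function increaseInWindow(from: number, to: number): number {
  const inWindow = samples.filter(([t]) => t >= from && t <= to);
  if (inWindow.length < 2) return 0;
  return inWindow[inWindow.length - 1][1] - inWindow[0][1];
}

const trueIncrease = increaseInWindow(0, 40);                      // 15
const summed = increaseInWindow(0, 20) + increaseInWindow(20, 40); // 5 + 5 = 10
// The 11 -> 21 increase (5) crosses the window boundary and is lost.
```

The two adjoining 20s windows sum to 10 while the true increase over the full range is 15; the missing 5 is exactly the increase that crossed the boundary between them.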
const startJitter = start % step;
const endJitter = end % step;
// Shift interval forward on jitter
if (startJitter || endJitter) {
This looks unnecessarily complex. What should happen (in my opinion) is for `end` to be rounded up to a multiple of `step` (so the range always includes `end`, which is most often the wall time) and `start` to be a fixed number of steps away from `end`, so that if `end - start` is not a multiple of `step` you don't end up flipping between N and N+1 data points on the graph. I.e.
const clampedEnd = Math.ceil(end / step) * step; // Round up
const clampedRange = Math.ceil((end - start) / step) * step; // Also round up the range length
return {
  end: clampedEnd,
  start: clampedEnd - clampedRange,
};
Not sure I follow. Given `step = 3`, `start = 1` and `end = 5`, prometheus returns 2 datapoints:

http://localhost:9090/api/v1/query_range?query=1&start=1&end=5&step=3

My code clamps to `start = 3`, `end = 6` and returns 2 datapoints. Yours clamps to `start = 0`, `end = 6` and returns 3 datapoints.

Could you explain the benefits of your approach?
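The two proposals can be compared side by side with a small sketch (hypothetical function names; neither is the PR's literal code). Prometheus returns one point per step in `[start, end]`, so the clamped endpoints determine the datapoint count:

```typescript
// This PR's approach: shift each endpoint forward to a multiple of step.
function clampBoth(start: number, end: number, step: number) {
  return {
    start: Math.ceil(start / step) * step,
    end: Math.ceil(end / step) * step,
  };
}

// The reviewer's approach: round end up, then keep a fixed number of
// steps between start and end.
function clampKeepLength(start: number, end: number, step: number) {
  const clampedEnd = Math.ceil(end / step) * step;
  const clampedRange = Math.ceil((end - start) / step) * step;
  return { start: clampedEnd - clampedRange, end: clampedEnd };
}

// Prometheus returns one datapoint per step in [start, end].
const points = (r: { start: number; end: number }, step: number) =>
  Math.floor((r.end - r.start) / step) + 1;

points(clampBoth(1, 5, 3), 3);       // 2 datapoints (start = 3, end = 6)
points(clampKeepLength(1, 5, 3), 3); // 3 datapoints (start = 0, end = 6)
```

This reproduces the numbers in the comment above: both round `end` up to 6, but fixing the range length pulls `start` down to 0 and yields one extra point.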
Well, if you want to mirror Prometheus in the number of points, you can always round the range down rather than up (because that's essentially what Prometheus does). I was instead going for covering the whole requested range (focusing particularly on the end of the range so you don't withhold that information until a whole `step` has passed).

The deeper problem I noticed was that your code will indeed return 2 data points with `step = 3`, `start = 1` and `end = 5`; but one second later, when `start = 2` and `end = 6`, it will return 3 data points (`3`, `6` and `9`), with the former leaving out the data at `2`, which was "requested", and the latter unlikely to ever have any data (assuming the wall time is `6`).
@@ -146,8 +145,7 @@ export class PrometheusDatasource {

   var allQueryPromise = _.map(queries, query => {
     if (!query.instant) {
-      let range = this.clampRange(start, end, query.step);
-      return this.performTimeSeriesQuery(query, range.start, range.end);
+      return this.performTimeSeriesQuery(query, query.start, query.end);
At this point you may simply pass `query` only as parameter and retrieve `start` and `end` as `query.start` and `query.end`. Then you also won't need to create the `data` object in `performTimeSeriesQuery()`, simply pass through the `query` object as it already has all the fields it needs.
Yeah, I had that in an earlier version. I quite like to make that argument dependency explicit in the func signature.
I think this is complete now. I added some tests for `clampRange`. We had a discussion about making this optional and decided to hold off and get community feedback. Hopefully there will be enough until the next release is due. I'm removing myself from reviewing since I now added some code. Lastly, to test the effect, I recommend using a rate-based series.
I feel it necessary to point out that even though I'm suggesting improvements, I still don't like this change. :o)
@@ -25,7 +25,8 @@
     placeholder="{{ctrl.panelCtrl.interval}}" data-min-length=0 data-items=100 ng-model-onblur ng-change="ctrl.refreshMetricData()"
   />
   <info-popover mode="right-absolute">
-    Leave blank for auto handling based on time range and panel width
+    Leave blank for auto handling based on time range and panel width. Note that the actual dates used in the query might be
+    adjusted to fit the step.
s/might be adjusted to fit/will be adjusted to match/
Or better yet, "will be adjusted to a multiple of". It's clearer both on how the adjustment is made and on the fact that it's not optional.
👍
Please rebase onto master before merging so all CI steps can pass.
* only increase interval by step if jitter happened
* shift both start and end
* simplified tests by using low epoch numbers
This should probably be mentioned in the changelog.
@davkal can you add a note about this change, with a link to this PR, to our changelog?
Done. |
Aligns the start/end of the query sent to prometheus with the step, which ensures PromQL expressions with 'rate' functions get consistent results, and thus avoids graphs jumping around on reload.
Related to some of the later issues discussed in #9705, and repeatedly in various other places.
Works best combined with using $__interval as the rate interval, to avoid sub-sampling (step > sample interval), but has merit in itself. The two together are the full fix for the 'my rate-based graphs are inconsistent and change at every reload' problem.