
Sub query support for range selection. #1227

Closed
fabxc opened this issue Nov 18, 2015 · 46 comments

Comments

@fabxc
Contributor

fabxc commented Nov 18, 2015

Currently one can only do range selection on vector selectors and has to use recording rules to range-select expressions.

@brian-brazil
Contributor

An interesting question is how we choose the interval for this; currently it's not needed, as everything is calculated at a point. We should also look at how to discourage/prevent use of this in rules/alerts on performance grounds.

@grobie
Member

grobie commented Dec 24, 2015

👍

@Andor

Andor commented Apr 29, 2016

Currently we can run a query like rate(hit[1m]) + rate(miss[1m]), which is not very convenient.
IMHO a better syntax would be something like this: rate(hit+miss [1m]) or rate((hit+miss) [1m])

@brian-brazil
Contributor

That's incorrect usage: you need to do the rate separately first, or you'll get the wrong answer around counter resets. I'd also suggest that the exporter expose either total & hit or total & miss, as that's a bit easier to work with.
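
(To illustrate with made-up numbers: if hit resets from 100 to 0 while miss stays at 50, the combined series hit + miss drops from 150 to 50; rate() over the combined series treats that drop as a counter reset and counts an increase of 50, whereas taking the two rates separately correctly counts an increase of roughly 0.)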

@guoshimin

Being able to run range queries on arbitrary expressions is valuable for ad-hoc exploration, either for sanity checking before creating recording rules, or for analyzing past data where you didn't already have recording rules.

@roman-vynar
Contributor

Leaving one of the references to a discussion on this topic: https://groups.google.com/forum/#!topic/prometheus-users/JOVfqQsVRl8

@virusdave

Is there any ticket tracking this feature request (historical function evaluation)?

@brian-brazil
Contributor

This is the issue for it. I wouldn't be too hopeful of it ever getting implemented.

@virusdave

OK. Is that due to a philosophical issue, or an engineering-resource-scarcity problem?

@brian-brazil
Contributor

It's a semantic issue, and an operational one. It's unclear how to do this correctly, and even then its use would have to be greatly limited for safety.

@pparth

pparth commented Sep 26, 2017

I do not understand why this is not considered important at all. The requirement is simple: we want to generate a range vector from a function that returns an instant vector. For example, I want to execute something like:

sum_over_time(((sum(rate(server_request_hit{service_name=~"[[serviceName]]",http_response_status=~"5.."}[1m]))) / (sum(rate(server_request_hit{service_name=~"[[serviceName]]"}[1m]))) * vector(100)) > bool vector(0.1))

The inner query returns a time series with a value of 1 when the underlying value is > 0.1 and 0 when it is <= 0.1. This is how I calculate failure incidents. Then I just want to know how many seconds these incidents lasted in the selected time period. To do that, I need to apply the sum_over_time function, which is not possible because the inner query is not considered a range vector.
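
With the [<range>:<resolution>] subquery syntax discussed later in this thread, a rough sketch of this could look as follows (the 1h range and 15s resolution are placeholder values, and the trailing multiplication by the resolution converts the sample count into seconds):

sum_over_time(
  (
      sum(rate(server_request_hit{service_name=~"[[serviceName]]",http_response_status=~"5.."}[1m]))
    /
      sum(rate(server_request_hit{service_name=~"[[serviceName]]"}[1m]))
    * 100 > bool 0.1
  )[1h:15s]
) * 15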

In fact, Netflix Atlas allows this kind of operation without any issues.

@brian-brazil
Contributor

That would be an inappropriate usage of this feature, what you want is a recording rule.

@pparth

pparth commented Sep 26, 2017

The way this is supported today is through a recording rule, but that is not what I want. I cannot have a recording rule for each step of my query where an instant vector is generated, and this is just for a single query. Also, it is too difficult for everyone who wants to create a dashboard to have to create a recording rule first; it is not that easy. Adding recording rules is an administrative process now, and it is definitely not suitable for people who just want to create more elaborate queries for their dashboards.

@jeeyoungk
Contributor

What would it take to implement this? I'm willing to jump through the hoops to make this happen. This makes a lot of the x_over_time() functions useless when you have a lot of counters, as you almost always want to apply rate() first.

@brian-brazil
Contributor

See https://www.robustperception.io/composing-range-vector-functions-in-promql/

Most of the over_time functions make no sense with counters.

@juliusv
Member

juliusv commented Jul 20, 2018

Yeah, the x_over_time() use cases come up often, and obviously it's not nice to have to use recording rules as an intermediary step every time during ad-hoc exploration.

I'm also undecided on adding this feature, but let's say we did want it: how would we determine the subquery evaluation interval? I think we could go one of two routes here:

  • Introduce a new range syntax like [<range>:<resolution>], so rate(foo[1m])[10m:15s] would mean: evaluate rate(foo[1m]) from 10 minutes ago until now, at a 15s resolution.
  • Introduce a new function like subquery(<expression>, <resolution>), so subquery(rate(foo[1m]), 15) would do the same and return a range vector.
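
To make option 1 concrete, a minimal sketch (assuming foo is a counter, with made-up range and resolution values): the following would return the highest 1m rate seen over the last 10 minutes, with the inner rate re-evaluated every 15 seconds.

max_over_time(rate(foo[1m])[10m:15s])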

@brian-brazil
Contributor

I was pondering the 2nd, if we ever add this. At the very least we'd want more protections against abusive queries first.

@roidelapluie
Member

In https://docs.google.com/document/d/1-C5PycocOZEVIPrmM1hn8fBelShqtqiAmFptoG4yK70/edit# the consensus is to adopt option 1.

Some notes:

I think it is fine to default the subquery resolution to the global evaluation interval, but I would rather still keep the ":" separator:

rate(sum(foo)[5m:]) rather than rate(sum(foo)[5m]), for two reasons:

  • the : would signal that you know what you are doing; most cases should not require subqueries.
  • rate(foo[5m]) and rate(foo[5m:1m]) would be different, as the latter would have a chance to "hit" the range boundaries.

Also subqueries would solve #3806 if we can do: rate(min without() foo)[5m:1m]


@codesome
Member

@brian-brazil I would like to work on this if we are keen on adding it now.

@brian-brazil
Contributor

Sure, one thing is that the generated data will need to be aligned independently of the query parameters.

@sumkincpp

This feature is really important for being able to work with missing data.

For example, this one computes adjusted exporter availability:

avg_over_time(((absent(up)-1) or up)[365d])*100
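
With the [<range>:<resolution>] syntax under discussion, this would presumably become something like the following (leaving the resolution empty to use the default):

avg_over_time(((absent(up) - 1) or up)[365d:]) * 100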

@henridf
Contributor

henridf commented Sep 26, 2018

@juliusv catching up and trying to understand option 1. So for the use case from the above-linked discussion (https://groups.google.com/d/msg/prometheus-users/JOVfqQsVRl8/PLA0hGAMCgAJ), would the query in option 1 syntax be this:

min_over_time( rate(mysql_global_status_questions[2s])[60s])

And returning to your example, is it correct that rate(foo[1m])[10m:15s] would give the same result as a request to the query_range endpoint with the following params?

  • query=rate(foo[1m])
  • start=<now - 10m>
  • end=<now>
  • step=15s

@brian-brazil
Contributor

Almost: to preserve the invariant that query_range is syntactic sugar on top of query, the start time should be aligned with the step.
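
(For illustration, assuming the subquery evaluation timestamps are aligned to absolute multiples of the resolution: evaluating foo[1m:15s] at t=100s would evaluate foo at t=45s, 60s, 75s and 90s, i.e. at timestamps that are multiples of 15s, rather than at t=40s, 55s, 70s, 85s and 100s.)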

@henridf
Contributor

henridf commented Sep 26, 2018

so in theory at least, this could eventually remove the need for query_range?
(not sure about others but I was confused by the existence and role of query_range when first discovering the http api)

@brian-brazil
Contributor

No, I'd expect instant queries to still not return range vectors, and subqueries don't support arbitrary start points.

@henridf
Contributor

henridf commented Sep 27, 2018

Instant queries can already return range vectors...

@free
Contributor

free commented Dec 18, 2018

I see one issue with the proposed implementation: in the same way that I think rate/increase are currently broken (they don't consider the sample just before the range, see #3806), subqueries will now break the *_over_time functions (and changes, resets).

Assuming a subquery range equal to the step, a max_over_time subquery calculation will now include the value of the last sample before the range in the calculation. So the value of the last sample before each step will be included in the calculation of that step and the next.

Put another way, rather than "fixing" rate and increase, this proposed change will merely make all functions that take range vector parameters behave differently (and produce different results) in a query vs. in a subquery.

The only way I can see for fixing this inconsistency would be to drop/ignore the (calculated) sample at the start of each range, unless there happens to be an actual sample with that exact timestamp (which is what ranges do currently). I for one would rather live with the inconsistency than forever give up hope of a decent rate implementation, but I still think it is worth bringing up the issue.

@brian-brazil
Contributor

If that were the case it'd be a bug, but looking at the PR and unittests I'm not seeing that issue. Can you comment where you're seeing that over on the PR?

@free
Contributor

free commented Dec 18, 2018

I'm not seeing it in the PR because the series used in the tests all have samples at 10-second multiples and the resolutions are also 10-second multiples. But if you used samples that were not aligned with the resolution, and the increase between samples was no longer constant, you'd get a different result from rate[1m] and rate[1m:10s] (assuming the series started before the 1m10s range and ended after it).

E.g. something like this (back of the envelope calculation, not actually tested against the PR):

# Fibonacci sequence, to ensure the rate is not constant.
load 7s
  metric 1 1 2 3 5 8 13 21 34 55 89 144 233 377

# Extrapolated from [3@21, 144@77]: (144 - 3) / (77 - 21)
eval instant at 80s rate(metric[1m])
  {} 2.517857143

# No extrapolation, [2@20, 144@80]: (144 - 2) / 60
eval instant at 80s rate(metric[1m:10s])
  {} 2.366666667

# Only one value between 10s and 20s, 2@14
eval instant at 20s min_over_time(metric[10s])
  {} 2

# min(1@10, 2@20)
eval instant at 20s min_over_time(metric[10s:10s])
  {} 1

@brian-brazil
Contributor

brian-brazil commented Dec 18, 2018

That test passes. That's what I'd expect though semantically, and the subquery examples here aren't sane.

min_over_time(rate(metric[5m])[1h:1m]) is more what I'd expect to see.

@brian-brazil
Contributor

@codesome Can you add the above tests, with just a comment indicating that using subqueries unnecessarily is unwise?

@free
Contributor

free commented Dec 19, 2018

That's what I'd expect though semantically, and the subquery examples here aren't sane.

I'm not sure whether I should feel offended or not. I'll go with the latter and ask you to please elaborate on that statement.

Yes, the ranges are rather short and cover only a couple of samples, but that's not my point. (Had I gone for longer ranges with more samples, I would have run out of bits for the Fibonacci numbers.)

My point was that all functions that take range vectors will include one extra sample when used in a subquery (as opposed to when used in a regular query). Which was what I was arguing for in #3806 (for the rate and increase functions) and got shot down because it would introduce inconsistencies in the definition of a range.

This is most obvious when using min_over_time (rate is very much an approximation anyway, so no one would expect the values to match). min_over_time will consider this extra point from before the range, which may affect the outcome, in that some values will be included in the calculation of N+1 samples (as opposed to N samples with a plain query), where N is the ratio of range to resolution. This may be considered negligible in your example (1 hour range with 1 minute resolution) but the closer the range and resolution get to each other, the higher the likelihood of error.
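
(To put numbers on it: with a 30s range at 10s resolution, N = 3; a subquery-generated sample at t = 60s falls into the closed ranges ending at 60s, 70s, 80s and 90s, i.e. four evaluations, whereas an unaligned raw sample just after 60s would fall into only three.)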

@brian-brazil
Contributor

Yes, the ranges are rather short and cover only a couple of samples, but that's not my point.

That's not what I mean (smaller data makes more sense for unittests like this anyway), I meant that the actual queries themselves aren't sane. Subqueries shouldn't be used on instant vector selectors, because you can use a range vector selector instead which is both more efficient, and doesn't lose you timestamps.

My point was that all functions that take range vectors will include one extra sample when used in a subquery

That's the same as if you were to use a recording rule via https://www.robustperception.io/composing-range-vector-functions-in-promql. Given that subqueries are to cover that sort of use case, I don't see a big issue here.

@free
Contributor

free commented Dec 19, 2018

OK, I finally see your point about this being the same as using a range function on top of a recording rule, but no, I don't think it's accurate.

It would be accurate if (1) the query step/subquery resolution either matched or was a multiple of the eval interval (which would be a reasonable expectation) and (2) recording rule samples happened to also be aligned with the eval interval (their timestamps are exactly eval interval apart, but not usually aligned with it). Then the samples on the step/resolution boundaries would be included both in the range ending at said boundary and the range beginning there. Something like this:

eval_step (10s)           |---------|---------|---------|---------|
recording rule samples    3         2         1         4         5
range 1 (20s)            [3         2         1]
range 2 (20s)                      [2         1         4]
range 3 (20s)                                [1         4         5]

This is exactly what a subquery now produces.

What happens with the recording rule + range vector function in practice though, is that condition (2) doesn't hold (i.e. recording rule samples are not aligned with the eval step) so the resulting ranges look like this:

eval_step (10s)           |---------|---------|---------|---------|
recording rule samples       3         2         1         4         5
range 1 (20s)            [   3         2       ]
range 2 (20s)                      [   2         1       ]
range 3 (20s)                                [   1         4       ]

So if your range function was min_over_time, you'd get a result of [1, 1, 1] in the subquery and idealized recording rule + range vector function case, versus [2, 1, 1] in the real-world recording rule + range vector function case.

Furthermore, and apart from whether this does or doesn't exactly match the recording rule + range vector function outputs, I believe that it would make more sense (to the average user) to try and emulate the behavior of regular queries vs. emulating the behavior of the existing (and soon to be forgotten) workaround to the lack of subqueries.

As to your observation about the queries not being sane because they use a subquery on an instant vector as opposed to some range vector function, you are technically right, but I was merely trying to come up with an obvious way of comparing queries to subqueries. Ideally I should have written a recording rule on top of the raw data and a range query on top of that and compared it with the equivalent subquery expression, but (1) it would have been overkill for an example and (2) you can't do that with the current query test infrastructure anyway.

And now that I look at the comparison above, it suggests a different solution to achieving parity between queries and subqueries: have the subquery results explicitly NOT aligned with the subquery resolution. That would ensure that every range of length N x resolution will contain exactly N, not N+1 samples. And decisively crush my hopes for a better rate implementation. (o:

@codesome
Member

@free I tried the tests that you gave above, and they pass.

@brian-brazil I'll add min_over_time(rate(metric[5m])[1h:1m]) for the Fibonacci sequence. Should I also add the tests by @free?

@brian-brazil
Contributor

Yes, please do. The rest I'll need to think about in the morning.

@codesome
Member

The time ranges in min_over_time(rate(metric[5m])[1h:1m]) don't look like a good fit for the Fibonacci series; I guess I will change them.

@brian-brazil
Contributor

I finally see your point about this being the same as using a range function on top of a recording rule, but no, I don't think it's accurate.

More accurately it is a thing that can happen with things currently, if the source data happens to be aligned.

emulating the behavior of the existing (and soon to be forgotten) workaround to the lack of subqueries.

It's not going away, it'd be unwise to use subqueries in a recording rule on performance grounds.

you can't do that with the current query test infrastructure anyway.

The new promtool rule test feature has that, we've yet to switch the promql unittests over to it.

And now that I look at the comparison above, it suggests a different solution to achieving parity between queries and subqueries: have the subquery results explicitly NOT aligned with the subquery resolution.

That is an option, but it feels kinda weird. I think the real issue here is that ranges are closed intervals, rather than half-open intervals. I have previously spotted this as a bit odd, but it hasn't caused any real problems thus far. A user could do a slightly shorter range if they wanted to avoid this.

@free
Contributor

free commented Dec 21, 2018

More accurately it is a thing that can happen with things currently, if the source data happens to be aligned.

That has a 1/5000 probability of happening in my deployment (5s scrape interval), even lower for the average Prometheus deployment. I wouldn't start from this as a baseline when deciding how subqueries should or should not behave.

It's not going away, it'd be unwise to use subqueries in a recording rule on performance grounds.

I haven't actually tested the code, but I am confident the performance overhead is minimal. Most of the CPU load and memory usage will likely go into decoding and buffering the original timeseries plus functions and aggregations, not into sampling outputs (i.e. subqueries). There is going to be some overhead, but I for one will definitely replace all my rate queries with subqueries (assuming subquery implementation remains as is), simply because the results will be so much more accurate.

I think the real issue here is that ranges are closed intervals, rather than half-open intervals. I have previously spotted this as a bit odd, but it hasn't caused any real problems thus far.

I fully agree with your assessment here, but I'll point out again that this hasn't been an issue thus far because both scraped series and recording rule outputs have essentially zero chance of aligning exactly with API query steps. With subqueries, you're going from 0% to 100% in one fell swoop.

(Now that I think of it, there exists one situation where there's a 100% chance of queries using one another's output to have their samples align perfectly, and that is when you have 2 recording rules within the same rule group. But in that instance no one's going to take, say a rate at 10s resolution and then immediately do a min_over_time over its output at 30s resolution (or any similar combination I can think of), so it simply hasn't come up thus far.)

A user could do a slightly shorter range if they wanted to avoid this.

Yes, if said user was savvy enough, and happened to issue said query from a script (or if PromQL supported range arithmetic, as in rate(foo[5m-1s])). But most users are going to run those queries from Grafana, so they would have to be hardcoded (e.g. rate(foo[299s])), meaning the dashboard would only be useful on a fixed or highly restrictive time range (i.e. neither a 15-minute nor a 1-month range is very useful at 5-minute resolution). Whereas I'm sure many dashboards are more or less fully dynamic (by making use of Grafana's $__interval variable), allowing you to examine the underlying data over any time range you may need to.
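
(For instance, a dynamic dashboard panel would typically use something like rate(foo[$__interval]) rather than a hardcoded rate(foo[299s]).)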

@brian-brazil
Contributor

I haven't actually tested the code, but I am confident the performance overhead is minimal.

I didn't mean for your queries, but for things like min_over_time(rate(metric[5m])[1h:]), where you don't want to redo the rate 12x on every eval cycle.
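
(That is, the recording-rule form would be to record rate(metric[5m]) as a new series via a recording rule, say job:metric:rate5m, and then evaluate min_over_time(job:metric:rate5m[1h]) over it, so the rate is computed once per eval cycle; the rule name here is illustrative.)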

I fully agree with your assessment here

We have the option to tweak this in 3.0, only breakage it's likely to cause is in promql unittests as in general you should be writing your expressions to be robust in the face of jitter.

But in that instance no one's going to take, say a rate at 10s resolution and then immediately do a min_over_time over its output at 30s resolution (or any similar combination I can think of), so it simply hasn't come up thus far.

That's a standard subquery-equivalent use case, and we've had no complaints around it that I know of. The only one I know of in this general area is one user confused by the new PromQL unittesting stuff.

@free
Contributor

free commented Dec 21, 2018

We have the option to tweak this in 3.0, only breakage it's likely to cause is in promql unittests as in general you should be writing your expressions to be robust in the face of jitter.

To fully clarify my position on closed vs. half-open intervals: I agree that ranges should be half-open, so you can reliably do the recording rule equivalent of min_over_time[1m:1m] and get clean downsampling (i.e. not have to worry about a sample being included in 2 successive ranges). This works perfectly with gauges.

When it comes to counters though, I would argue (and I think there should be no disagreement here) that the relevant information in a counter is the (reset-adjusted) increase between successive samples, not the absolute value of the counter (you've written a few blog posts on the topic yourself). Meaning that, theoretically speaking, a range over a counter would retain all increases recorded within that half-open interval, namely what is included now plus the increase corresponding to the first sample in the range (i.e. the adjusted increase between the last sample right at or before range start, assuming it's not stale, and the first sample strictly within the range).

So while in theory a range over a gauge would be the same as a range over a counter (all values in the half-open interval vs. all increases in the half-open interval), in practice (unless you're going to change the way counters are stored) they would be slightly different (a counter range would include one extra sample to the left).
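
(A quick illustration with made-up samples 10@0s, 12@10s, 15@20s, 21@30s and the half-open range (10s, 30s]: the gauge range would contain the values {15, 21}, while the counter range would cover the increases {+3 (12→15), +6 (15→21)}, which requires reading the sample 12@10s just to the left of the range.)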

(I realize Prometheus doesn't keep track of which metrics are counters and which are gauges, but that information may be inferred from the function that the range vector is being used in.)

This kind of change in 3.0 would essentially fix all issues with both query vs. subquery behavior and (my favorite topic, sorry) rate extrapolation. You'd still need to make sure your expressions are "robust in the face of jitter", as you put it, but even the requirements for robustness would be reduced (because you would only need to ensure the existence of at least 1 sample within the range, not 2).

And I now realize that this kind of differentiation between counters and gauges would further improve the correctness of subqueries: currently, if subquery resolution is lower than the original resolution (or 2 samples simply happen to fall within the same step) there's a good chance that a counter reset will be undervalued (or possibly even missed). If counter ranges and gauge ranges would be handled differently (and the evaluator was made aware of which kind of range it was dealing with) it could correctly downsample counters (or expressions expected to return counter-like values), just as it does gauges.

@roidelapluie
Member

roidelapluie commented Dec 21, 2018

deriv and rate for counters/increasing gauges must always produce the same result. There is no other way.

@brian-brazil
Contributor

brian-brazil commented Dec 21, 2018 via email

@roidelapluie
Member

Yes but on an increasing vector I would expect rate and deriv to give the same result.

@beorn7
Member

beorn7 commented Dec 22, 2018

Yes but on an increasing vector I would expect rate and deriv to give the same result.

deriv uses regression analysis, while rate only looks at the first and last sample in the range (modulo counter resets).
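
(Back-of-the-envelope, ignoring rate's extrapolation and counter-reset handling: for samples 0, 0, 0, 30 spaced 10s apart, the first-to-last slope that rate is based on is (30 - 0) / 30s = 1/s, while the least-squares slope that deriv fits through all four points is 0.9/s.)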

lock bot locked and limited conversation to collaborators on Jun 20, 2019