
Sub query support for range selection. #1227

Closed
fabxc opened this issue Nov 18, 2015 · 46 comments

Comments

@fabxc
Contributor

fabxc commented Nov 18, 2015

Currently one can only do range selection on vector selectors and has to use recording rules to range-select expressions.

@brian-brazil
Contributor

An interesting question is how we choose the interval for this; currently it's not needed, as everything is calculated at a point. We should also look at how to discourage/prevent use of this in rules/alerts on performance grounds.

@grobie
Member

grobie commented Dec 24, 2015

👍

@Andor

Andor commented Apr 29, 2016

Currently we can run a query like rate(hit[1m]) + rate(miss[1m]), which is not very convenient.
IMHO a better syntax would be something like this: rate(hit+miss [1m]) or rate((hit+miss) [1m])

@brian-brazil
Contributor

That's incorrect usage: you need to do the rate separately first, or you'll get the wrong answer around counter resets. I'd also suggest that the exporter expose either total & hit or total & miss, as that's a bit easier to work with.
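
(To illustrate with made-up numbers: if hit resets from 100 to 0 while miss stays at 50, the combined series hit + miss drops from 150 to 50; rate() over the combined series treats that drop as a counter reset and counts an increase of 50, whereas taking the two rates separately correctly counts an increase of roughly 0.)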

@guoshimin

Being able to run range queries on arbitrary expressions is valuable for ad-hoc exploration, either for sanity checking before creating recording rules, or for analyzing past data where you didn't already have recording rules.

@roman-vynar
Contributor

Leaving one of the references to a discussion on this topic: https://groups.google.com/forum/#!topic/prometheus-users/JOVfqQsVRl8

@virusdave

Is there any ticket tracking this feature request (historical function evaluation)?

@brian-brazil
Contributor

This is the issue for it. I wouldn't be too hopeful of it ever getting implemented.

@virusdave

OK. Is that due to a philosophical issue, or an engineering-resource-scarcity problem?

@brian-brazil
Contributor

It's a semantic issue, and an operational one. It's unclear how to do this correctly, and even then its use would have to be greatly limited for safety.

@pparth

pparth commented Sep 26, 2017

I do not understand why this is not considered important at all. The requirement is simple: we want to generate a range vector from a function that returns an instant vector. For example, I want to execute something like:

sum_over_time(((sum(rate(server_request_hit{service_name=~"[[serviceName]]",http_response_status=~"5.."}[1m]))) / (sum(rate(server_request_hit{service_name=~"[[serviceName]]"}[1m]))) * vector(100)) > bool vector(0.1))

The inner query returns a time series with a value of 1 when the underlying value is > 0.1 and 0 when it is <= 0.1. This is how I calculate failure incidents. Then I just want to know how many seconds these incidents lasted in the selected time period. To do that, I need to apply the sum_over_time function, which is not possible because the inner query is not considered a range vector.
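
With the [<range>:<resolution>] subquery syntax discussed later in this thread, a rough sketch of this could look as follows (the 1h range and 15s resolution are placeholder values, and the trailing multiplication by the resolution converts the sample count into seconds):

sum_over_time(
  (
      sum(rate(server_request_hit{service_name=~"[[serviceName]]",http_response_status=~"5.."}[1m]))
    /
      sum(rate(server_request_hit{service_name=~"[[serviceName]]"}[1m]))
    * 100 > bool 0.1
  )[1h:15s]
) * 15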

In fact, Netflix Atlas allows this kind of operation without any issues.

@brian-brazil
Contributor

That would be an inappropriate usage of this feature, what you want is a recording rule.

@pparth

pparth commented Sep 26, 2017

The way this is supported today is through a recording rule, but that is not what I want. I cannot have a recording rule for each step of my query where an instant vector is generated, and this is just for a single query. Also, it is too difficult for everyone who wants to create a dashboard to have to create a recording rule first; it is not that easy. Adding recording rules is an administrative process now, and it is definitely not suitable for people who just want to create more elaborate queries for their dashboards.

@jeeyoungk
Contributor

What would it take to implement this? I'm willing to jump through the hoops to make this happen. This makes a lot of the x_over_time() functions useless when you have a lot of counters, as you almost always want to apply rate() first.

@brian-brazil
Contributor

See https://www.robustperception.io/composing-range-vector-functions-in-promql/

Most of the over_time functions make no sense with counters.

@juliusv
Member

juliusv commented Jul 20, 2018

Yeah, the x_over_time() use cases come up often, and obviously it's not nice to have to use recording rules as an intermediary step every time during ad-hoc exploration.

I'm also undecided on adding this feature, but let's say we did want it: how would we determine the subquery evaluation interval? I think we could go one of two routes here:

  • Introduce a new range syntax like [<range>:<resolution>], so rate(foo[1m])[10m:15s] would mean: evaluate rate(foo[1m]) from 10 minutes ago until now, at a 15s resolution.
  • Introduce a new function like subquery(<expression>, <resolution>), so subquery(rate(foo[1m]), 15) would do the same and return a range vector.
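
To make option 1 concrete, a minimal sketch (assuming foo is a counter, with made-up range and resolution values): the following would return the highest 1m rate seen over the last 10 minutes, with the inner rate re-evaluated every 15 seconds.

max_over_time(rate(foo[1m])[10m:15s])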

@brian-brazil
Contributor

I was pondering the 2nd, if we ever add this. At the very least we'd want more protections against abusive queries first.

@roidelapluie
Member

In https://docs.google.com/document/d/1-C5PycocOZEVIPrmM1hn8fBelShqtqiAmFptoG4yK70/edit# the consensus is to adopt option 1.

Some notes:

I think it is fine to default the subquery resolution to the global evaluation interval, but I would rather still keep the ":" separator:

rate(sum(foo)[5m:]) rather than rate(sum(foo)[5m]), for two reasons:

  • the : would signal that you know what you are doing; most cases should not require subqueries.
  • rate(foo[5m]) and rate(foo[5m:1m]) would be different, as the latter would have a chance to "hit" the range boundaries.

Also subqueries would solve #3806 if we can do: rate(min without() foo)[5m:1m]


@codesome
Member

@brian-brazil I would like to work on this if we are keen on adding it now.

@brian-brazil
Contributor

Sure, one thing is that the generated data will need to be aligned independently of the query parameters.

@sumkincpp

This feature is really important for being able to work with missing data.

For example, this one computes adjusted exporter availability:

avg_over_time(((absent(up)-1) or up)[365d])*100
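
With the [<range>:<resolution>] syntax under discussion, this would presumably become something like the following (leaving the resolution empty to use the default):

avg_over_time(((absent(up) - 1) or up)[365d:]) * 100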

@henridf
Contributor

henridf commented Sep 26, 2018

@juliusv catching up and trying to understand option 1. So for the use case from the above-linked discussion (https://groups.google.com/d/msg/prometheus-users/JOVfqQsVRl8/PLA0hGAMCgAJ), would the query in option 1 syntax be this:

min_over_time( rate(mysql_global_status_questions[2s])[60s])

And returning to your example, is it correct that rate(foo[1m])[10m:15s] would give the same result as a request to the query_range endpoint with the following params?

  • query=rate(foo[1m])
  • start=<now - 10m>
  • end=<now>
  • step=15s

@brian-brazil
Contributor

Almost: to preserve the invariant that query_range is syntactic sugar on top of query, the start time should be aligned with the step.
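
(For illustration, assuming the subquery evaluation timestamps are aligned to absolute multiples of the resolution: evaluating foo[1m:15s] at t=100s would evaluate foo at t=45s, 60s, 75s and 90s, i.e. at timestamps that are multiples of 15s, rather than at t=40s, 55s, 70s, 85s and 100s.)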

@henridf
Contributor

henridf commented Sep 26, 2018

so in theory at least, this could eventually remove the need for query_range?
(not sure about others but I was confused by the existence and role of query_range when first discovering the http api)

@brian-brazil
Contributor

No, I'd expect instant queries to still not return range vectors, and subqueries don't support arbitrary start points.

@henridf
Contributor

henridf commented Sep 27, 2018

Instant queries can already return range vectors...

@free
Contributor

free commented Dec 18, 2018

I see one issue with the proposed implementation: in the same way that I think rate/increase are currently broken (they don't consider the sample just before the range, see #3806), subqueries will now break the *_over_time functions (and changes, resets).

Assuming a subquery range equal to the step, a max_over_time subquery calculation will now include the value of the last sample before the range in the calculation. So the value of the last sample before each step will be included in the calculation of that step and the next.

Put another way, rather than "fixing" rate and increase, this proposed change will merely make all functions that take range vector parameters behave differently (and produce different results) in a query vs. in a subquery.

The only way I can see for fixing this inconsistency would be to drop/ignore the (calculated) sample at the start of each range, unless there happens to be an actual sample with that exact timestamp (which is what ranges do currently). I for one would rather live with the inconsistency than forever give up hope of a decent rate implementation, but I still think it is worth bringing up the issue.

@brian-brazil
Contributor

If that were the case it'd be a bug, but looking at the PR and unittests I'm not seeing that issue. Can you comment where you're seeing that over on the PR?

@free
Contributor

free commented Dec 18, 2018

I'm not seeing it in the PR because the series used in the tests all have samples at 10-second multiples and the resolutions are also 10-second multiples. But if you used samples that were not aligned with the resolution, and the increase between samples was no longer constant, you'd get a different result from rate[1m] and rate[1m:10s] (assuming the series started before the 1m10s range and ended after it).

E.g. something like this (back of the envelope calculation, not actually tested against the PR):

# Fibonacci sequence, to ensure the rate is not constant.
load 7s
  metric 1 1 2 3 5 8 13 21 34 55 89 144 233 377

# Extrapolated from [3@21, 144@77]: (144 - 3) / (77 - 21)
eval instant at 80s rate(metric[1m])
  {} 2.517857143

# No extrapolation, [2@20, 144@80]: (144 - 2) / 60
eval instant at 80s rate(metric[1m:10s])
  {} 2.366666667

# Only one value between 10s and 20s, 2@14
eval instant at 20s min_over_time(metric[10s])
  {} 2

# min(1@10, 2@20)
eval instant at 20s min_over_time(metric[10s:10s])
  {} 1

@brian-brazil
Contributor

brian-brazil commented Dec 18, 2018

That test passes. That's what I'd expect though semantically, and the subquery examples here aren't sane.

min_over_time(rate(metric[5m])[1h:1m]) is more what I'd expect to see.

@brian-brazil
Contributor

@codesome Can you add the above tests, with just a comment indicating that using subqueries unnecessarily is unwise?

@free
Contributor

free commented Dec 19, 2018

That's what I'd expect though semantically, and the subquery examples here aren't sane.

I'm not sure whether I should feel offended or not. I'll go with the latter and ask you to please elaborate on that statement.

Yes, the ranges are rather short and cover only a couple of samples, but that's not my point. (Had I gone for longer ranges with more samples, I would have run out of bits for the Fibonacci numbers.)

My point was that all functions that take range vectors will include one extra sample when used in a subquery (as opposed to when used in a regular query). Which was what I was arguing for in #3806 (for the rate and increase functions) and got shot down because it would introduce inconsistencies in the definition of a range.

This is most obvious when using min_over_time (rate is very much an approximation anyway, so no one would expect the values to match). min_over_time will consider this extra point from before the range, which may affect the outcome, in that some values will be included in the calculation of N+1 samples (as opposed to N samples with a plain query), where N is the ratio of range to resolution. This may be considered negligible in your example (1 hour range with 1 minute resolution) but the closer the range and resolution get to each other, the higher the likelihood of error.
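
(To put numbers on it: with a 30s range at 10s resolution, N = 3; a subquery-generated sample at t = 60s falls into the closed ranges ending at 60s, 70s, 80s and 90s, i.e. four evaluations, whereas an unaligned raw sample just after 60s would fall into only three.)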

@brian-brazil
Contributor

Yes, the ranges are rather short and cover only a couple of samples, but that's not my point.

That's not what I mean (smaller data makes more sense for unittests like this anyway), I meant that the actual queries themselves aren't sane. Subqueries shouldn't be used on instant vector selectors, because you can use a range vector selector instead which is both more efficient, and doesn't lose you timestamps.

My point was that all functions that take range vectors will include one extra sample when used in a subquery

That's the same as if you were to use a recording rule via https://www.robustperception.io/composing-range-vector-functions-in-promql. Given that subqueries are to cover that sort of use case, I don't see a big issue here.

@free
Contributor

free commented Dec 19, 2018

OK, I finally see your point about this being the same as using a range function on top of a recording rule, but no, I don't think it's accurate.

It would be accurate if (1) the query step/subquery resolution either matched or was a multiple of the eval interval (which would be a reasonable expectation) and (2) recording rule samples happened to also be aligned with the eval interval (their timestamps are exactly eval interval apart, but not usually aligned with it). Then the samples on the step/resolution boundaries would be included both in the range ending at said boundary and the range beginning there. Something like this:

eval_step (10s)           |---------|---------|---------|---------|
recording rule samples    3         2         1         4         5
range 1 (20s)            [3         2         1]
range 2 (20s)                      [2         1         4]
range 3 (20s)                                [1         4         5]

This is exactly what a subquery now produces.

What happens with the recording rule + range vector function in practice though, is that condition (2) doesn't hold (i.e. recording rule samples are not aligned with the eval step) so the resulting ranges look like this:

eval_step (10s)           |---------|---------|---------|---------|
recording rule samples       3         2         1         4         5
range 1 (20s)            [   3         2       ]
range 2 (20s)                      [   2         1       ]
range 3 (20s)                                [   1         4       ]

So if your range function was min_over_time, you'd get a result of [1, 1, 1] in the subquery and idealized recording rule + range vector function case, versus [2, 1, 1] in the real-world recording rule + range vector function case.

Furthermore, and apart from whether this does or doesn't exactly match the recording rule + range vector function outputs, I believe that it would make more sense (to the average user) to try and emulate the behavior of regular queries vs. emulating the behavior of the existing (and soon to be forgotten) workaround to the lack of subqueries.

As to your observation about the queries not being sane because they use a subquery on an instant vector as opposed to some range vector function, you are technically right, but I was merely trying to come up with an obvious way of comparing queries to subqueries. Ideally I should have written a recording rule on top of the raw data and a range query on top of that and compared it with the equivalent subquery expression, but (1) it would have been overkill for an example and (2) you can't do that with the current query test infrastructure anyway.

And now that I look at the comparison above, it suggests a different solution to achieving parity between queries and subqueries: have the subquery results explicitly NOT aligned with the subquery resolution. That would ensure that every range of length N x resolution will contain exactly N, not N+1 samples. And decisively crush my hopes for a better rate implementation. (o:

@codesome
Member

@free I tried the tests that you gave above, and they pass.

@brian-brazil I'll add min_over_time(rate(metric[5m])[1h:1m]) for the Fibonacci sequence. Should I also add the tests by @free?

@brian-brazil
Contributor

Yes, please do. The rest I'll need to think about in the morning.

@codesome
Member

The time ranges in min_over_time(rate(metric[5m])[1h:1m]) don't look like a good fit for the Fibonacci series; I guess I will change them.

@brian-brazil
Contributor

I finally see your point about this being the same as using a range function on top of a recording rule, but no, I don't think it's accurate.

More accurately it is a thing that can happen with things currently, if the source data happens to be aligned.

emulating the behavior of the existing (and soon to be forgotten) workaround to the lack of subqueries.

It's not going away, it'd be unwise to use subqueries in a recording rule on performance grounds.

you can't do that with the current query test infrastructure anyway.

The new promtool rule test feature has that, we've yet to switch the promql unittests over to it.

And now that I look at the comparison above, it suggests a different solution to achieving parity between queries and subqueries: have the subquery results explicitly NOT aligned with the subquery resolution.

That is an option, but it feels kinda weird. I think the real issue here is that ranges are closed intervals, rather than half-open intervals. I have previously spotted this as a bit odd, but it hasn't caused any real problems thus far. A user could do a slightly shorter range if they wanted to avoid this.

@free
Contributor

free commented Dec 21, 2018

More accurately it is a thing that can happen with things currently, if the source data happens to be aligned.

That has a 1/5000 probability of happening in my deployment (5s scrape interval), even lower for the average Prometheus deployment. I wouldn't start from this as a baseline when deciding how subqueries should or should not behave.

It's not going away, it'd be unwise to use subqueries in a recording rule on performance grounds.

I haven't actually tested the code, but I am confident the performance overhead is minimal. Most of the CPU load and memory usage will likely go into decoding and buffering the original timeseries plus functions and aggregations, not into sampling outputs (i.e. subqueries). There is going to be some overhead, but I for one will definitely replace all my rate queries with subqueries (assuming subquery implementation remains as is), simply because the results will be so much more accurate.

I think the real issue here is that ranges are closed intervals, rather than half-open intervals. I have previously spotted this as a bit odd, but it hasn't caused any real problems thus far.

I fully agree with your assessment here, but I'll point out again that this hasn't been an issue thus far because both scraped series and recording rule outputs have essentially zero chance of aligning exactly with API query steps. With subqueries, you're going from 0% to 100% in one fell swoop.

(Now that I think of it, there exists one situation where there's a 100% chance of queries using one another's output to have their samples align perfectly, and that is when you have 2 recording rules within the same rule group. But in that instance no one's going to take, say a rate at 10s resolution and then immediately do a min_over_time over its output at 30s resolution (or any similar combination I can think of), so it simply hasn't come up thus far.)

A user could do a slightly shorter range if they wanted to avoid this.

Yes, if said user was savvy enough, and happened to issue said query from a script (or if PromQL supported range arithmetic, as in rate(foo[5m-1s])). But most users are going to run those queries from Grafana, so they would have to be hardcoded (e.g. rate(foo[299s])), meaning the dashboard would only be useful on a fixed or highly restrictive time range (i.e. neither a 15-minute nor a 1-month range is very useful at 5-minute resolution). Whereas I'm sure many dashboards are more or less fully dynamic (by making use of Grafana's $__interval variable), allowing you to examine the underlying data over any time range you may need to.
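
(For instance, a dynamic dashboard panel would typically use something like rate(foo[$__interval]) rather than a hardcoded rate(foo[299s]).)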

@brian-brazil
Contributor

I haven't actually tested the code, but I am confident the performance overhead is minimal.

I didn't mean for your queries, but for things like min_over_time(rate(metric[5m])[1h:]), where you don't want to redo the rate 12x on every eval cycle.
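
(That is, the recording-rule form would be to record rate(metric[5m]) as a new series via a recording rule, say job:metric:rate5m, and then evaluate min_over_time(job:metric:rate5m[1h]) over it, so the rate is computed once per eval cycle; the rule name here is illustrative.)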

I fully agree with your assessment here

We have the option to tweak this in 3.0, only breakage it's likely to cause is in promql unittests as in general you should be writing your expressions to be robust in the face of jitter.

But in that instance no one's going to take, say a rate at 10s resolution and then immediately do a min_over_time over its output at 30s resolution (or any similar combination I can think of), so it simply hasn't come up thus far.

That's a standard subquery-equivalent use case, and we've had no complaints around it that I know of. The only one I know of in this general area is one user confused by the new PromQL unittesting stuff.

@free
Contributor

free commented Dec 21, 2018

We have the option to tweak this in 3.0, only breakage it's likely to cause is in promql unittests as in general you should be writing your expressions to be robust in the face of jitter.

To fully clarify my position on closed vs. half-open intervals: I agree that ranges should be half-open, so you can reliably do the recording rule equivalent of min_over_time[1m:1m] and get clean downsampling (i.e. not have to worry about a sample being included in 2 successive ranges). This works perfectly with gauges.

When it comes to counters though, I would argue (and I think there should be no disagreement here) that the relevant information in a counter is the (reset-adjusted) increase between successive samples, not the absolute value of the counter (you've written a few blog posts on the topic yourself). Meaning that, theoretically speaking, a range over a counter would retain all increases recorded within that half-open interval, namely what is included now plus the increase corresponding to the first sample in the range (i.e. the adjusted increase between the last sample right at or before range start, assuming it's not stale, and the first sample strictly within the range).

So while in theory a range over a gauge would be the same as a range over a counter (all values in the half-open interval vs. all increases in the half-open interval), in practice (unless you're going to change the way counters are stored) they would be slightly different (a counter range would include one extra sample to the left).
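
(A quick illustration with made-up samples 10@0s, 12@10s, 15@20s, 21@30s and the half-open range (10s, 30s]: the gauge range would contain the values {15, 21}, while the counter range would cover the increases {+3 (12→15), +6 (15→21)}, which requires reading the sample 12@10s just to the left of the range.)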

(I realize Prometheus doesn't keep track of which metrics are counters and which are gauges, but that information may be inferred from the function that the range vector is being used in.)

This kind of change in 3.0 would essentially fix all issues with both query vs. subquery behavior and (my favorite topic, sorry) rate extrapolation. You'd still need to make sure your expressions are "robust in the face of jitter", as you put it, but even the requirements for robustness would be reduced (because you would only need to ensure the existence of at least 1 sample within the range, not 2).

And I now realize that this kind of differentiation between counters and gauges would further improve the correctness of subqueries: currently, if subquery resolution is lower than the original resolution (or 2 samples simply happen to fall within the same step) there's a good chance that a counter reset will be undervalued (or possibly even missed). If counter ranges and gauge ranges would be handled differently (and the evaluator was made aware of which kind of range it was dealing with) it could correctly downsample counters (or expressions expected to return counter-like values), just as it does gauges.

@roidelapluie
Member

roidelapluie commented Dec 21, 2018

deriv and rate for counters/increasing gauges must always produce the same result. There is no other way.

@brian-brazil
Contributor

brian-brazil commented Dec 21, 2018 via email

@roidelapluie
Member

Yes but on an increasing vector I would expect rate and deriv to give the same result.

@beorn7
Member

beorn7 commented Dec 22, 2018

Yes but on an increasing vector I would expect rate and deriv to give the same result.

deriv uses regression analysis, while rate only looks at the first and last sample in the range (modulo counter resets).
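
(Back-of-the-envelope, ignoring rate's extrapolation and counter-reset handling: for samples 0, 0, 0, 30 spaced 10s apart, the first-to-last slope that rate is based on is (30 - 0) / 30s = 1/s, while the least-squares slope that deriv fits through all four points is 0.9/s.)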

lock bot locked and limited conversation to collaborators on Jun 20, 2019