Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'@ <timestamp>' modifier #8121

Merged
merged 35 commits into from Jan 20, 2021
Merged

'@ <timestamp>' modifier #8121

merged 35 commits into from Jan 20, 2021

Conversation

codesome
Copy link
Member

@codesome codesome commented Oct 28, 2020

This PR implements @ <timestamp> modifier as per this design doc.

An example query:

rate(process_cpu_seconds_total[1m]) 
  and
topk(7, rate(process_cpu_seconds_total[1h] @ 1234))

which ranks based on last 1h rate and w.r.t. unix timestamp 1234 but actually plots the 1m rate.

Closes #7903

This PR is to be followed up with an easier way to represent the start, end, range of a query in PromQL so that we could do @ <end>, metric[<range>] easily.

@codesome
Copy link
Member Author

codesome commented Oct 28, 2020

We have this prototype running at http://104.131.10.78:9090/graph if anyone wants to try it out. (Edit: this is running a prototype which can break in unknown ways. Not in sync with the PR.)

@roidelapluie
Copy link
Member

Can we use scalar, instead of creating a new timestamp type?

@codesome
Copy link
Member Author

What do you mean by a scalar here? Does the timestamp 1603888158 count as a scalar?

@roidelapluie
Copy link
Member

roidelapluie commented Oct 28, 2020

Also, thanks for this and the demo!!

I am asking for the timestamp/scalar because the first thing I tried is up @timestamp(up) :)

@roidelapluie
Copy link
Member

oooh wait the @ is not evaluated on every step, so my idea would not work.

@codesome
Copy link
Member Author

up @timestamp(up) means the timestamp is dependent on the evaluation time, right? So it is not actually using the fixed timestamp benefit :)

@codesome
Copy link
Member Author

Since the above server is still gathering data, here is a peek from one of our internal servers

Screenshot from 2020-10-28 19-02-17

@beorn7
Copy link
Member

beorn7 commented Oct 28, 2020

I think there are benefits in making the argument of @ dynamic.

For one, we probably want it to support an upcoming function now() to, for example, graph the top 10 fullest disk as of now. But that function's return value still won't depend on the evaluation time.

However, I can see MTD-like queries where I want it to depend on the evaluation time, e.g. where you want to compare a value at time t to the value at the beginning of the month relative to t, i.e. something like @ start_of_month(time()). (The yet-to-be-created start_of_month function would probably take time() as the default, and of course there could be more elegant ways, referring to what I wrote about fields in a fully-fledged time struct.) This would allow error-budget burn graphs, which you want to reset at the beginning of a new billing period etc.

To make the latter work nicely, we probably want an @-style start of a range in a range selector. If we allowed dynamic ranges, it could look like increase(errors_total[time()-start_of_month()]), but an @-syntax could say "from then to the time of evaluation", i.e. increase(errors_total[@start_of_month()]). But I guess I'm digressing…

The final plug for a dynamic argument of @ would be to retrieve it from recording rules or do arithmetic on it, which relates to my musings about timestamps and durations.

See https://docs.google.com/document/d/1jMeDsLvDfO92Qnry_JLAXalvMRzMSB1sBr9V7LolpYM/edit#heading=h.vmb7pe7hp12

@beorn7
Copy link
Member

beorn7 commented Oct 28, 2020

To be clear: The above are just musings, not concrete suggestions what to implement in this PR. I think this is great as a demo.

@codesome
Copy link
Member Author

Yeah this prototype is just barebone stuff required for a PoC, other helpers like now() end() etc would be based on what we decide and I don't plan implementing in this PR yet.

@brian-brazil
Copy link
Contributor

Note that this PR constitutes a breaking change to PromQL, and thus should not be merged.

@codesome
Copy link
Member Author

I see this as an incremental additional without breaking anything that exists right now. But anyway, should not be merged yes, just a PoC.

@roidelapluie
Copy link
Member

Note that this PR constitutes a breaking change to PromQL, and thus should not be merged.

I agree with @codesome that it is not breaking anything from what I can see

@brian-brazil
Copy link
Contributor

It breaks the invariant that has existed for 5+ years that PromQL does not look into the future, and thus that no samples beyond the evaluation time of a query may be accessed. Users could be depending on this, thus it is a breaking change.

@bwplotka
Copy link
Member

I am definitely in favor of this modifier, love it! Thanks, @codesome for this prototype. What I like about this syntax is that is quite clear. Sure, there is some boilerplate but it's fully understandable and does what expected.

It breaks the invariant that has existed for 5+ years that PromQL does not look into the future, and thus that no samples beyond the evaluation time of a query may be accessed. Users could be depending on this, thus it is a breaking change.

Hm @brian-brazil if that's the case isn't this as easy as returning an error when requested @timestamps is outside or requested range? 🤔

@beorn7
Copy link
Member

beorn7 commented Oct 28, 2020

Users could be depending on this, thus it is a breaking change.

That's not the definition of a breaking change.

https://xkcd.com/1172/

@beorn7
Copy link
Member

beorn7 commented Oct 28, 2020

if that's the case isn't this as easy as returning an error when requested @timestamps is outside or requested range?

First of all, no, it's not the case. This is not a breaking change. Everything that works before will continue to work in exactly the same way.

Second, it is core to this feature that you @-request a timestamp from outside the queried range. For example, if you have a topk graph for the busiest CPUs over the last hour, and then you want to zoom in to look at a particular peak, you are now looking at a time interval in the past but still need to run the selecting query with the same fixed @-time. (This is BTW what most competing metrics systems get wrong with the "easy" task of displaying the top-5 busiest CPUs.)

@brian-brazil
Copy link
Contributor

Hm @brian-brazil if that's the case isn't this as easy as returning an error when requested @timestamps is outside or requested range? thinking

That would cover it, as long as it's not after the start time it can't break anyone downstream due to this.

Everything that works before will continue to work in exactly the same way.

I disagree. Anyone who depends on the not looking into the future invariant would be broken, and this is highly likely to affect at the least both explicit and implicit caching. We'd basically be asking for all users to audit all their uses of the API to see if this would impact on them, or any of their downstream users.

Just because something isn't an explicitly enumerated behaviour doesn't mean that we can change it without causing breakage.

@RichiH
Copy link
Member

RichiH commented Oct 29, 2020

Is anyone aware of any software relying this current behavior or is this a theoretical concern?

A feature flag which is on for PromCorThanos and off by default would solve this if anyone is aware of anyone needing opt-in.

@brian-brazil
Copy link
Contributor

Being theoretical is sufficient for it to be breaking. There's stuff I'd like to break that I'd expect to have zero real impact, but can't.

In terms of concrete examples the most obvious would be Trickster, but I expect this would impact on large chunks of the userbase causing non-obvious breakage. I imagine there's tons of scripts out there running queries over "past" data and caching/storing it somewhere, presuming that the query result can't change as new data comes in.

A feature flag which is on for PromCorThanos and off by default would solve this if anyone is aware of anyone needing opt-in.

I don't think this is what you meant to write, as upgrading does not equate to opting in to breaking changes.

@bwplotka
Copy link
Member

I feel we can discuss if it's breaking change or not in a different thread as this is not blocking this proposal as far as I get this, right? (: We expect the timestamp to be inside the queried range.

Second, it is core to this feature that you @-request a timestamp from outside the queried range. For example, if you have a topk graph for the busiest CPUs over the last hour, and then you want to zoom in to look at a particular peak, you are now looking at a time interval in the past but still need to run the selecting query with the same fixed @-time. (This is BTW what most competing metrics systems get wrong with the "easy" task of displaying the top-5 busiest CPUs.)

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
@codesome codesome changed the title Prototype for '@ <timestamp>' modifier '@ <timestamp>' modifier Nov 19, 2020
@codesome
Copy link
Member Author

I have updated this PR with the actual implementation and not a prototype anymore. I still have to fix 1-2 edge cases (the test cases failing are commented out), and have the add a flag. But otherwise, it is good for a review now.

@codesome
Copy link
Member Author

(Also, currently it only works with a timestamp, I will be adding support for start() end() soon)

@codesome
Copy link
Member Author

codesome commented Jan 4, 2021

Should be ready for another review. I have added the flag and docs.

Copy link
Contributor

@brian-brazil brian-brazil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't do a deep re-review of the main logic, but on the other hand it didn't change.

cmd/prometheus/main.go Outdated Show resolved Hide resolved
docs/querying/basics.md Outdated Show resolved Hide resolved
docs/querying/basics.md Outdated Show resolved Hide resolved
docs/querying/basics.md Outdated Show resolved Hide resolved
docs/querying/basics.md Show resolved Hide resolved
promql/testdata/at_modifier.test Outdated Show resolved Hide resolved
promql/testdata/at_modifier.test Outdated Show resolved Hide resolved
promql/engine_test.go Outdated Show resolved Hide resolved
promql/engine_test.go Outdated Show resolved Hide resolved
promql/engine_test.go Outdated Show resolved Hide resolved
@codesome
Copy link
Member Author

codesome commented Jan 7, 2021

I have fixed all the comments. I hope it is close to merge now :)

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Copy link
Contributor

@brian-brazil brian-brazil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's looking close. I'll want to a final full review of the core logic though, just in case I missed something or a bug slipped in during the final tweaks.

cmd/prometheus/main.go Outdated Show resolved Hide resolved
cmd/prometheus/main.go Show resolved Hide resolved
cmd/prometheus/main.go Outdated Show resolved Hide resolved
docs/querying/basics.md Outdated Show resolved Hide resolved
promql/functions.go Outdated Show resolved Hide resolved
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
@codesome
Copy link
Member Author

(In case it was missed, the PR was ready for review after the last commit)

docs/disabled_features.md Outdated Show resolved Hide resolved
cmd/prometheus/main.go Outdated Show resolved Hide resolved
docs/disabled_features.md Outdated Show resolved Hide resolved
docs/disabled_features.md Outdated Show resolved Hide resolved
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
@brian-brazil
Copy link
Contributor

👍

// Offset is the offset used during the query execution
// which is calculated using the original offset, at modifier time,
// eval time, and subquery offsets in the AST tree.
Offset time.Duration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooops, changing the meaning of an existing field without renaming / removing it caught me off guard and introduced a bug in PromLens where it now thinks all offsets are 0 (because Offset is now always 0 after parsing). I know we don't have API stability guarantees, but in the future I would recommend in this type of situation to do it the other way around: keep the same semantics for the existing field and instead add another one for the new meaning (e.g. AdjustedOffset or something like that).

squat added a commit to squat/prom-label-proxy that referenced this pull request Jun 22, 2021
This commit bumps the dependency on Prometheus to the commit from the
v2.28.0 tag
(https://github.com/prometheus/prometheus/releases/tag/v2.28.0). The
reason for bumping this is to take advantage of the updates to the
PromQL parser. Without this update, the prom-label-proxy cannot enforce
labels on queries that use any new features, e.g. queries that include:
* the last_over_time, sgn, clamp functions, added in
  prometheus/prometheus#8457;
* the @ operator, added in
  prometheus/prometheus#8121; or
* the group aggregation operator, added in
  prometheus/prometheus#7480.

Currently, queries including these functions produce errors like:
`parse error: unknown function with name "group"`

Signed-off-by: Lucas Servén Marín <lserven@gmail.com>
squat added a commit to squat/prom-label-proxy that referenced this pull request Jun 23, 2021
This commit bumps the dependency on Prometheus to the commit from the
v2.28.0 tag
(https://github.com/prometheus/prometheus/releases/tag/v2.28.0). The
reason for bumping this is to take advantage of the updates to the
PromQL parser. Without this update, the prom-label-proxy cannot enforce
labels on queries that use any new features, e.g. queries that include:
* the last_over_time, sgn, clamp functions, added in
  prometheus/prometheus#8457;
* the @ operator, added in
  prometheus/prometheus#8121; or
* the group aggregation operator, added in
  prometheus/prometheus#7480.

Currently, queries including these functions produce errors like:
`parse error: unknown function with name "group"`

In order to keep tests working, we need to pull in the latest commits
from Alertmanager. This, however, does not satisfy `go mod` since
v0.22.2 of Alertmanager, on which Prometheus depends, is not merged into
main. As a result, we need a replace directive in the go.mod file for
now.

Signed-off-by: Lucas Servén Marín <lserven@gmail.com>
bwplotka pushed a commit to prometheus-community/prom-label-proxy that referenced this pull request Jun 23, 2021
This commit bumps the dependency on Prometheus to the commit from the
v2.28.0 tag
(https://github.com/prometheus/prometheus/releases/tag/v2.28.0). The
reason for bumping this is to take advantage of the updates to the
PromQL parser. Without this update, the prom-label-proxy cannot enforce
labels on queries that use any new features, e.g. queries that include:
* the last_over_time, sgn, clamp functions, added in
  prometheus/prometheus#8457;
* the @ operator, added in
  prometheus/prometheus#8121; or
* the group aggregation operator, added in
  prometheus/prometheus#7480.

Currently, queries including these functions produce errors like:
`parse error: unknown function with name "group"`

In order to keep tests working, we need to pull in the latest commits
from Alertmanager. This, however, does not satisfy `go mod` since
v0.22.2 of Alertmanager, on which Prometheus depends, is not merged into
main. As a result, we need a replace directive in the go.mod file for
now.

Signed-off-by: Lucas Servén Marín <lserven@gmail.com>
squat added a commit to squat/observatorium that referenced this pull request Jun 23, 2021
This commit bumps the dependency on prom-label-proxy to include
prometheus-community/prom-label-proxy#71.
As described in that PR, the reason for bumping this is to take
advantage of the updates to the PromQL parser. Without this update,
the prom-label-proxy, and thus the Observatorium API cannot enforce
labels on queries that use any new features, e.g. queries that include:
* the last_over_time, sgn, clamp functions, added in
  prometheus/prometheus#8457;
* the @ operator, added in
  prometheus/prometheus#8121; or
* the group aggregation operator, added in
  prometheus/prometheus#7480.

Currently, queries including these functions produce errors like:
`parse error: unknown function with name "group"`

Signed-off-by: Lucas Servén Marín <lserven@gmail.com>
@bboreham
Copy link
Member

I'm pretty sure one or two blog posts were written showing how to use this feature, but I can't find them.
Would someone post the link(s) please?

@roidelapluie
Copy link
Member

https://prometheus.io/blog/2021/02/18/introducing-the-@-modifier/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

topk/bottomk X over time, X = [avg, sum, min, max, etc...]
8 participants