Feature request: Add UI for testing full alerting expressions over time #1154

Open
qed- opened this Issue Oct 9, 2015 · 11 comments

5 participants

qed- commented Oct 9, 2015

It would be great if there were a way to generate time series (from recording rules and synthetic alert time series).

This would effectively allow backtesting of alerts and recording rules over past data.

Member

juliusv commented Oct 9, 2015

@qed- For my better understanding, how would this behave differently from just evaluating the alert expression as a graph to see at which historical points it generates output time series? That's how I currently test alert expressions over history.

For better alert testing usability, I'd like us to add a UI input field though that allows you to test an entire alerting rule (ALERT <alertname> IF <query expression> ...), not just the query expression that the rule contains.
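As an illustration of the graph-based approach mentioned above, here is a minimal sketch that builds a range query against the Prometheus HTTP API (`/api/v1/query_range` with `query`, `start`, `end`, and `step` parameters). The server address, the expression `up == 0`, and the time window are assumptions chosen for the example, and `query_range_url` is a hypothetical helper, not part of Prometheus.

```python
from urllib.parse import urlencode


def query_range_url(base_url, expr, start, end, step):
    """Build a range-query URL that evaluates `expr` at every `step` between start and end."""
    params = urlencode({"query": expr, "start": start, "end": end, "step": step})
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


# Hypothetical server address, alert expression, and 24-hour window (Unix seconds).
url = query_range_url("http://localhost:9090", "up == 0", 1444348800, 1444435200, "60s")
print(url)
```

Fetching that URL returns one series per label set that matched the expression at some point in the window, which is the "see where it produces output" check. It covers only the query expression, not the rest of the alerting rule.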

Author

qed- commented Oct 9, 2015

If I understand correctly, it's possible to graph the alert expression, but not any FOR and WITH clauses.

Ideally you could click on an alert and see exactly how it would have been evaluated in the past, complete with labels.

Member

brian-brazil commented Oct 9, 2015

Labels are currently static, so WITH doesn't add much. Even when that becomes more powerful, what you mainly care about is whether alerts will fire (so you can calibrate the threshold) rather than the exact labels involved.

Member

juliusv commented Oct 9, 2015

Yes, the thing about FOR is true though. That should be covered by my suggestion about having a UI way to test full alerting expressions (and being able to click on existing configured ones to do the same would be a great addition to that).

Author

qed- commented Oct 9, 2015

Yes, I mentioned labels as being the way to tell if the alert is firing or pending.

Member

juliusv commented Oct 9, 2015

So in my suggestion, the following would happen:

There's a UI textarea that lets you paste in an entire alerting expression. Alternatively, you can click on an already configured alerting rule to take you to the pre-filled-in version of that.

When pressing a button, you get a graph over time where you basically see labeled time series for any firing alert vector elements at the times when they are firing. If nothing is firing over the entire history, you would just get an empty graph. This is where Prometheus's alerting model differs fundamentally from one like Bosun's, where an alerting rule always has a single scalar output value vs. a whole vector of possible values (or none).

As an additional bonus, one could display a graph of just the count of firing alert elements over time or something like that.

Would that address your feature request?
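To make the proposal concrete, here is a rough sketch of how a FOR clause could be replayed over historical evaluation steps. It assumes the expression has already been evaluated at each step, so the input is a list of `(timestamp, matched label sets)` pairs. The function name `replay_for_clause`, the plain-string label sets, and the simplified state handling are all assumptions for illustration, not how Prometheus implements rule evaluation.

```python
def replay_for_clause(samples, hold_seconds):
    """Replay a FOR duration over pre-evaluated steps.

    samples: list of (timestamp, iterable of hashable label sets that matched).
    Returns a list of (timestamp, {labels: "pending" | "firing"}).
    """
    active_since = {}  # labels -> timestamp at which the current unbroken match began
    timeline = []
    for ts, matched in samples:
        matched = set(matched)
        # An element that stops matching loses its accumulated FOR time.
        active_since = {k: v for k, v in active_since.items() if k in matched}
        for labels in matched:
            active_since.setdefault(labels, ts)
        states = {
            labels: "firing" if ts - since >= hold_seconds else "pending"
            for labels, since in active_since.items()
        }
        timeline.append((ts, states))
    return timeline


# Hypothetical history: two label sets, evaluated every 60 seconds.
history = [(0, {"a"}), (60, {"a", "b"}), (120, {"a"}), (180, {"b"}), (240, set())]
timeline = replay_for_clause(history, hold_seconds=120)

# The "bonus" graph: number of firing elements at each step.
firing_counts = [sum(s == "firing" for s in states.values()) for _, states in timeline]
print(firing_counts)
```

Steps where the returned mapping is empty correspond to the empty stretches of the graph, and the label sets marked as firing are the series that would be drawn.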

Member

juliusv commented Oct 9, 2015

(and that would take into account the FOR stanzas etc. of course)

Author

qed- commented Oct 9, 2015

That would be perfect.

juliusv changed the title from "Feature request: generate timeseries over historical data" to "Feature request: Add UI for testing full alerting expressions over time" on Oct 9, 2015

Member

juliusv commented Oct 9, 2015

@qed- Great, I've adjusted the title of this issue to reflect this more clearly. I can't say yet when this is going to happen, but I think it would be a great thing to have in general.

Member

brian-brazil commented Oct 9, 2015

I think in addition we need something to handle the FOR clauses. A summary of how many alerts fired for how long would be useful for tuning that.
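One possible shape for such a summary, sketched under the assumption that the expression has already been evaluated at regular steps: collect each unbroken run during which a label set matched, then count how many runs would have lasted long enough to fire for a candidate FOR duration. `active_runs` and `count_firing` are hypothetical helper names, and durations are measured from the first to the last matching sample, so a run seen in a single evaluation has duration 0.

```python
def active_runs(samples):
    """Return (labels, first_ts, last_ts) for each unbroken run of matches."""
    open_runs = {}  # labels -> (first matching ts, most recent matching ts)
    runs = []
    for ts, matched in samples:
        matched = set(matched)
        # Close runs for label sets that no longer match at this step.
        for labels in list(open_runs):
            if labels not in matched:
                first, last = open_runs.pop(labels)
                runs.append((labels, first, last))
        # Start or extend runs for label sets that match at this step.
        for labels in matched:
            first, _ = open_runs.get(labels, (ts, ts))
            open_runs[labels] = (first, ts)
    # Runs still open at the end of the window are reported as-is.
    for labels, (first, last) in open_runs.items():
        runs.append((labels, first, last))
    return runs


def count_firing(runs, hold_seconds):
    """Count runs that lasted at least `hold_seconds`, i.e. would have fired."""
    return sum(1 for _, first, last in runs if last - first >= hold_seconds)


# Hypothetical history: two label sets, evaluated every 60 seconds.
history = [(0, {"a"}), (60, {"a", "b"}), (120, {"a"}), (180, {"b"}), (240, set())]
runs = active_runs(history)
for hold in (0, 60, 120, 180):
    print(hold, count_firing(runs, hold))
```

Comparing the counts across candidate durations shows how much a longer FOR would suppress short-lived matches in the sampled history.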

Contributor

mattbostock commented Mar 27, 2017

Related: #1369
