Description
Problem Statement
It would be really useful to configure alerts that are based on a ratio, such as a Span equation that uses the division operation.
This would enable alerting not just on the existence of some critical behavior, but on whether the percentage of traffic experiencing that behavior is significant relative to our overall traffic.
For example, an application might block unauthorized traffic and return a 401 response. Spikes of 401s wouldn't be unusual, so an anomaly detection approach wouldn't quite fit. Even so, we might still want to know when a majority of the traffic being served is just 401s: it tells us that most of our cost-to-serve is being wasted on unauthorized requests, and that it might be time to add some additional service protections (e.g. IP blocks for nefarious traffic).
We can add a dashboard view that shows what percentage of traffic is being served 401s with something like this:
count_if(span.status_code, equals, 401) / count(span.duration)
Unfortunately, we can't currently alert on that type of query.
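To make the desired alert condition concrete, here is a minimal Python sketch of the check I have in mind. It is purely illustrative: the `Span` class, the `ratio_of_401s` and `should_alert` helpers, and the 0.5 threshold are all hypothetical names and values made up for this example, not part of any existing API.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Span:
    # Hypothetical, simplified stand-in for a span in one evaluation window.
    status_code: int
    duration_ms: float


def ratio_of_401s(spans: Sequence[Span]) -> float:
    """Mirror count_if(span.status_code, equals, 401) / count(span.duration)."""
    total = len(spans)
    if total == 0:
        # Assumption: an empty window yields a ratio of 0 rather than an error.
        return 0.0
    unauthorized = sum(1 for span in spans if span.status_code == 401)
    return unauthorized / total


def should_alert(spans: Sequence[Span], threshold: float = 0.5) -> bool:
    """Fire when more than `threshold` of the window's traffic is 401s."""
    return ratio_of_401s(spans) > threshold
```

With a check like this, a window of 60 spans with status 401 and 40 with status 200 would fire at a 0.5 threshold, while a short burst of 401s inside otherwise healthy traffic would not.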
Solution Brainstorm
It would be great if Alerts supported the same Span Equation queries that dashboards support.
This could be a new Alert Type that accepts the same Span Equation arguments currently supported in the trace visualization view (and in dashboard views).
Product Area
Alerts