Ratio Based Alert Support #106102

@Maixy

Problem Statement

It would be really useful to configure alerts based on a ratio, such as a Span Equation that uses the division operator.
This would let us alert not just on the existence of some critical behavior, but on whether the share of traffic experiencing that behavior is significant relative to our overall traffic.

For example, an application might block unauthorized traffic and return a 401 response. Spikes of 401s wouldn't be unusual, so an anomaly detection approach wouldn't quite fit. Even so, we might still want to know when a majority of the traffic being served is just 401s. That tells us most of our cost-to-serve is being wasted on unauthorized requests, and it might be time to add some additional service protections (e.g. IP blocks for nefarious traffic).

We can add a dashboard view that shows what percentage of traffic is being served 401s with something like this:
count_if(span.status_code, equals, 401) / count(span.duration)
Unfortunately, we can't currently alert on that type of query.
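To make the intended check concrete, here is a minimal sketch of the same ratio computed over a batch of span status codes, followed by a threshold test. The function names, the 0.5 threshold, and the `min_spans` floor are all assumptions chosen for illustration, not part of any existing alerting API. It also assumes every span has a duration, so `count(span.duration)` is treated as the total span count.

```python
from typing import Iterable, Optional


def ratio_401(status_codes: Iterable[int]) -> Optional[float]:
    """Return the share of spans with status 401, or None if there are no spans.

    Mirrors count_if(span.status_code, equals, 401) / count(span.duration),
    assuming every span contributes to the denominator.
    """
    total = 0
    matched = 0
    for code in status_codes:
        total += 1
        if code == 401:
            matched += 1
    if total == 0:
        return None  # avoid dividing by zero when there is no traffic
    return matched / total


def should_alert(status_codes: Iterable[int],
                 threshold: float = 0.5,
                 min_spans: int = 100) -> bool:
    """Hypothetical check: fire only when enough spans exist and the ratio exceeds the threshold."""
    codes = list(status_codes)
    if len(codes) < min_spans:
        return False  # too little traffic for the ratio to be meaningful
    ratio = ratio_401(codes)
    return ratio is not None and ratio > threshold
```

The `min_spans` floor is one possible way to keep a handful of requests during a quiet period from tripping a "majority of traffic" alert.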

Solution Brainstorm

It would be great if Alerts supported the same Span Equation queries that dashboards already support.
This could be a new Alert Type that accepts the same Span Equation arguments currently supported in the trace visualization view (and dashboard views).
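As a rough illustration of what such an alert type might need to carry, the sketch below models a rule as an equation string plus a threshold and evaluates it against per-window counts. `RatioAlertRule`, `is_triggered`, and the `consecutive_windows` field are hypothetical names invented for this example and do not correspond to an existing API. The numerator and denominator counts are assumed to be computed elsewhere by whatever evaluates the equation.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class RatioAlertRule:
    """Hypothetical rule shape for a ratio-based alert."""
    equation: str                 # the Span Equation text, stored as-is
    threshold: float              # fire when numerator / denominator exceeds this
    consecutive_windows: int = 1  # how many recent windows must all breach

    def __post_init__(self) -> None:
        if self.consecutive_windows < 1:
            raise ValueError("consecutive_windows must be at least 1")


def is_triggered(rule: RatioAlertRule,
                 windows: Sequence[Tuple[int, int]]) -> bool:
    """Evaluate the rule against (numerator, denominator) counts, oldest first."""
    if len(windows) < rule.consecutive_windows:
        return False  # not enough history yet
    recent = windows[-rule.consecutive_windows:]
    # A window with no traffic (denominator 0) never counts as a breach.
    return all(d > 0 and n / d > rule.threshold for n, d in recent)
```

For example, a rule with `threshold=0.5` and `consecutive_windows=2` would fire only after two windows in a row where 401s make up more than half of all spans.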

Product Area

Alerts
