Amazon Managed Service For Prometheus Rule Editing In Grafana Unified Alerting UX #82127

Open
alvinlin123 opened this issue Feb 8, 2024 · 0 comments


Why is this needed:

Customers of Amazon Managed Service for Prometheus (AMP) want to visualize and manage Prometheus alarms and rules in Grafana’s Unified Alerting UX.

Grafana's Unified Alerting UX is compatible with the Cortex/Mimir rule management APIs. If AMP is based on Cortex, why doesn't Grafana Unified Alerting work with it? Because AMP disabled Cortex's rule management APIs in keeping with its tenet of being Prometheus-compatible.

When a user configures rules in Prometheus, they provide a file that contains multiple rule groups. AMP wanted to maintain that contract so that existing Prometheus customers can “copy and paste” their existing rule configurations into AMP. Cortex's rule management APIs work at the rule group level instead of the file level. This is the main reason why AMP disabled Cortex's rule management APIs and implemented its own version of the APIs. Additionally, AMP wanted to take the complexity of verifying whether given rule groups are loaded into the Ruler component out of users' hands, so AMP went with an asynchronous rule management API model instead of a synchronous model like the one in Cortex.
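To illustrate what an asynchronous model implies for a client such as the Grafana UI, here is a minimal Python sketch of a polling loop. It is not AMP's or Grafana's implementation. The `get_status` callable and the state names `ACTIVE` and `FAILED` are hypothetical placeholders for whatever status the service reports after a save has been accepted.

```python
import time
from typing import Callable, FrozenSet


def wait_until_loaded(
    get_status: Callable[[], str],
    done: FrozenSet[str] = frozenset({"ACTIVE"}),    # hypothetical success state
    failed: FrozenSet[str] = frozenset({"FAILED"}),  # hypothetical failure state
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it reports a terminal state or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in done:
            return status
        if status in failed:
            raise RuntimeError(f"rule groups were not loaded: {status}")
        if clock() >= deadline:
            raise TimeoutError(f"still in state {status!r} after {timeout_s}s")
        sleep(interval_s)
```

With a synchronous API the save call itself would report success or failure; with an asynchronous one, the UI would need a loop of this kind (or an equivalent status indicator) before telling the user that the rules are in effect.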

Note that AMP calls a Prometheus rules file a “namespace” instead of a “file” because the term “file” implies an implementation detail; a vendor may decide not to store rules in a file.

The API model differences between AMP and Cortex call for a similar but augmented Unified Alerting UX. For example, in Grafana today you “save” your edits one rule group at a time, but to work with AMP’s APIs, you would “save” your edits as a batch of rule groups under the same namespace/file.
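The per-namespace save can be sketched as follows, in Python for brevity. This is an illustration under stated assumptions, not Grafana's code: the function name `merge_namespace_edits` is hypothetical, and rule groups are modeled as plain dictionaries in the shape of a Prometheus rules file (`groups`, each with `name` and `rules`). The point is that an edit to one group still produces the full document for the whole namespace.

```python
from copy import deepcopy
from typing import Any, Dict, List

RuleGroup = Dict[str, Any]


def merge_namespace_edits(
    current_groups: List[RuleGroup],
    edited_groups: List[RuleGroup],
) -> Dict[str, List[RuleGroup]]:
    """Build the complete rules document for one namespace after applying edits.

    Edited groups replace existing groups with the same name; groups with a new
    name are appended. Untouched groups are carried over unchanged.
    """
    by_name = {g["name"]: deepcopy(g) for g in current_groups}
    order = [g["name"] for g in current_groups]
    for group in edited_groups:
        if group["name"] not in by_name:
            order.append(group["name"])
        by_name[group["name"]] = deepcopy(group)
    # The whole namespace is what gets saved, not the single edited group.
    return {"groups": [by_name[name] for name in order]}
```

For example, editing only group `b` in a namespace that also contains group `a` yields a document with both groups, which would then be submitted in a single save.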

What would you like to be added:

An end user can edit Prometheus rules on AMP in the Unified Alerting UX. Under the hood, the Grafana Unified Alerting UI would interact with the following AMP APIs:

Who is this feature for?

Any Grafana user who uses AMP's alerts and rules.

Additional information:

I would also like to take this opportunity to answer the important questions raised in the previous feature request, #64781. I understand the previous feature request was closed due to the feature request system migration and inactivity; I will do my best to keep the new feature request active. :)

We want Grafana alerting, when proxying requests, to support SigV4 authentication. This would allow the existing UI to work against the prometheus-compatible endpoints in the AWS API (i.e. with this alone, grafana would treat AWS like a vanilla alertmanager, and be able to read but not write rules).

This was true at the time, but SigV4 is now supported in the Prometheus data source, so no additional work is needed.

In order to detect which alerting implementation Grafana is talking to (cortex, prometheus, thanos, loki, etc) we have logic that effectively "sniffs" the API being proxied. This is necessary so that we can determine which endpoints are supported. Presumably, we'd need to do this for AWS as well, so that we know to use the right rule creation endpoint.

How might Grafana determine that the system it's talking to is actually AMP, in order to enable the right authentication or rule API plugin? Is there some signifier or base path in the URL perhaps, that would be guaranteed to work? Alternatively, is there an endpoint we can hit to obtain info about the target system, that would let Grafana know that it's talking to AMP as opposed to cortex or prometheus?

This is a great question. There are multiple ways we can let Grafana know that it’s talking to AMP. Detection can be based on the URL format, or we can provide an API that lets the sniffer tell the difference (e.g. Prometheus’ Build Information API, with the sniffer parsing the buildUser attribute).
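Both options can be sketched in a few lines of Python. This is a rough illustration only, not Grafana's sniffing logic. The URL pattern below (`aps-workspaces.<region>.amazonaws.com/workspaces/ws-...`) is an assumption about the AMP endpoint shape, and the `marker` value checked against `buildUser` is purely hypothetical, since the issue does not say what AMP would report there.

```python
import re
from typing import Any, Dict
from urllib.parse import urlparse

# Assumed AMP endpoint shape; not stated in the issue.
_AMP_HOST = re.compile(r"^aps-workspaces\.[a-z0-9-]+\.amazonaws\.com$")
_AMP_PATH = re.compile(r"^/workspaces/ws-[0-9A-Za-z-]+(/|$)")


def looks_like_amp_url(url: str) -> bool:
    """Option 1: guess from the data source URL format."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    return bool(_AMP_HOST.match(host) and _AMP_PATH.match(parsed.path))


def looks_like_amp_buildinfo(payload: Dict[str, Any], marker: str = "amp") -> bool:
    """Option 2: inspect a build information response.

    `payload` is the decoded JSON body of a Prometheus-style
    /api/v1/status/buildinfo response. `marker` is a hypothetical value.
    """
    data = payload.get("data") or {}
    build_user = str(data.get("buildUser", ""))
    return marker.lower() in build_user.lower()
```

The URL check needs no extra request but breaks behind a custom domain or proxy; the build information check is more robust but requires AMP to expose such an endpoint with an agreed-upon value.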
