Feature request: Identify Prometheus instance when sending notifications to Alertmanager #2416

Closed
mattbostock opened this issue Feb 10, 2017 · 11 comments

mattbostock commented Feb 10, 2017

When an alert fires, I often want to see the data that caused the alert to trigger. To do this, I need to know which Prometheus instance fired the alert.

I have a dashboard annotation on each alert that links to Grafana. In Grafana, you can set a $datasource templating variable that allows you to dynamically change the datasource for all panels on a dashboard. I would like the dashboard annotation to include the Prometheus instance as a URL query parameter to Grafana.

The GeneratorURL tells me which Prometheus fired the alert, but is not useful for selecting the Grafana datasource in its current form.

I tried using external labels; however, this breaks alert deduplication.

If we can send an identifier (perhaps user-defined as a command-line flag or configuration file option) to Alertmanager, we could then use that in the alert template.
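
To make the request concrete, here is a minimal sketch of the kind of link the dashboard annotation would need to carry. It relies on Grafana reading template variables from `var-<name>` query parameters; the dashboard URL and the datasource name are hypothetical values chosen for illustration, not taken from a real setup.

```python
from urllib.parse import urlencode


def grafana_dashboard_url(base_url: str, datasource: str) -> str:
    """Return a dashboard link that preselects the $datasource variable."""
    # Grafana maps the query parameter var-datasource onto $datasource.
    query = urlencode({"var-datasource": datasource})
    return f"{base_url}?{query}"


# Hypothetical dashboard URL and datasource name, for illustration only.
url = grafana_dashboard_url(
    "https://grafana.example.com/dashboard/db/node-overview",
    "prometheus-dc2a",
)
print(url)
```

The open question in this issue is where the `datasource` value comes from at notification time.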

brian-brazil commented Feb 10, 2017

GeneratorURL is how the Prometheus is identified.

There are no plans to add additional special fields in the AM protocol. I'd suggest using an annotation, and/or notification templating.

mattbostock commented Feb 10, 2017

I don't find GeneratorURL that useful as it always links to the graph view, which can take a prohibitive amount of time for many queries. I think it would be more useful if we could break it down into its component parts.
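
As an illustration of what "component parts" could mean, the sketch below splits a GeneratorURL-style link into the server host and the query expression. The sample URL is an assumption; the exact path and query parameters depend on the Prometheus version.

```python
from urllib.parse import parse_qs, urlsplit

# Assumed example value; the real layout depends on the Prometheus version.
generator_url = (
    "https://dc2a.prometheus.example.com/graph?g0.expr=up+%3D%3D+0&g0.tab=0"
)

parts = urlsplit(generator_url)
query = parse_qs(parts.query)

server = parts.hostname           # which Prometheus sent the alert
expression = query["g0.expr"][0]  # the alerting expression, URL-decoded

print(server)
print(expression)
```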

> an annotation

Do you mean to template the alerting rules? I'd like to avoid that at this point to keep complexity to a minimum.

> notification templating

Do you mean to derive the data source from the GeneratorURL in the notification template?

brian-brazil commented Feb 10, 2017

> I think it would be more useful if we could break it down into its component parts.

Those are the external labels typically.

> Do you mean to template the alerting rules?

Yes.

> Do you mean to derive the data source from the GeneratorURL in the notification template?

That's one option.

This really comes down to how you've architected your monitoring and associated naming schemes. We already have all the support that should be required, it's just a question of plumbing it together.

mattbostock commented Feb 10, 2017

> Those are the external labels typically.

External labels are not compatible with alert deduplication, which is required for redundancy.
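
The incompatibility can be shown with a simplified model, assuming that alerts are treated as the same alert only when their label sets are identical. The function name and the label values below are hypothetical; this is not Alertmanager's actual code.

```python
def alert_identity(labels):
    """Simplified stand-in: identify an alert by its full label set."""
    return frozenset(labels.items())


# Hypothetical alert labels, identical on both redundant servers.
rule_labels = {"alertname": "HighLatency", "severity": "page"}

# Without a per-server external label, both servers send the same label set.
plain = {alert_identity(rule_labels), alert_identity(rule_labels)}

# With a per-server `prometheus` external label, the label sets differ.
labelled = {
    alert_identity({**rule_labels, "prometheus": "dc1a"}),
    alert_identity({**rule_labels, "prometheus": "dc1b"}),
}

print(len(plain), len(labelled))
```

In this model the redundant pair collapses to one alert only while both servers attach exactly the same labels.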

> This really comes down to how you've architected your monitoring and associated naming schemes. We already have all the support that should be required, it's just a question of plumbing it together.

We have a federated setup:

  • two federation servers at the top level, each with the same configuration and independent of one another
  • one or more Prometheus servers in each datacentre (again for redundancy; the number depends on the size of the datacentre); we have over 100 of these

We could use templating as you suggest, but I think we could make this simpler to achieve. Is there reluctance to change the GeneratorURL for backwards compatibility reasons? What if we improved it for Prometheus 2.0?

brian-brazil commented Feb 10, 2017

It sounds to me like the problem here is you're trying to map your choice of Grafana data source names to Prometheus URLs. Unless you've carefully designed both to be compatible/extensible (for example by making external labels sufficient to figure out a URL and datasource), that's not going to work out.

At least one of those two needs to change, otherwise you get to maintain a manual mapping.
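
A manual mapping of this kind might look like the sketch below. The host names and datasource names are hypothetical and follow an `example.com` naming pattern; the function name is also an assumption.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical hand-maintained table: Prometheus host -> Grafana datasource.
DATASOURCE_BY_HOST = {
    "dc1a.prometheus.example.com": "prometheus-dc1a",
    "dc2a.prometheus.example.com": "prometheus-dc2a",
}


def datasource_for(generator_url: str) -> Optional[str]:
    """Look up the datasource for the server named in a GeneratorURL."""
    host = urlsplit(generator_url).hostname
    return DATASOURCE_BY_HOST.get(host)


print(datasource_for("https://dc2a.prometheus.example.com/graph?g0.expr=up"))
```

Every new server needs a new entry in the table, which is the maintenance cost being pointed out.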

The purpose of the GeneratorURL is not to select Grafana data sources, and I don't see why we'd make that the case as it'd break its intended purposes.

mattbostock commented Feb 10, 2017

> It sounds to me like the problem here is you're trying to map your choice of Grafana data source names to Prometheus URLs

Our datasources are named in Grafana similar to:

prometheus-global
prometheus-dc1a
prometheus-dc1b
prometheus-dc2a
prometheus-dc2b

...and our Prometheus URLs might be:

https://dc2a.prometheus.example.com

(We don't use letters to enumerate our servers, but this is an example.)

I initially tried adding a prometheus external label, which would have a value of global or dc1a; however, this broke alert deduplication.

The crux of the issue is that we have multiple redundant Prometheus servers at the bottom level of our federated setup, and we want to be able to show in Grafana the same data that triggered the alert.

We could add an alias datasource for each datacentre, and use that in the Grafana query parameters, but it might not necessarily map to the same data that triggered the alert.

brian-brazil commented Feb 10, 2017

The way I'd approach that would be to just show data from one of the Prometheus servers.

In my experience, cases where it matters which server the data comes from are rare enough that debugging by hand is okay. It's usually a network issue that affected only one server, which is less likely at the bottom level, and at that point you're debugging Prometheus rather than the problem.

brian-brazil commented Mar 6, 2017

You might be interested in the reReplaceAll function in Alertmanager notification templates. I don't think there's anything to be done here on the Prometheus side.
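
The idea behind this suggestion is a regular-expression rewrite of the GeneratorURL into a datasource name. The sketch below uses Python's `re.sub` as a stand-in for reReplaceAll; the pattern, the function name, and the sample URLs are assumptions based on the example naming scheme in this thread, and a real notification template would use Go regexp and replacement syntax instead.

```python
import re

# Assumed naming scheme: https://<name>.prometheus.example.com/...
PATTERN = r"^https?://([^./]+)\.prometheus\.example\.com.*$"


def datasource_from_generator_url(generator_url: str) -> str:
    """Rewrite a matching GeneratorURL into a Grafana datasource name."""
    # The whole URL is replaced by "prometheus-" plus the captured host prefix.
    # URLs that do not match the pattern are returned unchanged.
    return re.sub(PATTERN, r"prometheus-\1", generator_url)


name = datasource_from_generator_url(
    "https://dc2a.prometheus.example.com/graph?g0.expr=up&g0.tab=0"
)
print(name)
```

This only works when server URLs and datasource names follow a shared convention, so no per-server table is needed.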

mattbostock commented Mar 8, 2017

Thanks @brian-brazil, I was just discussing reReplaceAll with @jamesog. That will solve my use case.

I still think GeneratorURL could be much more useful if it were split into its component parts; it's very inflexible currently.

mattbostock commented May 15, 2017

Resolved by #2716.

lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
