
External Labels not available in alert Annotations #3043

Open
byrnedo opened this Issue Aug 9, 2017 · 49 comments


byrnedo commented Aug 9, 2017

What did you do?

Defined an external label in the Prometheus config:

global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 15s
  external_labels:
    monitor: acme-logs-dev

In a rule I had:

ALERT container_eating_memory
  IF sum(container_memory_rss{container_label_com_docker_swarm_task_name=~".+"}) BY (instance, name) > 2500000000
  FOR 5m
  ANNOTATIONS {
    description="{{ $labels.container_label_com_docker_swarm_task_name }} is eating up a LOT of memory. Memory consumption of {{ $labels.container_label_com_docker_swarm_task_name }} is at {{ humanize $value}}.", 
    summary="{{$labels.monitor}} - HIGH MEMORY USAGE WARNING: TASK '{{ $labels.container_label_com_docker_swarm_task_name }}' on '{{ $labels.instance }}'"}

What did you expect to see?
The templated alert printing the monitor label.

Instead, the alert in the Prometheus alert view and the alert that ended up in Slack were both missing the external label.

What did you see instead? Under which circumstances?

No monitor label listed on the Prometheus alerts page, and an empty string in Slack.

Environment

  • System information:

  • Prometheus version:

    1.7.1

  • Alertmanager version:

    0.8.0

  • Prometheus configuration file:

global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 15s
  external_labels:
    monitor: acme-logs-dev
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - alertmanager.service.acme:9093
    scheme: http
    timeout: 10s
rule_files:
- /etc/prometheus/tasks.rules
- /etc/prometheus/host.rules
- /etc/prometheus/containers.rules
scrape_configs:
- job_name: cadvisor
  scrape_interval: 5s
  scrape_timeout: 5s
  metrics_path: /metrics
  scheme: http
  dns_sd_configs:
  - names:
    - tasks.cadvisor
    refresh_interval: 30s
    type: A
    port: 8080
- job_name: node-exporter
  scrape_interval: 5s
  scrape_timeout: 5s
  metrics_path: /metrics
  scheme: http
  dns_sd_configs:
  - names:
    - tasks.nodeexporter
    refresh_interval: 30s
    type: A
    port: 9100
- job_name: prometheus
  scrape_interval: 10s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  static_configs:
  - targets:
    - localhost:9090
- job_name: blackbox-http
  params:
    module:
    - http_2xx
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /probe
  scheme: http
  dns_sd_configs:
  - names:
    - tasks.kibana.service.acme
    refresh_interval: 30s
    type: A
    port: 5601
  relabel_configs:
  - source_labels: [__address__]
    separator: ;
    regex: (.*)(:80)?
    target_label: __param_target
    replacement: ${1}
    action: replace
  - source_labels: [__param_target]
    separator: ;
    regex: (.*)
    target_label: instance
    replacement: ${1}
    action: replace
  - source_labels: []
    separator: ;
    regex: .*
    target_label: __address__
    replacement: blackboxexporter:9115
    action: replace
  - source_labels: [__meta_dns_name]
    separator: ;
    regex: tasks\.(.*)
    target_label: job
    replacement: ${1}
    action: replace
- job_name: blackbox-tcp
  params:
    module:
    - tcp_connect
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /probe
  scheme: http
  dns_sd_configs:
  - names:
    - tasks.elasticsearch.service.acme
    refresh_interval: 30s
    type: A
    port: 9200
  - names:
    - tasks.logstash.service.acme
    refresh_interval: 30s
    type: A
    port: 5000
  relabel_configs:
  - source_labels: [__address__]
    separator: ;
    regex: (.*)(:80)?
    target_label: __param_target
    replacement: ${1}
    action: replace
  - source_labels: [__param_target]
    separator: ;
    regex: (.*)
    target_label: instance
    replacement: ${1}
    action: replace
  - source_labels: []
    separator: ;
    regex: .*
    target_label: __address__
    replacement: blackboxexporter:9115
    action: replace
  - source_labels: [__meta_dns_name]
    separator: ;
    regex: tasks\.(.*)
    target_label: job
    replacement: ${1}
    action: replace
  • Alertmanager configuration file:
route:
  receiver: 'slack'
  repeat_interval: 3h #3h
  group_interval: 5m #5m
  group_wait: 1m #1m
  routes:
  #- receiver: 'logstash'
  #  continue: true
  - receiver: 'slack'

receivers:
  - name: 'slack'
    slack_configs:
      - send_resolved: true
        api_url: 'xxx'
        username: 'Prometheus - Alerts'
        channel: '#service-alerts'
        title: "{{ range .Alerts }}{{ .Annotations.summary }}\n{{ end }}"
        text: "{{ range .Alerts }}{{ .Annotations.description }}\n{{ end }}"
        icon_emoji: ':dart:'
  - name: 'logstash'
    webhook_configs:
      # Whether or not to notify about resolved alerts.
      - send_resolved: true
        # The endpoint to send HTTP POST requests to.
        url: 'http://logstash:8080/'

  • Logs:

brian-brazil commented Aug 9, 2017

It makes more sense to ask questions like this on the prometheus-users mailing list rather than in a GitHub issue. On the mailing list, more people are available to potentially respond to your question, and the whole community can benefit from the answers provided.

This is by design, you want alert notification templates.
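
For reference, the external label does reach the Alertmanager as an ordinary label on each alert, so a notification template can pick it up there. A minimal sketch against the Slack receiver from the report above (assuming monitor is shared by every alert in the group, so it also appears in .CommonLabels):

receivers:
  - name: 'slack'
    slack_configs:
      - api_url: 'xxx'
        channel: '#service-alerts'
        # external labels arrive as ordinary alert labels in Alertmanager
        title: "{{ .CommonLabels.monitor }} - {{ range .Alerts }}{{ .Annotations.summary }}{{ end }}"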


byrnedo commented Aug 9, 2017

Ok, will do in future.


tomwilkie commented Aug 30, 2018

This is by design, you want alert notification templates.

I think there are some valid use cases where doing this with alert notification templates isn't desirable. For instance, consider dashboard URLs that might contain the datacenter (an external label). Some alerts might include dashboard URLs with the datacenter, some might not. And it would be super useful for the URLs that appear in the Prometheus UI to contain the correct datacenters rather than placeholders.

Is there currently a way of doing this, or should this issue be reopened?


brian-brazil commented Aug 30, 2018

If you have that complex a setup, you're getting into configuration management territory at that point. In practice you can probably always append the datacenter in the AM; if it's not used, it'll be ignored.

This has been discussed previously, the answer at the time was notification templates (which were yet to be implemented back then). In general the point stands that an alert should not care about the Prometheus it is running in.


poblahblahblah commented Aug 31, 2018

In general the point stands that an alert should not care about the Prometheus it is running in.

I think the point, at least for us, is that humans do care about the Prometheus that alerts are generated from, even if you don't or don't think they should. Even if we ignore this specific scenario, the fact remains that external labels or alert label rewrites can add contextual information that is helpful to the humans receiving the alerts. If you are taking a quick glance at multiple alerts, the summary and description annotations seem to be the two best places to put this kind of useful information.

Some, but not all, labels being available is confusing at best. If a human decides that combining this information in a specific way is useful to them and their organization, isn't that worth considering? And if you can get this done in a single place (the alert definition) rather than multiple places, why wouldn't you? It's generally what the user would expect.


brian-brazil commented Aug 31, 2018

I agree that this is useful, but alert templates are the wrong place for it. This is all possible with notification templates, which are also more succinct.


poblahblahblah commented Aug 31, 2018

How can alert notification templates be used when pulling alerts via the AlertManager API?


poblahblahblah commented Sep 7, 2018

Bump on this.

How can notification templates be used when pulling alerts via the AlertManager API?

I'll point out that we're not the only ones using the AlertManager API.


brian-brazil commented Sep 7, 2018

They can't, as the Alertmanager design revolves around sending notifications. Just like the webhook, if you're doing something like that you'll need to do your own templating.


poblahblahblah commented Sep 7, 2018

So much for the pull method.


nikopen commented Oct 10, 2018

@brian-brazil if I understand correctly, externalLabels: "monitor: <foobar>" cannot be used directly in an alert with {{ $labels.monitor }}, but only works as {{ .GroupLabels.monitor }} in the Alertmanager configuration? Or does one need something extra, a notification template, on top of that?

Why is one '$lowercase' and the other '.CamelCase'?

What's the difference between $labels and .GroupLabels?

Why can external labels only be used via {{ .GroupLabels.… }}?

And how does one test this when the entire Alertmanager configuration is only accepted as a secret, with extra hoops to jump through in a multi-cluster, multi-Prometheus setup?


beorn7 commented Jan 18, 2019

Having just implemented another workaround for this for the nth time at SoundCloud triggered me to do a quick straw poll among the Prometheus people here. Everybody wants this feature.

I don't know if we should continue the discussion. My suspicion is that all arguments have been tabled already. If there is still no consensus on doing this, we could do a formal vote in prometheus-team@.

beorn7 reopened this Jan 18, 2019


tomwilkie commented Jan 18, 2019

+1, we really want this feature too; putting it to a vote seems a reasonable way of moving it forward. And by all means do it on prometheus-developers@, no reason this should be private.


brian-brazil commented Jan 18, 2019

Note there are also technical issues here, and this counts as a breaking change.


beorn7 commented Jan 18, 2019

It would only change behavior for those alert templates that so far referred to a non-existent label. I doubt that's of practical relevance. But even if it is, it's easy to cope with (many orders of magnitude easier than, let's say, dropping protobuf support or changing the on-disk format without providing a working conversion tool...). Thus, the "breaking change" argument has very little weight in the discussion.


brian-brazil commented Jan 18, 2019

It would only change behavior for those alert templates that so far referred to a non-existent label.

You can also iterate over the labels, and that output would change.

Thus, the "breaking change" argument has very little weight in the discussion.

It's still a breaking change per our stability guarantees. How easy a change is to deal with in principle doesn't stop it from being a breaking change.


beorn7 commented Jan 18, 2019

It's still a breaking change per our stability guarantees. How easy a change is to deal with in principle doesn't stop it from being a breaking change.

Which still doesn't change that it has very little weight in this discussion.


beorn7 commented Jan 21, 2019

As noted in #5009, we could easily hide the new feature behind a command line flag or a boolean entry in the config file.

Do we have consensus[1] that we want to implement the feature in general?

[1] In the wider sense of consensus, which includes: "Even if I personally disagree, I acknowledge that the majority of Prometheus team members want this and I thus do not insist on a formal vote.".


brian-brazil commented Jan 21, 2019

I still believe that this is a layering violation, and that having rules know about the Prometheus they run inside is likely to make things fragile and harder to share. In addition, it would make it more difficult for those wanting to write generic alerting rules to avoid these unknown labels - a general principle of PromQL is that you should always know which labels are in play, and this goes against that. Having a flag for this also seems like a poor way to do things, as it becomes a global setting as to which type of alerts break.

Has anyone ever come up with a use case that can't be handled via notification templates? When this was first discussed, that's what we were looking for to reconsider this.

Also, how would this interact with alert_relabel_configs? The external_labels are not what ultimately ends up in the alertmanager.
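
For reference, alert relabeling lives next to the Alertmanager configuration and runs on outgoing alerts. A sketch with a made-up rule that rewrites severity=critical to severity=page:

alerting:
  alert_relabel_configs:
  # made-up example: rewrite severity=critical to severity=page on outgoing alerts
  - source_labels: [severity]
    regex: critical
    target_label: severity
    replacement: page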


beorn7 commented Jan 21, 2019

I would like to note that the actual PromQL expression won't have access to the external labels. That's only for template expansion in annotations and added labels. I don't see this as a layering violation as I see those on the same level as the external labels. They are added when sent out to Alertmanager. They never take part in PromQL expressions.

And yes, alert_relabel_configs has to be applied as well.

Yet another idea to make this not collide with any existing usage: Make the final label set as sent to AM available as $external_labels.… (in contrast to $labels.… for the label set before applying external labels and alert_relabel_configs). That makes it very explicit with which label set you are operating and won't break any existing templates.

About use cases: Technically, everything can be solved with notification templates. It would just make the config even more complicated. Our routing tree already looks approximately like https://en.wikipedia.org/wiki/SN_1572 . Multiplying the number of receivers by the number of notification templates is infeasible.

The most common cases for us are:

  1. Dashboard links, where we often need the zone label as part of the URL (but the dashboard links are not unified enough (and never will be) to add it in the notification template); see the sketch after this list.
  2. Readable description annotations. Want: “95th percentile latency is above 90ms for foobar in zone xy.” Got: “95th percentile latency is above 90ms for foobar in the zone you can easily figure out by looking at the labels below.”
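
A sketch of the first use case, with a made-up metric and Grafana URL; with zone present only in external_labels, {{ $labels.zone }} currently expands to an empty string:

- alert: HighErrorRate
  expr: job:request_errors:rate5m > 0.05
  for: 10m
  annotations:
    dashboard: "https://grafana.example.com/d/abc123?var-zone={{ $labels.zone }}"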

brian-brazil commented Jan 21, 2019

Make the final label set as sent to AM available as $external_labels.

External labels are not the same as what's sent to the AM. In addition, we can only know what's sent to the AM after alert templates have been evaluated (as labels are also templated), and alert_relabel_configs+external_labels happen after that - so it's not possible to provide what's sent to the AM to alert templates. The external_labels themselves are at least technically possible.

That makes it very explicit with which label set you are operating and won't break any existing templates.

That would be a saner approach generally, modulo the above.

Dashboard links where we often need the zone label as part of the URL (but the dashboard links are not unified enough (and never will be) to add it in the notification template).

You really have dashboards within a team/receiver that have a non-trivial number of Grafana variable names for the zone and other similar labels?

Want: “95th percentile latency is above 90ms for foobar in zone xy.” Got: “95th percentile latency is above 90ms for foobar in the zone you can easily figure out by looking at the labels below.”

I'd argue that depending on the labels is the right thing to do here; in such a case I'd expect zone to be important enough to be in the email subject or equivalent. Having to add this manually to every single alerting rule is a sign of a layering violation.


beorn7 commented Jan 21, 2019

You really have dashboards within a team/receiver that have a non-trivial number of Grafana variable names for the zone and other similar labels?

Some have one, some have none, some use PromDash, some might use something completely different.

Having to add this manually to every single alerting rule is a sign of a layering violation.

I don't have to add it. It's in the external_labels of the Prometheus server. I just want to access it to generate descriptions that read nicely.

For both topics above, I declare my discussion quota exhausted. I guess there are not many I need to convince here, perhaps Brian is the only one. My ambition to do so is limited. And so are my resources.

A different story are actual technical issues, which I will continue to discuss, see next comment.


beorn7 commented Jan 21, 2019

In addition, we can only know what's sent to the AM after alert templates have been evaluated (as labels are also templated), and alert_relabel_configs+external_labels happen after that - so it's not possible to provide what's sent to the AM to alert templates. The external_labels themselves are at least technically possible.

The order is critical here indeed. Since external labels won't override existing labels of the same name, that would work. (I guess that is what you mean by "The external_labels themselves is at least technically possible.")

What I would love is for alert template expansion to work the same as notification template expansion. I'm wondering if it would have been better to make the added labels the last thing that happens. However, since many will probably use alert_relabel_configs to clamp severity and such, this would indeed cause widely noticeable breakage.

One solution could be to only make external_labels accessible in alert templates.

The other would be to do the order as follows:

  1. external_labels
  2. labels added by alerting rule (expanded with access to external_labels)
  3. alert_relabel_configs
  4. annotations (expanded with access to all the labels)

With annotations not being relabelable, this should not break anything (provided we use a different name than $labels.… to access the "enriched" label set – $external_labels.… was just a suggestion).

Both solutions would probably solve 99% of the use cases. The second one is a bit more powerful and a bit more consistent with notification template expansion, but the added-labels part has the potential for confusion.


brian-brazil commented Jan 21, 2019

Some have one, some have none, some use PromDash, some might use something completely different.

I would presume that extra URL parameters are safe, and if someone is using something non-standard they should bear that cost. With a small set of potential dashboarding systems, I don't see it being too hard to set things up to work for all of them in one receiver.

I don't have to add it. It's in the external_labels of the Prometheus server.

You do have to add it, as you'd have to access it in an annotation in every single alerting rule. More generally, if you find yourself putting the exact same thing in ~every alerting rule in a Prometheus (and annotations are part of an alerting rule), you should probably be using notification templates instead.

I am not proposing that there'd be a notification template per alert - that's patently undesirable. I'm trying to avoid alert templates per environment.

I just want to access it to generate descriptions that read nicely.

Keep in mind that the goal is not "Prometheus should produce nice alerts" but that "Alertmanager should produce nice notifications" (or more accurately that produced notifications result in better MTTR, but let's not follow that rabbit hole). There's an interplay between alert and notification templating, and trying to do everything in either is rarely the right balance.

The canonical example is runbooks. Rather than specifying it in every single alerting rule, default it to youwiki.com/$Alertname and allow an annotation to override it if someone isn't following the standard.
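
A sketch of that default-with-override pattern as an Alertmanager notification template (template name and wiki URL are made up for illustration):

{{ define "alert.runbook" -}}
{{ if .Annotations.runbook }}{{ .Annotations.runbook }}{{ else }}https://youwiki.com/{{ .Labels.alertname }}{{ end }}
{{- end }}

A receiver could then reference it, e.g. text: '{{ range .Alerts }}{{ template "alert.runbook" . }}{{ end }}'.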

What I would love is for alert template expansion to work the same as notification template expansion.

We do have a feature that does exactly that :)

However, since many will probably use alert_relabel_configs to clamp severity and such, this would indeed cause widely noticeable breakage.

Yes, that's undesirable. alert_relabel_configs is a way to ensure that whatever alerting rules are generated by your Prometheus comply with some organisation standard. external_labels alone is only meant to cover the main cases, not the more complex ones for which you need to use relabelling.

With annotations not being relabelable, this should not break anything

I'm not quite getting what you're suggesting here - how are the labels being used in step 1? Having external labels go before template expansion will break cases where the alert produces a labelname that's in the external label but the alert templating was either removing it or depending on it not being there.

In addition I think that annotations and label templating should both have access to the exact same inputs, as it'd be confusing otherwise (and we've had requests for each to depend on the other so there's no obvious "right order").


beorn7 commented Jan 22, 2019

I'm not quite getting what you're suggesting here - how are the labels being used in step 1? Having external labels go before template expansion will break cases where the alert produces a labelname that's in the external label but the alert templating was either removing it or depending on it not being there.

As I said, my suggestion is that the label set including the external labels be accessed under a different name. That should cover the case where a template depends on the external label not being there.

The case where an alerting expression creates a label that is then removed by the labels section of the alert, so that it can be added again via the external labels, would indeed be changed by my suggestion. If that turns out to be of any practical relevance, we can still go down the road of a feature flag.

In addition I think that annotations and label templating should both have access to the exact same inputs, as it'd be confusing otherwise (and we've had requests for each to depend on the other so there's no obvious "right order").

We could define an order if that helps getting a useful feature in.

BTW: In which form would people make labels depend on annotations? Was that a request you deemed reasonable or are you just bringing it up to needlessly complicate the discussion?


brian-brazil commented Jan 22, 2019

Have I managed to get across why I consider this a layering violation (even if you don't agree)?

As I said, my suggestion is that the label set including the external labels be accessed under a different name.

So external_labels would still apply as now, after label templating? That would make sense, but isn't what I got from your list.

If that turns out to be of any practical relevance, we can still go down the road of a feature flag.

That still presents a global option to choose what to break. Given what you're trying to do, you shouldn't need to even consider a flag.

We could define an order if that helps getting a useful feature in.

My point is more that we should continue the current behaviour in this regard, as it's a bit of a rabbit hole otherwise.

BTW: In which form would people make labels depend on annotations? Was that a request you deemed reasonable or are you just bringing it up to needlessly complicate the discussion?

It was a good while back, I didn't consider the request in either direction to be reasonable (in addition to technical issues). At a higher level, if someone wants to be able to do 100% of their templating solely in alert templates without any configuration management or duplication then this is something they'd be asking for, and it's not a request we can practically satisfy.


beorn7 commented Jan 22, 2019

Have I managed to get across why I consider this a layering violation (even if you don't agree)?

I believe so, but I also believe it would be prohibitively expensive (if not impossible) for either side to convince the other. Thus, rather than further discussing the legitimacy of the feature itself, I'd like to focus on making it possible without creating technical issues, without breaking anybody's current usage, and without creating undue complications or confusion, so that even those who do not want this feature won't be harmed by it.

I have given further thought to what would be the minimal change required to provide everything that's needed while changing as little as possible (and breaking nothing). I'll spell that out in the next comment so as not to mix it up with this discussion.


beorn7 commented Jan 22, 2019

So, new suggestion:

  1. Leave label processing order on alerts the same as it is, for better or worse.

  2. Add a template variable called $external_labels (name is just a suggestion) that simply
    makes the external_labels: section from the config accessible from both alert label templates
    and alert annotation templates. For example, a config like

    global:
      external_labels:
        owner: api-team
        zone: ex

    would make {{ $external_labels.zone }} and {{ $external_labels.owner }} accessible, but nothing else (see the sketch after this list).

  3. Add a template variable called $final_labels (name is just a suggestion) that represents all the labels of the alert as it is sent to Alertmanager (after applying alert labels, external labels, and alert relabel configs), accessible from only alert annotation templates (and not from alert label templates, as final labels obviously cannot be accessed while labels are still being made – that's easier to understand and thus less prone to confusion than not having access to the final labels in annotations at all).
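
Under the $external_labels proposal in item 2, an alerting rule could then read as follows (a sketch with made-up rule and metric names; the variable does not exist at this point):

groups:
- name: example
  rules:
  - alert: HighLatency
    expr: job:latency_seconds:p95{job="foobar"} > 0.09
    for: 5m
    annotations:
      description: "95th percentile latency is above 90ms for {{ $labels.job }} in zone {{ $external_labels.zone }}."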


brian-brazil commented Jan 22, 2019

Add a template variable called $external_labels (name is just a suggestion) that simply
makes the external_labels: section from the config accessible from both alert label templates
and alert annotation templates.

As an implementation note, currently the labels are actually in .Labels and $labels is a shorthand we create for that. Given the naming over in the AM, I'd expect this to be .ExternalLabels.

Would this also be available from console templates?

Add a template variable called $final_labels (name is just a suggestion) that represents all the labels of the alert as it is sent to Alertmanager (after applying alert labels, external labels, and alert relabel configs), accessible from only alert annotation templates

I'm against having different templating for the two types of templates. There's already enough confusion around templating without adding another subtlety. This becomes an even bigger layering violation.


matthiasr commented Jan 22, 2019

I like the approach with explicit external labels 👍 For consistency, I think they should be available from console templates.

I don't know if the final labels are necessary at this point?


matthiasr commented Jan 22, 2019

I think the name would make it pretty obvious that $final_labels cannot be used to template non-final labels. We could make it available-but-empty at that point, if you don't want to make the templating "languages" different?


brian-brazil commented Jan 22, 2019

We could make it available-but-empty at that point, if you don't want to make the templating "languages" different?

It's still effectively different, and no use case for this has been offered.


brian-brazil commented Jan 22, 2019

I think the name would make it pretty obvious that $final_labels cannot be used to template non-final labels.

You're making a big assumption. We've had more than one user assume that label templates are evaluated from top to bottom, with intermediate values available to the next - labels being an unordered map in YAML.
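
For instance, the misconception looks like this (a sketch; both templates are expanded against the same query-result labels, so the second never sees the first):

labels:
  team: "{{ $labels.owner }}"
  tier: "{{ $labels.team }}"  # empty unless the series itself already has a team label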


beorn7 commented Jan 22, 2019

As an implementation note, currently the labels are actually in .Labels and $labels is a shorthand we create for that. Given the naming over in the AM, I'd expect this to be .ExternalLabels.

Sure, makes sense.

Would this also be available from console templates?

I don't feel strongly about those, but I guess it makes sense for both consistency and practical use.

I'm against having different templating for the two types of templates. There's already enough confusion around templating without adding another subtlety. This becomes an even bigger layering violation.

I see the point, but I disagree with it. I prefer having final labels (then probably called .FinalLabels, I guess) accessible in annotations.

If there are no prohibitive technical issues with the above, I guess we are down to differing views of how the layering looks and of what should be accessible in which template expansion (alert labels, alert annotations, notifications). In that case, how about finding out what the rest of prometheus-team thinks? Perhaps we don't need a formal vote for that.


brian-brazil commented Jan 22, 2019

(then probably called .FinalLabels, I guess)

We have to have the . version for technical reasons; whether we also have a $ version is open. I note that we wouldn't have a $ in console templates following the current way things work. There's an inconsistency there one way or the other.

I prefer having final labels (then probably called .FinalLabels, I guess) accessible in annotations.

Can you explain your use case?

notifications

I don't believe this was under discussion.


beorn7 commented Jan 22, 2019

Can you explain your use case?

I did already. Once more: we need the option to annotate a specific alert with a human-readable message, a link to a dashboard, a customizable runbook, or whatever a specific team might need to link to, using the same labels as we would use in notification templates. Five years of practice have shown that sufficient uniformity for those links or human-readable annotations is not feasible to achieve in our organization.


brian-brazil commented Jan 22, 2019

I did already.

You explained the case for external_labels (which is what this issue is about), you did not provide a case for the labels after alert_relabel_configs.


beorn7 commented Jan 22, 2019

alert_relabel_configs could change any of those. I want a way to consistently use the same labels in annotation templates as in notification templates - the option to use, in annotations, the very labels that show up in AM right next to my annotation. Imagine I did have dashboard links uniform enough that I could go down the route you suggest and have it all in the notification template. Practice teaches me that I cannot achieve that amount of uniformity, yet I still want the power of the labels in the dashboard link that I sadly have to assemble in an alert annotation.


beorn7 commented Jan 22, 2019

Also, the approach with final labels is less surprising for those who do not exactly understand the whole procedure of adding and changing labels on an alert. They can go from what they see in the actual alerts on Alertmanager (which is what they have to deal with anyway) and use it in annotations.


brian-brazil commented Jan 22, 2019

alert_relabel_configs could change any of those.

That's clear, I was asking if you actually have this use case or if it's just theoretical.

They can go from what they see in the actual alerts on Alertmanager (which is what they have to deal with anyway) and use it in annotations.

As a counter point, this is different from what the user will see on the rules status page in Prometheus.

How would all this interact with rule unittests? Currently they're self-contained.


beorn7 commented Jan 22, 2019

I was asking if you actually have this use case or if it's just theoretical.

We have not yet relabeled any labels that would be likely to be used in annotations. However, I don't even want to go down that line of argument.

The concrete handful of use cases we have could be solved with .ExternalLabels combined with the deep understanding of how labels are changed during the life cycle of an alert. But the proposed .FinalLabels would be able to do the same. While I could live with it for now, I'm also thinking about what's less confusing and more useful for the community at large. From that perspective, I find .FinalLabels for annotations even more desirable than .ExternalLabels for both annotations and alert labels. (I do see most use by just introducing both, though.)

WRT the rules status page: the rules status page isn't concerned with template expansion at all. I assume you mean the alerts page? While it expands the alert-added labels, it does not expand annotations. The latter is a long-known issue, which needs to be addressed anyway. It would also make sense to visualize alert relabeling somehow on that page (similar to how we do it on the targets page). See also #4277, which addresses some alert exploration issues but sadly hasn't seen an update from the reviewer in a while. All of this is neither a blocker for the proposed change, nor something we would avoid if we didn't adopt it.

WRT unittests: Certainly we need to provide the information of external labels and alert relabeling configs to the test case should we need to test template expansions that use them. That's required work but doesn't strike me as particularly difficult.


brian-brazil commented Jan 22, 2019

I assume you mean the alerts page?

Yes, I did.

While it expands the alert-added labels, it does not expand annotations. The latter is a long-known issue, which needs to be addressed anyway.

It was added last year in #3900.

It would also make sense to visualize alert relabeling somehow on that page

That gets a bit weird, as we use one set of labels in ALERTS and another when talking to the AM - one is the "internal" labels and the other the "external" labels. Not saying that we shouldn't expose that information, though.

That's required work but doesn't strike me as particularly difficult.

No, the work is not difficult, but all of this is additional complication for users. Would they provide a path to a prometheus.yml (in which case should we also get the rules files from there?) or the settings directly in the test file?


beorn7 commented Jan 22, 2019

It was added last year in #3900.

Oh my gosh! I totally missed that. My life has just gotten so much easier…

Would they provide a path to a prometheus.yml (in which case should we also get the rules files from there?) or the settings directly in the test file?

Intuitively, I'd provide it in the test file. I also think having a way to test alert relabel configs and the effect of external labels is valuable, even if we didn't allow access to the modified labels in templates.

In any case, I don't think this complication is prohibitive for the feature. There are a large number of users who need it, and those who have the discipline to also test their usage of the new features will hardly complain if we give them the option to do so.
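
A loudly hypothetical sketch of the test-file option (none of these keys existed in the promtool test format at the time; the names are invented purely for illustration):

# rule unit test file (hypothetical keys)
external_labels:
  zone: ex
alert_relabel_configs:
- source_labels: [severity]
  regex: critical
  target_label: severity
  replacement: page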


brian-brazil commented Jan 22, 2019

I also think having a way to test alert relabel configs and the effect of external labels is valuable, even if we didn't allow access to the modified labels in templates.

That's a good point, though you'd probably want to plumb it all the way through to notification templates at that stage.


beorn7 commented Jan 23, 2019

I did a quick straw poll among the Prometheus team members around me.
They would like to add .ExternalLabels (as configured in the config file) to alert templates (both for labels and annotations). I would prefer .FinalLabels (ideally both, actually) for a number of reasons, but I can see how just making .ExternalLabels accessible is the least invasive change that still covers most of the use cases.

Could we agree on this for now?


brian-brazil commented Jan 23, 2019

That'd be a reasonable way to handle it, and I presume console templates are included too.

I'm pondering whether we should add $ versions for everything in Prometheus/AM, or switch the doc examples to use the . forms. The latter is probably cleaner long term.


beorn7 commented Jan 23, 2019

I presume console templates are included too.

Yes, that makes sense to me, too.

I'm pondering whether we should add $ versions for everything in Prometheus/AM, or switch the doc examples to use the . forms. The latter is probably cleaner long term.

I share the same sentiment. We can flesh this out during implementation.

https://prometheus.io/docs/prometheus/latest/configuration/template_reference/#alert-field-templates lists "convenience" as a reason. What's the actual difference except typing .Labels rather than $labels (which has approximately the same level of convenience)?


brian-brazil commented Jan 23, 2019

What's the actual difference except typing .Labels rather than $labels (which has approximately the same level of convenience)?

There's no technical difference (though $labels also works inside range, as . gets overridden). My reason for doing this back when implementing it was that $labels would likely be far more familiar than .Labels to users not familiar with Go templating.
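
The range caveat in a sketch: inside a range (or with) block, . is rebound to the current element, so .Labels stops resolving while $labels keeps working:

{{ range $name, $val := $labels }}
  {{ $name }}={{ $val }} on {{ $labels.instance }}
  {{/* .Labels.instance would fail here, because . is now the map value */}}
{{ end }}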


matthiasr commented Jan 24, 2019
