
Added alert labels cannot be used in annotation #2454

Closed
JorritSalverda opened this Issue Feb 28, 2017 · 6 comments

JorritSalverda (Contributor) commented Feb 28, 2017

What did you do?

When I add a label to an alert via LABELS, I would like to be able to use it immediately in the annotations, but that doesn't seem to be possible. The label does show up in the ALERTS time series and can be used in the Alertmanager.

ALERT HaproxyErrorRateAboveThreshold
  IF sum(rate(haproxy_backend_http_responses_total{code="5xx"}[1m])) by (app,team,kubernetes_namespace) / sum(rate(haproxy_backend_http_responses_total[1m])) by (app,team,kubernetes_namespace) > 0.02
  FOR 5m
  LABELS { severity = "page", kubernetes_cluster = "cluster-name", gcloud_project = "project-name" }
  ANNOTATIONS {
    summary = "Error rate of {{ $labels.app }} is above threshold",
    description = "Error rate of {{ $labels.app }}.{{ $labels.kubernetes_namespace }}.{{ $labels.kubernetes_cluster }} is above threshold of 2% for more than 5 minutes.",
  }

What did you expect to see?

I would expect {{ $labels.kubernetes_cluster }} in the annotation description to be replaced with the value I just set in the LABELS section.

What did you see instead? Under which circumstances?

It gets replaced with an empty string.
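One workaround that should make the value visible to `$labels` is to attach the static label inside the alert expression itself with `label_replace`, so it becomes part of the expression's output labels rather than a rule-level LABELS entry. This is an untested sketch based on the rule above; since the empty regex `""` matches the (absent) source label, `label_replace` unconditionally sets `kubernetes_cluster` on every output series:

```
ALERT HaproxyErrorRateAboveThreshold
  IF label_replace(
       sum(rate(haproxy_backend_http_responses_total{code="5xx"}[1m])) by (app,team,kubernetes_namespace)
         / sum(rate(haproxy_backend_http_responses_total[1m])) by (app,team,kubernetes_namespace),
       "kubernetes_cluster", "cluster-name", "", "") > 0.02
  FOR 5m
  LABELS { severity = "page", gcloud_project = "project-name" }
  ANNOTATIONS {
    summary = "Error rate of {{ $labels.app }} is above threshold",
    description = "Error rate of {{ $labels.app }}.{{ $labels.kubernetes_namespace }}.{{ $labels.kubernetes_cluster }} is above threshold of 2% for more than 5 minutes.",
  }
```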

Environment

  • System information:

Linux 4.4.21+ x86_64

  • Prometheus version:
prometheus, version 1.5.2 (branch: master, revision: bd1182d29f462c39544f94cc822830e1c64cf55b)
  build user:       root@1a01c5f68840
  build date:       20170210-16:23:28
  go version:       go1.7.5

  • Alertmanager version:
alertmanager, version 0.5.1 (branch: master, revision: 0ea1cac51e6a620ec09d053f0484b97932b5c902)
  build user:       root@fb407787b8bf
  build date:       20161125-08:14:40
  go version:       go1.7.3

  • Prometheus configuration file:

prometheus.yml

# By default, scrape targets every 15 seconds.
global:
  scrape_interval:     15s

# A scrape configuration for running Prometheus on a Kubernetes cluster.
# This uses separate scrape configs for cluster components (i.e. API server, node)
# and services to allow each to use different authentication configs.
#
# Kubernetes labels will be added as Prometheus labels on metrics via the
# `labelmap` relabeling action.

# Scrape config for API servers.
#
# Kubernetes exposes API servers as endpoints to the default/kubernetes
# service so this uses `endpoints` role and uses relabelling to only keep
# the endpoints associated with the default/kubernetes service using the
# default named port `https`. This works for single API server deployments as
# well as HA API server deployments.
scrape_configs:
- job_name: 'kubernetes-apiservers'

  kubernetes_sd_configs:
  - role: endpoints

  # Default to scraping over https. If required, just disable this or change to
  # `http`.
  scheme: https

  # This TLS & bearer token file config is used to connect to the actual scrape
  # endpoints for cluster components. This is separate to discovery auth
  # configuration because discovery & scraping are two separate concerns in
  # Prometheus. The discovery auth config is automatic if Prometheus runs inside
  # the cluster. Otherwise, more config options have to be provided within the
  # <kubernetes_sd_config>.
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    # If your node certificates are self-signed or use a different CA to the
    # master CA, then disable certificate verification below. Note that
    # certificate verification is an integral part of a secure infrastructure
    # so this should only be disabled in a controlled environment. You can
    # disable certificate verification by uncommenting the line below.
    #
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

  # Keep only the default/kubernetes service endpoints for the https port. This
  # will add targets for each API server which Kubernetes adds an endpoint to
  # the default/kubernetes service.
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;kubernetes;https

- job_name: 'kubernetes-nodes'

  # Default to scraping over https. If required, just disable this or change to
  # `http`.
  scheme: https

  # This TLS & bearer token file config is used to connect to the actual scrape
  # endpoints for cluster components. This is separate to discovery auth
  # configuration because discovery & scraping are two separate concerns in
  # Prometheus. The discovery auth config is automatic if Prometheus runs inside
  # the cluster. Otherwise, more config options have to be provided within the
  # <kubernetes_sd_config>.
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    # If your node certificates are self-signed or use a different CA to the
    # master CA, then disable certificate verification below. Note that
    # certificate verification is an integral part of a secure infrastructure
    # so this should only be disabled in a controlled environment. You can
    # disable certificate verification by uncommenting the line below.
    #
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

  kubernetes_sd_configs:
  - role: node

  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)

# Scrape config for service endpoints.
#
# The relabeling allows the actual service scrape endpoint to be configured
# via the following annotations:
#
# * `prometheus.io/scrape`: Only scrape services that have a value of `true`
# * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
# to set this to `https` & most likely set the `tls_config` of the scrape config.
# * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
# * `prometheus.io/port`: If the metrics are exposed on a different port to the
# service then set this appropriately.
- job_name: 'kubernetes-service-endpoints'

  kubernetes_sd_configs:
  - role: endpoints

  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (https?)
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: (.+)(?::\d+);(\d+)
    replacement: $1:$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name

# Example scrape config for probing services via the Blackbox Exporter.
#
# The relabeling allows the actual service scrape endpoint to be configured
# via the following annotations:
#
# * `prometheus.io/probe`: Only probe services that have a value of `true`
- job_name: 'kubernetes-services'

  metrics_path: /probe
  params:
    module: [http_2xx]

  kubernetes_sd_configs:
  - role: service

  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
    action: keep
    regex: true
  - source_labels: [__address__]
    action: keep
    regex: (.*?)(:80|:443)
  - source_labels: [__address__]
    regex: (.*?)(:80)
    target_label: __param_target
    replacement: http://${1}
  - source_labels: [__address__]
    regex: (.*?)(:443)
    target_label: __param_target
    replacement: https://${1}
  - source_labels: [__param_target,__meta_kubernetes_service_annotation_prometheus_io_probe_path]
    separator: ';'
    regex: (.*?);(.*?)
    target_label: __param_target
    replacement: ${1}${2}
  - target_label: __address__
    replacement: blackbox-exporter
  - source_labels: [__param_target]
    target_label: instance
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    target_label: kubernetes_name

# Example scrape config for pods
#
# The relabeling allows the actual pod scrape endpoint to be configured via the
# following annotations:
#
# * `prometheus.io/scrape`: Only scrape pods that have a value of `true`
# * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
# * `prometheus.io/port`: Scrape the pod on the indicated port instead of the default of `9102`.
- job_name: 'kubernetes-pods'

  kubernetes_sd_configs:
  - role: pod

  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: (.+):(?:\d+);(\d+)
    replacement: ${1}:${2}
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name

# load rule files
rule_files:
- '/prometheus-rules/alert.rules'

alert.rules:

# alert for any elasticsearch cluster that has a red states for > 5m
ALERT ElasticSearchClusterRedStatus
  IF max(es_cluster_status) by (cluster, team) == 2
  FOR 5m
  LABELS { severity = "page", kubernetes_cluster = "production-europe-west1-c", gcloud_project = "***" }
  ANNOTATIONS {
    summary = "ElasticSearch {{ $labels.cluster }} status is red",
    description = "ElasticSearch {{ $labels.cluster }}.{{ $labels.kubernetes_namespace }}.{{ $labels.kubernetes_cluster }} status is red for more than 5 minutes.",
  }

# alert for haproxy error rate going above 2% for > 5m
ALERT HaproxyErrorRateAboveThreshold
  IF sum(rate(haproxy_backend_http_responses_total{code="5xx"}[1m])) by (app,team,kubernetes_namespace) / sum(rate(haproxy_backend_http_responses_total[1m])) by (app,team,kubernetes_namespace) > 0.02
  FOR 5m
  LABELS { severity = "page", kubernetes_cluster = "production-europe-west1-c", gcloud_project = "***" }
  ANNOTATIONS {
    summary = "Error rate of {{ $labels.app }} is above threshold",
    description = "Error rate of {{ $labels.app }}.{{ $labels.kubernetes_namespace }}.{{ $labels.kubernetes_cluster }} is above threshold of 2% for more than 5 minutes.",
  }

# alert for nginx error rate going above 2% for > 5m
ALERT NginxErrorRateAboveThreshold
  IF sum(rate(nginx_http_requests_total{status=~"5[0-9]+"}[1m])) by (app,team,kubernetes_namespace) / sum(rate(nginx_http_requests_total[1m])) by (app,team,kubernetes_namespace) > 0.02
  FOR 5m
  LABELS { severity = "page", kubernetes_cluster = "production-europe-west1-c", gcloud_project = "***" }
  ANNOTATIONS {
    summary = "Error rate of {{ $labels.app }} is above threshold",
    description = "Error rate of {{ $labels.app }}.{{ $labels.kubernetes_namespace }}.{{ $labels.kubernetes_cluster }} is above threshold of 2% for more than 5 minutes.",
  }

# alert for directly instrumented error rate going above 2% for > 5m
ALERT InstrumentedErrorRateAboveThreshold
  IF sum(rate(http_requests_total{code=~"5[0-9]+"}[1m])) by (app,team,kubernetes_namespace) / sum(irate(http_requests_total[1m])) by (app,team,kubernetes_namespace) > 0.02
  FOR 5m
  LABELS { severity = "page", kubernetes_cluster = "production-europe-west1-c", gcloud_project = "***" }
  ANNOTATIONS {
    summary = "Error rate of {{ $labels.app }} is above threshold",
    description = "Error rate of {{ $labels.app }}.{{ $labels.kubernetes_namespace }}.{{ $labels.kubernetes_cluster }} is above threshold of 2% for more than 5 minutes.",
  }

ALERT ServiceUnavailable
  IF min(probe_success) by (app,team,kubernetes_namespace) < 1
  FOR 5m
  LABELS { severity = "page", kubernetes_cluster = "production-europe-west1-c", gcloud_project = "***" }
  ANNOTATIONS {
    summary = "Service {{ $labels.app }} is unavailable",
    description = "Service {{ $labels.app }}.{{ $labels.kubernetes_namespace }}.{{ $labels.kubernetes_cluster }} is unavailable for more than 5 minutes.",
  }

ALERT CertificateExpiringEarlyWarning
  IF sum(probe_ssl_earliest_cert_expiry) by (app,team,kubernetes_namespace) - time() < 86400 * 15
  FOR 5m
  LABELS { kubernetes_cluster = "production-europe-west1-c", gcloud_project = "***" }
  ANNOTATIONS {
    summary = "Certificate for service {{ $labels.app }} expires in 15 days",
    description = "Certificate for service {{ $labels.app }}.{{ $labels.kubernetes_namespace }}.{{ $labels.kubernetes_cluster }} expires in 15 days; redeploy the application to renew the certificate.",
  }

ALERT CertificateExpiringLateWarning
  IF sum(probe_ssl_earliest_cert_expiry) by (app,team,kubernetes_namespace) - time() < 86400 * 5
  FOR 5m
  LABELS { severity = "page", kubernetes_cluster = "production-europe-west1-c", gcloud_project = "***" }
  ANNOTATIONS {
    summary = "Certificate for service {{ $labels.app }} expires in 5 days",
    description = "Certificate for service {{ $labels.app }}.{{ $labels.kubernetes_namespace }}.{{ $labels.kubernetes_cluster }} expires in 5 days; redeploy the application to renew the certificate.",
  }
mattbostock (Contributor) commented Feb 28, 2017

brian-brazil (Member) commented Feb 28, 2017

Something similar has been requested before; the question is what happens if someone wants to use an annotation value in a label, or one label's value in another label.

As Matt suggests, this should probably be a target label that's plumbed through.
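The target-label approach can be sketched in the scrape config: a `relabel_configs` entry with no `source_labels` and a literal `replacement` unconditionally stamps a static label on every target from that job, so the label arrives on the scraped series. The cluster name below is illustrative, reusing the job from the reporter's config:

```yaml
scrape_configs:
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # Stamp every target from this job with the cluster name.
  - target_label: kubernetes_cluster
    replacement: production-europe-west1-c
```

Note the alert expressions would then need `kubernetes_cluster` added to their `by (app,team,kubernetes_namespace)` groupings for the label to survive aggregation into `$labels`.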

cubranic commented Jul 24, 2017

@brian-brazil I'm sorry to comment on a report closed six months ago, but it seemed better than filing a new one.

I wanted to point out that:

  • the templating available for defining alert labels is far richer than what can be done programmatically with target labels (regex rewriting)
  • sometimes it is genuinely useful to keep this alert-related labelling as part of the alert rule rather than of a job
  • the semantics don't have to be complicated: for example, the expression's labels are fully determined before annotations are rendered, and all values defined in LABELS are merged in at once at the end, so they are visible only in ANNOTATIONS and not to each other. (Just like variables bound by let in Lisps.)
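The let-like semantics proposed above can be illustrated with a hypothetical rule; this is a sketch of the proposal, not current Prometheus behaviour:

```
ALERT ExampleDown
  IF up == 0
  FOR 5m
  LABELS {
    severity = "page",
    region   = "europe-west1",
    # Under the proposal, LABELS values are merged "at once at the end",
    # so one LABELS entry still could not see another:
    # broken = "{{ $labels.region }}"   -- would render empty
  }
  ANNOTATIONS {
    # ...but by annotation-rendering time all LABELS values are merged in:
    summary = "Instance {{ $labels.instance }} in {{ $labels.region }} is down",
  }
```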

brian-brazil (Member) commented Jul 25, 2017

I don't see how those points are relevant here; the issue was in the expression.

cubranic commented Jul 25, 2017

As you wish @brian-brazil, here is a new report: #2992.

lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 23, 2019
