federate fails with labels like pod-template-hash #5461

Closed
caarlos0 opened this Issue Apr 15, 2019 · 6 comments

caarlos0 commented Apr 15, 2019

Bug Report

What did you do?
I'm federating a Prometheus server running in a Kubernetes cluster into another Prometheus instance, with a scrape config like:

  - job_name: 'federate'
    scrape_interval: 30s
    scrape_timeout: 30s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job=~"(kubernetes-pods|kubernetes-service-endpoints|kubernetes-nodes-cadvisor|kubernetes-nodes)"}'
    static_configs:
      - targets:
        - 'foo.bar:80'
        labels:
          service: kubernetes
          env: explore

What did you expect to see?

Metrics being federated

What did you see instead? Under which circumstances?

expected equal, got "INVALID"

I ran curl -s <address> | promtool check metrics on the federate output and found this:

error while linting: text format parsing error in line 14936: expected '=' after label name, found '-'

Line 14936 has this metric:

controller_runtime_reconcile_time_seconds_bucket{controller="jupyter-controller",instance="172.16.3.15:8080",job="kubernetes-pods",kubernetes_namespace="operator",kubernetes_pod_name="manager-6b7d97c669-sr9xs",le="5",operator="manager",pod-template-hash="6b7d97c669"} 8 1555338376838

Seems like the problem is this pod-template-hash="6b7d97c669": a hyphen is not a valid character in a Prometheus label name. It works on the Prometheus running inside the cluster, but when federating, the other Prometheus fails to parse the metrics because they are indeed invalid.
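For reference, the text exposition format only accepts label names matching [a-zA-Z_][a-zA-Z0-9_]*, so the parser stops at the first hyphen. Two illustrative sample lines (hypothetical, not taken from the actual scrape):

# parses fine: only underscores in the label name
example_metric{pod_template_hash="6b7d97c669"} 1
# fails with "expected '=' after label name": the parser reads "pod" and then hits '-'
example_metric{pod-template-hash="6b7d97c669"} 1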

Environment

  • System information:

    irrelevant afaik

  • Prometheus version:

2.9.0 on both Prometheus instances

  • Alertmanager version:

    irrelevant afaik

  • Prometheus configuration file:

It's the default generated by the Prometheus Helm chart.

  • Alertmanager configuration file:
    irrelevant.

  • Logs:

Apr 15 15:07:53 i-ops-us-west2-prometheus-1 prometheus[21819]: level=warn ts=2019-04-15T15:07:53.563Z caller=scrape.go:933 component="scrape manager" scrape_pool=federate target="http://foo.bar:80/federate?match%5B%5D=%7Bjob%3D~%22%28kubernetes-pods%7Ckubernetes-service-endpoints%7Ckubernetes-nodes-cadvisor%7Ckubernetes-nodes%29%22%7D" msg="append failed" err="expected equal, got \"INVALID\""
simonpasquier (Member) commented Apr 15, 2019

Hmm, I have broken this... Let me send a patch. Well, I suspected #5443, but I'm not so sure now...

simonpasquier (Member) commented Apr 15, 2019

Can you still share the configuration of the Prometheus server running inside the cluster?

caarlos0 (Author) commented Apr 15, 2019

> Can you still share the configuration of the Prometheus server running inside the cluster?

Sure!

global:
  evaluation_interval: 1m
  scrape_interval: 1m
  scrape_timeout: 10s
rule_files:
- /etc/config/rules
- /etc/config/alerts
scrape_configs:
- job_name: prometheus
  static_configs:
  - targets:
    - localhost:9090
- bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  job_name: kubernetes-apiservers
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - action: keep
    regex: default;kubernetes;https
    source_labels:
    - __meta_kubernetes_namespace
    - __meta_kubernetes_service_name
    - __meta_kubernetes_endpoint_port_name
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
- bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  job_name: kubernetes-nodes
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - replacement: kubernetes.default.svc:443
    target_label: __address__
  - regex: (.+)
    replacement: /api/v1/nodes/$1/proxy/metrics
    source_labels:
    - __meta_kubernetes_node_name
    target_label: __metrics_path__
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
- bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  job_name: kubernetes-nodes-cadvisor
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - replacement: kubernetes.default.svc:443
    target_label: __address__
  - regex: (.+)
    replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
    source_labels:
    - __meta_kubernetes_node_name
    target_label: __metrics_path__
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
- job_name: kubernetes-service-endpoints
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - action: keep
    regex: true
    source_labels:
    - __meta_kubernetes_service_annotation_prometheus_io_scrape
  - action: replace
    regex: (https?)
    source_labels:
    - __meta_kubernetes_service_annotation_prometheus_io_scheme
    target_label: __scheme__
  - action: replace
    regex: (.+)
    source_labels:
    - __meta_kubernetes_service_annotation_prometheus_io_path
    target_label: __metrics_path__
  - action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    source_labels:
    - __address__
    - __meta_kubernetes_service_annotation_prometheus_io_port
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - action: replace
    source_labels:
    - __meta_kubernetes_namespace
    target_label: kubernetes_namespace
  - action: replace
    source_labels:
    - __meta_kubernetes_service_name
    target_label: kubernetes_name
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_node_name
    target_label: kubernetes_node
- honor_labels: true
  job_name: prometheus-pushgateway
  kubernetes_sd_configs:
  - role: service
  relabel_configs:
  - action: keep
    regex: pushgateway
    source_labels:
    - __meta_kubernetes_service_annotation_prometheus_io_probe
- job_name: kubernetes-services
  kubernetes_sd_configs:
  - role: service
  metrics_path: /probe
  params:
    module:
    - http_2xx
  relabel_configs:
  - action: keep
    regex: true
    source_labels:
    - __meta_kubernetes_service_annotation_prometheus_io_probe
  - source_labels:
    - __address__
    target_label: __param_target
  - replacement: blackbox
    target_label: __address__
  - source_labels:
    - __param_target
    target_label: instance
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels:
    - __meta_kubernetes_namespace
    target_label: kubernetes_namespace
  - source_labels:
    - __meta_kubernetes_service_name
    target_label: kubernetes_name
- job_name: kubernetes-pods
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - action: keep
    regex: true
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_scrape
  - action: replace
    regex: (.+)
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_path
    target_label: __metrics_path__
  - action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    source_labels:
    - __address__
    - __meta_kubernetes_pod_annotation_prometheus_io_port
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - action: replace
    source_labels:
    - __meta_kubernetes_namespace
    target_label: kubernetes_namespace
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_name
    target_label: kubernetes_pod_name
caarlos0 (Author) commented Apr 15, 2019

> Hmm, I have broken this... Let me send a patch.

It happens even to the best of us 😀
Thanks for the hard work!

caarlos0 (Author) commented Apr 15, 2019

BTW: any workarounds? I tried to drop that label but couldn't because the label name is invalid 🤔

simonpasquier (Member) commented Apr 15, 2019

I've got no workaround on the federating server's side. You could drop the offending label on the Prometheus server inside the cluster:

relabel_configs:
  ...
  - action: labeldrop
    regex: "pod-template-hash"