
Wrong addresses discovered from k8s for daemonsets in host networking #2871

Open
discordianfish opened this Issue Jun 23, 2017 · 3 comments

@discordianfish
Member

discordianfish commented Jun 23, 2017

Somehow Prometheus scrapes the wrong IPs for my node-exporter pods. They are using host networking, and I'm running Prometheus 1.6.3.

Update: Looks like it's an upstream issue after all: __address__ as discovered is already wrong; I assumed it would use the pod IP. Filed kubernetes/kubernetes#48396, but keeping this open to track that.

It looks like the SD state in Prometheus itself is correct. These two commands:

Prometheus:

$ curl 'http://localhost:8001/api/v1/namespaces/default/pods/prometheus-1/proxy/api/v1/targets' | jq '.data.activeTargets[]|select(.discoveredLabels.__meta_kubernetes_endpoints_name == "node-exporter")|{"name": .discoveredLabels.__meta_kubernetes_pod_name, "ip": .discoveredLabels.__meta_kubernetes_pod_ip}' |jq -s 'sort_by(.ip)'

Kubernetes:

$ kubectl get pod -l app=node-exporter -o json|jq '.items[]|{"name": .metadata.name, "ip": .status.hostIP}'|jq -s 'sort_by(.ip)'

both show this list:

[
  {
    "name": "node-exporter-2b14f",
    "ip": "10.32.130.2"
  },
  {
    "name": "node-exporter-nkqhr",
    "ip": "10.32.130.3"
  },
  {
    "name": "node-exporter-7w9pq",
    "ip": "10.32.130.4"
  },
  {
    "name": "node-exporter-j3hfm",
    "ip": "10.32.130.5"
  },
  {
    "name": "node-exporter-pp1sq",
    "ip": "10.32.130.6"
  }
]

But Prometheus tries to scrape node-exporter-2b14f, for example, on 10.32.130.3 (all the others are inconsistent too), while it is actually running on __meta_kubernetes_pod_host_ip (10.32.130.2), as correctly shown in the target UI's tooltip:

Before relabeling:
__address__="10.32.130.3:9100"
__meta_kubernetes_endpoint_port_name="http-metrics"
__meta_kubernetes_endpoint_port_protocol="TCP"
__meta_kubernetes_endpoint_ready="false"
__meta_kubernetes_endpoints_name="node-exporter"
__meta_kubernetes_namespace="default"
__meta_kubernetes_pod_annotation_kubernetes_io_created_by="{"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"DaemonSet","namespace":"default","name":"node-exporter","uid":"8f589710-4c7a-11e7-b4bd-42010af00125","apiVersion":"extensions","resourceVersion":"10138846"}} "
__meta_kubernetes_pod_container_name="node-exporter"
__meta_kubernetes_pod_container_port_name="scrape"
__meta_kubernetes_pod_container_port_number="9100"
__meta_kubernetes_pod_container_port_protocol="TCP"
__meta_kubernetes_pod_host_ip="10.32.130.2"
__meta_kubernetes_pod_ip="10.32.130.2"
__meta_kubernetes_pod_label_app="node-exporter"
__meta_kubernetes_pod_label_pod_template_generation="1"
__meta_kubernetes_pod_name="node-exporter-2b14f"
__meta_kubernetes_pod_node_name="gke-xx-at-pool-small-3338bda2-l2dp"
__meta_kubernetes_pod_ready="true"
__meta_kubernetes_service_annotation_kubectl_kubernetes_io_last_applied_configuration="{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"prometheus.io/scrape":"true"},"labels":{"app":"node-exporter","k8s-app":"node-exporter"},"name":"node-exporter","namespace":"default"},"spec":{"ports":[{"name":"http-metrics","port":9100,"protocol":"TCP"}],"selector":{"app":"node-exporter"}}} "
__meta_kubernetes_service_annotation_prometheus_io_scrape="true"
__meta_kubernetes_service_label_app="node-exporter"
__meta_kubernetes_service_label_k8s_app="node-exporter"
__meta_kubernetes_service_name="node-exporter"
__metrics_path__="/metrics"
__scheme__="http"
job="kubernetes-service-endpoints"

I'm using the prometheus/k8s sample config 1:1 for this job:

    # Scrape config for service endpoints.
    #
    # The relabeling allows the actual service scrape endpoint to be configured
    # via the following annotations:
    #
    # * `prometheus.io/scrape`: Only scrape services that have a value of `true`
    # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
    #   to set this to `https` & most likely set the `tls_config` of the scrape config.
    # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
    # * `prometheus.io/port`: If the metrics are exposed on a different port to the
    #   service then set this appropriately.
    - job_name: 'kubernetes-service-endpoints'

      kubernetes_sd_configs:
      - role: endpoints

      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: (.+)(?::\d+);(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name

All these inconsistencies happened after I (accidentally, thanks to the Firebase billing UI) shut down my cluster and restarted it. It appears that while the SD updates, the relabeling rules don't seem to apply anymore.
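
For reference, a possible workaround (a sketch of my own, not part of the sample config above) would be to rebuild __address__ from __meta_kubernetes_pod_host_ip for this daemonset, since the host IP is discovered correctly even when the Endpoints address is stale. The rule would have to be restricted to the node-exporter endpoints (or live in a dedicated job), otherwise it rewrites every target in the job; the 9100 port is assumed from the container port shown above:

      # Sketch only: overwrite __address__ with the node IP for the
      # host-network node-exporter pods. Restrict this to the daemonset
      # (e.g. with a preceding keep on __meta_kubernetes_service_name),
      # otherwise it rewrites every endpoint target in the job.
      - source_labels: [__meta_kubernetes_pod_host_ip]
        action: replace
        regex: (.+)
        target_label: __address__
        replacement: $1:9100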

@discordianfish

Member Author

discordianfish commented Jul 1, 2017

Just ran into this again and again with a daemonset. Possibly related?

@discordianfish discordianfish changed the title Relabling results become stale Wrong addresses discovered from k8s for daemonsets in host networking Jul 2, 2017

@fabxc

Member

fabxc commented Jul 3, 2017

Getting the node IP when the daemonset pods run in the host network namespace is intended behavior AFAIK. The only case where that wouldn't work would be if nodes couldn't route to each other, or am I missing something?
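
One way to rely on exactly that behavior without going through the Endpoints object at all would be a dedicated pod-role job for the daemonset. A sketch only: the job name is made up, and the app label and port name are taken from the label dump above:

    - job_name: 'node-exporter-pods'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      # Keep only the node-exporter daemonset pods (app label from the dump above).
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: keep
        regex: node-exporter
      # Keep only the named metrics port; __address__ then becomes
      # pod IP:9100, which equals the node IP under hostNetwork.
      - source_labels: [__meta_kubernetes_pod_container_port_name]
        action: keep
        regex: scrape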

@matthiasr

Contributor

matthiasr commented Jul 3, 2017

As I understand it (from the linked issue), the underlying problem is that the Endpoints object does not match the Pod state, mixing IPs and names freely.
