0.45.0 - cadvisor / malformed metrics #3162

Open
reefland opened this issue Aug 25, 2022 · 5 comments

@reefland

I have successfully deployed cadvisor 0.45.0 (tried v0.45.0-containerd-cri as well) as a daemonset on K3s Kubernetes / containerd. I've only applied the cadvisor-args.yaml overlay, as the others did not seem relevant.

History
The bundled K3s (v1.24.3+k3s1) containerd is disabled because it does not support the ZFS snapshotter. Instead I'm using the containerd from Ubuntu 22.04 (1.5.9-0ubuntu3); while it works perfectly for K3s containers with the ZFS snapshotter, it does not work properly with kubelet / cAdvisor / Prometheus, as image= and container= are missing. A simple Prometheus query such as:

container_cpu_usage_seconds_total{image!=""}

returned an empty set.

What I See Now
It was suggested I try this cadvisor instead, and it is better... almost, but not quite right. Hopefully I'm just missing something. Now that same Prometheus query returns 111 rows; here is an example of 3:

container_cpu_usage_seconds_total{container="cadvisor", container_label_io_kubernetes_container_name="alertmanager", container_label_io_kubernetes_pod_namespace="monitoring", cpu="total", endpoint="http", id="/kubepods/burstable/pod7c0573cd-bba4-4f94-960f-c54cce2bc50e/5ff787742594c67500f255b9926c305246807e92303b43a19c7b95ba1d13dd59", image="quay.io/prometheus/alertmanager:v0.24.0", instance="10.42.0.143:8080", job="monitoring/cadvisor-prometheus-podmonitor", name="5ff787742594c67500f255b9926c305246807e92303b43a19c7b95ba1d13dd59", namespace="cadvisor", pod="cadvisor-tqbj6"}

container_cpu_usage_seconds_total{container="cadvisor", container_label_io_kubernetes_container_name="application-controller", container_label_io_kubernetes_pod_namespace="argocd", cpu="total", endpoint="http", id="/kubepods/burstable/pod9a033e88-9e20-43ef-8632-4551484be608/cedd2605364b981d2b5ec2d5e1eb6ae23abc39d64acf984b85e4f73b8e0a2689", image="quay.io/argoproj/argocd:v2.4.11", instance="10.42.0.143:8080", job="monitoring/cadvisor-prometheus-podmonitor", name="cedd2605364b981d2b5ec2d5e1eb6ae23abc39d64acf984b85e4f73b8e0a2689", namespace="cadvisor", pod="cadvisor-tqbj6"}

container_cpu_usage_seconds_total{container="cadvisor", container_label_io_kubernetes_container_name="applicationset-controller", container_label_io_kubernetes_pod_namespace="argocd", cpu="total", endpoint="http", id="/kubepods/pod5fc900fe-c754-4fe6-a023-b132ab7b0693/6b7b4511e56a66368c210874739d34df90b229d4b69369556b2e9fcc0971abaa", image="quay.io/argoproj/argocd:v2.4.11", instance="10.42.0.143:8080", job="monitoring/cadvisor-prometheus-podmonitor", name="6b7b4511e56a66368c210874739d34df90b229d4b69369556b2e9fcc0971abaa", namespace="cadvisor", pod="cadvisor-tqbj6"}

What doesn't seem right:

  • All the container labels now equal "cadvisor" instead of the value in container_label_io_kubernetes_container_name
  • All the namespace labels now equal "cadvisor" instead of the value in container_label_io_kubernetes_pod_namespace
  • All the pod labels now equal "cadvisor-tqbj6" (the cadvisor pod itself) instead of the pod referenced in id

A Prometheus query of container_cpu_usage_seconds_total{image!="",container!="cadvisor"} returns an empty set.
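
For what it's worth, the cadvisor-native labels are populated in the samples above, so a query keyed on those (just a guess at a workaround, not something verified end to end) should still select per-workload series:

container_cpu_usage_seconds_total{image!="", container_label_io_kubernetes_container_name!=""}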

Suggestions?

@BBQigniter

Had the same issue and was getting desperate, pulling my hair out. This is the config I finally came up with; it seems to work with cadvisor v0.46 on a Kubernetes v1.24.8 cluster set up via Rancher.

    # CADVISOR SCRAPE JOB for extra installed cadvisor because of k8s v1.24 with containerd problems where some labels just have empty values on RKE clusters
    - job_name: "kubernetes-cadvisor"
      kubernetes_sd_configs:
        - role: pod  # we get needed info from the pods
          namespaces:
            names: 
              - monitoring  # in namespace monitoring
          selectors:
            - role: pod
              label: "app=cadvisor"  # and only select the cadvisor pods with this label set as source
      metric_relabel_configs:  # we relabel some labels inside the scraped metrics
        # this should look at the scraped metric and replace/add the label inside
        - source_labels: [container_label_io_kubernetes_pod_namespace]
          target_label: "namespace"
        - source_labels: [container_label_io_kubernetes_pod_name]
          target_label: "pod"
        - source_labels: [container_label_io_kubernetes_container_name]
          target_label: "container"

Now the container_* metrics have the labels that the Grafana dashboards we use for Kubernetes clusters need. For example:

container_memory_usage_bytes{container="cadvisor", container_label_io_kubernetes_container_name="cadvisor", container_label_io_kubernetes_pod_name="cadvisor-x6pfx", container_label_io_kubernetes_pod_namespace="monitoring", id="/kubepods/burstable/pod08586cc5-da59-499a-a60b-f7bf859ce7a5/77b8b44fce648487d4ed47dd9b143148e6cccb53ba2a73bfe9277d22f1a305d7", image="sha256:78367b75ee31241d19875ea7a1a6fa06aa42377bba54dbe8eac3f4722fd036b5", instance="10.42.2.139:8080", job="kubernetes-cadvisor", name="k8s_cadvisor_cadvisor-x6pfx_monitoring_08586cc5-da59-499a-a60b-f7bf859ce7a5_0", namespace="monitoring", pod="cadvisor-x6pfx"}

This blog post https://valyala.medium.com/how-to-use-relabeling-in-prometheus-and-victoriametrics-8b90fc22c4b2 helped a lot in understanding how the different relabel_configs work.
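
For setups that scrape through a Prometheus Operator PodMonitor instead of a plain scrape config (the job label in the original report, monitoring/cadvisor-prometheus-podmonitor, suggests that), roughly the same fix can be expressed as metricRelabelings. A minimal sketch, with the name, namespace, pod label, and port name assumed rather than taken from this thread:

    apiVersion: monitoring.coreos.com/v1
    kind: PodMonitor
    metadata:
      name: cadvisor            # hypothetical name
      namespace: monitoring     # assumed namespace
    spec:
      selector:
        matchLabels:
          app: cadvisor         # assumed label on the cadvisor DaemonSet pods
      podMetricsEndpoints:
        - port: http            # assumed metrics port name
          metricRelabelings:    # copy cadvisor's own labels over the target-derived ones
            - sourceLabels: [container_label_io_kubernetes_pod_namespace]
              targetLabel: namespace
            - sourceLabels: [container_label_io_kubernetes_pod_name]
              targetLabel: pod
            - sourceLabels: [container_label_io_kubernetes_container_name]
              targetLabel: container

One thing to watch: with the default replace action and regex (.*), series that don't carry the container_label_* labels get their namespace/pod/container labels overwritten with an empty value; adding regex: (.+) to each rule leaves those series untouched.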

@reefland
Author

That helped a little... I now have a working container label, but still no pod label returned by the external cadvisor:

container_cpu_usage_seconds_total{namespace="monitoring", container="grafana"}

container_cpu_usage_seconds_total{container="grafana", container_label_io_kubernetes_container_name="grafana", container_label_io_kubernetes_pod_namespace="monitoring", cpu="total", id="/kubepods/besteffort/pode00da7e6-0e0f-4cd9-aa75-b1e9bab32b38/8959546a3f87530a3059f775191d254d43db3a8ccf17bfa98495ab25a869326d", image="docker.io/grafana/grafana:9.2.4", instance="10.42.0.9:8080", job="kubernetes-cadvisor", name="8959546a3f87530a3059f775191d254d43db3a8ccf17bfa98495ab25a869326d", namespace="monitoring"}
  • Note it has the name="8959546a3f87530a3059f775191d254d43db3a8ccf17bfa98495ab25a869326d" value but no pod label with the pod name.

Whereas the kubelet cadvisor does have the pod name:

container_cpu_usage_seconds_total{namespace="monitoring", pod=~"grafana.*"}

container_cpu_usage_seconds_total{cpu="total", endpoint="https-metrics", id="/kubepods/besteffort/pode00da7e6-0e0f-4cd9-aa75-b1e9bab32b38", instance="testlinux", job="kubelet", metrics_path="/metrics/cadvisor", namespace="monitoring", node="testlinux", pod="grafana-ff88df95-lbvr2", service="prometheus-kubelet"}

Did you apply any of the overlays, such as cadvisor-args.yaml?

@BBQigniter

Hmm, strange.

I'm not completely sure what's going on on our systems, as the cadvisor setup was done by a colleague who left the company a few weeks ago (and left a mess), and I now have to figure out how to fix the prometheus/prometheus-operator setup, etc. :|

TL;DR: I had a look and it seems the cadvisors run with the following arguments :D

--housekeeping_interval=2s 
--max_housekeeping_interval=15s 
--event_storage_event_limit=default=0 
--event_storage_age_limit=default=0 
--enable_metrics=app,cpu,disk,diskIO,memory,network,process 
--docker_only 
--store_container_labels=false 
--whitelisted_container_labels=io.kubernetes.container.name, io.kubernetes.pod.name,io.kubernetes.pod.namespace, io.kubernetes.pod.name,io.kubernetes.pod.name

You can see io.kubernetes.pod.name is in there multiple times 🤷 whereas it appears only once in the example.

@reefland
Author

Even stranger... I noticed that the only two labels that worked were the ones with NO SPACES before them in the --whitelisted_container_labels value shown above (from the cadvisor-args.yaml overlay file). I removed the spaces and it started to work!

[screenshot attached]

Weird.
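
For reference, a sketch of the corrected flag, based on the argument list posted above with the spaces (and the duplicate entry) removed; the surrounding args structure here is assumed, not copied from the cadvisor-args.yaml overlay:

    args:
      - --housekeeping_interval=2s
      - --max_housekeeping_interval=15s
      - --event_storage_event_limit=default=0
      - --event_storage_age_limit=default=0
      - --enable_metrics=app,cpu,disk,diskIO,memory,network,process
      - --docker_only
      - --store_container_labels=false
      # no spaces inside the comma-separated list: labels preceded by a space appeared to be ignored
      - --whitelisted_container_labels=io.kubernetes.container.name,io.kubernetes.pod.name,io.kubernetes.pod.namespace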

@MisderGAO

(Quoted @BBQigniter's relabel config and example metric from above.)

I have the same problem on:

  • RKE: 1.24.8
  • monitoring stack deployed from the rancher-monitoring chart (which does not include cadvisor)

container_cpu_usage_seconds_total

None of the results returned by the above PromQL have an image label, which is quite strange. The same monitoring chart works fine on RKE v1.20.8 (without cadvisor).
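
One quick way to narrow down which scrape job is producing the series without an image label (a sketch; the job label values are whatever your setup uses):

count by (job) (container_cpu_usage_seconds_total{image=""})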
