0.45.0 - cadvisor / malformed metrics #3162

Open
reefland opened this issue Aug 25, 2022 · 5 comments

@reefland

I have successfully deployed cadvisor 0.45.0 (tried v0.45.0-containerd-cri as well) as a daemonset on K3s Kubernetes / containerd. I've only applied the cadvisor-args.yaml overlay, as the others did not seem relevant.

History
The bundled K3s (v1.24.3+k3s1) containerd is disabled because it does not support the ZFS snapshotter. Instead I'm using the containerd from Ubuntu 22.04 (1.5.9-0ubuntu3); while it works perfectly for K3s containers with the ZFS snapshotter, it does not work properly with kubelet / cAdvisor / Prometheus, as image= and container= are missing. A simple Prometheus query such as:

container_cpu_usage_seconds_total{image!=""}

returned an empty set.

What I See Now
It was suggested I try this cadvisor instead, and it is better... almost, but not quite right. Hopefully I'm just missing something. Now that same Prometheus query returns 111 rows; here is an example of 3:

container_cpu_usage_seconds_total{container="cadvisor", container_label_io_kubernetes_container_name="alertmanager", container_label_io_kubernetes_pod_namespace="monitoring", cpu="total", endpoint="http", id="/kubepods/burstable/pod7c0573cd-bba4-4f94-960f-c54cce2bc50e/5ff787742594c67500f255b9926c305246807e92303b43a19c7b95ba1d13dd59", image="quay.io/prometheus/alertmanager:v0.24.0", instance="10.42.0.143:8080", job="monitoring/cadvisor-prometheus-podmonitor", name="5ff787742594c67500f255b9926c305246807e92303b43a19c7b95ba1d13dd59", namespace="cadvisor", pod="cadvisor-tqbj6"}

container_cpu_usage_seconds_total{container="cadvisor", container_label_io_kubernetes_container_name="application-controller", container_label_io_kubernetes_pod_namespace="argocd", cpu="total", endpoint="http", id="/kubepods/burstable/pod9a033e88-9e20-43ef-8632-4551484be608/cedd2605364b981d2b5ec2d5e1eb6ae23abc39d64acf984b85e4f73b8e0a2689", image="quay.io/argoproj/argocd:v2.4.11", instance="10.42.0.143:8080", job="monitoring/cadvisor-prometheus-podmonitor", name="cedd2605364b981d2b5ec2d5e1eb6ae23abc39d64acf984b85e4f73b8e0a2689", namespace="cadvisor", pod="cadvisor-tqbj6"}

container_cpu_usage_seconds_total{container="cadvisor", container_label_io_kubernetes_container_name="applicationset-controller", container_label_io_kubernetes_pod_namespace="argocd", cpu="total", endpoint="http", id="/kubepods/pod5fc900fe-c754-4fe6-a023-b132ab7b0693/6b7b4511e56a66368c210874739d34df90b229d4b69369556b2e9fcc0971abaa", image="quay.io/argoproj/argocd:v2.4.11", instance="10.42.0.143:8080", job="monitoring/cadvisor-prometheus-podmonitor", name="6b7b4511e56a66368c210874739d34df90b229d4b69369556b2e9fcc0971abaa", namespace="cadvisor", pod="cadvisor-tqbj6"}

What doesn't seem right:

  • All the container labels now equal "cadvisor" instead of the value in container_label_io_kubernetes_container_name
  • All the namespace labels now equal "cadvisor" instead of the value in container_label_io_kubernetes_pod_namespace
  • All the pod labels now equal "cadvisor-tqbj6" (the cadvisor pod itself) instead of the pod referenced in id

A Prometheus query of container_cpu_usage_seconds_total{image!="",container!="cadvisor"} returns an empty set.
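
For what it's worth, the cadvisor-native labels are populated in the samples above, so a query keyed on those (just a guess at a workaround, not something verified end to end) should still select per-workload series:

container_cpu_usage_seconds_total{image!="", container_label_io_kubernetes_container_name!=""}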

Suggestions?

@BBQigniter

Had the same issue and was getting desperate, pulling my hair out. This is the config I finally came up with; it seems to work with cadvisor v0.46 on a Kubernetes v1.24.8 cluster set up via Rancher.

    # CADVISOR SCRAPE JOB for extra installed cadvisor because of k8s v1.24 with containerd problems where some labels just have empty values on RKE clusters
    - job_name: "kubernetes-cadvisor"
      kubernetes_sd_configs:
        - role: pod  # we get needed info from the pods
          namespaces:
            names: 
              - monitoring  # in namespace monitoring
          selectors:
            - role: pod
              label: "app=cadvisor"  # and only select the cadvisor pods with this label set as source
      metric_relabel_configs:  # we relabel some labels inside the scraped metrics
        # this should look at the scraped metric and replace/add the label inside
        - source_labels: [container_label_io_kubernetes_pod_namespace]
          target_label: "namespace"
        - source_labels: [container_label_io_kubernetes_pod_name]
          target_label: "pod"
        - source_labels: [container_label_io_kubernetes_container_name]
          target_label: "container"

Now the container_* metrics have the labels that the Grafana dashboards we use for Kubernetes clusters need. For example:

container_memory_usage_bytes{container="cadvisor", container_label_io_kubernetes_container_name="cadvisor", container_label_io_kubernetes_pod_name="cadvisor-x6pfx", container_label_io_kubernetes_pod_namespace="monitoring", id="/kubepods/burstable/pod08586cc5-da59-499a-a60b-f7bf859ce7a5/77b8b44fce648487d4ed47dd9b143148e6cccb53ba2a73bfe9277d22f1a305d7", image="sha256:78367b75ee31241d19875ea7a1a6fa06aa42377bba54dbe8eac3f4722fd036b5", instance="10.42.2.139:8080", job="kubernetes-cadvisor", name="k8s_cadvisor_cadvisor-x6pfx_monitoring_08586cc5-da59-499a-a60b-f7bf859ce7a5_0", namespace="monitoring", pod="cadvisor-x6pfx"}

This blog post https://valyala.medium.com/how-to-use-relabeling-in-prometheus-and-victoriametrics-8b90fc22c4b2 helped a lot in understanding how the different relabel_configs work.
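
For setups that scrape through a Prometheus Operator PodMonitor instead of a plain scrape config (the job label in the original report, monitoring/cadvisor-prometheus-podmonitor, suggests that), roughly the same fix can be expressed as metricRelabelings. A minimal sketch, with the name, namespace, pod label, and port name assumed rather than taken from this thread:

    apiVersion: monitoring.coreos.com/v1
    kind: PodMonitor
    metadata:
      name: cadvisor            # hypothetical name
      namespace: monitoring     # assumed namespace
    spec:
      selector:
        matchLabels:
          app: cadvisor         # assumed label on the cadvisor DaemonSet pods
      podMetricsEndpoints:
        - port: http            # assumed metrics port name
          metricRelabelings:    # copy cadvisor's own labels over the target-derived ones
            - sourceLabels: [container_label_io_kubernetes_pod_namespace]
              targetLabel: namespace
            - sourceLabels: [container_label_io_kubernetes_pod_name]
              targetLabel: pod
            - sourceLabels: [container_label_io_kubernetes_container_name]
              targetLabel: container

One thing to watch: with the default replace action and regex (.*), series that don't carry the container_label_* labels get their namespace/pod/container labels overwritten with an empty value; adding regex: (.+) to each rule leaves those series untouched.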

@reefland
Author

That helped a little... I now have a working container label, but still no pod label returned by the external cadvisor:

container_cpu_usage_seconds_total{namespace="monitoring", container="grafana"}

container_cpu_usage_seconds_total{container="grafana", container_label_io_kubernetes_container_name="grafana", container_label_io_kubernetes_pod_namespace="monitoring", cpu="total", id="/kubepods/besteffort/pode00da7e6-0e0f-4cd9-aa75-b1e9bab32b38/8959546a3f87530a3059f775191d254d43db3a8ccf17bfa98495ab25a869326d", image="docker.io/grafana/grafana:9.2.4", instance="10.42.0.9:8080", job="kubernetes-cadvisor", name="8959546a3f87530a3059f775191d254d43db3a8ccf17bfa98495ab25a869326d", namespace="monitoring"}
  • Note it has the name="8959546a3f87530a3059f775191d254d43db3a8ccf17bfa98495ab25a869326d" value but no pod label with the pod name.

Whereas the kubelet cadvisor does have the pod name:

container_cpu_usage_seconds_total{namespace="monitoring", pod=~"grafana.*"}

container_cpu_usage_seconds_total{cpu="total", endpoint="https-metrics", id="/kubepods/besteffort/pode00da7e6-0e0f-4cd9-aa75-b1e9bab32b38", instance="testlinux", job="kubelet", metrics_path="/metrics/cadvisor", namespace="monitoring", node="testlinux", pod="grafana-ff88df95-lbvr2", service="prometheus-kubelet"}

Did you apply any of the overlays, such as cadvisor-args.yaml?

@BBQigniter

Hmm, strange.

I'm not completely sure what's going on on our systems, as the cadvisor setup was done by a colleague who left the company a few weeks ago (and left a mess), and I now have to figure out how to fix the prometheus/prometheus-operator setup, etc. :|

TL;DR: I had a look and it seems the cadvisors run with the following arguments :D

--housekeeping_interval=2s 
--max_housekeeping_interval=15s 
--event_storage_event_limit=default=0 
--event_storage_age_limit=default=0 
--enable_metrics=app,cpu,disk,diskIO,memory,network,process 
--docker_only 
--store_container_labels=false 
--whitelisted_container_labels=io.kubernetes.container.name, io.kubernetes.pod.name,io.kubernetes.pod.namespace, io.kubernetes.pod.name,io.kubernetes.pod.name

You can see io.kubernetes.pod.name is in there multiple times 🤷 whereas it appears only once in the example.

@reefland
Author

Even stranger... I noticed that the only two labels that worked were the ones with NO SPACES before them in the --whitelisted_container_labels value shown above (from the cadvisor-args.yaml overlay file). I removed the spaces and it started to work!

[screenshot attached]

Weird.
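
For reference, a sketch of the corrected flag, based on the argument list posted above with the spaces (and the duplicate entry) removed; the surrounding args structure here is assumed, not copied from the cadvisor-args.yaml overlay:

    args:
      - --housekeeping_interval=2s
      - --max_housekeeping_interval=15s
      - --event_storage_event_limit=default=0
      - --event_storage_age_limit=default=0
      - --enable_metrics=app,cpu,disk,diskIO,memory,network,process
      - --docker_only
      - --store_container_labels=false
      # no spaces inside the comma-separated list: labels preceded by a space appeared to be ignored
      - --whitelisted_container_labels=io.kubernetes.container.name,io.kubernetes.pod.name,io.kubernetes.pod.namespace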

@MisderGAO

(Quoted @BBQigniter's relabel config and example metric from above.)

I have the same problem on:

  • RKE: 1.24.8
  • monitoring stack deployed from the rancher-monitoring chart (which does not include cadvisor)

container_cpu_usage_seconds_total

None of the results returned by the above PromQL have an image label, which is quite strange. The same monitoring chart works fine on RKE v1.20.8 (without cadvisor).
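
One quick way to narrow down which scrape job is producing the series without an image label (a sketch; the job label values are whatever your setup uses):

count by (job) (container_cpu_usage_seconds_total{image=""})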
