Cadvisor not reporting Container/Image metadata #473
@ibuildthecloud do you have any insight into this? Did you happen to run into any issues of this sort? I saw that you are hosting the cadvisor fork for k3s.
I'm running into the same issue. It makes cadvisor metrics in Prometheus basically unusable, since container_name is, I think, the only way to distinguish between certain aggregated and individual metrics.
@lsmith130 Seems like a pretty important metric. I am unable to see memory statistics at the pod and container level.
I've run into this as well. For some metrics, querying Prometheus returns the following (showing just one entry; all others are similar):
I found these issues with metrics related to disk I/O and network I/O. Edit: this could be related to google/cadvisor#2249 (containerd instead of the Docker runtime).
There are other metrics that also don't have pod and image metadata, like … On K3s:
On K8s:
This breaks many metrics from the monitoring stack and rules provided by https://github.com/carlosedp/cluster-monitoring and the kube-prometheus project libraries.
@carlosedp are you running k3s with the embedded containerd or with Docker?
Did the default deploy with …
I believe running it with Docker does fix many of the issues, but I see the recommendation to use containerd instead. Not sure what the cost/benefit would be between having the metrics and using Docker with k3s.
To expand on @cfchad's comment, the docs state:
https://rancher.com/docs/k3s/latest/en/configuration/#containerd-and-docker
@ibuildthecloud, before I try switching to …
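For reference, that docs page describes switching k3s to the Docker runtime via the install script; a sketch of the documented command:

```sh
# Install/run the k3s server with Docker instead of the embedded containerd
curl -sfL https://get.k3s.io | sh -s - --docker
```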
I deployed with Docker but now my ingress routes are not working. Is there any difference in creating ingress resources when using Docker?
@ibuildthecloud confirming that when K3s is deployed with Docker as the runtime, the cadvisor metrics have all the metadata needed by the monitoring stack:
@carlosedp Thanks for finding that out. We'll have to look into why containerd doesn't report the metrics.
@carlosedp Regarding your comment:
Saw your tweet about the Traefik metrics, so I was wondering if you had to do anything special to get ingress working again.
Ah yes, please disregard that comment. It was my environment that was messed up. I opened a PR against K3s so it will expose the Traefik metrics by default once it gets merged.
Hello, any news on this topic? I will try changing my k3s cluster to use Docker too.
Hi @ibuildthecloud, any news on why K3s is not reporting cadvisor metrics without Docker as the runtime? Thanks!
Hello @carlosedp, I think it's OK now with 0.10.x.
Cool, gonna check and report back! Thanks!
Tested on K3s v0.10.2 and metrics are generated without requiring …
Thanks for checking!
I might have spoken too early. Starting from a fresh K3s, I deployed the stack and the CPU metrics still don't show up. Here is an example rule that fails:
The metrics don't have …
I'm running k3s v1.0.0 (18bd921) on a bunch of Raspberry Pis and also ran into this particular problem. Unfortunately, neither the …
Any idea how to fix, or at least further investigate, this issue? I lack a bit of background knowledge in that area. @carlosedp, a few months back you wrote:
I have also noticed that when using the Docker engine, I cannot access services from the outside world. I could address this problem by running …
I have also encountered this particular issue since k3s 0.9+.
Does anyone have a reference to where cadvisor calculates the container name? Perhaps k3s is simply not specifying enough data when it creates containers. |
I tried with v1.17.3+k3s1 (5b17a17):

```
$ ps aux | grep /usr/local/bin/k3s
root       543 12.7  2.0 799424 671676 ?      Ssl  23:25   0:45 /usr/local/bin/k3s server --no-deploy traefik --default-local-storage-path /data --node-external-ip 192.168.10.131 --kubelet-arg containerd=/run/k3s/containerd/containerd.sock
root      6781  0.0  0.0   6180    884 pts/0  S+   23:31   0:00 grep /usr/local/bin/k3s
```

However, the node-exporter doesn't show …
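As an aside, the --kubelet-arg flag shown above can also be set via k3s's config file on newer releases (a sketch; config-file support landed in k3s releases after the v1.17 used here):

```yaml
# /etc/rancher/k3s/config.yaml -- config-file equivalent of the CLI flag above
kubelet-arg:
  - "containerd=/run/k3s/containerd/containerd.sock"
```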
@Vad1mo Did you ever figure this one out?
Yes, it works. Here is the helmfile I use, with charts from the code-chris/helm-charts repository:

```yaml
repositories:
- name: stable
url: "https://kubernetes-charts.storage.googleapis.com/"
- name: code-chris
#url: https://code-chris.github.io/helm-charts -- Wait until 8gears patch is merged
url: git+https://github.com/8gears/helm-charts@charts/cadvisor?sparse=1
releases:
- name: "cadvisor"
chart: "code-chris/cadvisor"
namespace: "monitoring"
values:
- metrics:
enabled: true
- resources:
limits:
cpu: 500m
memory: 256Mi
requests:
cpu: 100m
memory: 128Mi
- name: "prometheus-operator"
chart: "stable/prometheus-operator"
# version: "6.2.1"
  namespace: "monitoring"
```
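If useful, applying that helmfile is just (filename assumed for illustration):

```sh
# Render and deploy all releases declared in the helmfile
helmfile -f helmfile.yaml apply
```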
@Vad1mo Any reason to use code-chris's cadvisor instead of the "builtin" metrics-server from k3s?
I am not sure whether the metrics-server exposes a similar set of metrics to cadvisor.
It seems to, actually, at least from the k3s perspective, except I am still missing those …
Do you have a reference at hand for what is exposed?
I think you have a point there! When I look at my scrape config, I actually just scrape the following: … Maybe that does not contain those.
Thanks. I see some similar metrics; do you use the Prometheus operator and its dashboards?
I actually don't use the operator; I wanted to go low-tech and learn some Prometheus scrape configs instead of using the CRDs. But I do use the mixin dashboards, which is where I have my problems. :D
Would you mind having a quick chat? I think we are working on something similar: https://meet.google.com/xjg-vsgh-zwu
I am puzzled about how to tell Prometheus what URL to scrape, and whether that is the correct scraping URL. It doesn't seem to be dynamic, so hardcoding node names is required.
Actually, my scrape config for that looks like this:
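(The original snippet did not survive in this thread; a minimal sketch of what such a kubelet cAdvisor scrape job can look like, assuming Prometheus runs in-cluster with a service account that can reach the kubelets:)

```yaml
scrape_configs:
  - job_name: "kubelet-cadvisor"
    scheme: https
    # cAdvisor metrics are served by the kubelet under this path
    metrics_path: /metrics/cadvisor
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    tls_config:
      # kubelet serving certificates are often self-signed
      insecure_skip_verify: true
    kubernetes_sd_configs:
      # one target per node, discovered dynamically -- no hardcoded node names
      - role: node
```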
I've deployed a fresh K3s (v1.17.3+k3s1) onto an ARM64 node and updated my monitoring stack (https://github.com/carlosedp/cluster-monitoring) with the latest mixins. The JSON output of the metrics still doesn't include some information, but the dashboards are populated correctly:

```
container_cpu_usage_seconds_total{cpu="total",endpoint="https-metrics",id="/kubepods/burstable/podcd2b5231-b73c-44d7-b6a1-4a51dab6e21f",instance="192.168.15.15:10250",job="kubelet",metrics_path="/metrics/cadvisor",namespace="monitoring",node="odroidn2",pod="prometheus-k8s-0",service="kubelet"}
```

I believe most problems related to this have been fixed now.
@carlosedp Neat! But what I can't seem to understand is where that metric comes from. I just spun up a fresh k3s and yes, that works, but I can't look it up. What am I missing here?
They are Prometheus rules, composed of many other metrics. Most of the rules are defined here: https://github.com/carlosedp/cluster-monitoring/blob/master/manifests/prometheus-rules.yaml
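For illustration, a recording rule of that shape (taken loosely from the kube-prometheus mixins; rule names and label sets vary by version, so treat this as a sketch rather than the exact definition in that file):

```yaml
groups:
  - name: k8s.rules
    rules:
      # Precompute per-namespace CPU usage from the raw cAdvisor counter;
      # dashboards then query the recorded metric instead of the raw series.
      - record: namespace:container_cpu_usage_seconds_total:sum_rate
        expr: |
          sum by (namespace) (
            rate(container_cpu_usage_seconds_total{job="kubelet", image!=""}[5m])
          )
```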
Now it makes sense! Perfect, thank you very much for that explanation. :D
@Vad1mo Have you gotten this to work? We are not using metrics-server in our k3s install and are instead using prometheus-adapter.
Yes, it works now; the problem was my config for the Prometheus operator. A standalone cAdvisor is not needed.
@Vad1mo That's great to know! Can you share your Prometheus operator config? I would like to know two additional things:
Any help would be very welcome!
We use https://github.com/cloudposse/helmfiles/blob/master/releases/prometheus-operator.yaml with these values:

```yaml
global:
rbac:
create: true
pspEnabled: true
defaultRules:
create: true
rules:
kubernetesResources: false
additionalPrometheusRulesMap:
# These rules are copied from https://raw.githubusercontent.com/coreos/kube-prometheus/release-0.1/manifests/prometheus-rules.yaml
# Only CPUThrottlingHigh has been modified, to be replaced with a customizable version
# to reduce alerts caused by https://github.com/kubernetes/kubernetes/pull/63437
kubernetes-resources:
groups:
- name: kubernetes-resources
rules:
- alert: KubeCPUOvercommit
annotations:
message: Cluster has overcommitted CPU resource requests for Pods and cannot tolerate node failure.
runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecpuovercommit
expr: |-
sum(namespace_name:kube_pod_container_resource_requests_cpu_cores:sum)
/
sum(node:node_num_cpu:sum)
>
(count(node:node_num_cpu:sum)-1) / count(node:node_num_cpu:sum)
for: 5m
labels:
severity: warning
- alert: KubeMemOvercommit
annotations:
message: Cluster has overcommitted memory resource requests for Pods and cannot tolerate node failure.
runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubememovercommit
expr: |-
sum(namespace_name:kube_pod_container_resource_requests_memory_bytes:sum)
/
sum(node_memory_MemTotal_bytes)
>
(count(node:node_num_cpu:sum)-1)
/
count(node:node_num_cpu:sum)
for: 5m
labels:
severity: warning
- alert: KubeCPUOvercommit
annotations:
message: Cluster has overcommitted CPU resource requests for Namespaces.
runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecpuovercommit
expr: |-
sum(kube_resourcequota{job="kube-state-metrics", type="hard", resource="cpu"})
/
sum(node:node_num_cpu:sum)
> 1.5
for: 5m
labels:
severity: warning
- alert: KubeMemOvercommit
annotations:
message: Cluster has overcommitted memory resource requests for Namespaces.
runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubememovercommit
expr: |-
sum(kube_resourcequota{job="kube-state-metrics", type="hard", resource="memory"})
/
sum(node_memory_MemTotal_bytes{job="node-exporter"})
> 1.5
for: 5m
labels:
severity: warning
- alert: KubeQuotaExceeded
annotations:
message: Namespace {{`{{ $labels.namespace }}`}} is using {{`{{ printf "%0.0f" $value }}`}}% of its {{`{{ $labels.resource }}`}} quota.
runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubequotaexceeded
expr: |-
100 * kube_resourcequota{job="kube-state-metrics", type="used"}
/ ignoring(instance, job, type)
(kube_resourcequota{job="kube-state-metrics", type="hard"} > 0)
> 90
for: 15m
labels:
severity: warning
# Original rule is 25% for 15 minutes
- alert: CPUThrottlingHigh-{{- env "PROMETHEUS_OPERATOR_RULES_CPU_THROTTLING_HIGH_THRESHOLD_PERCENT" | default "50" -}}-{{- env "PROMETHEUS_OPERATOR_RULES_CPU_THROTTLING_HIGH_THRESHOLD_TIME" | default "25m" }}
annotations:
message: '{{`{{ printf "%0.0f" $value }}`}}% throttling of CPU in namespace {{`{{ $labels.namespace }}`}} for container {{`{{ $labels.container_name }}`}} in pod {{`{{ $labels.pod_name }}`}}.'
runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-cputhrottlinghigh
expr: |-
100 * sum(increase(container_cpu_cfs_throttled_periods_total{container_name!="", }[5m])) by (container_name, pod_name, namespace)
/
sum(increase(container_cpu_cfs_periods_total{}[5m])) by (container_name, pod_name, namespace)
> {{ env "PROMETHEUS_OPERATOR_RULES_CPU_THROTTLING_HIGH_THRESHOLD_PERCENT" | default "50" }}
for: 25m
labels:
severity: warning
prometheusOperator:
enabled: true
# log level must be one of "all", "debug", "info", "warn", "error", "none"
logLevel: "warn"
resources:
limits:
cpu: "100m"
memory: "96Mi"
requests:
cpu: "20m"
memory: "48Mi"
image:
pullPolicy: "IfNotPresent"
prometheus:
enabled: true
podDisruptionBudget:
enabled: false
ingress:
enabled: false
additionalServiceMonitors: []
additionalPodMonitors: []
prometheusSpec:
replicas: 1
retention: 45d
logLevel: "warn"
podMetadata:
annotations:
"cluster-autoscaler.kubernetes.io/safe-to-evict": "true"
scrapeInterval: ""
evaluationInterval: ""
## If true, a nil or {} value for prometheus.prometheusSpec.ruleSelector will cause the
## prometheus resource to be created with selectors based on values in the helm deployment,
## which will also match the PrometheusRule resources created.
## If false, a nil or or {} value for ruleSelector will select all PrometheusRule resources.
ruleSelectorNilUsesHelmValues: false
## serviceMonitorSelectorNilUsesHelmValues works just like ruleSelectorNilUsesHelmValues
serviceMonitorSelectorNilUsesHelmValues: false
#externalUrl: "{{- env "PROMETHEUS_PROMETHEUS_EXTERNAL_URL" | default (print "https://api." (env "KOPS_CLUSTER_NAME") "/api/v1/namespaces/monitoring/services/prometheus-operator-prometheus:web/proxy/") }}"
resources:
limits:
cpu: 300m
memory: 1526Mi
requests:
cpu: 75m
memory: 768Mi
alertmanager:
enabled: true
## Alertmanager configuration directives
## ref: https://prometheus.io/docs/alerting/configuration/#configuration-file
## https://prometheus.io/webtools/alerting/routing-tree-editor/
##
config:
global:
resolve_timeout: 5m
route:
group_by:
- "alertname"
- "namespace"
group_wait: 30s
group_interval: 5m
repeat_interval: 12h
receiver: "general"
routes:
- match:
alertname: Watchdog
receiver: "null"
receivers:
- name: "null"
- name: "general"
templates:
- ./*.tmpl
alertmanagerSpec:
#externalUrl: "{{- env "PROMETHEUS_ALERTMANAGER_EXTERNAL_URL" | default (print "https://api." (env "KOPS_CLUSTER_NAME") "/api/v1/namespaces/monitoring/services/prometheus-operator-alertmanager:web/proxy/") }}"
resources:
limits:
cpu: "200m"
memory: "96Mi"
requests:
cpu: "10m"
memory: "24Mi"
grafana:
# https://github.com/helm/charts/tree/dfa02f9b117a29de889f9c35e0b3abb6012a0877/stable/grafana#configuration
enabled: true
adminPassword: "CHANGEME"
defaultDashboardsEnabled: true
sidecar:
dashboards:
enabled: true
searchNamespace: ALL
label: grafana_dashboard
plugins:
- grafana-piechart-panel
resources:
limits:
cpu: "250m"
memory: "128Mi"
requests:
cpu: "25m"
memory: "72Mi"
grafana.ini:
dataproxy:
# default is 30 seconds
timeout: 90
server:
# root_url: "https://api.{{- env "KOPS_CLUSTER_NAME" }}/api/v1/namespaces/kube-system/services/prometheus-operator-grafana:service/proxy/"
# root_url: "{{- env "PROMETHEUS_GRAFANA_ROOT_URL" | default (print "https://api." (env "KOPS_CLUSTER_NAME") "/api/v1/namespaces/monitoring/services/prometheus-operator-grafana:service/proxy/") }}"
auth.anonymous:
enabled: true
org_role: Admin
kubeStateMetrics:
enabled: true
kubeApiServer:
enabled: true
kubelet:
enabled: true
  # In general, few clusters are set up to allow the kubelet to authenticate a bearer token, and
# the HTTPS endpoint requires authentication, so Prometheus cannot access it.
# The HTTP endpoint does not require authentication, so Prometheus can access it.
# See https://github.com/coreos/prometheus-operator/issues/926
serviceMonitor:
https: true
kubeControllerManager:
enabled: true
coreDns:
enabled: true
kubeDns:
enabled: false
kubeEtcd:
# Access to etcd is a huge security risk, so nodes are blocked from accessing it.
# Therefore Prometheus cannot access it without extra setup, which is beyond the scope of this helmfile.
# See https://github.com/kubernetes/kops/issues/5852
# https://github.com/kubernetes/kops/issues/4975#issuecomment-381055946
# https://github.com/coreos/prometheus-operator/issues/2397
# https://github.com/coreos/prometheus-operator/blob/v0.19.0/contrib/kube-prometheus/docs/Monitoring%20external%20etcd.md
# https://gist.github.com/jhohertz/476bd616d4171649a794b8c409f8d548
# So we disable it since it is not going to work anyway
enabled: false
kubeScheduler:
enabled: true
nodeExporter:
enabled: true
# set:
# - name: "alertmanager.templateFiles.deployment\\.tmpl"
# - name: "alertmanager.templateFiles.deployment\\.tmpl"
#   file: values/kube-prometheus.alerts.template
```
@Vad1mo Thank you so much for this! Are you by chance using this for autoscaling with prometheus-adapter? If you are, what has been your experience there?
For now, I have to use the Docker runtime, since containerd mounts volumes as root:root, which breaks my Druid installation. Using the Docker runtime, I get metrics that are missing container/image:

```
container_cpu_system_seconds_total{container="",id="/",image="",name="",namespace="",pod=""} 385870.64 1605141064258
```

The workarounds mentioned here (--kubelet-arg containerd=/run/k3s/containerd/containerd.sock) don't apply to Docker. Any guidance on what to do to make sure cadvisor populates container/image when using the Docker runtime? I am running k3s version v1.18.9+k3s1 (630bebf). Thanks!
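For what it's worth, a series with id="/" (as in the sample above) is the root-cgroup aggregate, which carries empty container/image labels by design; whether that explains all the missing labels here is a separate question. One way to keep only per-container series when querying is to filter out the empty label:

```
sum by (namespace, pod, container) (
  rate(container_cpu_usage_seconds_total{container!=""}[5m])
)
```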
Same issue here; it was fine before installing the latest version (v1.20.2+k3s1).
Describe the bug
When making the call to retrieve metrics via cAdvisor, the container and image values are empty on all metrics.
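(One way to make that call when reproducing, via the API server's node proxy; NODE_NAME is a placeholder for a real node:)

```sh
# Fetch the kubelet's cAdvisor metrics through the API server proxy
kubectl get --raw "/api/v1/nodes/NODE_NAME/proxy/metrics/cadvisor" | head -n 20
```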
To Reproduce
Install k3s via multipass https://medium.com/@zhimin.wen/running-k3s-with-multipass-on-mac-fbd559966f7c
Expected behavior
The container and image values should be populated.
Additional context
Wondering if it might be related to #213