
cAdvisor metrics stopped working correctly in K3s 1.20 #2831

Closed
eplightning opened this issue Jan 20, 2021 · 15 comments

@eplightning

Environmental Info:
K3s Version:
k3s version v1.20.0+k3s2 (2ea6b16)
go version go1.15.5

Node(s) CPU architecture, OS, and Version:
Linux kube-master0 5.4.0-54-generic #60-Ubuntu SMP Fri Nov 6 10:37:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:
1 master, 2 workers, embedded containerd (no docker, no custom CRI)

Describe the bug:
cAdvisor is unable to connect to containerd, resulting in mostly empty labels (container, image, name) in metrics.

Adding --kubelet-arg containerd=/run/k3s/containerd/containerd.sock to k3s launch args fixes the issue.
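
For clarity, a minimal sketch of the full launch command with the workaround applied (adjust to however you actually start k3s, e.g. a systemd unit):

# Workaround: point the cAdvisor-owned containerd flag at the socket k3s actually uses
k3s server --kubelet-arg containerd=/run/k3s/containerd/containerd.sock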

Steps To Reproduce:

k3s server

curl -k --cert /var/lib/rancher/k3s/server/tls/client-admin.crt --key /var/lib/rancher/k3s/server/tls/client-admin.key https://127.0.0.1:6443/api/v1/nodes/NODE_NAME/proxy/metrics/cadvisor
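
An equivalent scrape without juggling client certificates, assuming kubectl is already configured against the cluster (NODE_NAME is the same placeholder as in the curl above):

kubectl get --raw /api/v1/nodes/NODE_NAME/proxy/metrics/cadvisor | head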

Expected behavior:
Metrics from cAdvisor have non-empty labels:

container_cpu_load_average_10s{container="fluent-bit",id="/kubepods/burstable/pod227a7799-c04b-419a-9d96-98b5ca911666/67d16f0aa8f6e214914c9769a55e8154a9f133646b36974d03b8a3f185ae3e38",image="docker.io/fluent/fluent-bit:1.6",name="67d16f0aa8f6e214914c9769a55e8154a9f133646b36974d03b8a3f185ae3e38",namespace="logging",pod="fluent-bit-vw9cg"} 0 1611138060164

Actual behavior:
cAdvisor metric labels are empty:

container_cpu_load_average_10s{container="",id="/kubepods/burstable/pod227a7799-c04b-419a-9d96-98b5ca911666/67d16f0aa8f6e214914c9769a55e8154a9f133646b36974d03b8a3f185ae3e38",image="",name="",namespace="",pod=""} 0 1611135942449

Additional context / logs:
Most likely a regression caused by 5b318d0.

That value is used for both argsMap["container-runtime-endpoint"] and argsMap["containerd"], and it seems the containerd one cannot be a URI.
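
For illustration (a sketch of the distinction, not lifted from the k3s source), the two args expect different forms of the same socket:

# kubelet / CRI side: expects a URI
container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock
# cAdvisor side: expects a plain filesystem path, which is presumably why reusing the URI breaks it
containerd=/run/k3s/containerd/containerd.sock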

@brandond
Contributor

brandond commented Jan 20, 2021

Have you tried v1.20.2? Possibly related to kubernetes/kubernetes#97006

Although the fact that overriding the arg resolves it is interesting.

brandond self-assigned this Jan 20, 2021
brandond added the kind/bug label Jan 20, 2021
@eplightning
Author

Upgrading to v1.20.2+k3s1 (1d4adb0) didn't help; I still need that extra kubelet arg for proper labels.

The only change I noticed was the presence of additional metrics, presumably added by kubernetes/kubernetes#97006.

@mnorrsken
Contributor

Great! It's working for me as well after adding the kubelet arg. 👍
I had the same problem and thought the "kubernetes masters" wanted me to do this in Prometheus queries:
label_replace(container_cpu_load_average_10s,"container_id", "containerd://$1","id","(?:/.+){3}/(.*)") * on (container_id) group_left(container) (kube_pod_container_info)
😄 @eplightning you saved me a week of creating overcomplicated dashboards!

brandond added this to To Triage in Development [DEPRECATED] via automation Jan 25, 2021
brandond moved this from To Triage to To Test in Development [DEPRECATED] Jan 25, 2021
@HaveFun83

Same here. Thanks a lot for the workaround.

@ShylajaDevadiga
Contributor

Issue was reproducible on k3s v1.20.2+k3s1

container_spec_cpu_shares{container="",id="/kubepods/burstable/pod9ac68d71-726d-476e-a3e0-3ad9f01765a6/445915b33977b7677603af439673633fc812ed98af2d3cdc1d85d3b937f6654a",image="",name="",namespace="",pod=""}

Validated that metrics have non-empty labels using the CI build k3s version v1.20.2+k3s-c5e2676d:

container_cpu_load_average_10s{container="",id="/kubepods/burstable/podf75b023d-1a57-4228-826b-7f6e57ab978c/685fd28b35a34709ed4295841a54f2df7118813575eb12da095a43fccb92e0d9",image="docker.io/rancher/pause:3.1",name="685fd28b35a34709ed4295841a54f2df7118813575eb12da095a43fccb92e0d9",namespace="kube-system",pod="coredns-854c77959c-x42xt"} 0 1612513308508

@lackhoa

lackhoa commented Mar 18, 2021

Sorry for commenting on a closed issue, but I am using v1.20.2+k3s1 and adding the kubelet arg doesn't solve the issue for me.
Furthermore, is there any documentation on the containerd kubelet arg?

  • I can't see it anywhere in the config reported by kubectl get --raw "/api/v1/nodes/${NODE}/proxy/configz" (the other args I added are all there; see the sketch below).
  • I don't see it in the docs.
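
A likely explanation, offered as an assumption rather than a confirmed answer: configz only returns the kubelet's KubeletConfiguration object, and flags that exist only on the command line (like the cAdvisor-owned containerd flag) are not part of that object, so they never show up there. A rough check, assuming kubectl access, jq installed, and NODE set as above:

# configz exposes only KubeletConfiguration fields, not command-line-only flags
kubectl get --raw "/api/v1/nodes/${NODE}/proxy/configz" | jq '.kubeletconfig | keys'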

@brandond
Contributor

brandond commented Mar 18, 2021

@lackhoa the fix isn't in v1.20.2+k3s1. QA tested on a post-release CI build (v1.20.2+k3s-c5e2676d) off master. That commit is included in v1.20.4+k3s1; please use that.

Not all kubelet args are documented; for historical reasons, all cAdvisor args are also valid kubelet args, despite not being in the docs.

@lackhoa

lackhoa commented Mar 18, 2021

@brandond Ah thanks, I'll try that.
But given that the workaround worked for the OP, I assumed it'd work for me too.

@eplightning
Author

I think the containerd flag was never actually documented since it's consumed by cAdvisor, not directly by the kubelet. It's deprecated now, but still required if you're running containerd on a non-default socket path. More details: kubernetes/kubernetes#89903

@brandond
Contributor

Yeah, it's one of those things that make it clear that dockershim is still the only thing that upstream actually tests, despite all the big talk about deprecating it. You run into all kinds of weird issues if you actually use a different runtime.

@lackhoa

lackhoa commented Mar 19, 2021

Alright, I can confirm that my problem has been resolved after upgrading to v1.20.4+k3s1, with the added containerd arg.
However, what tripped me up is that some time series belonging to the metric container_cpu_cfs_periods_total still have an empty container field.
Don't know what to make of it, but I'm happy that the container field is back.
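
For what it's worth, series that keep an empty container label are often the pod-level (sandbox) cgroup aggregates rather than lost data; that is an assumption worth checking against your own output. Reusing the curl from the issue description to spot-check them:

# List the container_cpu_cfs_periods_total samples that still have container=""
# and inspect their id= cgroup paths
curl -sk --cert /var/lib/rancher/k3s/server/tls/client-admin.crt \
  --key /var/lib/rancher/k3s/server/tls/client-admin.key \
  https://127.0.0.1:6443/api/v1/nodes/NODE_NAME/proxy/metrics/cadvisor \
  | grep '^container_cpu_cfs_periods_total{container=""'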

@brandond
Contributor

You don't need to add the containerd arg on v1.20.4.

@zalegrala

Are there docs for how to enable these metrics? I'm looking to use a dashboard that makes use of the metric kube_pod_container_info, but I don't know how to enable it or what to scrape.

@discordianfish
Contributor

I'm running v1.24.2+k3s2 and still have to pass --kubelet-arg containerd=/run/k3s/containerd/containerd.sock to make this work.
Another regression?

@brandond
Contributor

@discordianfish No, you're just on an old version of K3s. Update to the latest 1.24 patch release.

k3s-io locked and limited conversation to collaborators Nov 22, 2022