Failed to update correctly - validate CRI v1 image API for endpoint \"unix:///run/containerd-stargz-grpc/containerd-stargz-grpc.sock\" #9858

Closed
plsnotracking opened this issue Apr 2, 2024 · 4 comments

Comments

@plsnotracking

Environmental Info:
K3s Version:

❯ k3s --version
k3s version v1.28.8+k3s1 (653dd61a)
go version go1.21.8

Node(s) CPU architecture, OS, and Version:

❯ uname -a
Linux macstudio 6.6.3-413.asahi.fc39.aarch64+16k #1 SMP PREEMPT_DYNAMIC Sat Jan 27 17:19:54 UTC 2024 aarch64 GNU/Linux

Cluster Configuration:
1 server
8 agents

Describe the bug:
k3s was working fine on v1.28.7. I then performed a manual upgrade to v1.28.8+k3s1 and borked it somehow: the kubelet now exits on startup.

Steps To Reproduce:

  • Take a live k3s cluster
  • Set env vars
# k3s options
export K3S_KUBECONFIG_MODE="644"
export INSTALL_K3S_NAME="homelab"
export INSTALL_K3S_SELINUX_WARN="true"
export INSTALL_K3S_CHANNEL="stable"
export INSTALL_K3S_EXEC=" \
    --disable servicelb \
    --disable traefik \
    --etcd-expose-metrics \
    --tls-san 100.119.138.30, 100.91.83.52 \
    --snapshotter=stargz \
    --node-label arch=arm64 \
    --node-label type=mac"
  • Perform the upgrade by running: curl -sfL https://get.k3s.io | sh -

Expected behavior:
Should upgrade correctly, and resume cluster operations.

Actual behavior:
kubelet exited: failed to run Kubelet: validate service connection: validate CRI v1 image API for endpoint \"unix:///run/containerd-stargz-grpc/containerd-stargz-grpc.sock\": rpc error:

Additional context / logs:

Apr 02 12:01:57 macstudio k3s[1531490]: time="2024-04-02T12:01:57-07:00" level=info msg="Module iptable_filter was already loaded"
Apr 02 12:01:57 macstudio k3s[1531490]: time="2024-04-02T12:01:57-07:00" level=warning msg="SELinux is enabled on this host, but k3s has not been started with --selinux - containerd SELinux support is disabled"
Apr 02 12:01:57 macstudio k3s[1531490]: time="2024-04-02T12:01:57-07:00" level=info msg="Logging containerd to /var/lib/rancher/k3s/agent/containerd/containerd.log"
Apr 02 12:01:57 macstudio k3s[1531490]: time="2024-04-02T12:01:57-07:00" level=info msg="Running containerd -c /var/lib/rancher/k3s/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/k3s/agent/containerd"
Apr 02 12:01:58 macstudio k3s[1531490]: I0402 12:01:58.399181 1531490 storage_scheduling.go:111] all system priority classes are created successfully or already exist.
Apr 02 12:01:58 macstudio k3s[1531490]: time="2024-04-02T12:01:58-07:00" level=info msg="containerd is now running"
Apr 02 12:01:58 macstudio k3s[1531490]: time="2024-04-02T12:01:58-07:00" level=info msg="Running kubelet --address=0.0.0.0 --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=systemd --client-ca-file=/var/lib/rancher/k3s/agent/client-ca.crt --cloud-provider=external --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock --containerd=/run/k3s/containerd/containerd.sock --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --feature-gates=CloudDualStackNodeIPs=true --healthz-bind-address=127.0.0.1 --hostname-override=macstudio --image-service-endpoint=unix:///run/containerd-stargz-grpc/containerd-stargz-grpc.sock --kubeconfig=/var/lib/rancher/k3s/agent/kubelet.kubeconfig --node-ip=192.168.1.2,2600:1700:5b70:2820::c --node-labels=arch=arm64,type=mac --pod-infra-container-image=rancher/mirrored-pause:3.6 --pod-manifest-path=/var/lib/rancher/k3s/agent/pod-manifests --read-only-port=0 --resolv-conf=/etc/resolv.conf --serialize-image-pulls=false --tls-cert-file=/var/lib/rancher/k3s/agent/serving-kubelet.crt --tls-private-key-file=/var/lib/rancher/k3s/agent/serving-kubelet.key"
Apr 02 12:01:58 macstudio k3s[1531490]: time="2024-04-02T12:01:58-07:00" level=info msg="Connecting to proxy" url="wss://127.0.0.1:6443/v1-k3s/connect"
Apr 02 12:01:58 macstudio k3s[1531490]: time="2024-04-02T12:01:58-07:00" level=info msg="Handling backend connection request [macstudio]"
Apr 02 12:01:58 macstudio k3s[1531490]: Flag --cloud-provider has been deprecated, will be removed in 1.25 or later, in favor of removing cloud provider code from Kubelet.
Apr 02 12:01:58 macstudio k3s[1531490]: Flag --containerd has been deprecated, This is a cadvisor flag that was mistakenly registered with the Kubelet. Due to legacy concerns, it will follow the standard CLI deprecation timeline before being removed.
Apr 02 12:01:58 macstudio k3s[1531490]: Flag --pod-infra-container-image has been deprecated, will be removed in a future release. Image garbage collector will get sandbox image information from CRI.
Apr 02 12:01:58 macstudio k3s[1531490]: I0402 12:01:58.971331 1531490 server.go:202] "--pod-infra-container-image will not be pruned by the image garbage collector in kubelet and should also be set in the remote runtime"
Apr 02 12:01:58 macstudio k3s[1531490]: I0402 12:01:58.974270 1531490 server.go:462] "Kubelet version" kubeletVersion="v1.28.8+k3s1"
Apr 02 12:01:58 macstudio k3s[1531490]: I0402 12:01:58.974312 1531490 server.go:464] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
Apr 02 12:01:58 macstudio k3s[1531490]: I0402 12:01:58.975887 1531490 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/var/lib/rancher/k3s/agent/client-ca.crt"
Apr 02 12:01:58 macstudio k3s[1531490]: time="2024-04-02T12:01:58-07:00" level=info msg="Annotations and labels have already set on node: macstudio"
Apr 02 12:01:58 macstudio k3s[1531490]: Error: failed to run Kubelet: validate service connection: validate CRI v1 image API for endpoint "unix:///run/containerd-stargz-grpc/containerd-stargz-grpc.sock": rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /run/k3s/containerd/containerd.sock: connect: connection refused"
Apr 02 12:01:58 macstudio k3s[1531490]: time="2024-04-02T12:01:58-07:00" level=error msg="kubelet exited: failed to run Kubelet: validate service connection: validate CRI v1 image API for endpoint \"unix:///run/containerd-stargz-grpc/containerd-stargz-grpc.sock\": rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial unix /run/k3s/containerd/containerd.sock: connect: connection refused\""
Apr 02 12:01:59 macstudio systemd[1]: k3s-homelab.service: Main process exited, code=exited, status=1/FAILURE
Apr 02 12:01:59 macstudio systemd[1]: k3s-homelab.service: Failed with result 'exit-code'.
Apr 02 12:01:59 macstudio systemd[1]: k3s-homelab.service: Unit process 5120 (containerd-shim) remains running after unit stopped.
Apr 02 12:01:59 macstudio systemd[1]: k3s-homelab.service: Unit process 5124 (containerd-shim) remains running after unit stopped.
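
One quick check for this kind of failure is whether the two sockets named in the error exist at all. This is a rough sketch, not an official k3s diagnostic: the function name `socket_status` is made up for illustration, and the paths are copied from the kubelet flags in the log above.

```shell
#!/bin/sh
# Hypothetical helper: report whether a path exists and is a Unix socket.
socket_status() {
    if [ -S "$1" ]; then
        echo "present"
    else
        echo "missing"
    fi
}

# Socket paths copied from the kubelet flags in the log above.
for sock in \
    /run/k3s/containerd/containerd.sock \
    /run/containerd-stargz-grpc/containerd-stargz-grpc.sock
do
    printf '%s: %s\n' "$sock" "$(socket_status "$sock")"
done
```

A "present" result only shows that the socket file exists. A stale socket file with nothing listening behind it would still give "connection refused".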
@brandond
Contributor

brandond commented Apr 2, 2024

Check the containerd log file to see why the stargz image service failed to start?

@plsnotracking
Author

I'm not sure what exactly I should be looking for. I tried the following two things:

  1. Looked for a containerd unit in the systemctl list:
> sudo systemctl status c
calamares-firstboot.service
canberra-system-bootup.service
canberra-system-shutdown-reboot.service
canberra-system-shutdown.service
chronyd-restricted.service
chronyd.service
chrony-wait.service
colord.service
console-getty.service
cri-containerd-00929d54ad0b8a92ff5f054af2de926cfa3773a9398423605a4143482c4394cf.scope
cri-containerd-07ca53cd37b27531a4d8b183671d45c5c1e743d690a326912d18c4e544486a29.scope
cri-containerd-11e90b800025bc1b5a957667b201242e1114f3f7696961aff40c2b55d3c754ad.scope
cri-containerd-1275d534f038661b348508b7e0f351dc29a71bc50b197c1856692dd91b26a7da.scope
cri-containerd-3743f054fcc5c03cbc16c5c4735dc30e137163bab1741b9529419cd1250c5011.scope
cri-containerd-3c2eda63665777dbea4e695d44c4335a97b5dc4776d61caea512dc3b6dcadeca.scope
cri-containerd-3e21587a0351ab677d250a94e7e3b581732a7fd9f13514c80fffb9ffee6c0115.scope
cri-containerd-4cf0e915f7713c48f0bfd6c8ebcb867d30bf73acb4dcd0a9433f028bcd602a73.scope
cri-containerd-4d26eb29f9b80e309f03a55da87f40c8d8c12969eed852b9656a91f325d9c770.scope
cri-containerd-4d4fb1a0697a2c5fd470be019194ba77b16a2ccd0876ed4fd255f27ec8f87514.scope
cri-containerd-4da8a4bfc8875a1518d04f67f6861a9886aa6128b3dbaec71f97bfcb30f8cb4e.scope
cri-containerd-4dd232f35e0808936bc5c37b9c10252f27e15c8741552eac1878aa3fd45e1a27.scope
cri-containerd-5337ecd060d2544ec1c21a4db86239093276a1d3706af7d9937fd61e1af13288.scope
cri-containerd-5aa75578011ea56e22ab32d00a8f39377aafe0d1fac4e0f0fc754b6b3df7c38a.scope
cri-containerd-5d99140586c23b64516c52e59183f54e58aef9abf8c3793a6dc9176a8dc808cc.scope
cri-containerd-5ea8fbaa4c61db90cd72bca4394af526833304a0a4a47622b8e8ff8937940aa7.scope
cri-containerd-63333f2f2a4ef2f43cd39448ebb66848b70ecb2ca3d3713073a101a834f9cfc6.scope
cri-containerd-65b1ec7303ee5a676e718d51010c79640f808b13991b219778cc2f83550798c6.scope
cri-containerd-671509b5adce43c9235e32f83c076781d246116047bb06cb44153bfc6dc7a76c.scope
cri-containerd-811beae091d42a754dceee8ecb0057a4f718f72255ffef0099d3f00ffb01131e.scope
cri-containerd-87682967070ffb750df06f639ccf2a0978b809dcaa5b469cee7cb29d276af54f.scope
cri-containerd-8a7b30f9788b46be33cf33b1445a302521fdb3442b2737de5d35c404ec7fde46.scope
cri-containerd-96ef6b57c0e87a0bd24a8cb892e156a3899a8b7cb4ffe55bd47b15560ec7467d.scope
cri-containerd-9853a1ba17a782901badb8d4c91869feacb9f359f76971cadae0edfd5224d849.scope
cri-containerd-9865a3469562a240a41758270d13ae25e8793a536b8dd1d5e982c70d3859d4da.scope
cri-containerd-9a2aef3b732baabbcdd1a4214b77b1ff0f0ed17f486fd1e0a3d8bf13de1759d4.scope
cri-containerd-9b29000bcfae49553faca5928b64b34152391f455d7439efa51c0af9920d8bb3.scope
cri-containerd-9b49c96ab6f647de8a7898885fd879247c1aee96df5cd333d8644c7e8ef36192.scope
cri-containerd-9ffc976e76b78b7ec5267a90f420705342778f9f9c4e44f0d2bf1bf4339a4bb7.scope
cri-containerd-a062baf09a5d1568c79c40ce4ea98f36be485acabd91c847c6f489931a9bd38b.scope
cri-containerd-a75fca490cbb887003ff5a5ca2bcddb82d6ed4fee7bb8910f18981bd8a479071.scope
cri-containerd-b29cbbead966212a77ca0a9ce48cc67d106ab22e09f3a42629b9010b24bfcfc8.scope
cri-containerd-b921298245f162be15dd3ec27562c1a96deb3671a5995939a8ab5241715be258.scope
cri-containerd-b9a3a4c2e3cd44a9fb4bea646d29d3fcd394e0b5da3c026d7f5c810ade7e11cb.scope
cri-containerd-bbd1ca30e5b4c4cd551cd4cffb6b614b8c570413af2e3c173f85ad81564ea2dc.scope
cri-containerd-bd7305b5626337a772d7f0aecf0a37ce95cd3c05fbaa17dee288d8759f601627.scope
cri-containerd-c0b6c37c2512dc64d3476baf48fa2f78f132137c398bb6b803c0c04942199054.scope
cri-containerd-c24173bca03a715e77a4f9d32bec0df9724bcf99387d0bca3ce5cb68bb1d5588.scope
cri-containerd-ca72fd1ed8b61c406a25b6768161056bb3338dc6894780164a0e021948c50bdd.scope
cri-containerd-d278cc8d0064f6ee152c9c7d507bc5aca66a944d5e3e86a58c65214929adaf3a.scope
cri-containerd-d5213cca6a795e245ac5bf6f182db5ecece36b5467b1595c54706a1612d25010.scope
cri-containerd-d7bb219e5a3862f6e3f249e7345955ad087f5a784286822388142eb092650192.scope
cri-containerd-e5596a2c4633d685f668b56a0f5f4a9dcb3659217456563abf553c02f438569d.scope
cri-containerd-e899a38d9b60635359b3e2739a8ad1bf9f6cc0b87e0f7814f07a7417efecc6a9.scope
cri-containerd-f9c8324d7be54748689de44f5e3b9d2d4808495a096de1e0a3a368092965988c.scope
crond.service
cryptsetup-pre.target
cryptsetup.target
ctrl-alt-del.target
cups-browsed.service
cups.path
cups.service
cups.socket
  2. The other thing I tried was cat /var/log/co, but there is no containerd app/process there to cat logs from.

Any other ways to see what I messed up?

@brandond
Contributor

brandond commented Apr 3, 2024

The log is at /var/lib/rancher/k3s/agent/containerd/containerd.log
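
A minimal sketch of how one might pull the relevant lines out of that file, assuming the default k3s data directory shown above. The helper name `recent_errors` is hypothetical, and the search terms are only a guess at what a stargz startup failure would mention.

```shell
#!/bin/sh
# Hypothetical helper: print the last 50 lines of a containerd log that are
# error/fatal level or that mention stargz.
recent_errors() {
    log_file="$1"
    grep -E 'level=(error|fatal)|stargz' "$log_file" | tail -n 50
}

# Default k3s containerd log path, as given above. Reading it may require root.
LOG=/var/lib/rancher/k3s/agent/containerd/containerd.log
if [ -r "$LOG" ]; then
    recent_errors "$LOG"
fi
```

Comparing the timestamps of the matching lines with the kubelet failure time in the journal should show whether containerd or the stargz service was down at that moment.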

@plsnotracking
Author

Saw a bunch of these but it eventually came back up by itself.

time="2024-04-02T11:53:26.897694415-07:00" level=error msg="post event" error="failed to connect: dial unix /run/k3s/containerd/containerd.sock.ttrpc: connect: connection refused"
time="2024-04-02T11:53:26.956829149-07:00" level=error msg="post event" error="failed to connect: dial unix /run/k3s/containerd/containerd.sock.ttrpc: connect: connection refused"
time="2024-04-02T11:53:27.011435217-07:00" level=error msg="post event" error="failed to connect: dial unix /run/k3s/containerd/containerd.sock.ttrpc: connect: connection refused"
time="2024-04-02T11:53:41.963038517-07:00" level=error msg="forward event" error="failed to connect: dial unix /run/k3s/containerd/containerd.sock.ttrpc: connect: connection refused"
time="2024-04-02T11:53:42.016557210-07:00" level=error msg="forward event" error="failed to connect: dial unix /run/k3s/containerd/containerd.sock.ttrpc: connect: connection refused"
time="2024-04-02T11:54:35.420320769-07:00" level=error msg="post event" error="failed to connect: dial unix /run/k3s/containerd/containerd.sock.ttrpc: connect: connection refused"
time="2024-04-02T12:37:38.474185393-07:00" level=error msg="ContainerStatus for \"34a3c7bac19e967dcde9a4e1e0d730762c37558665c1d4819dda51c1857c1362\" failed" error="rpc error: code = NotFound desc = an error occurred when try to find container \"34a3c7bac19e967dcde9a4e1e0d730762c37558665c1d4819dda51c1857c1362\": not found"
time="2024-04-02T14:00:00.820014557-07:00" level=error msg="ContainerStatus for \"8a7b30f9788b46be33cf33b1445a302521fdb3442b2737de5d35c404ec7fde46\" failed" error="rpc error: code = NotFound desc = an error occurred when try to find container \"8a7b30f9788b46be33cf33b1445a302521fdb3442b2737de5d35c404ec7fde46\": not found"
time="2024-04-02T14:00:00.820471057-07:00" level=error msg="ContainerStatus for \"162c0819119565a84d7bddfff0b177ef64ec75960c0dd4e23ad594dabf5649c6\" failed" error="rpc error: code = NotFound desc = an error occurred when try to find container \"162c0819119565a84d7bddfff0b177ef64ec75960c0dd4e23ad594dabf5649c6\": not found"
time="2024-04-02T14:00:37.045066423-07:00" level=error msg="ContainerStatus for \"bbd1ca30e5b4c4cd551cd4cffb6b614b8c570413af2e3c173f85ad81564ea2dc\" failed" error="rpc error: code = NotFound desc = an error occurred when try to find container \"bbd1ca30e5b4c4cd551cd4cffb6b614b8c570413af2e3c173f85ad81564ea2dc\": not found"
time="2024-04-02T14:00:37.045322506-07:00" level=error msg="ContainerStatus for \"0a02945c89438cd4b63fbbe1bb420ff8fd89ed6bbb1799f35e887fb17035e418\" failed" error="rpc error: code = NotFound desc = an error occurred when try to find container \"0a02945c89438cd4b63fbbe1bb420ff8fd89ed6bbb1799f35e887fb17035e418\": not found"
time="2024-04-02T14:00:51.179059858-07:00" level=error msg="ContainerStatus for \"9ffc976e76b78b7ec5267a90f420705342778f9f9c4e44f0d2bf1bf4339a4bb7\" failed" error="rpc error: code = NotFound desc = an error occurred when try to find container \"9ffc976e76b78b7ec5267a90f420705342778f9f9c4e44f0d2bf1bf4339a4bb7\": not found"
time="2024-04-02T14:00:51.179641066-07:00" level=error msg="ContainerStatus for \"077691ba4fc266debd3d78828a8b36adc918a4cd9f40395e7a32b1f669379833\" failed" error="rpc error: code = NotFound desc = an error occurred when try to find container \"077691ba4fc266debd3d78828a8b36adc918a4cd9f40395e7a32b1f669379833\": not found"
time="2024-04-02T14:01:04.229868050-07:00" level=error msg="ContainerStatus for \"671509b5adce43c9235e32f83c076781d246116047bb06cb44153bfc6dc7a76c\" failed" error="rpc error: code = NotFound desc = an error occurred when try to find container \"671509b5adce43c9235e32f83c076781d246116047bb06cb44153bfc6dc7a76c\": not found"
time="2024-04-02T14:01:04.230341008-07:00" level=error msg="ContainerStatus for \"b4658df241469fd9cd79031aae0b0661e213aafd2220766211d017fffb8a9d70\" failed" error="rpc error: code = NotFound desc = an error occurred when try to find container \"b4658df241469fd9cd79031aae0b0661e213aafd2220766211d017fffb8a9d70\": not found"
time="2024-04-02T14:01:21.327287402-07:00" level=error msg="ContainerStatus for \"3743f054fcc5c03cbc16c5c4735dc30e137163bab1741b9529419cd1250c5011\" failed" error="rpc error: code = NotFound desc = an error occurred when try to find container \"3743f054fcc5c03cbc16c5c4735dc30e137163bab1741b9529419cd1250c5011\": not found"
time="2024-04-02T14:01:21.327883152-07:00" level=error msg="ContainerStatus for \"3db686d251bad3e560b61f6bc9bc84f0402f01cf9415adde13696b6e4744d3b6\" failed" error="rpc error: code = NotFound desc = an error occurred when try to find container \"3db686d251bad3e560b61f6bc9bc84f0402f01cf9415adde13696b6e4744d3b6\": not found"
time="2024-04-02T18:02:35.232330615-07:00" level=error msg="ContainerStatus for \"4de91c29e3ce8d21c4fb2ca66b44cbc05a172d2ce2615dd56aee93ea2777a5cf\" failed" error="rpc error: code = NotFound desc = an error occurred when try to find container \"4de91c29e3ce8d21c4fb2ca66b44cbc05a172d2ce2615dd56aee93ea2777a5cf\": not found"

Thanks for all the help.
