Bump containerd to v1.6.14-k3s1 to fix issue with pod network info not being written to metadata store #6692

Closed · brandond opened this issue Jan 4, 2023 · 1 comment
Labels: area/cni, kind/bug (Something isn't working)
Milestone: v1.26.0+k3s2

brandond (Contributor) commented Jan 4, 2023

Containerd >= v1.6.9 is affected by an issue that prevents pod network info from being written to the containerd metadata store, causing CNI info to be lost when containerd is restarted. This in turn causes the kubelet to restart all non-host-network pods whenever k3s is restarted.

We need to update to containerd v1.6.14-k3s1, which includes the fix backported in containerd/containerd#7845.
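
For triage, one way to confirm which containerd build a node is actually running (standard kubectl/k3s tooling; the exact output format may vary between versions):

```
# The CONTAINER-RUNTIME column reports e.g. containerd://1.6.14-k3s1
kubectl get nodes -o wide

# Or ask the embedded runtime directly through k3s's bundled crictl
sudo k3s crictl version
```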

@brandond brandond added this to the v1.26.0+k3s2 milestone Jan 4, 2023
@brandond brandond self-assigned this Jan 4, 2023
@brandond brandond added this to To Triage in Development [DEPRECATED] via automation Jan 4, 2023
@brandond brandond moved this from To Triage to Peer Review in Development [DEPRECATED] Jan 4, 2023
@brandond brandond changed the title Bump k3s to fix containerd CNI issue Bump containerd to v1.6.14-k3s1 to fix issue with pod network info not being written to metadata store Jan 4, 2023
@brandond brandond moved this from Peer Review to To Test in Development [DEPRECATED] Jan 6, 2023
ShylajaDevadiga (Contributor) commented:
Validated on k3s version v1.26.0-rc1+k3s2, containerd version containerd://1.6.14-k3s1

Environment Details

Infrastructure
Cloud EC2 instance

Node(s) CPU architecture, OS, and Version:
Ubuntu 20.04

Cluster Configuration:
Multinode: 3 servers and 1 agent, with etcd backend

Config.yaml:

Steps to reproduce the issue

  1. Create a cluster
  2. Deploy workloads
  3. Restart the k3s service on all server and agent nodes (see the sketch after this list)
  4. Observe that the pods have restarted and the pod Events show "Pod sandbox changed, it will be killed and re-created."
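
For step 3, a minimal sketch on a default systemd-based install (the k3s and k3s-agent unit names are the ones created by the standard install script; adjust if your setup differs):

```
# On each server node
sudo systemctl restart k3s

# On each agent node
sudo systemctl restart k3s-agent

# Then check whether any pods were recreated with a new sandbox
kubectl get pods -A
kubectl describe pods -A | grep -i sandbox
```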

Results from reproducing the issue on k3s version v1.26.0+k3s1:

$ kubectl get pods -A
NAMESPACE     NAME                                          READY   STATUS      RESTARTS        AGE
default       frontend-5hqks                                1/1     Running     1 (5m36s ago)   6m22s
default       frontend-6974x                                1/1     Running     1 (5m46s ago)   6m22s
default       frontend-fvkdk                                1/1     Running     1 (5m43s ago)   6m22s
default       frontend-lpmlx                                1/1     Running     1 (5m43s ago)   6m22s
default       nginx-clusterip-deployment-764ff5cd84-j4chn   1/1     Running     1 (5m36s ago)   7m30s
default       nginx-clusterip-deployment-764ff5cd84-vppnb   1/1     Running     1 (5m46s ago)   7m30s
default       nginx-loadbalancer-pod-66955c697f-l4hwk       1/1     Running     1 (5m37s ago)   6m57s
default       nginx-loadbalancer-pod-66955c697f-rpnk8       1/1     Running     1 (5m43s ago)   6m57s
kube-system   coredns-5c6b6c5476-49www                      1/1     Running     1 (5m43s ago)   13m
kube-system   helm-install-traefik-42hp2                    0/1     Completed   1               13m

$ kubectl describe pod frontend-6974x |tail -12
  ----    ------          ----  ----               -------
  Normal  Scheduled       11m   default-scheduler  Successfully assigned default/frontend-6974x to ip-172-31-9-126
  Normal  Pulling         11m   kubelet            Pulling image "nginx"
  Normal  Pulled          11m   kubelet            Successfully pulled image "nginx" in 3.142336675s (3.142345134s including waiting)
  Normal  Created         11m   kubelet            Created container webserver
  Normal  Started         11m   kubelet            Started container webserver
  Normal  SandboxChanged  11m   kubelet            Pod sandbox changed, it will be killed and re-created.
  Normal  Killing         11m   kubelet            Stopping container webserver

Results from validation on k3s version v1.26.0-rc1+k3s2:

$ kubectl get pods -A
NAMESPACE     NAME                                          READY   STATUS      RESTARTS   AGE
default       frontend-66kln                                1/1     Running     0          33m
default       frontend-7lcnl                                1/1     Running     0          33m
default       frontend-vv48d                                1/1     Running     0          33m
default       frontend-zccdh                                1/1     Running     0          33m
default       nginx-clusterip-deployment-764ff5cd84-4cjrh   1/1     Running     0          46m
default       nginx-clusterip-deployment-764ff5cd84-cfp98   1/1     Running     0          46m
default       nginx-loadbalancer-pod-66955c697f-dpzfz       1/1     Running     0          32m
default       nginx-loadbalancer-pod-66955c697f-mhfx7       1/1     Running     0          32m
default       testingress-d8hvn                             1/1     Running     0          33m
default       testingress-vqlfl                             1/1     Running     0          33m
kube-system   coredns-5c6b6c5476-svpbx                      1/1     Running     0          4h38m

$ kubectl describe pods -A |grep -i sandbox
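
(No output here is the expected result: after restarting k3s on all nodes, no pods report a SandboxChanged event, and the pod list above shows 0 restarts.)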
