Bump containerd to v1.6.14-k3s1 to fix issue with pod network info not being written to metadata store #6692

Closed · brandond opened this issue Jan 4, 2023 · 1 comment
Labels: area/cni, kind/bug (Something isn't working)
Milestone: v1.26.0+k3s2

brandond (Contributor) commented Jan 4, 2023

Containerd >= v1.6.9 is affected by an issue that prevents pod network info from being written to the containerd metadata store, causing CNI info to be lost when containerd is restarted. This in turn causes the kubelet to restart all non-host-network pods whenever k3s is restarted.

We need to update to containerd v1.6.14-k3s1, which includes the fix backported in containerd/containerd#7845.
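
For triage, one way to confirm which containerd build a node is actually running (standard kubectl/k3s tooling; the exact output format may vary between versions):

```
# The CONTAINER-RUNTIME column reports e.g. containerd://1.6.14-k3s1
kubectl get nodes -o wide

# Or ask the embedded runtime directly through k3s's bundled crictl
sudo k3s crictl version
```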

@brandond brandond added this to the v1.26.0+k3s2 milestone Jan 4, 2023
@brandond brandond self-assigned this Jan 4, 2023
@brandond brandond added this to To Triage in Development [DEPRECATED] via automation Jan 4, 2023
@brandond brandond moved this from To Triage to Peer Review in Development [DEPRECATED] Jan 4, 2023
@brandond brandond changed the title Bump k3s to fix containerd CNI issue Bump containerd to v1.6.14-k3s1 to fix issue with pod network info not being written to metadata store Jan 4, 2023
@brandond brandond moved this from Peer Review to To Test in Development [DEPRECATED] Jan 6, 2023
ShylajaDevadiga (Contributor) commented:
Validated on k3s version v1.26.0-rc1+k3s2, containerd version containerd://1.6.14-k3s1

Environment Details

Infrastructure
Cloud EC2 instance

Node(s) CPU architecture, OS, and Version:
Ubuntu 20.04

Cluster Configuration:
Multinode: 3 servers and 1 agent, with etcd backend

Config.yaml:

Steps to reproduce the issue

  1. Create a cluster
  2. Deploy workloads
  3. Restart the k3s service on all server and agent nodes (see the sketch after this list)
  4. Observe that the pods have restarted and the pod Events show "Pod sandbox changed, it will be killed and re-created."
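
For step 3, a minimal sketch on a default systemd-based install (the k3s and k3s-agent unit names are the ones created by the standard install script; adjust if your setup differs):

```
# On each server node
sudo systemctl restart k3s

# On each agent node
sudo systemctl restart k3s-agent

# Then check whether any pods were recreated with a new sandbox
kubectl get pods -A
kubectl describe pods -A | grep -i sandbox
```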

Results from reproducing the issue on k3s version v1.26.0+k3s1:

$ kubectl get pods -A
NAMESPACE     NAME                                          READY   STATUS      RESTARTS        AGE
default       frontend-5hqks                                1/1     Running     1 (5m36s ago)   6m22s
default       frontend-6974x                                1/1     Running     1 (5m46s ago)   6m22s
default       frontend-fvkdk                                1/1     Running     1 (5m43s ago)   6m22s
default       frontend-lpmlx                                1/1     Running     1 (5m43s ago)   6m22s
default       nginx-clusterip-deployment-764ff5cd84-j4chn   1/1     Running     1 (5m36s ago)   7m30s
default       nginx-clusterip-deployment-764ff5cd84-vppnb   1/1     Running     1 (5m46s ago)   7m30s
default       nginx-loadbalancer-pod-66955c697f-l4hwk       1/1     Running     1 (5m37s ago)   6m57s
default       nginx-loadbalancer-pod-66955c697f-rpnk8       1/1     Running     1 (5m43s ago)   6m57s
kube-system   coredns-5c6b6c5476-49www                      1/1     Running     1 (5m43s ago)   13m
kube-system   helm-install-traefik-42hp2                    0/1     Completed   1               13m

$ kubectl describe pod frontend-6974x |tail -12
  ----    ------          ----  ----               -------
  Normal  Scheduled       11m   default-scheduler  Successfully assigned default/frontend-6974x to ip-172-31-9-126
  Normal  Pulling         11m   kubelet            Pulling image "nginx"
  Normal  Pulled          11m   kubelet            Successfully pulled image "nginx" in 3.142336675s (3.142345134s including waiting)
  Normal  Created         11m   kubelet            Created container webserver
  Normal  Started         11m   kubelet            Started container webserver
  Normal  SandboxChanged  11m   kubelet            Pod sandbox changed, it will be killed and re-created.
  Normal  Killing         11m   kubelet            Stopping container webserver

Results from validation on k3s version v1.26.0-rc1+k3s2:

$ kubectl get pods -A
NAMESPACE     NAME                                          READY   STATUS      RESTARTS   AGE
default       frontend-66kln                                1/1     Running     0          33m
default       frontend-7lcnl                                1/1     Running     0          33m
default       frontend-vv48d                                1/1     Running     0          33m
default       frontend-zccdh                                1/1     Running     0          33m
default       nginx-clusterip-deployment-764ff5cd84-4cjrh   1/1     Running     0          46m
default       nginx-clusterip-deployment-764ff5cd84-cfp98   1/1     Running     0          46m
default       nginx-loadbalancer-pod-66955c697f-dpzfz       1/1     Running     0          32m
default       nginx-loadbalancer-pod-66955c697f-mhfx7       1/1     Running     0          32m
default       testingress-d8hvn                             1/1     Running     0          33m
default       testingress-vqlfl                             1/1     Running     0          33m
kube-system   coredns-5c6b6c5476-svpbx                      1/1     Running     0          4h38m

$ kubectl describe pods -A |grep -i sandbox
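
(No output here is the expected result: after restarting k3s on all nodes, no pods report a SandboxChanged event, and the pod list above shows 0 restarts.)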
