Kubelet goes into a cyclic restart loop when an inconsistent container list is received from the runtime service #21
Comments
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community.
/remove-lifecycle stale
Same issue observed on v1.20.4.
We are facing a similar issue on v1.18.19. Kubelet on some nodes goes into a restart loop with the same error. I am going to try the solution above and hope it works for us as well. This issue is blocking our whole release.
/kind bug
Find broken container (
Hit the same issue on Kubernetes 1.21.2. I wonder
We are also experiencing this issue on 1.19.10. The resolution given by @artur9010 works for us.
We created a customized kubelet image as mentioned by @mikkosest and were able to upgrade our kube clusters to 1.18.19; we have not faced this issue in the last 3 months.
Experiencing this issue with Kubernetes 1.21.5 and containerd; the exited container without a corresponding pod is always a calico-node container.
Experiencing the same with Cilium.
We got the same with a lot of containers. In our case, the node had been moved from one cluster to another and then back. I had to stop and remove essentially every container that had previously booted on the node until kubelet finally started. v1.21.4, CNI is Calico.
Hello, this is a mirror repo and is not monitored: https://github.com/kubernetes/kubelet#where-does-it-come-from Please file issues against https://github.com/kubernetes/kubernetes/issues /close
@ehashman: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Kubelet 1.19.3
When a node join is started with kubeadm, a PodSandbox is created for Multus, and the PodSandbox then dies for an unknown reason. The problem is that removal of this PodSandbox container is skipped because its container ID is not found in the pod's container list (ContainerStatus[]) in kubelet.
Later, when the container manager queries runtimeService.ListContainers(nil) and runtimeService.ListPodSandbox(nil) and loops over the containers, one of them still references the dead PodSandbox, which is no longer in the sandbox list returned by the runtime service. Because there is no logic to clean up a reference to a non-existent PodSandbox from a container returned by runtimeService.ListContainers(nil), this leads to a fatal kubelet crash.
kubelet starts to crash in a loop:

```
kubelet[5992]: I1217 11:30:42.639790    5992 kubelet.go:1898] SyncLoop (PLEG): ignore irrelevant event: &pleg.PodLifecycleEvent{ID:"68224015-de33-4879-a229-b8eee8538b89", Type:"ContainerDied", Data:"894f35dca3eda57adef28b69acd0607efdeb34e8814e87e196bc163305576028"}
2020-12-17T09:30:42.640070+00:00 base-image-2 kubelet[5992]: W1217 11:30:42.639799    5992 pod_container_deletor.go:79] Container "894f35dca3eda57adef28b69acd0607efdeb34e8814e87e196bc163305576028" not found in pod's containers
2020-12-17T09:30:43.234857+00:00 base-image-2 kubelet[5992]: I1217 11:30:43.232179    5992 generic.go:155] GenericPLEG: 68224015-de33-4879-a229-b8eee8538b89/894f35dca3eda57adef28b69acd0607efdeb34e8814e87e196bc163305576028: exited -> non-existent
kubelet.go:1325] Failed to start ContainerManager failed to build map of initial containers from runtime: no PodsandBox found with Id '894f35dca3eda57adef28b69acd0607efdeb34e8814e87e196bc163305576028'
```
A workaround is to add a runtimeService.RemoveContainer call for this PodSandbox container in the container manager.
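The workaround above can be sketched as a pruning pass: build the set of live sandbox IDs, then remove any container whose sandbox is missing before the initial containers map is built. This is a minimal, self-contained Go sketch with hypothetical `Container`/`PodSandbox` structs standing in for the real CRI types; it is not the actual kubelet code.

```go
package main

import "fmt"

// Container and PodSandbox are simplified stand-ins for the CRI types
// returned by runtimeService.ListContainers / ListPodSandbox.
type Container struct {
	ID           string
	PodSandboxID string
}

type PodSandbox struct {
	ID string
}

// pruneOrphans splits containers into those whose sandbox still exists
// and orphans referencing a sandbox that is no longer listed. In the
// workaround, each orphan would be passed to runtimeService.RemoveContainer
// before the container manager builds its initial containers map.
func pruneOrphans(containers []Container, sandboxes []PodSandbox) (kept, orphans []Container) {
	live := make(map[string]bool, len(sandboxes))
	for _, s := range sandboxes {
		live[s.ID] = true
	}
	for _, c := range containers {
		if live[c.PodSandboxID] {
			kept = append(kept, c)
		} else {
			orphans = append(orphans, c) // candidate for RemoveContainer
		}
	}
	return kept, orphans
}

func main() {
	containers := []Container{
		{ID: "c1", PodSandboxID: "sb1"},
		{ID: "c2", PodSandboxID: "sb-dead"}, // its sandbox is already gone
	}
	sandboxes := []PodSandbox{{ID: "sb1"}}
	kept, orphans := pruneOrphans(containers, sandboxes)
	fmt.Println(len(kept), len(orphans)) // prints "1 1"
}
```

With this pass in place, the dead Multus sandbox's container would be removed instead of tripping the "no PodSandbox found with Id" fatal error on startup.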