
Kubelet goes into a cyclic restart loop when an inconsistent container list is received from the runtime service #21

Closed
mikkosest opened this issue Dec 22, 2020 · 14 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@mikkosest

mikkosest commented Dec 22, 2020

Kubelet 1.19.3
When a node join is started with kubeadm, a PodSandbox is created for Multus and then dies for an unknown reason. The problem is that removal of this PodSandbox container is skipped because its container ID is not found in the pod's container list (ContainerStatus[]) in kubelet.

Later, when the container manager queries runtimeService.ListContainers(nil) and runtimeService.ListPodSandbox(nil) and loops over the containers, one of them still references the dead PodSandbox, which is no longer in the sandbox list returned by the runtime service. This leads to a fatal kubelet crash. Because there is no logic to clean up a container whose PodSandbox no longer exists from the list returned by runtimeService.ListContainers(nil), kubelet crashes in a loop:
kubelet[5992]: I1217 11:30:42.639790 5992 kubelet.go:1898] SyncLoop (PLEG): ignore irrelevant event: &pleg.PodLifecycleEvent{ID:"68224015-de33-4879-a229-b8eee8538b89", Type:"ContainerDied", Data:"894f35dca3eda57adef28b69acd0607efdeb34e8814e87e196bc163305576028"}
2020-12-17T09:30:42.640070+00:00 base-image-2 kubelet[5992]: W1217 11:30:42.639799 5992 pod_container_deletor.go:79] Container "894f35dca3eda57adef28b69acd0607efdeb34e8814e87e196bc163305576028" not found in pod's containers
2020-12-17T09:30:43.234857+00:00 base-image-2 kubelet[5992]: I1217 11:30:43.232179 5992 generic.go:155] GenericPLEG: 68224015-de33-4879-a229-b8eee8538b89/894f35dca3eda57adef28b69acd0607efdeb34e8814e87e196bc163305576028: exited -> non-existent
kubelet.go:1325] Failed to start ContainerManager failed to build map of initial containers from runtime: no PodsandBox found with Id '894f35dca3eda57adef28b69acd0607efdeb34e8814e87e196bc163305576028'

Workaround: add a runtimeService.RemoveContainer call for this orphaned PodSandbox container in the container manager:

func buildContainerMapFromRuntime(runtimeService internalapi.RuntimeService) (containermap.ContainerMap, error) {
	podSandboxMap := make(map[string]string)
	podSandboxList, _ := runtimeService.ListPodSandbox(nil)
	for _, p := range podSandboxList {
		podSandboxMap[p.Id] = p.Metadata.Uid
	}

	containerMap := containermap.NewContainerMap()
	containerList, _ := runtimeService.ListContainers(nil)
	for _, c := range containerList {
		if _, exists := podSandboxMap[c.PodSandboxId]; !exists {
			// Workaround (added line): remove the orphaned container so the
			// dangling sandbox reference is gone on the next kubelet restart.
			runtimeService.RemoveContainer(c.Id)
			return nil, fmt.Errorf("no PodsandBox found with Id '%s'", c.PodSandboxId)
		}
		containerMap.Add(podSandboxMap[c.PodSandboxId], c.Metadata.Name, c.Id)
	}

	return containerMap, nil
} 
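
For comparison, a minimal sketch of an alternative approach (not part of the patch above, and not necessarily what upstream does): skip the orphaned container and log a warning instead of removing it or failing, so that ContainerManager can still start. The packages used (internalapi, containermap, klog) are assumed to match kubelet 1.19's cm package, as in the snippet above.

func buildContainerMapFromRuntimeSkippingOrphans(runtimeService internalapi.RuntimeService) (containermap.ContainerMap, error) {
	podSandboxMap := make(map[string]string)
	podSandboxList, _ := runtimeService.ListPodSandbox(nil)
	for _, p := range podSandboxList {
		podSandboxMap[p.Id] = p.Metadata.Uid
	}

	containerMap := containermap.NewContainerMap()
	containerList, _ := runtimeService.ListContainers(nil)
	for _, c := range containerList {
		if _, exists := podSandboxMap[c.PodSandboxId]; !exists {
			// The sandbox is gone; skip this container instead of failing
			// ContainerManager startup for the whole node.
			klog.Warningf("no PodSandbox found with Id '%s', skipping container '%s'", c.PodSandboxId, c.Id)
			continue
		}
		containerMap.Add(podSandboxMap[c.PodSandboxId], c.Metadata.Name, c.Id)
	}

	return containerMap, nil
}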
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 23, 2021
@tmmorin

tmmorin commented Apr 12, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 12, 2021
@ansilh

ansilh commented Jun 9, 2021

Same issue observed on v1.20.4.

@amitsingla

We are facing a similar issue on v1.18.19. Kubelet on some nodes goes into a restart loop with the same error. I am going to try the above solution and hope it works for us as well. This issue is blocking our whole release.

@ialidzhikov
Contributor

/kind bug
/sig node

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. sig/node Categorizes an issue or PR as relevant to SIG Node. labels Jun 22, 2021
@artur9010

Find the broken container with docker ps -a --filter "label=io.kubernetes.sandbox.id=894f35dca3eda57adef28b69acd0607efdeb34e8814e87e196bc163305576028" (the ID from the error message), remove it with docker rm <ID>, then restart kubelet.
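
Not part of this thread, but for anyone scripting that cleanup: a sketch of the same steps via the Docker Engine API Go client, assuming docker is the container runtime and using the sandbox ID from the error message above.

package main

import (
	"context"
	"fmt"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/api/types/filters"
	"github.com/docker/docker/client"
)

func main() {
	ctx := context.Background()

	// Talk to the local docker daemon using the usual environment settings.
	cli, err := client.NewClientWithOpts(client.FromEnv)
	if err != nil {
		panic(err)
	}

	// Sandbox ID taken from the kubelet error message.
	sandboxID := "894f35dca3eda57adef28b69acd0607efdeb34e8814e87e196bc163305576028"

	// Equivalent of: docker ps -a --filter "label=io.kubernetes.sandbox.id=<id>"
	containers, err := cli.ContainerList(ctx, types.ContainerListOptions{
		All:     true,
		Filters: filters.NewArgs(filters.Arg("label", "io.kubernetes.sandbox.id="+sandboxID)),
	})
	if err != nil {
		panic(err)
	}

	// Equivalent of: docker rm -f <id>; restart kubelet afterwards.
	for _, c := range containers {
		if err := cli.ContainerRemove(ctx, c.ID, types.ContainerRemoveOptions{Force: true}); err != nil {
			panic(err)
		}
		fmt.Println("removed", c.ID)
	}
}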

@YanzhaoLi
Member

Hit the same issue on Kubernetes 1.21.2. I wonder:

  • Is it expected that the PodSandbox died for an unknown reason?
  • Why is kubelet not aware of the dead PodSandbox?

@hjkatz

hjkatz commented Sep 21, 2021

We are also experiencing this issue on 1.19.10. The resolution given by @artur9010 works for us.

@amitsingla

amitsingla commented Sep 21, 2021

We created a customized kubelet image with the patch mentioned by @mikkosest, were able to upgrade our clusters to 1.18.19, and have not faced this issue in the last 3 months.

@fungusakafungus

Experiencing this issue with Kubernetes 1.21.5 and containerd; the exited container without a corresponding pod is always a calico-node container.

@TobiasDeBruijn

> Experiencing this issue with kubernetes 1.21.5 and containerd, the exited container without a corresponding pod is always a calico-node container

Experiencing the same with Cilium.

@ADustyOldMuffin

ADustyOldMuffin commented Nov 12, 2021

We got the same with a lot of containers. In our case, a node had been moved from one cluster to another and then back. I had to stop and remove essentially every container previously started on the node before kubelet finally booted.

Version 1.21.4, and the CNI is Calico.

@ehashman
Member

Hello,

This is a mirror repo and is not monitored: https://github.com/kubernetes/kubelet#where-does-it-come-from

Please file issues against https://github.com/kubernetes/kubernetes/issues

/close

@k8s-ci-robot
Contributor

@ehashman: Closing this issue.

In response to this:

Hello,

This is a mirror repo and is not monitored: https://github.com/kubernetes/kubelet#where-does-it-come-from

Please file issues against https://github.com/kubernetes/kubernetes/issues

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
