
The new pod is stuck in CrashLoopBackOff #82512

Open · sofat1989 opened this issue Sep 10, 2019 · 8 comments · May be fixed by #82559

@sofat1989 commented Sep 10, 2019

This usually happens with static pods.
We have kube-controller-manager.manifest under /etc/kubernetes/manifest, the directory that kubelet watches.
When we update kube-controller-manager.manifest, the new container starts running, but the pod goes into CrashLoopBackOff because kubelet creates yet another container. Since the running container already occupies the network port, the extra container exits, like this:

Mon Sep  9 03:39:03 -07 2019
f3c76bd7e49cf       567cbdec6059a       3 seconds ago       Exited              kube-controller-manager   3                   77fd9b0b811cd
a6fea08550199       567cbdec6059a       27 seconds ago      Running             kube-controller-manager   0                   77fd9b0b811cd

We found the following clues.

  • From the container-creation perspective:
  1. The outdated kube-controller-manager (container 53ccc516f4743, pod 3f1062a5c9bf9) was terminated at 03:38:35. The new pod/container (77fd9b0b811cd/a6fea08550199) was created at 03:38:35. At 03:38:36, container a6fea08550199 became running, but kubelet created another container, e80be462debdd, which exited because it failed to bind its port.
Mon Sep  9 03:38:34 -07 2019
53ccc516f4743       567cbdec6059a       9 seconds ago       Running             kube-controller-manager   16                  3f1062a5c9bf9
d6b68ccdf1338       567cbdec6059a       39 seconds ago      Exited              kube-controller-manager   15                  3f1062a5c9bf9
5b5f356863dd5       02794955c9c6a       11 days ago         Running             etcd-container            0                   2b6e793eaf51d
2a89bd6d6df5b       02794955c9c6a       11 days ago         Running             etcd-container            0                   2bb63f51bf5a5
Mon Sep  9 03:38:35 -07 2019
a6fea08550199       567cbdec6059a       Less than a second ago   Created             kube-controller-manager   0                   77fd9b0b811cd
53ccc516f4743       567cbdec6059a       10 seconds ago           Exited              kube-controller-manager   16                  3f1062a5c9bf9
5b5f356863dd5       02794955c9c6a       11 days ago              Running             etcd-container            0                   2b6e793eaf51d
2a89bd6d6df5b       02794955c9c6a       11 days ago              Running             etcd-container            0                   2bb63f51bf5a5
Mon Sep  9 03:38:36 -07 2019
e80be462debdd       567cbdec6059a       Less than a second ago   Exited              kube-controller-manager   1                   77fd9b0b811cd
a6fea08550199       567cbdec6059a       1 second ago             Running             kube-controller-manager   0                   77fd9b0b811cd
  • From the kubelet logs:
  1. At 03:38:35.860332, kubelet created container a6fea08550199. Then kubelet did a relist, saw the container status as created, and updated that status in its cache.
Sep 09 03:38:35  kubelet[74646]: I0909 03:38:35.860332   74646 kuberuntime_container.go:124]: startContainer pod kube-controller-manager-kubernetes-master-*, container id a6fea08550199dfa93fdc2715d6216676263146459611624492a9ef67ba28e3
  2. At the same time, because of events received by kubelet (for example, the pod sandbox started), some pod updates are added to the pod worker. These end up in the pod worker's lastUndeliveredWorkUpdate.
Sep 09 03:38:35 kubelet[74646]: I0909 03:38:35.861660   74646 kubelet.go:1937] SyncLoop (PLEG): "kube-controller-manager-kubernetes-master-*_default(6d138ed774827ebf9516febee0985e63)", event: &pleg.PodLifecycleEvent{ID:"6d138ed774827ebf9516febee0985e63", Type:"ContainerStarted", Data:"77fd9b0b811cdfc00174ff0c69bed411385f59156d7a6482dc08d82f3f7890d7"}
Sep 09 03:38:35 kubelet[74646]: I0909 03:38:35.861688   74646 kubelet.go:2144] fesu: HandlePodSyncs, pod kube-controller-manager-*, UID 6d138ed774827ebf9516febee0985e63, type SyncPodSyn
  3. So when the SyncPod from step 1 finishes, the pod worker checks for the new update produced in step 2 and triggers another pod sync.
    https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/pod_workers.go#L193
    (wrapUp -> checkForUpdates -> lastUndeliveredWorkUpdate exists -> trigger new pod sync)

  4. In this round of SyncPod, kubelet reads the container status from the cache, which still says created. In computePodActions, a container in that state is treated as dead, so kubelet decides to create a new container.

    message := fmt.Sprintf("Container %+v is dead, but RestartPolicy says that we should restart it.", container)

Sep 09 03:38:36 kubelet[74646]: I0909 03:38:36.349800   74646 kuberuntime_manager.go:541] Container {Name:kube-controller-manager *} is dead, but RestartPolicy says that we should restart it.
Sep 09 03:38:36 kubelet[74646]: I0909 03:38:36.349829   74646 kuberuntime_manager.go:609] computePodActions got {KillPod:false CreateSandbox:false SandboxID:77fd9b0b811cdfc00174ff0c69bed411385f59156d7a6482dc08d82f3f7890d7 Attempt:0 NextInitContainerToStart:nil ContainersToStart:[0] ContainersToKill:map[]} for pod "kube-controller-manager-kubernetes-master*_default(6d138ed774827ebf9516febee0985e63)"
  5. Kubelet creates a new container, but the previous container a6fea08550199 becomes running and the new container exits. So the kubelet cache now holds two container statuses, like this:
"ContainerStatuses": [
    {
      "CreatedAt": "2019-09-09T04:05:15.327114101-07:00",
      "ExitCode": 1,
      "FinishedAt": "2019-09-09T04:05:15.706714124-07:00",
      "Hash": 4211662407,
      "ID": "containerd://4276675520cebe8592122f85bcaec3622b0a2e0c088ab64f7fd864f54821e46c",
      "Image": "*0.36.37",
      "ImageID": "sha256:567cbdec6059ac93714ccc53e9f555271b4f57d29440e3a7e819e9a12c0b351e",
      "Message": "",
      "Name": "kube-controller-manager",
      "Reason": "Error",
      "RestartCount": 11,
      "StartedAt": "2019-09-09T04:05:15.600147233-07:00",
      "State": "exited"
    },
    {
      "CreatedAt": "2019-09-09T03:38:35.859465638-07:00",
      "ExitCode": 0,
      "FinishedAt": "0001-01-01T00:00:00Z",
      "Hash": 4211662407,
      "ID": "containerd://a6fea08550199dfa93fdc2715d6216676263146459611624492a9ef67ba28e3d",
      "Image": "*release-0.36.37",
      "ImageID": "sha256:567cbdec6059ac93714ccc53e9f555271b4f57d29440e3a7e819e9a12c0b351e",
      "Message": "",
      "Name": "kube-controller-manager",
      "Reason": "",
      "RestartCount": 0,
      "StartedAt": "2019-09-09T03:38:36.344635806-07:00",
      "State": "running"
    }
  ],
  6. The next time kubelet computes the pod actions, it only reads the first of these two statuses. That status is exited, so kubelet recreates the container again and again (the current lookup logic is reconstructed below for reference).
computePodActions --> FindContainerStatusByName

if containerStatus.Name == containerName {
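
For reference, the current FindContainerStatusByName in pkg/kubelet/container/runtime.go (reconstructed from the context lines of the diff quoted in the next comment) simply returns the first matching status, which in the cache above is the exited one:

// FindContainerStatusByName returns container status in the pod status with the given name.
// When there are multiple containers' statuses with the same name, the first match will be returned.
func (podStatus *PodStatus) FindContainerStatusByName(containerName string) *ContainerStatus {
        for _, containerStatus := range podStatus.ContainerStatuses {
                if containerStatus.Name == containerName {
                        return containerStatus
                }
        }
        return nil
}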

For regular pods (not static pods), we ran the same test. After container 73c5497c78028 was running, kubelet got a second chance to update the pod (run syncPod):

Sep 09 18:47:15  kubelet[80044]: I0909 18:47:15.952693   80044 generic.go:147] GenericPLEG: dd21f9e2-d36c-11e9-8ed0-3cfdfe576498/73c5497c78028ad10e32dd5d930c376b62606f966ebc874392c6c52dbcbd88f8: non-existent -> unknown
Sep 09 18:47:16 kubelet[80044]: I0909 18:47:16.956477   80044 generic.go:147] GenericPLEG: dd21f9e2-d36c-11e9-8ed0-3cfdfe576498/73c5497c78028ad10e32dd5d930c376b62606f966ebc874392c6c52dbcbd88f8: unknown -> running
Sep 09 18:47:16 kubelet[80044]: I0909 18:47:16.957794   80044 kubelet.go:1937] SyncLoop (PLEG): "checknode-fesu_default(dd21f9e2-d36c-11e9-8ed0-3cfdfe576498)", event: &pleg.PodLifecycleEvent{ID:"dd21f9e2-d36c-11e9-8ed0-3cfdfe576498", Type:"ContainerStarted", Data:"73c5497c78028ad10e32dd5d930c376b62606f966ebc874392c6c52dbcbd88f8"}
Sep 09 18:48:38 kubelet[80044]: I0909 18:48:38.219579   80044 kuberuntime_manager.go:456] Syncing Pod "checknode_default(dd21f9e2-d36c-11e9-8ed0-3cfdfe576498)":

So I think there is a race condition in container creation, especially when the container does not start quickly enough.
We use containerd v1.2.7.
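
To make the suspected race easier to follow, here is a toy model (plain Go, not kubelet code; all names are made up for illustration) of the ordering described above: the second SyncPod reads the cached "created" state before the relist observes "running", so it decides to restart a container that is actually coming up.

package main

import "fmt"

type state string

const (
	created state = "created"
	running state = "running"
)

// needsRestart mirrors the decision described above: anything that is not
// running is treated as dead and scheduled for restart.
func needsRestart(s state) bool { return s != running }

func main() {
	// Status written to the cache right after startContainer returns.
	cache := created

	// The second SyncPod, triggered by the queued pod update, reads the stale cache.
	if needsRestart(cache) {
		fmt.Println("second SyncPod: container looks dead, creating a new one")
	}

	// Only afterwards does the relist observe the first container as running.
	cache = running
	fmt.Println("relist: first container is actually", cache)
}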

@sofat1989 (Author) commented Sep 10, 2019

/sig node

@k8s-ci-robot added sig/node and removed needs-sig labels Sep 10, 2019

@tedyu (Contributor) commented Sep 10, 2019

How about the following change?

diff --git a/pkg/kubelet/container/runtime.go b/pkg/kubelet/container/runtime.go
index 9699a5b5cf..5d0b1e7327 100644
--- a/pkg/kubelet/container/runtime.go
+++ b/pkg/kubelet/container/runtime.go
@@ -317,12 +317,17 @@ type ContainerStatus struct {
 // FindContainerStatusByName returns container status in the pod status with the given name.
 // When there are multiple containers' statuses with the same name, the first match will be returned.
 func (podStatus *PodStatus) FindContainerStatusByName(containerName string) *ContainerStatus {
+       var containerStat *ContainerStatus
        for _, containerStatus := range podStatus.ContainerStatuses {
                if containerStatus.Name == containerName {
-                       return containerStatus
+                       if containerStat == nil {
+                               containerStat = containerStatus
+                       } else if containerStat.State == ContainerStateExited {
+                               containerStat = containerStatus
+                       }
                }
        }
-       return nil
+       return containerStat
 }

 // Get container status of all the running containers in a pod

We try to find a ContainerStatus whose state is not Exited.
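
To illustrate the effect, here is a standalone sketch (minimal types mirrored locally rather than importing the kubelet packages; container IDs shortened from the issue) showing that, for the two cached statuses above, the patched function would return the running container instead of the exited one:

package main

import "fmt"

type ContainerState string

const (
	ContainerStateRunning ContainerState = "running"
	ContainerStateExited  ContainerState = "exited"
)

type ContainerStatus struct {
	Name  string
	ID    string
	State ContainerState
}

type PodStatus struct {
	ContainerStatuses []*ContainerStatus
}

// FindContainerStatusByName with the proposed change applied: an exited match
// can be overwritten by a later match with the same name.
func (podStatus *PodStatus) FindContainerStatusByName(containerName string) *ContainerStatus {
	var containerStat *ContainerStatus
	for _, containerStatus := range podStatus.ContainerStatuses {
		if containerStatus.Name == containerName {
			if containerStat == nil {
				containerStat = containerStatus
			} else if containerStat.State == ContainerStateExited {
				containerStat = containerStatus
			}
		}
	}
	return containerStat
}

func main() {
	status := &PodStatus{ContainerStatuses: []*ContainerStatus{
		{Name: "kube-controller-manager", ID: "4276675520ceb", State: ContainerStateExited},
		{Name: "kube-controller-manager", ID: "a6fea08550199", State: ContainerStateRunning},
	}}
	// Prints the running container a6fea08550199 instead of the exited one.
	fmt.Println(status.FindContainerStatusByName("kube-controller-manager").ID)
}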

@sofat1989 (Author) commented Sep 10, 2019

@tedyu From the code you changed, I think FindContainerStatusByName would return either the status of the first matching container or an exited one. I think you should change containerStat.State == ContainerStateExited to containerStat.State != ContainerStateExited.

 // When there are multiple containers' statuses with the same name, the first match will be returned.
 func (podStatus *PodStatus) FindContainerStatusByName(containerName string) *ContainerStatus {
+       var containerStat *ContainerStatus
        for _, containerStatus := range podStatus.ContainerStatuses {
                if containerStatus.Name == containerName {
-                       return containerStatus
+                       if containerStat == nil {
+                               containerStat = containerStatus
+                       } else if containerStat.State != ContainerStateExited {
+                               containerStat = containerStatus
+                       }
                }
        }
-       return nil
+       return containerStat
 }

 // Get container status of all the running containers in a pod

By the way, I think we have multiple ways to avoid this.

  1. Try to find the non-exited container when getting the containers' statuses.
    This is just a workaround for our case. There is another case it cannot fix: here kube-controller-manager binds a port, so the second container exits. But most of the time, if the containers don't bind a port, the second container will not exit and we end up with two running containers.

  2. Kill the container, as is done for unknown containers.

  3. Skip recreating the container when it is in the created state (a sketch follows the snippet below).

if containerStatus != nil && containerStatus.State == kubecontainer.ContainerStateUnknown {
					// If container is in unknown state, we don't know whether it
					// is actually running or not, always try killing it before
					// restart to avoid having 2 running instances of the same container.
					changes.ContainersToKill[containerStatus.ID] = containerToKillInfo{
						name:      containerStatus.Name,
						container: &pod.Spec.Containers[idx],
						message: fmt.Sprintf("Container is in %q state, try killing it before restart",
							containerStatus.State),
					}
				}

https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L568
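
For illustration, option 3 might look roughly like the following, placed in the same restart branch of computePodActions before the container is added to ContainersToStart (a hypothetical sketch only, not a tested change):

if containerStatus != nil && containerStatus.State == kubecontainer.ContainerStateCreated {
	// Hypothetical: the container was just created by a previous sync and may
	// still be starting, so skip scheduling another restart for it this round.
	continue
}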

We need to determine which one is better.

@tedyu (Contributor) commented Sep 10, 2019

Looking at my change, I think it should achieve #1.
containerStat is the status to be returned.
When containerStat.State == ContainerStateExited, it can be overwritten by a subsequent status.

@tedyu (Contributor) commented Sep 10, 2019

Inside the loop:

	for _, containerStatus := range podStatus.ContainerStatuses {

We can kill container(s) in the ContainerStateUnknown state, meaning #1 and #2 can be done at the same time.

@tedyu referenced a pull request that will close this issue Sep 11, 2019
@sofat1989 (Author) commented Sep 11, 2019

@tedyu Even if we combine #1 and #2, could it happen that the container is repeatedly created and then killed?

@tedyu (Contributor) commented Sep 11, 2019

commit 9fa1ad2
Author: Yu-Ju Hong yjhong@google.com
Date: Mon Feb 13 21:09:44 2017 -0800

kubelet: handle containers in the "created" state

@yujuhong
Can you shed some light on the handling of the created state?

Thanks

@sofat1989 (Author) commented Sep 14, 2019

@tedyu @yujuhong The created state should not be a normal input to kubelet's syncPod. The root cause is that when kubelet computes the pod actions, it relies on the container status cache updated by relisting, which is out of date at that point. Shall we query the status directly from the container runtime?
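
If we went that way, a hypothetical sketch of refreshing the status before computing actions might look like the following (treat the field and method names as assumptions, not a concrete patch):

// Hypothetical: re-query the runtime for the pod status instead of trusting the
// relist-populated cache when deciding whether to restart containers. The cost
// of the extra runtime call is exactly the trade-off to discuss.
freshStatus, err := kl.containerRuntime.GetPodStatus(pod.UID, pod.Name, pod.Namespace)
if err == nil {
	podStatus = freshStatus
}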
