Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

computePodActions() assumes "created" containers as dead #80005

Open
fahadullah opened this issue Jul 11, 2019 · 3 comments

Comments

Projects
None yet
3 participants
@fahadullah
Copy link

commented Jul 11, 2019

What happened:
Runtime started an already created (later running) container without killing the existing one. I added some extra logging in computePodActions() to log the container statuses below. There were two containers containerA, containerB in the pod.

GetPodStatus() returned containerB as "created" and containerA as "running" which was later used by SyncPod.

[2019-07-10T18:14:27.007210291Z] [DEBG] [github.com/kubernetes-incubator/cri-o/server/container_status.go:20] [455-77444] ContainerStatusRequest &
ContainerStatusRequest{ContainerId:9d354614f0f0fe7d1978c664b37caeaa5ded561cfc4440160c8db9b1684f6d5b,Verbose:false,}


[2019-07-10T18:14:27.040311838Z] [DEBG] [github.com/kubernetes-incubator/cri-o/server/container_status.go:126] [455-77444] ContainerStatusResponse: &ContainerStatusResponse{Status:&ContainerStatus{Id:9d354614f0f0fe7d1978c664b37caeaa5ded561cfc4440160c8db9b1684f6d5b,Metadata:&ContainerMetadata{Name:containerB,Attempt:0,},State:CONTAINER_CREATED,CreatedAt:1562782467008475373,StartedAt:0,FinishedAt:0,ExitCode:0 ... }

...

[2019-07-10T18:14:27.042497744Z] [DEBG] [github.com/kubernetes-incubator/cri-o/server/container_status.go:20] [455-77421] ContainerStatusRequest &ContainerStatusRequest{ContainerId:384900eade0d80664b45694a34240a00f5b309a1cc7a89918038d45da5e103

[2019-07-10T18:14:27.046934132Z] [DEBG] [github.com/kubernetes-incubator/cri-o/server/container_status.go:126] [455-77421] ContainerStatusResponse: &ContainerStatusResponse{Status:&ContainerStatus{Id:384900eade0d80664b45694a34240a00f5b309a1cc7a89918038d45da5e103f9,Metadata:&ContainerMetadata{Name:containerA,Attempt:0,},State:CONTAINER_RUNNING,CreatedAt:1562782353396385066,StartedAt:1562782353445905855,FinishedAt:0,ExitCode:0 ...}

...

[2019-07-10T18:14:27.054452285Z] [DEBG] [k8s.io/kubernetes/pkg/kubelet/pleg/generic.go:367] [455-312] PLEG: Write status for random-podname/somens: &container.PodStatus{ID:"ffb97fc8-a33d-11e9-936b-6e7d8bb0ce00", Name:"random-podname", Namespace:"somens", IP:"172.16.0.15", ContainerStatuses:[]*container.ContainerStatus{(*container.ContainerStatus)(0xc422e07260), (*container.ContainerStatus)(0xc425e180e0)}, ... (err: <nil>) at 2019-07-10 18:14:27.005444627 +0000 UTC m=+281.236666640

...

[2019-07-10T18:14:27.721003566Z] [DEBG] [k8s.io/kubernetes/pkg/kubelet/kuberuntime/kuberuntime_manager.go:440] [455-31723] PodStatus "random-podname_somens(ffb97fc8-a33d-11e9-936b-6e7d8bb0ce00)": &{ID:ffb97fc8-a33d-11e9-936b-6e7d8bb0ce00 Name:random-podname Namespace:somens IP:172.16.0.15 ContainerStatuses:[0xc422e07260 0xc425e180e0] SandboxStatuses:[&PodSandboxStatus{Id:847f1b4fde106f45086bb12e3080da3eb7e10bb270e0a81bfd44b6f06851a469,Metadata:&PodSandboxMetadata{Name:random-podname,Uid:ffb97fc8-a33d-11e9-936b-6e7d8bb0ce00,Namespace:somens,Attempt:0,},State:SANDBOX_READY,CreatedAt:1562782264738529888 ...}

[2019-07-10T18:14:27.721026835Z] [DEBG] [k8s.io/kubernetes/pkg/kubelet/kuberuntime/kuberuntime_manager.go:444] [455-31723] ContainerStatus "random-podname_somens(ffb97fc8-a33d-11e9-936b-6e7d8bb0ce00)": &{ID:{Type:cri-o ID:9d354614f0f0fe7d1978c664b37caeaa5ded561cfc4440160c8db9b1684f6d5b} Name:containerB State:created CreatedAt:2019-07-10 18:14:27.008475373 +0000 UTC StartedAt:0001-01-01 00:00:00 +0000 UTC FinishedAt:0001-01-01 00:00:00 +0000 UTC ExitCode:0 Image:<> Hash:1215756007 RestartCount:0 Reason: Message:}

[2019-07-10T18:14:27.721042818Z] [DEBG] [k8s.io/kubernetes/pkg/kubelet/kuberuntime/kuberuntime_manager.go:444] [455-31723] ContainerStatus "random-podname_somens(ffb97fc8-a33d-11e9-936b-6e7d8bb0ce00)": &{ID:{Type:cri-o ID:384900eade0d80664b45694a34240a00f5b309a1cc7a89918038d45da5e103f9} Name:containerA State:running CreatedAt:2019-07-10 18:12:33.396385066 +0000 UTC StartedAt:2019-07-10 18:12:33.445905855 +0000 UTC FinishedAt:0001-01-01 00:00:00 +0000 UTC ExitCode:0 Image:<> Hash:2937488709 RestartCount:0 Reason: Message:}

[2019-07-10T18:14:27.721174341Z] [INFO] [k8s.io/kubernetes/pkg/kubelet/kuberuntime/kuberuntime_manager.go:528] [455-31723] Container {Name:containerB ...} is dead, but RestartPolicy says that we should restart it.

[2019-07-10T18:14:27.721186876Z] [DEBG] [k8s.io/kubernetes/pkg/kubelet/kuberuntime/kuberuntime_manager.go:596] [455-31723] computePodActions got {KillPod:false CreateSandbox:false SandboxID:847f1b4fde106f45086bb12e3080da3eb7e10bb270e0a81bfd44b6f06851a469 Attempt:0 NextInitContainerToStart:nil ContainersToStart:[1] ContainersToKill:map[]} for pod "random-podname_somens(ffb97fc8-a33d-11e9-936b-6e7d8bb0ce00)"

While the "created" containerB became "running" and a subsequent list showed three running containers,

[2019-07-10T18:14:27.042395926Z] [DEBG] [github.com/kubernetes-incubator/cri-o/server/container_start.go:14] [455-77431] StartContainerRequest &StartContainerRequest{ContainerId:9d354614f0f0fe7d1978c664b37caeaa5ded561cfc4440160c8db9b1684f6d5b,}

[2019-07-10T18:14:27.420343830Z] [DEBG] [github.com/golang/glog/kubelet.go:187] [455-215] Event(v1.ObjectReference{Kind:"Pod", Namespace:"somens", Name:"random-podname", UID:"ffb97fc8-a33d-11e9-936b-6e7d8bb0ce00", APIVersion:"v1", ResourceVersion:"abce0cd0a589cbb9f7ef2fd51ae8801afd1dced2", FieldPath:"spec.containers{containerB}"}): type: 'Normal' reason: 'Started' Started container

...

[2019-07-10T18:14:32.348666847Z] [DEBG] [github.com/kubernetes-incubator/cri-o/server/container_list.go:127] [455-77626] ListContainersResponse: &ListContainersResponse{Containers:[
&Container{Id:65bae21c0a7f80856bf238fabaad144ba7c0cc17ed42d0af192962f56023e7fd,PodSandboxId:847f1b4fde106f45086bb12e3080da3eb7e10bb270e0a81bfd44b6f06851a469,Metadata:&ContainerMetadata{Name:containerB,Attempt:1,},Image:<>:CONTAINER_RUNNING,CreatedAt:1562782471135212589,Labels:map[string]string{io.kubernetes.container.name: containerB,io.kubernetes.pod.name: random-podname,io.kubernetes.pod.namespace: somens,io.kubernetes.pod.uid: ffb97fc8-a33d-11e9-936b-6e7d8bb0ce00,},Annotations:map[string]string{io.kubernetes.container.hash: 4876f6e7,io.kubernetes.container.ports: [{"name":"xxx","containerPort":12106,"protocol":"TCP"}],io.kubernetes.container.restartCount: 1,io.kubernetes.container.terminationMessagePath: /dev/termination-log,io.kubernetes.container.terminationMessagePolicy: File,io.kubernetes.pod.terminationGracePeriod: 5,},}

&Container{Id:9d354614f0f0fe7d1978c664b37caeaa5ded561cfc4440160c8db9b1684f6d5b,PodSandboxId:847f1b4fde106f45086bb12e3080da3eb7e10bb270e0a81bfd44b6f06851a469,Metadata:&ContainerMetadata{Name:containerB,Attempt:0,},Image:<>,State:CONTAINER_RUNNING,CreatedAt:1562782467008475373,Labels:map[string]string{io.kubernetes.container.name: containerB,io.kubernetes.pod.name: random-podname,io.kubernetes.pod.namespace: somens,io.kubernetes.pod.uid: ffb97fc8-a33d-11e9-936b-6e7d8bb0ce00,},Annotations:map[string]string{io.kubernetes.container.hash: 4876f6e7,io.kubernetes.container.ports: [{"name":"xxx","containerPort":12106,"protocol":"TCP"}],io.kubernetes.container.restartCount: 0,io.kubernetes.container.terminationMessagePath: /dev/termination-log,io.kubernetes.container.terminationMessagePolicy: File,io.kubernetes.pod.terminationGracePeriod: 5,},}

&Container{Id:384900eade0d80664b45694a34240a00f5b309a1cc7a89918038d45da5e103f9,PodSandboxId:847f1b4fde106f45086bb12e3080da3eb7e10bb270e0a81bfd44b6f06851a469,Metadata:&ContainerMetadata{Name:containerA,Attempt:0,},Image:<>,State:CONTAINER_RUNNING,CreatedAt:1562782353396385066,Labels:map[string]string{io.kubernetes.container.name: containerA,io.kubernetes.pod.name: random-podname,io.kubernetes.pod.namespace: somens,io.kubernetes.pod.uid: ffb97fc8-a33d-11e9-936b-6e7d8bb0ce00,},Annotations:map[string]string{io.kubernetes.container.hash: af168545,io.kubernetes.container.restartCount: 0,io.kubernetes.container.terminationMessagePath: /dev/termination-log,io.kubernetes.container.terminationMessagePolicy: File,io.kubernetes.pod.terminationGracePeriod: 5,},}],}

What you expected to happen:
I found, #41382, but I don't suppose kuberuntime was going to use that. Not sure of the clear solution here. It seems a little aggressive to kill "created" containers but I am missing how would they get cleaned up if they are never started.

How to reproduce it (as minimally and precisely as possible):
Requires just a pod create to hit this exact race

Anything else we need to know?:
Using k8s/cri-o 1.11.1 with a custom API server.

Environment:

  • Kubernetes version (use kubectl version): 1.11.1
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release): Linux
  • Kernel (e.g. uname -a): 4.9.37
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:
@fahadullah

This comment has been minimized.

Copy link
Author

commented Jul 11, 2019

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node and removed needs-sig labels Jul 11, 2019

@mattjmcnaughton

This comment has been minimized.

Copy link
Contributor

commented Jul 11, 2019

Hi @fahadullah ! Thanks for reporting :)

Currently, 1.11.1 is not a supported k8s release. Would it be possible for you to upgrade to a supported release (i.e. 1.13, 1.14, or 1.15)? I think there's a chance that your issue is already solved in one of those releases, and even if not, it'll be easier for community members to help you debug if you're on a supported release.

Thanks :)

@fahadullah

This comment has been minimized.

Copy link
Author

commented Jul 11, 2019

@mattjmcnaughton Looking through the code in master, behaviour seems unchanged but I can give a try on a newer release. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.