Slow CNI cmdDel processing causes infra container to be deleted prematurely #89440

Closed
rajatchopra opened this issue Mar 24, 2020 · 7 comments · May be fixed by #89541

Comments

@rajatchopra (Contributor) commented Mar 24, 2020

What happened: When a pod is deleted, the kubelet calls cmdDel on multus, which in turn calls the sriov-cni plugin for each of the pod's interfaces. This process takes time, and in the middle of it the infra container gets killed and the network namespace gets deleted.
Because the network namespace is removed before cmdDel is done, proper cleanup does not happen (in sriov's case, the VF is not renamed back and addresses are leaked).

What you expected to happen: The network namespace should continue to exist until the CNI plugin has finished processing the cmdDel command, i.e. do not kill the infra container out of band.

How to reproduce it (as minimally and precisely as possible):
Take a CNI plugin (sriov-cni in our case) and modify its DEL handler to sleep for a long interval (e.g. time.Sleep(10 * time.Second)). Observe that the infra container gets killed before cmdDel completes; a minimal sketch follows.
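
A minimal sketch of the reproduction, assuming the standard github.com/containernetworking/cni plugin skeleton; this is not the actual sriov-cni code, and in practice the sleep would be added to sriov-cni's real cmdDel. The ADD/CHECK handlers are placeholders so the sketch compiles as a standalone plugin:

```go
// Reproduction sketch (not sriov-cni): a CNI plugin whose DEL handler
// stalls for 10 seconds so the premature sandbox teardown can be observed.
package main

import (
	"time"

	"github.com/containernetworking/cni/pkg/skel"
	"github.com/containernetworking/cni/pkg/types"
	current "github.com/containernetworking/cni/pkg/types/current"
	"github.com/containernetworking/cni/pkg/version"
)

func cmdAdd(args *skel.CmdArgs) error {
	// Placeholder ADD: return an empty result so the pod can start.
	return types.PrintResult(&current.Result{}, current.ImplementedSpecVersion)
}

func cmdCheck(args *skel.CmdArgs) error {
	return nil
}

func cmdDel(args *skel.CmdArgs) error {
	// Artificial delay: while this sleeps, `crictl ps -a` (or `docker ps -a`)
	// shows the pod's pause container being removed before DEL has finished.
	time.Sleep(10 * time.Second)

	// The real cleanup (VF rename, IPAM release, ...) would run here, but by
	// now args.Netns may already have been deleted.
	return nil
}

func main() {
	skel.PluginMain(cmdAdd, cmdCheck, cmdDel, version.All, "cmdDel delay reproduction sketch")
}
```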

Anything else we need to know?:
This likely comes from this code block: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet_pods.go#L1041
For pods whose containers are dead but whose infra container is still alive because CNI is still working on it, the pod cleanup GC kicks in and kills the infra container too.

Suggestion: when filtering out pods based on termination status, also apply a grace period. Likely here: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet_pods.go#L987
The grace period should be longer than the periodic interval at which the GC kicks in; see the sketch below.
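
A rough sketch of that suggestion, not the actual kubelet code; the names podCleanupGracePeriod, isTerminated, terminatedAt and filterTerminatedPods are all hypothetical:

```go
// Hypothetical sketch: terminated pods only become eligible for cleanup
// once they have been terminated longer than a grace period that exceeds
// the cleanup loop's tick interval, so a sandbox whose CNI DEL is still
// in flight is not garbage collected mid-way.
package podcleanup

import (
	"time"

	v1 "k8s.io/api/core/v1"
)

// Assumed value: anything comfortably larger than the periodic interval
// at which the pod cleanup / GC loop runs.
const podCleanupGracePeriod = 30 * time.Second

// isTerminated reports whether the pod has reached a terminal phase.
func isTerminated(pod *v1.Pod) bool {
	return pod.Status.Phase == v1.PodSucceeded || pod.Status.Phase == v1.PodFailed
}

// terminatedAt approximates when the pod finished, using the latest
// container termination timestamp.
func terminatedAt(pod *v1.Pod) time.Time {
	var latest time.Time
	for _, cs := range pod.Status.ContainerStatuses {
		if cs.State.Terminated != nil && cs.State.Terminated.FinishedAt.Time.After(latest) {
			latest = cs.State.Terminated.FinishedAt.Time
		}
	}
	return latest
}

// filterTerminatedPods returns only pods terminated longer ago than the
// grace period; more recent ones are left for a later pass, giving the
// CNI plugin time to finish cmdDel against the pod's network namespace.
func filterTerminatedPods(pods []*v1.Pod, now time.Time) []*v1.Pod {
	var eligible []*v1.Pod
	for _, pod := range pods {
		if !isTerminated(pod) {
			continue
		}
		if now.Sub(terminatedAt(pod)) < podCleanupGracePeriod {
			continue
		}
		eligible = append(eligible, pod)
	}
	return eligible
}
```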

Environment: Master and previous versions

@rajatchopra (Contributor, Author) commented Mar 24, 2020

/sig network

Attention: @dcbw @pmorie (from git blame and general guidance)
cc @blackgold (for investigating the issue)
cc @kubernetes/sig-network-bugs (for help in triaging)

@k8s-ci-robot added sig/network and removed needs-sig labels Mar 24, 2020
@tedyu (Contributor) commented Mar 24, 2020

/sig node

@uablrek (Contributor) commented Mar 25, 2020

Duplicate? #88543

@thockin (Member) commented Apr 2, 2020

It sounds like this is a dup of #88543. Closing in favor of that one.

@thockin closed this Apr 2, 2020
@blackgold commented Apr 2, 2020

@thockin
I think issue #88543 addresses "make sure that all the networking resources are deleted before removing the pod from the apiserver."

This issue addresses "make sure the pause container is alive while CNI is detaching devices from the pause container".

I tested PR #89667 and observed that the pause container is deleted before the CNI plugin can delete all the devices. It does not fix this issue.

@rajatchopra (Contributor, Author) commented Apr 7, 2020

@thockin This is a different issue (sounds similar, I agree).
The other one deals with: containers are deleted and volumes are cleaned up, but CNI may still be working. The desire there is not to remove the pod sandbox from the runtime until CNI is done.

This one is critical: the pod spec's containers are deleted and the infra container gets deleted, but CNI may still be working. The desire here is not to remove the infra container until CNI is done.

This addresses a different problem, but fixing it will automatically address issue #88543 as well.

@kmala Your opinion will be useful here.

@kmala commented Apr 8, 2020

The issue I fixed in #89667 is different from this one. I looked into PR #89541 and it is not the correct fix for my issue, because it introduces a grace period for updating the pod status to the status manager, and after that grace period the pod will be deleted regardless of whether the network resources have been removed.
From the issue description, my understanding is that your issue is caused by the deletion of the network namespace rather than the deletion of the pod sandbox. Based on this comment in dockershim, https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/dockershim/docker_sandbox.go#L252-L257, I assume a CNI plugin should do its best to clean up even when the network namespace is empty, but I am not sure whether that is generalized to all CNI/CRI plugins; a rough sketch of that best-effort behaviour follows.
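
For illustration, a sketch of such a best-effort DEL on the plugin side, assuming the containernetworking ns helper; this is not actual sriov-cni code, and releaseHostSideResources is a hypothetical placeholder:

```go
// Sketch of a DEL handler that tolerates an already-deleted network
// namespace: in-namespace cleanup is skipped when the netns is gone,
// but host-side resources are still released.
package delsketch

import (
	"github.com/containernetworking/cni/pkg/skel"
	"github.com/containernetworking/plugins/pkg/ns"
)

func cmdDel(args *skel.CmdArgs) error {
	if args.Netns != "" {
		if netns, err := ns.GetNS(args.Netns); err == nil {
			defer netns.Close()
			// The netns still exists: do the full in-namespace cleanup
			// (for sriov-cni this is where the VF would be renamed back
			// and moved out of the pod's namespace).
			if err := netns.Do(func(_ ns.NetNS) error {
				return nil // placeholder for the in-namespace steps
			}); err != nil {
				return err
			}
		}
		// If GetNS failed, the namespace is already gone; fall through
		// and clean up whatever can still be cleaned up on the host.
	}

	// Host-side cleanup: release IPAM allocations, reset VF config via
	// the PF, etc. (hypothetical placeholder).
	return releaseHostSideResources(args)
}

func releaseHostSideResources(args *skel.CmdArgs) error {
	return nil
}
```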
