Slow CNI cmdDel processing causes infra container to be deleted prematurely #89440
/sig network Attention: @dcbw @pmorie (from git blame and general guidance)
/sig node
Duplicate? #88543
It sounds like this is a dup of #88543. Closing in favor of that one.
@thockin This issue is about "make sure the pause container is alive while CNI is detaching devices from the pause container". I tested PR #89667 and I observe that the pause container is still deleted before CNI can delete all the devices. It does not fix this issue.
@thockin This is a different issue (it sounds similar, I agree). This one is critical: the pod spec's containers are deleted and the infra container gets deleted, but CNI may still be working. The desire is to not remove the infra container until CNI is done. Fixing this addresses a different problem but will automatically address issue #88543. @kmala Your opinion will be useful here.
The issue I fixed in #89667 is different from this one. I also looked into PR #89541, and it is not the correct fix for my issue, because it introduces a grace period for reporting the pod's status to the status manager; after that grace period the pod will be deleted regardless of whether the network resources have been removed.
/reopen
@BSWANG: You can't reopen an issue/PR unless you authored it or you are a collaborator. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/reopen
ping @BSWANG, any details?
/cc |
What happened: When a pod is deleted, the kubelet calls cmdDel on multus, which in turn calls the sriov-cni plugin once per interface. This process takes time, and partway through it the infra container gets killed and the network namespace is deleted.
Because the network namespace is removed before cmdDel finishes, proper cleanup does not happen (in sriov-cni's case, the interface renaming is not done and addresses are leaked).
What you expected to happen: The network namespace should continue to exist until the CNI plugin is done with the cmdDel command, i.e. the infra container should not be killed out of band.
How to reproduce it (as minimally and precisely as possible):
Take a CNI plugin (sriov-cni in our case), modify its DEL command to add a time.Sleep with a large duration (10 seconds?), and observe that the infra container is killed before cmdDel completes.
Anything else we need to know?:
This likely belongs to this code block: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet_pods.go#L1041
For pods whose containers are dead but whose infra container is still alive because CNI is working on it, the pod-cleanup GC kicks in and kills the infra container too.
Suggestion: the filtering out of pods based on termination status should also apply a grace period, likely here: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet_pods.go#L987
The grace period should be longer than the GC's periodic kick-in interval.
Environment: Master and previous versions