grace-period=0 --force Forced deletion failed #113717
/sig node

I have doubts that these commands are preemptive, i.e. that the second call will stop the process initiated by the first.
yeah, sorry, I didn't read it in detail :). Indeed, in 1.22 there was a big refactor and cleanup of the kubelet, and some release notes were amended regarding the pod lifecycle (#107348). This has some consequences like the ones you are highlighting. SIG Node owns this area, and I know they have been addressing some issues related to your question lately.
It is true that there are many similar problems, but so far I have not seen the development team address them, nor have I seen a PR that fixes this.
yeah, maybe this is not supported. I was just giving you context so you can check whether this is a duplicate or a known issue; beyond that you have to wait for SIG Node to triage this issue ...
Okay, thank you very much
on what version? 1.22 is almost out of support, and earlier versions are already out of support
/triage needs-information
I was on version 1.22, where the behavior changed. This feature is not supported from 1.22 onward; 1.19, 1.20, and 1.21 support it. I also looked at the code for version 1.23, and it is not supported there either.
/remove-triage needs-information

I understand the problem now. This seems to be a change of behavior starting in 1.22. The documentation at https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination-forced warns that force deletion on a node may take some time. Are there any implications of this behavior beyond the prolonged life of a deleted pod? Would it be possible for you to try to schedule something on that node during the termination? Does it cause a pod admission failure?
Yes, a lot of changes were made to the pod_workers.go file after version 1.22. I also compared the code from version 1.19 through 1.25. When forcing deletion (grace period 0), the apiserver immediately deletes the pod object from etcd, but the kubelet does not honor a request with a grace period of 0: if the grace period is 0, the default value (usually 30s) is used as the time to interact with the CRI. That's why the forced deletion fails.
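The fallback described above can be sketched in a few lines of Go. This is an illustrative sketch only, not the actual kubelet code; `effectiveGracePeriod`, its parameters, and the values are hypothetical names standing in for the behavior described in the comment.

```go
package main

import "fmt"

// effectiveGracePeriod sketches the reported behavior (hypothetical helper,
// not kubelet source): a force delete requests a grace period of 0, but the
// kubelet falls back to the pod's default grace period (usually 30s from
// spec.terminationGracePeriodSeconds) instead of killing immediately.
func effectiveGracePeriod(requested, specDefault int64) int64 {
	if requested <= 0 {
		return specDefault // the 0 from --force is effectively ignored
	}
	return requested
}

func main() {
	fmt.Println(effectiveGracePeriod(0, 30)) // force delete: still waits 30s
	fmt.Println(effectiveGracePeriod(1, 30)) // grace-period=1: honored
}
```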
/assign @Seaiii since you are working on it |
There is some confusion in this bug: Terminating a pod with grace period 0 does not mean "stop immediately" - it means "force delete the pod, and expect that all other parts of the system will try to clean the pod up as quickly as possible". However, because you remove the pod object from the API, there is nothing left to watch to see when the pod is actually gone. Based on your description, if you want the pod to be deleted as quickly as possible and to know when the pod is done, you should:
Asking for a grace period of one means "delete as fast as possible"; the kubelet reserves the right not to send SIGKILL immediately (we send TERM and then KILL a second later). The kubelet has never issued SIGKILL to a process without some wait (even during eviction).

When we first implemented grace period deletion, we sent requests asynchronously to Docker, so about 7 years ago the logic was correct: if you sent 30, then 1, it would shorten the wait and try to kill right away. However, in the transition to CRI we lost that behavior because we moved to a synchronous event loop (for good reasons). At the current time, the kubelet is unable to shorten the duration of a grace period, for a number of reasons. We're currently working on improving that, but it is dependent on #107826, which would allow us to cancel wait loops cleanly with a context. #107826 requires some additional review but is the key blocker. Once we have that, we'll gradually work through the remaining blockers until grace periods behave as expected.

But for now, use grace-period=1 rather than 0: 0 is the same as 1, except you have no idea when the pod is done. No one should force delete pods unless they meet the criteria of "the node is dead and not coming back".
/assign |
You are correct in this interpretation. |
As far as I know, this is a potential security issue (not sure if it needs a CVE number): the user sets a very long grace period and then force deletes with --grace-period=0 to remove the pod on the apiserver side. In this case, the pod is removed from the cluster admin's view, but the kubelet will not actually remove it until the grace period ends. If I do this multiple times, there will be many such pods still running on the node while invisible in the API.
@SergeyKanzhelev Hi, the current 1.29 version still doesn't fix this bug; should we move forward with it?
What happened?
A case where forced deletion does not stop the container! (Example: the container is in a `sleep 9999` state.) I first run `kubectl delete pod --grace-period=30` and then `kubectl delete pod --grace-period=0 --force`. The second, forced delete has no effect on the kubelet.

According to the latest documentation, setting gracePeriod to 0 together with force triggers the kubelet to start cleaning up immediately. But looking at the source code, I found that when gracePeriod=0 and force are set, the kubelet triggers the handler.HandlePodRemoves function and then reaches the kl.deletePod function, which is correct. In the actual processing, however, the UpdatePod function is used (I looked at several versions of the code before 1.22, and they all use the deletePod function). When entering the update, there is no check for gracePeriod=0 with force; everything is handled as a normal pod update. As a result, the second delete with gracePeriod=0 is treated as a normal update and stored in the lastUndeliveredWorkUpdate cache. And because the first delete needs to interact with the CRI and wait for it to return, the forced deletion is only processed after the first 30s termination completes. This matches neither the documented behavior nor the intended code logic; there should be a code path handling forced deletion when gracePeriod=0.

Here is the source code flow:
1. kubelet.go: handler.HandlePodRemoves(u.Pods)
2. kubelet.go: kl.deletePod
3. kubelet.go: kl.podWorkers.UpdatePod()
4. pod_workers.go

The code that follows is where the gracePeriod is waited out when killing the container. I looked at the code in version 1.19 and found that after HandlePodRemoves is triggered, there are two steps: 1. close the pod update channel and remove the lastUndeliveredWorkUpdate entry; 2. start a goroutine that calls the CRI to remove the container. The code after version 1.22 no longer performs these steps, i.e. it abandons forced deletion with gracePeriod=0 and force. This should at least be noted in the documentation, but I did not see it there. I changed the source code locally and tested that force deletion works. Is this a bug, and should I open a pull request?
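The queuing behavior described above can be sketched with a toy per-pod worker. This is an illustrative model, not the real `pod_workers.go`: the channel, the sleep standing in for the CRI call, and the update strings are all hypothetical. It shows why a force delete queued behind an in-flight graceful delete is only seen after the first one finishes.

```go
package main

import (
	"fmt"
	"time"
)

// podWorker models a serialized per-pod worker loop (sketch only, not the
// kubelet's pod_workers.go): updates are processed strictly in order, so a
// force delete queued while a graceful delete is still waiting on the CRI
// cannot preempt it.
func podWorker(updates <-chan string, done chan<- string) {
	for u := range updates {
		time.Sleep(50 * time.Millisecond) // stands in for the CRI stop wait
		done <- u
	}
}

func main() {
	updates := make(chan string, 2)
	done := make(chan string, 2)
	go podWorker(updates, done)

	updates <- "delete grace=30" // first, slow graceful termination
	updates <- "delete grace=0"  // force delete: merely queued behind it
	close(updates)

	fmt.Println(<-done) // prints "delete grace=30"
	fmt.Println(<-done) // prints "delete grace=0"
}
```

In the pre-1.22 behavior described above, the removal path instead closed the update channel and killed the container in a separate goroutine, which is what made the force delete take effect immediately.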
What did you expect to happen?
After the first delete with grace-period=30, the second delete with grace-period=0 should take effect immediately.
As it stands, the apiserver deletes the pod object, but the kubelet does not delete the container.
How can we reproduce it (as minimally and precisely as possible)?
Start a container:
1. kubectl run name --image=busybox -- sh -c "sleep 99999"
2. kubectl delete pod name --grace-period=30
3. kubectl delete pod name --grace-period=0 --force
The command in step 3 will not take effect: the pod disappears from the apiserver, but the container keeps running until the first grace period expires.
Anything else we need to know?
no
Kubernetes version
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)