
grace-period=0 --force Forced deletion failed #113717

Open
Seaiii opened this issue Nov 8, 2022 · 18 comments · May be fixed by #113883
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. sig/node Categorizes an issue or PR as relevant to SIG Node. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments


Seaiii commented Nov 8, 2022

What happened?


There is a case where forced deletion does not stop the pod (example: the container is running sleep 9999). I first ran kubectl delete pod --grace-period=30, and then kubectl delete pod --grace-period=0 --force. The second, forced delete has no effect on the kubelet.

According to the latest documentation, a grace period of 0 combined with --force should trigger the kubelet to start cleaning up immediately. Looking at the source code, I found that when gracePeriod=0 and force are set, the kubelet triggers handler.HandlePodRemoves and then calls kl.deletePod, which is correct. In the actual processing, however, the request goes through UpdatePod (in every version I checked before 1.22, the deletePod path is used). Inside the update there is no check for gracePeriod=0 with force; everything is handled as a normal pod update. As a result, the second request with gracePeriod=0 is treated as an ordinary update and stored in the lastUndeliveredWorkUpdate cache. Meanwhile, the first delete has to interact with the CRI and wait for its result, so the forced deletion is only processed after the first 30-second termination completes. This matches neither the documented behavior nor the intent of the code; there should be dedicated handling for gracePeriod=0 with force. Here is the source code flow:

1. kubelet.go: handler.HandlePodRemoves(u.Pods)


2. kubelet.go: kl.deletePod


3. kubelet.go: kl.podWorkers.UpdatePod()


4. pod_workers.go


The pod_workers.go code in step 4 is where the grace period is waited out when killing the container. Looking at the 1.19 code, after HandlePodRemoves is triggered there are two steps: 1. close the pod update channel and remove the entry from lastUndeliveredWorkUpdate; 2. start a goroutine that calls the CRI to remove the container. The code from 1.22 onward no longer performs these steps, which effectively abandons forced deletion with gracePeriod=0. If that is intended, it should be noted in the documentation, but I did not see it there. I changed the source code locally and tested that forced deletion works again. Is this a bug, and should I open a pull request?
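
To make the sequencing concrete, here is a minimal, self-contained Go sketch of the behavior described above. It is not the kubelet source, just a toy model assuming one synchronous worker per pod whose in-flight termination cannot be shortened by a later update:

package main

import (
	"fmt"
	"time"
)

// update stands in for a pod update delivered to the per-pod worker.
type update struct {
	gracePeriod time.Duration
	force       bool
}

func main() {
	// One pending slot, playing the role of the lastUndeliveredWorkUpdate cache.
	updates := make(chan update, 1)
	done := make(chan struct{})

	// The worker drains updates strictly in order; a later update cannot
	// shorten a termination that is already in flight.
	go func() {
		for u := range updates {
			fmt.Printf("terminating with grace period %v (force=%v)\n", u.gracePeriod, u.force)
			time.Sleep(u.gracePeriod) // stands in for the synchronous CRI call
			fmt.Println("termination finished")
		}
		close(done)
	}()

	updates <- update{gracePeriod: 3 * time.Second} // like --grace-period=30, scaled down
	time.Sleep(500 * time.Millisecond)
	updates <- update{gracePeriod: 0, force: true} // like --grace-period=0 --force
	// The forced request just sits in the pending slot until the first
	// termination returns, so it has no effect on how long the container runs.
	close(updates)
	<-done
}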

What did you expect to happen?

After the first delete with grace-period=30, the second delete with grace-period=0 should terminate the pod immediately.
What happens instead is that the apiserver deletes the pod object, but the kubelet does not delete the container until the original grace period has elapsed.

How can we reproduce it (as minimally and precisely as possible)?

Start a container:
1. docker run name sh -c "sleep 99999" (a pod whose container just sleeps)
2. kubectl delete pod name --grace-period=30
3. kubectl delete pod name --grace-period=0 --force
Step 3 will not take effect on the kubelet; the container keeps running until the grace period from step 2 expires.

Anything else we need to know?

no

Kubernetes version

$ kubectl version 1.22.2
# paste output here

Cloud provider

null

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@Seaiii Seaiii added the kind/bug Categorizes issue or PR as related to a bug. label Nov 8, 2022
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 8, 2022

aojea commented Nov 8, 2022

/sig node

I have doubts that these commands are preemptive, i.e. that the second call will stop the process initiated by the first:

2.kubectl delete pod name --grace-period=30
3.kubectl delete pod name --grace-period=0 --force

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 8, 2022

Seaiii commented Nov 8, 2022

I read the source code; it won't. The requests are executed sequentially: when the second request comes in, the kubelet checks whether the first is still being processed, and if it is, the second request is simply stored in a local cache. Only after the first request finishes does the kubelet fetch the second one from the cache and execute it, but by then the pod and container have already been killed, so it is no longer needed.
According to the code in older versions, --grace-period=0 --force was handled specially and the forced stop was executed immediately, but that is no longer the case after 1.22.
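
As a rough sketch of that older special case (the function names here are invented for illustration and are not the kubelet's API):

package main

import "fmt"

// Invented helpers for illustration only; they are not kubelet functions.
func stopImmediately() {
	fmt.Println("drop the pending update and kill the container via the CRI right away")
}

func enqueueUpdate(gracePeriodSeconds int64) {
	fmt.Printf("queue an update with a %ds grace period behind the in-flight termination\n", gracePeriodSeconds)
}

// handleDelete sketches the branch the issue says existed before 1.22:
// a forced delete with grace period 0 bypasses the normal update queue
// instead of waiting for the current termination to finish.
func handleDelete(gracePeriodSeconds int64, force bool) {
	if force && gracePeriodSeconds == 0 {
		stopImmediately()
		return
	}
	enqueueUpdate(gracePeriodSeconds)
}

func main() {
	handleDelete(30, false) // first delete: normal graceful termination
	handleDelete(0, true)   // second delete: handled immediately in the pre-1.22 logic
}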


aojea commented Nov 8, 2022

yeah, sorry I didn't read it in detail :).

Indeed, in 1.22 there was a big refactor and cleanup of the kubelet, and some release notes were amended regarding the pod lifecycle: #107348

This has some consequences like the ones you are highlighting. SIG Node owns this area, and I know they have been addressing some issues related to your question lately:

#113145


Seaiii commented Nov 9, 2022

It is true that there are many similar problems, but so far I have not seen the development team address them, and I have not seen a PR that fixes this.


aojea commented Nov 9, 2022

It is true that there are many similar problems, but so far I have not seen the development team address them, and I have not seen a PR that fixes this.

yeah, maybe this is not supported. I was just giving you context so you can check whether this is a duplicate or a known issue; after that you have to wait for SIG Node to triage this issue ...


Seaiii commented Nov 9, 2022

It is true that there are many similar problems, but so far I have not seen the development team address them, and I have not seen a PR that fixes this.

Okay, thank you very much

@SergeyKanzhelev
Member

I changed the source code locally and tested that forced deletion works again. Is this a bug, and should I open a pull request?

On what version? 1.22 is almost out of support, and earlier versions are already out of support.

@SergeyKanzhelev
Member

/triage needs-information

@k8s-ci-robot k8s-ci-robot added the triage/needs-information Indicates an issue needs more information in order to work on it. label Nov 9, 2022

Seaiii commented Nov 10, 2022

I changed the source code locally and tested that forced deletion works again. Is this a bug, and should I open a pull request?

On what version? 1.22 is almost out of support, and earlier versions are already out of support.

I made the change on version 1.22. This behavior is not supported from 1.22 onward; 1.19, 1.20, and 1.21 support it. I also looked at the code for 1.23, and it is not supported there either.

@SergeyKanzhelev
Member

/remove-triage needs-information
/triage accepted
/priority important-longterm

I understand the problem now. This seems to be a change of behavior starting in 1.22. The documentation at https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination-forced warns that force deletion on the node may take some time.

Are there any implications of this behavior beyond the prolonged life of a deleted pod? Would it be possible for you to try to schedule something on that node during the termination? Does it cause pod admission failure?

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. triage/needs-information Indicates an issue needs more information in order to work on it. labels Dec 9, 2022

Seaiii commented Dec 10, 2022

/remove-triage needs-information /triage accepted /priority important-longterm

I understand the problem now. This seems to be a change of behavior starting in 1.22. The documentation at https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination-forced warns that force deletion on the node may take some time.

Are there any implications of this behavior beyond the prolonged life of a deleted pod? Would it be possible for you to try to schedule something on that node during the termination? Does it cause pod admission failure?

Yes, a lot of changes were made to pod_workers.go after 1.22. I compared the code from 1.19 through 1.25. When forcing deletion (grace period 0), the apiserver immediately deletes the pod object from etcd, but the kubelet does not accept a grace period of 0: if the requested grace period is 0, the default value (usually 30s) is used as the time allowed for the CRI call. That is why the forced deletion fails.
There are no other effects; I read the code in detail. Because the delete has already been issued to the CRI and the pod's cached information has been removed from the kubelet, new pods will be admitted for creation again, but they have to wait for the previous pod to finish being deleted.
This still does not follow the expected logic, though, because the apiserver and the kubelet are no longer in sync at that point.
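
A hypothetical simplification of that grace-period resolution in Go (the function name and parameters are invented for this sketch and are not the kubelet's actual API):

package main

import "fmt"

// effectiveGracePeriodSeconds models the behavior described above: a requested
// value of 0 is not treated as "kill immediately"; instead the pod's own
// terminationGracePeriodSeconds (typically 30) is used for the CRI call.
func effectiveGracePeriodSeconds(requested *int64, specTerminationGracePeriod int64) int64 {
	if requested == nil || *requested == 0 {
		return specTerminationGracePeriod
	}
	return *requested
}

func main() {
	zero := int64(0)
	fmt.Println(effectiveGracePeriodSeconds(&zero, 30)) // prints 30, not 0
}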

@SergeyKanzhelev
Member

/assign @Seaiii since you are working on it


Seaiii commented Dec 15, 2022

/assign @Seaiii since you are working on it
Yes, I can fix it, but it has to wait for PR #113883 to finish, because I don't have permission to open two PRs at the same time.
Please help move that PR forward, thank you very much!

@smarterclayton
Contributor

There is some confusion in this bug:

Terminating a pod with grace period 0 does not mean "stop immediately" - it means "force delete the pod, and expect that all other parts of the system will try to clean the pod up as quickly as possible". However, because you remove the pod, you have nowhere to wait to see when the pod is deleted. Based on your description, if you want the pod to be deleted as quickly as possible and to know when the pod is done, you should:

kubectl delete pod name --grace-period=30
kubectl delete pod name --grace-period=1

Asking for a grace period of one means "delete as fast as possible" - the kubelet reserves the right to not send SIGKILL immediately (we send TERM and then KILL a second later). The Kubelet has never issued SIGKILL to a process without some wait (even during eviction).

When we first implemented grace period deletion, we sent requests asynchronously to docker, so about 7 years ago the logic was correct that if you sent 30, then 1, it would shorten and try to kill right away. However, in the transition to CRI we lost that behavior because we moved to a synchronous event loop (for good reasons).

At the current time, the Kubelet is unable to shorten the duration of a grace period, for a number of reasons. We're currently working on improving that, but it is dependent on:

#107826 - to allow us to cancel wait loops cleanly with a context
#113606 - so that we can cancel existing synchronous logic using the contexts
TODO - the kubelet must track when the first termination request was received and shorten others based on that
TODO - clean up context tracking in the pod worker
TODO - clean up the container runtime to properly handle "immediate" shutdown requests (bypass hooks, probably)

#107826 requires some additional review but is the key blocker. Once we have that, we'll gradually work through the blockers until grace periods behave like expected. But for now, use grace-period=1 rather than 0 - 0 is the same as 1 but you have no idea when the pod is done. No one should force delete pods unless they meet the criteria of "the node is dead and not coming back".
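
For reference, a standalone sketch of the TERM-then-KILL escalation described above (illustrative only, not kubelet code; it assumes a Unix-like system with a sleep binary):

package main

import (
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

func main() {
	cmd := exec.Command("sleep", "99999")
	if err := cmd.Start(); err != nil {
		panic(err)
	}

	grace := 1 * time.Second // "--grace-period=1": as fast as possible, but never instant

	_ = cmd.Process.Signal(syscall.SIGTERM) // polite shutdown request first

	done := make(chan error, 1)
	go func() { done <- cmd.Wait() }()

	select {
	case err := <-done:
		fmt.Println("process exited within the grace period:", err)
	case <-time.After(grace):
		_ = cmd.Process.Kill() // escalate to SIGKILL only after the wait
		fmt.Println("grace period expired, sent SIGKILL:", <-done)
	}
}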

@smarterclayton
Contributor

/assign


Seaiii commented Mar 1, 2023

There is some confusion in this bug:

You are correct in this interpretation.
The --grace-period=1 approach is much more elegant.
It's just that we can't shorten the grace period right now. As soon as that problem is solved, this issue can be closed.


pacoxu commented Apr 6, 2023

As far as I know, this is a potential security issue (not sure whether it needs a CVE number): a user sets a very large grace period and then force-deletes with --grace-period=0, which removes the pod on the apiserver side. In that case the pod disappears from the cluster admin's view, but the kubelet will not remove it until the grace period ends. If I do this multiple times, there will be many containers still running in the background in the cluster with no corresponding pods in my namespace.


Seaiii commented Jan 12, 2024

Are there any implications of this behavior beyond the prolonged life of a deleted pod? Would it be possible for you to try to schedule something on that node during the termination? Does it cause pod admission failure?

@SergeyKanzhelev Hi, the current 1.29 release still doesn't address this bug. Should we move forward with it?
