Auto-delete failed Pods #99986
Comments
/sig node
/cc
Additional consideration here: have different thresholds for different failure reasons. Pods terminated because of graceful node shutdown will likely be less interesting for troubleshooting than pods that failed on their own. See #102820
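To make the per-reason idea concrete, here is a minimal client-go sketch that buckets failed pods by status reason. The `Terminated` reason string is an assumption: the exact reason set by graceful node shutdown has varied across kubelet versions, so verify it against your cluster before relying on it.

```go
// Minimal sketch: list failed pods and bucket them by status reason, so that
// shutdown victims and "real" failures could get different thresholds.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// Field selectors on status.phase are supported for pods, so the server
	// only returns pods that are already in the Failed phase.
	pods, err := client.CoreV1().Pods(metav1.NamespaceAll).List(context.TODO(),
		metav1.ListOptions{FieldSelector: "status.phase=Failed"})
	if err != nil {
		panic(err)
	}
	for _, pod := range pods.Items {
		switch pod.Status.Reason {
		case "Terminated": // assumed graceful-node-shutdown reason; verify per version
			fmt.Printf("shutdown victim: %s/%s\n", pod.Namespace, pod.Name)
		default:
			fmt.Printf("other failure (%q): %s/%s\n", pod.Status.Reason, pod.Namespace, pod.Name)
		}
	}
}
```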
/triage accepted
/remove-kind support
I don't think this is a support request; it's a feature request for making this threshold configurable.
/kind feature
The threshold is actually configurable (the kube-controller-manager --terminated-pod-gc-threshold flag). The problem is that setting it low potentially makes Jobs unusable.
Not sure if this is the best place, but we're being affected by this topic and I'd like to add some context. GKE users making use of preemptible nodes, which have a lifetime of ~24 hours, are affected by the change in graceful node shutdown behavior introduced in 1.20.5-gke.500: pods scheduled on preemptible nodes that have been shut down do not get deleted. As issue #102820 notes, I was indeed confused by this behavior, since the pods and nodes are working as intended, yet the pods are considered "failed". This is also causing such nodes themselves to not get deleted, and I've seen new pods get scheduled onto them as well.
I'm in the same boat as @schellj, besides the …
Note that if we deliver a solution just for shutdown pods in Kubernetes, it would arrive in 1.24 at the earliest (it's a new feature and needs to go through the KEP process). By then, the Job API will be fixed and we can already lower the threshold.
@alculquicondor Understood. At least in my mind, having a lower gc threshold doesn't entirely solve the issue with shutdown pods, as those are pods that have behaved as intended and shouldn't be considered failed and kept around in the first place.
cc @bobbypage to answer why the pods are not simply deleted.
This statement is questionable. We just don't know in some cases. If it was a Job that hasn't finished yet, it is failed, even with clean termination. If the pod failed to terminate cleanly, we also want to keep information about that pod around. It is clear that pods terminated via graceful termination are "happier" than ones that crashed on their own. But it is clearly not 100% expected behavior in the general sense. That said, better cleanup rules might well be beneficial here.
The fixed Job controller will consider any pod deletion a failure, even after the pod is completely removed from the API.
Thanks @schellj for the feedback. We are looking into the behavior for handling pods on shutdown. In the original design of the graceful node shutdown KEP it was decided not to explicitly delete pods, but rather to put them into the Failed phase. This followed the pattern of kubelet evictions and also made it possible for users to see why their pods were terminated. If they were deleted, pods could appear to vanish suddenly, without any explanation, which could also be confusing. We are evaluating whether this makes sense as long-term behavior.
I'm not super clear what you're referring to here... what mechanism are you suggesting was used to delete pods prior to 1.20.5?
@bobbypage I'm not entirely sure, but I'm guessing that on GKE prior to 1.20.5-gke.500, there was a different process that would cordon and drain their preemptible nodes.
Having a terminated-pod-gc-expiration configuration option in conjunction with the terminated-pod-gc-threshold would work for us: we could still debug the failed pods, but they would go away after some time.
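For what it's worth, a rough sketch of what such an expiration could look like if implemented client-side today. The function names, the ttl parameter, and the use of the last container finish time as the pod's "finished" timestamp are all illustrative choices, not an existing API:

```go
// Sketch of a client-side stand-in for the proposed
// terminated-pod-gc-expiration: delete failed pods whose containers finished
// more than ttl ago, keeping recent failures around for debugging.
package podgc

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func cleanupExpiredFailedPods(ctx context.Context, client kubernetes.Interface, ttl time.Duration) error {
	pods, err := client.CoreV1().Pods(metav1.NamespaceAll).List(ctx,
		metav1.ListOptions{FieldSelector: "status.phase=Failed"})
	if err != nil {
		return err
	}
	for _, pod := range pods.Items {
		finished := podFinishTime(&pod)
		if finished.IsZero() || time.Since(finished) < ttl {
			continue // no finish time recorded, or still fresh enough to keep
		}
		if err := client.CoreV1().Pods(pod.Namespace).Delete(ctx, pod.Name, metav1.DeleteOptions{}); err != nil {
			return err
		}
	}
	return nil
}

// podFinishTime returns the latest container termination time recorded in the
// pod status, or the zero time if none is recorded.
func podFinishTime(pod *corev1.Pod) time.Time {
	var latest time.Time
	for _, cs := range pod.Status.ContainerStatuses {
		if t := cs.State.Terminated; t != nil && t.FinishedAt.Time.After(latest) {
			latest = t.FinishedAt.Time
		}
	}
	return latest
}
```

A real controller would also paginate the list and rate-limit deletions, but the shape is the same.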
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules: after 90 days of inactivity it applies lifecycle/stale; after 30 further days of inactivity it applies lifecycle/rotten; after 30 further days it closes the issue. You can mark this issue as fresh with /remove-lifecycle stale. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

/remove-lifecycle stale
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. You can mark this issue as fresh with /remove-lifecycle rotten. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot closes rotten issues after a further 30 days of inactivity. Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
/reopen |
@pcj: You can't reopen an issue/PR unless you authored it or you are a collaborator.
/reopen |
@zorgzerg: You can't reopen an issue/PR unless you authored it or you are a collaborator.
Overview
Currently, the only mechanism to auto-delete failed pods is garbage collection. However, the default threshold is incredibly high (12500 [1]), so in practice it is useless for most customers.
There are circumstances where pods become failed due to a k8s bug. I have seen two cases so far: "Predicate NodeAffinity failed" pods [2] and OutOfCPU pods [3].
So my question is whether we want to make some components auto-delete failed pods, such as some controllers or even the kubelet.
Additional Context
Updates
04/12/2021
A user has to implement a "watcher" to periodically detect and delete failed pods. That is bad, because k8s should be able to take on that responsibility.
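As a concrete illustration of that workaround, here is a bare-bones version of such a watcher using client-go. Deleting every failed pod immediately is just the simplest possible policy; in practice you would filter by failure reason or age as discussed above, and a production version would use an informer rather than a raw watch so it survives watch expiry.

```go
// Bare-bones sketch of the watcher users currently have to run themselves:
// watch the cluster for pods entering the Failed phase and delete them.
package main

import (
	"context"
	"log"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// The field selector restricts the watch to pods already in the Failed phase.
	w, err := client.CoreV1().Pods(metav1.NamespaceAll).Watch(context.TODO(),
		metav1.ListOptions{FieldSelector: "status.phase=Failed"})
	if err != nil {
		log.Fatal(err)
	}
	defer w.Stop()

	for event := range w.ResultChan() {
		pod, ok := event.Object.(*corev1.Pod)
		if !ok {
			continue
		}
		// Simplest possible policy: delete every failed pod on sight.
		err := client.CoreV1().Pods(pod.Namespace).Delete(context.TODO(), pod.Name, metav1.DeleteOptions{})
		if err != nil {
			log.Printf("deleting %s/%s: %v", pod.Namespace, pod.Name, err)
		}
	}
}
```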