Add a TTL for Pods on workloads other than Jobs #122187
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the `triage/accepted` label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/sig node
/cc
I think Deployments, DaemonSets, and orphan pods could benefit. CronJob uses the Job template, and I know you can use the Job's `ttlSecondsAfterFinished` there.
WDYT about this? I am also a bit confused on who owns the ttlafterfinished controller. Is that sig-apps? I know that since this would require a new pod field, sig-node should also be involved.
I had a brief chat with @kannon92 and I have capacity to help with this issue.
I think we should float this idea and see if there is interest in it.
Well, I'm interested in it :).
I'm confused how deployments/daemonsets can benefit from this; do they ever complete?
I also would like to understand the use case better, given that most Pods are long-running unless they belong to Jobs.
The case where I see users getting into this is during node shutdowns, draining nodes, etc. You are correct that this shouldn't impact users. For example, #122122 (comment) is one area where this could benefit users. Another one from the Slack link:
In that case, it might be preferred to make it a GC setting, rather than a per-pod API.
True, I see that this change requires admin rights and doesn't really allow one to tune based on their workloads. But I don't know if it's pressing.
https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/statefulset/stateful_set_control.go#L386-L400
The main issue for this is a design decision for Set-based app workloads. StatefulSet and DaemonSet require that there are no duplicate pods. Deployment could allow this, as there is no uniqueness guarantee for the pod name.
I see.
We do not have completed pods in Deployments, as we enforce `restartPolicy: Always`. In general, the feature might be useful for other workloads (custom controllers, or plain pods) that leave their pods behind.
I'd like this for plain pods as well; currently I'd have to use a Job (which makes it more complex to wait for the pod to be ready to stream the logs) to ensure a completed pod gets cleaned up.
And are you using bare Pods (no Deployment or anything else)? Why not use a Job for your workload?
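To make the request concrete, here is a purely hypothetical sketch of what a pod-level TTL could look like if it mirrored the Job API. No such field exists in the Pod spec today; `ttlSecondsAfterFinished` on a Pod is an assumption for illustration only.

```yaml
# HYPOTHETICAL sketch: a pod-level TTL field modeled on the Job API.
# The ttlSecondsAfterFinished field below does NOT exist in the Pod spec today.
apiVersion: v1
kind: Pod
metadata:
  name: one-shot
spec:
  ttlSecondsAfterFinished: 300  # hypothetical: delete the pod 5 minutes after it finishes
  restartPolicy: Never
  containers:
    - name: task
      image: busybox
      command: ["sh", "-c", "echo done"]
```

Something like this would let a bare pod clean itself up without wrapping it in a Job just for the TTL behavior.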
This will be very helpful in the following scenario too:
ref: #116376
I have a specific use case to address: Our current setup uses Jupyter Enterprise Gateway on Kubernetes, initiating kernels within plain pods. This system enables our data scientists to efficiently carry out their tasks on these kernels. The Jupyter Enterprise Gateway comes equipped with a culling mechanism designed to remove idle kernels after a default inactive period of 3600 seconds. Unfortunately, we have encountered issues (mostly network-related) during the culling process. Specifically, the kernel gets deleted, but the associated pod persists. When this happens, those orphan pods are hard to detect, and may last forever in the cluster. I know that this scenario is somewhat uncommon, but it would greatly enhance our system if Kubernetes could introduce a designated type of pod that undergoes GC after a specified period of inactivity, particularly in terms of network activity.
Very useful suggestion by @edwardzjl. We should try this GC configuration out. Thanks.
Can we get this triaged and start work on it?
I suggest you attend a SIG Apps meeting to present a proposal. |
What would you like to be added?
For batch users, they are able to specify a `ttlSecondsAfterFinished` for Pods of a Job. This means that when the Job is complete, its pods will be GC'd after the specified time. In the TTL-KEP, there was a mention of adapting TTLAfterFinished for Pods. The future-work section has some details on what would be needed to extend the TTLAfterFinished controller to other pods.
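For reference, the existing Job-level knob looks like this (a minimal sketch; the name, image, and TTL value are illustrative):

```yaml
# A Job using the existing ttlSecondsAfterFinished field: once the Job
# finishes (Complete or Failed), the TTL-after-finished controller deletes
# the Job and, through cascading deletion, its pods.
apiVersion: batch/v1
kind: Job
metadata:
  name: cleanup-demo
spec:
  ttlSecondsAfterFinished: 100  # delete 100 seconds after the Job finishes
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: main
          image: busybox
          command: ["sh", "-c", "echo done"]
```

This issue asks for the same kind of per-workload TTL on objects other than Jobs.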
The controller that handles this is located here.
Why is this needed?
Generally, PodGC is handled at the cluster level (for objects other than Jobs), and there have been some requests to set this on certain workloads. It would be nice to have a way for Pods or other sig-apps workloads to be GC'd once they are complete. When pods are terminated, they are left behind and are only GC'd via `--terminated-pod-gc-threshold`. One issue with cluster-level settings is that not all users have access to set and tune them for their workloads. See aws/containers-roadmap#1544 for an example. https://kubernetes.slack.com/archives/C0BP8PW9G/p1701683843554669 is another example.
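For context, the cluster-level knob mentioned above is a kube-controller-manager flag. A sketch of how it might appear in a static pod manifest (the image tag and threshold value are illustrative, not from this issue):

```yaml
# Excerpt of a kube-controller-manager static pod spec. The flag caps how
# many terminated pods may linger before PodGC deletes the oldest ones.
spec:
  containers:
    - name: kube-controller-manager
      image: registry.k8s.io/kube-controller-manager:v1.29.0  # illustrative tag
      command:
        - kube-controller-manager
        - --terminated-pod-gc-threshold=500  # illustrative; the default is 12500
```

Because this lives in the control plane, only cluster admins can change it, which is exactly the access problem described above.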