Scale down a deployment by removing specific pods (PodDeletionCost) #2255
Comments
/sig apps
@annajung @JamesLaverack James, you mentioned in the SIG Apps Slack channel that this enhancement is at risk; can you clarify why? It meets the criteria.
@ahg-g Just to follow up here too, we discussed in Slack and this was due to a delay in reviewing. We've now marked this as "Tracked" on the enhancements spreadsheet for 1.21. Thank you for getting back to us. :)
Hi @ahg-g, Since your Enhancement is scheduled to be in 1.21, please keep in mind the important upcoming dates:
As a reminder, please link all of your k/k PR(s) and k/website PR(s) to this issue so we can track them. Thanks!
done.
Hi @ahg-g, the Enhancements team is currently tracking the following PRs. As this PR is merged, can we mark this enhancement complete for code freeze, or do you have other PR(s) being worked on as part of the release?
Hi @JamesLaverack, yes, the k/k code is merged; the docs PR is still open though.
/stage beta
/milestone v1.22
@ahg-g can you clarify if the plan is still to promote the current "Pod Deletion Cost" to GA? Personally, I think we should leave it at Beta for now and consider implementing something like I proposed in #2255 (comment). I am really uncomfortable with marking the current implementation as GA and supporting it forever.
I am not pursuing the graduation to GA this cycle because we need to resolve the concerns posted here.
This still assumes that the information is local; I am not sure that is always the case.
I don't think we can have the ReplicaSet controller calling each pod to get the costs on every scale down (if that is what you meant); this is neither scalable nor a proper design pattern for controllers. I buy Tim and Kal's argument for needing a native way for the pod to declare its cost, but I think we need that to propagate to the pod status for controllers to act on it. I am wondering how that compares to liveness checks; @thockin, do we update the pod status on each liveness check?
If a use-case needs non-local information, the application designer can use some other method to make the application running in the Pod aware of it, and then expose the calculated "cost" in its …
@ahg-g there are two obvious protection methods for this:
I still think storing the "cost" anywhere is a mistake, because the only time the "cost" needs to be correct is just before a Pod is removed. Therefore, it makes very little sense to store this cost anywhere, as it's immediately out of date, irrespective of whether it's in a status OR an annotation. Further, there is going to be some level of overhead to check the "cost" of a Pod, and this goes to waste if we aren't going to scale down the number of replicas in the near future. (This also raises the question of "cost" probes which take a long time, but I think my suggestion above for a timeout addresses this neatly.)
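To make the timeout idea above concrete, here is a minimal sketch in Go. It is not any existing Kubernetes API; the `/cost` endpoint, port 8080, and zero-cost fallback are assumptions for illustration only. A hypothetical controller would query each candidate Pod only when a scale-down is imminent, and an unresponsive Pod simply falls back to the default cost rather than blocking the scale-down:

```go
// Hypothetical sketch of the "cost probe with a timeout" idea above; none of
// this is an existing Kubernetes API. The /cost endpoint, the port, and the
// zero-cost fallback are assumptions for illustration only.
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"strconv"
	"strings"
	"time"
)

// probeCost asks a Pod for its current deletion cost via an assumed HTTP
// endpoint. On timeout or any error it returns 0, so a slow or broken probe
// can never block the scale-down itself.
func probeCost(ctx context.Context, podIP string, timeout time.Duration) int {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	url := fmt.Sprintf("http://%s:8080/cost", podIP)
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return 0
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0 // timed out or unreachable: fall back to the default cost
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return 0
	}
	cost, err := strconv.Atoi(strings.TrimSpace(string(body)))
	if err != nil {
		return 0
	}
	return cost
}

func main() {
	// Example: probe two candidate Pods with a 500ms budget each and pick
	// the one reporting the lowest cost (the deletion itself is omitted).
	for _, ip := range []string{"10.0.0.12", "10.0.0.13"} {
		fmt.Println(ip, probeCost(context.Background(), ip, 500*time.Millisecond))
	}
}
```

A real design would still have to decide who performs the probe (the kubelet or the owning controller), which is exactly the scalability concern raised in the next comment.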
Looks like we are simplifying one case at the expense of the other, but maybe this simplifies the most common case, so I will have to get back to this with a more concrete example.
This doesn't change the asymptotic complexity; the ReplicaSet controller still has to make on the order of one call per pod before each scale down. Having the control plane call into the workloads in general isn't something we consider scalable; @wojtek-t and @liggitt, in case you have an opinion on this.
The most scalable scenario for the ReplicaSet controller is to have the cost set with the pod.
The current approach doesn't suffer from that overhead. The controller that decides when to scale down sets the costs right before issuing a scale down, and it can do that based on metrics exposed by the pods, for example. Granted, it's not quite a built-in approach and requires building a controller, but it doesn't have this problem.
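For concreteness, here is a minimal sketch of that pattern, assuming client-go, a kubeconfig at the default location, and a hypothetical namespace and Pod name. The only Kubernetes-specific piece is the beta `controller.kubernetes.io/pod-deletion-cost` annotation, which the ReplicaSet controller consults when choosing which replica to remove (lower cost is deleted first):

```go
// A minimal sketch, assuming client-go: a custom controller marks the Pods it
// prefers to lose by setting the controller.kubernetes.io/pod-deletion-cost
// annotation just before it shrinks the ReplicaSet, e.g. based on metrics the
// pods expose. Namespace and Pod name below are placeholders.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// setDeletionCost patches a single Pod with the given deletion cost. Pods
// with a lower cost are preferred for deletion when the ReplicaSet scales
// down. The annotation value must be an integer encoded as a string.
func setDeletionCost(ctx context.Context, cs kubernetes.Interface, ns, pod string, cost int) error {
	patch := []byte(fmt.Sprintf(
		`{"metadata":{"annotations":{"controller.kubernetes.io/pod-deletion-cost":"%d"}}}`, cost))
	_, err := cs.CoreV1().Pods(ns).Patch(ctx, pod, types.MergePatchType, patch, metav1.PatchOptions{})
	return err
}

func main() {
	// Assumes a kubeconfig at the default location.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(config)

	// Mark "worker-abc" as the cheapest Pod to delete, then trigger the
	// scale-down (e.g. by lowering the Deployment's replica count).
	if err := setDeletionCost(context.Background(), cs, "default", "worker-abc", -100); err != nil {
		panic(err)
	}
}
```

The same effect can be had ad hoc with `kubectl annotate pod <name> controller.kubernetes.io/pod-deletion-cost=-100 --overwrite` right before issuing the scale-down.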
@ahg-g (and others watching), I have just written up a pretty comprehensive proposal in kubernetes/kubernetes#107598 that proposes an extension to the … Within that proposal, I have fixed your issues with my "cost probe" idea from #2255 (comment) in two ways:
cc @liggitt @khenidak @thockin @JamesLaverack @SwarajShekhar @reylejano
Hi all, I have now raised KEP-3189, which takes lots of the discussion from my proposal in kubernetes/kubernetes#107598 and simplifies it into a single change: extend Deployments/ReplicaSets with a …
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
Hey all watching! After thinking more about how we can make …, I still need to write up a full proposal and KEP, but my initial thoughts can be found at … The gist of the idea is that we can make …
Many thanks for all your work and reflections on this subject. I'm strongly interested in the ability to choose which pods are evicted during scale-in, and I try to follow the corresponding discussions, feature developments, and proposals. I searched for a long time for a way to achieve this correctly; I was happy with PodDeletionCost, but now I am a little disappointed, as it seems it will stay in beta (please do not remove this feature until an equivalent one is released). My need (which may be different from yours) is to selectively evict or replace terminated pods, in order to keep a dynamic number of fresh pod replicas without terminating pods that are still running (I mean pods whose applications are currently processing something). I may be wrong, but I think the root cause of the problem is the incompatibility between the automatic pod restart and scale-in features. Without PodDeletionCost, one known workaround is to:
For me, this workaround speaks to the incompatibility between the ReplicaSet and scale-in features when selecting pods to be evicted: currently they cannot work when mixed together. Also, I think no controller should terminate a pod; it should be the application inside the pod that terminates, causing its pod to terminate, and then a controller could evict only already-terminated pods. Here is my proposal:
With these behaviors, scale-in would select pods to be evicted based on the termination status of the applications inside the pods (here Succeeded or Failed) instead of external indicators. If this proposal is acceptable and can work, it may be achievable with minimal coding effort. What do you think?
Maybe my need is different because I need to automatically replace or delete terminated pods. I think there are two cases to distinguish during scale-in: the ability to remove terminated pods from the ReplicaSet (without replacing them, which implies a ReplicaSet restartPolicy other than Always), and the ability to remove running pods (using the probe).
/milestone clear
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
@ahg-g I'm not in love with annotations as APIs. Do we REALLY think this is the best answer?
I think we have a reasonable counter-proposal in kubernetes/kubernetes#107598 (comment); can we hold this in its current beta state until that proposal makes progress?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
Enhancement Description
- KEP (k/enhancements) update PR(s): https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/2255-pod-cost
- Code (k/k) update PR(s): Implements pod deletion cost kubernetes#99163
- Docs (k/website) update PR(s): ReplicaSet pod-deletion-cost annotation website#26739
- KEP (k/enhancements) update PR(s): Promote PodDeletionCost to Beta #2619
- Code (k/k) update PR(s): Graduate PodDeletionCost to Beta kubernetes#101080, Integration test for pod deletion cost feature kubernetes#101003
- Docs (k/website) update(s): PodDeletionCost to Beta website#28417

Please keep this description up to date. This will help the Enhancement Team to track the evolution of the enhancement efficiently.