kubectl drain leads to downtime even with a PodDisruptionBudget #48307
@gg7 There are no sig labels on this issue. Please add a sig label.
@kubernetes/sig-cluster-ops-* /area node-lifecycle
@kubernetes/sig-apps-bugs
If you launched your deployment using kubectl run, the PDB selector needs to match the label(s) applied to the pods in order for it to take effect.
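For example, assuming the demo pods carry the label run: my-nginx (which is what kubectl run applied at the time), a matching PDB would look roughly like this:

```yaml
# Rough sketch of a PDB whose selector matches the demo pods' label.
# The run: my-nginx label is an assumption based on how kubectl run
# labelled pods at the time; verify with: kubectl get pods --show-labels
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: my-nginx-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      run: my-nginx
```

kubectl describe pdb my-nginx-pdb reports the current and desired pod counts, which is a quick way to confirm the selector actually matches the pods.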
Single-pod deployments are by definition not HA - k8s can do nothing today about them. Not sure if this is a bug or a feature request.
@foxish Good point, thanks! I've changed the PDB:
I also added [...]. Now:

This was executed 5+ minutes ago and there's still a single my-nginx pod running on the cordoned node.
Kubernetes knows how to deploy single-pod applications with no downtime using a rolling update. I'm not expecting HA in case of a server crash, but I expect [...]
PDB works! :)
I don't think there is any bug here, so we can use this issue as a feature request. I could see having a way to signal deployments to run surge pods and then have PDBs use the new API, but I would like to read more thoughts on this.
If I pass [...]
One thing is being highly available in case of a hardware problem. Yes, in this case if the machine dies you get downtime if you only have one replica. Server hardware failures are rare enough that we can live with a couple of minutes of downtime when they happen. Another thing is planned maintenance on a node: it should be fairly simple to make sure extra pods are started elsewhere before we shut down the node for maintenance. I mean, it's not rocket science, is it...
No assumptions are made regarding availability on single-pod controllers with the API today. What if the extra pod that you want to surge by default violates the quota given to a user? Are we going to block a cluster upgrade on such a scenario? It's unlikely that most admins would like that. Opening a proposal in the community repo of an open source project is not rocket science either :)
Yes, I realise I was a bit condescending in my last remark. Apologies for that.
Well, I would say, at least attempt to create the extra pod; if it fails, it fails, give up after some time, but at least we tried, and it probably will work in most cases.
Issues go stale after 90d of inactivity. Prevent issues from auto-closing with an /lifecycle frozen comment. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/close
Googler stumbling across this with the same question as the OP, and I can't find a workaround. Is there a way to accomplish this? I have a scenario where, due to RAM constraints, dev/test environments can't afford to be fully HA; each service would be fine with a PDB of minAvailable: 1, while the deployment and horizontal autoscaler are set to 1 replica.
We are using a custom bash script that implements an alternative to kubectl drain: https://gist.github.com/juliohm1978/1f24f9259399e1e1edf092f1e2c7b089. Not a perfect solution, but it really helps when most of your deployments are single Pods with a rollout strategy like the following:
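For a single-replica Deployment, the idea is roughly this (a sketch with illustrative names and values, not the script's actual manifest): maxSurge: 1 combined with maxUnavailable: 0 makes a rollout start the replacement pod, and wait for its readiness probe, before the old pod is removed.

```yaml
# Illustrative sketch only: a single-replica Deployment whose rollout
# strategy surges a new pod before the old one is terminated.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app             # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # allow one extra pod during the rollout
      maxUnavailable: 0    # never drop below the desired replica count
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: nginx       # placeholder image
        ports:
        - containerPort: 80
        readinessProbe:    # the old pod is only removed once this passes
          httpGet:
            path: /
            port: 80
```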
I assume this means High Availability; I'm a bit of a noob with this terminology. Wouldn't it be nice to have a kubectl drain --safe where you don't shut down a pod without first having started (and readiness-probed) it on another node? Perhaps this is one of those "this feature contradicts in every way the architecture of the system" requests, like my favorite application feature request: "making an 'impersonate user' feature can't be that hard, can it?" ...
So is it expected that deleting a pod or draining a node causes downtime even if you have a rolling update in place?
Unfortunately, with 1 replica - yes.
Hi, just another random Kubernetes user here. Sad to see that this is still an issue in 2020.
Agreed. The team should admit that a lot of people only have the resources for 1 replica and thus don't reach the threshold for HA. Why can't we cater for them as well?
Unfortunately, the only workaround is to cordon the node and then do a rolling restart of the deployments that have pods running on the node. Once complete, drain the node.
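Roughly, the sequence is something like this (a sketch; node-1 and my-nginx are placeholder names, and kubectl rollout restart needs kubectl 1.15 or newer):

```bash
# Sketch of the cordon -> rolling-restart -> drain workaround.

# 1. Mark the node unschedulable so replacement pods land elsewhere.
kubectl cordon node-1

# 2. Rolling-restart each deployment that has a pod on node-1; with
#    maxSurge >= 1 the new pod comes up on another node before the
#    old one is stopped.
kubectl rollout restart deployment/my-nginx
kubectl rollout status deployment/my-nginx

# 3. Only then evict whatever is left on the node.
kubectl drain node-1 --ignore-daemonsets --delete-local-data
```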
A flag to [...]
Any update on this? This strongly affects GPU workloads, which often cannot run with HA due to cost.
I can see the difficulty for both the kubernetes & autoscaler projects to implement this in a spec-consistent way.

In the meantime, we have created a little cronjob in k8s that does some hacky bash scripting to automate the otherwise-manual safe-drain script that @juliohm1978 posted above (https://gist.github.com/juliohm1978/fcfd21b26f9431c01978). I've put together a rough gist of our workaround here: https://gist.github.com/DominicWatson/76e393e04e9c65439c3eff948d19e25a

This is running in our staging cluster, where we have a big need for autoscaling down. As we evolve it and make it more sophisticated, I'll try to update the gist. Feedback and improvements welcome!
/kind bug
What happened:
I ran a demo application:
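(Something along these lines; the exact command is an assumption, based on kubectl run creating a Deployment whose pods are labelled run: my-nginx in this Kubernetes release:)

```bash
# Assumed reconstruction of the demo app: a single-replica nginx
# Deployment exposed on port 80. In this era, kubectl run created a
# Deployment and labelled its pods run=my-nginx.
kubectl run my-nginx --image=nginx --replicas=1 --port=80
```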
Then I defined a PodDisruptionBudget:

Then I executed kubectl drain --force --ignore-daemonsets --delete-local-data NODE-1. (I used --force because NODE-1 is a master node.)

I monitored the pods with: while true; do date; kubectl get pods -o wide; sleep 1; done
Output:
What you expected to happen:
I expected Kubernetes to:

1. Start a new my-nginx pod on another node
2. Wait for it to become ready
3a. Update the service to send traffic to the new pod
3b. Terminate the pod on the node that's being drained
I think the PodDisruptionBudget didn't have any effect. I ran another test without it and I ended up with a single, unready pod. I believe that happened because pulling the nginx image took longer on the third machine.
Anyway, it should be possible for me to drain master/worker nodes without downtime without [...] (minAvailable: 1).

If the issue is from using --force then administrators need a better way of draining master nodes.

How to reproduce it (as minimally and precisely as possible):
See above, it's 3 commands.
Environment:

Kernel (uname -a): 4.4.0-*