Scale down may cause downtime #40304

Closed
caarlos0 opened this issue Jan 23, 2017 · 11 comments
@caarlos0
Contributor

caarlos0 commented Jan 23, 2017

I created a service ops with 10 replicas, and a strategy.rollingUpdate.maxUnavailable = 1:

$ kubectl scale deployments ops --replicas 10
deployment "ops" scaled

$ kubectl get pods
NAME                  READY     STATUS    RESTARTS   AGE
ops-886507735-4h9ad   2/2       Running   0          9s
ops-886507735-9ng1p   2/2       Running   0          29s
ops-886507735-evawq   2/2       Running   0          44s
ops-886507735-ffjhx   2/2       Running   0          9s
ops-886507735-k6mmo   2/2       Running   0          9s
ops-886507735-m2nvl   2/2       Running   0          9s
ops-886507735-pke96   2/2       Running   0          9s
ops-886507735-veazd   2/2       Running   0          9s
ops-886507735-visci   2/2       Running   0          9s
ops-886507735-zdp26   2/2       Running   0          9s
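
For reference, a Deployment roughly matching this setup could look like the sketch below. This is a minimal, hypothetical manifest (the apiVersion, labels, maxSurge value, and image tag are assumptions for illustration; the actual pods in this report run two containers, not one):

$ kubectl apply -f - <<EOF
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: ops
spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # at most one replica may be unavailable during a rollout
      maxSurge: 1
  template:
    metadata:
      labels:
        app: ops
    spec:
      containers:
      - name: ops
        image: user/ops:good   # placeholder tag for a working image
EOF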

If I deploy something wrong (let's suppose a bad Docker image that won't come up, or a tag that doesn't exist), some of the replicas will still be running:

$ kubectl set image deployment/ops ops=user/ops:nope
deployment "ops" image updated

$ kubectl get pods
NAME                   READY     STATUS             RESTARTS   AGE
ops-2122582723-10aep   1/2       ImagePullBackOff   0          31s
ops-2122582723-tyvqq   1/2       ImagePullBackOff   0          31s
ops-886507735-4h9ad    2/2       Running            0          1m
ops-886507735-9ng1p    2/2       Running            0          1m
ops-886507735-evawq    2/2       Running            0          1m
ops-886507735-ffjhx    2/2       Running            0          1m
ops-886507735-m2nvl    2/2       Running            0          1m
ops-886507735-pke96    2/2       Running            0          1m
ops-886507735-veazd    2/2       Running            0          1m
ops-886507735-visci    2/2       Running            0          1m
ops-886507735-zdp26    2/2       Running            0          1m

Now, if I scale down (for some reason) to 2 replicas, for example, what happens is:

$ kubectl scale deployments ops --replicas 2
deployment "ops" scaled

$ kubectl get pods
NAME                   READY     STATUS             RESTARTS   AGE
ops-2122582723-tyvqq   1/2       ImagePullBackOff   0          2m
ops-2122582723-yuruf   1/2       ErrImagePull       0          33s
ops-886507735-evawq    2/2       Running            0          4m

Which is OK; the service is still up.

Now, if I scale down to 1 replica, things get ugly:

$ kubectl scale deployments ops --replicas 1
deployment "ops" scaled

$ kubectl get pods
NAME                   READY     STATUS             RESTARTS   AGE
ops-2122582723-tyvqq   1/2       ImagePullBackOff   0          3m
ops-886507735-evawq    2/2       Terminating        0          4m

Downtime!

Why doesn't Kubernetes keep the container that was working instead of killing it and trying to launch a new one?

Is there a way of auto-rolling back when this kind of thing happens (or, even better, preventing it)?

@0xmichalis
Contributor

@caarlos0 this is how the initial pass on proportional scaling was implemented. Also, when scaling down, we always try to remove from old replica sets first. It's definitely desirable to enhance the deployment controller to scale down broken pods first.

@0xmichalis 0xmichalis self-assigned this Jan 23, 2017
@0xmichalis
Contributor

Is there a way of auto-rolling back when this kind of things happen (or even better, prevent it)?

Autorollback (#23211) is yet to be implemented, but in 1.5 you can use progressDeadlineSeconds to identify stuck deployments.

https://kubernetes.io/docs/user-guide/deployments/#deployment-status
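
As a rough sketch of how that flow could look (600 is just an example deadline in seconds, not a recommended value):

$ kubectl patch deployment ops -p '{"spec":{"progressDeadlineSeconds":600}}'

# rollout status watches the rollout and returns an error once the deadline is exceeded
$ kubectl rollout status deployment/ops

# the Progressing condition in the Deployment status reports why the rollout stalled
$ kubectl describe deployment ops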

@0xmichalis 0xmichalis added this to the 1.6 milestone Jan 23, 2017
@0xmichalis 0xmichalis added area/workload-api/deployment, sig/apps, kind/enhancement and removed sig/apps labels Jan 23, 2017
@caarlos0
Contributor Author

Got it, thanks @Kargakis 👍

@0xmichalis
Contributor

@kubernetes/sig-apps-misc

@0xmichalis
Contributor

@caarlos0 one suggestion for now: since it's hard to act on perma-failed errors (e.g. somebody may not care about ImagePullBackOff and expect the image to land at some point in the future), if you are going to scale down manually, first make sure that your Deployment is healthy. In this case you should roll back with kubectl rollout undo before scaling down. Eventually, we should make sure that scaling down removes broken pods first, because you may be using an autoscaler.
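
Put together, the suggested manual workflow is roughly this (a sketch built from the commands already used in this thread):

# 1. check whether the latest rollout actually succeeded
$ kubectl rollout status deployment/ops

# 2. if it is stuck, roll back to the previous working revision first
$ kubectl rollout undo deployment/ops

# 3. only then scale down
$ kubectl scale deployments ops --replicas 2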

@0xmichalis
Contributor

Anybody from @kubernetes/sig-apps-misc have time to take a stab at this one? Basically, we should clean up unhealthy pods before estimating proportions when we scale down in scale().

@caarlos0
Contributor Author

caarlos0 commented Feb 2, 2017

If someone points me in the right direction, I can try to tackle this...

@0xmichalis
Contributor

Ok, I just realized that trying to clean up the new replica set will do no good. The system always tries to deploy the latest replica set, so having one part of the controller scale down the new replica set (cleanup) and another part scale it up (the strategy) would drive the controller into hotlooping. That being said, I think this is not an issue: we provide you with ways/tools to diagnose failures (d.spec.progressDeadlineSeconds) and roll back (kubectl rollout undo).

@caarlos0
Contributor Author

caarlos0 commented Feb 2, 2017

@Kargakis maybe just fail then? Show some error message saying that it is not possible to scale down because there are no healthy instances in the new version, or something like that?

Of course, the user can check before scaling down... this would just be some kind of safeguard...

@0xmichalis
Contributor

We cannot special-case the operation; otherwise we may drive autoscalers into hotloops. There is no reason not to roll back in this case, other than when you expect the new image to become available at some point in the future.

@caarlos0
Contributor Author

caarlos0 commented Feb 3, 2017

OK, makes sense. Thanks @Kargakis =D

@calebamiles calebamiles modified the milestones: v1.6, 1.6 Feb 13, 2017