Restarting all pods even when only scaling up #1036

Closed
alenkacz opened this issue Nov 7, 2019 · 4 comments

alenkacz commented Nov 7, 2019

Why it happened:
For the upcoming release we chose a very simple restart strategy in #1031.

Basically, for every plan executed we always restart everything, because right now we cannot tell whether a restart is actually needed (we don't know if e.g. a parameter change modified a ConfigMap, which would require the StatefulSet pods to be restarted). This is not an ideal solution; we should be able to understand the dependencies and restart only when necessary.
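
To illustrate the dependency problem: a parameter like BROKER_COUNT only changes the StatefulSet's replica count, so scaling up alone should not require restarting existing pods, while a parameter rendered into a ConfigMap that the brokers mount only takes effect after a restart. A minimal, KUDO-style template sketch follows; the resource names, image, and the LOG_RETENTION_HOURS parameter are made up for illustration:

```yaml
# templates/config.yaml (hypothetical) -- a change to a parameter rendered
# here requires the broker pods to be restarted to take effect.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kafka-config
data:
  server.properties: |
    log.retention.hours={{ .Params.LOG_RETENTION_HOURS }}
---
# templates/statefulset.yaml (hypothetical) -- BROKER_COUNT only affects
# spec.replicas, so scaling up by itself should not touch existing pods.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka-kafka
spec:
  replicas: {{ .Params.BROKER_COUNT }}
  serviceName: kafka-svc            # illustrative
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: kafka
          image: kafka:2.3.1        # placeholder
          volumeMounts:
            - name: config
              mountPath: /config
      volumes:
        - name: config
          configMap:
            name: kafka-config
```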

How to reproduce:
We have a 3-broker cluster and scale it to 5 brokers. Previously this would just add the 2 new brokers and wait for them to become ready; now when we run:

k kudo update --instance=kafka -p BROKER_COUNT=5
Instance: instance default/kafka has updated parameters from map[] to map[BROKER_COUNT:5]
InstanceController: Going to start execution of plan deploy on instance default/kafka

the new pods come up, and right after that the old pods are restarted one by one:

NAME                             READY   STATUS    RESTARTS   AGE
kafka-kafka-0                    1/1     Running   0          2m4s
kafka-kafka-1                    1/1     Running   0          97s
kafka-kafka-2                    1/1     Running   0          63s
kafka-kafka-3                    0/1     Pending   0          3s
zookeeper-instance-zookeeper-0   1/1     Running   0          3m44s
zookeeper-instance-zookeeper-1   1/1     Running   0          3m44s
zookeeper-instance-zookeeper-2   1/1     Running   0          3m44s
kafka-kafka-3                    0/1     Pending   0          7s
kafka-kafka-3                    0/1     ContainerCreating   0          7s
kafka-kafka-3                    0/1     ContainerCreating   0          17s
kafka-kafka-3                    0/1     Running             0          24s
kafka-kafka-3                    1/1     Running             0          35s
kafka-kafka-4                    0/1     Pending             0          0s
kafka-kafka-4                    0/1     Pending             0          7s
kafka-kafka-4                    0/1     ContainerCreating   0          7s
kafka-kafka-4                    0/1     ContainerCreating   0          16s
kafka-kafka-4                    0/1     Running             0          18s
kafka-kafka-4                    1/1     Running             0          28s
kafka-kafka-2                    1/1     Terminating         0          2m3s
kafka-kafka-2                    0/1     Terminating         0          2m7s
kafka-kafka-2                    0/1     Terminating         0          2m19s
kafka-kafka-2                    0/1     Terminating         0          2m19s
kafka-kafka-2                    0/1     Pending             0          0s
kafka-kafka-2                    0/1     Pending             0          0s
kafka-kafka-2                    0/1     ContainerCreating   0          0s
kafka-kafka-2                    0/1     ContainerCreating   0          5s
kafka-kafka-2                    0/1     Running             0          7s
kafka-kafka-2                    1/1     Running             0          19s
kafka-kafka-1                    1/1     Terminating         0          3m12s
kafka-kafka-1                    0/1     Terminating         0          3m16s
kafka-kafka-1                    0/1     Terminating         0          3m17s
kafka-kafka-1                    0/1     Terminating         0          3m17s
kafka-kafka-1                    0/1     Pending             0          0s
kafka-kafka-1                    0/1     Pending             0          0s
kafka-kafka-1                    0/1     ContainerCreating   0          0s
kafka-kafka-1                    0/1     ContainerCreating   0          8s
kafka-kafka-1                    0/1     Running             0          10s
kafka-kafka-1                    1/1     Running             0          21s
kafka-kafka-0                    1/1     Terminating         0          4m5s
kafka-kafka-0                    0/1     Terminating         0          4m9s
kafka-kafka-0                    0/1     Terminating         0          4m10s
kafka-kafka-0                    0/1     Terminating         0          4m10s
kafka-kafka-0                    0/1     Pending             0          0s
kafka-kafka-0                    0/1     Pending             0          0s
kafka-kafka-0                    0/1     ContainerCreating   0          0s
alenkacz commented Nov 7, 2019

cc. @zmalik

zmalik commented Nov 7, 2019

Having a PodDisruptionBudget (PDB) helps in this particular case.
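
For reference, a minimal sketch of such a PDB, assuming the broker pods carry an app=kafka label; the name and selector are illustrative and not taken from the actual Kafka operator:

```yaml
apiVersion: policy/v1beta1   # policy/v1 on Kubernetes 1.21+
kind: PodDisruptionBudget
metadata:
  name: kafka-pdb
  namespace: default
spec:
  # Allow at most one broker pod to be voluntarily disrupted at a time.
  maxUnavailable: 1
  selector:
    matchLabels:
      app: kafka
```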

ANeumann82 commented
A PDB helps, but the pods still get restarted, which makes the update of the StatefulSet much slower.

This also applies if the deploy plan only modifies a Service, because the enhancement of the resources sets the last-plan-execution-uid on the pod template of the StatefulSet (sketched below).

It would be nice if we only updated the last-plan-execution-uid on the template when we know the pods need to be restarted, but figuring that out automatically might be complex.

Maybe we could leave this to the operator developers in some way? They should know which parameters should trigger a pod restart, whether the parameter is used in the StatefulSet, a ConfigMap, or somewhere else.
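
For context, the restart happens because that annotation lives inside the pod template: any change under spec.template causes the StatefulSet controller to roll every pod, while annotations outside the template (or changes to a Service) do not. A minimal sketch; the annotation keys and values here are illustrative:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka-kafka
  annotations:
    # An annotation here, outside the pod template, does NOT trigger a rollout.
    kudo.dev/last-applied-plan: deploy                     # illustrative
spec:
  replicas: 5
  serviceName: kafka-svc                                   # illustrative
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
      annotations:
        # Any change here, e.g. a new plan execution UID, changes the pod
        # template and makes the controller restart all pods one by one.
        kudo.dev/last-plan-execution-uid: "6c9e2a10-0000"  # illustrative
    spec:
      containers:
        - name: kafka
          image: kafka:2.3.1                               # placeholder
```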

ANeumann82 commented
Relates to #1424 and KEP-27 #1449
