Restarting all pods even when only scaling up #1036

Closed
alenkacz opened this issue Nov 7, 2019 · 4 comments

alenkacz commented Nov 7, 2019

Why it happened:
For the upcoming release we chose a very simple restart strategy in #1031.

Basically, for every plan executed we always restart everything, because right now we cannot tell whether a restart is actually needed (we don't know if e.g. a parameter change modified a ConfigMap, which would require the StatefulSet pods to be restarted). This is not an ideal solution; we should be able to understand the dependencies and restart only when necessary.
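
To illustrate the dependency problem: a parameter like BROKER_COUNT only changes the StatefulSet's replica count, so scaling up alone should not require restarting existing pods, while a parameter rendered into a ConfigMap that the brokers mount only takes effect after a restart. A minimal, KUDO-style template sketch follows; the resource names, image, and the LOG_RETENTION_HOURS parameter are made up for illustration:

```yaml
# templates/config.yaml (hypothetical) -- a change to a parameter rendered
# here requires the broker pods to be restarted to take effect.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kafka-config
data:
  server.properties: |
    log.retention.hours={{ .Params.LOG_RETENTION_HOURS }}
---
# templates/statefulset.yaml (hypothetical) -- BROKER_COUNT only affects
# spec.replicas, so scaling up by itself should not touch existing pods.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka-kafka
spec:
  replicas: {{ .Params.BROKER_COUNT }}
  serviceName: kafka-svc            # illustrative
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: kafka
          image: kafka:2.3.1        # placeholder
          volumeMounts:
            - name: config
              mountPath: /config
      volumes:
        - name: config
          configMap:
            name: kafka-config
```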

How to reproduce:
We have a 3-broker cluster and scale it to 5 brokers. Previously this would just add the 2 new brokers and wait for them to become ready; now when we run:

k kudo update --instance=kafka -p BROKER_COUNT=5
Instance: instance default/kafka has updated parameters from map[] to map[BROKER_COUNT:5]
InstanceController: Going to start execution of plan deploy on instance default/kafka

the new pods come up, and right after that the old pods are restarted one by one:

NAME                             READY   STATUS    RESTARTS   AGE
kafka-kafka-0                    1/1     Running   0          2m4s
kafka-kafka-1                    1/1     Running   0          97s
kafka-kafka-2                    1/1     Running   0          63s
kafka-kafka-3                    0/1     Pending   0          3s
zookeeper-instance-zookeeper-0   1/1     Running   0          3m44s
zookeeper-instance-zookeeper-1   1/1     Running   0          3m44s
zookeeper-instance-zookeeper-2   1/1     Running   0          3m44s
kafka-kafka-3                    0/1     Pending   0          7s
kafka-kafka-3                    0/1     ContainerCreating   0          7s
kafka-kafka-3                    0/1     ContainerCreating   0          17s
kafka-kafka-3                    0/1     Running             0          24s
kafka-kafka-3                    1/1     Running             0          35s
kafka-kafka-4                    0/1     Pending             0          0s
kafka-kafka-4                    0/1     Pending             0          7s
kafka-kafka-4                    0/1     ContainerCreating   0          7s
kafka-kafka-4                    0/1     ContainerCreating   0          16s
kafka-kafka-4                    0/1     Running             0          18s
kafka-kafka-4                    1/1     Running             0          28s
kafka-kafka-2                    1/1     Terminating         0          2m3s
kafka-kafka-2                    0/1     Terminating         0          2m7s
kafka-kafka-2                    0/1     Terminating         0          2m19s
kafka-kafka-2                    0/1     Terminating         0          2m19s
kafka-kafka-2                    0/1     Pending             0          0s
kafka-kafka-2                    0/1     Pending             0          0s
kafka-kafka-2                    0/1     ContainerCreating   0          0s
kafka-kafka-2                    0/1     ContainerCreating   0          5s
kafka-kafka-2                    0/1     Running             0          7s
kafka-kafka-2                    1/1     Running             0          19s
kafka-kafka-1                    1/1     Terminating         0          3m12s
kafka-kafka-1                    0/1     Terminating         0          3m16s
kafka-kafka-1                    0/1     Terminating         0          3m17s
kafka-kafka-1                    0/1     Terminating         0          3m17s
kafka-kafka-1                    0/1     Pending             0          0s
kafka-kafka-1                    0/1     Pending             0          0s
kafka-kafka-1                    0/1     ContainerCreating   0          0s
kafka-kafka-1                    0/1     ContainerCreating   0          8s
kafka-kafka-1                    0/1     Running             0          10s
kafka-kafka-1                    1/1     Running             0          21s
kafka-kafka-0                    1/1     Terminating         0          4m5s
kafka-kafka-0                    0/1     Terminating         0          4m9s
kafka-kafka-0                    0/1     Terminating         0          4m10s
kafka-kafka-0                    0/1     Terminating         0          4m10s
kafka-kafka-0                    0/1     Pending             0          0s
kafka-kafka-0                    0/1     Pending             0          0s
kafka-kafka-0                    0/1     ContainerCreating   0          0s
alenkacz commented Nov 7, 2019

cc. @zmalik

zmalik commented Nov 7, 2019

Having a PodDisruptionBudget (PDB) helps in this particular case.
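
For reference, a minimal sketch of such a PDB, assuming the broker pods carry an app=kafka label; the name and selector are illustrative and not taken from the actual Kafka operator:

```yaml
apiVersion: policy/v1beta1   # policy/v1 on Kubernetes 1.21+
kind: PodDisruptionBudget
metadata:
  name: kafka-pdb
  namespace: default
spec:
  # Allow at most one broker pod to be voluntarily disrupted at a time.
  maxUnavailable: 1
  selector:
    matchLabels:
      app: kafka
```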

ANeumann82 commented
A PDB helps, but the pods still get restarted, which makes the update of the StatefulSet much slower.

This also applies if the deploy plan only modifies a Service, because the enhancement of the resources sets the last-plan-execution-uid on the pod template of the StatefulSet (sketched below).

It would be nice if we only updated the last-plan-execution-uid on the template when we know the pods need to be restarted, but figuring that out automatically might be complex.

Maybe we could leave this to the operator developers in some way? They should know which parameters should trigger a pod restart, whether the parameter is used in the StatefulSet, a ConfigMap, or somewhere else.
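
For context, the restart happens because that annotation lives inside the pod template: any change under spec.template causes the StatefulSet controller to roll every pod, while annotations outside the template (or changes to a Service) do not. A minimal sketch; the annotation keys and values here are illustrative:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka-kafka
  annotations:
    # An annotation here, outside the pod template, does NOT trigger a rollout.
    kudo.dev/last-applied-plan: deploy                     # illustrative
spec:
  replicas: 5
  serviceName: kafka-svc                                   # illustrative
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
      annotations:
        # Any change here, e.g. a new plan execution UID, changes the pod
        # template and makes the controller restart all pods one by one.
        kudo.dev/last-plan-execution-uid: "6c9e2a10-0000"  # illustrative
    spec:
      containers:
        - name: kafka
          image: kafka:2.3.1                               # placeholder
```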

ANeumann82 commented
Relates to #1424 and KEP-27 #1449
