With the introduction of StrimziPodSets, the availability of the Cluster Operator is more important than before, since it is also responsible for (re)starting the Kafka pods.
In general, the operator itself is:
- Stateless and can easily continue after a restart
- Fast to start when the pod is killed or deleted
So under normal circumstances, there should be no need to run it in multiple replicas. But when multiple worker nodes crash (e.g. a whole availability zone), it can easily happen that many pods go down, including the Kafka / ZooKeeper pods and the Cluster Operator pod. In such cases, a lot of new pods might need to be scheduled. That can cause congestion or resource shortages where the operator itself cannot be scheduled, and the operands and the StrimziPodSets are left unmanaged.
Having multiple operator instances running beforehand as a warm standby might be useful, because the backup instance of the operator would already be running and could take over almost immediately.
This can be implemented using the Lease resource and the leader-election support that is part of the Fabric8 Kubernetes client.
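A minimal sketch of what this could look like with the Fabric8 client is below. It assumes Fabric8 6.x (`KubernetesClientBuilder`, `client.leaderElector()`) and uses a Lease as the lock; the lease name `strimzi-cluster-operator`, the `myproject` namespace, the timing values, and the `startOperator()` / `stopOperator()` hooks are illustrative placeholders, not the actual Strimzi implementation.

```java
import java.time.Duration;
import java.util.UUID;

import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;
import io.fabric8.kubernetes.client.extended.leaderelection.LeaderCallbacks;
import io.fabric8.kubernetes.client.extended.leaderelection.LeaderElectionConfig;
import io.fabric8.kubernetes.client.extended.leaderelection.LeaderElectionConfigBuilder;
import io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.LeaseLock;

public class LeaderElectionSketch {
    public static void main(String[] args) {
        // Each operator replica needs a unique identity; the pod name is a natural choice.
        String identity = System.getenv().getOrDefault("HOSTNAME", UUID.randomUUID().toString());

        try (KubernetesClient client = new KubernetesClientBuilder().build()) {
            LeaderElectionConfig config = new LeaderElectionConfigBuilder()
                    .withName("strimzi-cluster-operator")   // hypothetical election name
                    // Lease resource used as the lock (namespace and name are placeholders)
                    .withLock(new LeaseLock("myproject", "strimzi-cluster-operator", identity))
                    .withLeaseDuration(Duration.ofSeconds(15))
                    .withRenewDeadline(Duration.ofSeconds(10))
                    .withRetryPeriod(Duration.ofSeconds(2))
                    .withLeaderCallbacks(new LeaderCallbacks(
                            LeaderElectionSketch::startOperator,   // became the leader => start reconciling
                            LeaderElectionSketch::stopOperator,    // lost leadership => stop reconciling
                            newLeader -> System.out.println("Current leader: " + newLeader)))
                    .build();

            // run() blocks while this replica participates in the election; standby
            // replicas wait here until the current leader's Lease expires.
            client.leaderElector().withConfig(config).build().run();
        }
    }

    private static void startOperator() {
        System.out.println("Acquired leadership, starting reconciliation loops");
    }

    private static void stopOperator() {
        System.out.println("Lost leadership, stopping reconciliation loops");
    }
}
```

With something like this in place, the operator Deployment could simply be scaled to more than one replica: only the replica holding the Lease would reconcile, while the others stay warm and take over when the Lease expires or is released.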