Allow the Cluster Operator to run in multiple replicas #7174

Closed
scholzj opened this issue Aug 8, 2022 · 0 comments · Fixed by #7177

scholzj commented Aug 8, 2022

With the introduction of StrimziPodSets, the availability of the Cluster Operator is more important than before, since it is now also responsible for (re)starting the Kafka pods.

In general, the operator itself is:

  • Stateless and can easily continue after a restart
  • Fast to start when the pod is killed or deleted

So under normal circumstances, there should be no need to run it in multiple replicas. But when multiple worker nodes (e.g. a whole AZ) crash, many pods can go down at once, including the Kafka / ZooKeeper pods and the Cluster Operator pod. In such cases, a lot of new pods might need to be scheduled at the same time. That can cause congestion or insufficient resources, so the operator pod might not get scheduled, and the operands and the StrimziPodSets are left unmanaged.

Having multiple operator instances running beforehand as warm standbys might be useful, because the backup instance of the operator is already running and can take over and handle things almost immediately.

This can be implemented using the Lease resource and the leader election support that is part of the Fabric8 Kubernetes client.
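To illustrate the idea, here is a minimal in-memory sketch of lease-based leader election. It is not the Fabric8 API and not the actual implementation: the `Lease` class, the `try_acquire_or_renew` function, the replica names and the 15-second duration are all hypothetical and used only for illustration. A real Lease lives in the Kubernetes API and its updates are guarded by optimistic concurrency, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """Simplified, in-memory stand-in for a Kubernetes Lease (hypothetical)."""
    holder: Optional[str] = None  # identity of the current leader, if any
    renew_time: float = 0.0       # time of the last successful renewal
    duration: float = 15.0        # seconds the lease stays valid (assumed value)


def try_acquire_or_renew(lease: Lease, identity: str, now: float) -> bool:
    """Return True if `identity` is the leader after this attempt."""
    expired = lease.holder is None or now - lease.renew_time > lease.duration
    if lease.holder == identity or expired:
        # Either we already lead (renew) or the lease is free (take over).
        lease.holder = identity
        lease.renew_time = now
        return True
    # Someone else holds a still-valid lease: stay on standby.
    return False


lease = Lease()

# Both replicas are running; only the lease holder reconciles.
a_leads = try_acquire_or_renew(lease, "operator-a", now=0.0)
b_leads = try_acquire_or_renew(lease, "operator-b", now=1.0)

# operator-a goes down and stops renewing; the standby takes over
# once the lease has expired.
b_takes_over = try_acquire_or_renew(lease, "operator-b", now=20.0)

print(a_leads, b_leads, b_takes_over)
```

In this sketch the standby replica does no work while the lease is held by another instance, but since it is already scheduled and running, it can become the leader as soon as the lease expires, without waiting for a new pod to be scheduled.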

@scholzj scholzj self-assigned this Aug 8, 2022
scholzj added a commit to scholzj/strimzi-kafka-operator that referenced this issue Aug 15, 2022
Signed-off-by: Jakub Scholz <www@scholzj.com>
scholzj added a commit that referenced this issue Aug 16, 2022
* Add leader election to the Cluster Operator - Closes #7174

Signed-off-by: Jakub Scholz <www@scholzj.com>