
Use podAntiAffinity so brokers don't accidentally run on the same machine #33

Closed
tombentley opened this issue Oct 11, 2017 · 5 comments · Fixed by #486
Comments

tombentley (Member) commented Oct 11, 2017

We should use podAntiAffinity so brokers don't accidentally run on the same physical machine, because that would defeat the point of Kafka holding multiple replicas of partitions.
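As a minimal sketch, such a rule could live in the broker StatefulSet's pod template; the label key/values below (`app: kafka`) are illustrative placeholders, not the project's actual labels:

```yaml
# Hypothetical excerpt of a Kafka StatefulSet pod template.
# The app: kafka label is an assumed example, not the real selector.
spec:
  template:
    metadata:
      labels:
        app: kafka
    spec:
      affinity:
        podAntiAffinity:
          # Hard rule: never co-locate two pods carrying this label
          # on the same node (topologyKey = hostname).
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: kafka
              topologyKey: kubernetes.io/hostname
```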

scholzj (Member) commented Oct 12, 2017

I'm not sure we should really use this by default. Some of my thoughts why ...

Kubernetes will try to schedule the pods of a StatefulSet on separate nodes where possible on its own, so for basic situations anti-affinity is not needed. A situation where it might not work perfectly is, for example:

  1. You have a 3-node Kubernetes cluster.
  2. Kafka is deployed with 3 pods; one pod is scheduled per Kubernetes node.
  3. One of the nodes crashes => Kubernetes reschedules the pod from the crashed node to one of the two remaining nodes.
  4. The crashed node rejoins the cluster, but Kubernetes will not pro-actively kill one of the pods to move it to the newly rejoined node. So although we have 3 nodes again, the 3 brokers are running on only 2 nodes => problem.

Pod anti-affinity would also cause us some complications:

  • With a required rule, you will never be able to run a cluster on minikube / minishift / oc cluster up, which are all single-node environments.
  • We will need to carefully choose the anti-affinity label selector so that it matches only Kafka pods from the same Kafka cluster (e.g. if you, for whatever reason, have multiple Kafka installations).
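For instance, scoping the selector to a single cluster could look like the sketch below; both label keys here (`app`, `cluster`) are hypothetical examples, not labels this project currently sets:

```yaml
# Sketch: anti-affinity restricted to pods of ONE Kafka cluster,
# so two separate Kafka installations don't repel each other.
podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: kafka            # hypothetical component label
          cluster: my-cluster   # hypothetical per-cluster label
      topologyKey: kubernetes.io/hostname
```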

With regards to replication, in the end this is not about preventing two Kafka brokers from running on the same host, but about preventing two brokers with different rack IDs from running on the same host. So what we should do is add support for broker.rack in the Kafka images, and add support for node selectors based on the "zone" where the node is running. That would ensure that several Kafka instances might run on the same host, but only ever with the same rack ID, so such instances would not share any replicas. And once we have that implemented, the only problem that sharing a physical machine between multiple Kafka pods can cause is that they will steal each other's disk cache.
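A sketch of that zone-level spreading, paired with Kafka's broker.rack property, might look like this; the zone label is the standard one Kubernetes nodes carried at the time, and the mechanism for injecting the zone into the broker configuration is left open here:

```yaml
# Sketch: spread brokers across zones rather than individual hosts.
# Assumes nodes carry the standard zone label of that era.
podAntiAffinity:
  preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: kafka   # illustrative label
        topologyKey: failure-domain.beta.kubernetes.io/zone

# Each broker would then set, in its server.properties:
#   broker.rack=<zone of the node the pod landed on>
# so Kafka never places two replicas of a partition in one zone.
```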

The practical complication with podAntiAffinity is that we will need to configure it somehow: we will at least need a set of files without it for testing on single-node installations such as minikube / minishift, and a set of files with anti-affinity for running on real clusters. That sounds like something which should be done after #31. I guess that sooner or later we will get to the state where deploying the cluster is done by some controller anyway; there it would be even easier, because you would just tell the controller that it should deploy "with anti-affinity" and it would generate the configuration for you.

tombentley (Member, Author) commented

> It will never run a cluster on minikube / minishift / oc cluster up

Is that true of preferredDuringSchedulingIgnoredDuringExecution? Wouldn't the scheduler just ignore the anti-affinity with that setting?
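For reference, the soft variant looks roughly like this; on a single-node cluster the scheduler cannot satisfy the preference and simply places all pods on the one node anyway (labels again illustrative):

```yaml
# Sketch: soft anti-affinity. Scheduling never fails; the rule only
# biases placement away from nodes already running a matching pod.
podAntiAffinity:
  preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: kafka   # illustrative label
        topologyKey: kubernetes.io/hostname
```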

> So what we should do is add support for broker.rack in the Kafka images

Can you open a separate issue for that, @scholzj?

> I guess that sooner or later we will anyway get to the state when deploying the cluster will be done by some controller - there it would be even easier because you would just tell the controller that it should deploy "with anti-affinity" and it would generate the code for you.

I'm happy to defer this issue until we have such a controller.

scholzj (Member) commented Oct 12, 2017

You are right, preferredDuringSchedulingIgnoredDuringExecution should still schedule everything on Minikube and the like (see, I don't know that much about Kubernetes and OpenShift ;-)). It is also interesting that the Kubernetes docs seem to discuss this even in connection with StatefulSets. I haven't seen a StatefulSet whose pods were not spread across nodes automatically, even without the preferredDuringSchedulingIgnoredDuringExecution rule. But maybe it is not guaranteed, so perhaps we should add it to be sure.

I raised #36 for the rack IDs.

@scholzj scholzj added the yaml label Oct 18, 2017
vinu commented May 30, 2018

Hey, anything on this yet? What is the best way to configure pod affinity and anti-affinity with Strimzi? Thanks.

ppatierno (Member) commented
Hi @vinu, this feature is planned for the current sprint, so it should be available at the end of the next 3 weeks.

@tombentley tombentley mentioned this issue Jun 11, 2018
tombentley added a commit that referenced this issue Jun 12, 2018
* Add support for user-configurable pod and node affinities

Fixes #33
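The commit above makes the affinities user-configurable. As a sketch of what that user-facing configuration might look like, under the assumption that affinity is exposed on the Kafka cluster descriptor (the field placement and labels here are illustrative guesses based on the commit message, not the actual API):

```yaml
# Sketch only: assumed shape of a user-supplied affinity on the
# Kafka cluster resource; field names are not taken from the real API.
spec:
  kafka:
    replicas: 3
    affinity:   # assumed field for pass-through pod affinity rules
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  strimzi.io/cluster: my-cluster   # illustrative label
              topologyKey: kubernetes.io/hostname
```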
@tombentley tombentley added this to the 0.5.0 milestone Jun 12, 2018
tomncooper added a commit to tomncooper/strimzi-kafka-operator that referenced this issue Apr 14, 2020