[BUG] Cruise control topics should use proper configuration to make smooth rolling updates possible and ensure availability #9469

Closed
scholzj opened this issue Dec 15, 2023 · 3 comments

@scholzj
Member

scholzj commented Dec 15, 2023

When Cruise Control is deployed, two of its topics, strimzi.cruisecontrol.modeltrainingsamples and strimzi.cruisecontrol.partitionmetricsamples, are created with a wrong configuration: replication factor 2 with min.insync.replicas also set to 2. This causes issues during rolling updates or any broker failure, because these topics become under-replicated and drop below min.insync.replicas as soon as a single replica is unavailable.
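To make the arithmetic explicit, here is a minimal sketch (not Strimzi or Cruise Control code) of how many replicas can be offline before producers using acks=all start failing. The function name is made up for illustration; the numbers are the ones from the topic listing further down.

```python
def tolerated_broker_outages(replication_factor: int, min_insync_replicas: int) -> int:
    """Replicas that can be offline while acks=all writes are still accepted."""
    if replication_factor < 1 or min_insync_replicas < 1:
        raise ValueError("replication factor and min.insync.replicas must be >= 1")
    # If min.insync.replicas exceeds the replication factor, writes can never
    # succeed, so the headroom is reported as 0 rather than a negative number.
    return max(replication_factor - min_insync_replicas, 0)


# (replication factor, min.insync.replicas) as reported by kafka-topics.sh
topics = {
    "strimzi.cruisecontrol.metrics": (3, 1),
    "strimzi.cruisecontrol.partitionmetricsamples": (2, 2),
    "strimzi.cruisecontrol.modeltrainingsamples": (2, 2),
}

for name, (rf, min_isr) in topics.items():
    print(f"{name}: tolerates {tolerated_broker_outages(rf, min_isr)} broker(s) down")
```

With zero headroom, restarting any broker that hosts a replica is enough to block acks=all writes to that partition until the broker is back in sync.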

To reproduce, you can just deploy the Cruise Control example:

kubectl apply -f https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/main/examples/cruise-control/kafka-cruise-control.yaml

And check the topics:

[kafka@my-cluster-kafka-0 kafka]$ bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe
Topic: strimzi.cruisecontrol.metrics    TopicId: wCyWgDzaT2GJdonrPxlHIw PartitionCount: 1       ReplicationFactor: 3    Configs: min.insync.replicas=1,cleanup.policy=delete,retention.ms=18000000,message.format.version=3.0-IV1
        Topic: strimzi.cruisecontrol.metrics    Partition: 0    Leader: 2       Replicas: 2,0,1 Isr: 2,0,1
Topic: strimzi.cruisecontrol.partitionmetricsamples     TopicId: 1Z9zrBOkTC62WTTy2nmj6Q PartitionCount: 32      ReplicationFactor: 2    Configs: min.insync.replicas=2,cleanup.policy=delete,retention.ms=3600000,message.format.version=3.0-IV1
        Topic: strimzi.cruisecontrol.partitionmetricsamples     Partition: 0    Leader: 0       Replicas: 0,2   Isr: 0,2
        Topic: strimzi.cruisecontrol.partitionmetricsamples     Partition: 1    Leader: 2       Replicas: 2,1   Isr: 2,1
        Topic: strimzi.cruisecontrol.partitionmetricsamples     Partition: 2    Leader: 1       Replicas: 1,0   Isr: 1,0
        Topic: strimzi.cruisecontrol.partitionmetricsamples     Partition: 3    Leader: 0       Replicas: 0,1   Isr: 0,1
        Topic: strimzi.cruisecontrol.partitionmetricsamples     Partition: 4    Leader: 2       Replicas: 2,0   Isr: 2,0
        Topic: strimzi.cruisecontrol.partitionmetricsamples     Partition: 5    Leader: 1       Replicas: 1,2   Isr: 1,2
        Topic: strimzi.cruisecontrol.partitionmetricsamples     Partition: 6    Leader: 0       Replicas: 0,2   Isr: 0,2
        Topic: strimzi.cruisecontrol.partitionmetricsamples     Partition: 7    Leader: 2       Replicas: 2,1   Isr: 2,1
        Topic: strimzi.cruisecontrol.partitionmetricsamples     Partition: 8    Leader: 1       Replicas: 1,0   Isr: 1,0
        Topic: strimzi.cruisecontrol.partitionmetricsamples     Partition: 9    Leader: 0       Replicas: 0,1   Isr: 0,1
        Topic: strimzi.cruisecontrol.partitionmetricsamples     Partition: 10   Leader: 2       Replicas: 2,0   Isr: 2,0
        Topic: strimzi.cruisecontrol.partitionmetricsamples     Partition: 11   Leader: 1       Replicas: 1,2   Isr: 1,2
        Topic: strimzi.cruisecontrol.partitionmetricsamples     Partition: 12   Leader: 0       Replicas: 0,2   Isr: 0,2
        Topic: strimzi.cruisecontrol.partitionmetricsamples     Partition: 13   Leader: 2       Replicas: 2,1   Isr: 2,1
        Topic: strimzi.cruisecontrol.partitionmetricsamples     Partition: 14   Leader: 1       Replicas: 1,0   Isr: 1,0
        Topic: strimzi.cruisecontrol.partitionmetricsamples     Partition: 15   Leader: 0       Replicas: 0,1   Isr: 0,1
        Topic: strimzi.cruisecontrol.partitionmetricsamples     Partition: 16   Leader: 2       Replicas: 2,0   Isr: 2,0
        Topic: strimzi.cruisecontrol.partitionmetricsamples     Partition: 17   Leader: 1       Replicas: 1,2   Isr: 1,2
        Topic: strimzi.cruisecontrol.partitionmetricsamples     Partition: 18   Leader: 0       Replicas: 0,2   Isr: 0,2
        Topic: strimzi.cruisecontrol.partitionmetricsamples     Partition: 19   Leader: 2       Replicas: 2,1   Isr: 2,1
        Topic: strimzi.cruisecontrol.partitionmetricsamples     Partition: 20   Leader: 1       Replicas: 1,0   Isr: 1,0
        Topic: strimzi.cruisecontrol.partitionmetricsamples     Partition: 21   Leader: 0       Replicas: 0,1   Isr: 0,1
        Topic: strimzi.cruisecontrol.partitionmetricsamples     Partition: 22   Leader: 2       Replicas: 2,0   Isr: 2,0
        Topic: strimzi.cruisecontrol.partitionmetricsamples     Partition: 23   Leader: 1       Replicas: 1,2   Isr: 1,2
        Topic: strimzi.cruisecontrol.partitionmetricsamples     Partition: 24   Leader: 0       Replicas: 0,2   Isr: 0,2
        Topic: strimzi.cruisecontrol.partitionmetricsamples     Partition: 25   Leader: 2       Replicas: 2,1   Isr: 2,1
        Topic: strimzi.cruisecontrol.partitionmetricsamples     Partition: 26   Leader: 1       Replicas: 1,0   Isr: 1,0
        Topic: strimzi.cruisecontrol.partitionmetricsamples     Partition: 27   Leader: 0       Replicas: 0,1   Isr: 0,1
        Topic: strimzi.cruisecontrol.partitionmetricsamples     Partition: 28   Leader: 2       Replicas: 2,0   Isr: 2,0
        Topic: strimzi.cruisecontrol.partitionmetricsamples     Partition: 29   Leader: 1       Replicas: 1,2   Isr: 1,2
        Topic: strimzi.cruisecontrol.partitionmetricsamples     Partition: 30   Leader: 0       Replicas: 0,2   Isr: 0,2
        Topic: strimzi.cruisecontrol.partitionmetricsamples     Partition: 31   Leader: 2       Replicas: 2,1   Isr: 2,1
Topic: strimzi.cruisecontrol.modeltrainingsamples       TopicId: cjf3IRd4QemE9ESDbAahdw PartitionCount: 32      ReplicationFactor: 2    Configs: min.insync.replicas=2,cleanup.policy=delete,retention.ms=12000000,message.format.version=3.0-IV1
        Topic: strimzi.cruisecontrol.modeltrainingsamples       Partition: 0    Leader: 2       Replicas: 2,1   Isr: 2,1
        Topic: strimzi.cruisecontrol.modeltrainingsamples       Partition: 1    Leader: 1       Replicas: 1,0   Isr: 1,0
        Topic: strimzi.cruisecontrol.modeltrainingsamples       Partition: 2    Leader: 0       Replicas: 0,2   Isr: 0,2
        Topic: strimzi.cruisecontrol.modeltrainingsamples       Partition: 3    Leader: 2       Replicas: 2,0   Isr: 2,0
        Topic: strimzi.cruisecontrol.modeltrainingsamples       Partition: 4    Leader: 1       Replicas: 1,2   Isr: 1,2
        Topic: strimzi.cruisecontrol.modeltrainingsamples       Partition: 5    Leader: 0       Replicas: 0,1   Isr: 0,1
        Topic: strimzi.cruisecontrol.modeltrainingsamples       Partition: 6    Leader: 2       Replicas: 2,1   Isr: 2,1
        Topic: strimzi.cruisecontrol.modeltrainingsamples       Partition: 7    Leader: 1       Replicas: 1,0   Isr: 1,0
        Topic: strimzi.cruisecontrol.modeltrainingsamples       Partition: 8    Leader: 0       Replicas: 0,2   Isr: 0,2
        Topic: strimzi.cruisecontrol.modeltrainingsamples       Partition: 9    Leader: 2       Replicas: 2,0   Isr: 2,0
        Topic: strimzi.cruisecontrol.modeltrainingsamples       Partition: 10   Leader: 1       Replicas: 1,2   Isr: 1,2
        Topic: strimzi.cruisecontrol.modeltrainingsamples       Partition: 11   Leader: 0       Replicas: 0,1   Isr: 0,1
        Topic: strimzi.cruisecontrol.modeltrainingsamples       Partition: 12   Leader: 2       Replicas: 2,1   Isr: 2,1
        Topic: strimzi.cruisecontrol.modeltrainingsamples       Partition: 13   Leader: 1       Replicas: 1,0   Isr: 1,0
        Topic: strimzi.cruisecontrol.modeltrainingsamples       Partition: 14   Leader: 0       Replicas: 0,2   Isr: 0,2
        Topic: strimzi.cruisecontrol.modeltrainingsamples       Partition: 15   Leader: 2       Replicas: 2,0   Isr: 2,0
        Topic: strimzi.cruisecontrol.modeltrainingsamples       Partition: 16   Leader: 1       Replicas: 1,2   Isr: 1,2
        Topic: strimzi.cruisecontrol.modeltrainingsamples       Partition: 17   Leader: 0       Replicas: 0,1   Isr: 0,1
        Topic: strimzi.cruisecontrol.modeltrainingsamples       Partition: 18   Leader: 2       Replicas: 2,1   Isr: 2,1
        Topic: strimzi.cruisecontrol.modeltrainingsamples       Partition: 19   Leader: 1       Replicas: 1,0   Isr: 1,0
        Topic: strimzi.cruisecontrol.modeltrainingsamples       Partition: 20   Leader: 0       Replicas: 0,2   Isr: 0,2
        Topic: strimzi.cruisecontrol.modeltrainingsamples       Partition: 21   Leader: 2       Replicas: 2,0   Isr: 2,0
        Topic: strimzi.cruisecontrol.modeltrainingsamples       Partition: 22   Leader: 1       Replicas: 1,2   Isr: 1,2
        Topic: strimzi.cruisecontrol.modeltrainingsamples       Partition: 23   Leader: 0       Replicas: 0,1   Isr: 0,1
        Topic: strimzi.cruisecontrol.modeltrainingsamples       Partition: 24   Leader: 2       Replicas: 2,1   Isr: 2,1
        Topic: strimzi.cruisecontrol.modeltrainingsamples       Partition: 25   Leader: 1       Replicas: 1,0   Isr: 1,0
        Topic: strimzi.cruisecontrol.modeltrainingsamples       Partition: 26   Leader: 0       Replicas: 0,2   Isr: 0,2
        Topic: strimzi.cruisecontrol.modeltrainingsamples       Partition: 27   Leader: 2       Replicas: 2,0   Isr: 2,0
        Topic: strimzi.cruisecontrol.modeltrainingsamples       Partition: 28   Leader: 1       Replicas: 1,2   Isr: 1,2
        Topic: strimzi.cruisecontrol.modeltrainingsamples       Partition: 29   Leader: 0       Replicas: 0,1   Isr: 0,1
        Topic: strimzi.cruisecontrol.modeltrainingsamples       Partition: 30   Leader: 2       Replicas: 2,1   Isr: 2,1
        Topic: strimzi.cruisecontrol.modeltrainingsamples       Partition: 31   Leader: 1       Replicas: 1,0   Isr: 1,0
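
For larger clusters, the same check can be scripted instead of reading the output by eye. The following is a rough sketch that assumes the `--describe` output format shown above; the helper name `find_fragile_topics` is hypothetical, and topics whose summary line does not list min.insync.replicas are skipped because nothing can be concluded from the output alone.

```python
import re

# Matches only the per-topic summary lines; partition lines are indented
# and carry no ReplicationFactor field, so they are ignored.
SUMMARY = re.compile(
    r"^Topic:\s*(?P<name>\S+).*?ReplicationFactor:\s*(?P<rf>\d+).*?Configs:\s*(?P<configs>\S*)"
)


def find_fragile_topics(describe_output: str) -> list[str]:
    """Return topics whose replication factor leaves no headroom over min.insync.replicas."""
    fragile = []
    for line in describe_output.splitlines():
        match = SUMMARY.match(line)
        if match is None:
            continue
        configs = dict(
            item.split("=", 1) for item in match["configs"].split(",") if "=" in item
        )
        if "min.insync.replicas" not in configs:
            continue  # value not shown in this output, nothing to compare
        if int(match["rf"]) - int(configs["min.insync.replicas"]) < 1:
            fragile.append(match["name"])
    return fragile


# Shortened copy of the listing from this issue.
SAMPLE = "\n".join([
    "Topic: strimzi.cruisecontrol.metrics    TopicId: wCyWgDzaT2GJdonrPxlHIw PartitionCount: 1       ReplicationFactor: 3    Configs: min.insync.replicas=1,cleanup.policy=delete",
    "        Topic: strimzi.cruisecontrol.metrics    Partition: 0    Leader: 2       Replicas: 2,0,1 Isr: 2,0,1",
    "Topic: strimzi.cruisecontrol.partitionmetricsamples     TopicId: 1Z9zrBOkTC62WTTy2nmj6Q PartitionCount: 32      ReplicationFactor: 2    Configs: min.insync.replicas=2,cleanup.policy=delete",
    "Topic: strimzi.cruisecontrol.modeltrainingsamples       TopicId: cjf3IRd4QemE9ESDbAahdw PartitionCount: 32      ReplicationFactor: 2    Configs: min.insync.replicas=2,cleanup.policy=delete",
])

print(find_fragile_topics(SAMPLE))
```

On the sample this reports the two sample-store topics and leaves strimzi.cruisecontrol.metrics alone.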

It seems to me that this is caused by the replication factor being set to 2 by Cruise Control or Strimzi, while min.insync.replicas=2 is inherited from the Kafka cluster configuration. This should be fixed. It should either follow the defaults when creating the topics or, if it wants to use its own replication factor, it should also make sure to use its own min.insync.replicas.

@kyguy
Member

kyguy commented Dec 15, 2023

It seems to me that this is caused by the replication factor being set to 2 by Cruise Control or Strimzi, and the min.insync.replicas=2 is inherited from the Kafka cluster configuration

Yes, you are right: the Cruise Control configuration sets the replication factor of these topics to 2 by default, and the min.insync.replicas value is inherited from the Kafka cluster configuration.

It should either follow the defaults when creating the topics

This is the way to go. The CC topics should inherit the replication factor from the Kafka configuration by default, just as they inherit the min.insync.replicas value.
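A rough sketch of that defaulting rule, to show the intended precedence. This is not the actual change: the function name is hypothetical, the Cruise Control key name `sample.store.topic.replication.factor` is an assumption that should be checked against the Cruise Control configuration reference, and the fallback of 1 mirrors Kafka's own default for `default.replication.factor`.

```python
# Assumed Cruise Control key for the sample store topics' replication factor.
CC_RF_KEY = "sample.store.topic.replication.factor"
# Kafka broker setting used for topics created without an explicit replication factor.
KAFKA_DEFAULT_RF_KEY = "default.replication.factor"


def resolve_sample_store_rf(cc_config: dict, kafka_config: dict) -> int:
    """Pick the replication factor for the Cruise Control sample store topics."""
    # An explicit Cruise Control setting from the user always wins.
    if CC_RF_KEY in cc_config:
        return int(cc_config[CC_RF_KEY])
    # Otherwise follow the Kafka cluster default instead of a hard-coded 2.
    return int(kafka_config.get(KAFKA_DEFAULT_RF_KEY, 1))


# Illustrative values only: a cluster defaulting to 3 replicas with min.insync.replicas 2.
kafka_config = {"default.replication.factor": 3, "min.insync.replicas": 2}

print(resolve_sample_store_rf({}, kafka_config))              # inherited from Kafka
print(resolve_sample_store_rf({CC_RF_KEY: 2}, kafka_config))  # explicit user override
```

With the replication factor inherited this way, it comes from the same cluster configuration as min.insync.replicas, so a cluster whose own defaults leave headroom passes that headroom on to the Cruise Control topics.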

I'll submit a PR shortly

@scholzj
Member Author

scholzj commented Jan 11, 2024

Triaged on the community call on January 11, 2024: should be fixed.

@scholzj
Member Author

scholzj commented Feb 10, 2024

Done in #9471

@scholzj scholzj closed this as completed Feb 10, 2024

2 participants