Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set default replication factor for CC sample topics #9471

Merged
merged 1 commit into from Feb 10, 2024

Conversation

kyguy
Copy link
Member

@kyguy kyguy commented Dec 15, 2023

Type of change

  • Bugfix

Description

Addresses #9469

At this time, Cruise Control does not provide a min.insync.replicas configuration for sample store topics in its properties file. Therefore, to set the min.insync.replicas configuration for sample store topics, a user must either:

  1. Create the topics manually with the desired min.insync.replicas configuration value
  2. Use the Kafka cluster's min.insync.replica configuration

The issue here is that Cruise Control sets the replication factor of sample store topics to 2 by default and the Strimzi example files set the min.insync.replica of the Kafka clusters to 2. This causes issues when Strimzi preforms rolling updates on the cluster as the sample store topics easily become under replicated.

This PR sets the default replication factor of sample store topics, sample.store.topic.replication.factor, to match that of the Kafka cluster's default.replication.factor.

Checklist

Please go through this checklist and make sure all applicable tasks have been done

  • Write tests
  • Make sure all tests pass
  • Update documentation
  • Check RBAC rights for Kubernetes / OpenShift roles
  • Try your changes from Pod inside your Kubernetes and OpenShift cluster, not just locally
  • Reference relevant issue(s) and close them after merging
  • Update CHANGELOG.md
  • Supply screenshots for visual changes, such as Grafana dashboards

@kyguy kyguy force-pushed the fix-cc-rf branch 3 times, most recently from 076f038 to 5f0d6d7 Compare December 18, 2023 16:56
@kyguy kyguy added this to the 0.40.0 milestone Dec 18, 2023
@kyguy kyguy force-pushed the fix-cc-rf branch 3 times, most recently from 1b311e3 to 0dbb79b Compare December 19, 2023 22:46
@kyguy kyguy marked this pull request as ready for review December 20, 2023 02:28
Copy link
Member

@scholzj scholzj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder a bit whether this is the right fix. Do we set the replication factor to 2 in Strimzi? Or does Cruise Control do it? If Cruise Control does it, what are the reasons? Isn't then the right fix for Cruise Control to also set the min.insync.replicas to 1 and thus create a proper topic configuration?

Copy link
Contributor

@fvaleri fvaleri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cruise Control does not set minISR equal to cluster default, we do.

Cruise Control sets sample.store.topic.replication.factor=2 for these internal topics, and looks at the cluster.configs.file for cluster configurations. The sample config/clusterConfigs.json contains min.insync.replicas=1, which is correct.

IMO, we should simply remove the code I linked, because our default minISR for CC is also 1.

@scholzj
Copy link
Member

scholzj commented Dec 20, 2023

I guess we need to decide what is the right way to handle it. WDYT @ppatierno?

@ShubhamRwt
Copy link
Contributor

ShubhamRwt commented Dec 20, 2023

Going through the Cruise Control docs it looks like that the replication factor is set to 2 for internal topics like sample.store.topic.replication.factor. IMO I think what @fvaleri suggest would be an simple fix for this problem since we will be using the default replication factor and min-ISR. I am not sure about the reason why we are setting the min-isr to match that of kafka clusters(maybe to stay consistent with the kafka cluster) but if we have a reason for that then I guess @kyguy solution would be the way to go

@kyguy
Copy link
Member Author

kyguy commented Dec 20, 2023

Do we set the replication factor to 2 in Strimzi? Or does Cruise Control do it?

Cruise Control sets it by default

If Cruise Control does it, what are the reasons?

From what I understand Cruise Control provides a min.insync.replicas value of 1 in their sample config/clusterConfigs.json file because that is default config value for min.insync.replicas for Apache Kafka brokers [1] . Following that, CC sets their sample store topic replication factors to 2 by default since losing sample store topics will only slow down the generation of a rebalance proposal, which is an inconvenience but not a death sentence.

Isn't then the right fix for Cruise Control to also set the min.insync.replicas to 1 and thus create a proper topic configuration?

Yes. We have an issue upstream [2] for this which I am hoping to fix/address. That being said, until that upstream issue is addressed, we could use this fix as a work around or leave the code as it is for the time being.

Cruise Control sets sample.store.topic.replication.factor=2 for these internal topics, and looks at the cluster.configs.file for cluster configurations. The sample config/clusterConfigs.json contains min.insync.replicas=1, which is correct.

It is the responsibility of the CC user (or in our case, Strimzi) to maintain the config/clusterConfigs.json with the Kafka cluster's min.insync.replicas configuration. This cluster configuration is used by Cruise Control when flagging under under replicated partitions. [3] If we don't set the config/clusterConfigs.json config with our cluster's min.insync.replicas config, the data reported by Cruise Control will be incorrect. e.g. when a:

  • Kafka cluster's min.insync.replicas is set to 2
  • Kafka cluster's default.replication.factor is set to 3
  • Cruise Control min.insync.replicas is left at its default of 1

A partition that is under replicated, 2 replicas behind, will not get flagged by Cruise Control. So (1) The user will not know their partition replication is behind/lagging from CC data (2) Cruise Control will not factor in this partition replication lag when it is creating a propose which could lead to a rebalance which could cause a cluster to become more unbalanced or worse, lose data.

maybe to stay consistent with the kafka cluster

Yes, this is key and why we do it. Cruise Control depends on this information to maintain correct state in its cluster model

[1] https://kafka.apache.org/documentation/#brokerconfigs_min.insync.replicas
[2] linkedin/cruise-control#2093
[3] https://github.com/linkedin/cruise-control/blob/aeffe0e3025328d2670765595ba533387221a2e9/cruise-control/src/main/java/com/linkedin/kafka/cruisecontrol/KafkaClusterState.java#L83-L137

@scholzj
Copy link
Member

scholzj commented Dec 20, 2023

I guess I'm a bit confused on how to interpret this.

CC sets their sample store topic replication factors to 2 by default since losing sample store topics will only slow down the generation of a rebalance proposal, which is an inconvenience but not a death sentence.

One has to wonder if we should override it with Kafka's defaults if CC intentionally chose 2. It seems more like it just needs to set the min.insync.replicas for it to 1 or 2 explicitly as well? What happens if the default min.insync.replicas would be set to 3 (that happens for some users running across 2 DCs only). Would Cruise Control break because it would never work?

Yes. We have an issue upstream [2] for this which I am hoping to fix/address. That being said, until that upstream issue is addressed, we could use this fix as a work around or leave the code as it is for the time being.

Does it really make sense to pursue both this fix and the CC fix? Once CC has some default for it, would you need to change this again? Or how would the user choose between using the Kafka's default and the CC default? It does not seem to me like one is workaround of the other. But more like two different solutions from which we will need to pick one.

It is the responsibility of the CC user (or in our case, Strimzi) to maintain the config/clusterConfigs.json with the Kafka cluster's min.insync.replicas configuration.

  • Do we generate this file today? Or is this some missing part on Strimzi side?
  • Out of curiosity ... why does Cruise Control not read it form the topics? kafka-topics.sh seems to be able to read it.

@fvaleri
Copy link
Contributor

fvaleri commented Dec 20, 2023

This cluster configuration is used by Cruise Control when flagging under under replicated partitions. [3] If we don't set the config/clusterConfigs.json config with our cluster's min.insync.replicas config, the data reported by Cruise Control will be incorrect.

Ok, but with your patch it can still happen that the user sets RF 3 and minISR 1 at the cluster level, causing the same issue with flagging under under replicated partitions. I think we should have the same defaults that CC has (RF 2, minISR 1) when .kafka.replicas < 3, and set stronger defaults (RF 3, minISR 2) when kafka.replicas >=3. Wdyt?

@kyguy
Copy link
Member Author

kyguy commented Dec 20, 2023

I guess I'm a bit confused on how to interpret this.

I don't believe there is any strong reason behind CC's chosen defaults, regardless, I have asked CC maintainers if there is any specific reason why sample.store.topic.replication.factor defaults to 2. I'll relay the response once I have it.

It seems more like it just needs to set the min.insync.replicas for it to 1 or 2 explicitly as well?

Yes, this would be the most direct/correct approach if we could set the sample.store.topic.min.insync.replicas of the CC sample topics. We will do so in Strimzi once the configuration is available in upstream CC.

What happens if the default min.insync.replicas would be set to 3 (that happens for some users running across 2 DCs only). Would Cruise Control break because it would never work?

Cruise Control logs errors like the following:

org.apache.kafka.common.errors.NotEnoughReplicasException: Messages are rejected since there are fewer in-sync replicas than required.                                                                                                    │
│ 2023-12-20 18:28:12 ERROR KafkaSampleStore:117 - Failed to produce partition metric sample for strimzi.cruisecontrol.partitionmetricsamples-13 of timestamp 1703096873456 due to exception    

but is still able to generate optimization proposals and execute partition rebalances.

Does it really make sense to pursue both this fix and the CC fix? Once CC has some default for it, would you need to change this again?

Depends on how critical the raised issue is, this PR is a temporary fix for it until:
(1) A sample.store.topic.min.insync.replicas config gets added upstream CC [1]
(2) We submit a PR to set this config in Strimzi
Once that new config is added for Strimzi, the code that sets:

CC's `sample.store.topic.replication.factor` ==  to Kafka's `default.replication.factor`

won't be executed or needed and could be removed.

Or how would the user choose between using the Kafka's default and the CC default?

CC's default takes top precedence, a user could set it themselves, if not, Strimzi would set it with a reasonable default based on the Kafka settings.

It does not seem to me like one is workaround of the other. But more like two different solutions from which we will need to pick one.

This PR is a temporary fix for the problem until we add a proper sample.store.topic.min.insync.replicas configuration upstream for CC's sample store topics.

Do we generate this file today? Or is this some missing part on Strimzi side?

Yes, we generate it on the Strimzi side

Out of curiosity ... why does Cruise Control not read it form the topics? kafka-topics.sh seems to be able to read it.

The CC code does attempt to read the min.insync.replicas config of individuals topics. If it is not set, it relies on the default min.insync.replicas config of the Kafka cluster (which is user provided via config/clusterConfig.json file which Strimzi generates). Is it possible to get the the Kafka cluster's default min.insync.replicas value from the topics too?

Ok, but with your patch it can still happen that the user sets RF 3 and minISR 1 at the cluster level, causing the same issue with flagging under under replicated partitions

With or without the patch, a user can configure their cluster in a way to induce under replicated partition issues. The patch is meant to reduce the chance of that happening for CC sample topics by following the cluster level configurations if CC level configurations are not set. With or without the current patch, under replicated partitions are flagged by CC, which is good. However, if we change the solution by hardcoding the minISR for CC at cluster level (via the Strimzi generated config/clusterConfig.json file), it is possible that under replicated partitions ** are not** flagged, which would be bad.

I think we should have the same defaults that CC has (RF 2, minISR 1) when .kafka.replicas < 3, and set stronger defaults (RF 3, minISR 2) when kafka.replicas >=3. Wdyt?

Sounds reasonable to me. When the sample.store.topic.min.insync.replicas config is added upstream [1] we could follow this guideline when configuring it in Strimzi.

[1] linkedin/cruise-control#2093

@fvaleri
Copy link
Contributor

fvaleri commented Dec 21, 2023

@kyguy thanks for the investigation and the RFE, much appreciated.

When the sample.store.topic.min.insync.replicas config is added upstream [1] we could follow this guideline when configuring it in Strimzi.

Why can't we do that right now and then update to make use of the new configuration? TBH, I don't like the idea that the setup of some internal topics that we create indirectly depends on the user's cluster configuration.

@kyguy
Copy link
Member Author

kyguy commented Dec 21, 2023

Why can't we do that right now and then update to make use of the new configuration?

Because we should not mess with the minISR for CC at a cluster level. We should only alter the minISR for CC at a topic level.

I don't like the idea that the setup of some internal topics that we create indirectly depends on the user's cluster configuration.

Any topic created without a topic level replication factor config defaults to Kafka cluster level default.replication.factor. All this PR does is configure the CC sample topics use the Kafka cluster level default.replication.factor if the rf is not set at the topic level (just like any other topic). It does this to avoid CC's hardcoded config. The replication factor of CC's sample store topic should not be hardcoded because it doesn't work for all Kafka clusters.

@fvaleri
Copy link
Contributor

fvaleri commented Dec 21, 2023

we should not mess with the minISR for CC at a cluster level

This is not what I was suggesting. We can just set a default value at the CC class level, as we already do for minISR.

The replication factor of CC's sample store topic should not be hardcoded because it doesn't work for all Kafka clusters.

I think setting RF=2 and minISR=1 as defaults for CC internal topics works for every cluster. If we want to do better, we can set RF=3 and minISR=2 when .kafka.replicas >= 3.

@kyguy
Copy link
Member Author

kyguy commented Dec 21, 2023

We can just set a default value at the CC class level, as we already do for minISR.

I am not sure I understand, how would we set the default value at the CC class level? Which minISR setting?

I think setting RF=2 and minISR=1 as defaults for CC internal topics works for every cluster. If we want to do better, we can set RF=3 and minISR=2 when .kafka.replicas >= 3.

How would we set default minISR as default for CC internal topics?

Copy link
Contributor

@fvaleri fvaleri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I see that min.insync.replicas in /tmp/clusterConfig.json is never used, because KafkaTopicConfigProvider is deprecated, and Cruise Control uses KafkaAdminTopicConfigProvider by default. If you agree, I can open a different PR to remove Cruise Control's cluster.configs.file configuration and related code.

linkedin/cruise-control#1622

This means that min.insync.replicas cluster default always wins, and the fix I had in mind can't work because of that. This also means that your temporary fix makes sense, until we have the new sample.store.topic.min.insync.replicas configuration.

@kyguy
Copy link
Member Author

kyguy commented Jan 2, 2024

Now I see that min.insync.replicas in /tmp/clusterConfig.json is never used, because KafkaTopicConfigProvider is deprecated, and Cruise Control uses KafkaAdminTopicConfigProvider by default. If you agree, I can open a different PR to remove Cruise Control's cluster.configs.file configuration and related code.

Makes sense to me, I hadn't realized the KafkaTopicConfigProvider was deprecated. Since Cruise Control now relies on the KafkaAdminTopicConfigProvider it is safe to remove the cluster.configs.file configuration and related code from Strimzi, let's do it in another PR

This means that min.insync.replicas cluster default always wins, and the fix I had in mind can't work because of that. This also means that your temporary fix makes sense, until we have the new sample.store.topic.min.insync.replicas configuration.

I think this is the best route to go for Strimzi for now, in the meantime, I'll work with upstream Cruise Control maintainers to add a sample.store.topic.min.insync.replicas configuration to upstream Cruise Control.

@kyguy
Copy link
Member Author

kyguy commented Jan 17, 2024

This PR is ready to go, it will take care of the problem safely in the short term while I work with CC maintainers on getting a more permanent solution merged upstream [1]

[1] linkedin/cruise-control#2093

Copy link
Member

@ppatierno ppatierno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking into account the Cruise Control community has't back to us after our comment on the issue, I think we should not wait for them and moving forward with this. We can always revert back if they will get our help to have the feature supported in CC itself.

@scholzj
Copy link
Member

scholzj commented Feb 9, 2024

Can ou please rebase this @kyguy?

Signed-off-by: Kyle Liberti <kliberti@redhat.com>
@scholzj
Copy link
Member

scholzj commented Feb 10, 2024

/azp run regression

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Member

@scholzj scholzj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thakns @kyguy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants