Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFE: Add default min.insync.replicas config for sample store topics #2093

Open
kyguy opened this issue Dec 19, 2023 · 8 comments · May be fixed by #2098
Open

RFE: Add default min.insync.replicas config for sample store topics #2093

kyguy opened this issue Dec 19, 2023 · 8 comments · May be fixed by #2098
Labels
functionality A feature request.

Comments

@kyguy
Copy link
Contributor

kyguy commented Dec 19, 2023

The default replication factor for sample store topics (set by sample.store.topic.replication.factor in the Cruise Control configuration) is 2. Cruise Control also relies on the min.insync.replicas setting of the Kafka cluster it is managing for these sample topics. Therefore, when a Kafka cluster's min.insync.replicas field is set to a value greater than 1 the sample topics could easily become under replicated. To avoid issues such as these, would it make sense to provide a min.insync.replicas config for sample store topics? e.g.

sample.store.topic.min.insync.replicas

in the same manner we do for the metrics reporter topic, cruise.control.metrics.topic.min.insync.replicas

If it sounds like a reasonable request, I am happy to contribute it myself!

@CCisGG
Copy link
Contributor

CCisGG commented Jan 3, 2024

Hi @kyguy, thanks for raising the issue. I have a question here:

Therefore, when a Kafka cluster's min.insync.replicas field is set to a value greater than 1 the sample topics could easily become under replicated.
In that case, why not just increase the config value of sample.store.topic.replication.factor (to 4 or 5 for example)?

Also I wonder if the self-healing would fix the RF of the topic automatically no matter what it is set to.

@kyguy
Copy link
Contributor Author

kyguy commented Jan 3, 2024

In that case, why not just increase the config value of sample.store.topic.replication.factor (to 4 or 5 for example)?

This solution works but it requires the sample.store.topic.replication.factor to be set to a value greater than or equal to the cluster's min.insync.replicas. So users that care about the overhead of hosting the additional replicas are at the mercy of the cluster's min.insync.replicas value.

Also I wonder if the self-healing would fix the RF of the topic automatically no matter what it is set to.

I think it would but it would require the self-healing to be enabled for the CC instance

@mhratson
Copy link
Contributor

mhratson commented Jan 5, 2024

field is set to a value greater than 1 the sample topics could easily become under replicated. To avoid issues such as these,

Hey @kyguy Could you expand why temporarily getting under replicated is an issue?

@kyguy
Copy link
Contributor Author

kyguy commented Jan 5, 2024

Could you expand why temporarily getting under replicated is an issue?

Even if the availability of the sample store topics is not an issue by itself, the warning/error messages related to under replication in application logs can be a nuisance, raising false alarm.

@mhratson
Copy link
Contributor

mhratson commented Jan 8, 2024

logs can be a nuisance, raising false alarm

While i understand the motivation it seems that suppressing the false alarms may be the better solution here than adding configuration that only purpose is to silence false alarms.

While i'm not against it it feels like there must be a better reason to add those configuration settings…

@kyguy
Copy link
Contributor Author

kyguy commented Jan 9, 2024

While i understand the motivation it seems that suppressing the false alarms may be the better solution here than adding configuration that only purpose is to silence false alarms.

How would a user suppress this alarm? From what I understand without a sample store specific sample.store.min.insync.replicas configuration users would need to create the sample store topics manually with the desired min insync replicas config before Cruise Control started or while Cruise Control was stopped. A sample.store.min.insync.replicas config would help users avoid this manual intervention and empower them to configure their sample store topics correctly

@kyguy
Copy link
Contributor Author

kyguy commented Jan 25, 2024

Hey @mhratson just wanted to follow up, would the reason in the previous comment suffice as a justification to add this config?

@ppatierno
Copy link
Contributor

@mhratson while suppressing the false alarm sounds a good reason for not allowing the user to set the min ISR on the samples topics, I think that what @kyguy is asking for is totally reasonable.
Let me elaborate a little bit more ...
We know that Cruise Control has a "hard coded" RF of 2 for the samples store topics but ... it's getting the min ISR from the Kafka configuration, right? You cannot expect that all users are setting min ISR to 1 cluster wide in order to guarantee availability of the samples store topics when a broker goes down. A user should be able to configure the min ISR on samples store topics with the value which makes sense for them (i.e they could use mostly topics with RF = 3, so they want min ISR = 2 cluster wide).
I also understand that temporary availability of the samples store topics is not a problem, but let's imagine a scenario where you have some automation in place. We work on the Strimzi project and the component which is in charge of rolling the brokers when some configuration changes, for example, doesn't roll a broker when it finds out that rolling that broker will make one (or more) topic(s) unavailable. That's exactly what's happening with the Cruise Control samples store topics here.
What's your concern about making the min ISR configurable, as it already allowed for the metrics reporter topic?
Any big issues or drawbacks I don't see here?

@CCisGG CCisGG added the functionality A feature request. label Feb 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
functionality A feature request.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants