Allow reconciliation to continue when scale-down is blocked due to brokers still in use #9585
Conversation
@ppatierno Would you mind having a quick look at whether the solution for this makes sense to you, before I write all the tests etc.?
The overall logic seems to be reasonable to me. I left a couple of questions.
I will start crying about conflicts I see with my migration PR later ...
...perator/src/main/java/io/strimzi/operator/cluster/operator/assembly/KafkaClusterCreator.java
/azp run regression
Azure Pipelines successfully started running 1 pipeline(s).
…okers still in use - Closes strimzi#9232 Signed-off-by: Jakub Scholz <www@scholzj.com>
Force-pushed from 418fe7d to ff5eb9f
Signed-off-by: Jakub Scholz <www@scholzj.com>
/azp run regression
Azure Pipelines successfully started running 1 pipeline(s).
/azp run feature-gates-regression
Azure Pipelines successfully started running 1 pipeline(s).
/azp run kraft-regression
Azure Pipelines successfully started running 1 pipeline(s).
LGTM. I saw multiple TODOs which should be removed.
LGTM, thanks
Signed-off-by: Jakub Scholz <www@scholzj.com>
/azp run regression
Azure Pipelines successfully started running 1 pipeline(s).
…okers still in use (strimzi#9585) Signed-off-by: John McPeek <juan.mcpeek@gmail.com>
…okers still in use (strimzi#9585) Signed-off-by: Jakub Scholz <www@scholzj.com>
Type of change
Description
Currently, when a user tries to scale down the Kafka cluster but the brokers that would be removed are still in use, we just throw an exception and fail the reconciliation. This works fine in general. But it makes it hard to build any automation on top of it, since it essentially blocks all other operational tasks. For example, if a user scales down the Kafka cluster by mistake, the operator starts throwing errors and will not proceed with other work such as renewing certificates.
This PR tries to address it by reverting the scale-down changes and allowing the reconciliation to proceed as usual. It takes a pragmatic approach and reverts all scale-downs, even if some of the brokers would be empty and it would be possible to shut them down. Since Kafka has no fencing, this could mean that the brokers which were empty get new topics assigned. But it simplifies the logic significantly, because it does not need to deal with situations such as a user trying to scale down brokers 3 and 4 where broker 3 is empty and broker 4 is still in use. In such a case, we cannot simply correct the replica count by 1, as that would mean broker 3 (the empty one) is kept and broker 4 is removed => so the scale-down would still be blocked with an exception.
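The all-or-nothing revert described above can be sketched roughly as follows. This is a minimal illustration only; the class, method, and parameter names are hypothetical and are not Strimzi's actual API:

```java
import java.util.Set;

// Hypothetical sketch: if any broker that a scale-down would remove still
// hosts partition replicas, revert the whole scale-down instead of failing
// the reconciliation (even if some of the removed brokers are empty).
public class ScaleDownCheckSketch {
    /** Returns the broker count the reconciliation should actually use. */
    static int brokerCountToUse(int currentReplicas, int desiredReplicas, Set<Integer> brokersInUse) {
        if (desiredReplicas >= currentReplicas) {
            return desiredReplicas; // not a scale-down => nothing to check
        }
        // Brokers that the scale-down would remove: [desiredReplicas, currentReplicas)
        for (int broker = desiredReplicas; broker < currentReplicas; broker++) {
            if (brokersInUse.contains(broker)) {
                // At least one removed broker still has partition replicas:
                // revert the whole scale-down and continue reconciling
                return currentReplicas;
            }
        }
        return desiredReplicas; // all removed brokers are empty => safe to scale down
    }

    public static void main(String[] args) {
        // Brokers 3 and 4 would be removed; broker 4 still hosts replicas,
        // so the whole scale-down is reverted (broker 3 stays too)
        System.out.println(brokerCountToUse(5, 3, Set.of(0, 1, 2, 4)));
        // No removed broker is in use => the scale-down proceeds
        System.out.println(brokerCountToUse(5, 3, Set.of(0, 1, 2)));
    }
}
```

This mirrors the pragmatic choice above: the check never partially corrects a scale-down, it either accepts it fully or reverts it fully.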
From the technical perspective, this PR introduces a new class `KafkaClusterCreator` that contains the logic. This class is used to create the `KafkaCluster` instance (it wraps the `KafkaCluster.fromCrd(...)` method). The `KafkaClusterCreator` is used by the `KafkaAssemblyOperator`, and the created `KafkaCluster` instance is used to create the `KafkaReconciler`. I considered some other options as well, but rejected them in the end:

* … `KafkaReconciler` => I rejected it because it seemed to push a lot of logic into the constructor and also had issues with asynchronous code. The advantage of the current approach is that it keeps the constructor clean, with only assignments of parameters to object fields.
* … `KafkaReconciler`. This would work quite nicely. But it would mean that the `KafkaCluster` object is not final in the `KafkaReconciler` class. That seemed like something that can be tricky to maintain in the future and is something to avoid.
* … `KafkaCluster` object and just modifying the node assignments inside it. This would probably be easier for the initial implementation. But it might be harder to maintain, as it would mean the fields inside the `KafkaCluster` will not be final, will be subject to change, and we will need to be careful about what parts are changed and where and when they are used. Creating a brand new `KafkaCluster` object seemed like a better approach.

This is currently a draft used for a preliminary review of the code. It does not contain any new tests and it does not update existing tests to the new code.
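The design trade-off discussed above (building the corrected model up front in a creator so the reconciler can keep it in a final field, with a constructor that only assigns parameters) can be sketched like this. All names here are illustrative stand-ins, not the actual Strimzi classes or signatures:

```java
import java.util.Set;

// Hedged sketch of the chosen design: a creator applies the scale-down
// revert before construction, so the reconciler never mutates its model.
public class CreatorPatternSketch {
    /** Stand-in for the KafkaCluster model object; immutable once created. */
    record KafkaCluster(int replicas) { }

    /** Stand-in for KafkaClusterCreator: decides the final replica count up front. */
    static KafkaCluster createCluster(int currentReplicas, int desiredReplicas, Set<Integer> brokersInUse) {
        boolean removedBrokerInUse = false;
        for (int broker = desiredReplicas; broker < currentReplicas; broker++) {
            removedBrokerInUse |= brokersInUse.contains(broker);
        }
        // Revert the whole scale-down if any removed broker is still in use
        return new KafkaCluster(removedBrokerInUse ? currentReplicas : desiredReplicas);
    }

    /** Stand-in for KafkaReconciler: receives a finished model and keeps it final. */
    static final class Reconciler {
        private final KafkaCluster cluster; // can stay final: no later correction needed

        Reconciler(KafkaCluster cluster) {
            this.cluster = cluster; // constructor only assigns parameters to fields
        }

        int reconcileReplicas() {
            return cluster.replicas();
        }
    }

    public static void main(String[] args) {
        // Scale-down 5 -> 3 while broker 4 is still in use: the creator reverts it
        Reconciler reconciler = new Reconciler(createCluster(5, 3, Set.of(0, 1, 2, 4)));
        System.out.println(reconciler.reconcileReplicas());
    }
}
```

The point of the shape is the one argued in the list above: because the correction happens before the reconciler is constructed, the model field remains `final` and nothing downstream has to worry about it changing.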
This should resolve #9232.
Checklist