Prevent broker scaledown if broker contains partition replicas - Part 1 #9042
Conversation
I left some nits. You should also consider adding some tests.
You should also add a note to the CHANGELOG.md about this.
@scholzj Hi, I have incorporated all the changes; just the CHANGELOG mention is left. Does this look good apart from that?
/**
 * Annotation used to bypass the broker scale-down mechanism
 */
public static final String ANNO_STRIMZI_IO_SKIP_BROKER_SCALEDOWN_CHECK = STRIMZI_DOMAIN + "skip-broker-scaledown-check";
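If `STRIMZI_DOMAIN` resolves to `strimzi.io/` (an assumption here, consistent with other Strimzi annotations), a user would opt out of the new check by annotating the `Kafka` custom resource, for example:

```yaml
# Hypothetical usage sketch; assumes STRIMZI_DOMAIN == "strimzi.io/"
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
  annotations:
    strimzi.io/skip-broker-scaledown-check: "true"
```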
Note for the future: depending on what is merged first, this should be reconciled with #9103, as this is a public annotation. (Nothing to do right now; we just need to keep it in mind.)
Pushed the changes @scholzj. Just one doubt, though:
I don't think the CHANGELOG should supply docs. There should be docs covering the feature, and those should list the annotation. If you want, you could also mention it in some log message. But in the CHANGELOG it seems unnecessary.
Okay, thanks Jakub
Nice job 👍
/azp run regression
Azure Pipelines successfully started running 1 pipeline(s).
@ShubhamRwt Can you please rebase it and move the new annotation to the
Sure @scholzj. I will do that
@@ -367,6 +367,22 @@ public Set<NodeRef> nodes() {
        return nodes;
    }

    /**
     * Generates list of references to Kafka nodes going to be removed from the Kafka cluster. The references contain both the pod name and
     * the ID of the Kafka node.
Is this description wrong? I mean, it's returning just the ID, not the pod name, right?
    return Future.succeededFuture();
} else {
    return brokerScaleDownOperations.canScaleDownBrokers(reconciliation, kafka.removedNodes(), secretOperator, adminClientProvider)
            .compose(s -> {
Can we use a more meaningful name for variable s? :)
 * @param adminClientProvider   Used to create the Admin client instance
 * @param idsToBeRemoved        Ids to be removed
 *
 * @return returns a boolean future based on the outcome of the check
Wrong return description
 * @param reconciliation        Reconciliation marker
 * @param secretOperator        Secret operator for working with Secrets
 * @param adminClientProvider   Used to create the Admin client instance
 * @param idsToBeRemoved        Ids to be removed
Just a nit: can you list the parameters in the same order as the method signature, please?
final Future<Collection<TopicDescription>> descriptions;
try {
    String bootstrapHostname = KafkaResources.bootstrapServiceName(reconciliation.name()) + "." + reconciliation.namespace() + ".svc:" + KafkaCluster.REPLICATION_PORT;
    LOGGER.debugCr(reconciliation, "Creating AdminClient for cluster {}/{}", reconciliation.namespace());
The debug message expects two parameters/placeholders, but you are passing only one: the namespace.
    } else {
        namesPromise.complete(names);
    }
});
Should we have some VertxUtils class method to convert a Kafka future into a Vert.x one?
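Such a helper (hypothetical, not part of this PR) would centralize the completion-bridging pattern used in both snippets. A stdlib-only sketch of the idea, using `CompletableFuture` as a stand-in for both the Kafka client's `KafkaFuture` (source) and the Vert.x `Promise` (target):

```java
import java.util.concurrent.CompletableFuture;

public class Main {
    // Hypothetical sketch of the suggested utility. The real helper would accept
    // a KafkaFuture and complete a Vert.x Promise; the bridging logic is the same.
    static <T> CompletableFuture<T> toVertxFuture(CompletableFuture<T> kafkaFuture) {
        CompletableFuture<T> promise = new CompletableFuture<>();
        // Same whenComplete pattern as in both review snippets:
        // fail the promise on error, otherwise complete it with the result.
        kafkaFuture.whenComplete((result, error) -> {
            if (error != null) {
                promise.completeExceptionally(error);
            } else {
                promise.complete(result);
            }
        });
        return promise;
    }

    public static void main(String[] args) {
        CompletableFuture<String> names = CompletableFuture.completedFuture("my-topic");
        System.out.println(toVertxFuture(names).join());
    }
}
```

With a shared helper like this, both the topic-names and topic-descriptions callbacks would collapse to a single call instead of repeating the if/else completion logic.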
    } else {
        descPromise.complete(tds.values());
    }
});
Ditto as above?
Signed-off-by: ShubhamRwt <shubhamrwt02@gmail.com>
/azp run regression
Azure Pipelines successfully started running 1 pipeline(s).
@ShubhamRwt @henryZrncik It still seems to be failing... is there one more place where it needs to be fixed? :-/
I tested it a bit, and yes, it looks like it passes when I disable the broker scale-down check.
Would it be fine @scholzj @henryZrncik to modify issue #9134 itself and add this failing test to the issue description, instead of creating a separate issue?
@ShubhamRwt I guess it's not good practice to disable all tests (checks) that are failing. The one check which is now disabled makes sense, but this test -
The given test does not contain anything that should cause such an error (the test itself is currently passing, so the failure really is caused by this change). This is basically the only test that included node pools and any kind of scaling and was part of the regression run, so I think it is worth investigating how scaling up/down works with regard to node pools and this PR. I am writing this mainly because the failure occurs during scale-up (there is nothing that should prevent it). As @im-konge mentioned, disabling the first test was a good choice, since we noted here that it needs to be addressed after adding this new functionality, but this test is something that should work even after the change.
Let's see if there are more tests failing with NodePools. |
/azp run feature-gates-regression
Azure Pipelines successfully started running 1 pipeline(s).
@ShubhamRwt is not disabling these tests, is he? He is disabling his check.
@henryZrncik Are you really sure the test is correct today? I suspect it is not; it is likely passing mostly by luck. You scale down broker 6, which might contain partition replicas, without rescheduling them to other brokers. So I think the test is at fault, not Shubham's code.
I misread it; also, I thought that the test was correct, but you mentioned it works mostly by luck. Anyway, if it's a test issue and we should update/fix the test, @ShubhamRwt can disable the check for now.
I have now disabled the broker scale-down check for the failing test.
/azp run regression
Azure Pipelines successfully started running 1 pipeline(s).
@@ -229,7 +229,7 @@ void testKafkaNodePoolBrokerIdsManagementUsingAnnotations(ExtensionContext exten

     Kafka kafka = KafkaTemplates.kafkaPersistent(testStorage.getClusterName(), 1, 1)
         .editOrNewMetadata()
-            .addToAnnotations(Annotations.ANNO_STRIMZI_IO_NODE_POOLS, "enabled")
+            .addToAnnotations(Map.of(Annotations.ANNO_STRIMZI_IO_NODE_POOLS, "enabled", Annotations.ANNO_STRIMZI_IO_SKIP_BROKER_SCALEDOWN_CHECK, "true"))
Maybe adding a comment here would be helpful for whoever will fix those two tests, thanks! :)
I was wondering if I should mention this failing test in issue #9134 itself? I have already left a comment about this issue in the other failing test.
Wdyt?
I think it is a good idea. We can change these tests at the same time anyway.
Okay, I will update the issue then. Thanks @henryZrncik
Done
/azp run regression
Azure Pipelines successfully started running 1 pipeline(s).
/azp run feature-gates-regression
Azure Pipelines successfully started running 1 pipeline(s).
/azp run kraft-regression
Azure Pipelines successfully started running 1 pipeline(s).
Type of change
Select the type of your PR
Description
This PR introduces a mechanism to stop scale-down if the broker contains partition replicas. If partition replicas are present on the broker, the scale-down is halted and the reconciliation fails. In the second part of this work, we plan to revert the replicas instead of just failing the reconciliation.
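As a rough model of the check (not the PR's actual PreventBrokerScaleDownCheck code, which queries topic metadata through the Kafka Admin client), the decision reduces to: does any partition of any topic keep a replica on one of the brokers being removed?

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public class Main {
    // Simplified model: topic name -> partitions, each partition -> replica broker IDs.
    // The real implementation derives this from TopicDescription via the Admin client.
    static boolean containsPartitionReplicas(Map<String, List<List<Integer>>> topics,
                                             Set<Integer> idsToBeRemoved) {
        return topics.values().stream()
                .flatMap(List::stream)   // all partitions of all topics
                .flatMap(List::stream)   // all replica broker IDs
                .anyMatch(idsToBeRemoved::contains);
    }

    public static void main(String[] args) {
        Map<String, List<List<Integer>>> topics = Map.of(
                "orders", List.of(List.of(0, 1), List.of(1, 2)));

        // Broker 2 still hosts a replica of partition 1, so scale-down must be blocked.
        System.out.println(containsPartitionReplicas(topics, Set.of(2))); // true
        // Broker 3 hosts no replicas, so removing it is safe.
        System.out.println(containsPartitionReplicas(topics, Set.of(3))); // false
    }
}
```

When the check returns true, the operator fails the reconciliation (unless the skip annotation is set); Part 2 is planned to revert the replica count instead.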
Checklist
Please go through this checklist and make sure all applicable tasks have been done