Add JBOD support to KRaft mode #9936
/azp run regression
Azure Pipelines successfully started running 1 pipeline(s).
/azp run kraft-regression
Azure Pipelines successfully started running 1 pipeline(s).
/azp run kraft-regression
Azure Pipelines successfully started running 1 pipeline(s).
/azp run migration
Azure Pipelines successfully started running 1 pipeline(s).
/azp run migration
Azure Pipelines successfully started running 1 pipeline(s).
Looks great. I left a few suggestions.
api/src/main/java/io/strimzi/api/kafka/model/kafka/EphemeralStorage.java (outdated)
api/src/main/java/io/strimzi/api/kafka/model/kafka/EphemeralStorage.java (outdated)
api/src/main/java/io/strimzi/api/kafka/model/kafka/PersistentClaimStorage.java (outdated)
api/src/main/java/io/strimzi/api/kafka/model/kafka/PersistentClaimStorage.java (outdated)
api/src/main/java/io/strimzi/api/kafka/model/kafka/SingleVolumeStorage.java (outdated)
documentation/modules/operators/ref-operator-cluster-feature-gates.adoc (outdated)
0b59e0d to 4503020
/azp run migration
Azure Pipelines successfully started running 1 pipeline(s).
/azp run kraft-regression
Azure Pipelines successfully started running 1 pipeline(s).
/azp run regression
Azure Pipelines successfully started running 1 pipeline(s).
I did a first pass and added some comments. I will give it a run before adding more feedback (if any).
api/src/main/java/io/strimzi/api/kafka/model/kafka/KRaftMetadataStorage.java (outdated)
cluster-operator/src/main/java/io/strimzi/operator/cluster/model/nodepools/NodePoolUtils.java
cluster-operator/src/test/java/io/strimzi/operator/cluster/model/VolumeUtilsTest.java
Signed-off-by: Jakub Scholz <www@scholzj.com>
4503020 to 42149d3
...emtest/src/test/java/io/strimzi/systemtest/rollingupdate/AlternativeReconcileTriggersST.java (outdated)
systemtest/src/test/java/io/strimzi/systemtest/kafka/KafkaST.java (outdated)
systemtest/src/test/java/io/strimzi/systemtest/kafka/KafkaST.java (outdated)
Signed-off-by: Jakub Scholz <www@scholzj.com>
@scholzj while testing this from a KRaft migration perspective I saw the following issue ...

    KRAFT_LOG_DIR=$(grep "log\.dirs=" /tmp/strimzi.properties | sed "s/log\.dirs=*//")
    # when in ZooKeeper mode, the __cluster_metadata folder should not exist.
    # if it does, it means a KRaft migration rollback is ongoing and it has to be removed.
    if [ -d "$KRAFT_LOG_DIR/__cluster_metadata-0" ]; then
      echo "Removing __cluster_metadata folder"
      rm -rf "$KRAFT_LOG_DIR/__cluster_metadata-0"
    fi

Of course it can't work when the

    # when in ZooKeeper mode, the __cluster_metadata folder should not exist.
    # if it does, it means a KRaft migration rollback is ongoing and it has to be removed.
    # also checking that metadata state is ZK (0), because if it's MIGRATION (2) it means we are rolling back but not finalized yet and KRaft quorum is still in place.
    CURRENT_KRAFT_METADATA_LOG_DIR=$(ls -d /var/lib/kafka/data-*/kafka-log"$STRIMZI_BROKER_ID"/__cluster_metadata-0 2> /dev/null || true)
    if [[ -d "$CURRENT_KRAFT_METADATA_LOG_DIR" ]] && [ "$STRIMZI_KAFKA_METADATA_CONFIG_STATE" -eq 0 ]; then
      echo "Removing __cluster_metadata folder"
      rm -rf "$CURRENT_KRAFT_METADATA_LOG_DIR"
    fi

Deleting the The

I also think that the check is testing a "wrong" folder, maybe because the tests are not using JBOD at all (even with just one disk) but the persistent storage so the path is just

@im-konge I think the test should be fixed by using JBOD and even with multiple disks support when this PR is merged.
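The rollback check discussed in this comment can be sketched as a small shell function. This is only an illustration, not the script from the PR: the function name `remove_kraft_metadata_dirs` and its three arguments are hypothetical, while the `data-*/kafka-log<id>/__cluster_metadata-0` layout and the state value 0 for ZooKeeper are taken from the snippet above. It iterates over the glob matches instead of capturing `ls` output in a single variable, so it still behaves sensibly if more than one volume matches.

```shell
# Hypothetical helper (name and arguments are assumptions for this sketch).
#   $1 - root directory that holds the data-* volumes
#   $2 - broker ID
#   $3 - metadata config state (0 = ZooKeeper, 2 = migration, as in the comment above)
remove_kraft_metadata_dirs() {
  local root="$1"
  local broker_id="$2"
  local state="$3"
  local dir

  # While the state is not ZooKeeper (0), the KRaft quorum may still be in place:
  # leave the metadata log alone.
  if [ "$state" -ne 0 ]; then
    return 0
  fi

  # The glob expands to one path per matching volume. If nothing matches,
  # the pattern stays literal and simply fails the -d test below.
  for dir in "$root"/data-*/kafka-log"$broker_id"/__cluster_metadata-0; do
    if [ -d "$dir" ]; then
      echo "Removing $dir"
      rm -rf "$dir"
    fi
  done
}
```

Under these assumptions it would be called as `remove_kraft_metadata_dirs /var/lib/kafka "$STRIMZI_BROKER_ID" "$STRIMZI_KAFKA_METADATA_CONFIG_STATE"`.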
@ppatierno Can you maybe comment on some exact parts of the code? Because the comment is quite confusing and it is not clear to me what parts are referring to what code etc. Also please keep in mind that migration with JBOD has a separate task and this PR does not really intend to enable migration with JBOD in any way (I assume there are some checks etc.).
I know but this PR is enabling JBOD disks in KRaft and migration rollback is not going to work from this perspective anymore.
The check about not allowing migration with multiple JBOD disk in 0.40.0 release relies on the

    The Kafka cluster my-cluster is invalid: [Using more than one disk in a JBOD storage is currently not supported when the UseKRaft feature gate is enabled (in KafkaNodePool kafka)]

Now that validation additional check

So the current PR allows the migration with multiple JBOD disks and it works fine but has a problem on the rollback which I described.
As already written, I can fix the migration rollback with multiple JBOD disks in a different PR so this one LGTM.
/azp run kraft-regression
Azure Pipelines successfully started running 1 pipeline(s).
Description
This PR adds JBOD support to KRaft-based Apache Kafka clusters. It is implemented based on the Strimzi Proposal SP#67.
Some notable comments about the implementation:
StorageDiff class. While it does not fit into it based on how it is done, having it in the same class allows us to reject this change while continuing the reconciliation process instead of just throwing an exception.

This should resolve #9437.