Add JBOD support to KRaft mode #9936

Merged · 6 commits · Apr 16, 2024

Conversation

@scholzj (Member) commented Apr 9, 2024

Type of change

  • Enhancement / new feature

Description

This PR adds JBOD support to KRaft-based Apache Kafka clusters. It is implemented based on the Strimzi Proposal SP#67.

Some notable comments about the implementation:

  • It does not use a dedicated annotation to mark which volume is currently used for the KRaft metadata. Instead, it detects it from the existing storage configuration that is already stored in an annotation.
  • The check whether multiple volumes are marked for KRaft metadata is done in the StorageDiff class. It does not quite fit there given how the class is otherwise used, but having it in the same class allows us to reject this change while continuing the reconciliation process instead of just throwing an exception (a rough sketch of this kind of check is shown after this list).
  • KRaft examples are updated as discussed in the proposal
  • A check for the Kafka version was not explicitly mentioned in the proposal, but it is implemented in this PR to prevent users from deploying Kafka 3.6.x clusters with JBOD by mistake (unfortunately, this caused a lot of changes, as a new field describing the Kafka version had to be passed into the validation, but I believe it is worth it)
  • Issue [ST] Adapt JBOD based system tests to KRaft #9938 was opened to track the JBOD-based STs that have not yet been adapted to KRaft
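
For illustration, here is a rough sketch of the kind of "only one volume may be marked for KRaft metadata" check mentioned in the StorageDiff bullet above. This is not the actual Strimzi code: the JbodVolume record, the isKraftMetadata() accessor, and the validateKRaftMetadataVolumes method are assumptions made up for this example; the real check lives in the StorageDiff class and works against the Strimzi storage model.

import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for a JBOD volume definition (illustrative only).
record JbodVolume(int id, boolean isKraftMetadata) { }

class KRaftMetadataStorageValidation {
    // Collects an error message instead of throwing, so the operator can surface it
    // as a condition and keep the reconciliation going rather than fail with an exception.
    static List<String> validateKRaftMetadataVolumes(List<JbodVolume> volumes) {
        List<String> errors = new ArrayList<>();
        long marked = volumes.stream().filter(JbodVolume::isKraftMetadata).count();
        if (marked > 1) {
            errors.add("Only one JBOD volume can be marked to store the KRaft metadata log, but " + marked + " volumes are marked");
        }
        return errors;
    }
}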

This should resolve #9437.

Checklist

  • Write tests
  • Make sure all tests pass
  • Update documentation
  • Try your changes from Pod inside your Kubernetes and OpenShift cluster, not just locally
  • Reference relevant issue(s) and close them after merging
  • Update CHANGELOG.md

@scholzj added this to the 0.41.0 milestone on Apr 9, 2024
@scholzj (Member Author) commented Apr 9, 2024

/azp run regression

Azure Pipelines successfully started running 1 pipeline(s).

@scholzj (Member Author) commented Apr 9, 2024

/azp run kraft-regression

Azure Pipelines successfully started running 1 pipeline(s).

@scholzj (Member Author) commented Apr 10, 2024

/azp run kraft-regression

Azure Pipelines successfully started running 1 pipeline(s).

@scholzj (Member Author) commented Apr 10, 2024

/azp run migration

Azure Pipelines successfully started running 1 pipeline(s).

@scholzj (Member Author) commented Apr 10, 2024

/azp run migration

Azure Pipelines successfully started running 1 pipeline(s).

@PaulRMellor (Contributor) left a comment

Looks great. I left a few suggestions.

@scholzj (Member Author) commented Apr 11, 2024

/azp run migration

Azure Pipelines successfully started running 1 pipeline(s).

@scholzj (Member Author) commented Apr 11, 2024

/azp run kraft-regression

Azure Pipelines successfully started running 1 pipeline(s).

@scholzj (Member Author) commented Apr 11, 2024

/azp run regression

Azure Pipelines successfully started running 1 pipeline(s).

@ppatierno (Member) left a comment

I did a first pass and added some comments. I will do a run before adding more feedback (if any).

@ppatierno (Member)

@scholzj while testing this from a KRaft migration perspective, I saw the following issue ...
With the current 0.40.0 (which of course does not support more than one JBOD disk in KRaft), the __cluster_metadata folder is deleted when you roll back to ZooKeeper. This happens with the following snippet of code in the kafka_run script when "no KRaft" is detected (see the Using KRaft [false] log).

KRAFT_LOG_DIR=$(grep "log\.dirs=" /tmp/strimzi.properties | sed "s/log\.dirs=*//")

# when in ZooKeeper mode, the __cluster_metadata folder should not exist.
# if it does, it means a KRaft migration rollback is ongoing and it has to be removed.
if [ -d "$KRAFT_LOG_DIR/__cluster_metadata-0" ]; then
  echo "Removing __cluster_metadata folder"
  rm -rf "$KRAFT_LOG_DIR/__cluster_metadata-0"
fi

Of course, this can't work when log.dirs is set with multiple disks, so the check on the existence of a single folder fails (because it's not just one folder but a list of them).
So I have locally changed that part this way ...

# when in ZooKeeper mode, the __cluster_metadata folder should not exist.
# if it does, it means a KRaft migration rollback is ongoing and it has to be removed.
# also checking that the metadata state is ZK (0), because if it's MIGRATION (2) it means we are rolling back but not finalized yet and the KRaft quorum is still in place.
CURRENT_KRAFT_METADATA_LOG_DIR=$(ls -d /var/lib/kafka/data-*/kafka-log"$STRIMZI_BROKER_ID"/__cluster_metadata-0 2> /dev/null || true)
if [[ -d "$CURRENT_KRAFT_METADATA_LOG_DIR" ]] && [ "$STRIMZI_KAFKA_METADATA_CONFIG_STATE" -eq 0 ]; then
  echo "Removing __cluster_metadata folder"
  rm -rf "$CURRENT_KRAFT_METADATA_LOG_DIR"
fi

Deleting the __cluster_metadata folder at the end of the rollback also helps us avoid hitting KAFKA-16463, which is not fixed yet and requires that metadata folder to be deleted before restarting a migration again.

The MigrationST wasn't able to catch this because it doesn't support multiple JBOD disks yet, so the check that __cluster_metadata was deleted is hard-coded this way:

https://github.com/strimzi/strimzi-kafka-operator/blob/main/systemtest/src/test/java/io/strimzi/systemtest/migration/MigrationST.java#L684

I also think that the check is testing the "wrong" folder, maybe because the tests are not using JBOD at all (even with just one disk) but persistent storage, so the path is just /var/lib/kafka/data/.

@im-konge I think the test should be fixed by using JBOD, including support for multiple disks, once this PR is merged.

@scholzj (Member Author) commented Apr 16, 2024

@ppatierno Can you maybe comment on the exact parts of the code? The comment is quite confusing, and it is not clear to me which parts refer to which code.

Also, please keep in mind that migration with JBOD is covered by a separate task, and this PR does not really intend to enable migration with JBOD in any way (I assume there are some checks etc.).

@ppatierno (Member) commented Apr 16, 2024

Also please keep in mind that migration with JBOD has a separate task and this PR does not really intend to enable migration with JBOD in any way

I know, but this PR enables JBOD disks in KRaft, and migration rollback is not going to work anymore from this perspective.
I am anyway fine with getting the migration (rollback) working again in a different PR after this one is merged.
Maybe that's more understandable.

(I assume there are some checks etc.).

The check that disallows migration with multiple JBOD disks in the 0.40.0 release relies on NodePoolUtils.validateNodePools calling validateKRaftJbodStorage, which was changed by this PR. 0.40.0 returns the following error ...

The Kafka cluster my-cluster is invalid: [Using more than one disk in a JBOD storage is currently not supported when the UseKRaft feature gate is enabled (in KafkaNodePool kafka)]                                                       

Now the additional check KafkaVersion.compareDottedVersions(versionChange.to().version(), "3.7.0") < 0 added to that validation "broke" it, so the validation now passes.
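
(For illustration, a minimal sketch of what such a version-gated check looks like in principle. compareDottedVersions below is a simplified local helper, not the real KafkaVersion.compareDottedVersions, and isJbodAllowed is a made-up method name for this example, not the actual Strimzi validation.)

class JbodVersionGate {
    // Simplified dotted-version comparison (numeric segments only), used here as a
    // stand-in for KafkaVersion.compareDottedVersions.
    static int compareDottedVersions(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        for (int i = 0; i < Math.max(pa.length, pb.length); i++) {
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    // Multi-volume JBOD is rejected only when the target Kafka version is older than
    // 3.7.0, the first version with JBOD support in KRaft mode (per the PR description).
    static boolean isJbodAllowed(String targetKafkaVersion, int volumeCount) {
        if (compareDottedVersions(targetKafkaVersion, "3.7.0") < 0) {
            return volumeCount <= 1;
        }
        return true;
    }
}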

So the current PR allows the migration with multiple JBOD disks, and it works fine, but it has the rollback problem I described above.
As I said, we can leave this PR as it is and I will fix the migration rollback in a different PR.

@ppatierno (Member) left a comment

As already written, I can fix the migration rollback with multiple JBOD disks in a different PR, so this one LGTM.

@scholzj (Member Author) commented Apr 16, 2024

/azp run kraft-regression

Azure Pipelines successfully started running 1 pipeline(s).

@scholzj merged commit 5aacdfa into strimzi:main on Apr 16, 2024 · 21 checks passed
@scholzj deleted the add-jbod-support-to-kraft branch on April 16, 2024 at 23:01