Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for KRaft to KRaft upgrades #9393

Merged
merged 4 commits into from
Nov 29, 2023

Conversation

scholzj
Copy link
Member

@scholzj scholzj commented Nov 24, 2023

Type of change

  • Enhancement / new feature

Description

This PR delivers the second part of the KRaft-to-KRaft upgrades according to the [SP#61|https://github.com/strimzi/proposals/blob/main/061-kraft-upgrades-and-downgrades.md]:

  • It adds the KRaft metadata version to the KafkaVersionChange class
  • It renames VersionChangeCreator to ZooKeeperVersionChangeCreator. It keeps being used for ZooKeeper based clusters without any major changes. That minimizes possible impact on the existing Zoo-based clusters.
  • Creates new KRaftVersionChangeCreator that helps to create the KafkaVersionChange instance for KRaft clusters following the proposal and a code similar to what we use for ZooKeeper
  • Creates new interface VersionChangeCreator that is shared between KRaftVersionChangeCreator and ZooKeeperVersionChangeCreator to make the handling of them in KafkaAssemblyOperator easier.
  • Improves ResourceUtils for our unit tests in order to makeit possible to re-use the default Kafka Admin client mock and add additional mocking to it.

Documentation will be updated in a separate PR.

There is a space to improve the KafkaVersionChange class and how it is used in KafkaReconciler. But I decided to not do it in this PR to make it easier to review. And might get back to refactor it later in a follow-up PR.

This PR also contains unit tests and was tested manually as well. The current testing possibilities are limited:

  • Missing support for rolling KRaft controller-only nodes. The upgrade unit test should be changed to use a separate controller and broker pools once supported in the KafkaRoller class. Controller-only nodes were tested manually by deleting the pods.
  • Only after Strimzi 0.39 and Kafka 3.7.0 are released, we will be able to test the proper operator upgrades

But these limitations should not prevent reviews / and merging this.

Checklist

  • Write tests
  • Make sure all tests pass
  • Try your changes from Pod inside your Kubernetes and OpenShift cluster, not just locally
  • Update CHANGELOG.md

@scholzj scholzj added this to the 0.39.0 milestone Nov 24, 2023
@scholzj
Copy link
Member Author

scholzj commented Nov 24, 2023

/azp run kraft-regression

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@scholzj
Copy link
Member Author

scholzj commented Nov 25, 2023

/azp run upgrade

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@scholzj
Copy link
Member Author

scholzj commented Nov 25, 2023

/azp run regression

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@scholzj
Copy link
Member Author

scholzj commented Nov 25, 2023

/azp run feature-gates-regression

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Jakub Scholz <www@scholzj.com>
@scholzj scholzj force-pushed the add-support-for-kraft-to-kraft-upgrades branch from 0b42bb2 to a52a83b Compare November 27, 2023 15:20
@scholzj
Copy link
Member Author

scholzj commented Nov 27, 2023

/azp run upgrade

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@scholzj
Copy link
Member Author

scholzj commented Nov 27, 2023

/azp run kraft-regression

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@scholzj
Copy link
Member Author

scholzj commented Nov 28, 2023

/azp run regression

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor

@katheris katheris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, only a couple of small suggestions

private Future<List<Pod>> getPods() {
Labels selectorLabels = Labels.forStrimziKind(Kafka.RESOURCE_KIND)
.withStrimziCluster(reconciliation.name())
.withStrimziName(KafkaResources.kafkaStatefulSetName(reconciliation.name()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess at some point we should refactor this method to be KafkaResources.kafkaStrimziPodSetName since statefulsets will eventually be removed from Strimzi entirely.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a fair point. I might open some issue for it. But it is a bit more complicated than that:

  • It is still used for some migration and upgrade purposes as StatefulSet name
  • It is actually not a StrimziPodSet name as with node pools it is named after the node pools
  • It is a label used to mark all the Kafka pods regardless of the pool they belong to.

So it is a bit of a chaos :-(

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

//////////

@Test
public void testMetadataVersionAtNoChange() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could potentially be converted to a parametrized test to make it easier to parse the different permutations that we are testing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. I guess possibly it could. But I'm not sure if it would really help with debugging and readability. To be honest, I do not like parametrized tests too much.

Signed-off-by: Jakub Scholz <www@scholzj.com>
Copy link
Member

@ppatierno ppatierno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just two really minor things.

Signed-off-by: Jakub Scholz <www@scholzj.com>
@fvaleri fvaleri self-requested a review November 28, 2023 16:03
Copy link
Contributor

@fvaleri fvaleri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this feature. I left some comments.

I also manually tried to do a safe downgrade that I know is possible (3.5-IV1 -> 3.5-IV0). It worked fine, but logs sometimes use the featureLevel (i.e. 10), and sometimes use release-subVersion (i.e. 3.5-IV1), which can be confusing for users. I think we should stick with the release-subVersion.

2023-11-28 16:34:58 INFO  AbstractOperator:511 - Reconciliation #46(watch) Kafka(test/my-cluster): Kafka my-cluster in namespace test was MODIFIED
2023-11-28 16:34:58 INFO  AbstractOperator:265 - Reconciliation #46(watch) Kafka(test/my-cluster): Kafka my-cluster will be checked for creation or modification
2023-11-28 16:34:59 INFO  KRaftMetadataManager:168 - Reconciliation #46(watch) Kafka(test/my-cluster): Updating metadata version from 10 to 9
2023-11-28 16:34:59 INFO  KRaftMetadataManager:118 - Reconciliation #46(watch) Kafka(test/my-cluster): Successfully updated metadata version to 3.5-IV0
2023-11-28 16:34:59 INFO  CrdOperator:122 - Reconciliation #46(watch) Kafka(test/my-cluster): Status of Kafka my-cluster in namespace test has been updated
2023-11-28 16:34:59 INFO  AbstractOperator:511 - Reconciliation #47(watch) Kafka(test/my-cluster): Kafka my-cluster in namespace test was MODIFIED
2023-11-28 16:34:59 INFO  AbstractOperator:265 - Reconciliation #47(watch) Kafka(test/my-cluster): Kafka my-cluster will be checked for creation or modification
2023-11-28 16:34:59 INFO  AbstractOperator:537 - Reconciliation #46(watch) Kafka(test/my-cluster): reconciled

@@ -10,6 +10,7 @@
If needed, `UnidirectionalTopicOperator` can be disabled in the feature gates configuration in the Cluster Operator.
* Improved Kafka Connect metrics and dashboard example files
* Allow specifying and managing KRaft metadata version
* Add support for KRaft to KRaft upgrades (Apache Kafka upgrades for the KRaft based clusters)
Copy link
Contributor

@fvaleri fvaleri Nov 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add the min supported Strimzi version in this note (the first release using kafkaVersion pod annotation)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean with min supported Strimzi version?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We handle the case where pods are there but without kafka-version annotation, which results in KafkaUpgradeException. So I think adding a note to avoid that experience would be good (for users that actually read release notes and documentation).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm still not sure what is it you are asking for.

Signed-off-by: Jakub Scholz <www@scholzj.com>
@scholzj
Copy link
Member Author

scholzj commented Nov 28, 2023

logs sometimes use the featureLevel (i.e. 10), and sometimes use release-subVersion (i.e. 3.5-IV1), which can be confusing for users.

@fvaleri You are right, this could be confusing. I opened a separate PR to fix it as it is not really related to this PR.

Copy link
Contributor

@fvaleri fvaleri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks.

@scholzj scholzj merged commit 5427403 into strimzi:main Nov 29, 2023
13 checks passed
@scholzj scholzj deleted the add-support-for-kraft-to-kraft-upgrades branch November 29, 2023 10:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants