
Add support for Kafka 3.7.0 and remove Kafka 3.5.2 #9747

Merged
merged 2 commits into strimzi:main on Feb 28, 2024

Conversation

scholzj
Member

@scholzj scholzj commented Feb 27, 2024

Type of change

Select the type of your PR

  • Bugfix
  • Enhancement / new feature
  • Refactoring
  • Documentation

Description

This PR adds support for Kafka 3.7.0 and removes support for Kafka 3.5.2. It also moves the operators to Kafka 3.7.0 and updates Cruise Control and Snappy to keep them in sync with the versions used by Kafka.
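For reference, the Kafka version is selected through the Kafka custom resource, so 3.7.0 becomes a valid value there. A minimal illustrative snippet (the resource name is an example):

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    version: 3.7.0 # newly supported; 3.5.2 is no longer accepted
    # ...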

Checklist

  • Write tests
  • Make sure all tests pass
  • Update documentation
  • Try your changes from Pod inside your Kubernetes and OpenShift cluster, not just locally
  • Update CHANGELOG.md

Signed-off-by: Jakub Scholz <www@scholzj.com>
@scholzj scholzj added this to the 0.40.0 milestone Feb 27, 2024
@scholzj scholzj marked this pull request as ready for review February 27, 2024 00:55
Signed-off-by: Jakub Scholz <www@scholzj.com>
@scholzj
Member Author

scholzj commented Feb 27, 2024

/azp run regression

Azure Pipelines successfully started running 1 pipeline(s).

@scholzj
Member Author

scholzj commented Feb 27, 2024

/azp run upgrade

Azure Pipelines successfully started running 1 pipeline(s).

@scholzj
Member Author

scholzj commented Feb 27, 2024

/azp run kraft-regression

Azure Pipelines successfully started running 1 pipeline(s).

@scholzj
Member Author

scholzj commented Feb 27, 2024

/azp run feature-gates-regression

Azure Pipelines successfully started running 1 pipeline(s).

@scholzj scholzj merged commit eaf8950 into strimzi:main Feb 28, 2024
39 of 41 checks passed
@scholzj scholzj deleted the add-support-for-kafka-3.7.0 branch February 28, 2024 11:57
@blaghed
Contributor

blaghed commented Apr 24, 2024

Hi @scholzj,

Why does this incompatibility exist at all?

Before, it was checking that only level was returned in the output, which is weirdly restrictive:

for (var e : fetchedLoggers.entrySet()) {
    String level = e.getValue().get("level");
    if (level != null && e.getValue().size() == 1) {
        loggerMap.put(e.getKey(), level);
    } else {
        result.tryFail("Inner map has unexpected keys " + e.getValue().keySet());
        break;
    }
}

Now it checks that only level and last_modified are returned in the output, even though the code still only uses the level:

for (var e : fetchedLoggers.entrySet()) {
    if (Set.of("level", "last_modified").containsAll(e.getValue().keySet())) {
        String level = e.getValue().get("level");
        if (level != null) {
            loggerMap.put(e.getKey(), level);
        }
    } else {
        result.tryFail(new RuntimeException("Inner map has unexpected keys " + e.getValue().keySet()));
        break;
    }
}

But I don't understand why the need exists to create this incompatibility between Kafka 3.5.x and 3.7.x at all. Can't the operator simply get the level and ignore the rest?
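Something like this, for example (a hypothetical, more tolerant variant of the loop above, not actual operator code):

for (var e : fetchedLoggers.entrySet()) {
    // Take the "level" key if present and silently ignore any other keys
    String level = e.getValue().get("level");
    if (level != null) {
        loggerMap.put(e.getKey(), level);
    } else {
        result.tryFail(new RuntimeException("Logger " + e.getKey() + " has no level"));
        break;
    }
}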

@scholzj
Member Author

scholzj commented Apr 24, 2024

@blaghed I'm not sure I follow the question. I think it is fairly common to protect the code from unexpected data being returned. It helps to detect changes that might need to be analyzed and handled in the code.

@blaghed
Contributor

blaghed commented Apr 24, 2024

I can understand that reasoning, but in this case you are making the code consider any addition to the JSON structure a breaking change, without any necessity to do so.
Shouldn't you simply care that level exists and is returned? Whether Kafka decides to add some other fields in the future should be irrelevant to this code.

@scholzj
Member Author

scholzj commented Apr 24, 2024

Well, you do not know how significant the next fields will be. So I'm not sure I see this as a problem.

@blaghed
Contributor

blaghed commented Apr 24, 2024

As is, you are making the Cluster Operator tightly bound to the Kafka version used on KafkaConnect, without any actual need to.

Even if a future Kafka version were to add some fields that you would suddenly find significant for this class, why is that a reason for the current version of the Operator to break on it?

If that is not enough to persuade you, then alternatively see this from a JSON perspective:
With Kafka 3.5:

{
  "org.apache.kafka.connect": {
    "level": "WARN"
  }
}

With Kafka 3.7:

{
  "org.apache.kafka.connect": {
    "level": "WARN",
    "last_modified": null
  }
}

Adding last_modified is not a breaking change, because JSON parsers are able to cope with the additional field by ignoring it.
However, the current code does a "strict syntax" check on it, even though you have absolutely no need for the last_modified field at all, creating a needless incompatibility.
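For illustration, with a Jackson-style binding the extra field can simply be skipped; this is a hypothetical sketch, not the operator's actual deserialization code:

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.databind.ObjectMapper;

// Hypothetical DTO: only "level" is bound; unknown keys such as "last_modified" are dropped
@JsonIgnoreProperties(ignoreUnknown = true)
class LoggerLevel {
    public String level;
}

class IgnoreUnknownDemo {
    public static void main(String[] args) throws Exception {
        String json = "{\"level\": \"WARN\", \"last_modified\": null}";
        LoggerLevel parsed = new ObjectMapper().readValue(json, LoggerLevel.class);
        System.out.println(parsed.level); // prints WARN; last_modified never surfaces
    }
}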

@scholzj
Member Author

scholzj commented Apr 24, 2024

Each Strimzi version supports only selected Kafka versions. The current version supports Kafka 3.6 and 3.7 for example. That is part of the source code and you cannot use it with other Kafka versions just like that.

@blaghed
Contributor

blaghed commented Apr 24, 2024

That is understandable and easily handled when we talk about the Operators and the Kafka pods only, but Kafka Connect objects can have "any" image, as long as it is based on Kafka, so it is an entirely different beast to handle.
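(For reference, a custom image is plugged in through spec.image on the KafkaConnect resource; an illustrative snippet with example names:)

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: my-connect
spec:
  image: my-registry/my-connect-with-plugins:1.0.0 # must be based on a Strimzi Kafka image
  # ...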

From Kafka's side, they do a pretty decent job with the API, so compatibility is hardly an issue there. With the Strimzi Operator though, we find needless limitations in place.
I could very much understand if this were a situation where you absolutely needed something from the API. If you were actually using last_modified for something, this conversation would certainly be different, but as far as I can tell it is not used at all.

@scholzj
Member Author

scholzj commented Apr 24, 2024

From Kafka's side, they do a pretty decent job with the API, so compatibility is hardly an issue there. With the Strimzi Operator though, we find needless limitations in place.

That is true on the producer / consumer side. But the management side of things has been a bit wilder in recent years with the ZooKeeper removal.

Kafka Connect objects can have "any" image, as long as it is based on Kafka, so it is an entirely different beast to handle

They cannot have any image. They need to use a container image based on the same Strimzi release. That is partly because the image itself contains a lot of auxiliary logic. But Connect had its own changes to the REST API in recent years as well.

@blaghed
Contributor

blaghed commented Apr 24, 2024

But Connect had its own changes to the REST API in recent years as well.

Interesting, but are those changes backwards breaking or forwards breaking? Because this code is both: upgrading the cluster forces an upgrade of the Connect image in use.

Note that our usage is to load the plugins into a Kafka image and use that directly, rather than the other options available like JAR download, which are not really ok in a secure production environment.
If we could use images instead of JARs for this feature, that would be awesome though, and this problem would be mitigated.

Anyway, you certainly make good points, and I think asking you to never break anything is unreasonable, but it would be great if you didn't break things when it is not explicitly needed, as is the case here, please?

@scholzj
Member Author

scholzj commented Apr 24, 2024

Interesting, but are those changes backwards breaking or forwards breaking?

Depends on how exactly you define the compatibility. When a new API is introduced to replace an old API because of a new feature, it does not break compatibility per se. But unless you want to maintain two paths in your code base (which means additional effort for maintenance, testing, etc.), you eventually move to the new API to enable the new feature and thus stop being compatible with the old versions that have only the old API. Like it or not, the reality is that we do not have the resources to maintain and test support for older Kafka versions.

Note that our usage is to load the plugins into a Kafka image and use that directly, rather than the other options available like JAR download, which are not really ok in a secure production environment.
If we could use images instead of JARs for this feature, that would be awesome though, and this problem would be mitigated.

I'm not sure I follow this. It doesn't matter whether you build your own image or use the Connect Build feature to have the operator do it for you. The requirement is always the same => the base image has to be one of the Strimzi Kafka images from the same Strimzi release.


I'm not emotionally attached to the code you commented on. I don't think I wrote the original code. It definitely has some advantages from my point of view, but I would probably not have written it like this if I were starting from zero. So as far as I'm concerned, if you hate it so much, feel free to open a PR and let's see what the reviewers say. But keep in mind that this does not change the big picture: if you are concerned about Strimzi supporting only the last 2 Kafka versions, it is not going to change anything about that.

@blaghed
Contributor

blaghed commented Apr 25, 2024

I'm not sure I follow this. It doesn't matter whether you build your own image or use the Connect Build feature to have the operator do it for you. The requirement is always the same => the base image has to be one of the Strimzi Kafka images from the same Strimzi release.

At the moment, the following spec would "build" the image to be used:

spec:
  build:
    plugins:
      - name: my-connector
        artifacts:
          - type: jar
            url: http://repo/connector-1.0.0.jar

What I meant is that it would be great if we could just add the plugins as images. No "build" needed; they could just all be init containers or such.
E.g.:

spec:
  image: quay.io/strimzi/kafka:0.39.0-kafka-3.5.2 # could also just be taken from the Strimzi Operator directly
  build: # just re-using the existing structure, can be some other block
    plugins:
      - name: my-connector
        artifacts:
          - type: image
            url: connectors/my-connector:1.0.0

Anyway, that is just wishful thinking. I know there would be some hurdles to go through, like making sure the connector images follow some predefined structure so that the plugins can be gathered correctly, but it seems like something that could be overcome with some documentation.

I'm not emotionally attached to the code you commented on. I don't think I wrote the original code. It definitely has some advantages from my point of view, but I would probably not have written it like this if I were starting from zero. So as far as I'm concerned, if you hate it so much, feel free to open a PR and let's see what the reviewers say. But keep in mind that this does not change the big picture: if you are concerned about Strimzi supporting only the last 2 Kafka versions, it is not going to change anything about that.

Ah, good to know. I will try that, thank you for the nice discussion.

@scholzj
Member Author

scholzj commented Apr 25, 2024

What I meant is that it would be great if we could just add the plugins as images. No "build" needed; they could just all be init containers or such.

  1. If you are going to build a container image, you can simply build one based on Strimzi with the plugins you want (see the sketch below). You do not need to build one image and then have Strimzi copy from it and build another image.
  2. The connectors need to be in the main image, not in some init containers.
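For point 1, the pattern looks roughly like this (a sketch; the image tag and plugin directory are illustrative):

# Build a custom Connect image on top of the matching Strimzi base image
FROM quay.io/strimzi/kafka:0.40.0-kafka-3.7.0
USER root:root
# Copy the connector plugins into the plugin path scanned by Kafka Connect
COPY ./my-plugins/ /opt/kafka/plugins/
USER 1001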

@blaghed
Contributor

blaghed commented Apr 25, 2024

If you are going to build a container image, you can simply build one based on Strimzi with the plugins you want. You do not need to build one image and then have Strimzi copy from it and build another image.

The issue with that, as we are facing now, is that the KafkaConnect image becomes tightly bound to the Strimzi cluster version. This makes upgrades an unfortunate casualty of the design, having to be done in lock-step.

The connectors need to be in the main image, not in some init containers.

It is my understanding that the plugins just need to be mounted into the running container (roughly as sketched below), and that there is no actual need to have them baked directly into the image.
What you are saying is simply what exists, so it is a design limitation, not a technical one.
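Conceptually, something like this (a purely hypothetical sketch of the idea, not an existing Strimzi feature; all names and paths are made up):

# Pod-level idea: an init container ships the plugin and copies it into a shared volume
initContainers:
  - name: my-connector-plugin
    image: connectors/my-connector:1.0.0
    command: ["cp", "-r", "/plugins/.", "/mnt/plugins/"]
    volumeMounts:
      - name: plugins
        mountPath: /mnt/plugins
containers:
  - name: connect
    volumeMounts:
      - name: plugins
        mountPath: /opt/kafka/plugins
volumes:
  - name: plugins
    emptyDir: {}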

@scholzj
Member Author

scholzj commented Apr 25, 2024

What you are saying is simply what exists, so it is a design limitation, not a technical one.

Well, if you approach it like that, pretty much nothing is a technical limitation, just something that has not been designed yet 😄.

@blaghed
Contributor

blaghed commented Apr 25, 2024

FYI: #10026
