
Add support for Kafka 3.7.0 and remove Kafka 3.5.2 #9747

Merged
merged 2 commits into strimzi:main on Feb 28, 2024

Conversation

scholzj
Member

@scholzj scholzj commented Feb 27, 2024

Type of change

Select the type of your PR

  • Bugfix
  • Enhancement / new feature
  • Refactoring
  • Documentation

Description

This PR adds support for Kafka 3.7.0 and removes support for Kafka 3.5.2. It also moves the operators to Kafka 3.7.0 and updates Cruise Control and Snappy to keep them in sync with the versions used by Kafka.
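For reference, the Kafka version is selected through the Kafka custom resource, so 3.7.0 becomes a valid value there. A minimal illustrative snippet (the resource name is an example):

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    version: 3.7.0 # newly supported; 3.5.2 is no longer accepted
    # ...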

Checklist

  • Write tests
  • Make sure all tests pass
  • Update documentation
  • Try your changes from Pod inside your Kubernetes and OpenShift cluster, not just locally
  • Update CHANGELOG.md

Signed-off-by: Jakub Scholz <www@scholzj.com>
@scholzj scholzj added this to the 0.40.0 milestone Feb 27, 2024
@scholzj scholzj marked this pull request as ready for review February 27, 2024 00:55
Signed-off-by: Jakub Scholz <www@scholzj.com>
@scholzj
Member Author

scholzj commented Feb 27, 2024

/azp run regression

Azure Pipelines successfully started running 1 pipeline(s).

@scholzj
Member Author

scholzj commented Feb 27, 2024

/azp run upgrade

Azure Pipelines successfully started running 1 pipeline(s).

@scholzj
Member Author

scholzj commented Feb 27, 2024

/azp run kraft-regression

Azure Pipelines successfully started running 1 pipeline(s).

@scholzj
Member Author

scholzj commented Feb 27, 2024

/azp run feature-gates-regression

Azure Pipelines successfully started running 1 pipeline(s).

@scholzj scholzj merged commit eaf8950 into strimzi:main Feb 28, 2024
39 of 41 checks passed
@scholzj scholzj deleted the add-support-for-kafka-3.7.0 branch February 28, 2024 11:57
@blaghed
Contributor

blaghed commented Apr 24, 2024

Hi @scholzj,

Why does this incompatibility exist at all?

Before, it was checking that only level was returned in the output, which is weirdly restrictive:

for (var e : fetchedLoggers.entrySet()) {
    String level = e.getValue().get("level");
    if (level != null && e.getValue().size() == 1) {
        loggerMap.put(e.getKey(), level);
    } else {
        result.tryFail("Inner map has unexpected keys " + e.getValue().keySet());
        break;
    }
}

Now it checks that only level and last_modified are returned in the output, even though the code still only uses the level:

for (var e : fetchedLoggers.entrySet()) {
    if (Set.of("level", "last_modified").containsAll(e.getValue().keySet())) {
        String level = e.getValue().get("level");
        if (level != null) {
            loggerMap.put(e.getKey(), level);
        }
    } else {
        result.tryFail(new RuntimeException("Inner map has unexpected keys " + e.getValue().keySet()));
        break;
    }
}

But I don't understand why the need exists to create this incompatibility between Kafka 3.5.x and 3.7.x at all. Can't the operator simply get the level and ignore the rest?
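Something like this, for example (a hypothetical, more tolerant variant of the loop above, not actual operator code):

for (var e : fetchedLoggers.entrySet()) {
    // Take the "level" key if present and silently ignore any other keys
    String level = e.getValue().get("level");
    if (level != null) {
        loggerMap.put(e.getKey(), level);
    } else {
        result.tryFail(new RuntimeException("Logger " + e.getKey() + " has no level"));
        break;
    }
}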

@scholzj
Member Author

scholzj commented Apr 24, 2024

@blaghed I'm not sure I follow the question. I think it is fairly common to protect the code from unexpected data being returned. It helps to detect changes that might need to be analyzed and handled in the code.

@blaghed
Contributor

blaghed commented Apr 24, 2024

I can understand that reasoning, but in this case you are making the code consider any addition to the JSON structure a breaking change, without any necessity to do so.
Shouldn't you simply care that level exists and is returned? Whether Kafka decides to add some other fields in the future should be irrelevant to this code.

@scholzj
Member Author

scholzj commented Apr 24, 2024

Well, you do not know how significant the next fields will be. So I'm not sure I see this as a problem.

@blaghed
Contributor

blaghed commented Apr 24, 2024

As is, you are making the Cluster Operator tightly bound to the Kafka version used on KafkaConnect, without any actual need to.

Even if a future Kafka version were to add some fields that you would suddenly find significant for this class, why is that a reason for the current version of the Operator to break on it?

If that is not enough to persuade you, then alternatively see this from a JSON perspective:
With Kafka 3.5:

{
  "org.apache.kafka.connect": {
    "level": "WARN"
  }
}

With Kafka 3.7:

{
  "org.apache.kafka.connect": {
    "level": "WARN",
    "last_modified": null
  }
}

Adding last_modified is not a breaking change, because JSON parsers are able to cope with the additional field by ignoring it.
However, the current code does a "strict syntax" check on it, even though you have absolutely no need for the last_modified field at all, creating a needless incompatibility.
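For illustration, with a Jackson-style binding the extra field can simply be skipped; this is a hypothetical sketch, not the operator's actual deserialization code:

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.databind.ObjectMapper;

// Hypothetical DTO: only "level" is bound; unknown keys such as "last_modified" are dropped
@JsonIgnoreProperties(ignoreUnknown = true)
class LoggerLevel {
    public String level;
}

class IgnoreUnknownDemo {
    public static void main(String[] args) throws Exception {
        String json = "{\"level\": \"WARN\", \"last_modified\": null}";
        LoggerLevel parsed = new ObjectMapper().readValue(json, LoggerLevel.class);
        System.out.println(parsed.level); // prints WARN; last_modified never surfaces
    }
}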

@scholzj
Member Author

scholzj commented Apr 24, 2024

Each Strimzi version supports only selected Kafka versions. The current version supports Kafka 3.6 and 3.7 for example. That is part of the source code and you cannot use it with other Kafka versions just like that.

@blaghed
Contributor

blaghed commented Apr 24, 2024

That is understandable and easily handled when we talk about the Operators and the Kafka pods only, but Kafka Connect objects can have "any" image, as long as it is based on Kafka, so it is an entirely different beast to handle.
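(For reference, a custom image is plugged in through spec.image on the KafkaConnect resource; an illustrative snippet with example names:)

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: my-connect
spec:
  image: my-registry/my-connect-with-plugins:1.0.0 # must be based on a Strimzi Kafka image
  # ...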

From Kafka's side, they do a pretty decent job with the API, so compatibility is hardly an issue there. With the Strimzi Operator though, we find needless limitations in place.
I could very much understand if this were a situation where you absolutely needed something from the API. If you were actually using last_modified for something, this conversation would certainly be different, but as far as I can tell it is not used at all.

@scholzj
Member Author

scholzj commented Apr 24, 2024

From Kafka's side, they do a pretty decent job with the API, so compatibility is hardly an issue there. With the Strimzi Operator though, we find needless limitations in place.

That is true on the producer / consumer side. But the management side of things has been a bit wilder in recent years with the ZooKeeper removal.

Kafka Connect objects can have "any" image, as long as it is based on Kafka, so it is an entirely different beast to handle

They cannot have any image. They need to use a container image based on the same Strimzi release. That is partly because the image itself contains a lot of auxiliary logic. But Connect had its own changes to the REST API in recent years as well.

@blaghed
Contributor

blaghed commented Apr 24, 2024

But Connect had its own changes to the REST API in recent years as well.

Interesting, but are those changes backwards breaking or forwards breaking? Because this code is both: upgrading the cluster forces an upgrade of the Connect image in use.

Note that our usage is to load the plugins into a Kafka image and use that directly, rather than the other options available like JAR download, which are not really ok in a secure production environment.
If we could use images instead of JARs for this feature, that would be awesome though, and this problem would be mitigated.

Anyway, you certainly make good points, and I think asking you to never break anything is unreasonable, but it would be great if you didn't break things when it is not explicitly needed, as is the case here, please?

@scholzj
Member Author

scholzj commented Apr 24, 2024

Interesting, but are those changes backwards breaking or forwards breaking?

Depends on how exactly you define the compatibility. When a new API is introduced to replace an old API because of a new feature, it does not break compatibility per se. But unless you want to maintain two paths in your code base (which means additional effort for maintenance, testing, etc.), you eventually move to the new API to enable the new feature and thus stop being compatible with the old versions that have only the old API. Like it or not, the reality is that we do not have the resources to maintain and test support for older Kafka versions.

Note that our usage is to load the plugins into a Kafka image and use that directly, rather than the other options available like JAR download, which are not really ok in a secure production environment.
If we could use images instead of JARs for this feature, that would be awesome though, and this problem would be mitigated.

I'm not sure I follow this. It doesn't matter whether you build your own image or use the Connect Build feature to have the operator do it for you. The requirement is always the same => the base image has to be one of the Strimzi Kafka images from the same Strimzi release.


I'm not emotionally attached to the code you commented on. I don't think I wrote the original code. It definitely has some advantages from my point of view, but I would probably not have written it like this if I were starting from zero. So as far as I'm concerned, if you hate it so much, feel free to open a PR and let's see what the reviewers say. But keep in mind that this does not change the big picture: if you are concerned about Strimzi supporting only the last 2 Kafka versions, it is not going to change anything about that.

@blaghed
Contributor

blaghed commented Apr 25, 2024

I'm not sure I follow this. It doesn't matter whether you build your own image or use the Connect Build feature to have the operator do it for you. The requirement is always the same => the base image has to be one of the Strimzi Kafka images from the same Strimzi release.

At the moment, the following spec would "build" the image to be used:

spec:
  build:
    plugins:
      - name: my-connector
        artifacts:
          - type: jar
            url: http://repo/connector-1.0.0.jar

What I meant is that it would be great if we could just add the plugins as images. No "build" needed; they could just all be init containers or such.
E.g.:

spec:
  image: quay.io/strimzi/kafka:0.39.0-kafka-3.5.2 # could also just be taken from the Strimzi Operator directly
  build: # just re-using the existing structure, can be some other block
    plugins:
      - name: my-connector
        artifacts:
          - type: image
            url: connectors/my-connector:1.0.0

Anyway, that is just wishful thinking. I know there would be some hurdles to go through, like making sure the connector images follow some predefined structure so that the plugins can be gathered correctly, but it seems like something that could be overcome with some documentation.

I'm not emotionally attached to the code you commented on. I don't think I wrote the original code. It definitely has some advantages from my point of view, but I would probably not have written it like this if I were starting from zero. So as far as I'm concerned, if you hate it so much, feel free to open a PR and let's see what the reviewers say. But keep in mind that this does not change the big picture: if you are concerned about Strimzi supporting only the last 2 Kafka versions, it is not going to change anything about that.

Ah, good to know. I will try that, thank you for the nice discussion.

@scholzj
Member Author

scholzj commented Apr 25, 2024

What I meant is that it would be great if we could just add the plugins as images. No "build" needed; they could just all be init containers or such.

  1. If you are going to build a container image, you can simply build one based on Strimzi with the plugins you want (see the sketch below). You do not need to build one image and then have Strimzi copy from it and build another image.
  2. The connectors need to be in the main image, not in some init containers.
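For point 1, the pattern looks roughly like this (a sketch; the image tag and plugin directory are illustrative):

# Build a custom Connect image on top of the matching Strimzi base image
FROM quay.io/strimzi/kafka:0.40.0-kafka-3.7.0
USER root:root
# Copy the connector plugins into the plugin path scanned by Kafka Connect
COPY ./my-plugins/ /opt/kafka/plugins/
USER 1001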

@blaghed
Contributor

blaghed commented Apr 25, 2024

If you are going to build a container image, you can simply build one based on Strimzi with the plugins you want. You do not need to build one image and then have Strimzi copy from it and build another image.

The issue with that, as we are facing now, is that the KafkaConnect image becomes tightly bound to the Strimzi cluster version. This makes upgrades an unfortunate casualty of the design, having to be done in lock-step.

The connectors need to be in the main image, not in some init containers.

It is my understanding that the plugins just need to be mounted into the running container (roughly as sketched below), and that there is no actual need to have them baked directly into the image.
What you are saying is simply what exists, so it is a design limitation, not a technical one.
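Conceptually, something like this (a purely hypothetical sketch of the idea, not an existing Strimzi feature; all names and paths are made up):

# Pod-level idea: an init container ships the plugin and copies it into a shared volume
initContainers:
  - name: my-connector-plugin
    image: connectors/my-connector:1.0.0
    command: ["cp", "-r", "/plugins/.", "/mnt/plugins/"]
    volumeMounts:
      - name: plugins
        mountPath: /mnt/plugins
containers:
  - name: connect
    volumeMounts:
      - name: plugins
        mountPath: /opt/kafka/plugins
volumes:
  - name: plugins
    emptyDir: {}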

@scholzj
Member Author

scholzj commented Apr 25, 2024

What you are saying is simply what exists, so it is a design limitation, not a technical one.

Well, if you approach it like that, pretty much nothing is a technical limitation, just something that has not been designed yet 😄.

@blaghed
Contributor

blaghed commented Apr 25, 2024

FYI: #10026
