Fix Connect's dynamic log level changes #9752

fvaleri · 2024-02-28T17:15:59Z

The loggers endpoint now supports the scope parameter, which can be set to cluster to change the log level on all nodes at once. This change simply adds that parameter, which works since Kafka 3.7 and causes no harm on previous versions.

This should close #9067.
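For illustration, a request using the new parameter could be built as in the sketch below. It is not the code from this PR: the class name, base URL, and logger name are hypothetical, and it assumes the Kafka Connect REST endpoint PUT /admin/loggers/{logger} with a JSON body carrying the level.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper, not part of the PR: builds a cluster-scoped log level request.
final class LoggerLevelRequest {
    // Builds PUT {baseUrl}/admin/loggers/{logger}?scope=cluster with a body like {"level": "DEBUG"}.
    static HttpRequest build(String baseUrl, String logger, String level) {
        URI uri = URI.create(baseUrl + "/admin/loggers/" + logger + "?scope=cluster");
        String body = "{\"level\": \"" + level + "\"}";
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

The request is only constructed here, not sent. According to the description above, versions before Kafka 3.7 should simply ignore the extra query parameter.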

Signed-off-by: Federico Valeri <fedevaleri@gmail.com>

scholzj · 2024-02-28T19:08:54Z

I guess the test failure is related to this?

scholzj · 2024-02-28T19:18:03Z

Interestingly enough, when running the whole test suite I get 500; running just the single test gives me another error, but with 204.

fvaleri · 2024-02-29T09:03:52Z

Test fixed. With a cluster-wide logging level update, the response code changes to 204 (No Content).

scholzj

Wait, it is not that simple ... how will this work with Kafka 3.6? I assume the new parameter will be ignored, which is fine, but it will still give you HTTP 200, or? So you need to check for both / or check the whole range of HTTP OK codes, or?

fvaleri · 2024-02-29T09:58:42Z

> So you need to check for both / or check the whole range of HTTP OK codes, or?

Right, we still need to support both. Thanks.
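A minimal sketch of what accepting both codes could look like, assuming (as discussed above) that older Kafka versions answer with 200 and that Kafka 3.7 with the cluster scope answers with 204. The class and method names are hypothetical and not taken from the PR.

```java
// Hypothetical check, not the PR's code: treats both known success codes as OK.
final class LoggerResponseCheck {
    // 200 is assumed for versions that ignore the scope parameter,
    // 204 (No Content) for a cluster-wide update on Kafka 3.7+.
    static boolean isSuccess(int statusCode) {
        return statusCode == 200 || statusCode == 204;
    }
}
```

Checking the whole 2xx range, as the reviewer also suggested, would be an alternative; listing the two codes explicitly keeps unexpected responses visible.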

ppatierno · 2024-02-29T13:54:47Z

@fvaleri I see the test failing because of the 500 error.

scholzj · 2024-02-29T13:56:41Z

Yeah, that is what I pointed out -> the test alone works fine with 204, but when the whole test suite is run it gets 500.

fvaleri · 2024-02-29T15:19:27Z

I'll look into that, but it works fine on my end. Do you also see 500 on your local env?

scholzj · 2024-02-29T15:21:54Z

@fvaleri I did see 500 locally yesterday when running the whole KafkaConnectApiIT suite from the IDE, and 204 when running only the single KafkaConnectApiIT.testChangeLoggers test.

fvaleri · 2024-03-01T07:30:51Z

OK, I was able to reproduce it locally, but it doesn't happen all the time. There is some flakiness here.

fvaleri · 2024-03-01T13:01:42Z

That was too easy, indeed.

The problem is that, unlike the "worker" scope, the "cluster" scope request writes the logging level configuration into the internal -config topic, in order to propagate this information to the other workers. To do this, it uses a KafkaBasedLog instance, which includes a Kafka producer initialized to null on instance creation. This producer is created lazily when the service starts.

When we start the test Connect instance, services like the ConfigBackingStore, which uses the KafkaBasedLog, are started asynchronously on a different thread. This means that, when our cluster-wide log level configuration request arrives, the producer may not have been created yet, which results in the error "IllegalStateException: This KafkaBasedLog was created in read-only mode and does not support write operations".

This is fixed by using the Connect.isRunning() method to wait for all services to finish initializing before running the tests.
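The waiting step can be sketched as a small polling helper. This is an assumption about the shape of such a wait, not the code from the PR: the helper class, its name, and the timeout values are hypothetical, and the readiness check is passed in as a BooleanSupplier so that something like connect::isRunning could be supplied.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical polling helper: blocks until the condition holds or the timeout expires.
final class StartupWait {
    static void waitUntil(BooleanSupplier ready, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!ready.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException("Condition not met within " + timeout);
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
    }
}
```

In a test setup this might be called as StartupWait.waitUntil(connect::isRunning, Duration.ofSeconds(30), Duration.ofMillis(100)) before the first REST request is issued, where connect is the test's Connect instance.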

I also found a second source of flakiness, this time in the task restart test. This is due to "ConnectRestException: Cannot complete request momentarily due to no known leader URL, likely because a rebalance was underway".

This may happen when a rebalance is in progress, as the error suggests and as the logs confirm. The workaround here is to use only one worker node for this test, which I think is fine as the focus is on the REST API here.

After these changes, I ran KafkaConnectApiIT and KafkaConnectorIT many times without issues. Let me know if you also see the same.

scholzj · 2024-03-01T14:44:27Z

/azp run regression

azure-pipelines · 2024-03-01T14:44:40Z

Azure Pipelines successfully started running 1 pipeline(s).

scholzj

LGTM assuming the tests pass.

fvaleri requested review from scholzj and ppatierno February 28, 2024 17:16

fvaleri added this to the 0.40.0 milestone Feb 28, 2024

Fix integration test

441cb06

Signed-off-by: Federico Valeri <fedevaleri@gmail.com>

scholzj reviewed Feb 29, 2024

Support old return code

6a64459

Signed-off-by: Federico Valeri <fedevaleri@gmail.com>

Fix test flakiness

6149110

Signed-off-by: Federico Valeri <fedevaleri@gmail.com>

scholzj approved these changes Mar 1, 2024

ppatierno approved these changes Mar 4, 2024

scholzj merged commit f17b8d3 into strimzi:main Mar 4, 2024
21 checks passed

fvaleri deleted the fix-connect-log branch March 4, 2024 09:42
