[UTO] Unidirectional Topic Operator does seem to cause disruption when upgrading from 0.38 with BTO #9470
This is possibly a blocker for the 0.39.0 release! At least unless we confirm that it is only a cosmetic issue in the UTO without any negative impact. In that case it would be ugly, but perhaps not blocking.
scholzj changed the title from "[UTO] Unidirectional Topic Operator does seem to cause disruption when ugrading from 0.38 with BTO" to "[UTO] Unidirectional Topic Operator does seem to cause disruption when upgrading from 0.38 with BTO" on Dec 15, 2023
fvaleri added a commit to fvaleri/strimzi-kafka-operator that referenced this issue on Dec 18, 2023
This issue is caused by stale metadata on one or more brokers after restarting the cluster (no risk of data loss).

Using the reproducer, we can see that the UTO fails at 14:27:39 with UnknownTopicOrPartitionException (retriable), while one of the brokers only learns about my-topic at 14:27:44. This triggers the topic creation logic, which fails with TopicExistsException.

UTO log:

2023-12-17 14:27:39,55262 TRACE [kafka-admin-client-thread | strimzi-topic-operator-a93c1635-76c3-4c9f-b61f-68c1a6ac98c3] BatchingTopicController:754 - Admin.describeTopics([__strimzi_store_topic, strimzi.cruisecontrol.partitionmetricsamples, __strimzi-topic-operator-kstreams-topic-store-changelog, timer-topic, connect-cluster-status, strimzi.cruisecontrol.modeltrainingsamples, strimzi.cruisecontrol.metrics, my-topic, __consumer_offsets, connect-cluster-offsets]) failed with java.util.concurrent.CompletionException: org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.

Broker log:

2023-12-17 14:27:44,209 TRACE [Broker id=1000] Cached leader info UpdateMetadataPartitionState(topicName='my-topic', partitionIndex=0, controllerEpoch=1, leader=2000, leaderEpoch=4, isr=[1001, 2000, 2001], zkVersion=7, replicas=[2000, 2001, 1001], offlineReplicas=[]) for partition my-topic-0 in response to UpdateMetadata request sent by controller 1001 epoch 2 with correlation id 0 (state.change.logger) [control-plane-kafka-request-handler-0]

I'm proposing to catch and ignore the TopicExistsException, which is also what the BTO does. If the topic was created by a third party before the UTO, the next reconciliation will try to revert any configuration drift.

Signed-off-by: Federico Valeri <fedevaleri@gmail.com>
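To illustrate the proposal in the commit message above, here is a minimal sketch of "catch and ignore TopicExistsException". It is not the operator's actual code: the class `IgnoreTopicExists`, the helper `createIgnoringExisting`, and the `Outcome` enum are hypothetical names, and the nested `TopicExistsException` is a local stand-in for `org.apache.kafka.common.errors.TopicExistsException` so that the example runs without the Kafka client library. It assumes the create call surfaces its failure wrapped in a `CompletionException`, as in the UTO log above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class IgnoreTopicExists {

    // Stand-in (assumption) for org.apache.kafka.common.errors.TopicExistsException,
    // defined locally so the sketch compiles without kafka-clients.
    static class TopicExistsException extends RuntimeException {
        TopicExistsException(String message) {
            super(message);
        }
    }

    enum Outcome { CREATED, ALREADY_EXISTS }

    // Hypothetical helper: waits for a topic creation future and treats
    // "topic already exists" as a non-error outcome. Any other failure
    // is rethrown unchanged so that it is still reported.
    static Outcome createIgnoringExisting(CompletableFuture<Void> createTopic) {
        try {
            createTopic.join();
            return Outcome.CREATED;
        } catch (CompletionException e) {
            if (e.getCause() instanceof TopicExistsException) {
                // The topic is already there (for example, a broker with stale
                // metadata reported it as unknown). Leave it to the next
                // reconciliation to correct any configuration drift.
                return Outcome.ALREADY_EXISTS;
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        CompletableFuture<Void> created = CompletableFuture.completedFuture(null);
        CompletableFuture<Void> exists = CompletableFuture.<Void>failedFuture(
                new TopicExistsException("Topic 'my-topic' already exists."));

        System.out.println(createIgnoringExisting(created));
        System.out.println(createIgnoringExisting(exists));
    }
}
```

Returning a distinct `ALREADY_EXISTS` outcome instead of silently swallowing the exception keeps the case visible to callers, which could then log it or schedule a follow-up reconciliation.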
scholzj pushed a commit that referenced this issue on Dec 18, 2023
scholzj pushed a commit that referenced this issue on Dec 18, 2023
The Unidirectional Topic Operator (enabled by default in 0.39) seems to cause disruption when upgrading from 0.38 (with BTO enabled by default).
It seems to be reproducible with the following steps (4 times out of 4 tries):
Expected behavior:
Actual behavior: