[Bug] race when creating new topics with TopicOperator #1775
Comments
Note: The above log lines are in reverse order; read the log bottom up.
@joekohlsdorf thanks for this bug report. I've not been able to reproduce this yet. Is there any chance you can reproduce this with the Topic Operator logging at DEBUG? You can configure it easily in the
It seems that you created the topics immediately after/while the TO was starting up. Are you able to reproduce this when the TO has been running for a while? (I.e. I'm wondering if this is somehow a race between the reconciliation we do at start-up and the event-based reconciliation, or something which can happen purely event-driven.)
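For readers following along: one way the Topic Operator's log level can be raised to DEBUG is through the logging field of the Kafka custom resource. This is a minimal sketch only; the inline logging syntax and the rootLogger.level key follow later Strimzi documentation and may differ in the 0.12.x releases discussed here:

```yaml
apiVersion: kafka.strimzi.io/v1beta1   # assumption: API version of this era
kind: Kafka
metadata:
  name: my-cluster                     # hypothetical cluster name
spec:
  # kafka and zookeeper sections omitted for brevity
  entityOperator:
    topicOperator:
      logging:
        type: inline
        loggers:
          rootLogger.level: DEBUG      # assumption: log4j2-style logger key
```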
Here is a debug log of this; due to the amount of log produced, I only extracted the lines which contain the topic name. The log is in reverse order. In the case of the issue shown in the logs, the topics were created right after starting up the topic operator; I deployed the manifests for cluster and topic creation together. I can also reproduce this when the topic operator has been running for a while, though sometimes I have to delete and redeploy topics multiple times to trigger the issue.
In the original bug report I changed the topic name in the debug log for privacy reasons. While debugging a related issue I came to the conclusion that this could happen if topic names contain characters which cannot be represented in K8s resource names. The redacted topic names from the log contained underscores.
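To make the naming constraint concrete (a hypothetical example, not taken from the redacted log): a Kafka topic name containing underscores is not a valid Kubernetes resource name, so the KafkaTopic resource has to carry the real name in spec.topicName while metadata.name uses a compliant form. If the operator then maps the Kafka topic name to a different resource name (for example by appending a hash), the two resources no longer line up:

```yaml
apiVersion: kafka.strimzi.io/v1beta1   # v1beta2 in newer Strimzi releases
kind: KafkaTopic
metadata:
  name: billing-events-v1              # must be a valid Kubernetes name, so no underscores
  labels:
    strimzi.io/cluster: my-cluster     # hypothetical cluster name
spec:
  topicName: billing_events_v1         # the actual Kafka topic name, underscores allowed here
  partitions: 12
  replicas: 3
```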
Further info on reproducing this:
I'm hitting this or something similar. I have a Helm chart that creates multiple (about 14) topics, and in my setup (3-node Kafka cluster on EKS, using st1-backed volumes, 3 ZooKeeper nodes), the behaviour seems to trigger most of the time I make a fresh installation of this Helm chart. All of the topics get created, but most of them have no The topics are not available under /strimzi/topics or under /brokers/topics in ZK. Restarting the topic operator doesn't help; the only mention of the bad topics in the debug log if I restart is If I add a topicName to the KafkaTopic spec, the topic operator will add an event to the KafkaTopic saying that it can't be renamed. What works is to remove one KafkaTopic at a time, then rerun the Helm chart installation so it gets recreated. I'll try to get back with some logs. Let me know what other information would be useful.
@forsberg logs would be super useful. |
What's interesting is that at the time the Helm chart is installed (creating 5 topics), there is absolutely no mention of the names of these topics in the topic operator's log (at level DEBUG). So there's not much to report, log-wise. The 5 topics all got the same metadata.creationTimestamp:
What I did have, however, was a number of (other, unrelated) topics where I had tried modifying the replicas count, which led to these kinds of messages:
It may be a coincidence, but after fixing the unrelated KafkaTopics to set spec.replicas=1, the log starts mentioning the 5 new topics created by the Helm chart:
And after this, my 5 KafkaTopics all got a
Summing up the timeline:
I know that the replication exception breaks other updates to the same topic, but I do not think it should impact other topics. @stanlyDoge @ppatierno @tombentley can you check this? You are the experts on the TO.
To add further data to this, I'm seeing what looks to be a related issue where the TopicOperator fails to materialize some topics in Kafka. On a container restart with DEBUG on, I see the "ignoring" messages. Those are the ONLY indication that the operator is even aware of the topic resources in k8s. How long the operator has been running makes no difference. Modifying a KafkaTopic resource sometimes shows logs. I'll try to reproduce and see if I can get a clean run. Despite the log messages, nothing seems to change in the Kafka cluster; the topic does not materialize. At least in the current cluster where I am observing this, if you delete the KafkaTopic object and recreate it with a spec that includes topicName, replicas, and partitions, the topic operator appears to actually function. It is my understanding that those should be optional, since replicas and partitions can be inherited from the cluster definition and topicName should only be used in select cases.
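For completeness, a sketch of the recreate-with-an-explicit-spec workaround described above, with hypothetical names and sizes (the point being that topicName, partitions, and replicas are all spelled out rather than left to defaults):

```yaml
apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaTopic
metadata:
  name: orders                         # hypothetical topic
  labels:
    strimzi.io/cluster: my-cluster     # hypothetical cluster name
spec:
  topicName: orders                    # explicit, even though it matches metadata.name
  partitions: 6
  replicas: 3
```

Applying this after deleting the stuck KafkaTopic, rather than patching it in place, is what the comment above reports as working.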
Nothing gets logged and the topic doesn't materialize in ZooKeeper or Kafka. There are no events logged:
Has anyone reproduced this using Strimzi 0.21? |
I have not tried yet. I can do it later.
@sknot-rh tried to reproduce this without success. So it was probably fixed by one of the other PRs in Strimzi 0.22. Closing. |
Describe the bug
When creating a large number of topics at once with the topic operator, I consistently see a race where some of them are treated as unmanaged by the topic operator and their Kubernetes resource definitions are renamed to include a hash.
To Reproduce
Create 500 KafkaTopic resource definitions, then kubectl apply them all at once. I used a fresh cluster with no load for this test.
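For illustration, each of the 500 resources would be a minimal KafkaTopic along these lines (hypothetical names, not the reporter's actual manifests), all sitting in one directory and applied with a single kubectl apply -f:

```yaml
apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaTopic
metadata:
  name: load-test-topic-001            # one of 500 similarly named resources
  labels:
    strimzi.io/cluster: my-cluster     # hypothetical cluster name
spec:
  partitions: 1
  replicas: 1
```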
Expected behavior
I expect all Kafka topics to be created without getting their Kubernetes resource definitions renamed.
Environment (please complete the following information):
Strimzi version: 0.12.1 (same behavior on 0.11.1)
Installation method: YAML files
Kubernetes cluster: 1.11
Infrastructure: AWS, installed with kops
YAML files and logs
Log for a single topic where I saw this behavior: