Issue when creating new topics with TopicOperator #5943

Closed
AmarendraSingh88 opened this issue Nov 25, 2021 · 5 comments

Comments

@AmarendraSingh88

Describe the bug
Some topics are not being created by the Topic Operator. I am creating around 40 topics (16 partitions, 3 replicas each) on a Kafka cluster of 3 brokers.
The issue is intermittent: sometimes all the topics are created, but at other times 5-10 of them are not. In those cases the Kubernetes custom resource (KafkaTopic) exists, but the actual Kafka topic does not.

To Reproduce
Steps to reproduce the behavior:

  1. Create a KafkaTopic custom resource
  2. Using Helm, loop through the KafkaTopic resource and create 40+ topics
  3. List the KafkaTopic resources that are not in the Ready state with this kubectl command:
    kubectl get kt -n kafka --context=testing -o json | jq -r '[.items[] |select(.status.conditions[0].type != "Ready")| .metadata.name]'
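
The jq filter in step 3 only inspects the first entry of `.status.conditions`. As a rough alternative, the sketch below checks every condition and also reports resources that have no status yet. It is only an illustration: the function name `not_ready_topics` and the sample data are made up for this example, and it assumes the input has the shape produced by `kubectl get kt -o json`.

```python
from typing import Dict, List


def not_ready_topics(kafka_topic_list: Dict) -> List[str]:
    """Return the names of KafkaTopic items without a Ready=True condition."""
    names = []
    for item in kafka_topic_list.get("items", []):
        # A resource that was never reconciled may have no status at all.
        conditions = item.get("status", {}).get("conditions", [])
        is_ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not is_ready:
            names.append(item["metadata"]["name"])
    return names


# Hypothetical sample in the shape of `kubectl get kt -o json` output.
sample = {
    "items": [
        {"metadata": {"name": "orders"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
        {"metadata": {"name": "payments"},
         "status": {"conditions": [{"type": "NotReady", "status": "True"}]}},
        {"metadata": {"name": "audit"}},  # no status yet
    ]
}
print(not_ready_topics(sample))
```

To use it against a live cluster, the kubectl output could be saved to a file and loaded with `json.load` before being passed to the function.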

Expected behavior
All the topics should be created without any issue.

Environment (please complete the following information):

  • Strimzi version: 0.22.1
  • Installation method: Helm chart
  • Kubernetes cluster: Kubernetes 1.20
  • Infrastructure: Amazon EKS
  • Kafka version - 2.7.0

logs
The Topic Operator logs do not show anything substantial either. This is the status of a topic that is in the NotReady state:

Status:
   Conditions:
      Last Transition Time: 2021-11-25T06:07:27.477907Z
      Message: Call(callName=createTopics, deadlineMs=1637820447475, tries=1, nextAllowedTryMs=1637820447577) timed out at 1637820447477 after 1 attempt(s)
      Reason: TimeoutException
      Status: True
      Type: NotReady
      Observed Generation: 1
Events:            
  Type     Reason    Age       From                                      Message   
  Warning           <unknown>  io.strimzi.operator.topic.TopicOperator  Failure processing KafkaTopic watch event ADDED on resource <Topic-Name> with labels {app.kubernetes.io/managed-by=Helm, strimzi.io/cluster=kafka, tenant-id=<tenant-id>}: Call(callName=createTopics, deadlineMs=1637820447475, tries=1, nextAllowedTryMs=1637820447577) timed out at 1637820447477 after 1 attempt(s)
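
The numbers in the timeout message are epoch milliseconds, so they can be compared directly. The sketch below parses the message from the event above and works out how the timestamps relate. The meaning of each field is inferred from its name, and the helper `parse_timeout` is made up for this illustration.

```python
import re
from datetime import datetime, timedelta, timezone

MESSAGE = (
    "Call(callName=createTopics, deadlineMs=1637820447475, tries=1, "
    "nextAllowedTryMs=1637820447577) timed out at 1637820447477 after 1 attempt(s)"
)

PATTERN = re.compile(
    r"callName=(?P<call>\w+), deadlineMs=(?P<deadline>\d+), "
    r"tries=(?P<tries>\d+), nextAllowedTryMs=(?P<next_try>\d+)\) "
    r"timed out at (?P<timed_out>\d+)"
)


def parse_timeout(message: str) -> dict:
    """Extract the timestamps from an admin client timeout message."""
    m = PATTERN.search(message)
    if m is None:
        raise ValueError("unrecognised timeout message")
    deadline = int(m["deadline"])
    next_try = int(m["next_try"])
    timed_out = int(m["timed_out"])
    # Integer arithmetic avoids float rounding in the millisecond part.
    deadline_utc = datetime.fromtimestamp(deadline // 1000, tz=timezone.utc)
    deadline_utc += timedelta(milliseconds=deadline % 1000)
    return {
        "call": m["call"],
        "deadline_utc": deadline_utc,
        "ms_past_deadline": timed_out - deadline,
        "ms_until_next_try": next_try - timed_out,
    }


info = parse_timeout(MESSAGE)
print(info["deadline_utc"].isoformat())  # 2021-11-25T06:07:27.475000+00:00
print(info["ms_past_deadline"])          # 2
print(info["ms_until_next_try"])         # 100
```

The deadline matches the condition's Last Transition Time to within a few milliseconds, and the next allowed retry falls about 100 ms after the deadline. One reading is that the call ran out of time before a retry was permitted, but nothing in the thread confirms this.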

Additional context
I have a couple of questions:
1. We have around 80 test customers with 40 topics each (16 partitions, 3 replicas), which comes to about 3,200 topics and roughly 150k partition replicas in the cluster (about 51k partitions × 3 replicas). Can a Kafka cluster of 3 brokers handle that?
2. How can we start multiple instances of the Entity Operator so that the topic management load is distributed and we don't end up in this kind of race condition?

This does seem related to #1775. Let me know if more details are required.
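
The figures in question 1 can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in the issue and assumes all topics share the same partition count and replication factor.

```python
customers = 80
topics_per_customer = 40
partitions_per_topic = 16
replication_factor = 3
brokers = 3

topics = customers * topics_per_customer            # 3200
partitions = topics * partitions_per_topic          # 51200
partition_replicas = partitions * replication_factor  # 153600

# With a replication factor equal to the broker count, every broker
# hosts one replica of every partition.
replicas_per_broker = partition_replicas // brokers  # 51200

print(topics, partitions, partition_replicas, replicas_per_broker)
```

So the "around 150k" figure counts partition replicas rather than partitions, and each of the 3 brokers would host about 51k replicas.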

@sknot-rh
Member

I suspect this is the same issue as #5691. On each reconciliation, the topics are fetched from Kafka to check for updates. When there is a large number of topics, a lot of requests are sent to the Kafka broker, and it basically refuses any new ones. The adminClient.describeTopics() method supports fetching data in batches (it takes a list of topic names), so we should think about that.
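
The batching idea can be sketched roughly as follows. This is not the Topic Operator's code: `describe_batch` is a hypothetical callable standing in for a single describeTopics() request that accepts a list of names, and the batch size of 100 is an arbitrary assumption.

```python
from typing import Callable, Dict, Iterator, List


def batched(names: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size names."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(names), batch_size):
        yield names[start:start + batch_size]


def describe_in_batches(
    topic_names: List[str],
    describe_batch: Callable[[List[str]], Dict[str, dict]],
    batch_size: int = 100,  # assumed value, tune for the cluster
) -> Dict[str, dict]:
    """Describe many topics using one request per batch, not one per topic."""
    descriptions: Dict[str, dict] = {}
    for batch in batched(topic_names, batch_size):
        descriptions.update(describe_batch(batch))
    return descriptions


# Hypothetical stand-in for the admin client call; it records each request.
requests_sent: List[int] = []


def fake_describe_batch(names: List[str]) -> Dict[str, dict]:
    requests_sent.append(len(names))
    return {name: {"partitions": 16} for name in names}


all_topics = [f"topic-{i}" for i in range(3200)]
result = describe_in_batches(all_topics, fake_describe_batch)
print(len(result), len(requests_sent))  # 3200 32
```

With 3,200 topics and batches of 100, the broker would see 32 describe requests per reconciliation pass instead of 3,200.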

@scholzj
Member

scholzj commented Nov 25, 2021

There is a difference between 40 topics and 3200 topics. So which one is it you are using?

@AmarendraSingh88
Author

> There is a difference between 40 topics and 3200 topics. So which one is it you are using?

We have a total of 3200+ topics, and the issue occurs when we create a new set of 40 topics. Out of those 40, 5-10 topics are not created. If I wait some time, then delete the KafkaTopic custom resources for those 5-10 topics and create them again, the creation works.

@scholzj
Member

scholzj commented Nov 25, 2021

I guess it could be as suggested by @sknot-rh: you might be reaching the limits of the system. Increasing the resources for the Kafka cluster or for the Topic Operator might help. But it is not an exact science, so you would need to give it a try.

@AmarendraSingh88
Author

Ok, let me try increasing that and share the results.

@scholzj scholzj closed this as completed Jan 21, 2022