Description
Currently, QW tries to distribute load across indexers using the number of pipelines as the measure of work.
Issue #5833 discussed distributing pipelines evenly.
However, even when pipelines are distributed evenly, the load across indexers can still be uneven.
Here is my case:
- Kafka source for topic1 with 24 partitions and 4 pipelines. This topic is heavily loaded, with more than 10 times the traffic of the other topics.
- Kafka source for topic2 with 24 partitions and also 4 pipelines. This topic receives very little data over time.
When these 2 sources run on a QW cluster with 2 indexer instances, the outcome is:
- Indexer1 gets all pipelines for topic1
- Indexer2 gets all pipelines for topic2
So pipelines are distributed evenly, but Indexer1 gets all partitions of the Kafka topic that is 10 times more loaded than the topic on Indexer2.
This drives Indexer1 to 100% CPU and causes lag in processing topic1 messages.
Meanwhile, Indexer2 uses less than 10% CPU and is essentially idle.
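To illustrate why equal pipeline counts do not imply equal load, here is a minimal Rust sketch. It is not Quickwit's scheduler: `Pipeline`, `balance_by_count`, `balance_by_load`, and the load units (10 per topic1 pipeline, 1 per topic2 pipeline, mirroring the roughly 10x traffic difference above) are all hypothetical. `balance_by_count` only equalizes pipeline counts while keeping each source's pipelines together, which reproduces the outcome described above; `balance_by_load` is a common greedy heuristic (largest load first, onto the least-loaded indexer), shown only as one possible load-aware alternative.

```rust
/// Hypothetical pipeline descriptor. `load` is an illustrative cost unit,
/// not a real Quickwit metric.
#[derive(Clone, Debug)]
struct Pipeline {
    source: &'static str,
    load: u32,
}

/// Hypothetical workload: 4 heavy topic1 pipelines, 4 light topic2 pipelines.
fn example_pipelines() -> Vec<Pipeline> {
    let mut ps = Vec::new();
    for _ in 0..4 {
        ps.push(Pipeline { source: "topic1", load: 10 });
    }
    for _ in 0..4 {
        ps.push(Pipeline { source: "topic2", load: 1 });
    }
    ps
}

/// Total load assigned to each indexer.
fn node_loads(assignment: &[Vec<Pipeline>]) -> Vec<u32> {
    assignment.iter().map(|ps| ps.iter().map(|p| p.load).sum()).collect()
}

/// Count-only balancing: fill indexers in order, `per_node` pipelines each,
/// so pipelines of the same source stay together. Assumes `num_nodes > 0`.
fn balance_by_count(pipelines: &[Pipeline], num_nodes: usize) -> Vec<Vec<Pipeline>> {
    let per_node = (pipelines.len() + num_nodes - 1) / num_nodes;
    let mut nodes: Vec<Vec<Pipeline>> = vec![Vec::new(); num_nodes];
    for (i, p) in pipelines.iter().enumerate() {
        nodes[i / per_node].push(p.clone());
    }
    nodes
}

/// Greedy load-aware balancing: heaviest pipelines first, each placed on the
/// currently least-loaded indexer. Assumes `num_nodes > 0`.
fn balance_by_load(pipelines: &[Pipeline], num_nodes: usize) -> Vec<Vec<Pipeline>> {
    let mut sorted = pipelines.to_vec();
    sorted.sort_by(|a, b| b.load.cmp(&a.load)); // stable sort, descending load
    let mut nodes: Vec<Vec<Pipeline>> = vec![Vec::new(); num_nodes];
    let mut loads = vec![0u32; num_nodes];
    for p in sorted {
        // min_by_key returns the first minimum, so ties go to the lowest index.
        let target = (0..num_nodes).min_by_key(|&i| loads[i]).unwrap();
        loads[target] += p.load;
        nodes[target].push(p);
    }
    nodes
}

fn main() {
    let pipelines = example_pipelines();
    let by_count = balance_by_count(&pipelines, 2);
    let by_load = balance_by_load(&pipelines, 2);
    for (name, nodes) in [("count-based", &by_count), ("load-based", &by_load)] {
        println!("{}: loads per indexer = {:?}", name, node_loads(nodes));
        for (i, ps) in nodes.iter().enumerate() {
            let sources: Vec<&str> = ps.iter().map(|p| p.source).collect();
            println!("  indexer{}: {:?}", i + 1, sources);
        }
    }
}
```

With these made-up numbers, the count-based placement gives per-indexer loads of `[40, 4]` (4 pipelines each, but all topic1 on indexer1), while the load-aware placement gives `[22, 22]` by mixing topic1 and topic2 pipelines on both indexers.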
I tried the following to force QW to spread partitions across the indexers:
- Set `cpu_capacity` to `1m` on all indexers. This didn't help.
- Tested the image from "Uneven pipeline distribution across indexers" #5833, which is `quickwit/quickwit:qw-collocation-20250710`. It also didn't help with partition distribution, but under load I noticed a more even pipeline spread.
- Added a 3rd indexer. This also didn't affect partition distribution.
It looks like QW has some logic that tries to place pipelines of the same source/topic on the same indexer. Even with random placement, I would at some point have expected to see a topic's partitions spread across multiple instances.
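To put a rough number on that intuition, assume (hypothetically) that the 8 pipelines were placed uniformly at random, subject only to the observed even 4/4 split across the 2 indexers. Indexer1's set of pipelines is then one of $\binom{8}{4} = 70$ equally likely subsets, and only 2 of them (all 4 topic1 pipelines, or all 4 topic2 pipelines) fully separate the two topics:

$$
P(\text{topics fully separated}) = \frac{2}{\binom{8}{4}} = \frac{2}{70} = \frac{1}{35} \approx 2.9\%
$$

Under that assumption, a full separation would show up only about 3% of the time, so observing it repeatedly points to something other than chance.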
Here is a screenshot of CPU load on the indexers:

Let me know if you need more details. I'm also ready to test custom builds if you want to try something.
Thanks,