
Add even Kafka partition distribution between all or set of indexers #5924

@dpavlov-smartling

Currently, QW distributes load between indexers using the number of pipelines as the measure of work.
Issue #5833 discussed even pipeline distribution.
But even when pipelines are distributed evenly, the load on the indexers can still be uneven.
Here is my case:

  • Kafka source for topic1 with 24 partitions and 4 pipelines. This topic is heavily loaded, with more than 10 times the traffic of the other topics
  • Kafka source for topic2 with 24 partitions and also 4 pipelines. This topic carries very little data over the same period

When you run these 2 sources on a QW cluster with 2 instances, you get the following outcome:

  • Indexer1 gets all pipelines for topic1
  • Indexer2 gets all pipelines for topic2

So the pipelines are distributed evenly, but
Indexer1 gets all partitions of the Kafka topic that is 10 times more loaded than the topic on Indexer2.
This causes Indexer1 to run at 100% CPU and lag behind in processing topic1 messages.
At the same time, Indexer2 uses less than 10% CPU and is nearly idle.
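
To make the imbalance concrete, here is a rough back-of-the-envelope sketch in Python. It is not Quickwit code: the relative load figures (10 for topic1, 1 for topic2) and the assumption that each of a source's 4 pipelines carries an equal share of its topic are simplifications of the case described above.

```python
# Relative traffic per topic (assumption: topic1 is roughly 10x topic2).
TOPIC_LOAD = {"topic1": 10.0, "topic2": 1.0}
# Each source in this case runs 4 pipelines.
PIPELINES_PER_SOURCE = 4


def indexer_load(assignment):
    """Return {indexer: (pipeline_count, relative_load)}.

    `assignment` maps an indexer name to a list of topic names,
    one entry per pipeline placed on that indexer. Each pipeline is
    assumed to carry an equal share of its topic's traffic.
    """
    summary = {}
    for indexer, topics in assignment.items():
        load = sum(TOPIC_LOAD[t] / PIPELINES_PER_SOURCE for t in topics)
        summary[indexer] = (len(topics), load)
    return summary


# The placement observed on the 2-node cluster.
observed = {
    "indexer1": ["topic1"] * 4,
    "indexer2": ["topic2"] * 4,
}

for name, (pipelines, load) in indexer_load(observed).items():
    print(f"{name}: {pipelines} pipelines, relative load {load}")
```

Both indexers hold the same number of pipelines, yet under these assumptions the relative load differs by the same factor as the topic traffic, which matches the CPU picture I see.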

I tried the following tricks to force QW to spread partitions across the indexers:

  • set cpu_capacity to 1m on all indexers: didn't help
  • tested the image from "Uneven pipeline distribution across indexers" #5833, which is quickwit/quickwit:qw-collocation-20250710. It also didn't help with partition distribution, but under load I noticed a more even pipeline spread
  • added a 3rd indexer: that also didn't affect partition distribution

It looks like QW has some logic that tries to place pipelines for the same source/topic on the same indexer. If the placement were random, I would at some point have seen the partitions of one topic spread across multiple instances.
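
For illustration, the behavior I am asking for could look roughly like the sketch below. This is a hypothetical placement function, not how the QW scheduler works: `spread_pipelines` and its arguments are names I made up, and it ignores cpu_capacity and any existing affinity logic. It simply deals each source's pipelines round-robin across the indexers.

```python
def spread_pipelines(sources, indexers):
    """Deal pipelines round-robin so each source is spread across indexers.

    `sources` maps a source name to its number of pipelines.
    `indexers` is a list of indexer names.
    Returns {indexer: [source name, one entry per pipeline]}.
    """
    placement = {indexer: [] for indexer in indexers}
    cursor = 0  # keeps rotating across sources so no indexer is always first
    for source, num_pipelines in sources.items():
        for _ in range(num_pipelines):
            target = indexers[cursor % len(indexers)]
            placement[target].append(source)
            cursor += 1
    return placement


placement = spread_pipelines(
    {"topic1": 4, "topic2": 4},
    ["indexer1", "indexer2"],
)
for indexer, pipelines in placement.items():
    print(indexer, pipelines)
```

With 2 indexers, each one ends up with 2 pipelines of topic1 and 2 of topic2. Assuming the Kafka consumer group assigns the 24 partitions evenly across a source's 4 pipelines, that would put 12 partitions of each topic on each indexer.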

Here is a screenshot of CPU load on the indexers:

Image

Let me know if you need more details. I'm also ready to test custom builds if you want to try something.

Thanks,
