Rebalancing is sometimes delayed #179
Comments
@objectuser I guess it's because the group subscriber is too busy to respond to group status change calls. There are a few things you can try:
(Sorry that we made the current implementation too simple; we are thinking about a new implementation.)
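To illustrate the "too busy to respond" explanation, here is a toy model in Python. It is not brod's code. It assumes the subscriber behaves like a single process with a FIFO mailbox, that a backlog of messages is already queued ahead of the rebalance request, and that each message takes a fixed time to handle. The function name, the 5 ms handling time, and the 1000 ms arrival time are all hypothetical.

```python
from collections import deque


def rebalance_delay_ms(backlog, per_message_ms, request_at_ms):
    """Return how long a rebalance request waits in a FIFO mailbox.

    Toy model (an assumption, not brod's implementation): `backlog`
    messages are already queued at time 0, and one rebalance request
    arrives at `request_at_ms`, behind all of them.
    """
    mailbox = deque(("message", 0) for _ in range(backlog))
    mailbox.append(("rebalance", request_at_ms))

    clock = 0
    while mailbox:
        kind, arrived = mailbox.popleft()
        # The process cannot handle an item before it has arrived.
        clock = max(clock, arrived)
        if kind == "rebalance":
            return clock - arrived
        # Handling a message blocks the process for a fixed time.
        clock += per_message_ms
    return 0


# Hypothetical numbers: 10k queued messages at 5 ms each.
busy = rebalance_delay_ms(backlog=10_000, per_message_ms=5, request_at_ms=1_000)
idle = rebalance_delay_ms(backlog=0, per_message_ms=5, request_at_ms=1_000)
```

Under these assumptions the request waits behind the whole backlog when the subscriber is busy and is handled immediately when it is idle, which would match the pattern described in this issue.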
@zmstone I'll try and digest your suggestions. Brod has worked pretty well for us so far and I appreciate you want to evolve it. Thanks for your response and suggestions!
If I start two group subscribers (same group) and there are a lot of messages to process, there are timeout errors in the log and it appears that both subscribers are elected group leader and are assigned all partitions.
(BTW, I'm using a thin Elixir wrapper over Brod.)
If I start two subscribers to the same group, if there are few messages to process, there's a rebalance, one is elected leader and the partitions are divided among the subscribers.
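For reference, the expected outcome of a successful rebalance can be sketched as follows. This is only an illustration in Python, assuming a round-robin style split; the strategy actually used depends on the group configuration, and the member names are hypothetical.

```python
def assign_round_robin(partitions, members):
    """Split partitions across members in round-robin order.

    Illustrative only: assumes a round-robin style strategy, which may
    differ from what the group is actually configured to use.
    """
    assignment = {member: [] for member in members}
    for index, partition in enumerate(sorted(partitions)):
        owner = members[index % len(members)]
        assignment[owner].append(partition)
    return assignment


# Hypothetical member names; 32 partitions as in the test described in this issue.
result = assign_round_robin(range(32), ["subscriber_a", "subscriber_b"])
```

With two members and 32 partitions, each member ends up with 16 partitions and no partition is owned twice. Both members holding all 32 partitions at once is what a healthy group should not show.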
However, it appears that if the subscribers are especially busy, the rebalance is delayed, or never happens at all.
I do see both subscribers with elected=true and both are assigned all partitions. I also see: And:
I tested this with a topic with 32 partitions and 10k messages. If I start one subscriber, let it run for a bit, and then add another, I don't see a rebalance until almost all 10k messages are processed.
If I don't have any messages to be processed, a rebalance seems to happen almost immediately.
My observations of "busy" may be totally wrong, but it's the only pattern I've detected.
My question is: Am I doing something wrong, or not doing something, that's causing this behavior? Or is this something I should expect from Brod specifically, or Kafka in general?
Please let me know if I can provide more information.
Thanks!