
Rebalancing is sometimes delayed #179

Closed
objectuser opened this issue Mar 14, 2017 · 2 comments

@objectuser

If I start two group subscribers (same group) and there are a lot of messages to process, there are timeout errors in the log and it appears that both subscribers are elected group leader and are assigned all partitions.

(BTW, I'm using a thin Elixir wrapper over Brod.)

If I start two subscribers in the same group and there are few messages to process, a rebalance happens: one subscriber is elected leader and the partitions are divided among the subscribers.

However, it appears that if the subscribers are especially busy, the rebalance is delayed, or never happens at all.

I do see both subscribers with elected=true and both are assigned all partitions. I also see:

12:52:01 PM consumer.1 |  group coordinator (groupId=index.6,memberId=,generation=11,pid=#PID<0.263.0>):
12:52:01 PM consumer.1 |  failed to join group
12:52:01 PM consumer.1 |  reason::timeout
12:52:02 PM consumer.1 |  group coordinator (groupId=index.6,memberId=,generation=11,pid=#PID<0.263.0>):
12:52:02 PM consumer.1 |  re-joining group, reason::timeout
12:52:11 PM consumer.1 |  group coordinator (groupId=index.6,memberId=nonode@nohost/<0.263.0>-a4a2c673-aa82-4d1a-9d95-418de65cf551,generation=15,pid=#PID<0.263.0>):
12:52:11 PM consumer.1 |  elected=true
12:52:11 PM consumer.1 |  group coordinator (groupId=index.6,memberId=nonode@nohost/<0.263.0>-a4a2c673-aa82-4d1a-9d95-418de65cf551,generation=15,pid=#PID<0.263.0>):

And:

12:50:21 PM consumer.1 |  group coordinator (groupId=index.6,memberId=nonode@nohost/<0.263.0>-893aaf83-c68c-4aaa-b448-34792fac31b0,generation=11,pid=#PID<0.263.0>):
12:50:21 PM consumer.1 |  re-joining group, reason::RebalanceInProgress
12:51:34 PM consumer.1 |  group coordinator (groupId=index.6,memberId=nonode@nohost/<0.263.0>-893aaf83-c68c-4aaa-b448-34792fac31b0,generation=11,pid=#PID<0.263.0>):
12:51:34 PM consumer.1 |  failed to join group
12:51:34 PM consumer.1 |  reason::UnknownMemberId
12:51:34 PM consumer.1 |  group coordinator (groupId=index.6,memberId=nonode@nohost/<0.263.0>-893aaf83-c68c-4aaa-b448-34792fac31b0,generation=11,pid=#PID<0.263.0>):
12:51:34 PM consumer.1 |  re-joining group, reason::UnknownMemberId
12:52:01 PM consumer.1 |  group coordinator (groupId=index.6,memberId=,generation=11,pid=#PID<0.263.0>):
12:52:01 PM consumer.1 |  failed to join group
12:52:01 PM consumer.1 |  reason::timeout
12:52:02 PM consumer.1 |  group coordinator (groupId=index.6,memberId=,generation=11,pid=#PID<0.263.0>):
12:52:02 PM consumer.1 |  re-joining group, reason::timeout
12:52:11 PM consumer.1 |  group coordinator (groupId=index.6,memberId=nonode@nohost/<0.263.0>-a4a2c673-aa82-4d1a-9d95-418de65cf551,generation=15,pid=#PID<0.263.0>):
12:52:11 PM consumer.1 |  elected=true
12:52:11 PM consumer.1 |  group coordinator (groupId=index.6,memberId=nonode@nohost/<0.263.0>-a4a2c673-aa82-4d1a-9d95-418de65cf551,generation=15,pid=#PID<0.263.0>):
12:52:11 PM consumer.1 |  assignments received:

I tested this with a topic with 32 partitions and 10k messages. If I start one subscriber, let it run for a bit, and then add another, I don't see a rebalance until almost all 10k messages are processed.

If I don't have any messages to be processed, a rebalance seems to happen almost immediately.

My observations of "busy" may be totally wrong, but it's the only pattern I've detected.

My question is: Am I doing something wrong, or not doing something, that's causing this behavior? Or is this something I should expect from Brod specifically, or Kafka in general?

Please let me know if I can provide more information.

Thanks!

@zmstone
Contributor

zmstone commented Mar 14, 2017

@objectuser I guess it's because the group subscriber is too busy to respond to group status change calls.
The group coordinator needs to revoke the already-assigned partitions from the member before it can proceed with rebalancing the group, but the revoke is a synchronous call to the subscriber process.
brod_group_subscriber is implemented as a member + subscriber + message_handler.

There are a few things you can try:

  1. Fetch smaller batches (this lets subscribers exit the busy handle_message loop sooner).
  2. Hand off the messages to other workers (especially when you have a lot of partitions, it makes sense to spawn one worker per partition) and ack the messages from the workers. This keeps brod_group_subscriber as a member + subscriber, while the messages are handled in the workers.
  3. Write your own group member by implementing the brod_group_member behaviour; then the subscriber + message_handler can be implemented in per-partition workers.
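Suggestions 1 and 2 might look roughly like the Erlang sketch below. This is illustration only: the workers map, ensure_worker, and worker_loop are hypothetical scaffolding, not brod APIs; the parts taken from brod are the max_bytes/prefetch_count consumer config options, the brod_group_subscriber callbacks, and brod_group_subscriber:ack/4.

```erlang
%% Sketch only -- a brod_group_subscriber callback module that keeps the
%% subscriber process responsive by acking from per-partition workers.
-module(busy_subscriber_sketch).
-behaviour(brod_group_subscriber).
-include_lib("brod/include/brod.hrl").
-export([consumer_config/0, init/2, handle_message/4]).

%% Suggestion 1: smaller fetch batches, so each handle_message burst is short.
consumer_config() ->
  [ {max_bytes, 64 * 1024}   %% cap the size of each fetch response
  , {prefetch_count, 1}      %% limit fetches queued at the subscriber
  ].

init(_GroupId, _Arg) ->
  {ok, #{workers => #{}}}.

%% Suggestion 2: hand the message to a per-partition worker and return
%% WITHOUT 'ack'; the worker calls brod_group_subscriber:ack/4 when done,
%% leaving the subscriber process free to answer coordinator calls.
handle_message(Topic, Partition, #kafka_message{offset = Offset} = Msg,
               #{workers := Workers} = State) ->
  SubscriberPid = self(),  %% handle_message runs in the subscriber process
  {Worker, State1} = ensure_worker(Partition, Workers, State),
  Worker ! {process, Msg, fun() ->
    brod_group_subscriber:ack(SubscriberPid, Topic, Partition, Offset)
  end},
  {ok, State1}.  %% no 'ack' here; the worker acks asynchronously

%% Hypothetical helper: spawn one worker per partition on first use.
ensure_worker(Partition, Workers, State) ->
  case maps:find(Partition, Workers) of
    {ok, Pid} -> {Pid, State};
    error ->
      Pid = spawn_link(fun worker_loop/0),
      {Pid, State#{workers := Workers#{Partition => Pid}}}
  end.

worker_loop() ->
  receive
    {process, _Msg, Ack} ->
      %% ... do the real per-message work here ...
      Ack(),
      worker_loop()
  end.
```

The key design point is that handle_message never blocks on message processing, so the coordinator's revoke call is answered promptly and rebalancing is not delayed.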

(Sorry that the current implementation is too simplistic; we are thinking about a new implementation.)

@objectuser
Author

@zmstone I'll try and digest your suggestions.

Brod has worked pretty well for us so far, and I appreciate that you want to evolve it.

Thanks for your response and suggestions!
