We have been having hardware issues with one of our brokers. Whenever it goes down, our consumers (using ConsumerGroup) get stuck in an endless cycle of rebalancing the consumer group. The only way we have found to resolve this is to stop all of the consumers in the group for around 60 seconds, so that the broker recognizes that they are all "dead", and then restart them. The same thing happens when the failed broker comes back online and is elected by the cluster as leader for some partitions.
This may be related to #246 or may be the same underlying issue - I am not 100% certain as we've been in firefighting mode.
It may also be related to KAFKA-2985: https://issues.apache.org/jira/browse/KAFKA-2985
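
For anyone who needs the workaround in the meantime, the stop/wait/restart procedure described above can be sketched roughly as follows. This is only an illustration in Python, not this library's API: `restart_group`, the `stop()`/`start()` methods on each member, and the `quiet_seconds` parameter are all hypothetical names, and the 60-second default is just the value that happened to work for us.

```python
import time


def restart_group(members, quiet_seconds=60.0, sleep=time.sleep):
    """Stop every consumer in the group, wait, then start them all again.

    `members` is any iterable of objects exposing hypothetical `stop()`
    and `start()` methods that wrap a real consumer. `sleep` is injectable
    so the procedure can be exercised without actually waiting.
    """
    members = list(members)

    # Stop ALL members first; a partial restart does not break the cycle.
    for member in members:
        member.stop()

    # Give the broker time to consider every member of the group dead.
    sleep(quiet_seconds)

    # Only then bring the consumers back so they rejoin a fresh group.
    for member in members:
        member.start()
```

The important detail is the ordering: every consumer has to be down for the whole wait before any of them is started again.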