More workers than partitions causes re-balance to hang #134
Comments
This is weird. Are you seeing any errors, or are the agents just hanging? We typically run with Python 3.6; could you also try with that?
I'm testing with Python 3.7 and it should work fine. Do you have an example app that can reproduce the issue? @vineet, notice he says he has three workers attempting to consume from a topic with only one partition.
Please also paste logs if you have any (but make sure they don't contain sensitive data).
Some notes after testing with 3 workers on a single-partition topic:
What works:
What does not work:
As you can see, the high-water is -1, so the recovery shouldn't have anything to recover.
Thanks for the quick response! I saw the issue without using group by and table, just a regular stream that prints something. I started a worker and it worked fine. I started two more and they worked fine. Then I stopped one of them and all of them got stuck. I will try to reproduce it again today.
Ah, it only logs warnings and errors by default. To get additional logging start the worker with
This should be fixed in the latest Faust.
Awesome, thanks! |
It seems I'm experiencing the same/very similar issue.
Whereas if the worker does take over, it actually re-joins the group:
I'm using Python 3.6.8 and Faust 1.10.1
Hey guys, first of all, this is an awesome library! It gives a solution to a problem a lot of developers have in data-heavy applications. :)
Steps to reproduce
Create a Kafka topic with 1 partition, and start 3 agents for that topic.
You will see that the agents hang when they try to re-balance.
Expected behavior
They should re-balance, and only one should consume messages, since 3 workers > 1 partition.
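The expectation follows from how consumer-group assignment works: each partition is owned by exactly one consumer in the group, so surplus consumers sit idle. A minimal round-robin sketch (illustrative only, not the actual Kafka/Faust assignor):

```python
def assign_round_robin(partitions, consumers):
    # Each partition goes to exactly one consumer; with more consumers
    # than partitions, the surplus consumers get an empty assignment.
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# One partition, three workers: only the first worker gets the partition.
print(assign_round_robin([0], ["w1", "w2", "w3"]))
# → {'w1': [0], 'w2': [], 'w3': []}
```

The idle workers should stay in the group and take over the partition if its owner leaves, which is exactly the handoff that hangs in this report.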
Actual behavior
All agents freeze upon re-balance.
Versions
Faust version 1.0.27
Python 3.7.0 without any extension