Remove static concurrency limit from gRPC server #7544
Conversation
Can one of the admins verify this patch?
Test FAILed.
I know it was also an issue before, but should we be concerned at all that there is no backpressure on the server's request queue?
@raulchen merging this because the high memory consumption is causing CI failures in master and the tests passed here, but please have a look when you can and let me know if I'm missing something.
@edoakes, while reducing memory usage, this PR also removes the back-pressure functionality on the server side. I believe this will cause issues as well. Previously, when I first introduced gRPC to Ray without back-pressure support, I saw an issue where a worker would suddenly submit tons of tasks to the raylet and overload it; some stress tests were failing. I believe this PR will cause similar problems. I think the best solution to this issue is to pre-allocate only K tags and, when a new request is accepted, allocate a new tag only if the total number of outstanding requests is less than N.
@raulchen not sure I fully understand - where was the backpressure that this change removes? We can still only ever have at most `K` handlers running at a given time. Also note that most handlers had …
Why are these changes needed?
We currently have a configurable limit on the number of concurrent calls of each RPC handler that can be accepted at a given time (after that, the calls will block in the gRPC layer as there is no `tag` for the completion queue to populate). On startup, that number of gRPC tags are created for future requests. This causes high memory consumption with no real benefit, as each thread can only be consuming at most one call at a time (while it is executing the handler) and creates a new one synchronously once it finishes executing the handler (even if the RPC reply was not yet sent).

To summarize, the existing behavior is:

- `N` tags that will be populated by `K` completion queues as requests come in.
- `K` threads poll one tag at a time from the completion queues. The threads handle the request synchronously (calling the user-defined methods), then create a new tag once the handler is finished (note that many of our handlers just post to another event loop, so return quickly - "finished" just means that the handler returned, not necessarily that the reply callback was called).

Note that there are never fewer than `N` - `K` tags in the completion queues and there can never be more than `K` handlers running at a given time (because there are only `K` threads to run them), but there's no limit on the number of outstanding requests being handled (if they're being handled on another thread). If all tags in the completion queue are populated, the completion queues are also blocked waiting for the handlers to create new tags.

The behavior with this PR is:

- `K` tags that will be populated by `K` completion queues as requests come in.
- `K` threads poll one tag at a time from the completion queues. The threads create a new tag, then handle the request synchronously (calling the user-defined methods).

The key differences are:

- `K` tags in the completion queues. The queues will be blocked if these tags are populated, but we still have some pipelining to populate these tags while handlers run.
- `K` handlers running at a given time because there are only `K` threads handling requests.

Related issue number
Closes #7543
Checks
I've run `scripts/format.sh` to lint the changes in this PR.