
Remove static concurrency limit from gRPC server #7544

Merged
2 commits merged into ray-project:master on Mar 10, 2020

Conversation

@edoakes (Contributor) commented Mar 10, 2020

Why are these changes needed?

We currently have a configurable limit on the number of concurrent calls to each RPC handler that can be accepted at a given time (beyond that limit, calls block in the gRPC layer because there is no tag for the completion queue to populate). On startup, that number of gRPC tags is created up front for future requests. This causes high memory consumption with no real benefit: each thread can be consuming at most one call at a time (while it is executing the handler) and synchronously creates a new tag once it finishes executing the handler (even if the RPC reply has not yet been sent).

To summarize, the existing behavior is:

  • On startup, create N tags that will be populated by K completion queues as requests come in
  • K threads poll one tag at a time from the completion queues. Each thread handles the request synchronously (calling the user-defined method), then creates a new tag once the handler is finished. (Note that many of our handlers just post work to another event loop and return quickly, so "finished" only means that the handler returned, not necessarily that the reply callback was called.)

Note that there are never fewer than N - K tags in the completion queues, and there can never be more than K handlers running at a given time (because there are only K threads to run them), but there is no limit on the number of outstanding requests still being processed (when that processing happens on another thread). If all tags in the completion queue are populated, the completion queues also block, waiting for the handlers to create new tags.
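For illustration, here is a minimal sketch of this pre-PR flow. The names `ServerCall`, `CreateCall`, and `PollingThreadOld` are hypothetical stand-ins for the server-call abstraction, not the actual classes in this codebase:

```cpp
#include <grpcpp/grpcpp.h>

// Hypothetical stand-ins for the server-call abstraction (not the real classes).
struct ServerCall {
  void HandleRequest();  // runs the user-defined handler synchronously
};
void CreateCall(grpc::ServerCompletionQueue *cq);  // registers a new tag with the queue

void StartupOld(grpc::ServerCompletionQueue *cq, int num_tags /* N */) {
  // Pre-allocate N tags per handler: each tag is a pending "request slot"
  // sitting in the completion queue waiting for an incoming call.
  for (int i = 0; i < num_tags; i++) {
    CreateCall(cq);
  }
}

void PollingThreadOld(grpc::ServerCompletionQueue *cq) {
  // One of K polling threads.
  void *tag;
  bool ok;
  while (cq->Next(&tag, &ok)) {
    auto *call = static_cast<ServerCall *>(tag);
    if (ok) {
      // Runs synchronously; many handlers just post to another event loop
      // and return quickly.
      call->HandleRequest();
    }
    // Only after the handler returns is a replacement tag created.
    CreateCall(cq);
  }
}
```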

The behavior with this PR is:

  • On startup, create K tags that will be populated by K completion queues as requests come in.
  • K threads poll one tag at a time from the completion queues. The threads create a new tag, then handle the request synchronously (calling the user-defined methods).

The key differences are:

  • We don't pre-allocate tags, reducing the memory usage (as reported in the issue).
  • We now always have K tags in the completion queues. The queues are blocked if all of these tags are populated, but we still get some pipelining because gRPC can populate these tags while handlers run.
  • There still can never be more than K handlers running at a given time, because there are only K threads handling requests (a rough sketch of this flow follows the list).
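A corresponding sketch of the post-PR flow, with the same hypothetical `ServerCall`/`CreateCall` stand-ins; the only change is that each polling thread creates the replacement tag before running the handler:

```cpp
#include <grpcpp/grpcpp.h>

// Same hypothetical stand-ins as in the previous sketch.
struct ServerCall {
  void HandleRequest();
};
void CreateCall(grpc::ServerCompletionQueue *cq);

void PollingThreadNew(grpc::ServerCompletionQueue *cq) {
  // On startup, only one tag is created per polling thread (K total).
  CreateCall(cq);
  void *tag;
  bool ok;
  while (cq->Next(&tag, &ok)) {
    auto *call = static_cast<ServerCall *>(tag);
    // Create the replacement tag *before* handling, so gRPC can accept the
    // next request while this thread runs the handler.
    CreateCall(cq);
    if (ok) {
      call->HandleRequest();
    }
  }
}
```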

Related issue number

Closes #7543 (Idle ray worker uses too much memory)


@edoakes changed the title from "Remove max concurrency from RPC handlers" to "Simplify concurrency handling in gRPC server" on Mar 10, 2020
@edoakes changed the title from "Simplify concurrency handling in gRPC server" to "Remove static concurrency limit from gRPC server" on Mar 10, 2020
@AmplabJenkins

Can one of the admins verify this patch?

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22978/

@edoakes requested review from raulchen and ericl on March 10, 2020 at 21:38
@stephanie-wang (Contributor) left a comment


I know it was also an issue before, but should we be concerned at all that there is no backpressure on the server's request queue?

@edoakes (Contributor, Author) commented Mar 10, 2020

@raulchen merging this because the high memory consumption is causing CI failures in master and the tests passed here, but please have a look when you can and let me know if I'm missing something.

@edoakes merged commit 119a303 into ray-project:master on Mar 10, 2020
@raulchen (Contributor) commented

@edoakes, while this PR reduces memory usage, it also removes the backpressure functionality on the server side. I believe this will cause issues as well.

Previously, when I first introduced gRPC to Ray without backpressure support, I saw an issue where a worker would suddenly submit tons of tasks to the raylet and overload it. Some stress tests were failing. I believe this PR will cause similar problems.

I think the best solution to this issue is to pre-allocate only K tags. When a new request is accepted, we allocate a new tag only if the total number of outstanding requests is less than N.
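A rough sketch of this proposal, assuming a hypothetical atomic counter of outstanding requests and the same illustrative `CreateCall` helper as above (this is not code from the PR, and it glosses over the exact race-free accounting):

```cpp
#include <atomic>
#include <grpcpp/grpcpp.h>

void CreateCall(grpc::ServerCompletionQueue *cq);  // illustrative stand-in

std::atomic<int> outstanding_requests{0};

// Called when a tag is popped from the completion queue, i.e. a request was accepted.
void OnRequestAccepted(grpc::ServerCompletionQueue *cq, int max_outstanding /* N */) {
  int current = ++outstanding_requests;
  // Only create a replacement tag while we are below the outstanding-request cap,
  // so at most N requests are in flight at once.
  if (current < max_outstanding) {
    CreateCall(cq);
  }
}

// Called when a request fully completes (the reply has been sent).
void OnRequestFinished(grpc::ServerCompletionQueue *cq, int max_outstanding) {
  int previous = outstanding_requests.fetch_sub(1);
  // If we had hit the cap and stopped creating tags, create one now so new
  // requests can be accepted again.
  if (previous >= max_outstanding) {
    CreateCall(cq);
  }
}
```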

@edoakes (Contributor, Author) commented Mar 11, 2020

@raulchen not sure I fully understand - where was the backpressure that this change removes? We can still only ever have at most 2*K tags in scope at any given time - K currently being handled by the threads and K allocated in the completion queues.

Also note that most handlers had 9999 set for the max concurrency, in which case there was effectively no backpressure AFAICT.
