[Feature] [Serve] Threading for Ray Serve #20169
Comments
Hi @simon-mo, could you take a look at this feature? Thank you 😃
Thanks for posting this @cin-duke. cc @jiaodong @edoakes from the Serve team.

Actually, after some thought, this is more nuanced than I expected:

- When you want concurrent requests but the work is CPU-bound (for example, a single call using 100% of a CPU), you should use replicas instead of threading so that each request gets its own process. Replicas are easier to manage and their performance is more predictable.
- When a call only uses, say, 20% of a CPU but is a blocking call (no async option), threading might still make sense. However, even in this case replicas are still preferred.
- The only case where threading would clearly be useful is to run a lower number of replicas to reduce per-process overhead. In that case you can use a Python thread pool and orchestrate it with asyncio: https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.run_in_executor

In summary, I think supporting threaded actors in max_concurrent_queries might be challenging. Let me know whether this makes sense!
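The thread-pool-plus-asyncio pattern linked above can be sketched in plain Python (no Ray involved; `blocking_call` is a hypothetical stand-in for a blocking library call):

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_call(x):
    # Stands in for a blocking library call with no async variant that
    # only uses a fraction of a CPU (e.g. it mostly waits).
    time.sleep(0.05)
    return x * 2

async def serve_requests():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Offload each blocking call to the pool so the event loop stays
        # free to serve other requests; gather preserves input order.
        return await asyncio.gather(
            *(loop.run_in_executor(pool, blocking_call, i) for i in range(4))
        )

print(asyncio.run(serve_requests()))  # [0, 2, 4, 6]
```

Because the four calls run in the pool concurrently, the whole batch finishes in roughly one call's latency rather than four.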
Thanks for your detailed explanation @simon-mo. Here are some of my comments:

- In this use case, threads would be useful because spawning multiple worker processes takes a lot of system RAM, and if the deployment uses GPU RAM it is very hard to scale out.
- I will look into this approach. However, it would be easier for users if Ray implemented the thread pool internally: the user would only need to specify an argument to turn on threading and set the max concurrency.
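What such an internal API might feel like can be sketched with a small wrapper. Everything here (`ThreadedHandler`, `model_predict`, the `max_concurrency` argument) is hypothetical, not a real Ray Serve API; it only shows how a thread pool plus a semaphore could cap in-flight requests:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

class ThreadedHandler:
    """Hypothetical wrapper: runs a synchronous handler in a thread pool,
    capped at `max_concurrency` in-flight calls. Not a real Ray Serve API."""

    def __init__(self, handler, max_concurrency=4):
        self._handler = handler
        self._pool = ThreadPoolExecutor(max_workers=max_concurrency)
        self._sem = asyncio.Semaphore(max_concurrency)

    async def __call__(self, request):
        # The semaphore enforces the concurrency cap before a request
        # is handed off to the pool.
        async with self._sem:
            loop = asyncio.get_running_loop()
            return await loop.run_in_executor(self._pool, self._handler, request)

def model_predict(x):
    # Stands in for a blocking model call that shares one process's RAM/GPU.
    return x + 1

async def main():
    handler = ThreadedHandler(model_predict, max_concurrency=2)
    return await asyncio.gather(*(handler(i) for i in range(4)))

print(asyncio.run(main()))  # [1, 2, 3, 4]
```

Because all threads live in one process, the model weights are loaded once, which is the RAM/GPU-RAM saving argued for above.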
This is a great point! For the thread pool approach: if you can prototype it in your application and see it working, we can integrate it into Ray Serve :D.
Okay, we will experiment with it and let you know the result.
Hi @cin-duke, just revisiting this issue, as this topic was brought up by other community users as well. Any updates on your experiment, or any help needed?
Search before asking
Description
Currently, Ray Serve only supports asyncio, which blocks the event loop when there are computation-heavy tasks; this is not ideal for handling concurrent requests.
It would be great if Ray Serve deployments supported threading, which would make it easier to process concurrent requests.
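The event-loop-blocking problem described here can be demonstrated with plain asyncio (no Ray required; the timings are illustrative assumptions):

```python
import asyncio
import time

def cpu_heavy():
    # Busy loop standing in for a computation-heavy task (~0.2 s).
    end = time.monotonic() + 0.2
    while time.monotonic() < end:
        pass

async def blocking_handler():
    cpu_heavy()  # runs on the event loop thread, so nothing else can run

async def quick_request():
    await asyncio.sleep(0.05)  # would finish in ~50 ms on an idle loop

async def main():
    start = time.monotonic()
    await asyncio.gather(blocking_handler(), quick_request())
    return time.monotonic() - start

elapsed = asyncio.run(main())
# The quick request cannot even start its sleep until cpu_heavy() releases
# the loop, so total time is ~0.25 s instead of ~0.2 s; with more blocked
# requests queued behind it, tail latency grows accordingly.
print(f"{elapsed:.2f}s")
```

Running the heavy task on a separate thread (e.g. via `loop.run_in_executor`) would let the quick request complete concurrently.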
Use case
Allow a Ray Serve deployment to process computation-heavy tasks while waiting for results on other threads.
Related issues
No response
Are you willing to submit a PR?