
[Feature] [Serve] Threading for Ray Serve #20169

Open
1 of 2 tasks
cin-duke opened this issue Nov 9, 2021 · 7 comments
Labels
enhancement (Request for new feature and/or capability), P2 (Important issue, but not time-critical), serve (Ray Serve Related Issue)
Milestone

Comments


cin-duke commented Nov 9, 2021

Search before asking

  • I had searched in the issues and found no similar feature requirement.

Description

Currently, Ray Serve only supports asyncio, so computation-heavy tasks block the event loop. This is not ideal for handling concurrent requests.
It'd be great if Ray Serve deployments supported threading, which would make it easier to process concurrent requests.

Use case

Allow a Ray Serve deployment to process computation-heavy tasks on other threads while waiting for results.

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
@cin-duke cin-duke added the enhancement Request for new feature and/or capability label Nov 9, 2021

cin-duke commented Nov 9, 2021

Hi @simon-mo, could you take a look at this feature request? Thank you 😃

@cin-duke cin-duke changed the title [Feature] Threading for Ray Serve [Feature][Serve] Threading for Ray Serve Nov 9, 2021
@cin-duke cin-duke changed the title [Feature][Serve] Threading for Ray Serve [Feature] [Serve] Threading for Ray Serve Nov 9, 2021

simon-mo commented Nov 9, 2021

Thanks for posting this @cin-duke. cc @jiaodong @edoakes from the Serve team.

Actually, after some thought, this is more nuanced than I expected.

When you want concurrent requests but the work is CPU-bound, for example a single call using 100% of a CPU, you should use replicas instead of threading so that each request gets its own process. Replicas are easier to manage and their performance is more predictable.

When you have a call that only uses, say, 20% CPU but is a blocking call (no async option), threading might still make sense. However, in this case replicas are still preferred because you can do YourDeployment.options(ray_actor_options={"num_cpus": 0.2}, num_replicas=10) to scale out.
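As a sketch of the replica-based approach (a config fragment only, assuming Ray Serve is installed and started; `do_blocking_work` is a hypothetical placeholder for the blocking library call):

```python
from ray import serve

# Ten replicas at 0.2 CPU each lets ten blocking calls run
# concurrently, each in its own process.
@serve.deployment(ray_actor_options={"num_cpus": 0.2}, num_replicas=10)
class YourDeployment:
    def __call__(self, request):
        # do_blocking_work is a hypothetical stand-in for a blocking,
        # low-CPU call with no async option.
        return do_blocking_work(request)
```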

The only case where threading would be useful is to run a lower number of replicas to avoid per-process overhead. In that case you can use a Python thread pool and orchestrate it with asyncio: https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.run_in_executor
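The run_in_executor pattern can be sketched without Ray at all. Here `blocking_call` is a hypothetical stand-in for a blocking library call (e.g. model inference), and `handle_request` plays the role of an async deployment method:

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Shared pool; in a Serve deployment this could live on the deployment class.
_pool = ThreadPoolExecutor(max_workers=4)

def blocking_call(x):
    # Stand-in for a blocking, non-async library call.
    time.sleep(0.05)
    return x * 2

async def handle_request(x):
    # Offload the blocking call to the pool so the event loop stays
    # free to accept other requests while this one runs.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_pool, blocking_call, x)

async def main():
    # Ten concurrent "requests"; with 4 workers they overlap in the pool.
    return await asyncio.gather(*(handle_request(i) for i in range(10)))

print(asyncio.run(main()))
```

With this pattern the event loop is never blocked, so a single replica can keep accepting requests while up to `max_workers` blocking calls run in parallel.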

In summary, I think supporting threaded actors via max_concurrent_queries might be challenging. Let me know whether this makes sense!


cin-duke commented Nov 11, 2021

Thanks for your detailed explanation @simon-mo

Here are some of my comments:

> When you have a call that only uses say 20% CPU but it is blocking call (no async option), threading might still make sense. However in this case replicas are still preferred because you can do YourDeployment.options(ray_actor_options={"num_cpus": 0.2}, num_replicas=10) to scale out.

In this use case, threads would be useful because spawning multiple workers takes a lot of system RAM, and if the deployment uses GPU RAM, it is very hard to scale out.

> The only case where threading would be useful is to use lower number of replicas due to process overhead. In this case you can use a Python threadpool and orchestrate it with asyncio. https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.run_in_executor

I will look into this approach. However, it'd be easier for users if Ray implemented the thread pool internally; users would only need to set an argument to turn on threading and set the max concurrency.

@simon-mo

> In this use case, thread would be useful because spawning multiple workers will take lots of system RAM, and if the deployment uses GPU RAM, it will be very hard to scale out.

This is a great point! For the thread pool approach, if you can try to prototype it in your application and see it working, we can integrate it into Ray Serve :D.

@simon-mo simon-mo added the serve Ray Serve Related Issue label Nov 11, 2021
@simon-mo simon-mo added this to the Serve backlog milestone Nov 11, 2021
@cin-duke

Okay, we will experiment with it and let you know the result.

@simon-mo simon-mo added the P2 Important issue, but not time-critical label Jan 26, 2022
@jiaodong

Hi @cin-duke, just revisiting this issue since this topic was brought up by other community users as well. Any updates on your experiment, or is help needed?

@cin-duke

Hi @jiaodong, @simon-mo
Sorry for the late response; I haven't had time to investigate it. My simple solution is to create a function that calls the deployment, then create a thread pool to run that function on separate threads.
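That workaround might look roughly like this; `call_deployment` is a hypothetical stand-in for whatever actually invokes the deployment (a Serve handle or an HTTP request), simulated here with a blocking sleep so the sketch runs on its own:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def call_deployment(payload):
    # Stand-in for a call into the Serve deployment; here it just
    # simulates a blocking round trip.
    time.sleep(0.05)
    return f"processed-{payload}"

def run_concurrently(payloads, max_workers=8):
    # Fan the calls out over a thread pool so they run concurrently
    # instead of one after another.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(call_deployment, payloads))

results = run_concurrently(["a", "b", "c"])
```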
