[Serve] Document max_replicas_per_node deployment option #42743
Merged: 5 commits merged on Jan 31, 2024
36 changes: 36 additions & 0 deletions doc/source/serve/production-guide/fault-tolerance.md
@@ -247,6 +247,42 @@ After you apply the Redis objects along with your updated `RayService`, your Ray
Check out the KubeRay guide on [GCS fault tolerance](kuberay-gcs-ft) to learn more about how Serve leverages the external Redis cluster to provide head node fault tolerance.
:::

### Spreading replicas across nodes

One way to improve the availability of your Serve application is to spread deployment replicas across multiple nodes, so that you still have enough running replicas to serve traffic even if some nodes fail.

By default, Serve soft spreads all deployment replicas, but this default behavior has a few limitations:

* It's a soft, best-effort spread: there is no guarantee that replicas are spread perfectly evenly across nodes.

* Serve tries to spread replicas among existing nodes when possible, rather than launching new nodes.
For example, if you have a single node with enough resources for every replica, Serve schedules all replicas
on that node, and the node becomes a single point of failure.

You can change the spread behavior of your deployment through the `max_replicas_per_node`
[deployment option](../serve/api/doc/ray.serve.deployment_decorator.rst), which sets a hard limit on the number of replicas of a given deployment that can run on a single node.
Setting it to 1 effectively enforces strict spreading of the deployment's replicas. If you don't set it, there is no hard spread constraint and Serve applies the default soft spread described above. The `max_replicas_per_node` option is per deployment and only constrains the spread of replicas within that deployment; it imposes no spread constraint between replicas of different deployments.

Here is a code example showing how to set the `max_replicas_per_node` deployment option:

```{testcode}
import ray
from ray import serve

@serve.deployment(max_replicas_per_node=1)
class Deployment1:
def __call__(self, request):
return "hello"

@serve.deployment(max_replicas_per_node=2)
class Deployment2:
def __call__(self, request):
return "world"
```

In this example, the two Serve deployments have different `max_replicas_per_node` values: Deployment1 can have at most 1 replica on each node, and Deployment2 can have at most 2 replicas on each node. If you schedule 2 replicas of Deployment1 and 2 replicas of Deployment2, the cluster needs at least 2 nodes, each running 1 replica of Deployment1. The 2 replicas of Deployment2 may run on one node or on two nodes; either placement satisfies the `max_replicas_per_node` constraint.
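As a back-of-the-envelope check, the minimum number of nodes a deployment needs is `ceil(num_replicas / max_replicas_per_node)`. The helper below is an illustrative sketch, not part of the Serve API, that computes this lower bound for the deployments above:

```python
import math

def min_nodes_required(num_replicas: int, max_replicas_per_node: int) -> int:
    """Lower bound on the number of nodes needed to place a deployment's
    replicas without exceeding max_replicas_per_node on any single node."""
    return math.ceil(num_replicas / max_replicas_per_node)

# Deployment1: 2 replicas, at most 1 per node -> needs at least 2 nodes.
print(min_nodes_required(2, 1))  # 2
# Deployment2: 2 replicas, at most 2 per node -> can fit on a single node.
print(min_nodes_required(2, 2))  # 1
```

Note that this is only a lower bound: the actual number of nodes Serve uses also depends on each node's available resources and on the replicas of other deployments.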

(serve-e2e-ft-behavior)=
## Serve's recovery procedures
