## 1. When to Consider Ray Serve

Consider Ray Serve if your project faces one or more of the following challenges:

| **Challenge** | **Details** | **Ray Serve Solution** |
|---------------|------------------|--------------------------|
| **Slow iteration speed for ML engineers** | - Developers must containerize and roll out components on Kubernetes to test changes<br>- Developers must use complex protocols (e.g. gRPC) to achieve acceptable performance | - Provides a Python-first API for developing services<br>- Runs services as lightweight [Ray actors](https://docs.ray.io/en/latest/ray-core/actors.html)<br>- Can be run locally for development |
| **Need to efficiently compose multiple components** | - Components must share data efficiently<br>- Implementing performant streaming protocols (e.g. gRPC) is complex | - Relies on [Ray's object store](https://docs.ray.io/en/latest/ray-core/objects.html) to share data efficiently between components<br>- Avoids the need to implement gRPC streaming |
| **Poor utilization of expensive hardware** | - Naive request handling leaves hardware underutilized | - Offers [dynamic batching of requests](https://docs.ray.io/en/latest/serve/advanced-guides/dyn-req-batch.html) to improve hardware utilization<br>- Leverages Ray Core's support for accelerators and custom resources:<br>&nbsp;&nbsp;&nbsp;&nbsp;• [Multi-node/multi-GPU serving](https://docs.ray.io/en/latest/serve/tutorials/vllm-example.html)<br>&nbsp;&nbsp;&nbsp;&nbsp;• [Fractional compute resource usage](https://docs.ray.io/en/latest/serve/configure-serve-deployment.html)<br>- RayTurbo Serve offers [replica compaction](https://www.anyscale.com/blog/new-feature-replica-compaction) |
| **High-latency outliers when juggling many models** | - Naive load balancing forces expensive state loading (e.g. ML models) | - Provides [model multiplexing](https://docs.ray.io/en/latest/serve/model-multiplexing.html) to avoid unnecessary load times<br>- Routes requests to replicas that already have the model loaded |

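To make the dynamic batching row above more concrete, here is a minimal standard-library sketch of the idea. It is not Ray Serve's implementation: Ray Serve exposes batching through its `@serve.batch` decorator, whereas the `MicroBatcher` class, the `double_all` stand-in model, and the `demo` helper below are hypothetical names invented for this illustration. The `max_batch_size` and `batch_wait_timeout_s` parameter names are borrowed from the `@serve.batch` options.

```python
import asyncio
from typing import Callable, List, Optional, Tuple


class MicroBatcher:
    """Groups concurrent requests into batches before calling the model.

    Illustrative only: this is not how Ray Serve is implemented.
    """

    def __init__(
        self,
        batch_fn: Callable[[List[int]], List[int]],
        max_batch_size: int = 4,
        batch_wait_timeout_s: float = 0.05,
    ) -> None:
        self.batch_fn = batch_fn
        self.max_batch_size = max_batch_size
        self.batch_wait_timeout_s = batch_wait_timeout_s
        self.batch_sizes: List[int] = []  # recorded so the demo can inspect batching
        self._queue: Optional[asyncio.Queue] = None
        self._worker: Optional[asyncio.Task] = None

    async def submit(self, item: int) -> int:
        """Enqueue one request and wait for its individual result."""
        if self._worker is None:
            # Create the queue and worker lazily, inside the running event loop.
            self._queue = asyncio.Queue()
            self._worker = asyncio.create_task(self._run())
        future = asyncio.get_running_loop().create_future()
        await self._queue.put((item, future))
        return await future

    async def _run(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            # Wait for the first request, then give others a short window to join.
            batch = [await self._queue.get()]
            deadline = loop.time() + self.batch_wait_timeout_s
            while len(batch) < self.max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self._queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            # One call handles the whole batch; results are fanned back out.
            outputs = self.batch_fn([item for item, _ in batch])
            self.batch_sizes.append(len(batch))
            for (_, future), output in zip(batch, outputs):
                future.set_result(output)

    async def close(self) -> None:
        """Stop the background worker."""
        if self._worker is not None:
            self._worker.cancel()
            try:
                await self._worker
            except asyncio.CancelledError:
                pass


def double_all(inputs: List[int]) -> List[int]:
    # Hypothetical stand-in for a vectorized model call.
    return [x * 2 for x in inputs]


async def demo() -> Tuple[List[int], List[int]]:
    batcher = MicroBatcher(double_all, max_batch_size=4)
    # Ten requests arrive concurrently; each caller still gets its own result.
    results = await asyncio.gather(*(batcher.submit(i) for i in range(10)))
    await batcher.close()
    return list(results), batcher.batch_sizes


if __name__ == "__main__":
    results, batch_sizes = asyncio.run(demo())
    print(results)
    print(batch_sizes)
```

In this sketch, ten concurrent requests with `max_batch_size=4` result in three calls to the stand-in model instead of ten, which is the utilization gain batching aims for. The wait timeout bounds how long a lone request is held back when no others arrive.
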