Ray Use Cases

This page indexes common Ray use cases for scaling ML. It contains highlighted references to blogs, examples, and tutorials also located elsewhere in the Ray documentation.

LLMs and Gen AI

Large language models (LLMs) and generative AI are rapidly changing industries, and demand compute at an astonishing pace. Ray provides a distributed compute framework for scaling these models, allowing developers to train and deploy models faster and more efficiently. With specialized libraries for data streaming, training, fine-tuning, hyperparameter tuning, and serving, Ray simplifies the process of developing and deploying large-scale AI models.

Learn more about how Ray scales LLMs and generative AI with the following resources.

[Blog] How Ray solves common production challenges for generative AI infrastructure

[Blog] Training 175B Parameter Language Models at 1000 GPU scale with Alpa and Ray

[Blog] Faster stable diffusion fine-tuning with Ray AIR

[Blog] How to fine tune and serve LLMs simply, quickly and cost effectively using Ray + DeepSpeed + HuggingFace


[Example] GPT-J-6B Fine-Tuning with Ray AIR and DeepSpeed


[Example] Fine-tuning DreamBooth with Ray AIR


[Example] Stable Diffusion Batch Prediction with Ray AIR


[Example] GPT-J-6B Serving with Ray AIR

Batch Inference

Batch inference is the process of generating model predictions on a large "batch" of input data. Ray for batch inference works with any cloud provider and ML framework, and is fast and cheap for modern deep learning applications. It scales from single machines to large clusters with minimal code changes. Because Ray is a Python-first framework, you can easily express and interactively develop your inference workloads in it. To learn more about running batch inference with Ray, see the batch inference guide <batch_inference_home>.

[Blog] Offline Batch Inference: Comparing Ray, Apache Spark, and SageMaker

[Blog] Streaming distributed execution across CPUs and GPUs

[Blog] Using Ray Data to parallelize LangChain inference


[Guide] Batch Prediction using Ray Data


[Example] Batch Inference on NYC taxi data using Ray Data


[Example] Batch OCR processing using Ray Data

Many Model Training

Many model training is common in ML use cases such as time series forecasting, which requires fitting models on multiple data batches corresponding to locations, products, and so on. The focus is on training many models on subsets of a dataset, in contrast to training a single model on the entire dataset.

When any given model you want to train can fit on a single GPU, Ray can assign each training run to a separate Ray Task. In this way, all available workers are utilized to run independent remote training rather than one worker running jobs sequentially.

Data parallelism pattern for distributed training on large datasets.

How do I do many model training on Ray?

To train multiple independent models, use the Ray Tune (Tutorial <mmt-tune>) library. This is the recommended library for most cases.

You can use Tune with your current data preprocessing pipeline if your data source fits into the memory of a single machine (node). If you need to scale your data, or you want to plan for future scaling, use the Ray Data <data> library. To use Ray Data, your data must be in a supported format <input-output>.

Alternative solutions exist for less common cases:

  1. If your data is not in a supported format, use Ray Core (Tutorial <mmt-core>) for custom applications. This is an advanced option and requires an understanding of design patterns and anti-patterns <core-patterns>.
  2. If you have a large preprocessing pipeline, you can use the Ray Data library to train multiple models (Tutorial <mmt-datasets>).

Learn more about many model training with the following resources.

[Blog] Many Models Batch Training at Scale with Ray Core


[Example] Batch Training with Ray Core


[Example] Batch Training with Ray Data


[Guide] Tune Basic Parallel Experiments


[Example] Batch Training and Tuning using Ray Tune

[Talk] Scaling Instacart fulfillment ML on Ray

Model Serving

Ray Serve <rayserve> is well suited for model composition, enabling you to build a complex inference service consisting of multiple ML models and business logic all in Python code.

It supports complex model deployment patterns requiring the orchestration of multiple Ray actors, where different actors provide inference for different models. Serve handles both batch and online inference and can scale to thousands of models in production.

Deployment patterns with Ray Serve. (Click image to enlarge.)

Learn more about model serving with the following resources.

[Talk] Productionizing ML at Scale with Ray Serve

[Blog] Simplify your MLOps with Ray & Ray Serve


[Guide] Getting Started with Ray Serve


[Guide] Model Composition in Serve


[Gallery] Serve Examples Gallery

[Gallery] More Serve Use Cases on the Blog

Hyperparameter Tuning

The Ray Tune <tune-main> library enables any parallel Ray workload to be run under a hyperparameter tuning algorithm.

Running multiple hyperparameter tuning experiments is a pattern well suited to distributed computing because each experiment is independent of the others. Ray Tune handles the hard part of distributing hyperparameter optimization and provides key features such as checkpointing the best result, optimizing scheduling, and specifying search patterns.

Distributed tuning with distributed training per trial.

Learn more about the Tune library with the following talks and user guides.


[Guide] Getting Started with Ray Tune

[Blog] How to distribute hyperparameter tuning with Ray Tune

[Talk] Simple Distributed Hyperparameter Optimization


[Gallery] Ray Tune Examples Gallery

[Gallery] More Tune Use Cases on the Blog

Distributed Training

The Ray Train <train-userguides> library integrates many distributed training frameworks under a simple Trainer API, providing distributed orchestration and management capabilities out of the box.

In contrast to training many models, model parallelism partitions a large model across many machines for training. Ray Train has built-in abstractions for distributing shards of models and running training in parallel.

Model parallelism pattern for distributed large model training.

Learn more about the Train library with the following talks and user guides.

[Talk] Ray Train, PyTorch, TorchX, and distributed deep learning

[Blog] Elastic Distributed Training with XGBoost on Ray


[Guide] Getting Started with Ray Train


[Example] Fine-tune a 🤗 Transformers model


[Gallery] Ray Train Examples Gallery

[Gallery] More Train Use Cases on the Blog

Reinforcement Learning

RLlib is an open-source library for reinforcement learning (RL), offering support for production-level, highly distributed RL workloads while maintaining unified and simple APIs for a large variety of industry applications. RLlib is used by industry leaders in many different verticals, such as climate control, industrial control, manufacturing and logistics, finance, gaming, automobile, robotics, boat design, and many others.

Decentralized distributed proximal policy optimization (DD-PPO) architecture.

Learn more about reinforcement learning with the following resources.

[Course] Applied Reinforcement Learning with RLlib


[Guide] Getting Started with RLlib


[Gallery] RLlib Examples Gallery

[Gallery] More RL Use Cases on the Blog

ML Platform

Merlin is Shopify's ML platform built on Ray. It enables fast iteration and scaling of distributed applications such as product categorization and recommendations.

Shopify's Merlin architecture built on Ray.

Spotify uses Ray for advanced applications that include personalizing content recommendations for home podcasts, and personalizing Spotify Radio track sequencing.

How Ray ecosystem empowers ML scientists and engineers at Spotify.

The following highlights feature companies leveraging Ray's unified API to build simpler, more flexible ML platforms.

[Blog] The Magic of Merlin - Shopify's New ML Platform

[Slides] Large Scale Deep Learning Training and Tuning with Ray

[Talk] Predibase - A low-code deep learning platform built for scale

[Talk] Ray Summit Panel - ML Platform on Ray

End-to-End ML Workflows

The following highlights examples utilizing Ray AIR to implement end-to-end ML workflows.


[Example] Text classification with Ray


[Example] Image classification with Ray


[Example] Object detection with Ray


[Example] Credit scoring with Ray and Feast


[Example] Machine learning on tabular data


[Example] AutoML for Time Series with Ray


[Gallery] Full Ray AIR Examples Gallery

Large Scale Workload Orchestration

The following highlights feature projects leveraging Ray Core's distributed APIs to simplify the orchestration of large scale workloads.

[Blog] Highly Available and Scalable Online Applications on Ray at Ant Group

[Blog] Ray Forward 2022 Conference: Hyper-scale Ray Application Use Cases


[Example] Speed up your web crawler by parallelizing it with Ray