# Scalable Batch Inference with Ray

<img src="https://technical-training-assets.s3.us-west-2.amazonaws.com/Generic/ray_logo.png" width="20%" loading="lazy">

## About this notebook

### Is this module right for you?

This module presents several approaches for scaling batch inference on Ray. Through hands-on practice with inference on a computer vision task, you will implement and compare different inference architectures to better understand Ray AIR and Ray Core.

This notebook is most useful if one or more of the following scenarios apply to you:

* You observe performance bottlenecks when working on batch inference problems in computer vision projects.
* You want to scale or increase throughput of existing batch inference pipelines.
* You wish to explore different architectures for scaling batch inference with Ray AIR and Ray Core.

### Prerequisites

For this notebook you should satisfy the following requirements:

* Practical Python and machine learning experience.
* Familiarity with batch inference in ML.
* Familiarity with Ray and Ray AIR equivalent to completing these training modules:
  * [Overview of Ray](https://github.com/ray-project/ray-educational-materials/blob/main/Introductory_modules/Overview_of_Ray.ipynb)
  * [Introduction to Ray AIR](https://github.com/ray-project/ray-educational-materials/blob/main/Introductory_modules/Introduction_to_Ray_AIR.ipynb)
  * [Ray Core](https://github.com/ray-project/ray-educational-materials/tree/main/Ray_Core)

### Learning objectives

* Understand common design patterns for distributed batch inference.
* Implement scalable batch inference with Ray.
* Extend each approach by tuning performance.
* Compare scalable batch inference architectures on Ray to evaluate which is most relevant to your work.

### What will you do?

* Learn about three distributed batch inference design patterns with Ray.
* Get to know the inference task.
  * Semantic (image) segmentation using the SegFormer model.
* Implement sequential inference.
* Implement distributed inference patterns.
  * Inference with Ray AIR Datasets and **BatchPredictor** abstractions.
  * Inference with Ray Core, using key abstractions: Ray tasks and actors.
* Compare approaches to identify situations best fit for each.

## Part 1: Ray design patterns for scaling batch inference

The ultimate goal for machine learning models is often to generate predictions on a set of unseen data. In this notebook, you focus on the inference stage of the ML workflow and explore different approaches to scaling it.

Ray Core and Ray AIR provide APIs that allow you to perform batch inference at scale, processing millions of examples and offering various performance tuning options.

|<img src="https://technical-training-assets.s3.us-west-2.amazonaws.com/Scaling_inference/example_ml_workflow.png" width="70%" loading="lazy">|
|:--|
|An example of a machine learning workflow that starts with reading raw data and preprocessing it. These steps are followed by training and tuning that produce a trained model. This model is then used for inference, often on large datasets.|

### What is (batch) inference?

<div class="alert alert-info">
  <strong>Batch inference</strong> (also known as offline inference) is the process of generating predictions on a large set or "batch" of data.
</div>

Unlike *online inference*, where predictions are generated as each observation arrives, batch inference generates predictions over a large set of input data when an immediate response is not required or feasible.

For example, batch inference is relevant when generating weekly product recommendations using historical customer data or sales forecasting using time-aggregated observations.

|<img src="https://technical-training-assets.s3.us-west-2.amazonaws.com/Scaling_inference/batch_inference.png" width="70%" loading="lazy">|
|:--|
|Batch inference is the process of applying a trained model to a batch of data to generate predictions.|

In a non-distributed setting, inference executes sequentially. The model processes incoming batches of data one at a time, limiting performance to a single machine or GPU. Below, you will learn about three approaches for distributing batch inference on Ray.

### Batch inference using Ray AIR BatchPredictor

Ray AIR [BatchPredictor](https://docs.ray.io/en/latest/ray-air/predictors.html#batch-prediction) is a utility for large-scale, distributed batch inference. `BatchPredictor` has out-of-the-box features:

* It supports various predictors, such as [TorchPredictor](https://docs.ray.io/en/latest/ray-air/api/doc/ray.train.torch.TorchPredictor.html#ray.train.torch.TorchPredictor), [HuggingFacePredictor](https://docs.ray.io/en/latest/ray-air/api/doc/ray.train.huggingface.HuggingFacePredictor.html#ray.train.huggingface.HuggingFacePredictor), and [XGBoostPredictor](https://docs.ray.io/en/latest/ray-air/api/doc/ray.train.xgboost.XGBoostPredictor.html#ray.train.xgboost.XGBoostPredictor).
* It handles framework-native batch conversions.
* It has options to resume from an AIR checkpoint for prediction, to select or keep columns, and more.

`BatchPredictor` takes in two components:

* **`Checkpoint`**. A trained model, which could come from a training or tuning step.
* **`Predictor`**. A class that loads a model from a `Checkpoint` to perform inference. Ray AIR provides framework-specific predictors (e.g. TorchPredictor and TensorflowPredictor).

Once instantiated, BatchPredictor can call `predict()` on a Ray Dataset. [Ray Datasets](https://docs.ray.io/en/latest/data/dataset.html#datasets) are the standard way to load and exchange data in Ray AIR. Datasets load and preprocess data for parallel compute, internally handling operations like batching, pipelining, autoscaling the actor pool, and memory management.



|<img src="https://technical-training-assets.s3.us-west-2.amazonaws.com/Scaling_inference/air_batchpredictor.png" width="70%" loading="lazy">|
|:--|
|Ray Datasets parallelize data loading, preprocessing, and batching. Ray AIR `BatchPredictor` takes both `Checkpoint` and `Predictor` objects to call `predict()` on a Ray Dataset for distributed batch inference.|

These high-level abstractions automate the challenging aspects of scaling batch inference in exchange for less direct control over how Ray distributes the work.

<img src="https://technical-training-assets.s3.us-west-2.amazonaws.com/Scaling_inference/code_batchpredictor.png" width="70%" loading="lazy">

### Batch inference using Ray Core API

#### Stateless inference using Ray Tasks

Part of the Ray Core primitives, [Ray Tasks](https://docs.ray.io/en/latest/ray-core/tasks.html#ray-remote-functions) offer an easy way to distribute inference across a compute cluster. Tasks are Python functions that execute remotely in the cluster, allowing multiple processes to work on different tasks concurrently (see: [Remote Procedure Call](https://en.wikipedia.org/wiki/Remote_procedure_call)).

In this approach, tasks contain replicas of the trained model to compute predictions on input data. Since tasks do not store or modify any internal state, we say they are *stateless*. 

An example of a stateless function in deep learning is the [SGD optimizer](https://en.wikipedia.org/wiki/Stochastic_gradient_descent) because it only updates weights (the output) based on the gradient of the loss function (the input, along with the current weights). No internal state about previously calculated gradients influences how future gradients are calculated.
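To make the idea concrete, here is a minimal sketch of a stateless update in plain Python. The function name `sgd_step` and the learning rate are illustrative assumptions, not part of any library. The output depends only on the arguments, so two calls with the same inputs return the same result.

```python
def sgd_step(weights: list[float], grads: list[float], lr: float = 0.1) -> list[float]:
    """Return updated weights; nothing is stored between calls."""
    return [w - lr * g for w, g in zip(weights, grads)]


weights = [1.0, -2.0]
grads = [0.5, -1.0]

# Calling the function twice with identical inputs gives identical outputs.
first = sgd_step(weights, grads)
second = sgd_step(weights, grads)
```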

|<img src="https://technical-training-assets.s3.us-west-2.amazonaws.com/Scaling_inference/task_inference.png" width="70%" loading="lazy">|
|:--|
|During stateless inference, each Ray Task loads the trained model and outputs predictions on assigned batches. This approach scales with the number of available CPUs and GPUs because each inference task is independent of the other concurrent jobs.|

<img src="https://technical-training-assets.s3.us-west-2.amazonaws.com/Scaling_inference/code_task.png" width="70%" loading="lazy">

#### Stateful inference using Ray Actors

In the previous approach, the trained model is loaded and discarded after each batch. This works well for smaller models. However, loading large, complex models into memory can be computationally expensive. In addition, you may want the ability to capture some persistent internal state.

[Ray Actors](https://docs.ray.io/en/latest/ray-core/actors.html) are *stateful objects*, meaning they maintain an internal state. Other examples of stateful objects include Python classes and the [Adam optimizer](https://arxiv.org/abs/1412.6980) commonly used in deep learning. Due to this property, actors can run inference on multiple batches and avoid the overhead of reloading the model after each batch.
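The difference in loading cost can be sketched without Ray. In the toy code below, `load_model` is a hypothetical stand-in for an expensive model load, and a counter records how often it runs. The stateless function reloads the model for every batch, while the stateful class loads it once and reuses it.

```python
load_counter = {"calls": 0}


def load_model():
    """Hypothetical stand-in for an expensive model load."""
    load_counter["calls"] += 1
    return lambda batch: [x * 2 for x in batch]  # toy "model"


def stateless_predict(batch):
    model = load_model()  # reloaded for every batch
    return model(batch)


class StatefulWorker:
    def __init__(self):
        self.model = load_model()  # loaded once, kept as internal state
        self.batches_seen = 0

    def predict(self, batch):
        self.batches_seen += 1
        return self.model(batch)


batches = [[1, 2], [3, 4], [5, 6]]

stateless_results = [stateless_predict(b) for b in batches]
stateless_loads = load_counter["calls"]

load_counter["calls"] = 0
worker = StatefulWorker()
stateful_results = [worker.predict(b) for b in batches]
stateful_loads = load_counter["calls"]
```

Both variants produce the same predictions; only the number of model loads differs (three versus one for three batches).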

Setting up stateful inference involves a few important steps:

1. Create replicas of the trained model as Ray Actors.
2. Feed data into these model replicas in parallel and retrieve predictions.
3. Continue to manage idle actors and assign tasks until the entire inference job completes.

|<img src="https://technical-training-assets.s3.us-west-2.amazonaws.com/Scaling_inference/actor_inference.png" width="70%" loading="lazy">|
|:--|
|Ray Actors can generate predictions on batches of data. Because each actor keeps track of an internal state, it can be reused for inference on multiple batches.|

When using Ray Actors for stateful inference, it is important to implement *load balancing*, or appropriate distribution of work among workers to utilize resources efficiently. This process involves keeping track of in-flight tasks to assign new batches to available actors continuously until the entire process completes.
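The bookkeeping described above can be sketched as a small simulation. This is not Ray code: the batch durations are made up, and a heap stands in for tracking which actor becomes idle next. Each batch goes to the worker that frees up first.

```python
import heapq


def assign_batches(batch_durations: list[float], n_workers: int):
    """Greedy load balancing: give each batch to the earliest-idle worker."""
    # Heap entries are (time_when_idle, worker_id).
    idle_at = [(0.0, worker_id) for worker_id in range(n_workers)]
    heapq.heapify(idle_at)
    assignments = []  # (batch_index, worker_id)

    for batch_index, duration in enumerate(batch_durations):
        free_time, worker_id = heapq.heappop(idle_at)
        assignments.append((batch_index, worker_id))
        heapq.heappush(idle_at, (free_time + duration, worker_id))

    # The job finishes when the busiest worker finishes.
    makespan = max(t for t, _ in idle_at)
    return assignments, makespan


durations = [4.0, 1.0, 1.0, 1.0, 1.0]
assignments, makespan = assign_batches(durations, n_workers=2)
```

With two workers, the long first batch occupies worker 0 while worker 1 handles the four short ones, so the simulated total is 4.0 time units instead of the 8.0 a single worker would need.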

Using actors directly offers more control over how tasks are assigned. However, you may opt to use the convenient [Ray ActorPool](https://docs.ray.io/en/latest/ray-core/package-ref.html#ray-util-actorpool) utility which handles load balancing (futures management) automatically. This abstraction wraps a list of actors and distributes the workload, allowing you to focus on the inference logic.

|<img src="https://technical-training-assets.s3.us-west-2.amazonaws.com/Scaling_inference/actor_pool.png" width="70%" loading="lazy">|
|:--|
|The ActorPool wraps around a list of `n` actors so you do not have to manage idle actors and manually distribute workloads.|

<img src="https://technical-training-assets.s3.us-west-2.amazonaws.com/Scaling_inference/code_actor.png" width="70%" loading="lazy">

## Part 2: Batch inference example using computer vision transformers

To demonstrate the three design patterns introduced in the previous section, you will apply each approach on a computer vision task: semantic segmentation.

Semantic segmentation assigns a category label to every pixel in a scene. It is related to object detection, but it labels individual pixels rather than drawing bounding boxes around objects. In this hands-on example, you will run batch inference on image data by using a pretrained model to generate predictions.
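As a rough illustration of per-pixel labeling, the sketch below turns a small array of class scores into a segmentation map with NumPy. The scores are invented for the example; a real model would produce them, typically at a much larger resolution and with 150 classes for this dataset.

```python
import numpy as np

# Hypothetical per-pixel class scores with shape (num_classes, height, width).
logits = np.array(
    [
        [[0.9, 0.1], [0.2, 0.3]],  # scores for class 0
        [[0.05, 0.8], [0.1, 0.6]],  # scores for class 1
        [[0.05, 0.1], [0.7, 0.1]],  # scores for class 2
    ]
)

# Pick the highest-scoring class for every pixel.
segmentation_map = logits.argmax(axis=0)
```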

### Data

#### MIT ADE20K - scene parsing benchmark

The [MIT ADE20K Dataset](http://sceneparsing.csail.mit.edu/) (also known as "SceneParse150") provides the largest open-source dataset for scene parsing. It is often used as a standard for assessing semantic segmentation model performance due to its high-quality annotations. For this example, you will use the unlabeled test data to implement different batch inference architectures.

|<img src="https://technical-training-assets.s3.us-west-2.amazonaws.com/Scaling_inference/scene.png" width="70%" loading="lazy">|
|:--|
|Unannotated scene image from MIT ADE20K on the left. Pixel-by-pixel predictions on the right. [*Date accessed: November 10, 2022*](https://github.com/CSAILVision/semantic-segmentation-pytorch)|

Dataset highlights

* 20k annotated, scene-centric training images
* 3.3k unlabeled test images
* 150 [semantic categories](https://docs.google.com/spreadsheets/d/1se8YEtb2detS7OuPE86fXGyD269pMycAWe2mtKUj2W8/edit?usp=sharing) (such as person, car, bed, sky, etc.)

### Model

#### SegFormer - transformer-based framework for semantic segmentation

[SegFormer](https://arxiv.org/pdf/2105.15203.pdf) is an effective semantic segmentation method based on a *transformer* architecture. [Transformers](https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)) are a type of deep learning architecture that process sequential data via a series of self-attention layers and then transform them via a feedforward neural network.

What sets SegFormer apart from previous transformer-based approaches are two key features:

1. A hierarchically structured transformer encoder that does not depend on positional encoding, which avoids interpolating positional codes when training and testing resolutions differ.
2. A lightweight MLP layer that avoids complex decoders.

You will use a pretrained SegFormer model finetuned on [MIT ADE20K](http://sceneparsing.csail.mit.edu/) to perform batch inference.

|<img src="https://technical-training-assets.s3.us-west-2.amazonaws.com/Scaling_inference/segformer_architecture.png" width="70%" loading="lazy">|
|:--|
|SegFormer architecture showcasing the hierarchical transformer encoder and all-MLP decoder. [*Date accessed: November 10, 2022*](https://arxiv.org/pdf/2105.15203.pdf)|


## Part 3: Sequential batch inference

In order to establish familiarity with this batch inference task, you will implement a basic approach with one worker that generates predictions on batches sequentially. To get set up, the semantic segmentation example requires the following steps:

1. Load the pretrained [SegFormer](https://huggingface.co/nvidia/segformer-b0-finetuned-ade-512-512) model.
2. Load the [feature extractor](https://huggingface.co/docs/transformers/v4.16.2/en/model_doc/segformer#transformers.SegformerFeatureExtractor) (preprocessor for scene data).
3. Load [SceneParse150](https://huggingface.co/datasets/scene_parse_150) dataset.
4. Run batch inference on images from the test set.

|<img src="https://technical-training-assets.s3.us-west-2.amazonaws.com/Scaling_inference/single_sequential_timeline.png" width="90%" loading="lazy">|
|:--|
|Timeline of sequential batch inference using a single worker. Tasks can vary in runtime due to variations in complexity, data size, and more.|

### Set up necessary imports and utilities

In [None]:
import torch
import numpy as np
import pandas as pd
from PIL import Image
from PIL.JpegImagePlugin import JpegImageFile

# Set the seed to a fixed value for reproducibility.
torch.manual_seed(201)

### Load the model components from the HuggingFace Hub

From the [Hugging Face Hub](https://huggingface.co/docs/hub/index), retrieve the pretrained SegFormer model by specifying the model name and [label files](https://huggingface.co/datasets/huggingface/label-files/blob/main/ade20k-id2label.json) which map indices to semantic categories.

#### Load label mappings

In [None]:
from utils import get_labels

In [None]:
id2label, label2id = get_labels()

print(f"Total number of labels: {len(id2label)}")
print(f"Example labels: {list(id2label.values())[:5]}")

The utility function `get_labels` fetches two dictionary mappings from [Hugging Face](https://huggingface.co/datasets/huggingface/label-files/blob/main/ade20k-id2label.json), `id2label` and `label2id`, which are used to convert between numerical and string labels for the 150 available [semantic categories](https://docs.google.com/spreadsheets/d/1se8YEtb2detS7OuPE86fXGyD269pMycAWe2mtKUj2W8/edit#gid=0) of objects.

#### Load SegFormer

In [None]:
from transformers import SegformerForSemanticSegmentation

In [None]:
MODEL_NAME = "nvidia/segformer-b0-finetuned-ade-512-512"

segformer = SegformerForSemanticSegmentation.from_pretrained(
    MODEL_NAME, id2label=id2label, label2id=label2id
)

print(f"Number of model parameters: {segformer.num_parameters()/(10**6):.2f} M")

The [Hugging Face Hub](https://huggingface.co/nvidia/segformer-b0-finetuned-ade-512-512) makes available many variations of SegFormer. Here, you specify a version finetuned on the MIT ADE20K (SceneParse150) dataset on images with a 512 x 512 resolution.

Note: This "b0" model is the smallest, with [other options](https://huggingface.co/nvidia/segformer-b5-finetuned-ade-640-640) ranging up to and including "b5". Keep this in mind as something to experiment with when comparing different batch inference architectures later on.

#### Load the feature extractor

In [None]:
from transformers import SegformerFeatureExtractor

In [None]:
segformer_feature_extractor = SegformerFeatureExtractor.from_pretrained(
    MODEL_NAME, reduce_labels=True
)
segformer_feature_extractor

[Feature extractors](https://huggingface.co/docs/transformers/main_classes/feature_extractor) preprocess input features (e.g. image data) by normalizing, resizing, padding, and converting raw images into the shape expected by SegFormer.
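A minimal NumPy sketch of two of these steps (normalizing and reordering axes) is shown below. It is not the Hugging Face implementation: the helper name `preprocess_sketch` is hypothetical, the mean and standard deviation are the widely used ImageNet statistics and are an assumption here, and resizing and padding are omitted.

```python
import numpy as np


def preprocess_sketch(image_hwc: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Scale to [0, 1], normalize per channel, and move channels first."""
    scaled = image_hwc.astype(np.float32) / 255.0
    normalized = (scaled - mean) / std
    return normalized.transpose(2, 0, 1)  # (H, W, C) -> (C, H, W)


# Assumed normalization constants (ImageNet statistics).
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)

image = np.full((4, 6, 3), 255, dtype=np.uint8)  # a white 4x6 RGB image
pixel_values = preprocess_sketch(image, mean, std)
```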

The [`reduce_labels`](https://huggingface.co/docs/transformers/model_doc/segformer#segformer) flag ensures that the background of an image (anything that is not explicitly an object) isn't included when computing loss. 
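The effect of label reduction on an annotation mask can be sketched as follows, assuming the convention described in the Hugging Face documentation: the background label `0` becomes the ignore value `255`, and every other label shifts down by one. The helper name `reduce_labels_sketch` is hypothetical, and this is an illustration rather than the library's code.

```python
import numpy as np


def reduce_labels_sketch(mask: np.ndarray, ignore_index: int = 255) -> np.ndarray:
    """Map background (0) to ignore_index and shift other labels down by 1."""
    mask = mask.astype(np.int64)
    reduced = mask - 1
    # Background pixels, and pixels that were already ignored, stay ignored.
    reduced[(mask == 0) | (mask == ignore_index)] = ignore_index
    return reduced


annotation = np.array([[0, 1], [2, 150]])
reduced = reduce_labels_sketch(annotation)
```

After reduction, the 150 object categories occupy indices 0 through 149, and a loss function configured to skip index 255 never sees the background pixels.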

### Load dataset

#### Set up necessary imports

In [None]:
from datasets import load_dataset
from utils import convert_image_to_rgb

In [None]:
SMALL_DATA = True

<div class="alert alert-warning">
  <strong>SMALL_DATA</strong>: a flag to download a subset (160 images) of the available test data. Defaults to True. Set to False (recommended) to work with the full test data (3352 images).
</div>

If you set `SMALL_DATA` to `False`, expect it to take some time (depending on your connection download speed) because you are downloading all test images to your local machine or cluster.

#### Load SceneParse150

In [None]:
DATASET_NAME = "scene_parse_150"

# Load data from the Hugging Face datasets repository.
if SMALL_DATA:
    train_dataset = load_dataset(DATASET_NAME, split="train[:10]")
    test_dataset = load_dataset(DATASET_NAME, split="test[:160]")
else:
    train_dataset = load_dataset(DATASET_NAME, split="train[:10]")
    test_dataset = load_dataset(DATASET_NAME, split="test")

The two datasets serve different purposes:

* **`train_dataset`**  
    * Retrieve a small sample of images for visualization purposes only. Training samples include ground-truth, annotated image regions. Full training dataset contains 20210 images.
* **`test_dataset`**  
    * Used for batch inference purposes. Test samples do not contain ground-truth labels. Full test dataset contains 3352 images.

In [None]:
train_dataset

In [None]:
test_dataset = test_dataset.map(convert_image_to_rgb)
test_dataset

Each sample contains three components:
* **`image`** 
    * The PIL image.
* **`annotation`**  
    * Human annotations of image regions (annotation mask is `None` in testing set).
* **`category`**  
    * Category of the scene generally (e.g. driveway, voting booth, dairy_outdoor).

#### Display example images

In [None]:
from utils import display_example_images

In [None]:
# Try running this multiple times!
display_example_images(train_dataset)

### Run sequential inference on 1 batch

#### Define inference logic

This `predict` function forms the basis for the inference step, and you will reuse variations of this function multiple times throughout each approach for batch inference.

In [None]:
def predict(
    model: SegformerForSemanticSegmentation,
    feature_extractor: SegformerFeatureExtractor,
    images: list[JpegImageFile],
) -> list[np.ndarray]:
    # Set the device on which PyTorch will run.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)  # Move the model to specified device.
    model.eval()  # Set the model in evaluation mode on test data.

    # The feature extractor processes raw images.
    inputs = feature_extractor(images=images, return_tensors="pt")

    # The model is applied to input images in the inference step.
    with torch.no_grad():
        outputs = model(pixel_values=inputs.pixel_values.to(device))

    # Post-process the output for display.
    # PIL reports size as (width, height); reverse it to (height, width).
    image_sizes = [image.size[::-1] for image in images]
    segmentation_maps_postprocessed = (
        feature_extractor.post_process_semantic_segmentation(
            outputs=outputs, target_sizes=image_sizes
        )
    )

    # Return list of segmentation maps detached from the computation graph.
    return [j.detach().cpu().numpy() for j in segmentation_maps_postprocessed]

#### Prepare 1 batch of 16 images

In [None]:
from utils import get_image_indices

In [None]:
BATCH_SIZE = 16

# Get BATCH_SIZE randomly shuffled image IDs from the test dataset.
image_indices = get_image_indices(dataset=test_dataset, n=BATCH_SIZE)
image_indices

In [None]:
# Create a list of images by extracting images from random indices sampled from the test data.
batch = [test_dataset[i]["image"] for i in image_indices]
batch

#### Run batch inference

In [None]:
segmentation_maps = predict(
    model=segformer,
    feature_extractor=segformer_feature_extractor,
    images=batch,
)

In [None]:
segmentation_maps[0]

Performing batch inference outputs a list of segmentation maps. Each element in the segmentation map array represents the [semantic category](https://docs.google.com/spreadsheets/d/1se8YEtb2detS7OuPE86fXGyD269pMycAWe2mtKUj2W8/edit#gid=0) of the corresponding pixel in the input image.
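One way to read such an array is to count how many pixels fall into each category. The sketch below uses a tiny made-up segmentation map and a hypothetical three-entry mapping named `id2label_example`; with the real outputs you would use `segmentation_maps[0]` and `id2label` instead.

```python
import numpy as np

# Hypothetical 3-class label mapping and a tiny segmentation map.
id2label_example = {0: "wall", 1: "sky", 2: "tree"}
segmentation_map = np.array([[1, 1, 1], [2, 0, 0], [2, 0, 0]])

# Count how many pixels were assigned to each category.
class_ids, pixel_counts = np.unique(segmentation_map, return_counts=True)
pixels_per_label = {
    id2label_example[int(class_id)]: int(count)
    for class_id, count in zip(class_ids, pixel_counts)
}
```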

Together, you can visualize these predicted segmentation maps by overlaying them onto the original image to see defined regions of objects.

#### Visualize example predictions

In [None]:
from utils import visualize_predictions

In [None]:
visualize_predictions(image=batch[0], segmentation_maps=segmentation_maps[0])

### Run sequential inference on 10 batches

Next, you will test the scalability and performance of the sequential batch inference approach by increasing the number of batches from 1 to 10. This will allow you to observe and verify that this approach can limit performance when scaling.

#### Prepare batches

In [None]:
BATCH_SIZE = 16
N_BATCHES = 10

# Get BATCH_SIZE * N_BATCHES randomly shuffled image IDs from the test dataset.
image_indices = get_image_indices(dataset=test_dataset, n=BATCH_SIZE * N_BATCHES)

# Split indices into N_BATCHES
image_indices_grouped = np.split(np.asarray(image_indices), N_BATCHES)
image_indices_grouped
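Note that `np.split` requires the number of indices to divide evenly into the requested number of groups, which holds here because there are `BATCH_SIZE * N_BATCHES` indices. If you experiment with sizes that do not divide evenly, `np.array_split` is one alternative, as this small sketch with made-up sizes shows:

```python
import numpy as np

indices = np.arange(10)

# np.split needs the length to divide evenly into the requested sections.
even_groups = np.split(indices, 5)

# With 3 sections, 10 items do not divide evenly, so np.split raises ValueError.
try:
    np.split(indices, 3)
    split_failed = False
except ValueError:
    split_failed = True

# np.array_split tolerates uneven division by making the first groups larger.
uneven_groups = np.array_split(indices, 3)
group_sizes = [len(group) for group in uneven_groups]
```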

In [None]:
batches = []

# Create a list of images for each batch of indices sampled from the test dataset.
for image_idx in image_indices_grouped:
    batch = [test_dataset[int(i)]["image"] for i in image_idx]
    batches.append(batch)

batches[0]

#### Run batch inference

In [None]:
predictions = []

In [None]:
for batch in batches:
    segmentation_maps = predict(
        model=segformer,
        feature_extractor=segformer_feature_extractor,
        images=batch,
    )
    predictions.append(segmentation_maps)

Notice that increasing the number of batches by a factor of 10 leads to approximately a 10x increase in runtime. This is the expected result for a sequential approach, whose runtime scales linearly with the number of batches.
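One way to measure the runtimes being compared is a small timing helper. The sketch below is an assumption about how you might do this: `timed` is a hypothetical helper, and `fake_predict` stands in for the real `predict` call so the example runs quickly on its own.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str, results: dict):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def fake_predict(batch):
    """Hypothetical stand-in for the real `predict` call."""
    time.sleep(0.01)
    return list(batch)


timings = {}
with timed("1 batch", timings):
    fake_predict([0])
with timed("10 batches", timings):
    for _ in range(10):
        fake_predict([0])

# For a sequential loop, this ratio should be close to the number of batches.
ratio = timings["10 batches"] / timings["1 batch"]
```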

In [None]:
# Inspect the resulting segmentation maps array.
predictions[0][0]

### Summary: Sequential batch inference

|<img src="https://technical-training-assets.s3.us-west-2.amazonaws.com/Scaling_inference/single_sequential_timeline.png" width="90%" loading="lazy">|
|:--|
|Timeline of sequential batch inference using a single worker. Tasks can vary in runtime due to variations in complexity, data size, and more.|

#### Key concepts

<div class="alert alert-info">
  <strong>Batch inference</strong> (also known as offline inference) is the process of generating predictions on a large set or "batch" of data.
</div>

## Part 4: Distributed batch inference with Ray AIR

Ray AIR's high-level APIs automate the challenging aspects of parallelizing and distributing batch inference tasks, allowing you to focus on the inference logic.

There are four main abstractions that work together to optimize this process:

* [**`Datasets`**](https://docs.ray.io/en/latest/data/dataset.html)  
    * These are used to parallelize data loading, preprocessing, and exchanging data in Ray AIR.
* [**`Checkpoint`**](https://docs.ray.io/en/latest/tune/tutorials/tune-checkpoints.html)  
    * `Checkpoint` objects represent saved models created during training or tuning and provide a common interface for restoring the model's state for tasks such as inference.
* [**`Predictor`**](https://docs.ray.io/en/latest/ray-air/predictors.html)  
    * A Ray AIR `Predictor` is a class that loads a model from a `Checkpoint` to perform inference. `BatchPredictor` uses it for large-scale inference.
* [**`BatchPredictor`**](https://docs.ray.io/en/latest/ray-air/predictors.html#batch-prediction)  
    * Ray AIR `BatchPredictor` utility takes in a `Checkpoint` and a `Predictor` class and executes large-scale distributed batch prediction on a Ray Dataset when calling `predict()`.

|<img src="https://technical-training-assets.s3.us-west-2.amazonaws.com/Scaling_inference/air_batchpredictor.png" width="70%" loading="lazy">|
|:--|
|Ray Datasets parallelize data loading, preprocessing, and batching. Ray AIR `BatchPredictor` takes both `Checkpoint` and `Predictor` objects to call `predict()` on a Ray Dataset for distributed batch inference.|

Ray handles operations such as batching, pipelining, actor pool autoscaling, and memory management internally, so you can benefit from the scalability and ease of use of Ray AIR without needing to worry about the details of task distribution. Using these abstractions does come with some trade-offs, as you have less control over how Ray distributes the workload.

### Initialize Ray runtime

In [None]:
import ray

In [None]:
ray.init()

### Create a Ray Dataset with 160 images

In [None]:
# Get BATCH_SIZE * N_BATCHES randomly shuffled image IDs from the test dataset.
image_indices = get_image_indices(dataset=test_dataset, n=BATCH_SIZE * N_BATCHES)

# Create a list of images for the indices sampled from the test dataset.
data = [test_dataset[i]["image"] for i in image_indices]

In [None]:
# Create a Ray Dataset from the list of images to use in Ray AIR.
dataset = ray.data.from_items(data)
dataset.show(limit=3)

### Define a custom Predictor for image data

`BatchPredictor` takes in a `Checkpoint` (which will be constructed from the SegFormer model and feature extractor) and a `Predictor`. Ray AIR supports multiple framework-specific [`Predictors`](https://docs.ray.io/en/latest/ray-air/package-ref.html#predictor) such as TorchPredictor and TensorflowPredictor while also allowing for the ability to implement a [custom](https://docs.ray.io/en/latest/ray-air/predictors.html#developer-guide-implementing-your-own-predictor) one. 

Here, you will implement a custom `SemanticSegmentationPredictor` that reuses the core `predict()` logic from Part 3, with some modifications to fit the `BatchPredictor` pattern.

In [None]:
from ray.air import Checkpoint
from ray.train.predictor import Predictor

In [None]:
class SemanticSegmentationPredictor(Predictor):
    # The constructor method initializes the class to load/cache the model and feature extractor.
    def __init__(
        self,
        model: SegformerForSemanticSegmentation,
        feature_extractor: SegformerFeatureExtractor,
    ):
        super().__init__()
        self.model = model
        self.feature_extractor = feature_extractor

    # This is the same logic as the `predict()` function defined in Part 3,
    # only with pandas DataFrames as inputs and outputs.
    def _predict_pandas(self, batch: pd.DataFrame) -> pd.DataFrame:
        # Set the device on which PyTorch will run.
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.model.to(device)
        self.model.eval()

        # The feature extractor processes raw images.
        # Note: only the first row is read, so this assumes `batch_size=1`.
        batch = [batch["value"][0]]
        inputs = self.feature_extractor(images=batch, return_tensors="pt")

        # The model is applied to input images in the inference step.
        with torch.no_grad():
            outputs = self.model(pixel_values=inputs.pixel_values.to(device))

        # Post-process the output for display.
        image_sizes = [image.size[::-1] for image in batch]
        segmentation_maps_postprocessed = (
            self.feature_extractor.post_process_semantic_segmentation(
                outputs=outputs, target_sizes=image_sizes
            )
        )

        # Post-process the list of segmentation maps into a pandas DataFrame
        df = pd.DataFrame(columns=["segmentation_maps"])
        df.loc[0, "segmentation_maps"] = segmentation_maps_postprocessed

        return df

    # Creates an instance of SemanticSegmentationPredictor using the model and
    # feature extractor contained in the Checkpoint.
    @classmethod
    def from_checkpoint(
        cls, checkpoint: Checkpoint, **kwargs
    ) -> "SemanticSegmentationPredictor":
        checkpoint_data = checkpoint.to_dict()
        return cls(
            model=checkpoint_data["model"],
            feature_extractor=checkpoint_data["feature_extractor"],
        )
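The `_predict_pandas` method above reads only the first row of each batch, which is sufficient when `batch_size=1`. A variant that handles any number of rows might look like the following sketch. It uses a hypothetical `toy_model` in place of SegFormer, plain strings in place of images, and assumes the input column is named `value`, as in the class above.

```python
import pandas as pd


def toy_model(images: list[str]) -> list[str]:
    """Hypothetical stand-in for SegFormer: returns one result per input."""
    return [f"map_for_{name}" for name in images]


def predict_pandas_any_batch_size(batch: pd.DataFrame) -> pd.DataFrame:
    images = list(batch["value"])  # use every row, not just the first
    segmentation_maps = toy_model(images)
    # One output row per input row keeps predictions aligned with inputs.
    return pd.DataFrame({"segmentation_maps": segmentation_maps})


batch_df = pd.DataFrame({"value": ["img_0", "img_1", "img_2"]})
result_df = predict_pandas_any_batch_size(batch_df)
```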

### Create a BatchPredictor

In [None]:
from ray.train.batch_predictor import BatchPredictor

In [None]:
# Construct a BatchPredictor from a Checkpoint holding the SegFormer model and feature extractor,
# along with the custom SemanticSegmentationPredictor class (the class itself, not an instance).
batch_predictor = BatchPredictor(
    checkpoint=Checkpoint.from_dict(
        {"model": segformer, "feature_extractor": segformer_feature_extractor}
    ),
    predictor_cls=SemanticSegmentationPredictor,
)

### Run parallel batch inference on a Ray Dataset

In [None]:
predictions_dataset = batch_predictor.predict(data=dataset, batch_size=1)

In [None]:
# Inspect the resulting segmentation maps in the predictions dataset.
predictions_dataset.take(limit=1)

**Coding Exercise**

In this example, you used a custom `Predictor`, but Ray AIR's `BatchPredictor` also supports a number of framework-specific predictors.

Try to implement the same inference logic, but this time use a [HuggingFacePredictor](https://docs.ray.io/en/master/train/api.html?highlight=huggingfacepredictor#ray.train.huggingface.HuggingFacePredictor.predict) instead. Refer to this [user guide](https://docs.ray.io/en/latest/ray-air/predictors.html#developer-guide-implementing-your-own-predictor) for assistance.

Hint: HuggingFace models expect a specific [configuration](https://huggingface.co/docs/transformers/main_classes/configuration) for loading models as checkpoints. You will want to provide the model and feature extractor from HuggingFace to create a Checkpoint to ensure that you include all the required files. For example:

```python
with tempfile.TemporaryDirectory() as tmpdir:
    huggingface_checkpoint = HuggingFaceCheckpoint.from_model(
        model=segformer, path=tmpdir
    )
    predictor = BatchPredictor.from_checkpoint(
        checkpoint=huggingface_checkpoint,
        predictor_cls=HuggingFacePredictor,
        feature_extractor=segformer_feature_extractor,  # passed to HF pipeline
        task="image-segmentation",  # passed to HF pipeline
        device=-1,
    )
```

In [None]:
### YOUR CODE HERE ###

**Solution**

In [None]:
### SAMPLE IMPLEMENTATION ###

import tempfile
from ray.train.huggingface import HuggingFaceCheckpoint, HuggingFacePredictor

with tempfile.TemporaryDirectory() as tmpdir:
    huggingface_checkpoint = HuggingFaceCheckpoint.from_model(
        model=segformer, path=tmpdir
    )
    predictor = BatchPredictor.from_checkpoint(
        checkpoint=huggingface_checkpoint,
        predictor_cls=HuggingFacePredictor,
        feature_extractor=segformer_feature_extractor,  # passed to HF pipeline
        task="image-segmentation",  # passed to HF pipeline
        device=-1,
    )

    predictions_dataset = predictor.predict(data=dataset, batch_size=1)

predictions_dataset.take(1)

### Summary: Distributed batch inference with Ray AIR

#### Key API elements

* [**`Datasets`**](https://docs.ray.io/en/latest/data/dataset.html)  
    * These are used to parallelize data loading, preprocessing, and exchanging data in Ray AIR.
* [**`Checkpoint`**](https://docs.ray.io/en/latest/tune/tutorials/tune-checkpoints.html)  
    * `Checkpoint` objects represent saved models created during training or tuning and provide a common interface for restoring the model's state for tasks such as inference.
* [**`Predictor`**](https://docs.ray.io/en/latest/ray-air/predictors.html)  
    * Ray AIR `Predictors` are classes that load models from a `Checkpoint` to perform inference. `BatchPredictor` uses them to run large-scale inference.
* [**`BatchPredictor`**](https://docs.ray.io/en/latest/ray-air/predictors.html#batch-prediction)  
    * The Ray AIR `BatchPredictor` utility takes a `Checkpoint` and a `Predictor` class and runs large-scale distributed batch prediction on a Ray Dataset when you call `predict()`.


## Part 5: Distributed batch inference with Ray Core API

### Batch inference with Ray Tasks

Starting with stateless inference with Ray Tasks, you will load replicas of the SegFormer model and feature extractor into tasks that run inference on different batches concurrently. Because these tasks do not store or modify any internal state, we refer to them as stateless tasks.

|<img src="https://technical-training-assets.s3.us-west-2.amazonaws.com/Scaling_inference/task_inference.png" width="70%" loading="lazy">|
|:--|
|In the stateless inference approach, each Ray Task loads the trained model and generates predictions for the assigned batches. This method scales well with the number of available CPUs and GPUs because each inference task can be executed concurrently and independently of the other tasks.|

#### Put the model and feature extractor in the object store

When using Ray, you can pass objects as arguments to remote functions. When you pass an object by value, Ray implicitly stores it in the object store, as if you had called [`ray.put()`](https://docs.ray.io/en/latest/ray-core/package-ref.html#ray-put), and the worker that runs the function fetches it from there. However, Ray does not deduplicate these implicit copies, so if the objects are large, this is inefficient: a new copy is stored every time they are passed to a remote function.

To improve performance, you can explicitly store both the model and feature extractor in the object store by using `ray.put()`. This avoids the need to create multiple copies of the objects.

It is important to note that if you have multiple worker nodes in your cluster, the objects will need to be copied in memory when they are used on a worker node different from where they are stored. Zero copy is not guaranteed in this case.

|<img src="https://technical-training-assets.s3.us-west-2.amazonaws.com/Overview_of_Ray/object_store.png" width="70%" loading="lazy">|
|:--|
|Diagram of workers in worker nodes using `ray.put()` to place objects and using `ray.get()` to retrieve them from each node's object store.|

In [None]:
segformer_ref = ray.put(segformer)
segformer_feature_extractor_ref = ray.put(segformer_feature_extractor)

<div class="alert alert-warning">
  <strong>Tip</strong>

  Passing the same large argument (such as the model) by value repeatedly <a href="https://docs.ray.io/en/latest/ray-core/patterns/pass-large-arg-by-value.html">harms performance and can cause the driver node to run out of disk space</a>.
  
  Use the **object store** and **ray.put()** to pass by reference instead (for example, model_ref instead of model).
</div>

#### Define remote function for inference

One way to parallelize predictions in a stateless manner (similar to using lambdas) is to use Ray tasks. Each time a Ray task is called, it loads the trained model from the local object store in order to perform inference. 

This approach allows the prediction task to be stateless, but it incurs the overhead of loading the model each time it is called. This may not be a significant issue for small models, but larger models may experience bottlenecks when loading.

In [None]:
# Use decorator to designate this as a remote function.
@ray.remote
def inference_task(
    model: SegformerForSemanticSegmentation,
    feature_extractor: SegformerFeatureExtractor,
    images: list[JpegImageFile],
) -> list[np.array]:
    # The `predict` function is the same one defined earlier in Part 3.
    return predict(
        model=model,
        feature_extractor=feature_extractor,
        images=images,
    )

<div class="alert alert-warning">
  <strong>Tip</strong>: Batches should be large enough to avoid the anti-pattern of having <a href="https://docs.ray.io/en/latest/ray-core/patterns/too-fine-grained-tasks.html">tasks that are too fine-grained</a>.
</div>

#### Prepare batches

In [None]:
# Get BATCH_SIZE * N_BATCHES randomly shuffled image IDs from the test dataset.
image_indices = get_image_indices(dataset=test_dataset, n=BATCH_SIZE * N_BATCHES)

# Split indices into N_BATCHES
image_indices_grouped = np.split(np.asarray(image_indices), N_BATCHES)

batches = []

# Create a list of images for each batch of indices sampled from the test dataset.
for image_idx in image_indices_grouped:
    batch = [test_dataset[int(i)]["image"] for i in image_idx]
    batches.append(batch)

batches[0]

#### Run parallel inference on 10 batches

In [None]:
prediction_refs = []
predictions = []

In [None]:
# Launch all prediction tasks.
for batch in batches:
    # Launch a prediction task by passing model reference, feature extractor
    # reference, and batch of images.
    task_ref = inference_task.remote(
        model=segformer_ref,
        feature_extractor=segformer_feature_extractor_ref,
        images=batch,
    )
    # Collect the object reference to each batch's predictions.
    prediction_refs.append(task_ref)

In [None]:
# Retrieve results.
# Note: This is a synchronous (blocking) operation that waits for all tasks to complete
# before returning.
predictions = ray.get(prediction_refs)

In [None]:
# Inspect the resulting segmentation maps array.
predictions[0][0]

**Coding Exercise**

You have seen how the sequential version and stateless inference using Ray Tasks perform on 10 batches of 16 images each. Try scaling the number of batches as well as the number of images per batch to see the effect on performance.

Hint: `BATCH_SIZE` and `N_BATCHES` are set in Part 3 under "Prepare batches".

Note: In order to perform inference on more than 160 images, you need to set the `SMALL_DATA` flag to `False` to download the complete test set.
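
To compare runs in this exercise, you need wall-clock timings. The following is one possible sketch of a timing helper built only on the Python standard library. The names `timed` and `timings` are hypothetical and do not appear elsewhere in this notebook.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str, timings: dict[str, float]):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


timings: dict[str, float] = {}

with timed("example", timings):
    # Replace this placeholder with the code to measure, for example
    # launching the inference tasks and calling `ray.get()` on the results.
    sum(range(100_000))

for label, seconds in timings.items():
    print(f"{label}: {seconds:.4f} s")
```

Because `.remote()` calls return immediately, the timed block should include the `ray.get()` call. Otherwise you measure only the time needed to submit the tasks.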

#### Summary: Distributed, stateless batch inference with Ray Tasks

##### Key concepts

<div class="alert alert-info">
  <strong>Object store</strong>: Ray's distributed shared-memory store that makes remote objects available anywhere in a Ray cluster.
</div>

<div class="alert alert-info">
  <strong>Stateless inference</strong>: Inference that depends only on the trained model provided as input and does not preserve state once predictions are generated.
</div>

##### Key API elements

* **`ray.init()`**  
Start Ray runtime and connect to the Ray cluster.

* **`@ray.remote`**  
Decorator that specifies a Python function or class to be executed as a task (remote function) or actor (remote class) in a different process.

* **`.remote`**  
Postfix to the remote functions and classes; remote operations are asynchronous.

* **`ray.put()`**  
Put an object in the in-memory object store; returns an object reference used to pass the object to any remote function or method call.

* **`ray.get()`**  
Get one or more remote objects from the object store by specifying their object reference(s).

### Batch inference with Ray Actors

Moving from stateless to stateful inference, Ray Actors offer the advantage of holding some mutable internal state which allows the actor to fetch the model once and reuse it for all tasks assigned to the actor.

To set up stateful inference using Ray Actors, you will need to follow a few important steps:

1. Create replicas of your trained model as Ray Actors, which can hold mutable internal state and avoid the need to reload large models for each inference job.
2. Feed data into these model replicas in parallel and retrieve predictions.

You can either manually assign tasks to actors (more control) or use the ActorPool utility (more convenient), which automates load balancing for you. If you choose to assign actors manually, you will need to continually manage idle actors and assign tasks until the entire inference job is completed.

|<img src="https://technical-training-assets.s3.us-west-2.amazonaws.com/Scaling_inference/actor_inference.png" width="70%" loading="lazy">|
|:--|
|Ray Actors can generate predictions on batches of data. Because each actor keeps track of an internal state, it can be reused for inference on multiple batches.|

#### Define remote class for inference

In [None]:
# Specify this class as a Ray Actor.
@ray.remote
class PredictionActor:
    # An interface (i.e. constructor) to load/cache the model and feature extractor.
    def __init__(
        self,
        model: SegformerForSemanticSegmentation,
        feature_extractor: SegformerFeatureExtractor,
    ):
        self.model = model
        self.feature_extractor = feature_extractor

    # This is the same logic as the `predict()` function defined in Part 3.
    def predict(self, images: list[JpegImageFile]) -> list[np.array]:
        # Set the device on which PyTorch will run.
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.model.to(device)
        self.model.eval()

        # The feature extractor processes the raw images.
        inputs = self.feature_extractor(images=images, return_tensors="pt")

        # The model is applied to input images in the inference step.
        with torch.no_grad():
            outputs = self.model(pixel_values=inputs.pixel_values.to(device))

        # Post-process the output for display.
        image_sizes = [image.size[::-1] for image in images]
        segmentation_maps_postprocessed = (
            self.feature_extractor.post_process_semantic_segmentation(
                outputs=outputs, target_sizes=image_sizes
            )
        )

        # Return list of segmentation maps detached from the computation graph.
        return [j.detach().cpu().numpy() for j in segmentation_maps_postprocessed]

#### Create a list of Ray Actors

In [None]:
N_ACTORS = 2

# Create a list of actors with N_ACTORS instances of `PredictionActor`.
actors = [
    PredictionActor.remote(
        model=segformer_ref, feature_extractor=segformer_feature_extractor_ref
    )
    for _ in range(N_ACTORS)
]

actors

Note: `N_ACTORS` is initially set to 2 here, which may limit performance. Ideally, you want to set the number of actors to be proportional to the amount of resources you have available, such as the number of CPUs and/or GPUs.
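
One way to derive the actor count from available resources is sketched below. The helper `choose_n_actors` and its `reserve_cpus` parameter are hypothetical. The policy is also an assumption, not a Ray recommendation: one actor per GPU when GPUs exist, otherwise one actor per CPU with one CPU held back.

```python
def choose_n_actors(resources: dict[str, float], reserve_cpus: int = 1) -> int:
    """Pick an actor count from a resource dictionary.

    Assumes one actor per GPU when GPUs exist; otherwise one actor per CPU,
    keeping `reserve_cpus` free for the driver and other work.
    """
    n_gpus = int(resources.get("GPU", 0))
    if n_gpus > 0:
        return n_gpus
    n_cpus = int(resources.get("CPU", 1))
    return max(1, n_cpus - reserve_cpus)


# In the notebook, the dictionary would come from the running cluster:
#   N_ACTORS = choose_n_actors(ray.cluster_resources())
print(choose_n_actors({"CPU": 8.0}))
print(choose_n_actors({"CPU": 8.0, "GPU": 2.0}))
```

Note that an actor is only assigned a GPU if it requests one, for example with `@ray.remote(num_gpus=1)`.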

#### Prepare batches

In [None]:
# Get BATCH_SIZE * N_BATCHES randomly shuffled image IDs from the test dataset.
image_indices = get_image_indices(dataset=test_dataset, n=BATCH_SIZE * N_BATCHES)

# Split indices into N_BATCHES
image_indices_grouped = np.split(np.asarray(image_indices), N_BATCHES)

batches = []

# Create a list of images for each batch of indices sampled from the test dataset.
for image_idx in image_indices_grouped:
    batch = [test_dataset[int(i)]["image"] for i in image_idx]
    batches.append(batch)

batches[0]

#### Run parallel inference on 10 batches

In [None]:
def prediction_results_postprocessing(
    predictions: list[list[np.array]], segmentation_maps: list[np.array]
) -> None:
    # Appends in place; the function does not return a value.
    predictions.append(segmentation_maps)

`prediction_results_postprocessing` is a simple function in this tutorial and exists to abstract away the final processing step. In practice, it will likely be much more complex.

In [None]:
predictions = []  # A list of final predictions.
# A mapping of object references to the actor that promised them.
future_to_actor_mapping = {}

In [None]:
# Make a copy to avoid modifying the original list of actors.
idle_actors = actors.copy()

while batches:
    # Assign batches to available actors.
    if idle_actors:
        actor = idle_actors.pop()
        batch = batches.pop()
        future = actor.predict.remote(images=batch)
        # Map the future to the actors executing prediction.
        future_to_actor_mapping[future] = actor

    # Retrieve the completed tasks and process them.
    else:
        # Retrieve the first future to return.
        [ready], _ = ray.wait(list(future_to_actor_mapping.keys()), num_returns=1)

        # Get the actor with the completed task and add back to idle list.
        actor = future_to_actor_mapping.pop(ready)
        idle_actors.append(actor)

        # Post-process the result, using ray.get() to retrieve it from the reference.
        prediction_results_postprocessing(
            predictions=predictions, segmentation_maps=ray.get(ready)
        )

# Process any leftover results at the end.
for future in future_to_actor_mapping.keys():
    prediction_results_postprocessing(
        predictions=predictions, segmentation_maps=ray.get(future)
    )

<div class="alert alert-warning">
  <strong>Tip</strong>: <a href="https://docs.ray.io/en/latest/ray-core/package-ref.html#ray-wait">ray.wait()</a> is a synchronous operation that allows you to process results without waiting for all tasks to complete. It can also be used to <a href="https://docs.ray.io/en/master/ray-core/patterns/limit-pending-tasks.html">limit the number of pending tasks</a> so that the pending task queue won't grow indefinitely and cause out-of-memory problems.
</div>

|<img src="https://technical-training-assets.s3.us-west-2.amazonaws.com/Scaling_inference/distributed_timeline.png" width="70%" loading="lazy">|
|:--|
|Timeline of distributed batch inference where batches are assigned as soon as a worker completes a task and becomes available.|

In [None]:
# Inspect the resulting segmentation maps array.
predictions[0][0]

**Coding Exercise**

In this tutorial, the default setting for `N_ACTORS` is 2. Try setting the number of actors to the number of CPUs/GPUs you have available. 

How does this affect runtime performance?

Hint: Change `N_ACTORS` in the section called "Create a list of Ray Actors."

#### Using Ray ActorPool utility

You have just manually managed batch assignment and task scheduling on Ray Actors for batch inference. This offers plenty of granular control over exactly how to distribute work and monitor in-flight tasks. 

However, you may opt to use the convenient Ray [ActorPool](https://docs.ray.io/en/latest/ray-core/package-ref.html#ray-util-actorpool) utility, which wraps the list of actors and automatically handles futures management. In this short example, we will recreate the previous approach and demonstrate how to use this utility abstraction.

|<img src="https://technical-training-assets.s3.us-west-2.amazonaws.com/Scaling_inference/actor_pool.png" width="70%" loading="lazy">|
|:--|
|The ActorPool wraps around a list of `n` actors so you do not have to manage idle actors and manually distribute workloads.|

#### Prepare batches

In [None]:
# Get BATCH_SIZE * N_BATCHES randomly shuffled image IDs from the test dataset.
image_indices = get_image_indices(dataset=test_dataset, n=BATCH_SIZE * N_BATCHES)

# Split indices into N_BATCHES
image_indices_grouped = np.split(np.asarray(image_indices), N_BATCHES)

batches = []

# Create a list of images for each batch of indices sampled from the test dataset.
for image_idx in image_indices_grouped:
    batch = [test_dataset[int(i)]["image"] for i in image_idx]
    batches.append(batch)

batches[0]

#### Create ActorPool

In [None]:
from ray.util.actor_pool import ActorPool

In [None]:
# Wrap the actors in an ActorPool utility to automatically handle future management.
actor_pool = ActorPool(actors)

Note: The [ActorPool](https://docs.ray.io/en/latest/ray-core/package-ref.html#ray-util-actorpool) is fixed in size, unlike the task-based approach, where the number of parallel tasks can be dynamic. To autoscale the pool of actors, you will need to use the Ray AIR approach discussed earlier in this module.

#### Run parallel inference on 10 batches

In [None]:
# Runs prediction and returns an object reference to the segmentation maps.
def actor_call(
    actor: ray.actor.ActorHandle, batch_of_images: list[JpegImageFile]
) -> ray.ObjectRef:
    return actor.predict.remote(images=batch_of_images)

In [None]:
predictions = []  # A list of final predictions.

In [None]:
for segmentation_maps in actor_pool.map_unordered(actor_call, batches):
    prediction_results_postprocessing(
        predictions=predictions, segmentation_maps=segmentation_maps
    )

By using the `ActorPool` utility, you were able to run distributed batch inference with just a few lines of code. The `map_unordered` function runs the defined inference logic on each batch and yields each result as soon as it is ready, so the loop can post-process results without manual orchestration of actors. This simplifies the process and reduces the need for monitoring tasks and actors at various stages of completion.

Note: `map_unordered` is slightly more efficient than the similar method `actor_pool.map` because this example does not depend on the order of the results.

In [None]:
# Inspect the resulting segmentation maps array.
predictions[0][0]

**Coding Exercise**

While the `ActorPool` utility offers a good level of abstraction above orchestrating actors directly, there are [methods](https://docs.ray.io/en/latest/ray-core/package-ref.html?highlight=actorpool#ray-util-actorpool) available to you to schedule tasks, inspect in-flight jobs, and retrieve idle actors.

Try looking into the actor pool by printing out which actors are idle and which tasks remain during the inference step.

#### Shutdown Ray runtime

In [None]:
# Terminate processes started by ray.init().
ray.shutdown()

#### Summary: Distributed, stateful batch inference with Ray Actors

##### Key concepts

<div class="alert alert-info">
  <strong>Stateful inference</strong>: Inference carried out over stateful processes where Ray actors hold model replicas and can mutate and persist state.
</div>

##### Key API elements

* **`ActorPool()`**  
Wraps the list of actors that run inference.


## Summary: Choosing a Ray architecture for batch inference

In this module, you have encountered two approaches for scalable, distributed batch inference with Ray:

1. Batch inference with Ray AI Runtime (AIR)
1. Batch inference with Ray Core API

To compare these architectures and choose the appropriate one for your specific needs, consider two main categories: Ray Core-based approaches and Ray AI Runtime (AIR)-based approaches.

### Ray Core-based approaches
These approaches involve using Ray Core primitives, such as Tasks, Actors, and the optional [ActorPool utility](https://docs.ray.io/en/latest/ray-core/package-ref.html#ray-util-actorpool). Each method shares the ability to offer fine-grained control over the behavior of a distributed application. Ray Core-based batch inference exposes the parallelism mechanisms and allows you to specify *how* inference should be executed.

For some, this high level of control fits their use case well. One of Anyscale's customers performs large-scale batch inference on satellite images. Because each image is so large, it makes sense for them to run prediction on single images. Using Ray to parallelize inference allows them to control compute resources assigned to individual tasks, and this straightforward application significantly cuts down on runtime over alternative methods.

In a similar vein, [Dendra](https://dendra.io/) utilizes model replicas on Ray Actors to run batch inference on aerial imagery for [ecological restoration](https://www.anyscale.com/blog/how-ray-and-anyscale-make-it-easy-to-do-massive-scale-machine-learning-on). Ray allows their inference pipeline to scale from thousands to millions of ecosystem images without having to change any underlying infrastructure. Richard Decal, the Lead ML Engineer at Dendra, explains, "This approach allowed us to maximize our network I/O and GPUs usage across the cluster." Now, Dendra can focus on its core business while offloading the compute management to Ray and Anyscale.

### Ray AI Runtime (AIR)-based approaches
These approaches involve using Ray AIR's functionality through [BatchPredictor](https://docs.ray.io/en/latest/ray-air/package-ref.html#batch-predictor) performing inference on [Ray Datasets](https://docs.ray.io/en/latest/data/api/dataset.html). By using these high-level abstractions, you define *what* needs to be done rather than *how* it should be done. You can focus on the inference logic for your specific use case while delegating the responsibility of distributing execution to Ray.

Ray AIR may be a convenient choice for a variety of reasons:

* [BatchPredictor](https://docs.ray.io/en/latest/ray-air/package-ref.html#batch-predictor) gives you out-of-the-box predictors, handles framework native batch conversions, and gives you an option to resume from AIR Checkpoint.
* Ray Datasets handle distributed processing, creating batches of data, pipelining, autoscaling, and memory management.
* Ray AIR libraries are connected, giving you an option to scale other parts of your pipeline in the future.

### Key differences

| |Ray Core|Ray AI Runtime|
|:-:|:-:|:-:|
|**Expose parallelism**|Yes|No|
|**Scalable from workstation to large cluster**|Yes|Yes|
|**Integrations**|Build yourself|Out-of-the-box (PyTorch, TF and [more](https://docs.ray.io/en/latest/ray-air/package-ref.html#trainer-and-predictor-integrations))|
|**Data pre-processing**|Build yourself|Out-of-the-box support|

### Developer experience

When working with [Ray Core](https://docs.ray.io/en/latest/ray-core/walkthrough.html) and [Ray AIR](https://docs.ray.io/en/latest/ray-air/getting-started.html), you will notice that they have different focuses. Ray Core enables developers to build and scale distributed Python applications and provides a high level of control, as you write Python code and scale it using Ray primitives.

On the other hand, Ray AIR is a toolkit for simplifying ML compute and abstracts away parallel computing primitives, making it easier to use but with a lower level of control.

| |Ray Core|Ray AI Runtime|
|:-:|:-:|:-:|
|**Level of control**|High|Medium|
|**Directly program distributed apps**|Yes|No|
|**Flexibility**|High|Medium|
|**Ease of use**|Medium|High|
|**Entry barrier**|Depends on use case|Low|

### Recommendations*

In general, the final recommendation **depends heavily on your specific use case**. Ray Core-based solutions perform well in situations where the batch inference pipeline contains stages that are well-suited for straightforward parallelization. These applications can easily be divided into smaller, independent tasks that can be processed concurrently, without requiring significant communication or coordination between the different processing units.

For most batch inference problems, however, it is advisable to **start by using Ray AIR's BatchPredictor** due to its lower barrier to entry, built-in optimizations, and ability to abstract away the more tedious aspects of distribution, among the other reasons discussed previously.

**Keep in mind that these recommendations are meant to provide initial guidance and are not a definitive guide. If you have any questions or want to discuss your use case further, you can reach out on our Slack channel.*

|<img src="https://technical-training-assets.s3.us-west-2.amazonaws.com/Scaling_inference/actor_inference.png" width="90%" loading="lazy">|<img src="https://technical-training-assets.s3.us-west-2.amazonaws.com/Scaling_inference/air_batchpredictor.png" width="90%" loading="lazy">|
|:-:|:-:|
|Ray Core-based approach with Ray Actors|Ray AIR-based approach with BatchPredictor|

# Connect with the Ray community

You can learn and get more involved with the Ray community of developers and researchers:

* [**Ray documentation**](https://docs.ray.io/en/latest)

* [**Official Ray Website**](https://www.ray.io/)  
Browse the ecosystem and use this site as a hub to get the information that you need to get going and building with Ray.

* [**Join the Community on Slack**](https://forms.gle/9TSdDYUgxYs8SA9e8)  
Find friends to discuss your new learnings in our Slack space.

* [**Use the Discussion Board**](https://discuss.ray.io/)  
Ask questions, follow topics, and view announcements on this community forum.

* [**Join a Meetup Group**](https://www.meetup.com/Bay-Area-Ray-Meetup/)  
Tune in on meet-ups to listen to compelling talks, get to know other users, and meet the team behind Ray.

* [**Open an Issue**](https://github.com/ray-project/ray/issues/new/choose)  
Ray is constantly evolving to improve developer experience. Submit feature requests and bug reports, and get help via GitHub issues.

* [**Become a Ray contributor**](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html)  
We welcome community contributions to improve our documentation and Ray framework.

<img src="https://technical-training-assets.s3.us-west-2.amazonaws.com/Generic/ray_logo.png" width="20%" loading="lazy">