# Introduction to Ray AI Runtime (AIR)

<img src="../_static/assets/Generic/ray_logo.png" width="20%" loading="lazy">

## About this notebook

### Is it right for you?

This notebook is an example-based introduction to the Ray AI Runtime (AIR).

You will go through an end-to-end example that covers data loading, training, hyper-parameter tuning, predicting and serving. Along the way you will learn about Ray AIR's specialized libraries that collectively form a unified API for scalable ML applications.

It is right for you if:

* you have basic familiarity with the Ray project
* you want to learn about Ray AIR: the unified API for scalable ML applications
* you have an existing ML application or workload and you are looking for tools that will let you scale it easily.

### Prerequisites

For this notebook you should have:

* practical Python and machine learning experience

You have completed:
* [Overview of Ray](https://github.com/ray-project/ray-educational-materials/blob/main/Introductory_modules/Overview_of_Ray.ipynb)

### Learning objectives

Upon completion of this notebook, you will know about:

* high-level ML libraries that compose Ray AIR: Data, Train, Tune, Serve, and RLlib
* how to use Ray AIR as a unified toolkit to write an end-to-end ML application in Python as well as scale individual jobs
* problems and challenges that Ray AIR attempts to solve

### What will you do?

You will run and analyze an end-to-end example that covers all Ray AIR libraries. Via hands-on exercises you will practice the key concepts from each stage of the example ML workflow:

|ML workflow stage|Ray AIR key concept|
|:--|:--|
|data loading and preprocessing|`Preprocessor` to load and transform data|
|model training|`Trainer` for supported ML frameworks (Keras, PyTorch and more)|
|hyper-parameter tuning|`Tuner` for hyperparameter search|
|batch prediction at scale|`BatchPredictor` to load model from best checkpoint for batch inference|
|model serving|`PredictorDeployment` for online inference|

## Part 1: Overview of Ray AI Runtime (AIR)

<div class="alert alert-info">
  <strong><a href="https://docs.ray.io/en/latest/ray-air/getting-started.html" target="_blank">Ray AI Runtime (AIR)</a></strong> is an open-source, domain-specific Python library that equips ML engineers, data scientists, and researchers with a scalable and unified toolkit for ML applications.
</div>

Ray AIR is built on top of Ray Core. It supports distributed data processing, model training, tuning, model serving, and reinforcement learning, all in Python. As a result, both individual workloads and end-to-end use cases can be implemented in a single unified library.

### Machine learning workflow with Ray AIR

Each of the five native libraries that Ray AIR wraps is focused on a specific stage of the ML workflow. Because this abstraction layer is built on top of Ray Core, it is distributed and scalable. Ray AIR brings together an ever-growing ecosystem of integrations with your favorite machine learning frameworks.

|<img src="../_static/assets/Introduction_to_Ray_AIR/e2e_air.png" width="70%" loading="lazy">|
|:--|
|Ray AIR enables end-to-end ML development and provides multiple options to integrate with other tools and libraries from the MLOps ecosystem.|

1. [Ray Data](https://docs.ray.io/en/latest/data/dataset.html): scalable, framework-agnostic loading and transforming raw data
1. [Ray Train](https://docs.ray.io/en/latest/train/train.html): distributed multi-node and multi-core model training with fault tolerance that integrates with your favorite training libraries
1. [Ray Tune](https://docs.ray.io/en/latest/tune/index.html): scales experiment execution and hyper-parameter tuning to optimize model performance
1. [Ray Serve](https://docs.ray.io/en/latest/serve/index.html): deploys your model for online or batch inference
1. [Ray RLlib](https://docs.ray.io/en/latest/rllib/index.html): distributed reinforcement learning workloads that integrate with the other Ray AIR libraries above

## Part 2: End to end ML workflow with Ray AI Runtime

### Overview

Example application: predicting big tips on yellow taxi cab rides in New York City.

To illustrate Ray AIR's capabilities, you will implement an end-to-end example, building a simple machine learning pipeline using Ray Data, Train, Tune, and Serve. Each part will introduce key concepts, integrations, and typical workloads for the AIR library before demonstrating its functionality with a code example.

#### Data

#### Model

#### Notes
Suppose we want to build an application for taxi drivers in NYC that predicts if a given ride will result in a large tip (>20%). This has the potential to influence drivers' decisions when accepting jobs to maximize their margin, and conversations around [information accessibility for gig workers](https://www.nytimes.com/2022/10/11/technology/gig-workers-drivers-para-app.html) are making waves in the news. For this project, let's use the [New York City Taxi & Limousine Commission's Trip Record Data](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page) to build a binary classification model. Starting off, let's take the yellow cab data from June 2021, which contains over 2 million samples with features including `passenger_count`, `trip_distance` (in miles), `fare_amount` (including tax, tip, fees, etc.), `trip_duration` (in seconds), `hour` (hour that trip started), `day_of_week`, and our target `is_big_tip` (whether the tip amount was greater than 20%).
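As a rough illustration of how a target like `is_big_tip` could be derived, here is a minimal sketch in plain Python. It rests on assumptions that the dataset description above does not confirm: the `tip_amount` field and the `label_big_tip` helper are hypothetical, and the tip is measured as a fraction of `fare_amount`.

```python
def label_big_tip(tip_amount: float, fare_amount: float, threshold: float = 0.2) -> bool:
    """Return True when the tip exceeds `threshold` as a fraction of the fare.

    Hypothetical helper: the real dataset ships with `is_big_tip` precomputed.
    """
    if fare_amount <= 0:
        return False  # guard against malformed rows and division by zero
    return tip_amount / fare_amount > threshold


# Made-up rides, for illustration only.
rides = [
    {"fare_amount": 10.0, "tip_amount": 2.5},
    {"fare_amount": 20.0, "tip_amount": 3.0},
    {"fare_amount": 8.0, "tip_amount": 1.0},
]

labels = [label_big_tip(r["tip_amount"], r["fare_amount"]) for r in rides]
print(labels)
```

Only the first ride clears the 20% threshold here, so it is the only one labeled as a big tip.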

Our workflow will consist of loading data, setting up a preprocessor, training the model with XGBoost, tuning hyperparameters, performing batch inference, and finally serving our online application.

### Ray Data


![Data Highlight](../_static/assets/Introduction_to_Ray_AIR/data_highlight.png)

First up, we want to load in the taxi dataset and transform its raw input into features that will be given to our machine learning model.

[Ray Datasets](https://docs.ray.io/en/latest/data/user-guide.html) are the standard way to load and pass data in Ray libraries and applications. This common basis for data handling allows users to leverage different libraries from the Ray ecosystem in whatever way serves their needs without being tethered to a particular framework.

The benefits of using the core `Dataset` abstractions for loading, transforming, and passing references to data in a Ray cluster include:

- **Flexibility**: Compatible with a variety of file formats, data sources, and distributed frameworks, Datasets work seamlessly with library integrations like Dask on Ray and can be passed between Ray tasks and actors without copying data.
- **Performance for ML Workloads**: Datasets offer important features like accelerator support, pipelining, and global random shuffles that accelerate ML training and inference workloads, along with basic distributed data transformations such as map, filter, sort, groupby, and repartition.
- **Persistent Preprocessor**: The `Preprocessor` primitive explicitly captures and stores the transformations applied to convert inputs into features and is applied at both training and serving to keep the processing consistent across the pipeline.
- **Built on Ray Core**: Datasets inherit scalability to hundreds of nodes, efficient memory usage due to shared memory across processes on the same node, and object spilling and recovery to handle failures. Because Datasets are just lists of object references, they can be passed between tasks and actors without needing to make a copy of the data, which is crucial for making data-intensive applications and libraries scalable.

In *Figure 3* below, you can see a general pattern for creating a `Dataset`, configuring a `Preprocessor`, and passing these into the `Trainer` for consistent data handling throughout the pipeline.

![Ray Data Code Snippet](../_static/assets/Introduction_to_Ray_AIR/data_code.png)

*Figure 3*

Let's take this generic structure and see how it plays out with our tip prediction task.

#### Start Ray runtime
To start, we'll import Ray (check out our [installation instructions](https://docs.ray.io/en/latest/ray-overview/installation.html)) and start a local Ray cluster that can use all the cores available on your machine as workers. We call `ray.is_initialized()` first to ensure that only one Ray cluster is active.

In [None]:
import ray

# is_initialized is a function; without the call the condition is always truthy
if ray.is_initialized():
    ray.shutdown()

ray.init()

#### Create Ray Dataset
Here, we read in the data from a Parquet file stored on S3. Parquet is a columnar format designed to support fast data processing.

In [None]:
dataset = ray.data.read_parquet(
    "s3://anyscale-training-data/intro-to-ray-air/nyc_taxi_2021.parquet"
)

In [None]:
# split data into training and validation subsets
train_dataset, valid_dataset = dataset.train_test_split(test_size=0.3)
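To show what `test_size=0.3` means, here is a minimal plain-Python sketch of a fractional hold-out split. The `split_rows` helper is hypothetical and only mirrors the idea; it is not how Ray Data implements `train_test_split`, which works on distributed blocks and can optionally shuffle.

```python
def split_rows(rows, test_size):
    """Hold out the last `test_size` fraction of `rows` (hypothetical helper)."""
    n_test = int(len(rows) * test_size)
    split_at = len(rows) - n_test
    return rows[:split_at], rows[split_at:]


rows = list(range(10))  # stand-in for ten dataset rows
train_rows, valid_rows = split_rows(rows, test_size=0.3)

print(len(train_rows), len(valid_rows))
```

With ten rows and `test_size=0.3`, seven rows go to training and three to validation.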

In [None]:
# split datasets into blocks for parallel preprocessing
# num_blocks should be lower than number of cores in the cluster
train_dataset = train_dataset.repartition(num_blocks=5)
valid_dataset = valid_dataset.repartition(num_blocks=5)

**Coding Exercise**

There exist many [`Dataset` API elements](https://docs.ray.io/en/latest/data/api/dataset.html#) available for common transformations and operations. Using the above as a reference:
1. Inspect the schema from the underlying Parquet metadata.
2. Count how many rows are in the training and validation datasets.
3. Inspect the first five samples of either dataset.
4. What is the average `fare_amount` grouped by `passenger_count`?

In [None]:
### YOUR CODE HERE ###

**Solution**

In [None]:
### SAMPLE IMPLEMENTATION ###

print(f"Schema of Training Dataset: \n {train_dataset.schema()}")  # <1>

print(f"Number of Samples in Training Dataset: \n {train_dataset.count()}")  # <2>
print(f"Number of Samples in Validation Dataset: \n {valid_dataset.count()}")  # <2>

train_dataset.show(5)  # <3>

train_dataset.groupby("passenger_count").mean("fare_amount").show()  # <4>
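For intuition about step 4, the sketch below computes the same kind of grouped mean in plain Python. The rows are made up for illustration, and the real `groupby(...).mean(...)` call runs in a distributed fashion over the Dataset's blocks rather than in a single loop.

```python
from collections import defaultdict

# Made-up rows, for illustration only.
rows = [
    {"passenger_count": 1, "fare_amount": 10.0},
    {"passenger_count": 1, "fare_amount": 20.0},
    {"passenger_count": 2, "fare_amount": 30.0},
]

totals = defaultdict(float)  # running sum of fares per group
counts = defaultdict(int)    # number of rows per group
for row in rows:
    key = row["passenger_count"]
    totals[key] += row["fare_amount"]
    counts[key] += 1

mean_fare = {key: totals[key] / counts[key] for key in totals}
print(mean_fare)
```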

#### Preprocess dataset
To transform our raw data -> features, we'll define a `Preprocessor`. What's nice about a Ray AIR `Preprocessor` is that it is automatically incorporated...

- **During Training**: `Preprocessor` is passed into a `Trainer` to `fit` and `transform` input `Dataset`s.
- **During Tuning**: each `Trial` will instantiate its own copy of the `Preprocessor` and the fitting and transformation logic will occur once per `Trial`
- **During Checkpointing**: the `Preprocessor` is saved in the `Checkpoint` if it was passed into the `Trainer`
- **During Predicting**: if the `Checkpoint` contains a `Preprocessor`, then it will be used to call `transform_batch` on input batches prior to performing inference

In the code below, we define a `MinMaxScaler` preprocessor that will scale the `trip_distance` and `trip_duration` columns by their range.

In [None]:
from ray.data.preprocessors import MinMaxScaler

# create a preprocessor to scale some columns
preprocessor = MinMaxScaler(columns=["trip_distance", "trip_duration"])
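The fit-then-transform idea behind a min-max scaler can be sketched in plain Python. This is an illustration under stated assumptions, not Ray's implementation: `SimpleMinMaxScaler` is a hypothetical class, and the rows are made up. The point it shows is that statistics are learned once from the training data and then reused unchanged on any other data.

```python
class SimpleMinMaxScaler:
    """Hypothetical stand-in that scales columns by their training range."""

    def __init__(self, columns):
        self.columns = columns
        self.stats = {}  # column name -> (min, max) seen during fit

    def fit(self, rows):
        for col in self.columns:
            values = [row[col] for row in rows]
            self.stats[col] = (min(values), max(values))
        return self

    def transform(self, rows):
        scaled = []
        for row in rows:
            new_row = dict(row)
            for col in self.columns:
                lo, hi = self.stats[col]
                span = hi - lo
                # A constant column has zero range; map it to 0.0.
                new_row[col] = (row[col] - lo) / span if span else 0.0
            scaled.append(new_row)
        return scaled


train_rows = [{"trip_distance": 1.0}, {"trip_distance": 3.0}, {"trip_distance": 5.0}]
valid_rows = [{"trip_distance": 2.0}, {"trip_distance": 7.0}]

scaler = SimpleMinMaxScaler(columns=["trip_distance"]).fit(train_rows)
scaled_train = scaler.transform(train_rows)
scaled_valid = scaler.transform(valid_rows)
print(scaled_train)
print(scaled_valid)
```

Because the range comes from the training rows only, a validation value larger than the training maximum scales to more than 1. That is expected and keeps preprocessing consistent between training and serving.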

**Coding Exercise**

Ray AIR provides several [preprocessors out of the box](https://docs.ray.io/en/latest/ray-air/preprocessors.html#) as well as support for implementing custom preprocessors. 

For this exercise, visualize the distribution for each of the features in our dataset, read through the "Which preprocessor should you use?" section of the linked user guide above, and determine whether `MinMaxScaler` applied to `trip_distance` and `trip_duration` is sufficient.

Later on, you can compare model performance between the given preprocessor and your custom configuration.

In [None]:
### YOUR CODE HERE ###

**Solution**

In [None]:
### SAMPLE IMPLEMENTATION ###

# import only what is used instead of a wildcard import
from ray.data.preprocessors import PowerTransformer

pd_df = train_dataset.to_pandas(limit=1893433)
pd_df.hist("trip_distance")
pd_df.hist("trip_duration")

sample_preprocessor = PowerTransformer(
    columns=["trip_distance", "trip_duration"], power=0.5
)

Notice the positively-skewed distributions for `trip_distance` and `trip_duration`. For these numerical features, you can choose an appropriate AIR `Preprocessor` depending on your data's properties:

- `PowerTransformer`: your data isn't normal, but you need it to be
- `Normalizer`: you need unit norm rows
- `MinMaxScaler`: you aren't sure what your data looks like

Feature scaling can offer a performance boost during training, and comparing different `Preprocessor` choices is worth investigating when you have a few features that are not already normalized.

**Key Concepts in This Section**

`Dataset`: The standard way to load and exchange data in Ray AIR. In AIR, Datasets are used extensively for data loading, preprocessing, and batch inference.

`Preprocessors`: Preprocessors are primitives that can be used to transform input data into features. Preprocessors operate on Datasets, which makes them scalable and compatible with a variety of datasources and dataframe libraries. A Preprocessor is fitted during Training, and applied at runtime in both Training and Serving on data batches in the same way. AIR comes with a collection of built-in preprocessors, and you can also define your own with simple templates which you can read more about in our [User Guide](https://docs.ray.io/en/latest/ray-air/preprocessors.html).

### Ray Train
***

![Train Highlight](../_static/assets/Introduction_to_Ray_AIR/train_highlight.png)

Following data preprocessing, we can move forward with defining our model for binary classification of big tip rides.

[Ray Train](https://docs.ray.io/en/latest/ray-air/trainer.html) is a library for distributed training on Ray. It offers key tools for different parts of the training workflow, from feature processing, to scalable training, to integrations with ML tracking tools, to export mechanisms for models.

Ray AIR `Trainer`s enable users to distribute training with popular machine learning frameworks like PyTorch, TensorFlow, XGBoost, HuggingFace Transformers, Scikit-Learn, and more. Train supports features like callbacks for early stopping, checkpointing, and integration with TensorBoard, Weights & Biases, and MLflow for observability.

ML practitioners tend to run into a few common problems with training models that prompt them to consider distributed solutions:

1. training time is too long to be practical
2. the data is too large to fit on one machine
3. the model itself is too large to fit on a single machine

Ray Train tackles the first problem by running distributed multi-node training with fault tolerance, leveraging Ray Data to scale preprocessing and distributed data ingestion. It is also composable with Ray Tune for scaling hyperparameter tuning and outputs the trained model in the form of a `Checkpoint` for batch inference.

In *Figure 4* below, you see that training comes in two major parts: defining the `Trainer` object and then fitting it to the training dataset. This code snippet uses a `TorchTrainer`, but it may be swapped out for any of the other [integrations](https://docs.ray.io/en/latest/ray-air/package-ref.html#trainer-and-predictor-integrations).

![Ray Train Code Snippet](../_static/assets/Introduction_to_Ray_AIR/train_code.png)

*Figure 4*

Let's put these concepts into practice by applying them to our taxi problem.

#### Define AIR `Trainer`

There are three broad categories of Trainers that AIR offers:

- Deep Learning Trainers (PyTorch, TensorFlow, Horovod)
- Tree-based Trainers (XGBoost, LightGBM)
- Other ML frameworks (HuggingFace, Scikit-Learn, RLlib)

In the example below, we will use an `XGBoostTrainer` to perform binary classification on these NYC Taxi rides. To construct a `Trainer`, you provide:

- a `ScalingConfig` which specifies how many parallel training workers and what type of resources (CPUs/GPUs) to use per worker during training.
- a collection of datasets and a preprocessor for the provided datasets which configures preprocessing and the datasets to ingest from

Optionally, you can choose to add `resume_from_checkpoint` which is a checkpoint path to resume from, should your training run be interrupted.

Below, we'll set up an `XGBoostTrainer` for our classification task. [XGBoost](https://xgboost.readthedocs.io/en/stable/) is a gradient boosted decision trees library. We'll then supply our `Preprocessor` from the previous step as well as training and validation datasets to ingest.

In [None]:
from ray.air.config import ScalingConfig
from ray.train.xgboost import XGBoostTrainer

trainer = XGBoostTrainer(
    label_column="is_big_tip",
    num_boost_round=50,
    scaling_config=ScalingConfig(
        num_workers=1,
        use_gpu=False,
    ),
    params={
        "objective": "binary:logistic",
        "eval_metric": ["logloss", "error"],
        "tree_method": "approx",
    },
    datasets={"train": train_dataset, "valid": valid_dataset},
    preprocessor=preprocessor,
)

#### Fit the Trainer

To invoke training, call `.fit()`. Trainer objects produce a `Result` object which gives you access to metrics, checkpoints, and errors.

In [None]:
result = trainer.fit()

**Coding Exercise**

You can check out the training results from the `Result` object with the following calls:

```python
# returns last saved checkpoint
result.checkpoint

# returns the `n` best saved checkpoints as configured in `RunConfig.CheckpointConfig`
result.best_checkpoints

# returns the final metrics as reported
result.metrics

# contains an Exception if training failed
result.error
```

Inspect your training result below. What is the reported accuracy for the training and validation runs? 

Note: `error` is the binary classification error rate in this case calculated as `#(wrong cases)/#(all cases)`
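The relationship between the error rate and accuracy can be checked by hand with a few lines of plain Python. The labels and predictions below are made up for illustration; in the notebook, the error values come from `result.metrics`.

```python
# Made-up ground-truth labels and model predictions.
labels = [1, 0, 1, 1, 0]
predictions = [1, 0, 0, 1, 1]

# error = #(wrong cases) / #(all cases)
wrong = sum(1 for y, p in zip(labels, predictions) if y != p)
error = wrong / len(labels)

# accuracy is the complement of the error rate
accuracy = 1 - error
print(f"error={error:.2f}, accuracy={accuracy:.2f}")
```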

In [None]:
### YOUR CODE HERE ###

**Solution**

In [None]:
### SAMPLE IMPLEMENTATION ###

print(f"Result Metrics: \n {result.metrics} \n")
print(f"Training Accuracy: \n {1 - result.metrics['train-error']} \n")
print(f"Validation Accuracy: \n {1 - result.metrics['valid-error']} \n")

**Key Concepts in This Section**

`Trainer`: Trainers are wrapper classes around third-party training frameworks such as XGBoost, PyTorch, and TensorFlow. They are built to help integrate with core Ray Actors (for distribution), Ray Tune, and Ray Datasets.

### Ray Tune
***

![Tune Highlight](../_static/assets/Introduction_to_Ray_AIR/tune_highlight.png)

Now that we have a baseline XGBoost model trained, we find the classification accuracy lacking. Among several methods to improve performance (collecting more data, feature engineering, choosing a different algorithm, transfer learning, etc.), **hyperparameter tuning** involves inserting the training loop into an optimization method to find the optimal set of hyperparameters and can be a powerful way to run experiments to achieve good results.

*Hyperparameters*, unlike model parameters which are learned by the model as it trains, are parameters that *you, the human* set. These hyperparameters remain static through a `trial` or experiment and influence the final outcome of training. For example, some common variables to adjust could include:

- `max_depth` in decision tree models
- `drop_out` rate in neural networks
- `discount_factor` in Q-learning
- `num_iterations` in logistic regression
- `n_grams` size of "n" in natural language processing

Setting up and executing hyperparameter optimization (HPO) in itself can be expensive in terms of compute resources and runtime, but there are several intricacies in making the process work *well*, including:

- **Vast Search Space**: your model could have anywhere between a handful to several dozen available hyperparameters, each with different data types and ranges. Some parameters might be correlated. Sampling good candidates from high-dimensional spaces is difficult.
- **Search Algorithms**: choosing hyperparameters at random can work surprisingly well, but in general, you need to test complex search algorithms to achieve the best result.
- **Long Runtime**: even if you distribute tuning, training complex models can take a long time to complete per run, so it's best to be efficient at every stage of the pipeline.
- **Resource Allocation**: you must have enough compute resources available during each trial so that scheduling mismatches don't slow down the search.
- **User Experience**: HPO is complicated, and visibility and tooling for developers like stopping bad runs early, saving intermediate results, restarting from checkpoints, or pausing and resuming runs make the process easier on the human.

Ray Tune is a distributed HPO library that addresses all of these topics above to provide a simplified interface for running trials and integrates with popular frameworks such as HyperOpt, Optuna, and many more.

In *Figure 5*, you'll find the general pattern for using AIR `Tuner`s which involves taking in a trainable, defining a search space, establishing a search algorithm, scheduling trials, and analyzing results. We'll go over the relevant components in the following section.

![Ray Tune Code Snippet](../_static/assets/Introduction_to_Ray_AIR/tune_code.png)

*Figure 5*

Let's see how to interact with Ray Tune to make some improvements to our big tip classifier.

#### Use AIR `Tuner` for hyperparameter search

To set up an AIR `Tuner`, we must specify:

- `search space`: a set of hyperparameters you wish to tune
- `search_algorithm`: to optimize parameter search
- `scheduler`: (optional) to stop searches early and speed up experiments

We pass the `search space`, `search algorithm`, `scheduler`, and `Trainer` to the `Tuner`, which runs the workload by evaluating multiple hyperparameters in parallel. Afterwards, `Tuner` returns its results in a `ResultGrid` for you to analyze.

Below, we'll define a search space with a few hyperparameters to tune. 

- `eta` is the learning rate
- `max_depth` specifies how deep each tree is with a default of 6. A higher value leads to a more complex model. Using `tune.randint(1, 9)`, it will sample an integer uniformly between 1 (inclusive) and 9 (exclusive), so values range from 1 to 8.
- `min_child_weight` defines the minimum sum of weights of all observations in a child, used to control overfitting
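To get a feel for what sampling from such a search space produces, here is a minimal sketch that mimics it with Python's standard `random` module. It is an analogy rather than Ray Tune's sampler: `sample_params` is a hypothetical helper, `random.uniform` stands in for `tune.uniform`, and `random.randrange(1, 9)` stands in for `tune.randint(1, 9)`, which likewise excludes its upper bound.

```python
import random


def sample_params(rng):
    """Draw one hypothetical configuration from the search space."""
    return {
        "eta": rng.uniform(0.2, 0.4),
        "max_depth": rng.randrange(1, 9),  # integers 1..8, upper bound excluded
        "min_child_weight": rng.uniform(0.8, 1.0),
    }


rng = random.Random(0)  # seeded so repeated runs draw the same samples
samples = [sample_params(rng) for _ in range(1000)]

depths = sorted({s["max_depth"] for s in samples})
print(depths)
```

Each call to `sample_params` corresponds to the configuration of one trial; more samples cover the space more densely at the cost of more training runs.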

In [None]:
from ray import tune
from ray.tune.tuner import Tuner, TuneConfig

param_space = {
    "params": {
        "eta": tune.uniform(0.2, 0.4),
        "max_depth": tune.randint(1, 9),
        "min_child_weight": tune.uniform(0.8, 1.0),
    }
}

tuner = Tuner(
    trainer,
    param_space=param_space,
    tune_config=TuneConfig(num_samples=1, metric="train-logloss", mode="min"),
)

#### Execute hyperparameter search and analyze results

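Now, we can execute the hyperparameter search. The `TuneConfig(num_samples=1)` setting above runs a single trial; raise `num_samples` (for example, to 10) to sample more configurations from the search space. After tuning, we can query the `ResultGrid` object to see metrics, results, and checkpoints of each trial.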

In [None]:
result_grid = tuner.fit()

**Coding Exercise**

You can probe the `ResultGrid` for metrics using these calls:

```python

# checks if there have been errors
result_grid.errors

# gets the best result
best_result = result_grid.get_best_result()

# gets the best checkpoint
best_checkpoint = best_result.checkpoint

# gets the best metrics
best_metrics = best_result.metrics

```

Inspect your tuning results. What is the best result from these experiments? Is it better than the baseline model from the training step in the previous section?

In [None]:
### YOUR CODE HERE ###

**Solution**

In [None]:
### SAMPLE IMPLEMENTATION ###

best_result = result_grid.get_best_result()

print(f"Best Result: \n {best_result} \n")
print(f"Training Accuracy: \n {1 - best_result.metrics['train-error']} \n")
print(f"Validation Accuracy: \n {1 - best_result.metrics['valid-error']} \n")

**Key Concepts in This Section**

`Tuner`: provides an interface that works with AIR `Trainer`s to perform distributed hyperparameter tuning. You define a set of hyperparameters you wish to tune in a search space, specify a search algorithm, and the `Tuner` returns its results in a `ResultGrid` that contains metrics, results, and checkpoints for each `trial`.
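To make the role of `metric` and `mode` concrete, the selection of a best trial can be sketched in plain Python. This is an illustration, not Ray Tune's implementation: the `pick_best` helper is hypothetical, and the trial records and their metric values are made up.

```python
def pick_best(trials, metric, mode):
    """Return the trial with the lowest or highest value of `metric`."""
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    chooser = min if mode == "min" else max
    return chooser(trials, key=lambda trial: trial["metrics"][metric])


# Made-up trial results, for illustration only.
trials = [
    {"params": {"max_depth": 3}, "metrics": {"train-logloss": 0.52}},
    {"params": {"max_depth": 6}, "metrics": {"train-logloss": 0.47}},
    {"params": {"max_depth": 8}, "metrics": {"train-logloss": 0.49}},
]

best = pick_best(trials, metric="train-logloss", mode="min")
print(best["params"], best["metrics"])
```

Since log loss should be minimized, `mode="min"` picks the trial with the smallest value; for a metric like accuracy you would use `mode="max"` instead.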

### Ray AIR Predictors
***

Ray AIR Predictors load models from your [checkpoints](https://docs.ray.io/en/latest/ray-air/key-concepts.html#air-checkpoints-doc) generated during training or tuning to perform distributed inference.

During batch prediction, the input batch is converted into a Pandas DataFrame. If there is a `Preprocessor` saved in the provided `Checkpoint`, the preprocessor will be used to transform the DataFrame. The transformed DataFrame is then passed to the model for inference, and the output predictions will be of the same type as the original input.

In *Figure 6*, you can see how `BatchPredictor` is passed a `Checkpoint` and `Predictor`.

![Batch Predictor Code Snippet](../_static/assets/Introduction_to_Ray_AIR/batchpredict_code.png)

*Figure 6*

#### Use AIR `BatchPredictor` for Batch Prediction
Previously, we trained and tuned our XGBoost model on data from June 2021. Let's now take our best checkpoint from the tuning step and perform batch inference on taxi tip data from June 2022.

In [None]:
from ray.train.batch_predictor import BatchPredictor
from ray.train.xgboost import XGBoostPredictor

In [None]:
test_dataset = ray.data.read_parquet(
    "s3://anyscale-training-data/intro-to-ray-air/nyc_taxi_2022.parquet"
).drop_columns("is_big_tip")

In [None]:
batch_predictor = BatchPredictor.from_checkpoint(
    best_result.checkpoint, XGBoostPredictor
)

In [None]:
predicted_probabilities = batch_predictor.predict(test_dataset)

**Coding Exercise**

Now that you have the predictions generated from the testing set, how did the model perform? Compare the predictions outputted by `BatchPredictor` with the ground truth labels available in the raw data file.

In [None]:
### YOUR CODE HERE ###

**Solution**

In [None]:
### SAMPLE IMPLEMENTATION ###

print("PREDICTED PROBABILITIES")
predicted_probabilities.show()
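One way to compare predictions against the ground truth is to threshold the probabilities and count matches. The sketch below uses made-up values in plain Python lists; it assumes you have kept the `is_big_tip` labels aside before dropping the column, and the 0.5 threshold is an illustrative choice rather than something prescribed by the notebook.

```python
# Made-up probabilities and labels, for illustration only.
probs = [0.91, 0.35, 0.62, 0.08]
ground_truth = [1, 0, 0, 0]

threshold = 0.5  # assumed decision threshold for "big tip"
predicted_labels = [int(p >= threshold) for p in probs]

correct = sum(int(p == y) for p, y in zip(predicted_labels, ground_truth))
test_accuracy = correct / len(ground_truth)
print(predicted_labels, test_accuracy)
```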

**Key Concepts in This Section**

`Checkpoints`: store the full state of the model periodically, so that partially trained models are available and can be used to resume training from an intermediate point, instead of starting from scratch; also allows for the best model to be saved for batch inference later on

`BatchPredictor`: loads the best model from a checkpoint to perform inference at scale on large batches of data

### Ray Serve
***

![Serve Highlight](../_static/assets/Introduction_to_Ray_AIR/serve_highlight.png)

Finally, we want a way to serve our taxi tip prediction application to our end users, hopefully with a low latency to be maximally useful to drivers on the job. However, this poses a challenge since machine learning models are compute intensive and ideally, this model wouldn't be served in isolation, but rather adjacent to business logic or even other ML models.

Ray Serve is a scalable compute layer for serving machine learning models that allows you to serve individual models or create composite model pipelines, where you can independently deploy, update, and scale individual components. Serve isn't tied to a specific machine learning library, but rather treats models as ordinary Python code.

Additionally, it allows you to flexibly combine normal Python business logic alongside machine learning models. This makes it possible to build online inference services completely end-to-end: a Serve application could validate user input, query a database, perform inference scalably across multiple ML models, and combine, filter, and validate the output all in the process of handling a single inference request.

In *Figure 7*, you see the pattern for deploying a `Predictor` from a `Checkpoint` with Ray Serve.

![Ray Serve Code Snippet](../_static/assets/Introduction_to_Ray_AIR/serve_code.png)

*Figure 7*

Let's deploy our big tip predictor with Ray Serve.

#### Use `PredictorDeployment` for Online Inference
Deploy the best model as an inference service by using Ray Serve and the `PredictorDeployment` class. After deploying the service, you can send requests to it.

In [None]:
from ray import serve
from fastapi import Request
from ray.serve import PredictorDeployment
from ray.serve.http_adapters import pandas_read_json

In [None]:
serve.run(
    PredictorDeployment.options(
        name="XGBoostService", num_replicas=2, route_prefix="/rayair"
    ).bind(XGBoostPredictor, best_result.checkpoint, http_adapter=pandas_read_json)
)

Let's send a request through HTTP. You can use the `PredictorDeployment` to deploy checkpoints trained in Ray AIR as live endpoints.

In [None]:
import requests

sample_input = test_dataset.take(1)
sample_input = dict(sample_input[0])

output = requests.post("http://localhost:8000/rayair", json=[sample_input]).json()
print(output)
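The request body above is a JSON list of records, one dictionary per ride. The sketch below builds such a payload for two rides using only the standard `json` module, without contacting the service. The feature values are made up, and the field names follow the features listed in the dataset description.

```python
import json

# Made-up rides, for illustration only.
samples = [
    {"passenger_count": 1, "trip_distance": 2.5, "fare_amount": 12.0,
     "trip_duration": 600, "hour": 9, "day_of_week": 2},
    {"passenger_count": 3, "trip_distance": 7.1, "fare_amount": 28.5,
     "trip_duration": 1500, "hour": 18, "day_of_week": 5},
]

# Passing json=samples to requests.post serializes the list the same way.
payload = json.dumps(samples)
decoded = json.loads(payload)

print(len(decoded), sorted(decoded[0].keys()))
```

Sending several records in one list lets the service score them together instead of paying for one HTTP round trip per ride.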

**Coding Exercise**

You've just served a prediction for a single sample input from our test dataset. Predictors are able to accept array, dataframe, and custom inputs (that can be transformed to array or dataframe). You can also configure micro-batching to enhance performance.

Try reading through the [user guide](https://docs.ray.io/en/latest/ray-air/examples/serving_guide.html) for predictors to accept incoming data for more than one sample and run the prediction again.

In [None]:
### YOUR CODE HERE ###

**Solution**

In [None]:
### SAMPLE IMPLEMENTATION ###

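n = 5

# fetch the first n rows once, then send one request per row
# (take(0) returns an empty list, so indexing inside a range loop would fail)
for row in test_dataset.take(n):
    sample_input = dict(row)

    output = requests.post("http://localhost:8000/rayair", json=[sample_input]).json()
    print(output)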

**Key Concepts in This Section**

`Deployments`: you can think of this as a managed group of Ray actors that can be addressed together and will handle requests load-balanced across them.

### Shutdown Ray runtime

In [None]:
ray.shutdown()

Disconnect the worker, and terminate processes started by `ray.init()`.

### Part 2: Summary
You've now created a Ray Dataset, preprocessed some features, built a model with XGBoost, searched a hyperparameter space for the best configuration, loaded the best model from a checkpoint to perform batch inference, and served that model for online inference. Through this end-to-end example, you explored how to use Ray AIR to distribute an entire ML pipeline.

#### Key Concepts

- `Datasets`
- `Preprocessors`
- `Trainers`
- `Tuner`
- `Checkpoints`
- `BatchPredictor`
- `Deployments`

# Connect with the Ray community

You can learn and get more involved with the Ray community of developers and researchers:

* [Ray documentation](https://docs.ray.io/en/latest)
* [Official Ray Website](https://www.ray.io/): Browse the ecosystem and use this site as a hub to get the information that you need to get going and building with Ray.
* [Join the Community on Slack](https://forms.gle/9TSdDYUgxYs8SA9e8): Find friends to discuss your new learnings in our Slack space.
* [Use the Discussion Board](https://discuss.ray.io/): Ask questions, follow topics, and view announcements on this community forum.
* [Join a Meetup Group](https://www.meetup.com/Bay-Area-Ray-Meetup/): Tune in on meet-ups to listen to compelling talks, get to know other users, and meet the team behind Ray.
* [Open an Issue](https://github.com/ray-project/ray/issues/new/choose): Ray is constantly evolving to improve developer experience. Submit feature requests, bug-reports, and get help via GitHub issues.
* [Become a Ray contributor](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html): We welcome community contributions to improve our documentation and Ray framework.

<img src="../_static/assets/Generic/ray_logo.png" width="20%" loading="lazy">