# Introduction to Ray AI Runtime (AIR)
---
(*Suggested Time to Complete: 30 minutes*)

✨ Welcome to Part II of "Introduction to Ray"! 🪩

![Map of Ray](images/map.png)

*Figure 1*

Ray AI Runtime (AIR) is a unified set of libraries built on top of Ray for distributed data processing, model training, tuning, model serving, and reinforcement learning, all in Python. AIR provides simple scalable machine learning for individual workloads and end-to-end workflows, bringing together an ever-growing ecosystem of integrations with your favorite machine learning frameworks.

Before we lay out each library and their unique jobs to be done, let's take a moment to motivate Ray AIR by taking a high-level view of the typical data science and machine learning workflow. Developing a machine learning system is an iterative and often cyclical process that touches on the following stages:

1. Data Collection & Feature Engineering: source, sample, and label raw data; preprocess raw data into well-defined input dataset(s)
2. Model Training: the learning part of machine learning that could utilize a popular framework like PyTorch, XGBoost, or Tensorflow
3. Hyperparameter Tuning: improve upon your baseline model by searching a hyperparameter space
3. Model Evaluation: perform batch inference on new data to evaluate perforamnce, potentially triggering more feature engineering or finding a more relevant set of data
4. Deployment: deploy your solution to production and/or serve your model to the end user

Each of the five native libraries that Ray AIR wraps tackles a piece of the ML specific tasks outlined above that you can see illustrated in *Figure 2*. Because this abstraction layer is built on top of Ray Core, it is distributed by nature.

1. 📊 [Ray Data](https://docs.ray.io/en/latest/data/dataset.html): scalable, framework-agnostic loading and transforming raw data across training and prediction
2. 🚂 [Ray Train](https://docs.ray.io/en/latest/train/train.html): distributed multi-node model training with fault tolerance that integrates with your favorite training libraries
3. 📈 [Ray Tune](https://docs.ray.io/en/latest/tune/index.html): scales experiment execution and hyperparameter tuning to optimize model performance
4. 🍦 [Ray Serve](https://docs.ray.io/en/latest/serve/index.html): deploys your model for online inference, with optional microbatching to improve performance
5. 🦾 [Ray RLlib](https://docs.ray.io/en/latest/rllib/index.html): distributed reinforcement learning workloads that integrate with the other Ray AIR libraries above

In this module, we will contextualize Ray Data, Train, Tune, and Serve with a common ML pipeline and discuss how each library facilitates the distinct steps we need to distribute an end-to-end example. Then, we will look at scaling individual workloads with a reinforcement learning specific application for RLlib.

**Learning Objectives**
1. Introduce the high-level data science libraries that compose Ray AIR: Data, Train, Tune, Serve, and RLlib
2. Understand how to use Ray AIR as a unified toolkit to write an end-to-end ML application in Python as well as scale individual jobs
3. Practice key concepts from each stage of the ML pipeline
    - Data - use out-of-the-box `Preprocessor`s to load and transform data
    - Train - use AIR `Trainer`s for supported ML frameworks
    - Tune - use AIR `Tuner`s for hyperparameter search
    - BatchPredictor - use AIR `BatchPredictor` to load model from best checkpoint for batch inference
    - Serve - use `PredictorDeployment` for online inference
    - RLlib - distribute RL workloads with RLlib

**Prerequisites**
- [Introduction to Ray Notebook](https://github.com): introduces Ray as a low-level distributed computing framework and covers key elements of Ray Core

![End to End](images/e2e_air.png)

*Figure 2*

# Predicting Big Tips w/ NYC Taxi Data
***

To illustrate Ray AIR's capabilities, we will walk through an end-to-end example, building a simple machine learning pipeline using Ray Data, Train, Tune, and Serve. Each section will introduce key components, integrations, and typical workloads for the AIR library before demonstrating its functionality with our example application: predicting big tips on yellow taxi cabs in New York City.

Suppose we want to build an application for taxi drivers in NYC that predicts if a given ride will result in a large tip (<20%). This has the potential to influence drivers' decisions when accepting jobs to maximize their margin, and conversations around [information accessbility for gig workers](https://www.nytimes.com/2022/10/11/technology/gig-workers-drivers-para-app.html) are making waves in the news. For this project, let's use the [New York City Taxi & Limousine Commission's Trip Record Data](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page) to build a binary classification model. Starting off, let's take the yellow cab data from June 2021 which contains over 2 million samples with features including `passenger_count`, `trip_distance` (in miles), `fare_amount` (including tax, tip, fees, etc.), `trip_duration` (in seconds), `hour` (hour that trip started), `day_of_week`, and our target `is_big_tip` (whether the tip amount was greater than 20%).

Our workflow will consist of loading data, setting up a preprocessor, training the model with XGBoost, tuning hyperparameters, performing batch inference, and finally serving our online application.

## 1. Ray Data
***
First up, we want to load in the taxi dataset and transform its raw input into features that will be given to our machine learning model.

[Ray Datasets](https://docs.ray.io/en/latest/data/user-guide.html) are the standard way to load and pass data in Ray libraries and applications. This common basis for data handling allows users to leverage different libraries from the Ray ecosystem in whatever way serves their needs without being tethered to a particular framework.

The benefits of using the core `Dataset` abstractions for loading, transforming, and passing references to data in a Ray cluster include:

- **Flexibility**: Compatible with a variety of file formats, data sources, and distributed frameworks, Datasets work seamlessly with library integrations like Dask on Ray and can be passed between Ray tasks and actors without copying data.
- **Performance for ML Workloads**: Datasets offers important features like accelerator support, pipelining, and global random shuffles that accelerate ML training and inference workloads along with basic distributed data transformations such as map, filter, sort, groupby, and repartition.
- **Persistent Preprocessor**: The `Preprocessor` primitive explicitly captures and stores the transformations applied to convert inputs into features and is applied at both training and serving to keep the processing consistent across the pipeline.
- **Built on Ray Core**: inherits scalability to hundreds of nodes, efficient memory usage due to memory across processes on the same node, and object spilling and recovery to handle failures. Because Datasets are just lists of object references, they can be passed between tasks and actors without needing to make a copy of the data, which is crucial for making data-intensive applications and libraries scalable.

In *Figure 3* below, you can see the a general pattern for creating a `Dataset`, configuring a `Preprocessor`, and passing these into the `Trainer` for consistent data handling throughout the pipeline.

![Ray Data Code Snippet](images/data_code.png)

*Figure 3*

Let's take this generic structure and see how it plays out with our tip prediction task.

### 1(a). Import Relevant Packages + Starting Ray
To start, we'll import Ray (check out our [installation instructions](https://docs.ray.io/en/latest/ray-overview/installation.html)) and start a Ray cluster on our machine that can utilize all the cores available to you as workers. We use `ray.is_initialized` to ensure that we only have one Ray cluster active.

In [1]:
import ray

if ray.is_initialized:
    ray.shutdown()

ray.init()

2022-10-24 13:58:13,351	INFO worker.py:1509 -- Started a local Ray instance. View the dashboard at [1m[32m127.0.0.1:8266 [39m[22m


0,1
Python version:,3.10.6
Ray version:,2.0.0
Dashboard:,http://127.0.0.1:8266


### 1(b). Create Ray Dataset
Here, we read in the data from an S3 `.parquet` datasource, a column-major format designed to support fast data processing.

In [3]:
dataset = ray.data.read_parquet("s3://anyscale-training-data/intro-to-ray-air/nyc_taxi_2021.parquet")

# split data into training and validation subsets
train_dataset, valid_dataset = dataset.train_test_split(test_size=0.3)

# split datasets into blocks for parallel preprocessing
train_dataset.repartition(100)
valid_dataset.repartition(100)

Read progress: 100%|██████████| 1/1 [00:04<00:00,  4.19s/it]
Repartition: 100%|██████████| 100/100 [00:00<00:00, 136.22it/s]
Repartition: 100%|██████████| 100/100 [00:00<00:00, 427.26it/s]


Dataset(num_blocks=100, num_rows=811472, schema={passenger_count: double, trip_distance: double, fare_amount: double, trip_duration: int64, hour: int64, day_of_week: int64, is_big_tip: bool, __index_level_0__: int64})

**💻 Coding Excercise 💻**

There exist many [`Dataset` API elements](https://docs.ray.io/en/latest/data/api/dataset.html#) available for common transformations and operations. Using the above as a reference:
1. Inspect the schema from the underlying Parquet metadata.
2. Count how many rows are in the training and validation datasets.
3. Inspect the first five samples of either dataset.
4. What is the average `fare_amount` grouped by `passenger_count`?

In [None]:
### YOUR CODE HERE ###

### 1(c). Preprocessing
To transform our raw data -> features, we'll define a `Preprocessor`. What's nice about a Ray AIR `Preprocessor` is that it is automatically incorporated...

- **During Training**: `Preprocessor` is passed into a `Trainer` to `fit` and `transform` input `Dataset`s.
- **During Tuning**: each `Trial` will instantiate its own copy of the `Preprocessor` and the fitting and transformation logic will occur once per `Trial`
- **During Checkpointing**: the `Preprocessor` is saved in the `Checkpoint` if was passed into the `Trainer`
- **During Predicting**: if the `Checkpoint` contains a `Preprocessor`, then it will be used to call `transform_batch` on input batches prior to performing inference

In the code below, we define a `MinMaxScaler` preprocessor that will scale the `trip_distance` and `trip_duration` columns by their range.

In [4]:
from ray.data.preprocessors import MinMaxScaler

# create a preprocessor to scale some columns
preprocessor = MinMaxScaler(columns=["trip_distance", "trip_duration"])

**💻 Coding Excercise 💻**

Ray AIR provides several [preprocessors out of the box](https://docs.ray.io/en/latest/ray-air/preprocessors.html#) as well as support for implementing custom preprocessors. 

For this excercise, visualize the distribution for each of the features in our dataset, read through the "Which preprocessor should you use?" section of the linked user guide above, and determine whether `MinMaxScaler` applied to `trip_distance` and `trip_duration` is sufficient.

Later on, you can compare model performance between the given preprocessor and your custom configuration.

In [None]:
### YOUR CODE HERE ###

**Key Concepts in This Section**

`Dataset`: The standard way to load and exchange data in Ray AIR. In AIR, Datasets are used extensively for data loading, preprocessing, and batch inference.

`Preprocessors`: Preprocessors are primitives that can be used to transform input data into features. Preprocessors operate on Datasets, which makes them scalable and compatible with a variety of datasources and dataframe libraries. A Preprocessor is fitted during Training, and applied at runtime in both Training and Serving on data batches in the same way. AIR comes with a collection of built-in preprocessors, and you can also define your own with simple templates which you can read more about in our [User Guide](https://docs.ray.io/en/latest/ray-air/preprocessors.html).

## 2. Ray Train
***
Following data preprocessing, we can move forward with defining our model for binary classification of big tip rides.

[Ray Train](https://docs.ray.io/en/latest/ray-air/trainer.html) is a library for distributed training on Ray. It offers key tools for different parts of the training workflow, from feature processing, to scalable training, to integrations with ML tracking tools, to export mechanisms for models.

Ray AIR `Trainer`s enable users to distribute training with popular machine learning frameworks like PyTorch, Tensorflow, XGBoost, HuggingFace Transformers, Scikit-Learn, and more. Train supports features like callbacks for early stopping, checkpointing, and integration with Tensorboard, Weights/Biases, and MLflow for observability.

ML pracitioners tend to run into a few common problems with training models that prompt them to consider distributed solutions:

1. training time is too long to be practical
2. the data is too large to fit on one machine
3. the model itself is too large to fit on a single machine

Ray Train tackles the first problem by running distributed multi-node training with fault tolerance, leveraging Ray Data to scale preprocessing and distributed data ingestion. It is also composable with Ray Tune for scaling hyperparameter tuning and outputs the trained model in the form of a `Checkpoint` for batch inference.

In *Figure 4* below, you see that training comes in two major parts: defining the `Trainer` object and then fitting it to the training dataset. In this code snippet, we use a `TorchTrainer`, however, this may be swapped out with any [integrations](https://docs.ray.io/en/latest/ray-air/package-ref.html#trainer-and-predictor-integrations).

![Ray Train Code Snippet](images/train_code.png)

*Figure 4*

Let's put these concepts in practice by applying it to our taxi problem.

### 2(a). Define AIR `Trainer`

There are three broad categories of Trainers that AIR offers:

- Deep Learning Trainers (Pytorch, Tensorflow, Horovod)
- Tree-based Trainers (XGBoost, LightGBM)
- Other ML frameworks (HuggingFace, Scikit-Learn, RLlib)

In the example below, we will use an `XGBoostTrainer`to perform binary classification on these NYC Taxi rides. To construct a `Trainer`, you provide:

- a `ScalingConfig` which specifies how many parallel training workers and what type of resources (CPUs/GPUs) to use per worker during training.
- a collection of datasets and a preprocessor for the provided datasets which configures preprocessing and the datasets to ingest from

Optionally, you can choose to add `resume_from_checkpoint` which is a checkpoint path to resume from, should your training run be interrupted.

Below, we'll set up an `XGBoostTrainer` for our classification task. [XGBoost](https://xgboost.readthedocs.io/en/stable/) is a gradient boosted decision trees library. We'll then supply our `Preprocessor` from the previous step as well as training and validation datasets to ingest.

In [9]:
from ray.air.config import ScalingConfig
from ray.train.xgboost import XGBoostTrainer

trainer = XGBoostTrainer(

    label_column="is_big_tip",
    num_boost_round=100,

    scaling_config=ScalingConfig(
        # number of workers to use
        num_workers=8,
        # whether to use GPU acceleration
        use_gpu=False),

    # XGBoost specific params
    params={
        "objective": "binary:logistic",
        "eval_metric": ["logloss", "error"],
        "tree_method": "approx"
    },

    # feed in our datasets and preprocessor
    datasets={"train": train_dataset, "valid": valid_dataset},
    preprocessor=preprocessor
)

### 2(b). Fit the Trainer

To invoke training, call `.fit()`. Trainer objects produce a `Result` object which gives you access to metrics, checkpoints, and errors.

In [10]:
result = trainer.fit()

  tuner = Tuner(trainable=trainable, run_config=self.run_config)


Trial name,status,loc,iter,total time (s),train-logloss,train-error,valid-logloss
XGBoostTrainer_93eb7_00000,TERMINATED,127.0.0.1:59394,101,17.1879,0.65472,0.384835,0.6583


[2m[36m(_RemoteRayXGBoostActor pid=59424)[0m [14:55:37] task [xgboost.ray]:5107570080 got new rank 4
[2m[36m(_RemoteRayXGBoostActor pid=59422)[0m [14:55:37] task [xgboost.ray]:4827600144 got new rank 2
[2m[36m(_RemoteRayXGBoostActor pid=59425)[0m [14:55:37] task [xgboost.ray]:6212768992 got new rank 5
[2m[36m(_RemoteRayXGBoostActor pid=59426)[0m [14:55:37] task [xgboost.ray]:4978840848 got new rank 7
[2m[36m(_RemoteRayXGBoostActor pid=59423)[0m [14:55:37] task [xgboost.ray]:5101278576 got new rank 3
[2m[36m(_RemoteRayXGBoostActor pid=59427)[0m [14:55:37] task [xgboost.ray]:5121037776 got new rank 6
[2m[36m(_RemoteRayXGBoostActor pid=59421)[0m [14:55:37] task [xgboost.ray]:5089744288 got new rank 0
[2m[36m(_RemoteRayXGBoostActor pid=59420)[0m [14:55:37] task [xgboost.ray]:4829697392 got new rank 1


Result for XGBoostTrainer_93eb7_00000:
  date: 2022-10-24_14-55-38
  done: false
  experiment_id: f3ba88f2e8e34662be0f27851fc2a8e2
  hostname: Emmys-MacBook-Pro-16
  iterations_since_restore: 1
  node_ip: 127.0.0.1
  pid: 59394
  time_since_restore: 6.065418004989624
  time_this_iter_s: 6.065418004989624
  time_total_s: 6.065418004989624
  timestamp: 1666648538
  timesteps_since_restore: 0
  train-error: 0.39317630990903824
  train-logloss: 0.677846201610758
  training_iteration: 1
  trial_id: 93eb7_00000
  valid-error: 0.3921651024311375
  valid-logloss: 0.6778842082323272
  warmup_time: 0.002521038055419922
  
Result for XGBoostTrainer_93eb7_00000:
  date: 2022-10-24_14-55-44
  done: false
  experiment_id: f3ba88f2e8e34662be0f27851fc2a8e2
  hostname: Emmys-MacBook-Pro-16
  iterations_since_restore: 44
  node_ip: 127.0.0.1
  pid: 59394
  time_since_restore: 11.403082132339478
  time_this_iter_s: 1.0101821422576904
  time_total_s: 11.403082132339478
  timestamp: 1666648544
  timesteps_

2022-10-24 14:55:50,089	INFO tune.py:758 -- Total run time: 19.33 seconds (19.22 seconds for the tuning loop).


**💻 Coding Excercise 💻**

You can check out the training results from the `Result` object with the following calls:

```python
# returns last saved checkpoint
result.checkpoint

# returns the `n` best saved checkpoints as configured in `RunConfig.CheckpointConfig`
result.best_checkpoints

# returns the final metrics as reported
result.metrics

# returns the contain an Exception if training failed
result.error
```

Inspect your training result below. What is the reported accuracy for the training and validation runs? Note: `error` is the binary classification error rate in this case calculated as `#(wrong cases)/#(all cases)`

In [None]:
### YOUR CODE HERE ###

**Key Concepts in This Section**

`Trainer`: Trainers are wrapper classes around third-party training frameworks such as XGBoost, Pytorch, and Tensorflow. They are built to help integrate with core Ray Actors (for distribution), Ray Tune, and Ray Datasets.

## 3. Ray Tune
***
Now that we have a baseline XGBoost model trained, we find the classification accuracy lacking. Among several methods to improve performance (collecting more data, feature engineering, choosing a different algorithm, transfer learning, etc.), **hyperparameter tuning** involves inserting the training loop into an optimization method to find the optimal set of hyperparameters and can be a powerful way to run experiements to achieve good results.

*Hyperparameters*, unlike model parameters which are learned by the model as it trains, are parameters that *you, the human* set. These hyperparameters remain static through a `trial` or experiement and influence the final outcome of training. For example, some common variables to adjust could include:

- `max_depth` in decision tree models
- `drop_out` rate in neural networks
- `discount_factor` in Q-learning
- `num_iterations` in logistic regression
- `n_grams` size of "n" in natural language processing

Setting up and executing hyperparameter optimization (HPO) in itself can be expensive in terms of compute resources and runtime, but there are several intricacies in making the process work *well*, including:

- **Vast Search Space**: your model could have anywhere between a handful to several dozen available hyperparameters, each with different data types and ranges. Some parameters might be correlated. Sampling good candidates from high-dimensional spaces is difficult.
- **Search Algorithms**: choosing hyperparameters at random can work surprisingly well, but in general, you need to test complex search algirhtms to achieve the best result.
- **Long Runtime**: even if you distribute tuning, training complex models in themselves can take a long time to complete per run, so it's best to have an efficiency at every stage in the pipeline.
- **Resource Allocation**: you must have enough compute resources available to during each trial as to not slow down search because of scheduling mismatches.
- **User Experience**: HPO is complicated, and visibility and tooling for developers like stopping bad runs early, saving intermediate results, restarting from checkpoints, or pausing and resume runs makes the process easier on the human.

Ray Tune is a distributed HPO library that addresses all of these topics above to provide a simplified interface for running trials and integrates with popular frameworks such as HyperOpt, Optuna, and many more.

In *Figure 5*, you'll find the general pattern for using AIR `Tuner`s which involves taking in a trainable, defining a search space, establishing a search algorithm, scheduling trials, and analyzing results. We'll go over the relevant components in the following section.

![Ray Tune Code Snippet](images/tune_code.png)

*Figure 5*

Let's see how to interact with Ray Tune to make some improvements to our big tip classifier.

### 3(a). Use AIR `Tuner` for Hyperparameter Search

To set up an AIR `Tuner`, we must specify:

- `search space`: a set of hyperparameters you wish to tune
- `search_algorithm`: to optimize parameter search
- `scheduler`: (optional) to stop searches early and speed up experiments

We pass the `search space`, `search algorithm`, `scheduler`, and `Trainer` to the `Tuner`, which runs the workload by evaluating multiple hyperparameters in parallel. Afterwards, `Tuner` returns its results in a `ResultGrid` for you to analyze.

Below, we'll define a search space with a few hyperparameters to tune. 

- `eta` is the learning rate
- `max_depth` specifies how deep each tree is with a default of 6. A higher value leads to a more complex model. Using `tune.randint(1, 9)`, it will sample an integer uniformly between 1 and 9, inclusive.
- `min_child_weight` defines the minimum sum of weights of all observations in a child, used to control overfitting

In [19]:
from ray import tune
from ray.tune.tuner import Tuner, TuneConfig

param_space = {"params":
    {
    "eta": tune.uniform(0.2, 0.4),
    "max_depth": tune.randint(1, 9),
    "min_child_weight": tune.uniform(0.8, 1.0)
    }
}

tuner = Tuner(
    trainer,
    param_space=param_space,
    tune_config=TuneConfig(num_samples=10, metric="train-logloss", mode="min"),
)

  tuner = Tuner(


### 3(b). Execute Hyperparameter Search & Analyze Results

Now, we can execute tuning on our 10 trials. After tuning, we can query the `ResultGrid` object to see metrics, results, and checkpoints of each trial.

In [23]:
result_grid = tuner.fit()

# checks if there have been errors
result_grid.errors

# gets the best result
best_result = result_grid.get_best_result()

# gets the best checkpoint
best_checkpoint = best_result.checkpoint

# gets the best metrics
best_metrics = best_result.metrics

Trial name,status,loc,params/eta,params/max_depth,params/min_child_...,iter,total time (s),train-logloss,train-error,valid-logloss
XGBoostTrainer_3546f_00000,TERMINATED,127.0.0.1:78913,0.347661,4,0.97562,101,16.3036,0.657192,0.387866,0.658455
XGBoostTrainer_3546f_00001,TERMINATED,127.0.0.1:78937,0.386574,5,0.834934,101,18.2809,0.655652,0.385966,0.658115
XGBoostTrainer_3546f_00002,TERMINATED,127.0.0.1:79045,0.359465,4,0.80445,101,15.177,0.657138,0.388064,0.65846
XGBoostTrainer_3546f_00003,TERMINATED,127.0.0.1:79064,0.361996,5,0.928984,101,16.2104,0.655755,0.386201,0.658098
XGBoostTrainer_3546f_00004,TERMINATED,127.0.0.1:79082,0.337746,1,0.890484,101,15.1763,0.662333,0.394883,0.662733
XGBoostTrainer_3546f_00005,TERMINATED,127.0.0.1:79131,0.325147,8,0.918536,101,19.2442,0.648821,0.377365,0.65917
XGBoostTrainer_3546f_00006,TERMINATED,127.0.0.1:79204,0.362868,2,0.818079,101,15.2651,0.659538,0.391093,0.660168
XGBoostTrainer_3546f_00007,TERMINATED,127.0.0.1:79222,0.251808,1,0.906682,101,15.3066,0.662666,0.395404,0.663034
XGBoostTrainer_3546f_00008,TERMINATED,127.0.0.1:79284,0.33017,2,0.83719,101,15.294,0.659716,0.391304,0.660358
XGBoostTrainer_3546f_00009,TERMINATED,127.0.0.1:79348,0.273236,2,0.932753,101,15.2673,0.660012,0.391625,0.660546


[2m[36m(_RemoteRayXGBoostActor pid=78923)[0m [10:12:37] task [xgboost.ray]:5008561520 got new rank 2
[2m[36m(_RemoteRayXGBoostActor pid=78921)[0m [10:12:37] task [xgboost.ray]:4827600192 got new rank 0
[2m[36m(_RemoteRayXGBoostActor pid=78927)[0m [10:12:37] task [xgboost.ray]:5429482912 got new rank 7
[2m[36m(_RemoteRayXGBoostActor pid=78926)[0m [10:12:37] task [xgboost.ray]:5135881536 got new rank 4
[2m[36m(_RemoteRayXGBoostActor pid=78922)[0m [10:12:37] task [xgboost.ray]:5472474528 got new rank 1
[2m[36m(_RemoteRayXGBoostActor pid=78924)[0m [10:12:37] task [xgboost.ray]:5511271696 got new rank 3
[2m[36m(_RemoteRayXGBoostActor pid=78928)[0m [10:12:37] task [xgboost.ray]:6003283360 got new rank 6
[2m[36m(_RemoteRayXGBoostActor pid=78925)[0m [10:12:37] task [xgboost.ray]:5010707872 got new rank 5


Result for XGBoostTrainer_3546f_00000:
  date: 2022-10-25_10-12-38
  done: false
  experiment_id: 6294ff2cb5b64cd9b282db4b5a97a4e4
  hostname: Emmys-MacBook-Pro-16
  iterations_since_restore: 1
  node_ip: 127.0.0.1
  pid: 78913
  time_since_restore: 6.352486848831177
  time_this_iter_s: 6.352486848831177
  time_total_s: 6.352486848831177
  timestamp: 1666717958
  timesteps_since_restore: 0
  train-error: 0.39571983798740173
  train-logloss: 0.6767694083721559
  training_iteration: 1
  trial_id: 3546f_00000
  valid-error: 0.39466426444781827
  valid-logloss: 0.676805883080574
  warmup_time: 0.002576112747192383
  
Result for XGBoostTrainer_3546f_00000:
  date: 2022-10-25_10-12-43
  done: false
  experiment_id: 6294ff2cb5b64cd9b282db4b5a97a4e4
  hostname: Emmys-MacBook-Pro-16
  iterations_since_restore: 48
  node_ip: 127.0.0.1
  pid: 78913
  time_since_restore: 11.622880935668945
  time_this_iter_s: 1.0087149143218994
  time_total_s: 11.622880935668945
  timestamp: 1666717963
  timesteps

[2m[36m(_RemoteRayXGBoostActor pid=78944)[0m [10:12:55] task [xgboost.ray]:5093561808 got new rank 2
[2m[36m(_RemoteRayXGBoostActor pid=78945)[0m [10:12:55] task [xgboost.ray]:5054092656 got new rank 1
[2m[36m(_RemoteRayXGBoostActor pid=78948)[0m [10:12:55] task [xgboost.ray]:4940535152 got new rank 4
[2m[36m(_RemoteRayXGBoostActor pid=78947)[0m [10:12:55] task [xgboost.ray]:4981740816 got new rank 3
[2m[36m(_RemoteRayXGBoostActor pid=78949)[0m [10:12:55] task [xgboost.ray]:5175105024 got new rank 6
[2m[36m(_RemoteRayXGBoostActor pid=78946)[0m [10:12:55] task [xgboost.ray]:5340353904 got new rank 5
[2m[36m(_RemoteRayXGBoostActor pid=78943)[0m [10:12:55] task [xgboost.ray]:5545874608 got new rank 0
[2m[36m(_RemoteRayXGBoostActor pid=78950)[0m [10:12:55] task [xgboost.ray]:5060383968 got new rank 7


Result for XGBoostTrainer_3546f_00001:
  date: 2022-10-25_10-12-56
  done: false
  experiment_id: 16d5bd8e28bd4e01b556d61063bee18b
  hostname: Emmys-MacBook-Pro-16
  iterations_since_restore: 1
  node_ip: 127.0.0.1
  pid: 78937
  time_since_restore: 6.1234800815582275
  time_this_iter_s: 6.1234800815582275
  time_total_s: 6.1234800815582275
  timestamp: 1666717976
  timesteps_since_restore: 0
  train-error: 0.39455951174401205
  train-logloss: 0.6748282670917973
  training_iteration: 1
  trial_id: 3546f_00001
  valid-error: 0.3937116745864306
  valid-logloss: 0.6748866593440933
  warmup_time: 0.0032219886779785156
  
Result for XGBoostTrainer_3546f_00001:
  date: 2022-10-25_10-13-02
  done: false
  experiment_id: 16d5bd8e28bd4e01b556d61063bee18b
  hostname: Emmys-MacBook-Pro-16
  iterations_since_restore: 42
  node_ip: 127.0.0.1
  pid: 78937
  time_since_restore: 11.422483205795288
  time_this_iter_s: 1.0063369274139404
  time_total_s: 11.422483205795288
  timestamp: 1666717982
  times

[2m[36m(_RemoteRayXGBoostActor pid=79051)[0m [10:13:16] task [xgboost.ray]:5107569984 got new rank 1
[2m[36m(_RemoteRayXGBoostActor pid=79050)[0m [10:13:16] task [xgboost.ray]:5204038976 got new rank 0
[2m[36m(_RemoteRayXGBoostActor pid=79053)[0m [10:13:16] task [xgboost.ray]:4969158048 got new rank 3
[2m[36m(_RemoteRayXGBoostActor pid=79054)[0m [10:13:16] task [xgboost.ray]:6197040496 got new rank 5
[2m[36m(_RemoteRayXGBoostActor pid=79056)[0m [10:13:17] task [xgboost.ray]:5143221616 got new rank 6
[2m[36m(_RemoteRayXGBoostActor pid=79057)[0m [10:13:17] task [xgboost.ray]:4976498032 got new rank 7
[2m[36m(_RemoteRayXGBoostActor pid=79052)[0m [10:13:16] task [xgboost.ray]:5182674384 got new rank 2
[2m[36m(_RemoteRayXGBoostActor pid=79055)[0m [10:13:16] task [xgboost.ray]:5007955360 got new rank 4


Result for XGBoostTrainer_3546f_00002:
  date: 2022-10-25_10-13-18
  done: false
  experiment_id: 0d072a00348c495f96daf4a289286f7e
  hostname: Emmys-MacBook-Pro-16
  iterations_since_restore: 1
  node_ip: 127.0.0.1
  pid: 79045
  time_since_restore: 6.059243202209473
  time_this_iter_s: 6.059243202209473
  time_total_s: 6.059243202209473
  timestamp: 1666717998
  timesteps_since_restore: 0
  train-error: 0.39571983798740173
  train-logloss: 0.6763327628493864
  training_iteration: 1
  trial_id: 3546f_00002
  valid-error: 0.39466426444781827
  valid-logloss: 0.6763706242077918
  warmup_time: 0.0022759437561035156
  
Result for XGBoostTrainer_3546f_00002:
  date: 2022-10-25_10-13-23
  done: false
  experiment_id: 0d072a00348c495f96daf4a289286f7e
  hostname: Emmys-MacBook-Pro-16
  iterations_since_restore: 49
  node_ip: 127.0.0.1
  pid: 79045
  time_since_restore: 11.387831211090088
  time_this_iter_s: 1.0153319835662842
  time_total_s: 11.387831211090088
  timestamp: 1666718003
  timeste

[2m[36m(_RemoteRayXGBoostActor pid=79071)[0m [10:13:34] task [xgboost.ray]:4998518128 got new rank 4
[2m[36m(_RemoteRayXGBoostActor pid=79069)[0m [10:13:34] task [xgboost.ray]:5013198240 got new rank 1
[2m[36m(_RemoteRayXGBoostActor pid=79074)[0m [10:13:34] task [xgboost.ray]:5219767760 got new rank 6
[2m[36m(_RemoteRayXGBoostActor pid=79068)[0m [10:13:34] task [xgboost.ray]:5120152944 got new rank 0
[2m[36m(_RemoteRayXGBoostActor pid=79073)[0m [10:13:34] task [xgboost.ray]:5474325872 got new rank 5
[2m[36m(_RemoteRayXGBoostActor pid=79072)[0m [10:13:34] task [xgboost.ray]:4791948752 got new rank 3
[2m[36m(_RemoteRayXGBoostActor pid=79070)[0m [10:13:34] task [xgboost.ray]:6072620496 got new rank 2
[2m[36m(_RemoteRayXGBoostActor pid=79075)[0m [10:13:34] task [xgboost.ray]:5195519344 got new rank 7


Result for XGBoostTrainer_3546f_00003:
  date: 2022-10-25_10-13-35
  done: false
  experiment_id: dc5acb3a66da405ca964d663e319ec7e
  hostname: Emmys-MacBook-Pro-16
  iterations_since_restore: 1
  node_ip: 127.0.0.1
  pid: 79064
  time_since_restore: 6.139097690582275
  time_this_iter_s: 6.139097690582275
  time_total_s: 6.139097690582275
  timestamp: 1666718015
  timesteps_since_restore: 0
  train-error: 0.39455951174401205
  train-logloss: 0.6757355660135066
  training_iteration: 1
  trial_id: 3546f_00003
  valid-error: 0.3937116745864306
  valid-logloss: 0.6757902665136126
  warmup_time: 0.002619028091430664
  
Result for XGBoostTrainer_3546f_00003:
  date: 2022-10-25_10-13-41
  done: false
  experiment_id: dc5acb3a66da405ca964d663e319ec7e
  hostname: Emmys-MacBook-Pro-16
  iterations_since_restore: 48
  node_ip: 127.0.0.1
  pid: 79064
  time_since_restore: 11.48870301246643
  time_this_iter_s: 1.0107462406158447
  time_total_s: 11.48870301246643
  timestamp: 1666718021
  timesteps_s

[2m[36m(_RemoteRayXGBoostActor pid=79092)[0m [10:13:52] task [xgboost.ray]:4981986816 got new rank 6
[2m[36m(_RemoteRayXGBoostActor pid=79093)[0m [10:13:52] task [xgboost.ray]:5051995504 got new rank 7
[2m[36m(_RemoteRayXGBoostActor pid=79089)[0m [10:13:52] task [xgboost.ray]:4985935168 got new rank 4
[2m[36m(_RemoteRayXGBoostActor pid=79087)[0m [10:13:52] task [xgboost.ray]:5996139936 got new rank 1
[2m[36m(_RemoteRayXGBoostActor pid=79091)[0m [10:13:52] task [xgboost.ray]:5142451664 got new rank 5
[2m[36m(_RemoteRayXGBoostActor pid=79090)[0m [10:13:52] task [xgboost.ray]:4932457744 got new rank 3
[2m[36m(_RemoteRayXGBoostActor pid=79088)[0m [10:13:52] task [xgboost.ray]:5541680496 got new rank 2
[2m[36m(_RemoteRayXGBoostActor pid=79086)[0m [10:13:52] task [xgboost.ray]:5007955312 got new rank 0


Result for XGBoostTrainer_3546f_00004:
  date: 2022-10-25_10-13-53
  done: false
  experiment_id: 3b5e3470f837437c9c2ca2f90f8ea3ae
  hostname: Emmys-MacBook-Pro-16
  iterations_since_restore: 1
  node_ip: 127.0.0.1
  pid: 79082
  time_since_restore: 6.028246879577637
  time_this_iter_s: 6.028246879577637
  time_total_s: 6.028246879577637
  timestamp: 1666718033
  timesteps_since_restore: 0
  train-error: 0.39817886347179965
  train-logloss: 0.6810990490550395
  training_iteration: 1
  trial_id: 3546f_00004
  valid-error: 0.3970697695052941
  valid-logloss: 0.6810741150800886
  warmup_time: 0.002722024917602539
  
Result for XGBoostTrainer_3546f_00004:
  date: 2022-10-25_10-13-59
  done: false
  experiment_id: 3b5e3470f837437c9c2ca2f90f8ea3ae
  hostname: Emmys-MacBook-Pro-16
  iterations_since_restore: 54
  node_ip: 127.0.0.1
  pid: 79082
  time_since_restore: 11.37238597869873
  time_this_iter_s: 1.0098187923431396
  time_total_s: 11.37238597869873
  timestamp: 1666718039
  timesteps_s

[2m[36m(_RemoteRayXGBoostActor pid=79188)[0m [10:14:10] task [xgboost.ray]:5968614912 got new rank 0
[2m[36m(_RemoteRayXGBoostActor pid=79189)[0m [10:14:10] task [xgboost.ray]:4974400928 got new rank 1
[2m[36m(_RemoteRayXGBoostActor pid=79192)[0m [10:14:10] task [xgboost.ray]:5191456016 got new rank 4
[2m[36m(_RemoteRayXGBoostActor pid=79193)[0m [10:14:10] task [xgboost.ray]:5144663360 got new rank 5
[2m[36m(_RemoteRayXGBoostActor pid=79195)[0m [10:14:10] task [xgboost.ray]:5080307104 got new rank 7
[2m[36m(_RemoteRayXGBoostActor pid=79190)[0m [10:14:10] task [xgboost.ray]:5059991040 got new rank 2
[2m[36m(_RemoteRayXGBoostActor pid=79191)[0m [10:14:10] task [xgboost.ray]:5429482720 got new rank 3
[2m[36m(_RemoteRayXGBoostActor pid=79194)[0m [10:14:10] task [xgboost.ray]:5235496208 got new rank 6


Result for XGBoostTrainer_3546f_00005:
  date: 2022-10-25_10-14-11
  done: false
  experiment_id: 6e96cb6e43704648b66772448d39724e
  hostname: Emmys-MacBook-Pro-16
  iterations_since_restore: 1
  node_ip: 127.0.0.1
  pid: 79131
  time_since_restore: 6.143538951873779
  time_this_iter_s: 6.143538951873779
  time_total_s: 6.143538951873779
  timestamp: 1666718051
  timesteps_since_restore: 0
  train-error: 0.39160825864976473
  train-logloss: 0.6761828285225364
  training_iteration: 1
  trial_id: 3546f_00005
  valid-error: 0.3909771378433267
  valid-logloss: 0.6763219812653442
  warmup_time: 0.002707958221435547
  
Result for XGBoostTrainer_3546f_00005:
  date: 2022-10-25_10-14-16
  done: false
  experiment_id: 6e96cb6e43704648b66772448d39724e
  hostname: Emmys-MacBook-Pro-16
  iterations_since_restore: 39
  node_ip: 127.0.0.1
  pid: 79131
  time_since_restore: 11.412540912628174
  time_this_iter_s: 1.0158588886260986
  time_total_s: 11.412540912628174
  timestamp: 1666718056
  timesteps

[2m[36m(_RemoteRayXGBoostActor pid=79210)[0m [10:14:32] task [xgboost.ray]:5270099312 got new rank 1
[2m[36m(_RemoteRayXGBoostActor pid=79209)[0m [10:14:32] task [xgboost.ray]:5145253328 got new rank 0
[2m[36m(_RemoteRayXGBoostActor pid=79211)[0m [10:14:32] task [xgboost.ray]:6097884624 got new rank 2
[2m[36m(_RemoteRayXGBoostActor pid=79212)[0m [10:14:32] task [xgboost.ray]:5058287008 got new rank 3
[2m[36m(_RemoteRayXGBoostActor pid=79215)[0m [10:14:32] task [xgboost.ray]:5041509744 got new rank 5
[2m[36m(_RemoteRayXGBoostActor pid=79216)[0m [10:14:32] task [xgboost.ray]:4920530384 got new rank 7
[2m[36m(_RemoteRayXGBoostActor pid=79214)[0m [10:14:32] task [xgboost.ray]:4948186432 got new rank 6
[2m[36m(_RemoteRayXGBoostActor pid=79213)[0m [10:14:32] task [xgboost.ray]:5127542224 got new rank 4


Result for XGBoostTrainer_3546f_00006:
  date: 2022-10-25_10-14-33
  done: false
  experiment_id: 3404cdca2d6a411bbc88195535daddf2
  hostname: Emmys-MacBook-Pro-16
  iterations_since_restore: 1
  node_ip: 127.0.0.1
  pid: 79204
  time_since_restore: 6.155975103378296
  time_this_iter_s: 6.155975103378296
  time_total_s: 6.155975103378296
  timestamp: 1666718073
  timesteps_since_restore: 0
  train-error: 0.39749808944916454
  train-logloss: 0.6779274024789324
  training_iteration: 1
  trial_id: 3546f_00006
  valid-error: 0.39637227162489896
  valid-logloss: 0.6780735767887413
  warmup_time: 0.002561807632446289
  
Result for XGBoostTrainer_3546f_00006:
  date: 2022-10-25_10-14-38
  done: false
  experiment_id: 3404cdca2d6a411bbc88195535daddf2
  hostname: Emmys-MacBook-Pro-16
  iterations_since_restore: 52
  node_ip: 127.0.0.1
  pid: 79204
  time_since_restore: 11.46732497215271
  time_this_iter_s: 1.0203659534454346
  time_total_s: 11.46732497215271
  timestamp: 1666718078
  timesteps_

[2m[36m(_RemoteRayXGBoostActor pid=79232)[0m [10:14:50] task [xgboost.ray]:6069114320 got new rank 4
[2m[36m(_RemoteRayXGBoostActor pid=79229)[0m [10:14:50] task [xgboost.ray]:4998518032 got new rank 1
[2m[36m(_RemoteRayXGBoostActor pid=79228)[0m [10:14:50] task [xgboost.ray]:4806628672 got new rank 0
[2m[36m(_RemoteRayXGBoostActor pid=79231)[0m [10:14:50] task [xgboost.ray]:5123298672 got new rank 3
[2m[36m(_RemoteRayXGBoostActor pid=79233)[0m [10:14:50] task [xgboost.ray]:5372859664 got new rank 5
[2m[36m(_RemoteRayXGBoostActor pid=79234)[0m [10:14:50] task [xgboost.ray]:5028926880 got new rank 6
[2m[36m(_RemoteRayXGBoostActor pid=79235)[0m [10:14:50] task [xgboost.ray]:5152658752 got new rank 7
[2m[36m(_RemoteRayXGBoostActor pid=79230)[0m [10:14:50] task [xgboost.ray]:6004102560 got new rank 2


Result for XGBoostTrainer_3546f_00007:
  date: 2022-10-25_10-14-51
  done: false
  experiment_id: 4c88cd72838a4718aa65eba9cd6637dc
  hostname: Emmys-MacBook-Pro-16
  iterations_since_restore: 1
  node_ip: 127.0.0.1
  pid: 79222
  time_since_restore: 6.228977203369141
  time_this_iter_s: 6.228977203369141
  time_total_s: 6.228977203369141
  timestamp: 1666718091
  timesteps_since_restore: 0
  train-error: 0.39817886347179965
  train-logloss: 0.683701577538643
  training_iteration: 1
  trial_id: 3546f_00007
  valid-error: 0.3970697695052941
  valid-logloss: 0.6836827898503336
  warmup_time: 0.002849102020263672
  
Result for XGBoostTrainer_3546f_00007:
  date: 2022-10-25_10-14-56
  done: false
  experiment_id: 4c88cd72838a4718aa65eba9cd6637dc
  hostname: Emmys-MacBook-Pro-16
  iterations_since_restore: 53
  node_ip: 127.0.0.1
  pid: 79222
  time_since_restore: 11.538372993469238
  time_this_iter_s: 1.015974998474121
  time_total_s: 11.538372993469238
  timestamp: 1666718096
  timesteps_s

[2m[36m(_RemoteRayXGBoostActor pid=79310)[0m [10:15:08] task [xgboost.ray]:5014246816 got new rank 3
[2m[36m(_RemoteRayXGBoostActor pid=79312)[0m [10:15:08] task [xgboost.ray]:5032793408 got new rank 5
[2m[36m(_RemoteRayXGBoostActor pid=79307)[0m [10:15:08] task [xgboost.ray]:5118711056 got new rank 0
[2m[36m(_RemoteRayXGBoostActor pid=79308)[0m [10:15:08] task [xgboost.ray]:6072259952 got new rank 1
[2m[36m(_RemoteRayXGBoostActor pid=79314)[0m [10:15:08] task [xgboost.ray]:4994372880 got new rank 7
[2m[36m(_RemoteRayXGBoostActor pid=79313)[0m [10:15:08] task [xgboost.ray]:5065627088 got new rank 6
[2m[36m(_RemoteRayXGBoostActor pid=79309)[0m [10:15:08] task [xgboost.ray]:5163144512 got new rank 2
[2m[36m(_RemoteRayXGBoostActor pid=79311)[0m [10:15:08] task [xgboost.ray]:6077502688 got new rank 4


Result for XGBoostTrainer_3546f_00008:
  date: 2022-10-25_10-15-09
  done: false
  experiment_id: ac9e880b85004ff8807f3efc86a0664c
  hostname: Emmys-MacBook-Pro-16
  iterations_since_restore: 1
  node_ip: 127.0.0.1
  pid: 79284
  time_since_restore: 6.218915224075317
  time_this_iter_s: 6.218915224075317
  time_total_s: 6.218915224075317
  timestamp: 1666718109
  timesteps_since_restore: 0
  train-error: 0.39749808944916454
  train-logloss: 0.6790240902627057
  training_iteration: 1
  trial_id: 3546f_00008
  valid-error: 0.39637227162489896
  valid-logloss: 0.6791590095909203
  warmup_time: 0.0028328895568847656
  
Result for XGBoostTrainer_3546f_00008:
  date: 2022-10-25_10-15-14
  done: false
  experiment_id: ac9e880b85004ff8807f3efc86a0664c
  hostname: Emmys-MacBook-Pro-16
  iterations_since_restore: 50
  node_ip: 127.0.0.1
  pid: 79284
  time_since_restore: 11.51710319519043
  time_this_iter_s: 1.0106492042541504
  time_total_s: 11.51710319519043
  timestamp: 1666718114
  timesteps

[2m[36m(_RemoteRayXGBoostActor pid=79354)[0m [10:15:26] task [xgboost.ray]:5016343920 got new rank 0
[2m[36m(_RemoteRayXGBoostActor pid=79360)[0m [10:15:26] task [xgboost.ray]:4820260160 got new rank 6
[2m[36m(_RemoteRayXGBoostActor pid=79359)[0m [10:15:26] task [xgboost.ray]:5032072656 got new rank 7
[2m[36m(_RemoteRayXGBoostActor pid=79356)[0m [10:15:26] task [xgboost.ray]:5001663856 got new rank 3
[2m[36m(_RemoteRayXGBoostActor pid=79357)[0m [10:15:26] task [xgboost.ray]:4996420976 got new rank 4
[2m[36m(_RemoteRayXGBoostActor pid=79355)[0m [10:15:26] task [xgboost.ray]:4938749344 got new rank 2
[2m[36m(_RemoteRayXGBoostActor pid=79353)[0m [10:15:26] task [xgboost.ray]:5389882688 got new rank 1
[2m[36m(_RemoteRayXGBoostActor pid=79358)[0m [10:15:26] task [xgboost.ray]:5120153040 got new rank 5


Result for XGBoostTrainer_3546f_00009:
  date: 2022-10-25_10-15-27
  done: false
  experiment_id: 10b30c5460dd41ed9f2555a096fde0fd
  hostname: Emmys-MacBook-Pro-16
  iterations_since_restore: 1
  node_ip: 127.0.0.1
  pid: 79348
  time_since_restore: 6.114388942718506
  time_this_iter_s: 6.114388942718506
  time_total_s: 6.114388942718506
  timestamp: 1666718127
  timesteps_since_restore: 0
  train-error: 0.39749808944916454
  train-logloss: 0.6810629199280553
  training_iteration: 1
  trial_id: 3546f_00009
  valid-error: 0.39637227162489896
  valid-logloss: 0.6811773395342271
  warmup_time: 0.002668142318725586
  
Result for XGBoostTrainer_3546f_00009:
  date: 2022-10-25_10-15-32
  done: false
  experiment_id: 10b30c5460dd41ed9f2555a096fde0fd
  hostname: Emmys-MacBook-Pro-16
  iterations_since_restore: 52
  node_ip: 127.0.0.1
  pid: 79348
  time_since_restore: 11.405245780944824
  time_this_iter_s: 1.012063980102539
  time_total_s: 11.405245780944824
  timestamp: 1666718132
  timesteps

2022-10-25 10:15:36,356	INFO tune.py:758 -- Total run time: 185.91 seconds (185.79 seconds for the tuning loop).


**💻 Coding Excercise 💻**

`Tuner` allows you to specify an optimization algorithm via the `TuneConfig` by setting the following flags:

- `search_alg` which provides an optimizer for selecting the optimal hyperparameters
- `scheduler` which provides a scheduling/resource allocation algorithm for accelerating the search process

Read more about [schedulers](https://docs.ray.io/en/latest/tune/key-concepts.html#schedulers-ref) and [search algorithms](https://docs.ray.io/en/latest/tune/key-concepts.html#search-alg-ref) in Ray AIR and implement them on this example to see a difference in results.

In [None]:
### YOUR CODE HERE ###

**Key Concepts in This Section**

`Tuner`: provides an interface that works with AIR `Trainer`s to perform distributed hyperparameter tuning. You define a set of hyperparameters you wish to tune in a search space, specify a search algorithm, and the `Tuner` returns its results in a `ResultGrid` that contains metrics, results, and checkpoints for each `trial`.

## 4. Ray AIR Predictors
***

Ray AIR Predictors load models from your [checkpoints](https://docs.ray.io/en/latest/ray-air/key-concepts.html#air-checkpoints-doc) generated during training or tuning to perform distributed inference.

During batch prediction, the input batch is converted into a Pandas DataFrame. If there is a `Preprocessor` saved in the provided `Checkpoint`, the preprocessor will be used to transform the DataFrame. The transformed DataFrame is then passed to the model for ingerence and outputted predictions will be of the same type as the original input.

In *Figure 6*, you can see how `BatchPredictor` is passed a `Checkpoint` and `Predictor`.

![Batch Predictor Code Snippet](images/batchpredict_code.png)

*Figure 6*

### Use AIR `BatchPredictor` for Batch Prediction
Previously, we have trained and tuned our XGBoost model on data from June 2021. Let's now take out best checkpoint from the tuning step and perform batch inference on taxi tip data from June 2022.

In [44]:
from ray.train.batch_predictor import BatchPredictor
from ray.train.xgboost import XGBoostPredictor

batch_predictor = BatchPredictor.from_checkpoint(best_result.checkpoint, XGBoostPredictor)

test_dataset = ray.data.read_parquet("s3://anyscale-training-data/intro-to-ray-air/nyc_taxi_2022.parquet").drop_columns("is_big_tip")

predicted_probabilities = batch_predictor.predict(test_dataset)
print("PREDICTED PROBABILITIES")
predicted_probabilities.show()

Read->Map_Batches: 100%|██████████| 1/1 [00:05<00:00,  5.87s/it]
Map Progress (10 actors 3 pending): 100%|██████████| 1/1 [00:09<00:00,  9.71s/it]

PREDICTED PROBABILITIES
{'predictions': 0.4903855621814728}
{'predictions': 0.5747880935668945}
{'predictions': 0.5164017677307129}
{'predictions': 0.3982800245285034}
{'predictions': 0.5598841309547424}
{'predictions': 0.6375836730003357}
{'predictions': 0.470651239156723}
{'predictions': 0.5483236312866211}
{'predictions': 0.5234551429748535}
{'predictions': 0.6507155299186707}
{'predictions': 0.5890452265739441}
{'predictions': 0.660395622253418}
{'predictions': 0.5486254692077637}
{'predictions': 0.5812450647354126}
{'predictions': 0.4990919530391693}
{'predictions': 0.5993757843971252}
{'predictions': 0.6223717927932739}
{'predictions': 0.5275571346282959}
{'predictions': 0.5579653382301331}
{'predictions': 0.5674110054969788}





**💻 Coding Excercise 💻**

Now that you have the predictions generated from the testing set, how did we do? Compare the predictions outputted by `BatchPredictor` with the ground truth labels available in the raw data file.

In [None]:
### YOUR CODE HERE ###

**Key Concepts in This Section**

`Checkpoints`: store the full state of the model periodically, so that partially trained models are available and can be used to resume training from an intermediate point, instead of starting from scratch; also allows for the best model to be saved for batch inference later on

`BatchPredictor`: loads the best model from a checkpoint to perform batch inference on large-scales or online inference

## 5. Ray Serve
***

Finally, we want a way to serve our taxi tip prediction application to our end users, hopefully with a low latency to be maximally useful to drivers on the job. However, this poses a challenge since machine learning models are compute intensive and ideally, this model wouldn't be served in isolation, but rather adjacent to business logic or even other ML models.

Ray Serve is a scalable compute layer for sercing machine learning models that allows you to serve individual models or create composite model pipelines, where you can independently deploy, update, and scale individual components. Serve isn't tied to a specific machine learning library, but rather treats models as ordinary Python code. 

Additionally, it allows you to flexibly combine normal Python business logic alongside machine learning models. This makes it possible to build online inference services completely end-to-end: a Serve application could validate user input, query a database, perform inference scalably across multiple ML models, and combine, filter, and validate the output all in the process of handling a single inference request.

In *Figure 7*, you see the pattern for deploying a `Predictor` from a `Checkpoint` wth Ray Serve.

![Ray Serve Code Snippet](images/serve_code.png)

*Figure 7*

Let's deploy our big tip predictor with Ray Serve.

### 5(a) Use `PredictorDeployment` for Online Inference
Deploy the best model as an inference service by using Ray Serve and the `PredictorDeployment` class. After deploying the service, you can send requests to it.

In [93]:
from ray import serve
from fastapi import Request
from ray.serve import PredictorDeployment
from ray.serve.http_adapters import pandas_read_json

serve.run(
    PredictorDeployment.options(name="XGBoostService", num_replicas=2, route_prefix="/rayair").bind(
        XGBoostPredictor, result.checkpoint, http_adapter=pandas_read_json
    )
)

[2m[36m(ServeController pid=17477)[0m INFO 2022-10-25 16:00:42,744 controller 17477 http_state.py:129 - Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:SERVE_PROXY_ACTOR-a31940c751ab40dd69b8a31e21d443c3ecd0250ef36a03a3da923acf' on node 'a31940c751ab40dd69b8a31e21d443c3ecd0250ef36a03a3da923acf' listening on '127.0.0.1:8000'
[2m[36m(HTTPProxyActor pid=17478)[0m INFO:     Started server process [17478]
[2m[36m(ServeController pid=17477)[0m INFO 2022-10-25 16:00:43,361 controller 17477 deployment_state.py:1232 - Adding 2 replicas to deployment 'XGBoostService'.


RayServeSyncHandle(deployment='XGBoostService')

In [94]:
import requests

sample_input = test_dataset.take(1)
sample_input = dict(sample_input[0])

output = requests.post("http://localhost:8000/rayair", json=[sample_input]).json()
print(output)

[{'predictions': 0.507861316204071}]


[2m[36m(HTTPProxyActor pid=17478)[0m INFO 2022-10-25 16:00:48,478 http_proxy 127.0.0.1 http_proxy.py:315 - POST /rayair 307 2.9ms
[2m[36m(HTTPProxyActor pid=17478)[0m INFO 2022-10-25 16:00:48,493 http_proxy 127.0.0.1 http_proxy.py:315 - POST /rayair 200 13.4ms
[2m[36m(ServeReplica:XGBoostService pid=17479)[0m INFO 2022-10-25 16:00:48,493 XGBoostService XGBoostService#DHICPJ replica.py:482 - HANDLE __call__ OK 10.7ms
[2m[36m(ServeReplica:XGBoostService pid=17482)[0m INFO 2022-10-25 16:00:48,478 XGBoostService XGBoostService#zzxwhN replica.py:482 - HANDLE __call__ OK 0.2ms
[2m[36m(HTTPProxyActor pid=17478)[0m Traceback (most recent call last):
[2m[36m(HTTPProxyActor pid=17478)[0m   File "/Users/emmy/miniforge3/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 136, in handle_events
[2m[36m(HTTPProxyActor pid=17478)[0m     event = self.conn.next_event()
[2m[36m(HTTPProxyActor pid=17478)[0m   File "/Users/emmy/miniforge3/lib/python3.10/site-packa

**Key Concepts in This Section**

`Deployments`: you can think of this as a managed group of Ray actors that can be addressed together and will handle requests load-balanced across them.

## Summary
You've now just created a Ray Dataset, preprocessed some features, built a model with XGBoost, searched a hyperparameter space for the best configuration, loaded the best model from a checkpoint to perform batch inference, and served that model for online inference. Through this end-to-end example, you explored how to use Ray AIR to distribute an entire ML pipeline.

### Key Concepts

- `Datasets`
- `Preprocessors`
- `Trainers`
- `Tuner`
- `Checkpoints`
- `BatchPredictor`
- `Deployments`

### Next Up

Now that you've seen how you can use Ray AIR's unified toolkit to scale an end-to-end machine learning application, let's see how we can use it to scale individual workloads. In the next section we will cover a reinforcement learning example

# Reinforcement Learning on Ray AIR
In this example, we're going to train a reinforcement learning agent using online training. Online training means that the data from the environment is sampled while we are running the algorithm. In contrast, offline training uses data that has been stored on disk before.

## Ray RLLib
Designed for quick interation and a fast path to production, it includes 25+ latest algorithms that are all implemented to run at scale and in multi-agent mode.

RLlib is an open-source library for reinforcement learning (RL), offering support for production-level, highly distributed RL workloads while maintaining unified and simple APIs for a large variety of industry applications. Whether you would like to train your agents in a multi-agent setup, purely from offline (historic) datasets, or using externally connected simulators, RLlib offers a simple solution for each of your decision making needs.

If you either have your problem coded (in Python) as an RL environment or own lots of pre-recorded, historical behavioral data to learn from, you will be up and running in only a few days. RLlib is already used in production by industry leaders in many different verticals such as climate control, industrial control, manufacturing an dlogistics, finance, gaming, automobile, robotics, boat design, and many others.

We can start by running some imports. We're using OpenAI's gym, which is a standard API for reinforcement learning.

In [None]:
import gym
import numpy

In [None]:
from ray.air import RunConfig
from ray.air import ScalingConfig
from ray.air import Checkpoint

from ray.train.rl import RLTrainer
from ray.train.rl import RLPredictor

from ray.air import Result

from ray.tune import Tuner

from ray.rllib.algorithms.marwil import BCTrainer

We're going to use the CartPole environment. insert a gif of cartpole as well as description of the premise

In [None]:
env = gym.make("CartPole-v0")

Set up an RL Trainer??

In [None]:
trainer = RLTrainer(
    run_config = RunConfig(stop={"training_iteration": 5}),
    scaling_config = ScalingConfig(num_workers=2, use_gpu=False),
    algorithm="PPO",
    config={
        "env": "CartPole-v1",
        "framework": "tf",
    },
)

In [None]:
def train_rl_ppo_online(num_workers: int, use_gpu: bool = False) -> Result:
    print("Starting online training")
    trainer = RLTrainer(
        run_config=RunConfig(stop={"training_iteration": 5}),
        scaling_config=ScalingConfig(num_workers=num_workers, use_gpu=use_gpu),
        algorithm="PPO",
        config={
            "env": "CartPole-v1",
            "framework": "tf",
        },
    )
    # Todo (krfricke/xwjiang): Enable checkpoint config in RunConfig
    # result = trainer.fit()
    tuner = Tuner(
        trainer,
        _tuner_kwargs={"checkpoint_at_end": True},
    )
    result = tuner.fit()[0]
    return result

In [None]:
def evaluate_using_checkpoint(checkpoint: Checkpoint, num_episodes) -> list:
    predictor = RLPredictor.from_checkpoint(checkpoint)

    env = gym.make("CartPole-v0")

    rewards = []
    for i in range(num_episodes):
        obs = env.reset()
        reward = 0.0
        done = False
        while not done:
            action = predictor.predict(np.array([obs]))
            obs, r, done, _ = env.step(action[0])
            reward += r
        rewards.append(reward)

    return rewards

In [None]:
result = train_rl_ppo_online(num_workers=2, use_gpu=False)

In [None]:
num_eval_episodes = 3

rewards = evaluate_using_checkpoint(result.checkpoint, num_episodes=num_eval_episodes)
print(f"Average reward over {num_eval_episodes} episodes: " f"{np.mean(rewards)}")

### Summary
#### Key Concepts
#### Key API Elements in This Section
#### Next

# Extra Resources
---
If you would like to practice your new skills further with some in-depth examples beyond the embedded coding excercises, take a look at this list of suggested problems:
- Watch the Ray Summit Talk on [Introduction to Ray AIR](https://github.com/ray-project/hackathon5-algo)
- Check out the [Ray AIR Documentation](https://docs.ray.io/en/latest/ray-air/getting-started.html)
- Understand its [Components and APIs](https://docs.ray.io/en/latest/ray-air/package-ref.html)
- Ray AIR [User Guides](https://docs.ray.io/en/latest/ray-air/user-guides.html) and [Examples](https://docs.ray.io/en/latest/ray-air/examples/index.html)


# Next Steps
---
🎉 Congratulations! You have completed the tutorial on an Introduction to Ray AI Runtime! We dicussed each library in Ray AIR (Data, Train, Tune, Serve, RLLib) and saw some example machine learning workloads to be done with each. In the next module, we will introduce the ecosystem of integrated libraries runs on Ray Core's distributed execution engine, and with Ray Clusters, you can deploy your workloads on AWS, GCP, Azure, or on Kubernetes.

From here, you can learn and get more involved with our active community of developers and researchers by checking out the following resources:
- 💻 [Official Ray Website](https://www.ray.io/): Browse the ecosystem and use this site as a hub to get the information that you need to get going and building with Ray.
- 💬 [Join the Community on Slack](https://forms.gle/9TSdDYUgxYs8SA9e8): Find friends to discuss your new learnings in our Slack space.
- 📣 [Use the Discussion Board](https://discuss.ray.io/): Ask questions, follow topics, and view announcements on this community forum.
- 🙋‍♀️ [Join a Meetup Group](https://www.meetup.com/Bay-Area-Ray-Meetup/): Tune in on meet-ups to listen to compelling talks, get to know other users, and meet the team behind Ray.
- 🪲 [Open an Issue](https://github.com/ray-project/ray/issues/new/choose): Ray is constantly evolving to improve developer experience. Submit feature requests, bug-reports, and get help via GitHub issues.