# Introduction to the Ray AI Libraries

Let's start with a quick end-to-end example to get a sense of what the Ray AI Libraries can do.

<div class="alert alert-block alert-info">
<b> Here is the roadmap for this notebook:</b>
<ul>
    <li><b>Part 1:</b> Overview of the Ray AI Libraries</a></li>
    <li><b>Part 2:</b> Quick end-to-end example</a></li>
</ul>
</div>


## 1. Overview of the Ray AI Libraries

<img src="https://technical-training-assets.s3.us-west-2.amazonaws.com/Ray_AI_Libraries/Ray+AI+Libraries.png" width="70%" loading="lazy">

Built on top of Ray Core, the Ray AI Libraries inherit all the performance and scalability benefits offered by Core while providing a convenient abstraction layer for machine learning. These Python-first native libraries allow ML practitioners to distribute individual workloads, end-to-end applications, and build custom use cases in a unified framework.

The Ray AI Libraries bring together an ever-growing ecosystem of integrations with popular machine learning frameworks to create a common interface for development.

|<img src="https://technical-training-assets.s3.us-west-2.amazonaws.com/Introduction_to_Ray_AIR/e2e_air.png" width="100%" loading="lazy">|
|:-:|
|Ray AI Libraries enable end-to-end ML development and provides multiple options for integrating with other tools and libraries form the MLOps ecosystem.|



## 2. Quick end-to-end example

|Ray AIR Component|NYC Taxi Use Case|
|:--|:--|
|Ray Data|Ingest and transform raw data; perform batch inference by mapping the checkpointed model to batches of data.|
|Ray Train|Use `Trainer` to scale XGBoost model training.|
|Ray Tune|Use `Tuner` for hyperparameter search.|
|Ray Serve|Deploy the model for online inference.|

For this classification task, you will apply a simple [XGBoost](https://xgboost.readthedocs.io/en/stable/) (a gradient boosted trees framework) model to the June 2021 [New York City Taxi & Limousine Commission's Trip Record Data](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page). This dataset contains over 2 million samples of yellow cab rides, and the goal is to predict whether a trip will result in a tip greater than 20% or not.

**Dataset features**
* **`passenger_count`**
    * Float (whole number) representing number of passengers.
* **`trip_distance`** 
    * Float representing trip distance in miles.
* **`fare_amount`**
    * Float representing total price including tax, tip, fees, etc.
* **`trip_duration`**
    * Integer representing seconds elapsed.
* **`hour`**
    * Hour that the trip started.
    * Integer in the range `[0, 23]`
* **`day_of_week`**
    * Integer in the range `[1, 7]`.
* **`is_big_tip`**
    * Whether the tip amount was greater than 20%.
    * Boolean `[True, False]`.

__Import libraries__

In [1]:
import json
import pandas as pd
import requests
import xgboost
from starlette.requests import Request

import ray
from ray import tune
from ray.train import ScalingConfig, RunConfig
from ray.train.xgboost import XGBoostTrainer
from ray.tune import Tuner, TuneConfig
from ray import serve

__Read, preprocess with Ray Data__

In [2]:
# Read the dataset
dataset = ray.data.read_parquet("s3://anonymous@anyscale-training-data/intro-to-ray-air/nyc_taxi_2021.parquet")

# Split the dataset into training and validation sets
train_dataset, valid_dataset = dataset.train_test_split(test_size=0.3)

2025-04-09 21:06:23,800	INFO worker.py:1843 -- Started a local Ray instance. View the dashboard at [1m[32mhttp://127.0.0.1:8265 [39m[22m


Parquet Files Sample 0:   0%|                                                                                 …

2025-04-09 21:06:34,257	INFO streaming_executor.py:108 -- Starting execution of Dataset. Full logs are in /tmp/ray/session_2025-04-09_21-06-22_537812_450727/logs/ray-data
2025-04-09 21:06:34,257	INFO streaming_executor.py:109 -- Execution plan of Dataset: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadParquet]


Running 0: 0.00 row [00:00, ? row/s]

- ReadParquet->SplitBlocks(147) 1: 0.00 row [00:00, ? row/s]

__Fit model with Ray Train__

In [3]:
# Define the trainer
trainer = XGBoostTrainer(
    label_column="is_big_tip",
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
    params={"objective": "binary:logistic"},
    datasets={"train": train_dataset, "valid": valid_dataset},
    run_config=RunConfig(storage_path="/mnt/cluster_storage/"),
)

# Fit the trainer
result = trainer.fit()

2025-04-09 21:07:17,309	INFO tune.py:616 -- [output] This uses the legacy output and progress reporter, as Jupyter notebooks are not supported by the new engine, yet. For more information, please see https://github.com/ray-project/ray/issues/36949


== Status ==
Current time: 2025-04-09 21:07:17 (running for 00:00:00.12)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XGBoostTrainer_2025-04-09_21-07-17/driver_artifacts
Number of trials: 1/1 (1 PENDING)


== Status ==
Current time: 2025-04-09 21:07:22 (running for 00:00:05.16)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XGBoostTrainer_2025-04-09_21-07-17/driver_artifacts
Number of trials: 1/1 (1 PENDING)


== Status ==
Current time: 2025-04-09 21:07:27 (running for 00:00:10.20)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XG



== Status ==
Current time: 2025-04-09 21:08:17 (running for 00:01:00.63)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XGBoostTrainer_2025-04-09_21-07-17/driver_artifacts
Number of trials: 1/1 (1 PENDING)


== Status ==
Current time: 2025-04-09 21:08:22 (running for 00:01:05.66)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XGBoostTrainer_2025-04-09_21-07-17/driver_artifacts
Number of trials: 1/1 (1 PENDING)


== Status ==
Current time: 2025-04-09 21:08:28 (running for 00:01:10.70)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XG



== Status ==
Current time: 2025-04-09 21:09:18 (running for 00:02:01.14)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XGBoostTrainer_2025-04-09_21-07-17/driver_artifacts
Number of trials: 1/1 (1 PENDING)


== Status ==
Current time: 2025-04-09 21:09:23 (running for 00:02:06.18)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XGBoostTrainer_2025-04-09_21-07-17/driver_artifacts
Number of trials: 1/1 (1 PENDING)


== Status ==
Current time: 2025-04-09 21:09:28 (running for 00:02:11.22)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XG



== Status ==
Current time: 2025-04-09 21:10:18 (running for 00:03:01.54)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XGBoostTrainer_2025-04-09_21-07-17/driver_artifacts
Number of trials: 1/1 (1 PENDING)


== Status ==
Current time: 2025-04-09 21:10:23 (running for 00:03:06.56)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XGBoostTrainer_2025-04-09_21-07-17/driver_artifacts
Number of trials: 1/1 (1 PENDING)


== Status ==
Current time: 2025-04-09 21:10:28 (running for 00:03:11.58)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XG



== Status ==
Current time: 2025-04-09 21:11:19 (running for 00:04:01.99)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XGBoostTrainer_2025-04-09_21-07-17/driver_artifacts
Number of trials: 1/1 (1 PENDING)


== Status ==
Current time: 2025-04-09 21:11:24 (running for 00:04:07.03)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XGBoostTrainer_2025-04-09_21-07-17/driver_artifacts
Number of trials: 1/1 (1 PENDING)


== Status ==
Current time: 2025-04-09 21:11:29 (running for 00:04:12.07)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XG



== Status ==
Current time: 2025-04-09 21:12:19 (running for 00:05:02.49)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XGBoostTrainer_2025-04-09_21-07-17/driver_artifacts
Number of trials: 1/1 (1 PENDING)


== Status ==
Current time: 2025-04-09 21:12:24 (running for 00:05:07.52)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XGBoostTrainer_2025-04-09_21-07-17/driver_artifacts
Number of trials: 1/1 (1 PENDING)


== Status ==
Current time: 2025-04-09 21:12:29 (running for 00:05:12.56)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XG



== Status ==
Current time: 2025-04-09 21:13:20 (running for 00:06:02.91)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XGBoostTrainer_2025-04-09_21-07-17/driver_artifacts
Number of trials: 1/1 (1 PENDING)


== Status ==
Current time: 2025-04-09 21:13:25 (running for 00:06:07.94)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XGBoostTrainer_2025-04-09_21-07-17/driver_artifacts
Number of trials: 1/1 (1 PENDING)


== Status ==
Current time: 2025-04-09 21:13:30 (running for 00:06:12.97)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XG



== Status ==
Current time: 2025-04-09 21:14:20 (running for 00:07:03.33)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XGBoostTrainer_2025-04-09_21-07-17/driver_artifacts
Number of trials: 1/1 (1 PENDING)


== Status ==
Current time: 2025-04-09 21:14:25 (running for 00:07:08.36)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XGBoostTrainer_2025-04-09_21-07-17/driver_artifacts
Number of trials: 1/1 (1 PENDING)


== Status ==
Current time: 2025-04-09 21:14:30 (running for 00:07:13.39)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XG



== Status ==
Current time: 2025-04-09 21:15:21 (running for 00:08:03.75)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XGBoostTrainer_2025-04-09_21-07-17/driver_artifacts
Number of trials: 1/1 (1 PENDING)


== Status ==
Current time: 2025-04-09 21:15:26 (running for 00:08:08.79)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XGBoostTrainer_2025-04-09_21-07-17/driver_artifacts
Number of trials: 1/1 (1 PENDING)


== Status ==
Current time: 2025-04-09 21:15:31 (running for 00:08:13.82)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XG



== Status ==
Current time: 2025-04-09 21:16:21 (running for 00:09:04.20)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XGBoostTrainer_2025-04-09_21-07-17/driver_artifacts
Number of trials: 1/1 (1 PENDING)


== Status ==
Current time: 2025-04-09 21:16:26 (running for 00:09:09.23)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XGBoostTrainer_2025-04-09_21-07-17/driver_artifacts
Number of trials: 1/1 (1 PENDING)


== Status ==
Current time: 2025-04-09 21:16:31 (running for 00:09:14.26)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XG



== Status ==
Current time: 2025-04-09 21:17:21 (running for 00:10:04.64)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XGBoostTrainer_2025-04-09_21-07-17/driver_artifacts
Number of trials: 1/1 (1 PENDING)


== Status ==
Current time: 2025-04-09 21:17:26 (running for 00:10:09.67)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XGBoostTrainer_2025-04-09_21-07-17/driver_artifacts
Number of trials: 1/1 (1 PENDING)


== Status ==
Current time: 2025-04-09 21:17:32 (running for 00:10:14.71)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XG



== Status ==
Current time: 2025-04-09 21:18:22 (running for 00:11:05.06)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XGBoostTrainer_2025-04-09_21-07-17/driver_artifacts
Number of trials: 1/1 (1 PENDING)


== Status ==
Current time: 2025-04-09 21:18:27 (running for 00:11:10.09)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XGBoostTrainer_2025-04-09_21-07-17/driver_artifacts
Number of trials: 1/1 (1 PENDING)


== Status ==
Current time: 2025-04-09 21:18:32 (running for 00:11:15.13)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XG



== Status ==
Current time: 2025-04-09 21:19:22 (running for 00:12:05.46)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XGBoostTrainer_2025-04-09_21-07-17/driver_artifacts
Number of trials: 1/1 (1 PENDING)


== Status ==
Current time: 2025-04-09 21:19:27 (running for 00:12:10.50)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XGBoostTrainer_2025-04-09_21-07-17/driver_artifacts
Number of trials: 1/1 (1 PENDING)


== Status ==
Current time: 2025-04-09 21:19:32 (running for 00:12:15.54)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XG

2025-04-09 21:19:47,895	INFO tune.py:1009 -- Wrote the latest version of all result files and experiment state to '/mnt/cluster_storage/XGBoostTrainer_2025-04-09_21-07-17' in 0.0038s.
2025-04-09 21:19:47,898	INFO tune.py:1041 -- Total run time: 750.59 seconds (750.57 seconds for the tuning loop).
Resume training with: <FrameworkTrainer>.restore(path="/mnt/cluster_storage/XGBoostTrainer_2025-04-09_21-07-17", ...)
- XGBoostTrainer_254f2_00000: FileNotFoundError('Could not fetch metrics for XGBoostTrainer_254f2_00000: both result.json and progress.csv were not found at /mnt/cluster_storage/XGBoostTrainer_2025-04-09_21-07-17/XGBoostTrainer_254f2_00000_0_2025-04-09_21-07-17')


== Status ==
Current time: 2025-04-09 21:19:47 (running for 00:12:30.57)
Using FIFO scheduling algorithm.
Logical resource usage: 0/8 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /tmp/ray/session_2025-04-09_21-06-22_537812_450727/artifacts/2025-04-09_21-07-17/XGBoostTrainer_2025-04-09_21-07-17/driver_artifacts
Number of trials: 1/1 (1 PENDING)




__Optimize hyperparameters with Ray Tune__

In [4]:
# Define the tuner
tuner = Tuner(
    trainer,
    param_space={"params": {"max_depth": tune.randint(2, 12)}},
    tune_config=TuneConfig(num_samples=3, metric="valid-logloss", mode="min"),
    run_config=RunConfig(storage_path="/mnt/cluster_storage/"),
)

# Fit the tuner and get the best checkpoint
checkpoint = tuner.fit().get_best_result().checkpoint

0,1
Current time:,2025-04-09 20:59:28
Running for:,00:00:25.92
Memory:,12.9/31.3 GiB

Trial name,status,loc,params/max_depth,iter,total time (s),train-logloss,valid-logloss
XGBoostTrainer_fe7d2_00000,TERMINATED,192.168.99.98:448623,4,11,4.9593,0.66104,0.661369
XGBoostTrainer_fe7d2_00001,TERMINATED,192.168.99.98:449027,5,11,5.11561,0.660262,0.660711
XGBoostTrainer_fe7d2_00002,TERMINATED,192.168.99.98:449437,9,11,5.37494,0.657308,0.658953


2025-04-09 20:59:02,694	INFO data_parallel_trainer.py:339 -- GPUs are detected in your Ray cluster, but GPU training is not enabled for this trainer. To enable GPU training, make sure to set `use_gpu` to True in your scaling config.
2025-04-09 20:59:02,698	INFO data_parallel_trainer.py:339 -- GPUs are detected in your Ray cluster, but GPU training is not enabled for this trainer. To enable GPU training, make sure to set `use_gpu` to True in your scaling config.
2025-04-09 20:59:02,702	INFO data_parallel_trainer.py:339 -- GPUs are detected in your Ray cluster, but GPU training is not enabled for this trainer. To enable GPU training, make sure to set `use_gpu` to True in your scaling config.


(pid=448837) Running 0: 0.00 row [00:00, ? row/s]

(pid=448837) - split(4, equal=True) 1: 0.00 row [00:00, ? row/s]

(pid=448836) Running 0: 0.00 row [00:00, ? row/s]

(pid=448836) - split(4, equal=True) 1: 0.00 row [00:00, ? row/s]

(pid=449243) Running 0: 0.00 row [00:00, ? row/s]

(pid=449243) - split(4, equal=True) 1: 0.00 row [00:00, ? row/s]

(pid=449244) Running 0: 0.00 row [00:00, ? row/s]

(pid=449244) - split(4, equal=True) 1: 0.00 row [00:00, ? row/s]

(pid=449655) Running 0: 0.00 row [00:00, ? row/s]

(pid=449655) - split(4, equal=True) 1: 0.00 row [00:00, ? row/s]

(pid=449656) Running 0: 0.00 row [00:00, ? row/s]

(pid=449656) - split(4, equal=True) 1: 0.00 row [00:00, ? row/s]

2025-04-09 20:59:28,609	INFO tune.py:1009 -- Wrote the latest version of all result files and experiment state to '/mnt/cluster_storage/XGBoostTrainer_2025-04-09_20-59-02' in 0.0071s.
2025-04-09 20:59:28,615	INFO tune.py:1041 -- Total run time: 25.93 seconds (25.91 seconds for the tuning loop).


__Batch inference with Ray Data__

In [4]:
class OfflinePredictor:
    def __init__(self):
        # Load expensive state
        self._model = xgboost.Booster()
        self._model.load_model(checkpoint.path + "/model.ubj")

    def __call__(self, batch: dict) -> dict:
        # Make prediction in batch
        dmatrix = xgboost.DMatrix(pd.DataFrame(batch))
        outputs = self._model.predict(dmatrix)
        return {"prediction": outputs}

In [6]:
# Apply the predictor to the validation dataset
valid_dataset_inputs = valid_dataset.drop_columns(['is_big_tip'])
predicted_probabilities = valid_dataset_inputs.map_batches(OfflinePredictor, concurrency=2)

In [8]:
# Materialize a batch
predicted_probabilities.take_batch()

2025-04-09 21:00:46,144	INFO streaming_executor.py:108 -- Starting execution of Dataset. Full logs are in /tmp/ray/session_2025-04-09_20-57-16_228490_447200/logs/ray-data
2025-04-09 21:00:46,145	INFO streaming_executor.py:109 -- Execution plan of Dataset: InputDataBuffer[Input] -> ActorPoolMapOperator[MapBatches(drop_columns)->MapBatches(OfflinePredictor)] -> LimitOperator[limit=20]


Running 0: 0.00 row [00:00, ? row/s]

- MapBatches(drop_columns)->MapBatches(OfflinePredictor) 1: 0.00 row [00:00, ? row/s]

- limit=20 2: 0.00 row [00:00, ? row/s]

2025-04-09 21:00:47,337	ERROR streaming_executor_state.py:483 -- An exception was raised from a task of operator "MapBatches(drop_columns)->MapBatches(OfflinePredictor)". Dataset execution will now abort. To ignore this exception and continue, set DataContext.max_errored_blocks.


RayTaskError(UserCodeException): [36mray::MapBatches(drop_columns)->MapBatches(OfflinePredictor)()[39m (pid=450100, ip=192.168.99.98, actor_id=32d0c34a19bde173ce4734d101000000, repr=MapWorker(MapBatches(drop_columns)->MapBatches(OfflinePredictor)))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/archive/ray/course/intro2ray/.venv/lib/python3.12/site-packages/ray/data/_internal/execution/util.py", line 78, in __call__
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/map/.local/share/uv/python/cpython-3.12.0-linux-x86_64-gnu/lib/python3.12/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/home/map/.local/share/uv/python/cpython-3.12.0-linux-x86_64-gnu/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/home/map/.local/share/uv/python/cpython-3.12.0-linux-x86_64-gnu/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipykernel_447200/2559185443.py", line 10, in __call__
  File "/data/archive/ray/course/intro2ray/.venv/lib/python3.12/site-packages/xgboost/core.py", line 729, in inner_f
    return func(**kwargs)
           ^^^^^^^^^^^^^^
  File "/data/archive/ray/course/intro2ray/.venv/lib/python3.12/site-packages/xgboost/core.py", line 2502, in predict
    self._validate_features(fn)
  File "/data/archive/ray/course/intro2ray/.venv/lib/python3.12/site-packages/xgboost/core.py", line 3243, in _validate_features
    raise ValueError(msg.format(self.feature_names, feature_names))
ValueError: feature_names mismatch: ['passenger_count', 'trip_distance', 'fare_amount', 'trip_duration', 'hour', 'day_of_week'] ['passenger_count', 'trip_distance', 'fare_amount', 'trip_duration', 'hour', 'day_of_week', '__index_level_0__']
training data did not have the following fields: __index_level_0__

The above exception was the direct cause of the following exception:

[36mray::MapBatches(drop_columns)->MapBatches(OfflinePredictor)()[39m (pid=450100, ip=192.168.99.98, actor_id=32d0c34a19bde173ce4734d101000000, repr=MapWorker(MapBatches(drop_columns)->MapBatches(OfflinePredictor)))
  File "/data/archive/ray/course/intro2ray/.venv/lib/python3.12/site-packages/ray/data/_internal/execution/operators/actor_pool_map_operator.py", line 415, in submit
    yield from _map_task(
  File "/data/archive/ray/course/intro2ray/.venv/lib/python3.12/site-packages/ray/data/_internal/execution/operators/map_operator.py", line 535, in _map_task
    for b_out in map_transformer.apply_transform(iter(blocks), ctx):
  File "/data/archive/ray/course/intro2ray/.venv/lib/python3.12/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 530, in __call__
    for data in iter:
  File "/data/archive/ray/course/intro2ray/.venv/lib/python3.12/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 216, in _udf_timed_iter
    output = next(input)
             ^^^^^^^^^^^
  File "/data/archive/ray/course/intro2ray/.venv/lib/python3.12/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 328, in __call__
    yield from self._batch_fn(input, ctx)
  File "/data/archive/ray/course/intro2ray/.venv/lib/python3.12/site-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 398, in transform_fn
    res = fn(batch)
          ^^^^^^^^^
  File "/data/archive/ray/course/intro2ray/.venv/lib/python3.12/site-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 302, in fn
    _handle_debugger_exception(e)
  File "/data/archive/ray/course/intro2ray/.venv/lib/python3.12/site-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 326, in _handle_debugger_exception
    raise UserCodeException() from e
ray.exceptions.UserCodeException

__Online prediction with Ray Serve__

In [None]:
@serve.deployment
class OnlinePredictor:
    def __init__(self, checkpoint):
        # Load expensive state
        self._model = xgboost.Booster()
        self._model.load_model(checkpoint.path + "/model.ubj")

    async def __call__(self, request: Request) -> dict:
        # Handle HTTP request
        data = await request.json()
        data = json.loads(data)
        return {"prediction": self.predict(data)}

    def predict(self, data: list[dict]) -> list[float]:
        # Make prediction
        dmatrix = xgboost.DMatrix(pd.DataFrame(data))
        return self._model.predict(dmatrix)

# Run the deployment
handle = serve.run(OnlinePredictor.bind(checkpoint=checkpoint))

In [None]:
# Form payload
valid_dataset_inputs = valid_dataset.drop_columns(["is_big_tip"])
sample_batch = valid_dataset_inputs.take_batch(1)
data = pd.DataFrame(sample_batch).to_json(orient="records")

# Send HTTP request
requests.post("http://localhost:8000/", json=data).json()

In [5]:
# Shutdown Ray Serve
serve.shutdown()

INFO 2025-04-09 21:32:13,406 serve 450727 -- Nothing to shut down. There's no Serve application running on this Ray cluster.


In [6]:
# Cleanup
!rm -rf /mnt/cluster_storage/XGBoostTrainer*