# Overview of Ray

<img src="../_static/assets/Generic/ray_logo.png" width="20%" loading="lazy">

## About this notebook

### Is it right for you?

This is an introductory notebook that gives a broad overview of the Ray project. It is right for you if:

* you are new to Ray and are looking for a project primer
* you are interested in how you can use Ray, a Python-first distributed computing library, to scale your Python applications and accelerate machine learning workloads

### Prerequisites

For this notebook you should have:

* practical Python and machine learning experience
* no prior experience with Ray or distributed computing

### Learning objectives

Upon completion of this notebook, you will know about:

* what is Ray?
* key Ray characteristics
* three layers of the Ray libraries: Core, native libraries, and integrations and ecosystem
* example Ray use cases and workloads
* what to do next to start using it

### What will you do?

In *Part 1* of this notebook, you will learn about the Ray project. Then, in *Part 2*, you will run an illustrative code example that gives you a better "feel" for Ray.

## Part 1: Ray project

|<img src="../_static/assets/Overview_of_Ray/ray_project.png" width="70%" loading="lazy">|
|:--|
|Ray is one of the leading open source ML projects. (date accessed: Nov 2, 2022)|

### Introduction

#### What is Ray?

<div class="alert alert-info">
  <strong><a href="https://www.ray.io/" target="_blank">Ray</a></strong> is an open-source unified compute framework that makes it easy to scale AI and Python workloads.
</div>

Thanks to its Python-first approach, ML engineers can parallelize Python applications on a laptop, cluster, cloud, Kubernetes, or on-premise hardware. Ray automatically handles all aspects of distributed execution, including orchestration, scheduling, fault tolerance, and autoscaling, so that you can scale your apps without becoming a distributed systems expert.

With a rich ecosystem of libraries and integrations with many important data science tools, Ray lowers the effort needed to scale compute-intensive workloads and applications.

#### Distributed computing: a bit of context and project history

|<img src="../_static/assets/Overview_of_Ray/project_history.jpeg" width="70%" loading="lazy">|
|:--|
|Compute demand is growing faster than supply. It outpaces the growth in processing power of CPUs, GPUs, and TPUs. (date accessed: Nov 2, 2022)|

Distributed computing is hard. At the same time, it is becoming increasingly necessary for modern machine learning and AI systems.

OpenAI's analysis [AI and Compute](https://openai.com/blog/ai-and-compute/) suggests exponential growth in the compute needed to train AI models. It estimates that the compute used in the largest AI training runs has been doubling every 3.4 months since 2012.
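To make the doubling rate concrete, the short calculation below converts a 3.4-month doubling period into a growth factor per year. It is only a back-of-the-envelope sketch: the function name and constant are illustrative, and it assumes perfectly steady exponential growth.

```python
# Doubling period quoted above, in months.
DOUBLING_PERIOD_MONTHS = 3.4


def growth_factor(months: float, doubling_period: float = DOUBLING_PERIOD_MONTHS) -> float:
    """Return the multiplicative growth in compute after `months` months."""
    return 2 ** (months / doubling_period)


yearly_growth = growth_factor(12)
print(f"Growth over one year: about {yearly_growth:.1f}x")
```

Under this assumption, compute demand grows by roughly an order of magnitude each year, far faster than a two-year doubling period would imply.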

This context drove researchers to begin building solutions to simplify running code on compute clusters without having to think about how to orchestrate and utilize individual machines. That is, let Ray do the hard bit of orchestrating and executing, and you do the easy bit of writing Python code.

Ray was developed at the University of California, Berkeley's [RISELab](https://rise.cs.berkeley.edu/), the successor to the [AMPLab](https://amplab.cs.berkeley.edu/about/), which created [Apache Spark](https://spark.apache.org/) and [Mesos](https://mesos.apache.org/).

[Anyscale](https://www.anyscale.com/), the company behind Ray, was founded by the creators of Ray to build a managed Ray platform, and it offers hosted solutions for Ray applications.

### Key Ray characteristics

|<img src="../_static/assets/Overview_of_Ray/python_first.jpeg" width="70%" loading="lazy">|<img src="../_static/assets/Overview_of_Ray/simple_and_flexible_api.jpeg" width="70%" loading="lazy">|<img src="../_static/assets/Overview_of_Ray/scalability.jpeg" width="70%" loading="lazy">|<img src="../_static/assets/Overview_of_Ray/heterogeneous_hardware.jpeg" width="70%" loading="lazy">|
|:-:|:-:|:-:|:-:|
|Python first approach|Simple and Flexible API|Scalability|Support for heterogeneous hardware|

#### Python first approach

<img src="../_static/assets/Overview_of_Ray/python_first.jpeg" width="100px" loading="lazy">

Ray provides a Python library with abstractions and primitives that enable ML practitioners and Python developers to build distributed applications. Ray exposes a concise, easy-to-use API. Its core library, which enables parallel execution, introduces just a few key abstractions:

1. [Tasks](https://docs.ray.io/en/latest/ray-core/key-concepts.html#tasks): remote, stateless Python functions
1. [Actors](https://docs.ray.io/en/latest/ray-core/key-concepts.html#actors): remote stateful Python classes
1. [Objects](https://docs.ray.io/en/latest/ray-core/key-concepts.html#objects): in-memory, immutable objects or values that can be accessed anywhere in the computing cluster

You will learn more about these abstractions in the [Ray Core tutorials](https://github.com/ray-project/ray-educational-materials/tree/main/Ray_Core).

#### Simple and flexible API

<img src="../_static/assets/Overview_of_Ray/simple_and_flexible_api.jpeg" width="100px" loading="lazy">

##### Ray Core

<div class="alert alert-info">
  <strong><a href="https://docs.ray.io/en/latest/ray-core/walkthrough.html" target="_blank">Ray Core</a></strong> is an open-source, Python, general purpose, distributed computing library that enables ML engineers and Python developers to scale Python applications and accelerate machine learning workloads.
</div>

Ray Core is the foundational library for the whole ecosystem. It provides a minimalist API that enables distributed computing: with just a few methods, you can start building distributed apps.

* `ray.init()` - start the Ray runtime and connect to the Ray cluster
* `@ray.remote` - decorator for functions and classes, specifying that they will be executed as a task (remote function) or an actor (remote class) in a different process
* `.remote` - postfix used to call remote functions and classes. Remote operations are *asynchronous*
* `ray.put()` - put an object in the in-memory object store and return its ID (an object reference). Use this ID to pass the object to any remote function or method call
* `ray.get()` - get a remote object or a list of remote objects from the object store

*(In the second part of this notebook, you will see an illustrative example of some of these methods.)*

##### Ray AI Runtime (AIR)

<div class="alert alert-info">
  <strong><a href="https://docs.ray.io/en/latest/ray-air/getting-started.html" target="_blank">Ray AI Runtime (AIR)</a></strong> is an open-source, Python, domain specific library that equips ML engineers, data scientists, and researchers with a scalable and unified toolkit for ML applications.
</div>

Ray AI Runtime (AIR), sometimes referred to as the native libraries, together with the ecosystem libraries, provides higher-level APIs that cater to more domain-specific use cases. Ray AIR enables Python developers and ML engineers to scale individual workloads, end-to-end workflows, and popular ecosystem frameworks, all in the familiar Python programming language.

#### Scalability

<img src="../_static/assets/Overview_of_Ray/scalability.jpeg" width="100px" loading="lazy">

Ray allows its users to utilize large compute clusters in an easy, productive, and resource-efficient way.

Fundamentally, Ray treats the entire cluster as a single, unified pool of resources and takes care of optimally mapping compute workloads to the pool. By doing so, Ray largely eliminates non-scalable factors in the system. Successful user stories include, but are not limited to:
* [how Instacart uses Ray to power their large scale fulfillment ML pipeline](https://www.youtube.com/watch?v=3t26ucTy0Rs),
* [how OpenAI trains their largest models](https://twitter.com/anyscalecompute/status/1562136159135973380),
* [how companies like HuggingFace and Cohere use Ray Train for scaling model training](https://www.youtube.com/watch?v=For8yLkZP5w).

Ray's [autoscaler](https://docs.ray.io/en/latest/cluster/key-concepts.html#autoscaling) implements automatic scaling of Ray clusters based on the resource demands of an application. The autoscaler will increase worker nodes when the Ray workload exceeds the cluster's capacity. Whenever worker nodes sit idle, the autoscaler will scale them down.

#### Support for heterogeneous hardware

<img src="../_static/assets/Overview_of_Ray/heterogeneous_hardware.jpeg" width="100px" loading="lazy">

One of the key properties of Ray is native support for heterogeneous hardware: developers can specify the hardware they need when instantiating a Task or Actor. For example, a developer can specify in the same application that one Task needs 128 CPUs, while an Actor requires 36 CPUs and 8 Nvidia A100 GPUs.

Fractional GPUs are also supported.

|<img src="../_static/assets/Overview_of_Ray/heterogeneous_hardware_code.png" width="70%" loading="lazy">|
|:--|
|Easily specify the amount of resources needed by using `num_cpus` and `num_gpus`.|

An illustrative example is the [production deep learning pipeline at Uber](https://www.anyscale.com/ray-summit-2022/agenda/sessions/215). A heterogeneous setup of 8 GPU nodes and 9 CPU nodes improves the pipeline throughput by 50%, while substantially saving capital cost, compared with the legacy setup of 16 GPU nodes.

|<img src="../_static/assets/Overview_of_Ray/uber.png" width="70%" loading="lazy">|
|:--|
|Production deep learning pipeline at Uber.|

### Ray libraries

|<img src="../_static/assets/Overview_of_Ray/map.png" width="70%" loading="lazy">|
|:--|
|Stack of Ray libraries - unified toolkit for ML workloads.|

Now, you will learn about the three *layers* that comprise Ray in greater detail:

1. Ray Core
1. Ray AI Runtime (native libraries)
1. integrations and ecosystem

#### Ray clusters

<div class="alert alert-info">
  <strong><a href="https://docs.ray.io/en/latest/cluster/getting-started.html" target="_blank">Ray cluster</a></strong> is a set of worker nodes connected to a common Ray head node. Ray clusters can be fixed-size, or they may autoscale up and down according to the resources requested by applications running on the cluster.
</div>

Notice that the bottom layer is the [cluster](https://docs.ray.io/en/latest/cluster/getting-started.html). Ray sets up and manages clusters of computers so that you can run distributed applications on them. You can deploy a Ray cluster on AWS, GCP, or on Kubernetes via the officially supported [KubeRay](https://docs.ray.io/en/latest/cluster/kubernetes/index.html) project. Note that [Anyscale](https://www.anyscale.com/), the company behind Ray, builds an enterprise-ready AI compute platform for running and managing Ray applications.

#### Ray Core

<div class="alert alert-info">
  <strong><a href="https://docs.ray.io/en/latest/ray-core/walkthrough.html" target="_blank">Ray Core</a></strong> is an open-source, Python, general purpose, distributed computing library that enables ML engineers and Python developers to scale Python applications and accelerate machine learning workloads.
</div>

Ray Core is the foundation that Ray's ML libraries (Ray AIR) and third-party integrations (Ray Ecosystem) are built on. This library enables Python developers to easily build scalable, distributed systems that run on a laptop, cluster, cloud, or Kubernetes.

Let's expand a bit on key abstractions mentioned before:

1. [Tasks](https://docs.ray.io/en/latest/ray-core/key-concepts.html#tasks): remote, stateless Python functions.  
They are arbitrary Python functions that are executed asynchronously on separate Python workers on Ray cluster nodes. Users can specify their resource requirements in terms of CPUs, GPUs, and custom resources, which the cluster scheduler uses to distribute tasks for parallelized execution.

1. [Actors](https://docs.ray.io/en/latest/ray-core/key-concepts.html#actors): remote stateful Python classes.  
What tasks are to functions, actors are to classes. An actor is a stateful worker, and the methods of an actor are scheduled on that specific worker and can access and mutate the state of that worker. Like tasks, actors support CPU, GPU, and custom resource requirements.

1. [Objects](https://docs.ray.io/en/latest/ray-core/key-concepts.html#objects): in-memory, immutable objects or values that can be accessed anywhere in the computing cluster.  
In Ray, tasks and actors create and compute on objects. These remote objects can be stored anywhere in a Ray cluster. Object references are used to refer to them, and the objects are cached in Ray's distributed shared memory, the object store.

#### Ray AI Runtime

<div class="alert alert-info">
  <strong><a href="https://docs.ray.io/en/latest/ray-air/getting-started.html" target="_blank">Ray AI Runtime (AIR)</a></strong> is an open-source, Python, domain specific library that equips ML engineers, data scientists, and researchers with a scalable and unified toolkit for ML applications.
</div>

Ray AIR is built on top of Ray Core. It caters to distributed data processing, model training, tuning, model serving, and reinforcement learning, all in Python. To that end, it enables both individual workloads and end-to-end use cases to be implemented in a single unified library.

Ray AIR brings together an ever-growing ecosystem of integrations with your favorite machine learning frameworks.

|<img src="../_static/assets/Introduction_to_Ray_AIR/e2e_air.png" width="70%" loading="lazy">|
|:--|
|Ray AIR enables end-to-end ML development and provides multiple options to integrate with other tools and libraries from the MLOps ecosystem.|

Each of the five native libraries that Ray AIR wraps is focused on a specific ML task. Because this abstraction layer is built on top of Ray Core, it is distributed and scalable.

1. [Ray Data](https://docs.ray.io/en/latest/data/dataset.html): scalable, framework-agnostic loading and transforming raw data across training and prediction
1. [Ray Train](https://docs.ray.io/en/latest/train/train.html): distributed multi-node and multi-core model training with fault tolerance that integrates with your favorite training libraries
1. [Ray Tune](https://docs.ray.io/en/latest/tune/index.html): scales experiment execution and hyperparameter tuning to optimize model performance
1. [Ray Serve](https://docs.ray.io/en/latest/serve/index.html): deploys your model for online inference, with optional microbatching to improve performance
1. [Ray RLlib](https://docs.ray.io/en/latest/rllib/index.html): distributed reinforcement learning workloads that integrate with the other Ray AIR libraries above

#### Integrations and ecosystem libraries

Ray integrates with a [growing ecosystem](https://docs.ray.io/en/latest/ray-overview/ray-libraries.html) of the most popular Python and machine learning libraries and frameworks that you may already be familiar with. Instead of trying to create new standards, Ray allows you to scale existing workloads by unifying tools in a common interface. This interface enables you to run ML tasks in a distributed way, a property most of the respective backends don't have, or not to the same extent.

For example, Ray Datasets is backed by Arrow and comes with many integrations to other frameworks, such as Spark and Dask. Ray Train and RLlib are backed by the full power of TensorFlow and PyTorch. Ray Tune supports algorithms from practically every notable HPO tool available, including Hyperopt, Optuna, Nevergrad, Ax, SigOpt, and many others. Ray Serve can be used with frameworks such as FastAPI, Gradio, and Streamlit.

### Ray use cases

Now that you have a sense of what Ray is in theory, it's important to discuss what makes Ray so useful in practice. In this section, you will encounter the ways that individuals, organizations, and companies leverage Ray to build their AI applications.

First, you will explore how Ray scales common ML workloads. Then, you will learn about some advanced implementations.

#### Scaling common ML workloads

##### Parallel training of many models
When each model you want to train fits on a single GPU, Ray can assign each training run to a separate Ray Task. In this way, all available workers are utilized to run independent remote functions rather than one worker running jobs sequentially.

|<img src="../_static/assets/Overview_of_Ray/training_small_models.png" width="70%" loading="lazy">|
|:--|
|A visualization of the data parallelism pattern for distributed training.|

##### Distributed training of large models
In contrast to training many models, model parallelism partitions a large model across many machines for training. Ray Train has built-in abstractions for distributing shards of models and running training in parallel.

|<img src="../_static/assets/Overview_of_Ray/model_parallelism.png" width="70%" loading="lazy">|
|:--|
|A visualization of the model parallelism pattern for distributed training.|

##### Managing parallel hyperparameter tuning experiments
In a similar vein, running multiple hyperparameter tuning experiments is a pattern apt for distributed computing because each model experiment is independent of the others. Ray Tune handles the hard bit of distributing your hyperparameter optimization and makes available key features such as checkpointing your best result, optimizing scheduling, specifying search patterns, and more.

|<img src="../_static/assets/Overview_of_Ray/tuning_use_case.png" width="70%" loading="lazy">|
|:--|
|Distributed tuning with distributed training per trial.|

##### Reinforcement Learning
Ray RLlib offers support for production-level, distributed reinforcement learning workloads while maintaining unified and simple APIs for a large variety of industry applications.

|<img src="../_static/assets/Overview_of_Ray/rllib_use_case.png" width="70%" loading="lazy">|
|:--|
|RLlib's algorithm classes leverage parallel iterators to implement a synchronous sampling pattern.|

##### Batch inference on CPUs and GPUs
Performing inference on batches of new data can be parallelized by exporting the architecture and weights of a trained model to the shared object store and allowing Ray to handle assignment of predictions to be executed on the batches.

|<img src="../_static/assets/Overview_of_Ray/batch_inference.png" width="70%" loading="lazy">|
|:--|
|Using Ray AIR's `BatchPredictor` for batch inference.|

##### Multi-model composition for model serving

[Ray Serve](https://docs.ray.io/en/latest/serve/index.html) supports complex [model deployment patterns](https://www.youtube.com/watch?v=mM4hJLelzSw) requiring the orchestration of multiple Ray actors, where different actors provide inference for different models. It handles both batch and online inference and can scale to thousands of models in production.

|<img src="../_static/assets/Overview_of_Ray/multi_model_serve.png" width="70%" loading="lazy">|
|:--|
|Deployment patterns with Ray Serve.|

##### ML Platform

[Merlin](https://shopify.engineering/merlin-shopify-machine-learning-platform) is Shopify's ML platform built on Ray. It enables fast iteration and [scaling of distributed applications](https://www.youtube.com/watch?v=kbvzvdKH7bc) such as product categorization and recommendations.

|<img src="../_static/assets/Overview_of_Ray/shopify.png" width="70%" loading="lazy">|
|:--|
|Merlin architecture built on Ray.|

Spotify [uses Ray for advanced applications](https://www.anyscale.com/ray-summit-2022/agenda/sessions/180) that include personalizing content recommendations for home podcasts and personalizing Spotify Radio track sequencing.

|<img src="../_static/assets/Overview_of_Ray/spotify.png" width="70%" loading="lazy">|
|:--|
|How the Ray ecosystem powers ML scientists and engineers at Spotify.|

#### Implementing advanced ML workloads

##### Alpa - training very large models with Ray

[Alpa](https://ai.googleblog.com/2022/05/alpa-automated-model-parallel-deep.html) is a [Ray-native library](https://www.anyscale.com/ray-summit-2022/agenda/sessions/170) that automatically partitions, schedules, and executes the training and serving computation of very large deep learning models on hundreds of GPUs.

|<img src="../_static/assets/Overview_of_Ray/alpa.png" width="70%" loading="lazy">|
|:--|
|Parallelization plans for a computational graph from Alpa. A, B, C, and D are operators. Each color represents a different device (i.e. GPU) executing a partition or the full operator leveraging Ray actors.|

##### Exoshuffle - large scale data shuffling

In Ray 2.0, [Exoshuffle](https://cs.paperswithcode.com/paper/exoshuffle-large-scale-shuffle-at-the) is integrated with Ray Data to provide an application-level shuffle system that [outperforms Spark and achieves 82% of theoretical performance on a 100TB sort on 100 nodes](https://www.anyscale.com/ray-summit-2022/agenda/sessions/220).

|<img src="../_static/assets/Overview_of_Ray/exoshuffle.png" width="70%" loading="lazy">|
|:--|
|Shuffle on Ray architecture diagram.|

##### Hamilton - feature engineering

[Hamilton](https://github.com/stitchfix/hamilton) is an open source dataflow micro-framework that manages feature engineering for time series forecasting. Developed at [StitchFix](https://www.anyscale.com/ray-summit-2022/agenda/sessions/115), the library provides scalable parallelism, where each Hamilton function is distributed and data is limited by machine memory.

Integration with Ray provides an out-of-the-box experience.

|<img src="../_static/assets/Overview_of_Ray/stitchfix.png" width="70%" loading="lazy">|
|:--|
|Hamilton architecture on Ray clusters.|

##### Riot Games - reinforcement learning

Riot Games' reinforcement learning platform, built on Ray, drives [key AI applications](https://www.anyscale.com/ray-summit-2022/agenda/sessions/148) that build bots to play their games at various skill levels. The bots provide additional signals that help designers deliver the best experiences for players.

|<img src="../_static/assets/Overview_of_Ray/riot.png" width="70%" loading="lazy">|
|:--|
|Riot reinforcement learning workflow on Ray.|

### Summary of Part 1

|<img src="../_static/assets/Overview_of_Ray/map_layers_only.png" width="70%" loading="lazy">|
|:--|
|Stack of Ray libraries - unified toolkit for ML workloads.|

You just learned about:

* what is Ray?
* key Ray characteristics
* three layers of the Ray libraries: Core, native libraries, and integrations and ecosystem
* example Ray use cases and workloads

#### Concepts

Let's revisit key concepts introduced in Part 1:

<div class="alert alert-info">
  <strong><a href="https://www.ray.io/" target="_blank">Ray</a></strong> is an open-source unified compute framework that makes it easy to scale AI and Python workloads.
</div>

<div class="alert alert-info">
  <strong><a href="https://docs.ray.io/en/latest/cluster/getting-started.html" target="_blank">Ray cluster</a></strong> is a set of worker nodes connected to a common Ray head node. Ray clusters can be fixed-size, or they may autoscale up and down according to the resources requested by applications running on the cluster.
</div>

<div class="alert alert-info">
  <strong><a href="https://docs.ray.io/en/latest/ray-core/walkthrough.html" target="_blank">Ray Core</a></strong> is an open-source, Python, general purpose, distributed computing library that enables ML engineers and Python developers to scale Python applications and accelerate machine learning workloads.
</div>

<div class="alert alert-info">
  <strong><a href="https://docs.ray.io/en/latest/ray-air/getting-started.html" target="_blank">Ray AI Runtime (AIR)</a></strong> is an open-source, Python, domain specific library that equips ML engineers, data scientists, and researchers with a scalable and unified toolkit for ML applications.
</div>

#### APIs and technical abstractions

Let's revisit the three Ray Core abstractions that enable parallel computations:

1. [Tasks](https://docs.ray.io/en/latest/ray-core/key-concepts.html#tasks): remote, stateless Python functions
1. [Actors](https://docs.ray.io/en/latest/ray-core/key-concepts.html#actors): remote stateful Python classes
1. [Objects](https://docs.ray.io/en/latest/ray-core/key-concepts.html#objects): in-memory, immutable objects or values that can be accessed anywhere in the computing cluster

#### What's next?

You will run an illustrative code example that gives you a better "feel" for Ray.

## Part 2: Hands-on code example - scaling regression with Ray Core

### Introduction

In this section, you will run an illustrative code example that gives you a better "feel" for Ray. Specifically, you will use Ray Core to scale a bare-bones version of a common ML task: regression on structured data.

#### Data

The dataset is [California House Prices](https://scikit-learn.org/stable/datasets/real_world.html#california-housing-dataset), as available via scikit-learn.

|<img src="../_static/assets/Overview_of_Ray/California_dataset.png" width="70%" loading="lazy">|
|:--|
|`n_samples = 20640`, target is numeric and corresponds to the median house value in units of $100,000.|

#### Model and task

You will train and score [random forest](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html) models using the [mean squared error](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html) metric.

To find the best-performing model, you will train many models with varying values of the `n_estimators` hyperparameter. This brings up the topic of sequential vs. parallel model training. You will first implement the sequential approach, then improve on it by distributing training with Ray Core, which makes overall training faster.

### Sequential implementation

The vanilla implementation trains models sequentially, one by one, as depicted in the diagram below.

|<img src="../_static/assets/Overview_of_Ray/sequential_timeline.png" width="70%" loading="lazy">|
|:--|
|Timeline of sequential tasks, one after the other.|

#### Preliminaries

In [None]:
# imports
import time
from operator import itemgetter

import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

#### Prepare dataset

In [None]:
X, y = fetch_california_housing(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=201
)

In [None]:
X.head(n=5)

#### Set number of models to train

In [None]:
NUM_MODELS = 20

You will train `NUM_MODELS` models in both the sequential and parallel scenarios.

#### Implement function to train and score model

In [None]:
def train_and_score_model(
    train_set: pd.DataFrame,
    test_set: pd.DataFrame,
    train_labels: pd.Series,
    test_labels: pd.Series,
    n_estimators: int,
):
    start_time = time.time()  # measure wall time for single model training

    model = RandomForestRegressor(n_estimators=n_estimators, random_state=201)
    model.fit(train_set, train_labels)
    y_pred = model.predict(test_set)
    score = mean_squared_error(test_labels, y_pred)

    time_delta = time.time() - start_time
    print(
        f"n_estimators={n_estimators}, mse={score:.4f}, took: {time_delta:.2f} seconds"
    )

    return n_estimators, score

This function takes the data, instantiates a `RandomForestRegressor` model, trains it, and scores it on the test set.

The function returns a tuple:
```
(n_estimators, mse_score)
```

For example:

```
(8, 0.2983)
```

#### Implement function that runs **sequential** model training

In [None]:
def run_sequential(n_models: int):
    return [
        train_and_score_model(
            train_set=X_train,
            test_set=X_test,
            train_labels=y_train,
            test_labels=y_test,
            n_estimators=8 + 4 * j,
        )
        for j in range(n_models)
    ]

This function trains `n_models` models sequentially with an increasing number of `n_estimators` (it increases by 4 for each model: 8, 12, 16, 20, ...).

The function returns a list of tuples:
```
[(n_estimators, mse_score), (n_estimators, mse_score), ...]
```

For example:

```
[(8, 0.2983), (12, 0.2826), (16, 0.2761), ...]
```
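The `n_estimators` values come from the expression `8 + 4 * j` in the cell above. The small helper below (a hypothetical function, shown only to make the schedule explicit) lists the values for the first few models:

```python
def n_estimators_schedule(n_models: int, start: int = 8, step: int = 4) -> list:
    """Return the n_estimators value used for each of the n_models models."""
    return [start + step * j for j in range(n_models)]


schedule = n_estimators_schedule(5)
print(schedule)  # [8, 12, 16, 20, 24]
```

With `NUM_MODELS = 20`, the last model therefore uses `8 + 4 * 19 = 84` estimators, so later models take longer to train than earlier ones.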

#### Run sequential model training 

In [None]:
%%time

mse_scores = run_sequential(n_models=NUM_MODELS)

Note: the wall time on an M1 MacBook Pro was about 1 min (60 s).

#### Analyse results

In [None]:
best = min(mse_scores, key=itemgetter(1))
print(f"Best model: mse={best[1]:.4f}, n_estimators={best[0]}")

Training completed, but it is slow because the models are trained one after another.

### Parallel implementation

Now you will use Ray to train these models in parallel, utilizing all available resources. The diagram below gives a visual intuition for this setup.

|<img src="../_static/assets/Overview_of_Ray/distributed_timeline.png" width="70%" loading="lazy">|
|:--|
|Sample timeline with ten tasks running across four worker nodes in parallel, with minor overhead from the scheduler.|

#### Initialize Ray runtime

In [None]:
# import Ray
import ray

if ray.is_initialized():
    ray.shutdown()

ray.init()

* `ray.init()` starts the Ray runtime and connects to the Ray cluster.
* by checking `ray.is_initialized()` and shutting down any running instance first, you make sure that you have only one Ray runtime.

#### Put data in the object store

|<img src="../_static/assets/Overview_of_Ray/object_store.png" width="70%" loading="lazy">|
|:--|
|Diagram of workers in worker nodes using `ray.put()` to place values and using `ray.get()` to retrieve them from each node's object store.|

In [None]:
X_train_ref = ray.put(X_train)
X_test_ref = ray.put(X_test)
y_train_ref = ray.put(y_train)
y_test_ref = ray.put(y_test)

Your data is now available to all remote Tasks and Actors in the cluster. You use an Object Reference (`ObjectRef`) to refer to the object when needed.

An example Object Reference looks like this:

`ObjectRef(00ffffffffffffffffffffffffffffffffffffff0100000002000000)`

#### Implement function to train and score model

In [None]:
@ray.remote
def train_and_score_model(
    train_set_ref: pd.DataFrame,
    test_set_ref: pd.DataFrame,
    train_labels_ref: pd.Series,
    test_labels_ref: pd.Series,
    n_estimators: int,
):
    start_time = time.time()  # measure wall time for single model training

    model = RandomForestRegressor(n_estimators=n_estimators, random_state=201)
    model.fit(train_set_ref, train_labels_ref)
    y_pred = model.predict(test_set_ref)
    score = mean_squared_error(test_labels_ref, y_pred)

    time_delta = time.time() - start_time
    print(
        f"n_estimators={n_estimators}, mse={score:.4f}, took: {time_delta:.2f} seconds"
    )

    return n_estimators, score

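* It is the same function as in the sequential example; only the parameter names changed. When you pass an `ObjectRef` as an argument, Ray resolves it to the stored value before the task runs, so the function body still works with regular DataFrames and Series
* You added the `@ray.remote` decorator to specify that this function will be executed as a remote task in a different process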

#### Implement function that runs **parallel** model training

In [None]:
def run_parallel(n_models: int):
    results_ref = [
        train_and_score_model.remote(
            train_set_ref=X_train_ref,
            test_set_ref=X_test_ref,
            train_labels_ref=y_train_ref,
            test_labels_ref=y_test_ref,
            n_estimators=8 + 4 * j,
        )
        for j in range(n_models)
    ]
    return ray.get(results_ref)

You modified the `run_sequential()` function to achieve parallel execution.

**Remote Tasks**

* Calling a function with the `.remote` suffix (as in the code above) returns an `ObjectRef` associated with the computation to be done.
* When you run a remote function (Ray Task), it immediately returns an `ObjectRef` (Object Reference). It is a *promise* of future work (similar to Python futures), meaning that the task is delegated to a worker, and the `ObjectRef` is returned while the task executes in the background. This is an asynchronous operation.
* To access the expected output, you call `ray.get()` (as in the code above) on the `ObjectRef` or a list of `ObjectRef`s. It is a synchronous operation (blocking call). In other words: use `ray.get()` on the returned list of `ObjectRef`s to get remote objects from the object store.

Operation:

```
ray.get([ObjectRef, ObjectRef, ObjectRef, ...])
```

returns a list of `(n_estimators, score)` tuples.

#### Run parallel model training 

In [None]:
%%time

mse_scores = run_parallel(n_models=NUM_MODELS)

Notice the **6x performance gain**:

* Parallel: 10s.
* Sequential: 1min (60s).


*(experiment on the M1 MacBook Pro)*
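The speedup figure is simply the ratio of the two wall times. The numbers below are the ones reported above for one machine, so expect different values on your hardware.

```python
# Wall times reported above, in seconds.
sequential_seconds = 60.0
parallel_seconds = 10.0

speedup = sequential_seconds / parallel_seconds
print(f"Speedup: {speedup:.1f}x")  # Speedup: 6.0x
```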

#### Analyse results

In [None]:
best = min(mse_scores, key=itemgetter(1))
print(f"Best model: mse={best[1]:.4f}, n_estimators={best[0]}")

Training completed, with a **6x performance gain** due to parallel execution.

#### Shutdown Ray runtime

In [None]:
ray.shutdown()

Disconnect the worker, and terminate processes started by `ray.init()`.

### Summary of Part 2: code example

You achieved a significant performance gain by introducing parallel model training. You adapted a sequential model training job to run in parallel by using the Ray Core API.

With Ray, you parallelized training without having to implement orchestration, fault tolerance, or autoscaling components that require specialized knowledge of distributed systems.

#### Key Concepts

* [Tasks](https://docs.ray.io/en/latest/ray-core/key-concepts.html#tasks): remote, stateless Python functions
* [Actors](https://docs.ray.io/en/latest/ray-core/key-concepts.html#actors): remote stateful Python classes
* [Objects](https://docs.ray.io/en/latest/ray-core/key-concepts.html#objects): in-memory, immutable objects or values that can be accessed anywhere in the computing cluster

#### Key API Elements

* `ray.init()` - start the Ray runtime and connect to the Ray cluster
* `@ray.remote` - decorator for functions and classes, specifying that they will be executed as a task (remote function) or an actor (remote class) in a different process
* `.remote` - postfix used to call remote functions and classes. Remote operations are *asynchronous*
* `ray.put()` - put an object in the in-memory object store and return its ID (an object reference). Use this ID to pass the object to any remote function or method call
* `ray.get()` - get a remote object or a list of remote objects from the object store (synchronous operation)

|<img src="../_static/assets/Overview_of_Ray/side_by_side.png" width="100%" loading="lazy">|
|:--|
|Schematic illustration of the code changes needed to create distributed Ray remote tasks.|

# Connect with the Ray community

You can learn and get more involved with the Ray community of developers and researchers:

* [Ray documentation](https://docs.ray.io/en/latest)
* [Official Ray Website](https://www.ray.io/): Browse the ecosystem and use this site as a hub to get the information that you need to get going and building with Ray.
* [Join the Community on Slack](https://forms.gle/9TSdDYUgxYs8SA9e8): Find friends to discuss your new learnings in our Slack space.
* [Use the Discussion Board](https://discuss.ray.io/): Ask questions, follow topics, and view announcements on this community forum.
* [Join a Meetup Group](https://www.meetup.com/Bay-Area-Ray-Meetup/): Tune in to meetups to listen to compelling talks, get to know other users, and meet the team behind Ray.
* [Open an Issue](https://github.com/ray-project/ray/issues/new/choose): Ray is constantly evolving to improve developer experience. Submit feature requests, bug-reports, and get help via GitHub issues.
* [Become a Ray contributor](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html): We welcome community contributions to improve our documentation and Ray framework.

<img src="../_static/assets/Generic/ray_logo.png" width="20%" loading="lazy">