# Ray Core: Remote Classes as Actors, part 2: Tree of Actors

© 2019-2023, Anyscale. All Rights Reserved

## Overview

Ray has a myriad of design patterns for [tasks](https://docs.ray.io/en/latest/ray-core/tasks/patterns/index.html#task-patterns) and [actors](https://docs.ray.io/en/latest/ray-core/actors/patterns/index.html). These patterns allow you to write distributed applications. In this tutorial we'll explore one of these design patterns, commonly used in Ray libraries to scale workloads.

### Tree of Actors Pattern

This pattern is primarily used in the Ray libraries [Ray Tune](https://docs.ray.io/en/latest/tune/index.html), [Ray Train](https://docs.ray.io/en/latest/train/train.html), and [RLlib](https://docs.ray.io/en/latest/rllib/index.html) to train models in parallel or to conduct distributed hyperparameter optimization (HPO).

In a tree of actors pattern, a collection of workers, implemented as Ray actors (or in some cases Ray tasks), is managed by a supervisor actor. For example, you may want to train multiple models, each with a different ML algorithm, at the same time, while being able to inspect each model's state during training. As a methodology for examining simple baseline models rapidly, this pattern helps machine learning engineers quickly build a set of baseline models for comparison.

|<img src="https://technical-training-assets.s3.us-west-2.amazonaws.com/Ray_Core/tree_of_actors.png" width="50%" height="45%">|
|:--|
| Common tree of actors pattern with a supervisor actor launching worker actors|

This pattern facilitates the **Same Data Different Function/Model (SDDF)** scaling pattern. It is popular in [AutoML](https://en.wikipedia.org/wiki/Automated_machine_learning) scenarios, where you may want to train different models, with their respective algorithms, at the same time using the same dataset.

|<img src="https://technical-training-assets.s3.us-west-2.amazonaws.com/Ray_Core/same_data_different_function.png" width="85%" height="75%">|
|:--|
| Same data different function/model is a common scaling pattern|
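
To make the pattern concrete, here is a minimal plain-Python sketch of Same Data Different Function. It uses `concurrent.futures` from the standard library as a stand-in for Ray actors, so it only illustrates the shape of the pattern, not Ray's API. The dataset and the three functions are hypothetical and are not the models trained later in this tutorial.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, median

# Hypothetical dataset shared by every function (the "same data").
data = [3.0, 1.0, 4.0, 1.0, 5.0]

# Hypothetical "models": three different functions applied to the same data.
functions = {
    "mean": mean,
    "median": median,
    "range": lambda xs: max(xs) - min(xs),
}


def run_sddf(dataset, funcs):
    """Run every function on the same dataset concurrently; return {name: result}."""
    with ThreadPoolExecutor(max_workers=len(funcs)) as pool:
        futures = {name: pool.submit(fn, dataset) for name, fn in funcs.items()}
        return {name: future.result() for name, future in futures.items()}


sddf_results = run_sddf(data, functions)
print(sddf_results)
```

In the Ray version built below, each function becomes a worker actor and the supervisor plays the role of `run_sddf`.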

## Learning objectives

In this tutorial, we revisit Ray Actors, dive deeper into their use in scaling patterns, and learn more about:
 * Common Ray Actors patterns used in Ray native libraries for writing distributed Actors
   * Tree of Actors 
   * Same Data Different Functions (SDDF)
 * How to use Actors and ActorPool for Batch Inference

Let's implement a simple example to illustrate this pattern.

Some preliminary imports

In [None]:
import logging
import random
import time
from pprint import pprint
from random import randint

import numpy as np
import pandas as pd
import pyarrow.parquet as pq
import ray
from ray.util.actor_pool import ActorPool

First, let's start Ray…

In [None]:
if ray.is_initialized():
    ray.shutdown()
ray.init(logging_level=logging.ERROR)

## Example 1: Supervisor and worker actor pattern

**Problem**: We want to train multiple small regression models, each with its own machine learning algorithm, on the same dataset, to create a set of baseline models for comparison. In our case, we use the scikit-learn [California housing](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html) dataset. We'll train three regressors: decision tree, random forest, and XGBoost. Each model is trained and evaluated on [mean squared error](https://en.wikipedia.org/wiki/Mean_squared_error) (MSE).
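
As a quick reminder of what the evaluation metric measures, here is a minimal hand-written sketch of MSE: the average of the squared differences between targets and predictions. The helper and the sample numbers are hypothetical and for illustration only; in practice you would more likely call `sklearn.metrics.mean_squared_error`.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical house values (in units of 100k) and model predictions.
y_true = [2.0, 3.0, 5.0]
y_pred = [2.5, 3.0, 4.0]

mse = mean_squared_error(y_true, y_pred)
print(f"MSE: {mse:.4f}")
```

A lower MSE means the predictions sit closer to the targets, which is why the results are later sorted in ascending order.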

To see this pattern used in production at scale, read [Training One Million Machine Learning Models in Record Time with Ray](https://www.anyscale.com/blog/training-one-million-machine-learning-models-in-record-time-with-ray).
Another blog that employs this pattern, training many models, one per feature value such as a zip code or a product SKU, is [Many Models Batch Training at Scale with Ray Core](https://www.anyscale.com/blog/many-models-batch-training-at-scale-with-ray-core).

|<img src="https://technical-training-assets.s3.us-west-2.amazonaws.com/Overview_of_Ray/California_dataset.png" width="75%" loading="lazy">|
|:--|
|`n_samples = 20640`, target is numeric and corresponds to the median house value in units of 100k.|

### Generic model factory utility  

This factory creates three different regression algorithms (random forest, decision tree, and XGBoost), each trained on the same scikit-learn dataset, [California housing](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html).
Each training model returns its MSE score, along with the time to train and relevant parameters.

Each model can be in a particular state during training. The final state is `DONE`.

In [None]:
from model_helper_utils import RFRActor
from model_helper_utils import DTActor
from model_helper_utils import XGBoostActor
from model_helper_utils import RANDOM_FOREST_CONFIGS, DECISION_TREE_CONFIGS, XGBOOST_CONFIGS

class ModelFactory:
    """
    Model factory to create different ML models
    """
    MODEL_TYPES = ["random_forest", "decision_tree", "xgboost"]
    
    @staticmethod
    def create_model(model_name: str) -> ray.actor.ActorHandle:
        if model_name not in ModelFactory.MODEL_TYPES:
            raise ValueError(f"{model_name} not supported")
        if model_name == "random_forest":
            configs = RANDOM_FOREST_CONFIGS
            return RFRActor.remote(configs)
        elif model_name == "decision_tree":
            configs = DECISION_TREE_CONFIGS
            return DTActor.remote(configs)
        else: 
            configs = XGBOOST_CONFIGS
            return XGBoostActor.remote(configs)

### Create an Actor instance as a supervisor
The supervisor creates three worker actors, each with its own ML algorithm and training function.

In [None]:
@ray.remote
class Supervisor:
    def __init__(self):
        # Create three worker actors, one per model type, each with
        # its own training function
        self.worker_models = [ModelFactory.create_model(name) for name in ModelFactory.MODEL_TYPES]

    def work(self):
        # Kick off training on every worker actor; the calls run in parallel
        results = [worker_model.train_and_evaluate_model.remote() for worker_model in self.worker_models]
        
        # Block until all workers finish and return the final results
        return ray.get(results)

### Launch Supervisor's workers
 * create the Supervisor actor
 * launch its workers
 * fetch the final results

In [None]:
supervisor = Supervisor.remote()
results = supervisor.work.remote()
values = ray.get(results)

Confirm that every worker finished by checking that it returned the `DONE` state

In [None]:
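# ray.get() above blocked until every worker returned, so `values` already
# holds the final results. Confirm that each model reports the DONE state.
states = [value["state"] for value in values]
if all(state == "DONE" for state in states):
    print("All models finished training.")
else:
    print(f"Unexpected model states: {states}")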

In [None]:
from operator import itemgetter
sorted_by_mse = sorted(values, key=itemgetter('mse'))
print("\nResults from three training models sorted by MSE ascending order:")
pprint(sorted_by_mse)
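
To see how the ranking step works in isolation, here is a small sketch using hypothetical result dictionaries. The `state` and `mse` keys mirror the ones used above; the `name` key, the model names, and the scores are made up for illustration and are not actual training output.

```python
from operator import itemgetter

# Hypothetical results; the real dictionaries come from the worker actors.
example_values = [
    {"name": "model_a", "state": "DONE", "mse": 0.42},
    {"name": "model_b", "state": "DONE", "mse": 0.25},
    {"name": "model_c", "state": "DONE", "mse": 0.31},
]

# itemgetter("mse") extracts the sort key from each dictionary;
# sorted() is ascending by default, so the lowest MSE comes first.
ranked = sorted(example_values, key=itemgetter("mse"))
best = ranked[0]

print([result["name"] for result in ranked])
print(f"Best model: {best['name']} (MSE={best['mse']})")
```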

### Look at the Ray Dashboard

You should see Actors running as processes on the worker nodes
 * Supervisor
 * Each model actor: RFRActor, DTActor, XGBoostActor
 
Also, click on `Actors` to view more metrics and data on individual Ray Actors

### Exercise
Change `n_estimators` or `max_depth` in [model_helper_utils.py](model_helper_utils.py) and retrain the models to see how the results change.

### Solution

In [None]:
# DECISION_TREE_CONFIGS = {"max_depth": 15,
#                          "name": "decision_tree"}

# RANDOM_FOREST_CONFIGS = {"n_estimators": 150,
#                         "name": "random_forest"}

# XGBOOST_CONFIGS = {"max_depth": 10,
#                    "n_estimators": 150,
#                    "lr": 0.1,
#                    "eta": 0.3,
#                    "colsample_bytree": 1,
#                    "name": "xgboost"}

### Recap
To sum up, we went through implementing a tree of actors design pattern, in which a Supervisor launched worker models. Each model, in our case, is a separate regression model training on the same data. This pattern facilitates the **Same Data Different Functions (SDDF)** scaling pattern, a common machine learning workload.

For modularity, we used `model_helper_utils`, where all the model-related utility code is implemented. Of the three regression models trained in parallel on the same dataset, XGBoost seems to have fared best, with the lowest MSE. This gives us an initial set of baseline models to experiment with further using [hyperparameter optimization](https://en.wikipedia.org/wiki/Hyperparameter_optimization). For that, turn to [Ray Train](https://docs.ray.io/en/latest/train/train.html) and [Ray Tune](https://docs.ray.io/en/latest/tune/index.html), which are part of [Ray AIR](https://docs.ray.io/en/latest/ray-air/getting-started.html).

### Homework Exercise (Optional):

Add another linear regression model and train four models. 

**HINTS**:
 * modify `model_helper_utils` to create a new subclass of `ActorCls`
 * add new model configs for the new model
 * modify the `ModelFactory` class to instantiate this new model
 * update the `Supervisor` to include the new model in its training
 
 You can use one of the linear models from this [array of linear models](https://www.kaggle.com/code/faressayah/practical-introduction-to-10-regression-algorithm).
 
Have fun 😜!

## Example 2: Actor-based batch inference

In our first tutorial on [Ray Tasks](./ex_01_remote_funcs.ipynb) we covered a use case to parallelize batch inference. In short, we used the **Different Data Same Function (DDSF)** pattern. Here we will use the same pattern but with Ray Actors and [ActorPool](https://docs.ray.io/en/latest/ray-core/actors/actor-utils.html?highlight=ActorPool), which are stateful and the preferred way to do batch inference. The elements remain largely the same, with a few modifications.

**Input dataset**: This is a large collection of observations to generate predictions for. The data is usually stored in an external storage system such as S3, HDFS, or a database, across many files.

**ML model**: This is a trained ML model that is usually also stored in an external storage system or in a model store.

**Predictions**: These are the outputs of applying the ML model to the observations. Predictions are usually written back to the storage system. Here, instead of tasks doing the predictions, we employ a pool of Actors.

For the purposes of this tutorial, we make the following provisions:
 * create a dummy model that returns some fake prediction
 * use real-world NYC taxi data to provide a large dataset for batch inference
 * create a pool of actors and submit each shard to the pool
 * return the predictions instead of writing them back to disk

As an example of the scaling pattern called **Different Data Same Function (DDSF)**, also known as the **Distributed Data Parallel** (DDP) paradigm, our function in this diagram is the pretrained **model**, and the data is split and distributed as **shards**.

|<img src="https://technical-training-assets.s3.us-west-2.amazonaws.com/Ray_Core/different_data_same_function.png" width="75%" height="35%">|
|:--|
|Different data same function is another scaling pattern|
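
Before the Ray version, here is a minimal plain-Python sketch of Different Data Same Function. A standard-library thread pool stands in for the pool of actors, so this shows only the shape of the pattern and not Ray's API. The shards are hypothetical lists of passenger counts, and `predict` is a hypothetical stand-in that mimics the fake model used below.

```python
from concurrent.futures import ThreadPoolExecutor


def predict(shard):
    """Hypothetical stand-in model: flag rides with 2 or more passengers."""
    return [passenger_count >= 2 for passenger_count in shard]


# Hypothetical shards of passenger counts (the "different data").
shards = [[1, 2, 3], [1, 1], [4, 2, 1, 5]]

# A fixed-size pool applies the same function to every shard;
# pool.map returns results in the same order as the inputs.
with ThreadPoolExecutor(max_workers=2) as pool:
    predictions = list(pool.map(predict, shards))

# Summing booleans counts the True values in each shard.
tips_per_shard = [sum(shard_predictions) for shard_predictions in predictions]
print(tips_per_shard)
```

Note that there are more shards than workers: as in an actor pool, a worker picks up the next shard once it finishes its current one.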

In [None]:
NUM_ACTORS = 6             # Increase the number of actors to scale out
NUM_SHARD_FILES = 12       # Total number of shard files to submit to the actor pool

In [None]:
# Our load model closure remains the same
def load_trained_model():
    # A fake model that predicts whether tips were given based on number of passengers in the taxi cab.
    def model(batch: pd.DataFrame) -> pd.DataFrame:
        # Some model payload so Ray copies the model in the shared plasma store to tasks scheduled across nodes.
        model.payload = np.arange(10, 10_000, dtype=float)
        #model.payload = np.arange(100, 100_000_000, dtype=float)
        model.cls = "regression"
        
        # give a tip if 2 or more passengers
        predict = batch["passenger_count"] >= 2 
        return pd.DataFrame({"score": predict})
    
    return model    

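Create a Ray Actor that holds the model and does the prediction. When an object reference is passed as a top-level argument to the constructor, Ray resolves it to the underlying object, so `self.model` is the callable model itself.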

In [None]:
@ray.remote
class NYCBatchPredictor:
    def __init__(self, model):
        self.model = model

    def predict(self, shard_path):
        # read each shard and convert to pandas
        df = pq.read_table(shard_path).to_pandas()
        
        # do the inference with our model and return the result
        result = self.model(df)
        return result

1. Get our trained model instance
2. Store it into the plasma object store

In [None]:
model = load_trained_model()
model_ref = ray.put(model)

Fetch our NYC taxi shard files

In [None]:
# Build the paths to our NYC shard files (~2GB in total)
input_shard_files = [
    f"s3://anonymous@air-example-data/ursa-labs-taxi-data/downsampled_2009_full_year_data.parquet"
    f"/fe41422b01c04169af2a65a83b753e0f_{i:06d}.parquet"
    for i in range(NUM_SHARD_FILES) ]

In [None]:
input_shard_files
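
The shard paths above rely on two Python features: adjacent string literals are concatenated into one string, and the `{i:06d}` format spec zero-pads the index to six digits. A minimal sketch with a hypothetical prefix (not the real bucket path) shows the result:

```python
# Hypothetical prefix and file stem, used only to illustrate the formatting.
example_paths = [
    "bucket/dir"                  # first literal
    f"/part_{i:06d}.parquet"      # adjacent literal, concatenated with the first
    for i in range(2)
]
print(example_paths)
# → ['bucket/dir/part_000000.parquet', 'bucket/dir/part_000001.parquet']
```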

1. Create `NUM_ACTORS` (six) Actor instances, each initialized with the same model reference
2. Create a pool of these six actors

We use the Ray actor pool utility [ActorPool](https://docs.ray.io/en/latest/ray-core/actors/actor-utils.html?highlight=ActorPool#actor-pool).

See the [ActorPool API](https://docs.ray.io/en/latest/ray-core/package-ref.html?highlight=ActorPool#ray-util-actorpool) package reference.

In [None]:
actors = [NYCBatchPredictor.remote(model_ref) for _ in range(NUM_ACTORS)]
actors_pool = ActorPool(actors)

In [None]:
# Submit each shard to the pool of actors for batch inference
# The API is similar to Python's multiprocessing Pool and Ray's multiprocessing Pool APIs

for shard_path in input_shard_files:
    # Submit file shard for prediction to the pool
    actors_pool.submit(lambda actor, shard: actor.predict.remote(shard), shard_path)

In [None]:
# Iterate over the finished actors' predictions
while actors_pool.has_next():
    r = actors_pool.get_next()
    print(f"Predictions dataframe size: {len(r)} | Total score for tips: {r['score'].sum()}")

### Recap

What we have demonstrated above is a tree of actors design pattern, commonly used in Ray for writing distributed applications. In particular, Ray's native libraries such as Train, Tune, Serve, and RLlib, and [Ray AIR's](https://docs.ray.io/en/latest/ray-air/getting-started.html) components, use it for distributed training and tuning trials.

Additionally, we implemented the DDSF scaling pattern with an actor-based predictor, using the `ActorPool` utility class instead of tasks.

Task-based batch inference has an overhead cost that can be significant if the model is large, since each task has to fetch the model from the driver's plasma store. We can optimize this by using Ray actors, which fetch the model just once and reuse it for all predictions assigned to the same actor in the pool.
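
A rough back-of-the-envelope count, using this tutorial's settings, illustrates the saving. This is a simplified counting model and an assumption, not a measurement: it ignores any node-local caching in the object store and simply counts how many times the model is handed to a worker.

```python
num_actors = 6         # same value as NUM_ACTORS above
num_shard_files = 12   # same value as NUM_SHARD_FILES above

# Simplified counting model (an assumption, not a measurement):
# a task resolves the model reference on every invocation, one per shard,
# while an actor receives the model once, in its constructor.
model_loads_with_tasks = num_shard_files
model_loads_with_actors = num_actors

loads_saved = model_loads_with_tasks - model_loads_with_actors
print(f"Tasks: {model_loads_with_tasks} loads | Actors: {model_loads_with_actors} loads | Saved: {loads_saved}")
```

The gap widens as the number of shards grows while the number of actors stays fixed.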

In [None]:
ray.shutdown()

### Homework

1. Read the references below.

## References

 * [Writing your First Distributed Python Application with Ray](https://www.anyscale.com/blog/writing-your-first-distributed-python-application-with-ray)
 * [Using and Programming with Actors](https://docs.ray.io/en/latest/actors.html)
 * [Ray Asynchronous and Threaded Actors: A way to achieve concurrency](https://medium.com/@2twitme/ray-asynchronous-and-threaded-actors-a-way-to-achieve-concurrency-ad9f86145f72)
 * [Training One Million Machine Learning Models in Record Time with Ray](https://www.anyscale.com/blog/training-one-million-machine-learning-models-in-record-time-with-ray)
 * [Many Models Batch Training at Scale with Ray Core](https://www.anyscale.com/blog/many-models-batch-training-at-scale-with-ray-core)