# Ray API Demo: Parallel Training

Demo created by [Peter Schafhalter](https://github.com/pschafhalter/)

## Training Function

Trains a classifier with a given hyperparameter value and returns its accuracy on held-out test data.

Technical details:
- The classifier is a multi-layer perceptron.
- The hyperparameter (`alpha`) is the L2 regularization parameter.
- The value of alpha affects the [bias-variance tradeoff](https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff), which impacts whether the model underfits or overfits.
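For reference, scikit-learn documents `alpha` as the strength of an L2 penalty on the network weights, divided by the number of samples when added to the loss. Assuming the default log-loss objective, and writing $n$ for the number of training samples, $\ell$ for the per-sample log-loss, and $W_l$ for the weight matrix of layer $l$ (intercepts are not penalized), our reading of that documentation gives roughly:

$$
\mathcal{L}(W) = \frac{1}{n}\sum_{i=1}^{n} \ell\left(y_i, \hat{y}_i\right) + \frac{\alpha}{2n}\sum_{l} \lVert W_l \rVert_F^2
$$

A larger $\alpha$ pushes the weights toward zero (a simpler model that may underfit); a smaller $\alpha$ lets the network fit the training data more closely (a more flexible model that may overfit).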

In [None]:
# Import the scikit-learn machine learning library
from sklearn.metrics import accuracy_score
from sklearn.neural_network import MLPClassifier

def train(alpha, train_x, test_x, train_y, test_y):
    # Instantiate a model with the given value of alpha
    classifier = MLPClassifier(alpha=alpha, max_iter=10000)
    # Train the model on the training data
    classifier.fit(train_x, train_y)
    # Evaluate the model on the test data and return the model's accuracy score
    predicted_y = classifier.predict(test_x)
    return accuracy_score(test_y, predicted_y)

## Training with Hyperparameters

- Try different values of alpha to find the model with the best test accuracy.
- Without parallelization, this is slow: each model is trained one after another.

In [None]:
# Imports for profiling and visualization
from IPython.display import HTML
import time
import tqdm

from utils import ray_get_with_progress_bar

# Imports for generating the dataset
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_moons

# Generate the dataset
x, y = make_moons(noise=0.8, random_state=0)

# Set hyperparameters: 16 values of alpha, from 1 down to 1e-15
trials = [10**-i for i in range(16)]

In [None]:
# Train with different values for alpha
results = []
start = time.time()
for alpha in tqdm.tqdm(trials):
    results.append(train(alpha, *train_test_split(x, y)))
serial_time = time.time() - start

# Print results
HTML(f"<h3>Best accuracy: {max(results):.2f}</h3><h3>Total time: {serial_time:.2f} seconds</h3>")

## Parallel Training with Ray

1. Import and start Ray with `ray.init()`
2. Add the `@ray.remote` decorator to `train`
3. Replace calls to `train(...)` with `train.remote(...)`, which immediately return object references (futures) instead of results
4. Retrieve the resulting Python objects with `results = ray.get(results)`

In [None]:
import ray
ray.shutdown()
ray.init()

# cluster_resources() reports memory in bytes in recent Ray releases, so convert to GB for display
resources = ray.cluster_resources()
HTML(f"""<h3>Started Ray locally with:</h3>
<h3>{resources["CPU"]:.0f} CPUs</h3>
<h3>{resources["memory"] / 1e9:.1f} GB of memory available</h3>
<h3>{resources["object_store_memory"] / 1e9:.1f} GB of object store memory</h3>
""")

In [None]:
@ray.remote
def train(alpha, train_x, test_x, train_y, test_y):
    classifier = MLPClassifier(alpha=alpha, max_iter=10000)
    classifier.fit(train_x, train_y)
    predicted_y = classifier.predict(test_x)
    return accuracy_score(test_y, predicted_y)

In [None]:
# Train with different values for alpha
results = []
start = time.time()
for alpha in trials:
    # Call train with train.remote(...)
    results.append(train.remote(alpha, *train_test_split(x, y)))

# Get results
# results = ray.get(results)   # Equivalent to the line below, but without a progress bar
results = ray_get_with_progress_bar(results)

parallel_time = time.time() - start

# Print results
HTML(f"""<h3>Best accuracy: {max(results):.2f}</h3>
<h3>Total time: {parallel_time:.2f} seconds ({serial_time / parallel_time:.2f}x faster)</h3>
""")

## Parallel Training with Ray on a Cluster

1. Launch a Ray cluster on AWS with `ray up cluster_config.yaml`
2. SSH into the head node (for example with `ray attach cluster_config.yaml`)
3. Replace `ray.init()` with `ray.init(address="...")`

The Ray autoscaler can add and remove nodes as the workload changes. At the time of writing, it integrates with AWS, GCP, Kubernetes, and private clusters.

In [None]:
# Connect to the cluster
CLUSTER_ADDRESS = None  # Set this to run on the cluster, e.g. "auto" when running on the head node
ray.shutdown()
context = ray.init(address=CLUSTER_ADDRESS)

# Override this in case of SSH forwarding from the cluster
url = context.dashboard_url
if url and not url.startswith("http"):
    url = f"http://{url}"

# cluster_resources() reports memory in bytes in recent Ray releases, so convert to GB for display
resources = ray.cluster_resources()
HTML(f"""<h3>Connected to Ray cluster with:</h3>
<h3>{resources["CPU"]:.0f} CPUs</h3>
<h3>{resources["memory"] / 1e9:.1f} GB of memory available</h3>
<h3>{resources["object_store_memory"] / 1e9:.1f} GB of object store memory</h3>
<br>
<a href='{url}'>Dashboard</a>
""")

In [None]:
# Train with different values for alpha
results = []
start = time.time()
for alpha in trials:
    # Call train with train.remote(...)
    results.append(train.remote(alpha, *train_test_split(x, y)))

# Get results
# results = ray.get(results)   # Equivalent to the line below, but without a progress bar
results = ray_get_with_progress_bar(results)

cluster_time = time.time() - start

# Print results
HTML(f"""<h3>Best accuracy: {max(results):.2f}</h3>
<h3>Total time: {cluster_time:.2f} seconds ({serial_time / cluster_time:.2f}x faster)</h3>""")

In [None]:
# Train with many more values for alpha: 100 values, log-spaced over the same range as `trials` (1 down to 1e-15)
cluster_trials = [10 ** (-15 * i / 99) for i in range(100)]
results = []
start = time.time()
for alpha in cluster_trials:
    # Call train with train.remote(...)
    results.append(train.remote(alpha, *train_test_split(x, y)))

# Get results
# results = ray.get(results)   # Equivalent to the line below, but without a progress bar
results = ray_get_with_progress_bar(results)

cluster_large_workload_time = time.time() - start

# Print results
HTML(f"""<h3>Best accuracy: {max(results):.2f}</h3>
<h3>Total time: {cluster_large_workload_time:.2f} seconds ({cluster_large_workload_time / serial_time:.2f}x the serial time)</h3>
<h3>Total trials: {len(cluster_trials)}\t({round(len(cluster_trials) / len(trials))}x more trials)</h3>""")

## Summary

In [None]:
HTML(f"""<h3>Time without Ray:\t{serial_time:.2f} seconds</h3>
<h3>Time with Ray:\t{parallel_time:.2f} seconds ({serial_time / parallel_time:.2f}x faster)</h3>
<h3>Time with Ray on Cluster:\t{cluster_time:.2f} seconds ({serial_time / cluster_time:.2f}x faster)</h3>
<h3>Ran {round(len(cluster_trials) / len(trials))}x more trials on cluster in {cluster_large_workload_time:.2f} seconds ({cluster_large_workload_time / serial_time:.2f}x the serial time)</h3>
""")

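Note: speedup does not scale exactly with the number of cores because of overhead such as task scheduling and data serialization.

Speedup approaches linear as the number of tasks grows or as individual tasks run longer, since the fixed overhead then makes up a smaller fraction of the total runtime.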
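The effect of overhead on speedup can be illustrated with a toy cost model. This is a sketch with hypothetical numbers, not measurements from Ray: it assumes a one-time startup cost, a fixed per-task overhead, and tasks that run in "waves" of `num_cores`.

```python
import math


def modeled_speedup(num_tasks, task_seconds, num_cores, per_task_overhead, startup_overhead):
    """Toy cost model (hypothetical numbers, not Ray measurements).

    Serial run: tasks execute back to back with no overhead.
    Parallel run: a one-time startup cost, then tasks execute in waves of
    num_cores, each task paying a fixed per-task overhead. With many tasks the
    speedup approaches num_cores * task_seconds / (task_seconds + per_task_overhead),
    which in turn approaches num_cores as tasks get longer.
    """
    serial = num_tasks * task_seconds
    waves = math.ceil(num_tasks / num_cores)
    parallel = startup_overhead + waves * (task_seconds + per_task_overhead)
    return serial / parallel


cores = 8  # hypothetical machine
print(round(modeled_speedup(16, 1.0, cores, 0.05, 1.0), 2))   # 5.16: 16 short tasks
print(round(modeled_speedup(16, 10.0, cores, 0.05, 1.0), 2))  # 7.58: same count, 10x longer tasks
print(round(modeled_speedup(160, 1.0, cores, 0.05, 1.0), 2))  # 7.27: 10x more short tasks
```

Both longer tasks and more tasks move the modeled speedup closer to the core count, because the startup cost is amortized and the per-task overhead shrinks relative to useful work.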