# Overview of Ray

<img src="../_static/assets/Generic/ray_logo.png" width="20%" loading="lazy">

## About this notebook

### Is it right for you?

This is an introductory notebook that gives a broad overview of the Ray project. It is right for you if:
* you are new to Ray and looking for a project primer,
* you are interested in how you can use Ray, a Python-first distributed computing library, to scale your Python applications and accelerate machine learning workloads.

### Prerequisites

For this notebook we assume:

* practical Python and machine learning experience,
* no prior experience with Ray or distributed computing.

### What will you learn?

Upon completion of this notebook, you will know:

* what Ray is and why it matters,
* Ray's key features,
* the three layers of the Ray universe: Core, native libraries, and integrations and ecosystem,
* example Ray use cases and workloads,
* what to do next to start using it.

### What will you do?

In this notebook you will both learn about Ray and run illustrative code examples that will give you a better feel for Ray.

## Part 1: Ray project

<img src="../_static/assets/Overview_of_Ray/ray_project.png" width="70%" loading="lazy">

*Ray is one of the leading open source ML projects. Date accessed: Nov 2, 2022.*

### Introduction

#### What is Ray?

[Ray](https://www.ray.io/) is an open-source unified compute framework that makes it easy to scale AI and Python workloads. Thanks to its Python-first approach, ML engineers can parallelize Python programs on a laptop, cluster, cloud, Kubernetes, or on-premises hardware.

Ray automatically handles all aspects of distributed execution including orchestration, scheduling, fault tolerance, and auto-scaling so that you can scale your apps without becoming a distributed systems expert.

With a rich ecosystem of libraries and integrations with many important data science tools, Ray lowers the effort needed to scale compute-intensive workloads.

#### A bit of project history
Distributed computing is hard. Many software systems require resources that far exceed what a single server can provide. Even if one server were enough, modern systems need to be failsafe and provide features like high availability. That means your applications might have to run on multiple machines, or even datacenters, just to make sure they're running reliably.

Distributed computing is also becoming increasingly relevant for machine learning. OpenAI has shown exponential growth in the compute needed to train AI models in their analysis ["AI and Compute"](https://openai.com/blog/ai-and-compute/). The operations needed for the AI systems in their study have been doubling every 3.4 months since 2012.
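
To get a feel for what a 3.4-month doubling time means, here is a quick back-of-the-envelope calculation. The helper name `growth_factor` is ours, and the only input taken from the text is the 3.4-month figure; treat it as a rough illustration rather than a forecast.

```python
# Back-of-the-envelope check of the 3.4-month doubling time quoted above.
DOUBLING_MONTHS = 3.4


def growth_factor(months: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Return how many times compute multiplies over `months`."""
    return 2 ** (months / doubling_months)


one_year = growth_factor(12)
print(f"Growth over 12 months: about {one_year:.1f}x")
```

In other words, a doubling time this short implies more than a tenfold increase per year, which is far more than a single machine can absorb.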

<img src="../_static/assets/Overview_of_Ray/project_history.jpeg" width="70%" loading="lazy">

This context drove researchers to begin building solutions that simplify running code on clusters without having to think about how to orchestrate individual machines. Ray was developed at the UC Berkeley [RISELab](https://rise.cs.berkeley.edu/), the successor to the [AMPLab](https://amplab.cs.berkeley.edu/about/), which created [Apache Spark](https://spark.apache.org/) and whose researchers went on to found [Databricks](https://databricks.com/). [Anyscale](https://www.anyscale.com/), the company behind Ray, was founded by Ray's creators to build a managed Ray platform and offers hosted solutions for Ray applications.


### Ray Characteristics

<img src="../_static/assets/Overview_of_Ray/ray_characteristics.jpeg" width="70%" loading="lazy">

#### Python first approach

Ray is a Python library that enables ML practitioners and Python developers to build distributed applications. Ray exposes a concise, easy-to-use API. Its core library, which enables parallel execution, has only three key abstractions:

1. Task: a remote, stateless Python function,
1. Actor: a remote, stateful Python class,
1. Object: an in-memory, immutable object or value that can be accessed anywhere in the computing cluster.

You will learn more about these abstractions in the Ray Core tutorials.

#### Simple and Flexible API

**Ray Core**, the foundation for the whole ecosystem, provides a minimalist API that enables distributed computing. With just a few methods you can start building distributed apps.

* `ray.init()` - start a Ray cluster or connect to an existing one.
* `@ray.remote` - a decorator for functions and classes specifying that they will be executed as a task (remote function) or actor (remote class) in a different process.
* `.remote()` - suffix used to call remote functions and instantiate remote classes. Remote operations are *asynchronous*.
* `ray.put()` - put an object in the in-memory object store and return a reference to it. Use this reference to pass the object to any remote function or method call.
* `ray.get()` - get a remote object or a list of remote objects from the object store.

*(In the second part of this notebook you will see illustrative examples for some of these methods.)*

Native libraries and ecosystem libraries, in turn, provide higher-level APIs that cater to more domain-specific use cases. **Ray AI Runtime (AIR)** is a scalable and unified toolkit for ML applications. Ray AIR enables Python developers and ML engineers to scale individual workloads, end-to-end workflows, and popular ecosystem frameworks, all in the familiar Python programming language.

#### Scalability

Ray allows ML practitioners to use large clusters in an easy, productive, and resource-efficient way. Fundamentally, Ray treats the entire cluster as a single, unified pool of resources and takes care of mapping compute Tasks/Actors onto that pool. By doing so, Ray largely eliminates non-scalable factors in the system. Examples include: [how Instacart uses Ray to power their large scale fulfillment ML pipeline](https://www.anyscale.com/ray-summit-2022/agenda/sessions/130), [how OpenAI trains their largest models](https://twitter.com/anyscalecompute/status/1562136159135973380), and [how companies like HuggingFace and Cohere use Ray Train for scaling model training](https://docs.ray.io/en/latest/train/train.html).

Ray's [autoscaler](https://docs.ray.io/en/latest/cluster/key-concepts.html#autoscaling) implements automatic scaling of Ray clusters based on the resource demands of an application while maximizing utilization and minimizing costs. The autoscaler will increase worker nodes when the Ray workload exceeds the cluster's capacity. Whenever worker nodes sit idle, the autoscaler will remove them.

#### Support for heterogeneous hardware

One of Ray's key properties is native support for heterogeneous hardware: developers can specify resource requirements when defining or instantiating a Task or Actor. For example, a developer can specify in the same application that one Task needs 1 CPU, while an Actor requires 2 CPUs and 1 Nvidia A100 GPU.

<img src="../_static/assets/Overview_of_Ray/het_hardware.png" width="70%" loading="lazy">

An illustrative example is the [production deep learning pipeline at Uber](https://www.anyscale.com/ray-summit-2022/agenda/sessions/215). A heterogeneous setup of 8 GPU nodes and 9 CPU nodes improves the pipeline throughput by 50%, while substantially saving capital cost, compared with the legacy setup of 16 GPU nodes.

<img src="../_static/assets/Overview_of_Ray/uber.png" width="70%" loading="lazy">

### Ray libraries

<img src="../_static/assets/Overview_of_Ray/map.png" width="70%" loading="lazy">

Here, we will discuss the three *layers* that make up Ray: Ray Core, native libraries (Ray AI Runtime), and integrations and ecosystem.

#### Ray Core

Ray Core is a low-level, distributed computing framework for Python with a concise core API, and you can think of it as the foundation that Ray's data science libraries (Ray AIR) and third-party integrations (Ray Ecosystem) are built on. This simple and general-purpose Python library enables every developer to easily build scalable, distributed systems that run on your laptop, cluster, cloud or Kubernetes.

#### Ray AI Runtime

Ray AI Runtime (AIR) is a unified set of libraries built on top of Ray for distributed data processing, model training, tuning, model serving, and reinforcement learning, all in Python. AIR provides simple scalable machine learning for individual workloads and end-to-end workflows, bringing together an ever-growing ecosystem of integrations with your favorite machine learning frameworks.

<img src="../_static/assets/Introduction_to_Ray_AIR/e2e_air.png" width="70%" loading="lazy">

Each of the five native libraries that Ray AIR wraps handles one of the ML-specific tasks outlined above. Because this abstraction layer is built on top of Ray Core, it is distributed by nature.

1. [Ray Data](https://docs.ray.io/en/latest/data/dataset.html): scalable, framework-agnostic loading and transforming raw data across training and prediction
2. [Ray Train](https://docs.ray.io/en/latest/train/train.html): distributed multi-node model training with fault tolerance that integrates with your favorite training libraries
3. [Ray Tune](https://docs.ray.io/en/latest/tune/index.html): scales experiment execution and hyperparameter tuning to optimize model performance
4. [Ray Serve](https://docs.ray.io/en/latest/serve/index.html): deploys your model for online inference, with optional microbatching to improve performance
5. [Ray RLlib](https://docs.ray.io/en/latest/rllib/index.html): distributed reinforcement learning workloads that integrate with the other Ray AIR libraries above

#### Integrations and ecosystem libraries

Ray integrates with a [growing ecosystem](https://docs.ray.io/en/latest/ray-overview/ray-libraries.html) of the most popular Python and machine learning libraries and frameworks that you may already be familiar with. Instead of trying to create new standards, Ray allows you to scale existing workloads by unifying tools in a common interface. This interface enables you to run tasks in a distributed way, a property most of the respective backends don't have, or not to the same extent.

For example, Ray Data is backed by Arrow and comes with many integrations to other frameworks, such as Spark and Dask. Ray Train and RLlib are backed by the full power of TensorFlow and PyTorch. Ray Tune supports algorithms from practically every notable HPO tool available, including Hyperopt, Optuna, Nevergrad, Ax, SigOpt, and many others. Ray Serve can be used with frameworks such as FastAPI, Gradio, and Streamlit.


### Ray use cases

#### ML practitioner use cases

#### ML platform engineer use cases


## Part 2: Hands-on exercise

Notes
- Example: Cross-Validation on Housing Data
    - Sequential Implementation
    - Distributed Implementation with Ray


<img src="../_static/assets/Overview_of_Ray/map.png" width="70%" loading="lazy">

*Figure 1*

**Context**

Today's artificial intelligence (AI) applications require enormous amounts of training data, and machine learning (ML) models tend to grow over time. From consumer-facing products like recommendation systems and photo editing software to enterprise-level use cases like reducing downtime in manufacturing and order-fulfillment optimization, ML systems have become so complex and infrastructure intensive that developers have no option but to distribute execution across multiple machines. However, distributed computing is hard. It requires specialized knowledge of how to orchestrate clusters of computers and schedule tasks efficiently, and a distributed system must provide features like fault tolerance when a component fails, high availability to minimize service interruption, and autoscaling to reduce waste.

As a data scientist, machine learning practitioner, developer, or engineer, your contribution may center on building data processing pipelines, training complicated models, running efficient hyperparameter experiments, creating simulations of agents, and/or serving your application to users. In each case, you need to choose a distributed system to support the task, but you don't want to learn a different programming language or toss out your existing toolbox. This is where Ray comes in.

# Part One: Ray Core
---

<img src="../_static/assets/Overview_of_Ray/ray_core.png" width="70%" loading="lazy">

*Figure 2*

Ray Core is a low-level, distributed computing framework for Python with a concise core API, and you can think of it as the foundation that Ray's data science libraries (Ray AIR) and third-party integrations (Ray Ecosystem) are built on. This simple and general-purpose Python library enables every developer to easily build scalable, distributed systems that run on your laptop, cluster, cloud or Kubernetes.

A key strength lies in Ray Core's simple primitives: Tasks, Actors, and Objects.

**Tasks:** Ray enables you to designate functions to be executed asynchronously on separate Python workers. These asynchronous Ray functions, *tasks*, can specify their resource requirements in terms of CPUs, GPUs, and custom resources, which the cluster scheduler uses to distribute tasks for parallelized execution.

**Actors:** What tasks are to functions, actors are to classes. An actor is a stateful worker and methods of the actor are scheduled on that specific worker and can access and mutate the state of that worker. Like tasks, actors support CPU, GPU, and custom resource requirements.

**Objects:** In Ray, tasks and actors create and compute on objects, and we refer to these objects as *remote objects* because they can be stored anywhere in a Ray cluster. We use *object references* to refer to them, and they are cached in Ray's distributed shared memory *object store*.

Ray sets up and manages clusters of computers so that you can run distributed tasks on them. Let's take some time to practice these key concepts with a hands-on example.

### Housing Prices with `sklearn`
***
Now that we've covered tasks, objects, and actors, let's take a look at how we can apply Ray Core's flexible and simple API to scale a bare-bones version of a common ML task: cross-validation.

Here, we have the [California Housing](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html) dataset with 20,640 samples and eight features: `[MedInc, HouseAge, AveRooms, AveBedrms, Population, AveOccup, Latitude, Longitude]`. The target is the median house value of each district (`MedHouseVal`). Given that we want to use a linear regression model, we want to assess how the results will generalize to an unseen, independent dataset, say, new housing data coming in this year. To do this, we can use cross-validation, a model validation technique that repeatedly resamples different portions of the data to train and test a model. After we conduct these trials, we can average the scores to get an estimate of the model's predictive performance.

However, training the same model multiple times on different subsets of a dataset can take a long time, especially if you're working with a much more complex model and larger dataset. Pictured below in Figure 3, the sequential approach trains each model one after another in a series.

<img src="../_static/assets/Overview_of_Ray/sequential_timeline.png" width="70%" loading="lazy">

*Figure 3*

In this example, we will first implement the sequential approach, then improve it by distributing training with Ray Core, and finally compare the code differences to highlight how minimal the changes are.

#### Import Relevant Packages

In [None]:
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics

num_trials = 100
X, y = fetch_california_housing(return_X_y=True, as_frame=True)

#### Train 100 Models Sequentially
Here, we will define a function, `sequential()`, that randomly splits our housing dataset into training and testing subsets (in the style of Monte Carlo cross-validation, where each split is drawn without replacement but the same samples can land in the test set in more than one round). It then fits a model, generates predictions, and returns the R-squared score (closer to 1 = better performance, closer to 0 = worse performance).

Then, we'll train 100 models on these random splits, one after another, and finally print out the average of the rounds.
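
For reference, the R-squared score (coefficient of determination) that `metrics.r2_score` reports compares the model's squared error against that of always predicting the mean. With $y_i$ the true test targets, $\hat{y}_i$ the predictions, and $\bar{y}$ the mean of the true test targets:

$$
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
$$

A score of 1 means perfect predictions, 0 means the model does no better than predicting $\bar{y}$ everywhere, and the score can even be negative for a model that does worse than that.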

In [None]:
%%time

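def sequential():
    # Draw a fresh random 80/20 train/test split on every call.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    model = LinearRegression()

    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    # R-squared of the predictions on the held-out split.
    r2 = metrics.r2_score(y_test, predictions)

    return r2

# Despite the name, this list collects R-squared scores, one per trial.
errors_seq = []

for i in range(num_trials):
    errors_seq.append(sequential())

print(sum(errors_seq) / num_trials)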
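
As an aside, scikit-learn ships this resampling scheme as `ShuffleSplit`, which can be combined with `cross_val_score`. The sketch below is only for comparison with the hand-written loop; it uses synthetic data from `make_regression` (our stand-in, so nothing needs to be downloaded) and 10 splits instead of 100.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score

# Synthetic stand-in data so the sketch runs without downloading anything.
X_demo, y_demo = make_regression(
    n_samples=500, n_features=8, noise=10.0, random_state=0
)

# 10 independent random 80/20 splits, the same scheme as the loop above.
splitter = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)

# With no explicit scoring argument, a regressor is scored with R-squared.
scores = cross_val_score(LinearRegression(), X_demo, y_demo, cv=splitter)
print(len(scores), scores.mean())
```

We keep the explicit loop in this notebook because it makes the one-function-call-per-trial structure obvious, which is exactly what we will hand over to Ray next.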


#### From Sequential -> Parallel

We just trained 100 linear regression models in a series and averaged their R-squared values in about 1 second. Let's now leverage Ray to train these models in parallel (where multiple tasks may happen simultaneously) and compare the runtime. In Figure 4, you can visually inspect the difference where the scheduler assigns each available worker (in this timeline, we chose `n = 4` workers) a task. Scheduling itself carries nontrivial overhead from communication between workers and other cluster management, so very short tasks benefit less from being distributed than long-running ones.

<img src="../_static/assets/Overview_of_Ray/distributed_timeline.png" width="70%" loading="lazy">

*Figure 4*
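
To put rough numbers on the two timelines, here is a deliberately idealized model. It is an assumption of this sketch, not how Ray schedules or measures anything: tasks run in waves of `num_workers`, and each task pays a fixed, made-up overhead.

```python
import math


def estimated_runtime(num_tasks, task_seconds, num_workers, overhead_per_task=0.0):
    """Idealized wall-clock estimate: tasks run in waves of `num_workers`."""
    waves = math.ceil(num_tasks / num_workers)
    return waves * (task_seconds + overhead_per_task)


# Hypothetical numbers: 100 tasks of 10 ms each, 2 ms of overhead when distributed.
sequential_estimate = estimated_runtime(100, 0.01, num_workers=1)
parallel_estimate = estimated_runtime(100, 0.01, num_workers=4, overhead_per_task=0.002)
print(f"sequential ~{sequential_estimate:.2f}s, parallel ~{parallel_estimate:.2f}s")
```

The model shows why the speedup with 4 workers stays below 4x once overhead is included, and why the gap narrows further as individual tasks get shorter.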

With just a few code changes, we will modify our existing Python program to distribute it among *n* number of workers. Of course, this is a lightweight example, but it's illustrative of the kind of user experience you get with Ray Core's lean API.

Notice in Figure 5 that we need to use four API calls:

1. `ray.init()` - initialize a Ray context
2. `@ray.remote` - a decorator that turns functions into tasks and classes into actors
3. `.remote()` - postfix to every remote function, remote class declaration, or invocation of a remote class method; returns an `ObjectRef` associated with the work to be done
4. `ray.get()` - returns an object or list of objects from the object reference

You may notice that instead of storing the result of `distributed.remote()` directly in a list of `errors`, we store it in a list called `obj_refs`. A Ray remote function returns an `ObjectRef` (or 'object reference') as soon as it is called. This `ObjectRef` is a *promise* of future work: the task is delegated to a worker and executes in the background while your program continues. To access the actual output, you call `ray.get()` on the `ObjectRef`, which is a blocking call.

<img src="../_static/assets/Overview_of_Ray/housing_diff.png" width="70%" loading="lazy">

*Figure 5*

And with just a few lines of difference, we're able to parallelize training without having to concern ourselves with orchestration, fault tolerance, autoscaling, or anything else that requires specialized knowledge of distributed systems.

#### Train 100 Models in Parallel with Ray
To start, we'll import Ray (check out our [installation instructions](https://docs.ray.io/en/latest/ray-overview/installation.html)) and start a Ray cluster on our local machine that can use all the cores available on the computer as workers. We call `ray.is_initialized()` first to make sure that we only have one Ray cluster active.

In [None]:
import ray

# Shut down any Ray instance left over from an earlier cell.
if ray.is_initialized():
    ray.shutdown()

ray.init()

As illustrated above in Figure 5, we will:
1. Add the decorator `@ray.remote` to our function `distributed()` to specify that it is a task to be run remotely. 
2. Then we call that function as `distributed.remote()` in the `for` loop to append to our list of object references. 
3. Finally, we fetch the results outside of the loop to access the final error list (so as not to *block* the asynchronous launching of remote tasks) and print out the average R-squared value.

In [None]:
%%time

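@ray.remote
def distributed():
    # X and y are notebook globals that Ray serializes together with the function.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    model = LinearRegression()

    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    r2 = metrics.r2_score(y_test, predictions)

    return r2

# Each .remote() call returns an ObjectRef immediately; nothing blocks here.
obj_refs = []

for i in range(num_trials):
    obj_refs.append(distributed.remote())

# Block once, after all tasks have been submitted, to collect the scores.
errors_dist = ray.get(obj_refs)

print(sum(errors_dist) / num_trials)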

And now you've done it! You have distributed the training of 100 models in a very thorough round of cross-validation on our California Housing dataset. Compare the runtime for each method of training 100 models. Is this what you expected?

## Optional Exercise: Quick Sort

If you would like to test your knowledge of the Ray Core concepts we covered in the housing example, try your hand at building a distributed version of the classic algorithm, quick sort.

At a glance, quick sort selects an element from the array as a pivot, partitions the array into two sub-arrays (according to whether they are less than or greater than the pivot), then sorts them recursively. You can preview an animation of quick sort in action below in Figure 6.

Quick sort is an example of a divide and conquer algorithm, a strategy of solving a large problem by recursively breaking it down into smaller sub-problems. By nature, this strategy lends itself well to a parallel implementation because each operation independently solves a smaller instance of the same problem, and the partial results are then merged into the final solution.

Let's walk through the mechanics a little deeper in the code sample that follows.

<img src="https://upload.wikimedia.org/wikipedia/commons/6/6a/Sorting_quicksort_anim.gif" width="70%" loading="lazy">

*Figure 6*

**Sequential Implementation of Quick Sort**

Below is the classic version that you may be familiar with.

1. Given an array, `partition` uses the last element as the pivot and creates two subarrays: elements less than or equal to the pivot, and elements greater than the pivot.
2. A call to `quick_sort_sequential` partitions the array into two subarrays and the pivot element, then calls itself on the two newly created subarrays, and finally merges the sorted answer.
3. Arrays that satisfy `len(collection) <= 500000` are sorted directly with Python's built-in `sorted`. This cutoff will be useful later in the distributed version to avoid tiny tasks. We'll discuss it further in subsequent tutorials, but if you are curious, find more information [here](https://docs.ray.io/en/latest/ray-core/tips-for-first-time.html#tip-2-avoid-tiny-tasks).

Notice that each merge depends on the recursive call below it to return before completing its task.

In [None]:
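def partition(collection):
    # Use the last element as the pivot without mutating the caller's list.
    pivot = collection[-1]
    greater, lesser = [], []
    for element in collection[:-1]:
        if element > pivot:
            greater.append(element)
        else:
            # Elements equal to the pivot go to the lesser side.
            lesser.append(element)
    return lesser, pivot, greater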

def quick_sort_sequential(collection):
    if len(collection) <= 500000:
        return sorted(collection)
    lesser, pivot, greater = partition(collection)
    lesser = quick_sort_sequential(lesser)
    greater = quick_sort_sequential(greater)
    return lesser + [pivot] + greater
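
To see the mechanics on a tiny input, here is a self-contained sketch that mirrors the two functions above. The `_demo` names and the small `threshold` parameter (in place of the hard-coded 500000) are ours, chosen only so that the recursion is visible on five elements.

```python
def partition_demo(collection):
    """Split around the last element without modifying the input list."""
    pivot = collection[-1]
    rest = collection[:-1]
    lesser = [x for x in rest if x <= pivot]
    greater = [x for x in rest if x > pivot]
    return lesser, pivot, greater


def quick_sort_demo(collection, threshold=2):
    """Same recursion as above, with a tiny cutoff instead of 500000."""
    if len(collection) <= threshold:
        return sorted(collection)
    lesser, pivot, greater = partition_demo(collection)
    return (
        quick_sort_demo(lesser, threshold)
        + [pivot]
        + quick_sort_demo(greater, threshold)
    )


data = [7, 2, 9, 4, 5]
print(partition_demo(data))   # ([2, 4], 5, [7, 9])
print(quick_sort_demo(data))  # [2, 4, 5, 7, 9]
```

The two recursive calls work on disjoint sublists and do not depend on each other, which is the property you can exploit when distributing the algorithm.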

**Your Turn: Distributed Implementation of Quick Sort**

In the space below, try to apply the key concepts from the previous section to create a distributed implementation of quick sort. We'll start you out with some starter code to modify, but it's up to you to finish it!

In [None]:
import ray

# Shut down any Ray instance left over from an earlier cell, then start fresh.
if ray.is_initialized():
    ray.shutdown()

ray.init()

### MODIFY THIS CODE BELOW ###
def quick_sort_distributed(collection):
    if len(collection) <= 500000:
        return sorted(collection)
    lesser, pivot, greater = partition(collection)
    lesser = quick_sort_sequential(lesser)
    greater = quick_sort_sequential(greater)
    return lesser + [pivot] + greater

**Compare Both Versions on a Large Array**

Let's now compare these two implementations by timing each and inspecting the results. We'll create a large random array and time both methods to see how long each takes. Note that we use `ray.put()` to store the array in the object store so that we do not pass the same large object as an argument repeatedly.

In [None]:
import time
from numpy import random

size = 10_000_000

unsorted = random.randint(1000000, size=(size)).tolist()

s = time.time()
quick_sort_sequential(unsorted)
print(f"Sequential Duration: {(time.time() - s):.3f}")

s = time.time()
unsorted_obj = ray.put(unsorted)
ray.get(quick_sort_distributed.remote(unsorted_obj))
print(f"Distributed Duration: {(time.time() - s):.3f}")

### Summary
1. Introduction to Ray Core and Most Popular Workloads
2. Key Concepts of Ray Core
3. Sequential -> Distributed Training of 100 Models

#### [Key Concepts](https://docs.ray.io/en/latest/ray-core/key-concepts.html)
- Tasks
- Actors
- Objects

#### [Key API Elements in This Section](https://docs.ray.io/en/latest/ray-core/package-ref.html#python-api)
- `ray.init()`
- `@ray.remote`
- `.remote()`
- `ray.get()`
- `ray.put()`

#### Next
Now that we've covered the core engine, let's go up one layer of abstraction in the next notebook and look at a suite of data science libraries built on top of Ray Core to target specific machine learning workloads!

# Homework
---
If you would like to practice your new skills further with some in-depth examples beyond the embedded coding exercises, take a look at this list of suggested problems:

- [Look at More Ray Core Examples](https://docs.ray.io/en/latest/ray-core/examples/overview.html)
    - Walk through applied examples of using Ray Core with common machine learning workloads.
- [Read About Debugging and Profiling on Ray](https://docs.ray.io/en/latest/ray-core/troubleshooting.html)
    - Dig into how to observe Ray work by visualizing tasks in the Ray timeline, profiling using Python's `cProfile`, understanding crashes and suboptimal performance, and more in this user guide.
- [Distribute a Classical Algorithm with Ray](https://github.com/ray-project/hackathon5-algo)
    - In this exercise, go to the GitHub repo linked above for details on choosing a classic algorithm implemented in Python, editing the implementation to parallelize the work with Ray, and comparing your results against the sequential implementation.


# Next Steps
---
Congratulations! You have completed your first tutorial, an introduction to Ray and Ray Core! We introduced the three layers of Ray: Core, AIR, and the Ecosystem. In this notebook, we explored Ray Core's key concepts of tasks, actors, and objects along with key API elements through examples. In the next module, we will talk about Ray AI Runtime, a set of native libraries built on top of Ray Core specialized for machine learning workloads.

From here, you can learn and get more involved with our active community of developers and researchers by checking out the following resources:
- [Ray's "Getting Started" Guides](https://docs.ray.io/en/latest/ray-overview/index.html): A collection of QuickStart Guides for every library including installation walkthrough, examples, blogs, talks, and more!
- [Official Ray Website](https://www.ray.io/): Browse the ecosystem and use this site as a hub to get the information that you need to get going and building with Ray.
- [Join the Community on Slack](https://forms.gle/9TSdDYUgxYs8SA9e8): Find friends to discuss your new learnings in our Slack space.
- [Use the Discussion Board](https://discuss.ray.io/): Ask questions, follow topics, and view announcements on this community forum.
- [Join a Meetup Group](https://www.meetup.com/Bay-Area-Ray-Meetup/): Tune in on meet-ups to listen to compelling talks, get to know other users, and meet the team behind Ray.
- [Open an Issue](https://github.com/ray-project/ray/issues/new/choose): Ray is constantly evolving to improve developer experience. Submit feature requests, bug-reports, and get help via GitHub issues.