# Overview of Ray

<img src="../_static/assets/Generic/ray_logo.png" width="20%" loading="lazy">

## About this notebook

### Is it right for you?

This is an introductory notebook that gives a broad overview of the Ray project. It is right for you if:

* you are new to Ray and are looking for a project primer
* you are interested in how you can use Ray, a Python-first distributed computing library, to scale your Python applications and accelerate machine learning workloads

### Prerequisites

For this notebook we assume:

* practical Python and machine learning experience
* no prior experience with Ray or distributed computing

### Learning objectives

Upon completion of this notebook, you will know about:

* what is Ray?
* key Ray characteristics
* three layers of the Ray libraries: Core, native libraries, and integrations and ecosystem
* example Ray use cases and workloads
* what to do next to start using it

### What will you do?

In *Part 1* of this notebook you will learn about the Ray project. Then, in *Part 2*, you will run an illustrative code example that will give you a better feel for Ray.

## Part 1: Ray project

|<img src="../_static/assets/Overview_of_Ray/ray_project.png" width="70%" loading="lazy">|
|:--|
|Ray is one of the leading open source ML projects. (date accessed: Nov 2, 2022)|

### Introduction

#### What is Ray?

<div class="alert alert-info">
  <strong><a href="https://www.ray.io/" target="_blank">Ray</a></strong> is an open-source unified compute framework that makes it easy to scale AI and Python workloads.
</div>

Thanks to its Python-first approach, ML engineers can parallelize Python applications on a laptop, cluster, cloud, Kubernetes, or on-premises hardware. Ray automatically handles the key aspects of distributed execution, including orchestration, scheduling, fault tolerance, and autoscaling, so that you can scale your apps without becoming a distributed systems expert.

With a rich ecosystem of libraries and integrations with many important data science tools, Ray lowers the effort needed to scale compute-intensive workloads and applications.

#### Distributed computing: a bit of context and project history

|<img src="../_static/assets/Overview_of_Ray/project_history.jpeg" width="70%" loading="lazy">|
|:--|
|Compute demand is growing faster than supply. It outpaces the growth in processing power of CPUs, GPUs, and TPUs. (date accessed: Nov 2, 2022)|

Distributed computing is hard. At the same time it is becoming increasingly crucial for modern machine learning and AI systems.

OpenAI's analysis [AI and Compute](https://openai.com/blog/ai-and-compute/) suggests exponential growth in the compute needed to train AI models: the compute used in the largest AI training runs has been doubling every 3.4 months since 2012.

This context drove researchers to build solutions that simplify running code on compute clusters without having to think about how to orchestrate and utilize individual machines. Ray was developed at the University of California, Berkeley's [RISELab](https://rise.cs.berkeley.edu/), the successor to the [AMPLab](https://amplab.cs.berkeley.edu/about/), where [Apache Spark](https://spark.apache.org/) was created and from which [Databricks](https://databricks.com/) emerged.

[Anyscale](https://www.anyscale.com/), the company behind Ray, was founded by Ray's creators to build a managed Ray platform. It offers hosted solutions for Ray applications.

### Key Ray Characteristics

|<img src="../_static/assets/Overview_of_Ray/python_first.jpeg" width="70%" loading="lazy">|<img src="../_static/assets/Overview_of_Ray/simple_and_flexible_api.jpeg" width="70%" loading="lazy">|<img src="../_static/assets/Overview_of_Ray/scalability.jpeg" width="70%" loading="lazy">|<img src="../_static/assets/Overview_of_Ray/heterogeneous_hardware.jpeg" width="70%" loading="lazy">|
|:-:|:-:|:-:|:-:|
|Python first approach|Simple and Flexible API|Scalability|Support for heterogeneous hardware|

#### Python first approach

<img src="../_static/assets/Overview_of_Ray/python_first.jpeg" width="100px" loading="lazy">

Ray is a Python library that enables ML practitioners and Python developers to build distributed applications. It exposes a concise, easy-to-use API. Its core library, which enables parallel execution, introduces just a few key abstractions:

1. [Tasks](https://docs.ray.io/en/latest/ray-core/key-concepts.html#tasks): remote, stateless Python functions
1. [Actors](https://docs.ray.io/en/latest/ray-core/key-concepts.html#actors): remote stateful Python classes
1. [Objects](https://docs.ray.io/en/latest/ray-core/key-concepts.html#objects): in-memory, immutable objects or values that can be accessed anywhere in the computing cluster

You will learn more about these abstractions in the [Ray Core tutorials](https://github.com/ray-project/ray-educational-materials/tree/main/Ray_Core).

#### Simple and Flexible API

<img src="../_static/assets/Overview_of_Ray/simple_and_flexible_api.jpeg" width="100px" loading="lazy">

##### Ray Core

<div class="alert alert-info">
  <strong><a href="https://docs.ray.io/en/latest/ray-core/walkthrough.html" target="_blank">Ray Core</a></strong> is an open-source, Python, general purpose, distributed computing library that enables ML engineers and Python developers to scale Python applications and accelerate machine learning workloads.
</div>

Ray Core is the foundational library for the whole ecosystem. It provides a minimalist API that enables distributed computing; with just a few methods you can start building distributed apps.

* `ray.init()` - start a Ray cluster or connect to an existing one
* `@ray.remote` - decorator for functions and classes, specifying that they will be executed as a task (remote function) or an actor (remote class) in a different process
* `.remote` - postfix for invoking remote functions and classes. Remote operations are *asynchronous*
* `ray.put()` - put an object in the in-memory object store and return its ID. Use this ID to pass the object to any remote function or method call
* `ray.get()` - get a remote object or a list of remote objects from the object store

*(In the second part of this notebook you will see an illustrative example that uses some of these methods.)*

##### Ray AI Runtime (AIR)

<div class="alert alert-info">
  <strong><a href="https://docs.ray.io/en/latest/ray-air/getting-started.html" target="_blank">Ray AI Runtime (AIR)</a></strong> is an open-source, Python, domain-specific library that equips ML engineers, data scientists, and researchers with a scalable and unified toolkit for ML applications.
</div>

Ray AI Runtime (AIR), sometimes referred to as Ray's native libraries, together with the ecosystem libraries provides higher-level APIs that cater to more domain-specific use cases. Ray AIR enables Python developers and ML engineers to scale individual workloads, end-to-end workflows, and popular ecosystem frameworks, all in the familiar Python programming language.

#### Scalability

<img src="../_static/assets/Overview_of_Ray/scalability.jpeg" width="100px" loading="lazy">

Ray allows its users to utilize large compute clusters in an easy, productive, and resource-efficient way.

Fundamentally, Ray treats the entire cluster as a single, unified pool of resources and takes care of optimally mapping compute workloads to the pool. By doing so, Ray largely eliminates non-scalable factors in the system. Successful user stories include, but are not limited to:
* [how Instacart uses Ray to power their large scale fulfillment ML pipeline](https://www.anyscale.com/ray-summit-2022/agenda/sessions/130),
* [how OpenAI trains their largest models](https://twitter.com/anyscalecompute/status/1562136159135973380),
* [how companies like HuggingFace and Cohere use Ray Train for scaling model training](https://docs.ray.io/en/latest/train/train.html).

Ray's [autoscaler](https://docs.ray.io/en/latest/cluster/key-concepts.html#autoscaling) implements automatic scaling of Ray clusters based on the resource demands of an application. The autoscaler adds worker nodes when the Ray workload exceeds the cluster's capacity and removes worker nodes when they sit idle.

#### Support for heterogeneous hardware

<img src="../_static/assets/Overview_of_Ray/heterogeneous_hardware.jpeg" width="100px" loading="lazy">

One of Ray's key properties is native support for heterogeneous hardware: developers can specify hardware requirements when declaring a Task or Actor. For example, a developer can specify in the same application that one Task needs 128 CPUs, while an Actor requires 36 CPUs and 8 NVIDIA A100 GPUs.

Fractional GPUs are also supported.

|<img src="../_static/assets/Overview_of_Ray/heterogeneous_hardware_code.png" width="70%" loading="lazy">|
|:--|
|Easily specify amount of resources needed, by using `num_cpus` and `num_gpus`|

An illustrative example is the [production deep learning pipeline at Uber](https://www.anyscale.com/ray-summit-2022/agenda/sessions/215). A heterogeneous setup of 8 GPU nodes and 9 CPU nodes improves the pipeline throughput by 50%, while substantially saving capital cost, compared with the legacy setup of 16 GPU nodes.

|<img src="../_static/assets/Overview_of_Ray/uber.png" width="70%" loading="lazy">|
|:--|
|Production deep learning pipeline at Uber.|

### Ray libraries

|<img src="../_static/assets/Overview_of_Ray/map.png" width="70%" loading="lazy">|
|:--|
|Stack of Ray libraries - unified toolkit for ML workloads.|

Here, we discuss the three *layers* that comprise Ray in greater detail. Specifically, we describe:

1. Ray Core,
1. Ray AI Runtime (native libraries),
1. integrations and ecosystem.

#### Ray Clusters

<div class="alert alert-info">
  <strong><a href="https://docs.ray.io/en/latest/cluster/getting-started.html" target="_blank">Ray cluster</a></strong> is a set of worker nodes connected to a common Ray head node. Ray clusters can be fixed-size, or they may autoscale up and down according to the resources requested by applications running on the cluster.
</div>

Notice that the bottom layer is the [cluster](https://docs.ray.io/en/latest/cluster/getting-started.html). Ray sets up and manages clusters of computers so that you can run distributed applications on them. You can deploy a Ray cluster on AWS, GCP, or on Kubernetes via the officially supported [KubeRay](https://docs.ray.io/en/latest/cluster/kubernetes/index.html) project. Note that [Anyscale](https://www.anyscale.com/), the company behind Ray, builds an enterprise-ready AI compute platform for running and managing Ray applications.

#### Ray Core

<div class="alert alert-info">
  <strong><a href="https://docs.ray.io/en/latest/ray-core/walkthrough.html" target="_blank">Ray Core</a></strong> is an open-source, Python, general purpose, distributed computing library that enables ML engineers and Python developers to scale Python applications and accelerate machine learning workloads.
</div>

Ray Core is the foundation that Ray's ML libraries (Ray AIR) and third-party integrations (Ray Ecosystem) are built on. This library enables Python developers to easily build scalable, distributed systems that run on a laptop, cluster, cloud, or Kubernetes.

Let's expand a bit on the key abstractions mentioned before:

1. [Tasks](https://docs.ray.io/en/latest/ray-core/key-concepts.html#tasks): remote, stateless Python functions.  
They are arbitrary Python functions that are executed asynchronously on separate Python workers. Users can specify their resource requirements in terms of CPUs, GPUs, and custom resources, which the cluster scheduler uses to distribute tasks for parallelized execution.

1. [Actors](https://docs.ray.io/en/latest/ray-core/key-concepts.html#actors): remote stateful Python classes.  
What tasks are to functions, actors are to classes. An actor is a stateful worker; methods of the actor are scheduled on that specific worker and can access and mutate its state. Like tasks, actors support CPU, GPU, and custom resource requirements.

1. [Objects](https://docs.ray.io/en/latest/ray-core/key-concepts.html#objects): in-memory, immutable objects or values that can be accessed anywhere in the computing cluster.  
In Ray, tasks and actors create and compute on objects. These remote objects can be stored anywhere in a Ray cluster. Object references are used to refer to them, and the objects are cached in Ray's distributed shared-memory object store.

#### Ray AI Runtime

<div class="alert alert-info">
  <strong><a href="https://docs.ray.io/en/latest/ray-air/getting-started.html" target="_blank">Ray AI Runtime (AIR)</a></strong> is an open-source, Python, domain-specific library that equips ML engineers, data scientists, and researchers with a scalable and unified toolkit for ML applications.
</div>

Ray AIR is built on top of Ray Core. It caters to distributed data processing, model training, tuning, model serving, and reinforcement learning, all in Python. To that end, it enables both individual workloads and end-to-end use cases to be implemented in a single unified library.

Ray AIR brings together an ever-growing ecosystem of integrations with your favorite machine learning frameworks.

|<img src="../_static/assets/Introduction_to_Ray_AIR/e2e_air.png" width="70%" loading="lazy">|
|:--|
|Ray AIR enables end-to-end ML development and provides multiple options to integrate with other tools and libraries from the MLOps ecosystem.|

Each of the five native libraries that Ray AIR wraps is focused on a specific ML task. Because this abstraction layer is built on top of Ray Core, it is distributed by nature.

1. [Ray Data](https://docs.ray.io/en/latest/data/dataset.html): scalable, framework-agnostic loading and transforming raw data across training and prediction
1. [Ray Train](https://docs.ray.io/en/latest/train/train.html): distributed multi-node model training with fault tolerance that integrates with your favorite training libraries
1. [Ray Tune](https://docs.ray.io/en/latest/tune/index.html): scales experiment execution and hyperparameter tuning to optimize model performance
1. [Ray Serve](https://docs.ray.io/en/latest/serve/index.html): deploys your model for online inference, with optional microbatching to improve performance
1. [Ray RLlib](https://docs.ray.io/en/latest/rllib/index.html): distributed reinforcement learning workloads that integrate with the other Ray AIR libraries above

#### Integrations and ecosystem libraries

Ray integrates with a [growing ecosystem](https://docs.ray.io/en/latest/ray-overview/ray-libraries.html) of the most popular Python and machine learning libraries and frameworks that you may already be familiar with. Instead of trying to create new standards, Ray allows you to scale existing workloads by unifying tools in a common interface. This interface enables you to run ML tasks in a distributed way, a property most of the respective backends don't have, or not to the same extent.

For example, Ray Datasets is backed by Arrow and comes with many integrations with other frameworks, such as Spark and Dask. Ray Train and RLlib are backed by the full power of TensorFlow and PyTorch. Ray Tune supports algorithms from practically every notable HPO tool available, including Hyperopt, Optuna, Nevergrad, Ax, SigOpt, and many others. Ray Serve can be used with frameworks such as FastAPI, Gradio, and Streamlit.

### Ray use cases

Business logic and model inference have traditionally been [handled by different systems](https://www.anyscale.com/blog/heres-what-you-need-to-look-for-in-a-model-server-to-build-ml-powered-services). Ray breaks down silos by supporting both of these workloads seamlessly, allowing developers to build and scale microservices as if they were a single Python application. Here, we will showcase some use cases and how companies use Ray for a wide range of applications.

#### ML practitioner use cases

- **Data Shuffling**: In Ray 2.0, [**Exoshuffle**](https://cs.paperswithcode.com/paper/exoshuffle-large-scale-shuffle-at-the) is integrated with the Ray Data library to provide an application level shuffle system that [outperforms Spark and achieves 82% of theoretical performance on a 100TB sort on 100 nodes](https://www.anyscale.com/ray-summit-2022/agenda/sessions/220).
- **Feature Engineering**: [**Hamilton**](https://github.com/stitchfix/hamilton), [**StitchFix's**](https://www.anyscale.com/ray-summit-2022/agenda/sessions/115) open source dataflow micro-framework, manages the complexities of feature engineering in time series forecasting. Its integration with Ray provides an out-of-the-box experience
- **Scaling Model Training**: [**Alpa**](https://ai.googleblog.com/2022/05/alpa-automated-model-parallel-deep.html) is a [Ray-native library](https://www.anyscale.com/ray-summit-2022/agenda/sessions/170) that automatically partitions, schedules, and executes the training and serving computation of very large deep learning models on hundreds of GPUs.
- **Reinforcement Learning**: Ray is powering reinforcement learning applications for [**FIFA World Cup Qatar 2022**](https://www.anyscale.com/ray-summit-2022/agenda/sessions/177) including optimizing the flow of millions of fans, managing vehicle traffic, and modeling congestion scenarios.
- **Multi-Model Composition**: [Ray Serve](https://docs.ray.io/en/latest/serve/index.html) supports [complex deployment patterns](https://www.anyscale.com/ray-summit-2022/agenda/sessions/224) requiring the orchestration of multiple Ray actors, where different actors provide inference for different models. It handles both batch and online inference (scoring) and can scale to thousands of models in production.

#### ML platform engineer use cases

- **Shopify**: [Merlin](https://shopify.engineering/merlin-shopify-machine-learning-platform), Shopify's ML platform built on Ray, enables fast iteration and [scaling of distributed applications](https://www.anyscale.com/ray-summit-2022/agenda/sessions/171) such as product categorization and recommendations.
- **Spotify**: Spotify [uses Ray for advanced applications](https://www.anyscale.com/ray-summit-2022/agenda/sessions/180) that include personalizing content recommendations for home podcasts and personalizing Spotify Radio track sequencing.
- **Intel**: Intel's BigDL 2.0 makes it easy for developers to build [end-to-end distributed AI applications on Ray](https://www.anyscale.com/ray-summit-2022/agenda/sessions/174) and seamlessly scale AI pipelines.
- **Riot Games**: Riot Games' reinforcement learning platform, built on Ray, [drives key AI applications](https://www.anyscale.com/ray-summit-2022/agenda/sessions/148) that build bots to play their games at various skill levels, giving designers additional signals to deliver the best experiences for players.


### Part 1: Summary

|<img src="../_static/assets/Overview_of_Ray/map_layers_only.png" width="70%" loading="lazy">|
|:--|
|Stack of Ray libraries - unified toolkit for ML workloads.|

You just learned about:

* what is Ray?
* key Ray characteristics
* three layers of the Ray libraries: Core, native libraries, and integrations and ecosystem
* example Ray use cases and workloads

#### Concepts

Let's revisit key concepts introduced in Part 1:

<div class="alert alert-info">
  <strong><a href="https://www.ray.io/" target="_blank">Ray</a></strong> is an open-source unified compute framework that makes it easy to scale AI and Python workloads.
</div>

<div class="alert alert-info">
  <strong><a href="https://docs.ray.io/en/latest/cluster/getting-started.html" target="_blank">Ray cluster</a></strong> is a set of worker nodes connected to a common Ray head node. Ray clusters can be fixed-size, or they may autoscale up and down according to the resources requested by applications running on the cluster.
</div>

<div class="alert alert-info">
  <strong><a href="https://docs.ray.io/en/latest/ray-core/walkthrough.html" target="_blank">Ray Core</a></strong> is an open-source, Python, general purpose, distributed computing library that enables ML engineers and Python developers to scale Python applications and accelerate machine learning workloads.
</div>

<div class="alert alert-info">
  <strong><a href="https://docs.ray.io/en/latest/ray-air/getting-started.html" target="_blank">Ray AI Runtime (AIR)</a></strong> is an open-source, Python, domain-specific library that equips ML engineers, data scientists, and researchers with a scalable and unified toolkit for ML applications.
</div>

#### APIs and technical abstractions

Let's revisit the three Ray Core abstractions that enable parallel computation:

1. [Tasks](https://docs.ray.io/en/latest/ray-core/key-concepts.html#tasks): remote, stateless Python functions
1. [Actors](https://docs.ray.io/en/latest/ray-core/key-concepts.html#actors): remote stateful Python classes
1. [Objects](https://docs.ray.io/en/latest/ray-core/key-concepts.html#objects): in-memory, immutable objects or values that can be accessed anywhere in the computing cluster

#### What's next?

Next, you will run an illustrative code example that will give you a better feel for Ray.

## Part 2: Code example - housing prices with scikit-learn

### Goal

Let's take a look at a simple example of how you can use Ray Core to scale a bare-bones version of a common ML task: regression on structured data.

### Data

Here, we use the [California Housing](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html) dataset, which has 20,640 samples and eight numeric features: `[MedInc, HouseAge, AveRooms, AveBedrms, Population, AveOccup, Latitude, Longitude]`. The target is the median house value of a district, expressed in units of 100,000 US dollars. Given that we want to use a linear regression model, we want to assess how the results will generalize to an unseen, independent dataset, say, new housing data coming in this year. To do this, we use cross-validation, a model validation technique that resamples different portions of the data to train and test a model over several iterations. After we conduct these trials, we can average the scores to get an estimate of the model's predictive performance.

|Property|Value|
|:--|:--|
|Samples total|20640|
|Dimensionality|8|
|Features|real|
|Target|real 0.15 - 5.|
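To illustrate the idea of repeated random splits before bringing in scikit-learn, here is a dependency-free sketch. It uses a small synthetic dataset and a one-feature least-squares fit, both of which are assumptions made purely for illustration; the helper names `fit_line` and `r2_score` are hypothetical and are not the scikit-learn functions used later.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r2_score(ys, preds):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_y = statistics.fmean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Synthetic data (an assumption for this sketch): a noisy straight line.
rng = random.Random(0)
xs = [float(i) for i in range(200)]
ys = [3.0 * x + 2.0 + rng.gauss(0, 5) for x in xs]

scores = []
for _ in range(20):
    # Draw a fresh random 80/20 split in every round.
    indices = list(range(len(xs)))
    rng.shuffle(indices)
    test_idx, train_idx = indices[:40], indices[40:]

    slope, intercept = fit_line(
        [xs[i] for i in train_idx], [ys[i] for i in train_idx]
    )
    preds = [slope * xs[i] + intercept for i in test_idx]
    scores.append(r2_score([ys[i] for i in test_idx], preds))

# Averaging over rounds gives a more stable estimate than a single split.
mean_r2 = statistics.fmean(scores)
print(f"mean R^2 over {len(scores)} random splits: {mean_r2:.3f}")
```

Each round is independent of the others, which is exactly why this kind of workload is a good candidate for parallel execution.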

### Model

We train a plain scikit-learn `LinearRegression` model. It is deliberately simple, so the focus stays on how the training runs are executed rather than on the model itself.

**Exercise:** after working through the example, replace `LinearRegression` with another scikit-learn regressor. The rest of the code stays the same, which shows that you can keep working as before and still add parallelism.

### Sequential vs. parallel execution

Training the same model multiple times on different subsets of a dataset can take a long time, especially if you're working with a much more complex model and a larger dataset. As pictured below in Figure 3, the sequential approach trains each model one after another in a series.


<img src="../_static/assets/Overview_of_Ray/sequential_timeline.png" width="70%" loading="lazy">

In this example, we will first implement the sequential approach, then improve it by distributing training with Ray Core, and finally compare the code differences to highlight how minimal the changes are.


### Preliminaries

In [None]:
# Imports
import pandas as pd
from sklearn import metrics
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

In [None]:
# Prepare dataset
X, y = fetch_california_housing(return_X_y=True, as_frame=True)

In [None]:
num_models = 100  # set numbers of models to train

### Sequential implementation

#### Train 100 Models Sequentially
Here, we define a function that randomly splits our housing dataset into training and testing subsets (in the style of Monte Carlo cross-validation: each round draws a new random split without replacement, so the subsets can overlap from round to round). `sequential()` then fits a model, generates predictions, and returns the R-squared score (closer to 1 = better performance, closer to 0 = worse performance).

Then, we'll train 100 models on these random splits, one after another, and finally print out the average of the rounds.

In [None]:
%%time

def sequential():
    # No fixed random_state: each call draws a different random 80/20 split.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    model = LinearRegression()
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)

    # R-squared on the held-out split
    return metrics.r2_score(y_test, predictions)

errors_seq = []

for i in range(num_models):
    errors_seq.append(sequential())

# Average R-squared across all rounds
print(sum(errors_seq) / num_models)

#### From Sequential -> Parallel

We just trained 100 linear regression models in series and averaged their R-squared values in about 1 second. Let's now leverage Ray to train these models in parallel (where multiple tasks may run simultaneously) and see how the runtime compares. In Figure 4, you can see the difference: the scheduler assigns a task to each available worker (in this timeline, we chose `n = 4` workers). Scheduling itself has nontrivial overhead from communicating with workers and other cluster management.

<img src="../_static/assets/Overview_of_Ray/distributed_timeline.png" width="70%" loading="lazy">

*Figure 4*

With just a few code changes, we will modify our existing Python program to distribute it among *n* number of workers. Of course, this is a lightweight example, but it's illustrative of the kind of user experience you get with Ray Core's lean API.

Notice in Figure 5 that we need to use four API calls:

1. `ray.init()` - initialize a Ray context
2. `@ray.remote` - a decorator that turns functions into tasks and classes into actors
3. `.remote()` - postfix to every remote function, remote class declaration, or invocation of a remote class method; returns an `ObjectRef` associated with the work to be done
4. `ray.get()` - returns an object or list of objects from an object reference or list of object references

You may notice that instead of storing the result of the `.remote()` call directly in a list of `errors`, we store it in a list called `obj_refs`. When you invoke a Ray remote function, it immediately returns an `ObjectRef` (object reference). This `ObjectRef` is a *promise* of a future result: the task is delegated to a worker and executes in the background while your program continues. To access the actual output, call `ray.get()` on the `ObjectRef`, which blocks until the result is available.
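If the idea of a promise is new to you, the standard library offers a rough analogy. The sketch below uses `concurrent.futures`, not Ray: `pool.submit()` returns a `Future` right away, much like `.remote()` returns an `ObjectRef`, and `.result()` blocks, much like `ray.get()`. The analogy is loose, since a Ray `ObjectRef` can also be passed to other tasks and refers to an object in the cluster's object store. `slow_square` is a hypothetical function.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def slow_square(x):
    # Simulate work that takes a noticeable amount of time.
    time.sleep(0.1)
    return x * x


with ThreadPoolExecutor(max_workers=4) as pool:
    # submit() returns immediately with a Future for each call.
    futures = [pool.submit(slow_square, i) for i in range(4)]
    all_futures = all(isinstance(f, Future) for f in futures)

    # result() blocks until the corresponding call has finished.
    results = [f.result() for f in futures]

print(all_futures, results)
```

Launching all calls first and collecting results afterwards is what lets the work overlap; asking for each result right after submitting it would serialize the calls again.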

<img src="../_static/assets/Overview_of_Ray/housing_diff.png" width="70%" loading="lazy">

*Figure 5*

And with just a few lines of difference, we're able to parallelize training without having to concern ourselves with orchestration, fault tolerance, autoscaling, or anything else that requires specialized knowledge of distributed systems.

#### Train 100 Models in Parallel with Ray
To start, we'll import Ray (check out our [installation instructions](https://docs.ray.io/en/latest/ray-overview/installation.html)) and start a Ray cluster on our local machine that can use all the cores available on your computer as workers. We call `ray.is_initialized()` to make sure that only one Ray cluster is active.

In [None]:
import ray

# is_initialized is a function; it must be called to get a boolean.
if ray.is_initialized():
    ray.shutdown()

ray.init()

As illustrated above in Figure 5, we will:
1. Add the decorator `@ray.remote` to our function `distributed()` to specify that it is a task to be run remotely. 
2. Then we call that function as `distributed.remote()` in the `for` loop to append to our list of object references. 
3. Finally, we fetch the results outside of the loop (so as not to *block* the asynchronous launching of remote tasks) to get the final list of scores and print the average R-squared value.

In [None]:
%%time

@ray.remote
def distributed():
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    model = LinearRegression()

    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    r2 = metrics.r2_score(y_test, predictions)

    return r2

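obj_refs = []

# Launch all tasks first; each .remote() call returns an ObjectRef immediately.
for i in range(num_models):
    obj_refs.append(distributed.remote())

# Block until every task has finished, then collect the R-squared scores.
errors_dist = ray.get(obj_refs)

print(sum(errors_dist) / num_models)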

And now you've done it! You have distributed the training of 100 models in a very thorough round of cross-validation on our California Housing dataset. Compare the runtime of the two approaches. Is this what you expected?

### Summary
1. Introduced to Ray Core and Most Popular Workloads
2. Key Concepts of Ray Core
3. Sequential -> Distributed Training of 100 Models

#### [Key Concepts](https://docs.ray.io/en/latest/ray-core/key-concepts.html)
- Tasks
- Actors
- Objects

#### [Key API Elements in This Section](https://docs.ray.io/en/latest/ray-core/package-ref.html#python-api)
- `ray.init()`
- `@ray.remote`
- `.remote()`
- `ray.get()`
- `ray.put()`

#### Next
Now that we've covered the core engine, let's go up one layer of abstraction in the next notebook and look at a suite of data science libraries built on top of Ray Core to target specific machine learning workloads!

# Homework
---
If you would like to practice your new skills further with some in-depth examples beyond the embedded coding exercises, take a look at this list of suggested problems:

- [Look at More Ray Core Examples](https://docs.ray.io/en/latest/ray-core/examples/overview.html)
    - Walk through applied examples of using Ray Core with common machine learning workloads.
- [Read About Debugging and Profiling on Ray](https://docs.ray.io/en/latest/ray-core/troubleshooting.html)
    - Dig into how to observe Ray work by visualizing tasks in the Ray timeline, profiling using Python's CProfile, understanding crashes and suboptimal performance, and more in this user guide.
- [Distribute a Classical Algorithm with Ray](https://github.com/ray-project/hackathon5-algo)
    - In this exercise, go to the GitHub repo linked above for details on choosing a classic algorithm implemented in Python, editing the implementation to parallelize the work with Ray, and comparing your results against the sequential implementation.


# Next Steps
---
Congratulations! You have completed your first tutorial: an introduction to Ray and Ray Core. We introduced the three layers of Ray: Core, AIR, and the Ecosystem. In this notebook, we explored Ray Core's key concepts of tasks, actors, and objects along with key API elements through examples. In the next module, we will talk about Ray AI Runtime, a set of native libraries built on top of Ray Core and specialized for machine learning workloads.

From here, you can learn and get more involved with our active community of developers and researchers by checking out the following resources:
- [Ray's "Getting Started" Guides](https://docs.ray.io/en/latest/ray-overview/index.html): A collection of QuickStart Guides for every library including installation walkthrough, examples, blogs, talks, and more!
- [Official Ray Website](https://www.ray.io/): Browse the ecosystem and use this site as a hub to get the information that you need to get going and building with Ray.
- [Join the Community on Slack](https://forms.gle/9TSdDYUgxYs8SA9e8): Find friends to discuss your new learnings in our Slack space.
- [Use the Discussion Board](https://discuss.ray.io/): Ask questions, follow topics, and view announcements on this community forum.
- [Join a Meetup Group](https://www.meetup.com/Bay-Area-Ray-Meetup/): Tune in on meet-ups to listen to compelling talks, get to know other users, and meet the team behind Ray.
- [Open an Issue](https://github.com/ray-project/ray/issues/new/choose): Ray is constantly evolving to improve developer experience. Submit feature requests and bug reports, and get help via GitHub issues.