# Introduction to Ray Core: Getting Started

© 2025, Anyscale. All Rights Reserved

💻 **Launch Locally**: You can run this notebook locally.

🚀 **Launch on Cloud**: Consider running this notebook on a Ray cluster (click [here](http://console.anyscale.com/register) to start one on Anyscale).

This notebook provides a step-by-step quick tour of Ray Core basics.

<div class="alert alert-block alert-info">

<b> Here is the roadmap for this notebook </b>

<ol start="0">
  <li>Overview</li>
  <li>Creating Remote Functions</li>
  <li>Executing Remote Functions</li>
  <li>Getting Results</li>
  <li>Putting It All Together
    <ul>
      <li>Note about Ray ID Specification</li>
      <li>Anti-pattern: Calling ray.get in a loop harms parallelism</li>
    </ul>
  </li>
</ol>
</div>

**Imports**

In [None]:
import os
import random
import sys
import time

import numpy as np
import ray

## 0. Overview

<div class="alert alert-info">
  <strong><a href="https://docs.ray.io/en/latest/ray-core/walkthrough.html" target="_blank">Ray Core</a></strong> is an open-source, general-purpose, distributed computing library for Python that enables engineers to scale Python applications.
</div>

Ray Core is about:
* distributing computation across many cores, nodes, or devices (e.g., accelerators)
* scheduling *arbitrary task graphs*
    * any code you can write, you can distribute, scale, and accelerate with Ray Core
* managing the overhead
    * At scale, distributed computation introduces growing "frictions":
        * data-specific overhead: serialization/deserialization, transfer costs.
        * scheduling overhead: managing the queue of tasks to run, deciding where to run them.
        * system-specific overhead: garbage collection, memory management, etc.
    * Ray Core addresses these issues as first-order concerns in its design via:
        * a distributed scheduler
        * distributed memory
        * distributed reference counting
 
For common technical use cases, Ray libraries and other components are built on top of Ray Core and provide a simpler development experience.

## 1. Creating Remote Functions

The first step in using Ray is to create remote functions. A remote function is a regular Python function that can be executed on any process in your cluster.

Given a simple Python function:

In [None]:
def add(a, b):
    return a + b

add

Decorate the function with `@ray.remote` to turn it into a remote function:

In [None]:
@ray.remote
def remote_add(a, b):
    return a + b

remote_add

<div class="alert alert-info">
  A <strong><a href="https://docs.ray.io/en/latest/ray-core/key-concepts.html#tasks" target="_blank">Task</a></strong> is a remote, stateless Python function invocation.
</div>


## 2. Executing Remote Functions

Native Python functions are invoked by calling them directly:

In [None]:
add(1, 2)

Ray remote functions are executed as tasks by calling them with the `.remote()` suffix:

In [None]:
remote_add.remote(1, 2)

Here is what happens when you call `{remote_function}.remote`:
1. Ray schedules the function execution as a task in a separate process in the cluster
2. Ray returns an `ObjectRef` (a reference to the future result) to you **immediately** 
3. The cluster executes the actual computation in the background


In [None]:
ref = remote_add.remote(1, 2)
ref

## 3. Getting Results

If we want to wait (block) until the task finishes and retrieve the corresponding object, we can use `ray.get`:

In [None]:
ray.get(ref)

## 4. Putting It All Together

Here are the three steps:
1. Create the remote function
2. Execute it remotely
3. Get the result when needed


<div class="alert alert-block alert-info">
    
__Activity: define and invoke a Ray task__

Define a remote function `sqrt_add` that accepts two arguments and performs the following steps:
1. computes the square root of the first
2. adds the second
3. returns the result

Execute it with two different sets of arguments and collect the results.

```python
# Hint: define the below as a remote function
def sqrt_add(a, b):
    ... 

# Hint: invoke it as a remote task and collect the results
```


</div>

In [None]:
# Write your solution here

<div class="alert alert-block alert-info">

<details>

<summary> Click to see solution </summary>

```python
import math

@ray.remote
def sqrt_add(a, b):
    return math.sqrt(a) + b

ray.get([sqrt_add.remote(2, 3), sqrt_add.remote(5, 4)])
```

</details>

</div>


### 4.1. Note about Ray ID Specification

IDs for tasks and objects are built according to the [ID specification in Ray](https://github.com/ray-project/ray/blob/master/src/ray/design_docs/id_specification.md).

### 4.2. Anti-pattern: Calling ray.get in a loop harms parallelism

|<img src="https://assets-training.s3.us-west-2.amazonaws.com/ray-core/ray-core/ray-get-in-a-loop.png" width="70%" loading="lazy">|
|:--|
|ray.get() is a blocking call. Avoid calling it on every item (left panel). Calling only on the final result improves performance (right panel).|

When trying to collect results for multiple remote function invocations (tasks), don't block and wait for each one individually. Let's consider this remote function:

In [None]:
@ray.remote
def expensive_square(x):
    time.sleep(5)
    return x**2

This implementation blocks on each item in the loop, so the four 5-second tasks run one after another (about 20 seconds in total):

In [None]:
results = []
for item in range(4):
    output = ray.get(expensive_square.remote(item))
    results.append(output)
results

Instead, schedule all remote calls first so that they can be processed in parallel, then request all the results at once:

In [None]:
refs = []
for j in range(4):
    refs.append(expensive_square.remote(j))
results = ray.get(refs)
results

<div class="alert alert-info">
Read more about this <strong><a href="https://docs.ray.io/en/latest/ray-core/patterns/ray-get-loop.html" target="_blank">anti-pattern</a></strong>.
</div>

