In [1]:
import ray
import time 

### Comparing Regular Python Functions and Ray Remote Tasks

This section demonstrates the difference between a standard Python function and a Ray remote task:

- **`sum_of_squares(n)`**: A regular Python function that computes the sum of squares from 0 to `n`.
- **`ray_sum_of_squares(n)`**: The same logic, but decorated with `@ray.remote` to enable distributed execution on a Ray cluster.
- **`inputs`**: A sample list of values used to test both versions. The larger values make each call compute-intensive enough for parallel execution to pay off.

In [2]:
# Regular Python function
def sum_of_squares(n):
    total = sum(i * i for i in range(n + 1))
    return total

# The same function converted to a Ray Task
@ray.remote
def ray_sum_of_squares(n):
    total = sum(i * i for i in range(n + 1))
    return total

inputs = [3, 10_000_000, 20_000_000, 30_000_000, 40_000_000]

### Sequential Execution
Executes the computation for all input values sequentially in a single process.


In [3]:
%%timeit
# execute the normal python function
[sum_of_squares(n) for n in inputs]

8.53 s ± 80.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
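
Outside a notebook, where the `%%timeit` magic is unavailable, the same sequential measurement can be sketched with `time.perf_counter`. This is a minimal sketch, not part of the original notebook: the `small_inputs` list and the `closed_form` helper are assumptions added for illustration. The helper uses the standard identity 0² + 1² + ... + n² = n(n+1)(2n+1)/6 to confirm that the loop computes the right value.

```python
import time


def sum_of_squares(n):
    # Same logic as the notebook function: sum of i*i for i in 0..n
    return sum(i * i for i in range(n + 1))


def closed_form(n):
    # Hypothetical helper: closed-form value of 0^2 + 1^2 + ... + n^2
    return n * (n + 1) * (2 * n + 1) // 6


# Hypothetical inputs, much smaller than the notebook's so this runs quickly
small_inputs = [3, 1_000, 100_000]

start = time.perf_counter()
results = [sum_of_squares(n) for n in small_inputs]
elapsed = time.perf_counter() - start

# The loop and the closed form must agree for every input
assert results == [closed_form(n) for n in small_inputs]
print(f"sequential: {elapsed:.4f} s for {len(small_inputs)} inputs")
```

A single `perf_counter` measurement is noisier than `%%timeit`, which repeats the run and reports a mean and standard deviation.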


In [4]:
# Initialise Ray, shutting down any existing session first
if ray.is_initialized():
    ray.shutdown()
ray.init()

2025-08-03 12:02:27,614	INFO worker.py:1747 -- Connecting to existing Ray cluster at address: 100.106.5.102:6379...
2025-08-03 12:02:27,625	INFO worker.py:1918 -- Connected to Ray cluster. View the dashboard at https://session-xsclvf1y3h8ri22vxrxzy7b516.i.anyscaleuserdata.com
2025-08-03 12:02:27,643	INFO packaging.py:380 -- Pushing file package 'gcs://_ray_pkg_93a0d969393edb59acbf38635634e1fc17d0e385.zip' (3.65MiB) to Ray cluster...
2025-08-03 12:02:27,659	INFO packaging.py:393 -- Successfully pushed file package 'gcs://_ray_pkg_93a0d969393edb59acbf38635634e1fc17d0e385.zip'.


Python version: 3.12.11
Ray version: 2.48.0
Dashboard: http://session-xsclvf1y3h8ri22vxrxzy7b516.i.anyscaleuserdata.com


### Executing Ray Tasks

When you call a Ray remote function (e.g., `ray_sum_of_squares.remote(n)`), Ray **immediately schedules** the task for execution and returns a *future* (an object reference, or "promise") representing the eventual result. The call itself does not wait for the task to finish.
The actual result is **not retrieved or transferred back** until you call `ray.get(futures)`, which blocks until the tasks have completed.
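
The submit-now, fetch-later pattern described above can be illustrated without a Ray cluster. The sketch below is an analogy built on the standard library's `concurrent.futures`, not Ray's API: `pool.submit(...)` stands in for `.remote(...)` and `future.result()` stands in for `ray.get(...)`. The `slow_square` function and its half-second sleep are assumptions chosen only to make the blocking visible.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x):
    # Hypothetical stand-in for a long-running task
    time.sleep(0.5)
    return x * x


with ThreadPoolExecutor(max_workers=2) as pool:
    start = time.perf_counter()

    # Submitting returns a future right away; the work runs in the background
    future = pool.submit(slow_square, 4)
    submit_seconds = time.perf_counter() - start

    # Asking for the result blocks until the work has finished
    value = future.result()
    total_seconds = time.perf_counter() - start

print(f"submit returned after {submit_seconds:.3f} s")
print(f"result {value} available after {total_seconds:.3f} s")
```

The gap between the two timings is the time spent waiting inside `result()`. In the Ray version, that wait happens inside `ray.get`.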

In [5]:
%%timeit
# Submit one Ray task per input, then block until every result is ready
futures = [ray_sum_of_squares.remote(n) for n in inputs]
ray.get(futures)

3.71 s ± 70 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


### Regular Python Class vs. Ray Actor

- **`SumCalculator`**: A normal Python class with a method to compute the sum of squares.
- **`RaySumCalculator`**: The same class, but decorated with `@ray.remote` to become a Ray Actor.
    - Ray Actors are stateful workers that can execute methods remotely and maintain internal state across method calls.
    - Actors enable parallel and persistent computations in a distributed system.

In [6]:
# Regular Python class
class SumCalculator:
    def sum_of_squares(self, n):
        total = sum(i * i for i in range(n + 1))
        return total

# The same class converted to a Ray Actor
@ray.remote
class RaySumCalculator:
    def sum_of_squares(self, n):
        total = sum(i * i for i in range(n + 1))
        return total

In [7]:
%%timeit
sumclass = SumCalculator()
[sumclass.sum_of_squares(n) for n in inputs]

8.66 s ± 79.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [8]:
# Initialise Ray, shutting down any existing session first
if ray.is_initialized():
    ray.shutdown()
ray.init()

2025-08-03 12:04:10,913	INFO worker.py:1747 -- Connecting to existing Ray cluster at address: 100.106.5.102:6379...
2025-08-03 12:04:10,920	INFO worker.py:1918 -- Connected to Ray cluster. View the dashboard at https://session-xsclvf1y3h8ri22vxrxzy7b516.i.anyscaleuserdata.com
2025-08-03 12:04:10,929	INFO packaging.py:380 -- Pushing file package 'gcs://_ray_pkg_b5847f8869067638fe5701a08eea03242f1a9911.zip' (3.65MiB) to Ray cluster...
2025-08-03 12:04:10,943	INFO packaging.py:393 -- Successfully pushed file package 'gcs://_ray_pkg_b5847f8869067638fe5701a08eea03242f1a9911.zip'.


Python version: 3.12.11
Ray version: 2.48.0
Dashboard: http://session-xsclvf1y3h8ri22vxrxzy7b516.i.anyscaleuserdata.com


In [9]:
%%timeit
# Create one actor (a separate worker process) per input, then call each actor once
actors = [RaySumCalculator.remote() for _ in inputs]
ray.get([actor.sum_of_squares.remote(n) for actor, n in zip(actors, inputs)])

5.21 s ± 95.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


(autoscaler +3m59s) Tip: use `ray status` to view detailed cluster status. To disable these messages, set RAY_SCHEDULER_EVENTS=0.
(autoscaler +3m59s) [autoscaler] [1xT4:8CPU-32GB] Attempting to add 1 node to the cluster (increasing from 1 to 2).
(autoscaler +4m4s) [autoscaler] [1xT4:8CPU-32GB|g4dn.2xlarge] [us-east-2b] [on-demand] Launched 1 instance.


In the cell above, we launch multiple Ray Actors, each as its own Python process, to run the `sum_of_squares` computations in parallel. Because the actors are created inside the timed cell, the measured time also includes actor start-up.

- Each call to `RaySumCalculator.remote()` starts a separate Python worker process (an "actor") on the Ray cluster.
- Ray runs each actor in its own process rather than in a thread. This sidesteps Python’s **Global Interpreter Lock (GIL)**, which allows only one thread at a time to execute Python code within a single process.
- With several actors running, Ray can execute their method calls simultaneously across CPU cores, making full use of multicore machines and distributed environments.
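
The effect of the GIL on CPU-bound work can be observed without Ray. The sketch below uses the standard library's `concurrent.futures` to run the same `sum_of_squares` workload on a thread pool and then on a process pool. It is an illustration under stated assumptions, not how Ray is implemented: the `timed_map` helper, the worker count, and the workload size are all hypothetical. On a standard CPython build with the GIL enabled and several free cores, the process pool would be expected to finish sooner, though actual timings depend on the machine.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def sum_of_squares(n):
    # Same CPU-bound logic as the notebook function
    return sum(i * i for i in range(n + 1))


def timed_map(executor_cls, values, workers=4):
    # Hypothetical helper: run sum_of_squares over values with the given
    # executor class and return (results, elapsed seconds)
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(sum_of_squares, values))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical workload: four equally sized CPU-bound calls
    values = [2_000_000] * 4
    for executor_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        results, seconds = timed_map(executor_cls, values)
        print(f"{executor_cls.__name__}: {seconds:.2f} s")
```

The `if __name__ == "__main__":` guard matters for the process pool: on platforms that start workers by re-importing the main module, it keeps the workers from re-running the benchmark themselves.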

![image](../marimo_notebooks/resources/thread-processes.jpeg)
