# A Guided Tour of Ray Core: Remote Stateful Classes

Adapted from Anyscale under Apache 2.0.


In [None]:
print('NOTE: Intentionally crashing session to use the newly installed library.\n')

!pip uninstall -y pyarrow
!pip install ray[default]

# A hack to force the runtime to restart, needed to include the above dependencies.
import os
os._exit(0)

NOTE: Intentionally crashing session to use the newly installed library.

Collecting aiohttp-cors (from ray[default])
  Downloading aiohttp_cors-0.7.0-py3-none-any.whl (27 kB)
Collecting colorful (from ray[default])
  Downloading colorful-0.5.6-py2.py3-none-any.whl (201 kB)
Collecting py-spy>=0.2.0 (from ray[default])
  Downloading py_spy-0.3.14-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (3.0 MB)
Collecting opencensus (from ray[default])
  Downloading opencensus-0.11.4-py2.py3-none-any.whl (128 kB)
Collecting virtualenv!=20.21.1,>=20.0.24 (from ray[default])
  Downloading virtualenv-20.26.2-py3-none-any.

[*Remote Classes*](https://docs.ray.io/en/latest/walkthrough.html#remote-classes-actors)
involve using a `@ray.remote` decorator on a class.

This implements the [*actor*](https://patterns.eecs.berkeley.edu/?page_id=258) pattern, which has two key properties: it is *stateful* and it uses *message-passing semantics*.

Actors are extremely powerful. They allow you to take a Python class and instantiate it as a stateful microservice that can be queried from other actors and tasks and even other Python applications.

When you instantiate a remote actor, Ray creates a separate worker process on a worker node. That process becomes the actor process, dedicated to running the methods called on the actor. Other Ray tasks and actors can invoke its methods on that process, mutating its internal state. Actors can also be terminated manually if needed. The example code below shows all of these cases.

<img src="https://github.com/anyscale/academy/blob/main/images/ray_worker_actor_1.png?raw=1" height="30%" width="60%">
<img src="https://github.com/anyscale/academy/blob/main/images/ray_worker_actor_2.png?raw=1" height="30%" width="60%">
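The two properties named above, private state and message passing, can be illustrated without Ray. The following is a minimal plain-Python analogy, not how Ray implements actors: a hypothetical `CounterActor` owns its state and processes messages from a mailbox one at a time on a single thread. The class name, the `send` method, and the message names `add`, `get`, and `stop` are all assumptions made for this sketch.

```python
import queue
import threading


class CounterActor:
    """Toy actor (hypothetical): private state plus a mailbox drained by one thread."""

    def __init__(self):
        self._count = 0                  # state touched only by the actor thread
        self._mailbox = queue.Queue()    # incoming messages, handled in FIFO order
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Handle one message at a time, so state updates never interleave.
        while True:
            method, args, reply = self._mailbox.get()
            if method == "add":
                self._count += args[0]
                reply.put(None)
            elif method == "get":
                reply.put(self._count)
            else:  # "stop" or anything unknown ends the loop
                reply.put(None)
                break

    def send(self, method, *args):
        """Enqueue a message and return a queue that will receive the reply."""
        reply = queue.Queue(maxsize=1)
        self._mailbox.put((method, args, reply))
        return reply


actor = CounterActor()
for _ in range(5):
    actor.send("add", 2)             # returns immediately; reply is ignored
total = actor.send("get").get()      # blocks until the actor answers
actor.send("stop").get()
print(total)
```

Sending a message returns immediately with a reply handle, and only the final `.get()` blocks. This roughly mirrors calling `.remote()` on an actor method and later calling `ray.get()` on the result.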

---

First, let's start Ray…

In [1]:
import logging
import time
import ray
import random
from random import randint
import numpy as np

In [2]:
context = ray.init()


2024-06-21 07:31:03,740	INFO worker.py:1761 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265


In [3]:
print(context.dashboard_url)

127.0.0.1:8265


## 3. Remote Class as a Stateful Actor Pattern

To start, we'll define a class and apply the `@ray.remote` decorator.

Let's take a Python class and convert it into a remote actor that serves as a parameter server.
This is a common example in machine learning, where a central parameter server applies the gradients
computed by separate worker processes.

<img src="https://terrytangyuan.github.io/img/inblog/mpi-operator-1.png" width="40%" height="20%">

In [4]:
@ray.remote
class ParameterSever:
    def __init__(self):
        # Initialize the parameters to zero
        self.params = np.zeros(10)

    def get_params(self):
        # Return the current parameters
        return self.params

    def update_params(self, grad):
        # Apply a gradient step to the parameters
        self.params -= grad

Define the work as a task: a function that runs in a remote worker process. This could be a machine learning task that
computes gradients and sends them to the parameter server.

In [5]:
@ray.remote
def worker(ps):
    # Iterate over a number of epochs
    for i in range(100):
        time.sleep(1.5)  # stand-in for the work of computing gradients
        grad = np.ones(10)
        # send the gradient to the parameter server
        ps.update_params.remote(grad)
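The combined effect of the actor and the task is simple arithmetic: every update subtracts the gradient, so after `k` updates from each of `n` workers every parameter equals `-n * k`. The following is a minimal sketch of that arithmetic in plain Python, without Ray or NumPy. The names `LocalParameterServer` and `run_workers` are hypothetical and exist only for this illustration; the sketch runs sequentially and says nothing about timing on a real cluster.

```python
class LocalParameterServer:
    """Plain-Python stand-in for the parameter server (hypothetical, no Ray)."""

    def __init__(self, size=10):
        self.params = [0.0] * size

    def get_params(self):
        return list(self.params)

    def update_params(self, grad):
        # Same rule as the actor: subtract the gradient element-wise.
        self.params = [p - g for p, g in zip(self.params, grad)]


def run_workers(ps, num_workers, steps):
    # Each worker sends one gradient of ones per step.
    for _ in range(steps):
        for _ in range(num_workers):
            ps.update_params([1.0] * len(ps.params))


ps_local = LocalParameterServer()
run_workers(ps_local, num_workers=3, steps=4)
final_params = ps_local.get_params()
print(final_params)  # every entry is -(3 * 4) = -12.0
```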

Start the parameter server actor. It will be scheduled as a process on a remote Ray worker.

In [6]:
param_server = ParameterSever.remote()
param_server

Actor(ParameterSever, 02f783ac20185e4060f4853c01000000)

Let's get the initial values from the parameter server.

In [7]:
print(f"Initial params: {ray.get(param_server.get_params.remote())}")

Initial params: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]


### Create Worker Tasks Computing Gradients
Let's create three separate workers as our machine learning tasks that compute gradients.
These will be scheduled as tasks on a Ray cluster.

A list comprehension launches all of them in a single line.

To scale out, increase the number of workers.

In [8]:
[worker.remote(param_server) for _ in range(3)]

[ObjectRef(c2668a65bda616c1ffffffffffffffffffffffff0100000001000000),
 ObjectRef(32d950ec0ccf9d2affffffffffffffffffffffff0100000001000000),
 ObjectRef(e0dc174c83599034ffffffffffffffffffffffff0100000001000000)]
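Each `ObjectRef` printed above is a future: the call to `worker.remote(...)` returns immediately, and the result only becomes available later. As a rough analogy using only the standard library (this is not Ray's API, and `slow_square` is a hypothetical stand-in for a task), `concurrent.futures` behaves in a similar way:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x):
    # Hypothetical stand-in for a long-running task.
    time.sleep(0.1)
    return x * x


with ThreadPoolExecutor(max_workers=3) as pool:
    # submit() returns a Future right away, much like .remote() returns an ObjectRef.
    futures = [pool.submit(slow_square, n) for n in range(3)]
    # result() blocks until the value is ready, much like ray.get().
    results = [f.result() for f in futures]

print(results)
```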

Now, let's loop and query the parameter server
while the workers run independently and update the parameters.

In [9]:
for _i in range(20):
    print(f"Updated params: {ray.get(param_server.get_params.remote())}")
    time.sleep(1)

Updated params: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
Updated params: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
Updated params: [-2. -2. -2. -2. -2. -2. -2. -2. -2. -2.]
Updated params: [-4. -4. -4. -4. -4. -4. -4. -4. -4. -4.]
Updated params: [-4. -4. -4. -4. -4. -4. -4. -4. -4. -4.]
Updated params: [-6. -6. -6. -6. -6. -6. -6. -6. -6. -6.]
Updated params: [-8. -8. -8. -8. -8. -8. -8. -8. -8. -8.]
Updated params: [-8. -8. -8. -8. -8. -8. -8. -8. -8. -8.]
Updated params: [-10. -10. -10. -10. -10. -10. -10. -10. -10. -10.]
Updated params: [-12. -12. -12. -12. -12. -12. -12. -12. -12. -12.]
Updated params: [-12. -12. -12. -12. -12. -12. -12. -12. -12. -12.]
Updated params: [-14. -14. -14. -14. -14. -14. -14. -14. -14. -14.]
Updated params: [-16. -16. -16. -16. -16. -16. -16. -16. -16. -16.]
Updated params: [-16. -16. -16. -16. -16. -16. -16. -16. -16. -16.]
Updated params: [-18. -18. -18. -18. -18. -18. -18. -18. -18. -18.]
Updated params: [-20. -20. -20. -20. -20. -20. -20. -20. -20. -20.]
Updated para

# Tree of Actors Pattern

This is a common pattern in Ray libraries such as [Ray Tune](https://docs.ray.io/en/latest/tune/index.html), [Ray Train](https://docs.ray.io/en/latest/train/train.html), and [RLlib](https://docs.ray.io/en/latest/rllib/index.html), which use it to train models in parallel or to run distributed hyperparameter optimization (HPO).

In the tree-of-actors pattern, a supervisor manages a collection of worker actors. For example, you may want to train multiple models at the same time while still being able to checkpoint or inspect their state.

<img src="https://docs.ray.io/en/latest/_images/tree-of-actors.svg" width="40%" height="20%">

Let's implement a simple example to illustrate this pattern.

In [16]:
STATES = ["RUNNING", "DONE"]

class Model:

    def __init__(self, m: str):
        self._model = m

    def train(self):
        # do some training work here
        time.sleep(1)

# Factory function to return an instance of a model type
def model_factory(m: str):
    return Model(m)

### Create a Worker Actor

In [17]:
@ray.remote
class Worker(object):
    def __init__(self, m: str):
        # type of a model: lr, cl, or nn
        self._model = m

    def state(self) -> str:
        # Simulated status for the demo: picks a state at random
        return random.choice(STATES)

    # Do the work for this model
    def work(self) -> None:
        model_factory(self._model).train()

### Create Supervisor Actor

In [18]:
@ray.remote
class Supervisor:
    def __init__(self):
        # Create three Actor Workers, each by its unique model type
        self.workers = [Worker.remote(name) for name in ["lr", "cl", "nn"]]

    def work(self):
        # do the work
        [w.work.remote() for w in self.workers]

    def terminate(self):
        [ray.kill(w) for w in self.workers]

    def state(self):
        return ray.get([w.state.remote() for w in self.workers])

Create an actor instance for the supervisor and launch its workers.

In [19]:
sup = Supervisor.remote()

# Launch remote actors as workers
sup.work.remote()

ObjectRef(347cc60e0bb3da74277ea84176b6f12456c2da1c0100000001000000)

### Look at the Ray Dashboard

In [20]:
from IPython.core.magic import register_line_magic
import IPython

# Jupyter line magic that shows the Ray dashboard in an inline iframe (relies on Google Colab's port proxy)
@register_line_magic
def run_dashboard(line):
    handle = IPython.display.display(
            IPython.display.Pretty("Launching my server..."),
            display_id=True,
    )
    shell = """
        (async () => {
            const url = new URL(await google.colab.kernel.proxyPort(8265, {'cache': true}));
            const iframe = document.createElement('iframe');
            iframe.src = url;
            iframe.setAttribute('width', '100%');
            iframe.setAttribute('height', '750');
            iframe.setAttribute('frameborder', 0);
            document.body.appendChild(iframe);
        })();
    """
    script = IPython.display.Javascript(shell)
    handle.update(script)

In [21]:
%run_dashboard

<IPython.core.display.Javascript object>

In [22]:
# check their status
while True:
    # Fetch the states of all its workers
    states = ray.get(sup.state.remote())
    print(states)
    # check if all are DONE
    result = all('DONE' == e for e in states)
    if result:
        # Note: Actor processes will be terminated automatically when the initial actor handle goes out of scope in Python.
        # If we create an actor with actor_handle = ActorClass.remote(), then when actor_handle goes out of scope and is destructed,
        # the actor process will be terminated. Note that this only applies to the original actor handle created for the actor
        # and not to subsequent actor handles created by passing the actor handle to other tasks.

        # kill all of the supervisor's workers manually, only for illustration and demo;
        # ray.get waits for terminate() to finish before the supervisor itself is killed
        ray.get(sup.terminate.remote())

        # kill the supervisor manually, only for illustration and demo
        ray.kill(sup)
        break

['RUNNING', 'DONE', 'DONE']
['DONE', 'RUNNING', 'RUNNING']
['DONE', 'RUNNING', 'RUNNING']
['DONE', 'DONE', 'DONE']
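The loop above polls the supervisor as fast as it can and never gives up. A slightly more defensive variant, sketched here in plain Python, sleeps between polls and stops after a timeout. The helper `poll_until` and its parameters are hypothetical. With Ray, `fetch_states` could be `lambda: ray.get(sup.state.remote())`; the example below substitutes canned replies so that it runs without a cluster.

```python
import time


def poll_until(fetch_states, is_done, interval=0.5, timeout=30.0):
    """Call fetch_states() until is_done(states) is true or the timeout expires.

    Returns the last states seen; raises TimeoutError if the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        states = fetch_states()
        if is_done(states):
            return states
        if time.monotonic() >= deadline:
            raise TimeoutError(f"not done after {timeout} s: {states}")
        time.sleep(interval)  # pause instead of busy-polling


# Canned replies standing in for the supervisor: all DONE on the third poll.
_fake_replies = iter([
    ["RUNNING", "DONE", "DONE"],
    ["DONE", "RUNNING", "RUNNING"],
    ["DONE", "DONE", "DONE"],
])

final_states = poll_until(
    fetch_states=lambda: next(_fake_replies),
    is_done=lambda states: all(s == "DONE" for s in states),
    interval=0.01,
)
print(final_states)
```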


### Passing Actor handles to Ray Tasks

You can pass actor handles to remote Ray tasks, which can then change the actor's
state. The `MessageActor` stores or clears messages, depending on which method is
invoked.

In [23]:
@ray.remote
class MessageActor(object):
    def __init__(self):
        # Keep the state of the messages
        self.messages = []

    def add_message(self, message):
        self.messages.append(message)

    # return all stored messages and reset the list
    def get_and_clear_messages(self):
        messages = self.messages
        self.messages = []
        return messages
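Because `@ray.remote` decorates an ordinary Python class, the get-and-clear logic can be checked locally before running it as an actor. The sketch below repeats the same three methods in a hypothetical undecorated class, `LocalMessageBox`, and shows that each call returns only the messages added since the previous call.

```python
class LocalMessageBox:
    """Same logic as the actor, without the @ray.remote decorator (hypothetical)."""

    def __init__(self):
        self.messages = []

    def add_message(self, message):
        self.messages.append(message)

    def get_and_clear_messages(self):
        # Hand back the current list and rebind the attribute to a fresh one,
        # so later appends do not change what was already returned.
        messages = self.messages
        self.messages = []
        return messages


box = LocalMessageBox()
box.add_message("Message 0 from worker 0.")
box.add_message("Message 0 from worker 1.")
first = box.get_and_clear_messages()

box.add_message("Message 1 from worker 0.")
second = box.get_and_clear_messages()
third = box.get_and_clear_messages()  # nothing new since the last call

print(first, second, third)
```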

Define a remote function that loops and pushes messages to the actor. It receives the `MessageActor` handle as an argument.

In [24]:
@ray.remote
def worker(message_actor, j):
    for i in range(10):
        time.sleep(1)
        message_actor.add_message.remote(
            f"Message {i} from worker {j}.")


Create a message actor.

In [25]:
message_actor = MessageActor.remote()

Start 3 tasks that push messages to the actor.

In [26]:
[worker.remote(message_actor, j) for j in range(3)]

[ObjectRef(ae46b8beecd25f3affffffffffffffffffffffff0100000001000000),
 ObjectRef(aa3d5d11e415fe88ffffffffffffffffffffffff0100000001000000),
 ObjectRef(a6d6d59239756144ffffffffffffffffffffffff0100000001000000)]

Periodically get the messages and print them.

In [27]:
for _ in range(10):
    new_messages = ray.get(message_actor.get_and_clear_messages.remote())
    print("New messages\n:", new_messages)
    time.sleep(1)

New messages
: []
New messages
: []
New messages
: []
New messages
: []
New messages
: []
New messages
: []
New messages
: []
New messages
: []
New messages
: []
New messages
: []


Finally, shut down Ray.

In [28]:
ray.shutdown()

---
## References

 * [Writing your First Distributed Python Application with Ray](https://www.anyscale.com/blog/writing-your-first-distributed-python-application-with-ray)
 * [Using and Programming with Actors](https://docs.ray.io/en/latest/actors.html)
 * [Advanced Patterns and Anti-Patterns in Ray](https://docs.ray.io/en/latest/ray-design-patterns/index.html)