# Studio 0: Monarch Basics - Ping Pong Tutorial

Welcome to the Lightning Studios Monarch series! This is **Studio 0**, where you'll learn the fundamentals of Monarch's Actor API through simple, hands-on examples.

## What is Monarch?

**Monarch** is Meta's distributed actor framework for building scalable, distributed applications. It makes it easy to:
- Run code across multiple processes or machines
- Coordinate distributed computations
- Build complex distributed systems with simple Python code

## What You'll Learn

In this tutorial, you'll learn:
1. **Core Concepts**: Actors, Endpoints, and Process Meshes
2. **Hello World**: Creating and calling actors
3. **Calling Patterns**: Broadcasting vs. targeting specific actors
4. **Actor Communication**: How actors talk to each other (Ping Pong!)

## Prerequisites

- Basic Python knowledge
- Understanding of `async`/`await` (we'll provide a quick refresher)
- Monarch installed (see [installation guide](https://github.com/meta-pytorch/monarch))

## Lightning Studios Learning Path

This is the **foundation** studio. After completing this, you can progress to:

- **[Studio 1: Getting Started](./studio_1_getting_started.ipynb)** - Multi-node training with Lightning
- **[Studio 2: Workspace Sync](./studio_2_workspace_sync.ipynb)** - Hot-reload configs without restarting
- **[Studio 3: Interactive Debugging](./studio_3_interactive_debugging.ipynb)** - Debug distributed systems

Let's dive in! 🚀

---

# Part 1: Core Concepts

Before we write code, let's understand the key concepts.

## What is an Actor?

Think of an **Actor** as an independent worker that:
- Has its own state (variables)
- Runs in its own process (possibly on a different machine)
- Exposes **endpoints** (methods) that can be called remotely

```
┌─────────────────┐
│  Actor Instance │
│                 │
│  State:         │
│  - rank: 0      │
│  - data: [...]  │
│                 │
│  Endpoints:     │
│  - hello()      │
│  - process()    │
└─────────────────┘
```

## What is an Endpoint?

An **Endpoint** is a method on an Actor that can be called remotely. It's marked with the `@endpoint` decorator.

```python
class MyActor(Actor):
    @endpoint
    async def my_method(self, arg):
        # This can be called remotely!
        return f"Processed {arg}"
```

## What is a Process Mesh?

A **Process Mesh** (or ProcMesh) is a collection of processes where actors can be spawned. Think of it as a cluster of workers.

```
Process Mesh (4 GPUs)
┌────────┬────────┬────────┬────────┐
│ GPU 0  │ GPU 1  │ GPU 2  │ GPU 3  │
│        │        │        │        │
│ Actor  │ Actor  │ Actor  │ Actor  │
│ Rank 0 │ Rank 1 │ Rank 2 │ Rank 3 │
└────────┴────────┴────────┴────────┘
```

## Async/Await Quick Refresher

Monarch uses Python's `async`/`await` for non-blocking operations:

```python
# Calling an endpoint
result = await actor.my_method.call("hello")  # Wait for result

# Running multiple operations in parallel
results = await asyncio.gather(
    actor.method_1.call(),
    actor.method_2.call(),
)  # Wait for both to complete
```
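
To try these two patterns without any Monarch setup, here is a minimal sketch in plain `asyncio`. The names `fetch_value` and `main` are hypothetical stand-ins for endpoint calls; they are not part of Monarch.

```python
import asyncio


async def fetch_value(name, delay):
    """Pretend to do remote work by sleeping, then return a label."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # Await a single coroutine: execution pauses here until it finishes.
    first = await fetch_value("single", 0.01)

    # gather() runs both coroutines concurrently and returns their
    # results in the order the coroutines were passed in, regardless
    # of which one finishes first.
    both = await asyncio.gather(
        fetch_value("a", 0.02),
        fetch_value("b", 0.01),
    )
    return [first, *both]


# Standalone-script entry point (assumption: no event loop is running yet).
results = asyncio.run(main())
print(results)
```

Inside a notebook an event loop is already running, so `asyncio.run()` would raise an error there; use `results = await main()` instead.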

---

# Part 2: Hello World

Let's create our first Monarch actor!

## Import Monarch

First, import the necessary components from Monarch.

In [None]:
import asyncio
from monarch.actor import Actor, current_rank, endpoint, proc_mesh

## Define a Simple Actor

Let's create a `ToyActor` that:
- Stores its rank (unique ID)
- Has a `hello_world` endpoint that prints a message

In [None]:
NUM_ACTORS = 4


class ToyActor(Actor):
    def __init__(self):
        # Get the rank (unique ID) of this actor instance
        self.rank = current_rank().rank

    @endpoint
    async def hello_world(self, msg):
        """A simple endpoint that prints a message."""
        print(f"Actor {self.rank}: Received message '{msg}'")

### Key Points

- `Actor` base class: All Monarch actors inherit from this
- `current_rank()`: Returns information about this actor's position in the mesh
- `@endpoint`: Decorator that makes a method remotely callable
- `async def`: The endpoints in this tutorial are written as async functions

## Create a Process Mesh and Spawn Actors

Now we'll:
1. Create a process mesh with 4 processes
2. Spawn 4 instances of `ToyActor` (one per process)

In [None]:
async def create_toy_actors():
    # Create a local process mesh with 4 GPU slots
    # Note: This works even without actual GPUs!
    local_proc_mesh = proc_mesh(gpus=NUM_ACTORS)
    
    # Spawn 4 instances of ToyActor (one per GPU slot)
    # This returns a "handle" to all instances
    toy_actor = await local_proc_mesh.spawn("toy_actor", ToyActor)
    
    print(f"✓ Spawned {NUM_ACTORS} ToyActor instances")
    
    return toy_actor, local_proc_mesh

### Understanding `proc_mesh(gpus=4)`

This creates 4 processes. The parameter is called `gpus` because Monarch is often used for GPU computing, but it works fine without GPUs - it just means "4 parallel processes."

### Understanding `spawn()`

When we call `spawn("toy_actor", ToyActor)`:
- Monarch creates 4 instances of `ToyActor`
- Each runs in its own process
- Each gets a unique rank (0, 1, 2, 3)
- We get back a handle to communicate with all of them

## Call All Actors at Once

The most common pattern: broadcast a call to **all** actor instances.

In [None]:
async def call_all_actors():
    toy_actor, local_proc_mesh = await create_toy_actors()
    
    # Call hello_world on ALL actor instances
    # .call() broadcasts to all instances
    await toy_actor.hello_world.call("Hello from main!")
    
    return toy_actor, local_proc_mesh

# Run it!
toy_actor, toy_mesh = await call_all_actors()

### Expected Output

You should see output from all 4 actors (the order of the lines may vary, because the actors run in separate processes):
```
Actor 0: Received message 'Hello from main!'
Actor 1: Received message 'Hello from main!'
Actor 2: Received message 'Hello from main!'
Actor 3: Received message 'Hello from main!'
```

### What Just Happened?

```
       Main Process
            │
            ├──> toy_actor.hello_world.call("Hello")
            │
    ┌───────┼───────┬───────┐
    ▼       ▼       ▼       ▼
 Actor0  Actor1  Actor2  Actor3
  Rank0   Rank1   Rank2   Rank3
  print   print   print   print
```

---

# Part 3: Calling Specific Actors

Sometimes you want to call **specific** actor instances, not all of them. This is where `.slice()` comes in!

## The Slice API

`.slice()` lets you select specific actor instances:

```python
# Select actor at GPU 0
actor_0 = toy_actor.slice(gpus=0)

# Select actor at GPU 2
actor_2 = toy_actor.slice(gpus=2)

# Then call with .call_one()
await actor_0.hello_world.call_one("Hi from actor 0!")
```

## Example: Call Each Actor with a Unique Message

In [None]:
async def call_specific_actors():
    futures = []
    
    for idx in range(NUM_ACTORS):
        # Select the actor at index 'idx'
        actor_instance = toy_actor.slice(gpus=idx)
        
        # Call with a unique message for this actor
        future = actor_instance.hello_world.call_one(
            f"Unique message for actor {idx}"
        )
        futures.append(future)
    
    # Wait for all calls to complete (in parallel!)
    await asyncio.gather(*futures)
    
    print("\n✓ All specific actor calls completed")

# Run it!
await call_specific_actors()

### Expected Output

```
Actor 0: Received message 'Unique message for actor 0'
Actor 1: Received message 'Unique message for actor 1'
Actor 2: Received message 'Unique message for actor 2'
Actor 3: Received message 'Unique message for actor 3'
```

### Key Insight

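We collected the futures first and then awaited them together with `asyncio.gather()`, so all four calls are in flight at the same time. Awaiting each call inside the loop instead would force every call to finish before the next one starts, which is slower.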

```
Sequential (slow):        Parallel with gather() (fast):
┌────┐                    ┌────┐
│ A0 │────┐               │ A0 │────┐
└────┘    │               ├────┤    │
          │               │ A1 │────┤
┌────┐    │               ├────┤    ├─> All complete!
│ A1 │────┤               │ A2 │────┤
└────┘    │               ├────┤    │
          │               │ A3 │────┘
┌────┐    │               └────┘
│ A2 │────┤
└────┘    │
          │
┌────┐    │
│ A3 │────┘
└────┘
```
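
The difference can be measured with a small plain-`asyncio` sketch. It does not use Monarch: `slow_call` is a hypothetical stand-in for an endpoint call that takes about 0.1 seconds, and the timings are only illustrative.

```python
import asyncio
import time


async def slow_call(idx):
    """Stand-in for a remote call that takes ~0.1 s."""
    await asyncio.sleep(0.1)
    return idx


async def run_sequential(n):
    # Each await must finish before the next call starts.
    return [await slow_call(i) for i in range(n)]


async def run_concurrent(n):
    # All calls are started together; gather() waits for all of them.
    return await asyncio.gather(*(slow_call(i) for i in range(n)))


def timed(coro_fn, n):
    """Run coro_fn(n) to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn(n))
    return result, time.perf_counter() - start


seq_result, seq_time = timed(run_sequential, 4)
con_result, con_time = timed(run_concurrent, 4)

print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

Both versions return the same values in the same order. The sequential version should take roughly four times as long as a single call, while the concurrent version should take roughly as long as one call.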

## Comparison: `.call()` vs `.call_one()`

| Method | Use Case | Example |
|--------|----------|----------|
| `.call()` | Broadcast to **all** instances | `actor.method.call(arg)` |
| `.call_one()` | Call a **specific** instance (after `.slice()`) | `actor.slice(gpus=0).method.call_one(arg)` |

### When to Use Each

- **`.call()`**: When you want all actors to do the same thing
  - Example: Initialize all actors, broadcast data, synchronize state
  
- **`.call_one()` with `.slice()`**: When you want specific behavior per actor
  - Example: Assign different data partitions, target specific workers

---

# Part 4: Actor-to-Actor Communication (Ping Pong!)

So far, we've called actors from our main code. But actors can also **talk to each other**! This is powerful for building distributed systems.

## The Ping Pong Example

We'll create two groups of actors that send messages to each other:

```
Actor Group 0              Actor Group 1
┌──────────┐               ┌──────────┐
│ Actor 0  │──── Ping ───> │ Actor 0  │
│ Actor 1  │               │ Actor 1  │
└──────────┘               └──────────┘
                              │
                            Pong!
                              │
┌──────────┐               ┌──────────┐
│ Actor 0  │ <─── Ping ─── │ Actor 0  │
│ Actor 1  │               │ Actor 1  │
└──────────┘               └──────────┘
   │
 Pong!
```

## Define the PingPong Actor

This actor can:
- Store a reference to another actor
- Send messages to that actor
- Receive messages from that actor

In [None]:
class PingPongActor(Actor):
    def __init__(self, actor_name):
        """Initialize with a name to identify this actor group."""
        self.actor_name = actor_name
        self.identity = None
        self.other_actor = None
        self.other_actor_pair = None

    @endpoint
    async def init(self, other_actor):
        """
        Initialize this actor with a reference to another actor.
        
        Key insight: We store a 'slice' of the other actor that corresponds
        to our rank. So Actor 0 will talk to the other Actor 0, 
        Actor 1 to the other Actor 1, etc.
        """
        self.other_actor = other_actor
        
        # Get my rank
        self.identity = current_rank().rank
        
        # Slice the other actor to get my "pair" (same rank)
        self.other_actor_pair = other_actor.slice(**current_rank())
        
        print(f"[{self.actor_name}:{self.identity}] Initialized and paired with other actor")

    @endpoint
    async def send(self, msg):
        """Send a message to our paired actor in the other group."""
        await self.other_actor_pair.recv.call(
            f"Sender ({self.actor_name}:{self.identity}) says: {msg}"
        )

    @endpoint
    async def recv(self, msg):
        """Receive a message from our paired actor."""
        print(f"Pong! [{self.actor_name}:{self.identity}] received: {msg}")

### Understanding the Code

**The `init` endpoint:**
- Takes a reference to another actor group
- Uses `.slice(**current_rank())` to pair actors by rank
  - Actor 0 in group A pairs with Actor 0 in group B
  - Actor 1 in group A pairs with Actor 1 in group B

**The `send` endpoint:**
- Calls `recv` on the paired actor
- This demonstrates **actor-to-actor communication**!

**The `recv` endpoint:**
- Receives the message and prints it, prefixed with "Pong!"
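
The pairing-by-rank idea can also be sketched without Monarch. In the plain-`asyncio` model below, `Peer` is a hypothetical class, indexing a list by rank plays the role of slicing the other group, and messages are stored in an `inbox` list instead of being printed so the result is easy to inspect.

```python
import asyncio


class Peer:
    """Hypothetical stand-in for one PingPongActor instance."""

    def __init__(self, group, rank):
        self.group = group
        self.rank = rank
        self.partner = None
        self.inbox = []

    def pair_with(self, other_group):
        # Pick the member of the other group that has our own rank.
        self.partner = other_group[self.rank]

    async def send(self, msg):
        # Deliver the message to our partner only.
        await self.partner.recv(f"({self.group}:{self.rank}) says: {msg}")

    async def recv(self, msg):
        self.inbox.append(msg)


async def main():
    group_a = [Peer("GroupA", r) for r in range(2)]
    group_b = [Peer("GroupB", r) for r in range(2)]

    # Each peer learns about its same-rank partner in the other group.
    for a, b in zip(group_a, group_b):
        a.pair_with(group_b)
        b.pair_with(group_a)

    # Every member of group A pings its partner concurrently.
    await asyncio.gather(*(p.send("Ping!") for p in group_a))
    return [p.inbox for p in group_b]


inboxes = asyncio.run(main())
print(inboxes)
```

Each member of group B ends up with exactly one message, and it comes from the member of group A with the same rank.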

## Create Two Actor Groups

In [None]:
async def create_ping_pong_actors():
    # Create first mesh with 2 actors
    local_mesh_0 = proc_mesh(gpus=2)
    actor_0 = await local_mesh_0.spawn(
        "actor_0",
        PingPongActor,
        "GroupA",  # This argument is passed to __init__
    )

    # Create second mesh with 2 actors
    local_mesh_1 = proc_mesh(gpus=2)
    actor_1 = await local_mesh_1.spawn(
        "actor_1",
        PingPongActor,
        "GroupB",  # This argument is passed to __init__
    )
    
    print("\n✓ Created two actor groups (2 actors each)")

    return actor_0, actor_1, local_mesh_0, local_mesh_1

# Create the actors
actor_group_a, actor_group_b, mesh_a, mesh_b = await create_ping_pong_actors()

### What We Have Now

```
Group A (actor_group_a)        Group B (actor_group_b)
┌──────────────────┐           ┌──────────────────┐
│ GroupA Actor 0   │           │ GroupB Actor 0   │
│ GroupA Actor 1   │           │ GroupB Actor 1   │
└──────────────────┘           └──────────────────┘

They don't know about each other yet!
```

## Initialize: Pair the Actors

Now we'll tell each actor group about the other.

In [None]:
async def init_ping_pong(actor_0, actor_1):
    # Initialize actors with references to each other
    # We do this in parallel using asyncio.gather
    await asyncio.gather(
        actor_0.init.call(actor_1),  # Group A learns about Group B
        actor_1.init.call(actor_0),  # Group B learns about Group A
    )
    
    print("\n✓ Actors are now paired and ready to communicate!")

# Initialize the pairing
await init_ping_pong(actor_group_a, actor_group_b)

### After Initialization

```
Group A                        Group B
┌──────────────────┐           ┌──────────────────┐
│ GroupA Actor 0   │ <──────>  │ GroupB Actor 0   │
│                  │  paired   │                  │
│ GroupA Actor 1   │ <──────>  │ GroupB Actor 1   │
└──────────────────┘  paired   └──────────────────┘

Each actor knows its "pair" in the other group!
```

## Send Messages Between Actors

Now for the exciting part - let's make them talk!

In [None]:
async def send_ping_pong(actor_0, actor_1):
    print("\n" + "="*60)
    print("Starting Ping Pong Communication")
    print("="*60 + "\n")
    
    # Group A sends "Ping!" to Group B
    print("📤 Group A sending 'Ping!' to Group B...\n")
    await actor_0.send.call("Ping!")
    
    print("\n" + "-"*60 + "\n")
    
    # Group B sends "Ping!" to Group A
    print("📤 Group B sending 'Ping!' to Group A...\n")
    await actor_1.send.call("Ping!")
    
    print("\n" + "="*60)
    print("✓ Ping Pong Complete!")
    print("="*60)

# Run the ping pong!
await send_ping_pong(actor_group_a, actor_group_b)

### Expected Output

```
📤 Group A sending 'Ping!' to Group B...

Pong! [GroupB:0] received: Sender (GroupA:0) says: Ping!
Pong! [GroupB:1] received: Sender (GroupA:1) says: Ping!

📤 Group B sending 'Ping!' to Group A...

Pong! [GroupA:0] received: Sender (GroupB:0) says: Ping!
Pong! [GroupA:1] received: Sender (GroupB:1) says: Ping!
```

### What Happened?

1. **Group A's Actor 0** called `send("Ping!")`
2. This invoked `recv()` on **Group B's Actor 0** (its pair)
3. Group B's Actor 0 printed "Pong!"
4. Same for Actor 1 in both groups
5. Then we reversed the direction!

```
     GroupA Actor 0  ──send()──>  GroupB Actor 0
                                       │
                                    recv()
                                       │
                                   Pong!
```

---

# 🎉 Congratulations! 🎉

You've learned the fundamentals of Monarch!

## What You Learned

### Core Concepts
- ✓ **Actors**: Independent workers with state and endpoints
- ✓ **Endpoints**: Remotely callable methods (with `@endpoint`)
- ✓ **Process Mesh**: Collection of processes for spawning actors

### Calling Patterns
- ✓ **`.call()`**: Broadcast to all actor instances
- ✓ **`.slice()`**: Select specific actor instances
- ✓ **`.call_one()`**: Call a specific sliced actor

### Communication
- ✓ **Main → Actor**: Call endpoints from your code
- ✓ **Actor → Actor**: Actors calling other actors' endpoints
- ✓ **Pairing**: Using `.slice(**current_rank())` to pair actors

## Key Takeaways

1. **Actors run independently** in separate processes
2. **Endpoints are async** - use `await` when calling them
3. **Use `.call()` for broadcast**, `.call_one()` for targeted calls
4. **Actors can reference other actors** for complex distributed systems
5. **`asyncio.gather()` runs operations in parallel** for better performance

## Next Steps: Lightning Studios Series

Now that you understand Monarch basics, continue your journey with the Lightning Studios:

### 🚀 Studio 1: Getting Started (Recommended Next!)
**[studio_1_getting_started.ipynb](./studio_1_getting_started.ipynb)**

Learn how to run distributed multi-node training:
- Launch multi-node jobs on Lightning AI
- Set up distributed process meshes across machines
- Run TorchTitan training for Llama-3-8B
- Scale from 2 to 16+ nodes

### 🔄 Studio 2: Workspace Synchronization
**[studio_2_workspace_sync.ipynb](./studio_2_workspace_sync.ipynb)**

Master hot-reloading for faster iteration:
- Sync local code/config changes to remote nodes
- Update training configs without restarting jobs
- 10x faster iteration cycles

### 🐛 Studio 3: Interactive Debugging
**[studio_3_interactive_debugging.ipynb](./studio_3_interactive_debugging.ipynb)**

Debug distributed systems like a pro:
- Set breakpoints in distributed actors
- Inspect environment variables across nodes
- Use `monarch debug` CLI for interactive debugging

---

## Additional Resources

### 📚 More Examples
Check out these examples in the docs:
- `getting_started.py` - More Monarch fundamentals
- `distributed_tensors.py` - Working with tensors across actors
- `debugging.py` - Debugging distributed actors
- `spmd_ddp.py` - Distributed data parallel training

### 📖 Documentation
- [Monarch GitHub](https://github.com/meta-pytorch/monarch)
- [Monarch Documentation](https://github.com/meta-pytorch/monarch/tree/main/docs)
- [TorchTitan with Monarch](https://github.com/pytorch/torchtitan)

---

## Practice Exercises

Here are some exercises to reinforce your learning:

1. **Modify `ToyActor`** to return a value instead of printing
2. **Create a chain** of 3 actor groups where A → B → C → A
3. **Add a counter** to `PingPongActor` that tracks messages sent/received
4. **Experiment with different mesh sizes** - try 8 or 16 actors

Ready for real-world distributed training? Head to **[Studio 1](./studio_1_getting_started.ipynb)** next!

Happy coding with Monarch! 🎊

---

# Cleanup

When you're done, it's good practice to stop the process meshes.

In [None]:
# Stop the meshes
await toy_mesh.stop()
await mesh_a.stop()
await mesh_b.stop()

print("✓ All process meshes stopped")
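
If an earlier cell raises an error, the cleanup cell may never run. One way to guard against that is a `try`/`finally` block, sketched below in plain `asyncio`. `FakeMesh` is a hypothetical stand-in that exposes only an awaitable `stop()`, and `stop_all` is a helper invented for this example, not a Monarch function.

```python
import asyncio


class FakeMesh:
    """Hypothetical stand-in exposing only an awaitable stop()."""

    def __init__(self, name):
        self.name = name
        self.stopped = False

    async def stop(self):
        self.stopped = True


async def stop_all(meshes):
    # return_exceptions=True lets every stop() run even if one of them fails.
    return await asyncio.gather(
        *(m.stop() for m in meshes), return_exceptions=True
    )


async def main():
    meshes = [FakeMesh("toy"), FakeMesh("a"), FakeMesh("b")]
    try:
        pass  # ... spawn actors and call endpoints here ...
    finally:
        # Runs whether or not the work above raised an exception.
        await stop_all(meshes)
    return meshes


meshes = asyncio.run(main())
print(all(m.stopped for m in meshes))
```

As with the other standalone sketches, replace `asyncio.run(main())` with `await main()` when running inside a notebook.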