```{contents}
```
## Horizontal Scaling

**Horizontal scaling** in LangGraph refers to increasing system throughput and reliability by running **multiple concurrent LangGraph runtimes** across machines or containers while sharing execution state, memory, and coordination services.

It enables LangGraph systems to support **high traffic, multi-tenant workloads, and long-running autonomous agents**.

---

### **1. Why Horizontal Scaling Matters for LangGraph**

LangGraph workloads exhibit:

* High concurrency (many users, many agents)
* Long-running workflows
* Stateful execution with checkpoints
* Heavy LLM + tool calls

Vertical scaling (a bigger machine) runs into hardware limits quickly under such loads and leaves a single point of failure.
Production LangGraph deployments therefore rely on **distributed execution**.

---

### **2. Core Architecture Model**

```
Clients
   |
Load Balancer
   |
API Gateway
   |
Multiple LangGraph Runtimes  (Pods / VMs)
   |
Shared Infrastructure
   ├── State Store (Redis / Postgres)
   ├── Checkpoint Store
   ├── Vector Store
   └── Message Queue
```

Each LangGraph runtime is **stateless at the compute layer** and **stateful through shared stores**.

---

### **3. What Gets Scaled**

| Component         | Scaling Strategy |
| ----------------- | ---------------- |
| LangGraph Runtime | Horizontal       |
| LLM Gateways      | Horizontal       |
| Tool Workers      | Horizontal       |
| Agent Pools       | Horizontal       |
| Schedulers        | Horizontal       |

---

### **4. Key Production Components**

| Layer            | Technology          |
| ---------------- | ------------------- |
| Load Balancer    | NGINX / ALB         |
| Runtime          | FastAPI + LangGraph |
| State Store      | Redis / PostgreSQL  |
| Checkpoint Store | S3 / GCS            |
| Message Queue    | Kafka / RabbitMQ    |
| Vector DB        | Pinecone / Weaviate |
| Orchestration    | Kubernetes          |

---

### **5. Execution Model Under Horizontal Scaling**

Each request:

1. Hits any available LangGraph pod
2. Loads graph + state from shared store
3. Executes next step(s)
4. Writes updated state back
5. Can resume on **any** pod

This allows:

* Failover
* Migration
* Elastic scaling
* Zero-downtime upgrades
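
The five steps above can be sketched without any external services. This is a minimal simulation, not LangGraph's own runtime: `SHARED_STORE`, `handle_request`, and the pod names are hypothetical stand-ins, and a plain dictionary plays the role of Redis or Postgres. Each call performs exactly one step, so consecutive steps of the same thread land on different "pods".

```python
import copy
from typing import Dict, TypedDict


class State(TypedDict):
    thread_id: str
    step: int
    done: bool


# Hypothetical stand-in for a shared store such as Redis or Postgres.
SHARED_STORE: Dict[str, State] = {}


def handle_request(pod_name: str, thread_id: str, total_steps: int = 5) -> State:
    """Run one step of the workflow on whichever pod receives the request."""
    # Step 2: load state from the shared store, or start fresh.
    default: State = {"thread_id": thread_id, "step": 0, "done": False}
    state = copy.deepcopy(SHARED_STORE.get(thread_id, default))

    # Step 3: execute the next step.
    if not state["done"]:
        state["step"] += 1
        state["done"] = state["step"] >= total_steps
        print(f"{pod_name} executed step {state['step']}")

    # Step 4: write the updated state back.
    SHARED_STORE[thread_id] = state
    return state


# Steps 1 and 5: requests land on different pods, and each pod resumes
# where the previous one stopped because nothing lives in pod memory.
pods = ["pod-a", "pod-b", "pod-c"]
request_number = 0
final_state = handle_request(pods[0], "job-123")
while not final_state["done"]:
    request_number += 1
    final_state = handle_request(pods[request_number % len(pods)], "job-123")
```

Because the handler keeps nothing between calls, removing or replacing any pod between requests does not affect the thread's progress.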

---

### **6. Concurrency Control**

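Safe concurrent execution across replicas relies on:

* **Thread IDs** → workflow identity
* **Optimistic locking** on state writes
* **Versioned checkpoints**
* **Idempotent node execution** (a property of the node code, so side effects must be safe to repeat)

Together these prevent: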

* Duplicate execution
* Race conditions
* Lost updates
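
To make the optimistic-locking idea concrete, here is a minimal sketch of a versioned compare-and-set write. `VersionedStore`, `Record`, and `VersionConflict` are hypothetical names invented for this illustration; they are not LangGraph APIs. In a real deployment the version check and the write would have to be a single atomic operation in the backing store.

```python
from dataclasses import dataclass
from typing import Dict


class VersionConflict(Exception):
    """Raised when a write is based on a stale version."""


@dataclass
class Record:
    version: int
    step: int


class VersionedStore:
    """Hypothetical in-memory store with compare-and-set writes."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def read(self, thread_id: str) -> Record:
        rec = self._records.get(thread_id, Record(version=0, step=0))
        return Record(rec.version, rec.step)  # copy, so callers cannot mutate the store

    def write(self, thread_id: str, expected_version: int, step: int) -> Record:
        current = self._records.get(thread_id, Record(version=0, step=0))
        if current.version != expected_version:
            raise VersionConflict(
                f"expected v{expected_version}, found v{current.version}"
            )
        new = Record(version=expected_version + 1, step=step)
        self._records[thread_id] = new
        return Record(new.version, new.step)


store = VersionedStore()

# Two pods read the same version of the same thread.
seen_by_a = store.read("job-123")
seen_by_b = store.read("job-123")

# Pod A commits first and bumps the version.
store.write("job-123", seen_by_a.version, seen_by_a.step + 1)

# Pod B's write is rejected instead of silently overwriting pod A's update.
try:
    store.write("job-123", seen_by_b.version, seen_by_b.step + 1)
    conflict_detected = False
except VersionConflict:
    conflict_detected = True

# Pod B re-reads and retries on top of the latest version.
latest = store.read("job-123")
final = store.write("job-123", latest.version, latest.step + 1)
```

The rejected write is what turns a potential lost update into an explicit conflict that the losing pod can handle by re-reading and retrying.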

---

### **7. Scaling Example (FastAPI)**

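```python
from typing import TypedDict
from fastapi import FastAPI
from langgraph.graph import StateGraph, END

class State(TypedDict):
    step: int

def work(state: State) -> dict:
    return {"step": state["step"] + 1}  # placeholder node for illustration

builder = StateGraph(State)
builder.add_node("work", work)
builder.set_entry_point("work")
builder.add_edge("work", END)
graph = builder.compile()  # built once per process, reused by every request
app = FastAPI()

@app.post("/run")
async def run_graph(payload: dict):
    # ainvoke avoids blocking the event loop while the graph runs
    return await graph.ainvoke(payload)
```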

Scale out an existing Kubernetes deployment:

```bash
kubectl scale deployment langgraph-runtime --replicas=10
```

---

### **8. Autoscaling Policies**

| Metric        | Action   |
| ------------- | -------- |
| Request rate  | Add pods |
| CPU usage     | Add pods |
| LLM latency   | Add pods |
| Queue backlog | Add pods |

Kubernetes HPA (manifest header only; `metadata` and `spec` are omitted here):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
```
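
For intuition about how a metric turns into "add pods", the sketch below reproduces the replica calculation described in the Kubernetes HPA documentation: `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`, skipped when the ratio is within a tolerance of 1.0 (0.1 by default). The function name, the parameter names, and the example numbers are illustrative assumptions, and the real controller applies further rules (stabilization windows, pod readiness) that are not modeled here.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Simplified version of the HPA replica calculation."""
    ratio = current_metric / target_metric

    # Close enough to the target: leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas

    desired = math.ceil(current_replicas * ratio)

    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# Average CPU at 90% against a 60% target with 4 pods: scale out.
scale_out = desired_replicas(4, current_metric=90.0, target_metric=60.0)

# 63% against 60% is within tolerance: no change.
no_change = desired_replicas(4, current_metric=63.0, target_metric=60.0)

# 30% against 60%: scale in.
scale_in = desired_replicas(4, current_metric=30.0, target_metric=60.0)

# Demand beyond max_replicas is capped.
capped = desired_replicas(8, current_metric=120.0, target_metric=60.0)
```

The same formula applies to custom metrics such as queue backlog per pod, provided the metric is exposed to the autoscaler.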

---

### **9. Fault Tolerance Guarantees**

| Failure           | Behavior                 |
| ----------------- | ------------------------ |
| Pod crash         | Resume from checkpoint   |
| Node failure      | Reassigned automatically |
| LLM timeout       | Retry logic              |
| Partial execution | No corruption            |
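
The "LLM timeout → Retry logic" row can be illustrated with a small retry helper using exponential backoff. This is a generic sketch, not a LangGraph feature: `call_with_retry` and `flaky_llm_call` are hypothetical names, and the timeout is simulated by raising Python's built-in `TimeoutError`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.01,
) -> T:
    """Call fn, retrying on TimeoutError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    attempt = 1
    while True:
        try:
            return fn()
        except TimeoutError:
            if attempt >= max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # Wait base_delay, then 2x, then 4x, ... before the next attempt.
            time.sleep(base_delay * 2 ** (attempt - 1))
            attempt += 1


# Simulated LLM call that times out twice and then succeeds.
calls = {"count": 0}


def flaky_llm_call() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated LLM timeout")
    return "ok"


result = call_with_retry(flaky_llm_call)
```

Retries like this are only safe when the retried call is idempotent, which is why idempotent node execution matters for fault tolerance.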

---

### **10. Horizontal vs Vertical Scaling**

| Feature           | Horizontal                                    | Vertical               |
| ----------------- | --------------------------------------------- | ---------------------- |
| Capacity          | Grows with replicas, bounded by shared stores | Limited by one machine |
| Resilience        | High                                          | Low                    |
| Cost efficiency   | High                                          | Low                    |
| Failure isolation | Strong                                        | Weak                   |

---

### **11. Real-World Use Cases**

* Enterprise AI assistants
* Multi-agent research systems
* Autonomous monitoring platforms
* Large-scale customer support bots

---

### **12. Mental Model**

LangGraph becomes a **distributed state machine** where:

> **Compute is disposable, state is persistent.**

This is the fundamental principle enabling safe horizontal scaling.


### Demonstration

```python
# One-cell demonstration: Horizontally-scaled LangGraph execution

from langgraph.graph import StateGraph, END
from typing import TypedDict
import random

# ----------------------------
# 1. Shared Persistent State (simulated DB)
# ----------------------------
GLOBAL_STATE_STORE = {}

# ----------------------------
# 2. Graph State Definition
# ----------------------------
class State(TypedDict):
    thread_id: str
    step: int
    done: bool

# ----------------------------
# 3. Nodes
# ----------------------------
def work(state: State) -> State:
    state["step"] += 1
    if state["step"] >= 5:
        state["done"] = True
    # The random worker number only simulates different pods picking up each step
    print(f"Worker {random.randint(1,3)} executed step {state['step']}")
    return state

# ----------------------------
# 4. Routing Logic
# ----------------------------
def router(state: State):
    return END if state["done"] else "work"

# ----------------------------
# 5. Graph Construction
# ----------------------------
builder = StateGraph(State)
builder.add_node("work", work)
builder.set_entry_point("work")
builder.add_conditional_edges("work", router, {"work": "work", END: END})

graph = builder.compile()

# ----------------------------
# 6. Simulated Distributed Execution
# ----------------------------
thread_id = "job-123"

# Load from shared store (like Redis/Postgres)
state = GLOBAL_STATE_STORE.get(thread_id, {"thread_id": thread_id, "step": 0, "done": False})

# graph.invoke follows the conditional edge until END, so this loop body runs once here
while not state["done"]:
    state = graph.invoke(state)
    # Save back to shared store (like DB)
    GLOBAL_STATE_STORE[thread_id] = state

print("\nFinal State:", GLOBAL_STATE_STORE[thread_id])
```

Sample output (worker numbers are random and vary between runs):

```
Worker 1 executed step 1
Worker 1 executed step 2
Worker 2 executed step 3
Worker 2 executed step 4
Worker 3 executed step 5

Final State: {'thread_id': 'job-123', 'step': 5, 'done': True}
```
