```{contents}
```
## Vertical Scaling

**Vertical scaling** in LangGraph refers to **increasing the capacity of a single LangGraph runtime instance** to handle larger workloads, higher concurrency, and more complex workflows by allocating **more CPU, memory, I/O bandwidth, and GPU resources** to that instance, rather than adding more instances (horizontal scaling).

It is usually the first scaling strategy to consider for **state-heavy, long-running, and highly coupled LLM workflows**, because the whole graph state stays inside one process.

---

### **1. Why Vertical Scaling Matters in LangGraph**

LangGraph workloads are **stateful** and often include:

* Long reasoning chains
* Large shared state
* Memory stores
* Multi-agent coordination
* Streaming and tool I/O

These characteristics benefit strongly from **high-resource single-node execution**.

| Bottleneck   | How Vertical Scaling Helps                   |
| ------------ | -------------------------------------------- |
| CPU          | Faster routing, parsing, and serialization   |
| Memory       | Room for large state and embeddings          |
| I/O          | More concurrent tool and database calls      |
| GPU          | Higher-throughput inference for local models |
| Context size | Room to assemble and hold larger prompts     |

---

### **2. What Is Being Scaled**

| Component         | Scaling Target         |
| ----------------- | ---------------------- |
| LangGraph runtime | Process & thread pools |
| State store       | In-memory & persistent |
| Checkpoint engine | Faster serialization   |
| Agent scheduler   | Parallel agent control |
| LLM client        | Higher throughput      |
| Tool executors    | More concurrent calls  |

---

### **3. Vertical Scaling Architecture**

```
Client
   |
Load Balancer (optional)
   |
High-Capacity LangGraph Runtime
   |---- State Store (RAM + SSD)
   |---- Checkpoint Engine
   |---- Agent Scheduler
   |---- Tool Executor Pool
   |
LLM APIs / Local GPU Models
```

A **single powerful node** replaces multiple smaller nodes.

---

### **4. LangGraph Execution Model Under Vertical Scaling**

LangGraph internally relies on:

* **Async task scheduling**
* **Thread pools**
* **Event loops**
* **State locks**

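Vertical scaling raises the capacity of these mechanisms: more cores and memory let the runtime keep more tasks and threads in flight at once. Per-run limits are passed through the `config` argument.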

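```python
# `graph` is a compiled StateGraph; `inputs` is the initial state dict.
graph.invoke(
    inputs,
    config={
        "max_concurrency": 64,  # cap on parallel calls made during the run
        "recursion_limit": 50,  # cap on the number of steps before the run aborts
    },
)
```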
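
To make the effect of a concurrency cap concrete, the sketch below bounds the number of in-flight coroutines with an `asyncio.Semaphore`. It illustrates the general idea behind a setting like `max_concurrency`; it is not LangGraph's internal implementation, and `run_bounded`, `fake_tool_call`, and `demo` are hypothetical names introduced only for this example.

```python
import asyncio


async def run_bounded(coros, max_concurrency):
    """Await all coroutines, keeping at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0  # highest number of coroutines observed running at once

    async def guarded(coro):
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await coro
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(c) for c in coros))
    return results, peak


async def fake_tool_call(i):
    # Stand-in for an I/O-bound tool or LLM call (assumption for the demo).
    await asyncio.sleep(0.01)
    return i * 2


def demo():
    # Run 20 simulated calls with a cap of 4 concurrent calls.
    calls = [fake_tool_call(i) for i in range(20)]
    return asyncio.run(run_bounded(calls, max_concurrency=4))
```

On a larger machine the cap can be raised until some other resource (memory, rate limits of the LLM API, database connections) becomes the bottleneck.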

---

### **5. Memory-Heavy Workflows Example**

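```python
from typing import TypedDict


class State(TypedDict):
    history: list      # full conversation or step log, grows every turn
    context: str       # retrieved or assembled prompt context
    embeddings: list   # vectors kept in state, often the largest field
    plan: str          # current plan produced by the agent
```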

State of this shape grows with every turn (`history` and `embeddings` are unbounded lists), so the process needs enough **RAM** to hold it without swapping.
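
A rough way to see how quickly such a state grows is to serialize it and measure the result. The sketch below uses `pickle` purely as a stand-in; LangGraph checkpointers use their own serializer, so the numbers are only an order-of-magnitude guide. `serialized_size` and `sample_state` are hypothetical helpers, and the embedding dimension is an assumed value.

```python
import pickle
from typing import TypedDict


class State(TypedDict):
    history: list
    context: str
    embeddings: list
    plan: str


def serialized_size(state) -> int:
    """Approximate the state's footprint as the length of its pickle."""
    return len(pickle.dumps(state, protocol=pickle.HIGHEST_PROTOCOL))


def sample_state(turns: int, dim: int) -> State:
    """Build a synthetic state with one message and one vector per turn."""
    return State(
        history=[f"message {i}" for i in range(turns)],
        context="",
        embeddings=[[0.0] * dim for _ in range(turns)],
        plan="",
    )


if __name__ == "__main__":
    for turns in (10, 100, 1000):
        size = serialized_size(sample_state(turns, dim=256))
        print(f"{turns} turns: about {size / 1024:.0f} KiB")
```

Multiplying the per-run figure by the number of concurrent runs gives a first estimate of the RAM the instance needs.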

---

### **6. Performance Tuning Levers**

| Resource    | Tuning Strategy            |
| ----------- | -------------------------- |
| CPU         | Increase core count        |
| Memory      | Increase RAM               |
| Disk        | NVMe SSD                   |
| Network     | High-throughput NIC        |
| GPU         | More VRAM, multi-GPU       |
| Threads     | Larger thread pool         |
| Async tasks | Higher event-loop capacity |
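
As an illustration of the "Threads" row, the sketch below derives a thread-pool size from the core count. The multiplier of 4 for I/O-bound work and the cap of 256 are assumed heuristics, not LangGraph defaults, and `suggested_workers` and `make_tool_pool` are hypothetical helpers.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def suggested_workers(cpu_count: int, io_bound: bool = True, cap: int = 256) -> int:
    """Pick a worker count: several threads per core for I/O-bound work,
    one per core for CPU-bound work, never above `cap`."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    factor = 4 if io_bound else 1  # assumed heuristic, tune by measurement
    return min(cap, cpu_count * factor)


def make_tool_pool(io_bound: bool = True) -> ThreadPoolExecutor:
    """Create a pool sized for the machine the process is running on."""
    cpus = os.cpu_count() or 1
    return ThreadPoolExecutor(
        max_workers=suggested_workers(cpus, io_bound),
        thread_name_prefix="tool",
    )
```

After moving to a bigger instance, the pool grows automatically because the size is derived from `os.cpu_count()` rather than hard-coded.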

---

### **7. When to Prefer Vertical Over Horizontal**

| Scenario                | Scaling Choice |
| ----------------------- | -------------- |
| Large shared state      | Vertical       |
| Heavy agent coupling    | Vertical       |
| Long-running sessions   | Vertical       |
| High per-request memory | Vertical       |
| Low-latency needs       | Vertical       |

---

### **8. Vertical Scaling vs Horizontal Scaling**

| Feature                 | Vertical           | Horizontal          |
| ----------------------- | ------------------ | ------------------- |
| State management        | Simple             | Complex             |
| Latency                 | Low                | Higher              |
| Cost efficiency         | High (medium load) | High (massive load) |
| Fault tolerance         | Lower              | Higher              |
| Architecture complexity | Low                | High                |

---

### **9. Production Configuration Example**

```yaml
# Container resources in a Kubernetes pod spec
resources:
  requests:
    cpu: "16"
    memory: "64Gi"
  limits:
    cpu: "32"
    memory: "128Gi"
```
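
The memory limit above can be turned into a rough ceiling on concurrent runs. The sketch below assumes a fixed per-run footprint and a 25% headroom for the runtime itself; both are illustrative values. `parse_memory` is a hypothetical helper that handles only the binary suffixes (`Ki`, `Mi`, `Gi`, `Ti`) and plain byte counts, not the full Kubernetes quantity syntax.

```python
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as "64Gi" to bytes (binary suffixes only)."""
    for suffix, multiplier in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * multiplier
    return int(quantity)  # no suffix: already a byte count


def max_concurrent_runs(limit: str, per_run: str, headroom: float = 0.25) -> int:
    """Estimate how many runs fit under `limit` after reserving `headroom`."""
    usable = parse_memory(limit) * (1 - headroom)
    return int(usable // parse_memory(per_run))


if __name__ == "__main__":
    # With the 128Gi limit above and an assumed 512Mi per run.
    print(max_concurrent_runs("128Gi", "512Mi"))
```

A figure like this is a starting point for `max_concurrency` or an admission limit; the real per-run footprint should come from measurement.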

---

### **10. Risks & Safeguards**

| Risk                    | Mitigation      |
| ----------------------- | --------------- |
| Single point of failure | Checkpointing   |
| Memory leaks            | Monitoring      |
| Long recovery time      | Fast restart    |
| Resource exhaustion     | Limits & alerts |
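
For the "Memory leaks" and "Resource exhaustion" rows, a lightweight in-process check can complement external monitoring. The sketch below uses the standard library's `tracemalloc` to track heap growth since a baseline; `MemoryWatch` is a hypothetical class, and the approach sees only allocations made through Python's allocators, not native or GPU memory.

```python
import tracemalloc


class MemoryWatch:
    """Track Python heap growth relative to a baseline taken at construction."""

    def __init__(self, soft_limit_bytes: int):
        if not tracemalloc.is_tracing():
            tracemalloc.start()
        self.soft_limit_bytes = soft_limit_bytes
        self.baseline, _ = tracemalloc.get_traced_memory()

    def growth(self) -> int:
        """Bytes currently allocated beyond the baseline."""
        current, _peak = tracemalloc.get_traced_memory()
        return current - self.baseline

    def over_limit(self) -> bool:
        """True once growth exceeds the configured soft limit."""
        return self.growth() > self.soft_limit_bytes
```

A long-running service could call `over_limit()` between runs and raise an alert or trigger a controlled restart from the latest checkpoint when it returns `True`.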

---

### **11. Mental Model**

Vertical scaling in LangGraph transforms the system into a **high-performance AI engine** capable of running:

> **Complex agents + large memory + deep reasoning + real-time decision loops**

on a **single optimized runtime**.



### Demonstration

```python
from langgraph.graph import StateGraph, END
from typing import TypedDict
import asyncio

# ------------------ State ------------------
class State(TypedDict):
    step: int
    history: list
    done: bool

# ------------------ Nodes ------------------
async def heavy_reasoning(state: State):
    payload = "x" * 1_000_000   # simulate high memory usage
    await asyncio.sleep(0.05)   # simulate compute
    # Return the updated history instead of mutating the input state in place.
    return {
        "step": state["step"] + 1,
        "history": state["history"] + [len(payload)],
    }

def control(state: State):
    # Stop after ten reasoning steps.
    return {"done": state["step"] >= 10}

def router(state: State):
    return END if state["done"] else "reason"

# ------------------ Graph ------------------
builder = StateGraph(State)
builder.add_node("reason", heavy_reasoning)
builder.add_node("control", control)

builder.set_entry_point("reason")
builder.add_edge("reason", "control")
builder.add_conditional_edges("control", router, {
    "reason": "reason",
    END: END
})

graph = builder.compile()

# ------------------ Execute (Notebook Safe) ------------------
# Top-level await works in a notebook; in a script, wrap the call in asyncio.run().
result = await graph.ainvoke(
    {"step": 0, "history": [], "done": False},
    config={"recursion_limit": 50, "max_concurrency": 32}
)

print("Final Step:", result["step"])
print("Iterations:", len(result["history"]))
print("Sample State:", result["history"][:3])
```


```
Final Step: 10
Iterations: 10
Sample State: [1000000, 1000000, 1000000]
```
