```{contents}
```
## **Metadata State in LangGraph**

In LangGraph, the **Metadata State** is the structured, auxiliary information attached to a workflow’s execution state that describes **how**, **when**, **why**, and **under what conditions** each step of the graph executes.
It does **not** represent the task’s main data (e.g., messages, plans, results) but instead governs **execution control, observability, safety, reproducibility, and production behavior**.

---

### **1. Conceptual Definition**

Let the full state be:

$$
\text{State} = \text{Task State} \;\cup\; \text{Metadata State}
$$

| Component          | Role                                          |
| ------------------ | --------------------------------------------- |
| Task State         | Domain data (messages, plan, results, memory) |
| **Metadata State** | Execution context and governance data         |

Metadata makes LangGraph behave like a **production workflow engine**, not just an LLM pipeline.

---

### **2. What Belongs in Metadata State**

| Category                | Examples                                    |
| ----------------------- | ------------------------------------------- |
| **Execution Control**   | `step_id`, `node_id`, `run_id`, `thread_id` |
| **Timing & Cost**       | timestamps, token usage, latency, cost      |
| **Routing Signals**     | confidence, score, decision flags           |
| **Safety & Governance** | permissions, approval flags, risk level     |
| **Observability**       | logs, trace IDs, metrics                    |
| **Failure Handling**    | retries, error types, backoff count         |
| **Versioning**          | graph version, model version                |
| **Human-in-the-Loop**   | reviewer ID, comments, approvals            |

---

### **3. Designing Metadata State**

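```python
from typing import TypedDict


class State(TypedDict):
    # --- Task state ---
    messages: list
    result: str

    # --- Metadata ---
    step: int          # number of reasoning steps taken so far
    run_id: str        # identifier of this execution
    trace_id: str      # correlation ID for logs and traces
    cost: float        # accumulated cost of the run
    latency_ms: float  # latency of the most recent step
    approved: bool     # set by a human reviewer
    retries: int       # failed attempts so far
    risk_level: str    # e.g. "low" or "high"
```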

Because these fields live in the same state object as the task data, every node can read the conditions it is executing under.

---

### **4. How Metadata Flows Through the Graph**

Nodes update metadata exactly like normal state:

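```python
import time


def reasoning_node(state):
    # `llm` is assumed to be a chat model instance created elsewhere.
    start = time.perf_counter()

    response = llm.invoke(state["messages"])

    # Elapsed wall-clock time in milliseconds.
    latency = (time.perf_counter() - start) * 1000

    # Return only the keys that changed; the rest of the state is kept.
    return {
        "messages": state["messages"] + [response],
        "latency_ms": latency,
        "step": state["step"] + 1,
    }
```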

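LangGraph merges each returned key into the state. By default the new value overwrites the old one; a key annotated with a **reducer** is instead combined with its previous value by that reducer.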
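
As a rough illustration of how a reducer changes the merge, the sketch below emulates the rule in plain Python. `apply_update` is a hypothetical helper written for this example and is not part of LangGraph; it only mimics the behaviour that a key annotated with a reducer (here `operator.add`) is combined with its old value, while a key without one is overwritten.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class State(TypedDict):
    # Reducer attached: updates are appended to the existing list.
    messages: Annotated[list, operator.add]
    # No reducer: each update overwrites the previous value.
    latency_ms: float
    step: int


def apply_update(state: dict, update: dict) -> dict:
    """Hypothetical stand-in for the merge performed after a node returns."""
    hints = get_type_hints(State, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        # Annotated[...] types carry their extra arguments in __metadata__.
        reducers = getattr(hints[key], "__metadata__", ())
        if reducers:
            merged[key] = reducers[0](state[key], value)
        else:
            merged[key] = value
    return merged


state = {"messages": ["hi"], "latency_ms": 0.0, "step": 0}
state = apply_update(state, {"messages": ["hello"], "latency_ms": 12.5, "step": 1})
print(state)
```

When a list key carries such a reducer, a node should return only the new items (for example `{"messages": [response]}`) rather than the whole list, otherwise earlier entries are duplicated.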

---

### **5. Metadata-Driven Routing**

Metadata often determines **control flow**.

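```python
from langgraph.graph import END


def router(state):
    # Approval gate: high-risk work waits for a human decision.
    if state["risk_level"] == "high" and not state["approved"]:
        return "human_review"
    # Circuit breaker: give up after too many retries.
    if state["retries"] > 3:
        return END
    return "execute"
```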

This enables:

* Approval gates
* Circuit breakers
* Budget enforcement
* Safety boundaries
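
Budget enforcement and a circuit breaker can be sketched the same way. The keys `token_usage`, `max_tokens`, and `consecutive_errors` are assumptions made for this example, and `END` is defined locally as a stand-in for `langgraph.graph.END` so the snippet runs without LangGraph installed.

```python
from typing import TypedDict

# Stand-in for langgraph.graph.END so the sketch has no dependencies.
END = "__end__"


class Meta(TypedDict):
    token_usage: int         # tokens consumed so far (hypothetical key)
    max_tokens: int          # budget for the whole run (hypothetical key)
    consecutive_errors: int  # failures since the last success (hypothetical key)


def budget_router(state: Meta) -> str:
    # Budget enforcement: stop once the token budget is spent.
    if state["token_usage"] >= state["max_tokens"]:
        return END
    # Circuit breaker: stop after three consecutive failures.
    if state["consecutive_errors"] >= 3:
        return END
    return "execute"


within_budget = budget_router(
    {"token_usage": 900, "max_tokens": 1000, "consecutive_errors": 0}
)
over_budget = budget_router(
    {"token_usage": 1000, "max_tokens": 1000, "consecutive_errors": 0}
)
tripped = budget_router(
    {"token_usage": 100, "max_tokens": 1000, "consecutive_errors": 3}
)
print(within_budget, over_budget, tripped)
```

Keeping the limits themselves in state, rather than hard-coding them in the router, lets each run carry its own budget.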

---

### **6. Metadata & Production Guarantees**

| Problem            | Solved by Metadata         |
| ------------------ | -------------------------- |
| Infinite loops     | `step`, `max_steps`        |
| Cost overruns      | `cost`, `token_usage`      |
| Silent failures    | `error_type`, `retries`    |
| Audit requirements | `trace_id`, timestamps     |
| Compliance         | `approvals`, reviewer info |
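
One possible way to avoid silent failures is to wrap nodes so that an exception becomes a metadata update instead of disappearing. The decorator `record_failures` and the node `flaky_node` below are hypothetical names for this sketch; the plain `for` loop stands in for the graph re-invoking the node.

```python
import functools
from typing import Callable


def record_failures(node: Callable[[dict], dict]) -> Callable[[dict], dict]:
    """Hypothetical decorator: turn an exception into a metadata update."""

    @functools.wraps(node)
    def wrapper(state: dict) -> dict:
        try:
            update = node(state)
        except Exception as exc:
            # Surface the failure in state so a router can react to it.
            return {
                "error_type": type(exc).__name__,
                "retries": state.get("retries", 0) + 1,
            }
        # Success: clear any earlier error marker.
        return {**update, "error_type": None}

    return wrapper


@record_failures
def flaky_node(state: dict) -> dict:
    # Fails on the first two attempts, then succeeds.
    if state.get("retries", 0) < 2:
        raise TimeoutError("upstream timed out")
    return {"result": "ok"}


state = {"retries": 0}
for _ in range(3):
    # Emulate the graph merging each node update into the state.
    state = {**state, **flaky_node(state)}
print(state)
```

A router can then inspect `error_type` and `retries` to decide whether to retry, back off, or end the run.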

---

### **7. Metadata & Checkpointing**

When the graph is compiled with a checkpointer, every checkpoint stores the full state:

```
Task State + Metadata State
```

This allows:

* Full execution replay
* Partial rollback
* Compliance auditing
* Failure recovery
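
The idea behind rollback can be illustrated with a small in-memory emulation. This is a conceptual sketch only, not LangGraph's checkpointer API: `checkpoints`, `save_checkpoint`, and `rollback_to` are hypothetical names, and the loop stands in for graph steps.

```python
import copy

# Hypothetical in-memory history; a real checkpointer persists this for you.
checkpoints: list[dict] = []


def save_checkpoint(state: dict) -> None:
    # Deep-copy so later mutations cannot alter the stored snapshot.
    checkpoints.append(copy.deepcopy(state))


def rollback_to(step: int) -> dict:
    # Return the most recent snapshot recorded at the given step.
    for snapshot in reversed(checkpoints):
        if snapshot["step"] == step:
            return copy.deepcopy(snapshot)
    raise KeyError(f"no checkpoint for step {step}")


state = {"messages": [], "step": 0, "cost": 0.0}
save_checkpoint(state)
for i in range(3):
    # Each iteration plays the role of one node execution.
    state = {
        "messages": state["messages"] + [f"msg {i}"],
        "step": state["step"] + 1,
        "cost": state["cost"] + 0.05,
    }
    save_checkpoint(state)

restored = rollback_to(1)
print(restored)
```

Because task data and metadata are snapshotted together, the restored state carries the `step` and `cost` values that were current at that point, so limits and audits remain consistent after a rollback.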

---

### **8. Example: Safety-Controlled Loop**

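```python
from langgraph.graph import END


def should_continue(state):
    # Step limit: prevents infinite loops.
    if state["step"] >= 10:
        return END
    # Cost limit: prevents budget overruns.
    if state["cost"] > 2.00:
        return END
    return "reason"
```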

The loop ends once `step` reaches 10 or `cost` exceeds 2.00, whichever comes first. Limits like these turn a LangGraph agent into a **governed autonomous system**.

---

### **9. Mental Model**

Without metadata, a LangGraph system is:

> A smart script

With metadata, it becomes:

> A **self-regulating distributed control system**

---

### **10. Why Metadata State Is Fundamental**

Metadata State is the **operating system layer** of LangGraph.
It enables:

* Reliability
* Safety
* Governance
* Observability
* Scalability
* Enterprise readiness


### Demonstration

```python
from typing import TypedDict, List
from langgraph.graph import StateGraph, END
import time, uuid

class State(TypedDict):
    # Task State
    messages: List[str]
    result: str

    # Metadata State
    step: int
    run_id: str
    cost: float
    latency_ms: float
    approved: bool
    risk_level: str
```


```python
def reason(state: State):
    start = time.time()

    msg = f"Reasoning step {state['step']}"
    latency = (time.time() - start) * 1000

    return {
        "messages": state["messages"] + [msg],
        "latency_ms": latency,
        "cost": state["cost"] + 0.05,
        "step": state["step"] + 1
    }

def assess_risk(state: State):
    risk = "high" if state["step"] >= 3 else "low"
    return {"risk_level": risk}

def human_review(state: State):
    print("\nHuman approval required:")
    print(state["messages"][-1])
    approval = input("Approve? (y/n): ")
    return {"approved": approval == "y"}


def router(state: State):
    if state["risk_level"] == "high" and not state["approved"]:
        return "human_review"
    if state["cost"] > 0.25:
        return END
    if state["step"] >= 5:
        return END
    return "reason"
```


```python
builder = StateGraph(State)

builder.add_node("reason", reason)
builder.add_node("assess_risk", assess_risk)
builder.add_node("human_review", human_review)

builder.set_entry_point("reason")
builder.add_edge("reason", "assess_risk")

builder.add_conditional_edges("assess_risk", router, {
    "human_review": "human_review",
    "reason": "reason",
    END: END
})

builder.add_edge("human_review", "reason")

graph = builder.compile()
```


```python
initial_state: State = {
    "messages": [],
    "result": "",
    "step": 0,
    "run_id": str(uuid.uuid4()),
    "cost": 0.0,
    "latency_ms": 0.0,
    "approved": False,
    "risk_level": "low"
}

final_state = graph.invoke(initial_state)
print("\nFinal State:")
print(final_state)
```



```
Human approval required:
Reasoning step 2

Final State:
{'messages': ['Reasoning step 0', 'Reasoning step 1', 'Reasoning step 2', 'Reasoning step 3', 'Reasoning step 4'], 'result': '', 'step': 5, 'run_id': 'e426ed8c-7985-4eba-aac2-d75c52a2bfb0', 'cost': 0.25, 'latency_ms': 0.0, 'approved': True, 'risk_level': 'high'}
```
