```{contents}
```
## Performance Profiling

**Performance profiling** in LangGraph is the systematic measurement, analysis, and optimization of **execution time, resource usage, cost, and throughput** of LLM-based workflows.
It ensures that complex agentic systems remain **fast, reliable, scalable, and cost-efficient** in production.

---

### **1. Why Performance Profiling Is Critical**

LangGraph systems are **stateful, multi-step, tool-heavy, and often cyclic**.
Without profiling, they suffer from:

* Unbounded latency
* Excessive token cost
* Hidden bottlenecks in tools or agents
* Cascading failures under load

Profiling makes performance **observable and controllable**.

---

### **2. Performance Dimensions Measured**

| Dimension    | What It Measures           |
| ------------ | -------------------------- |
| Latency      | End-to-end execution time  |
| Node Time    | Time per node              |
| LLM Time     | Model inference delay      |
| Tool Time    | External call delay        |
| Token Usage  | Prompt + completion tokens |
| Cost         | Monetary cost per run      |
| Memory       | State + cache size         |
| Throughput   | Requests per second        |
| Concurrency  | Parallel execution load    |
| Failure Rate | Retries, errors, timeouts  |

---

### **3. Instrumentation Architecture**

```
Client
  |
LangGraph Runtime
  |
Execution Tracer
  |—— Node Timings
  |—— State Snapshots
  |—— Token Counters
  |—— Tool Metrics
  |
Observability Backend
(LangSmith / OpenTelemetry / Prometheus)
```

LangGraph exposes **hooks** for each execution stage.

---

### **4. Enabling Profiling in LangGraph**

```python
from langchain.callbacks import LangChainTracer
from langsmith import Client

tracer = LangChainTracer()
result = graph.invoke(input, config={"callbacks": [tracer]})
```

This automatically records:

* Node entry/exit
* Token counts
* Tool latency
* State transitions

---

### **5. Node-Level Timing Analysis**

Each node produces timing metadata:

| Metric      | Meaning                 |
| ----------- | ----------------------- |
| start_time  | Node execution start    |
| end_time    | Node execution end      |
| duration    | end_time - start_time   |
| retry_count | Failures before success |

Used to detect:

* Slow LLM calls
* Tool bottlenecks
* Infinite loops

---

### **6. Token & Cost Profiling**

```python
from langchain.callbacks import get_openai_callback

with get_openai_callback() as cb:
    graph.invoke(input)

print(cb.total_tokens, cb.total_cost)
```

| Metric            | Purpose             |
| ----------------- | ------------------- |
| Prompt tokens     | Context size        |
| Completion tokens | Output size         |
| Total cost        | Budget control      |
| Calls per run     | Throughput planning |

---

### **7. Profiling Cyclic & Agentic Workflows**

Cyclic graphs require **loop-aware profiling**.

Metrics tracked per iteration:

* Iteration count
* Cumulative latency
* Convergence rate
* Error accumulation

```python
graph.invoke(input, config={"recursion_limit": 20})
```

---

### **8. Visualization & Debugging**

| Tool         | Capability                       |
| ------------ | -------------------------------- |
| LangSmith    | Execution timelines, token usage |
| Graph Viewer | Structural bottlenecks           |
| Tracing UI   | State evolution                  |
| Logs         | Failure diagnosis                |

---

### **9. Performance Optimization Techniques**

| Bottleneck    | Solution               |
| ------------- | ---------------------- |
| Slow LLM      | Model routing, caching |
| Large prompts | State pruning          |
| Slow tools    | Async + batching       |
| High cost     | Token compression      |
| Long loops    | Convergence rules      |
| Memory bloat  | Checkpoint pruning     |

---

### **10. Production Performance Guardrails**

| Guardrail        | Purpose                |
| ---------------- | ---------------------- |
| Max recursion    | Prevent infinite loops |
| Timeouts         | Kill hung nodes        |
| Rate limits      | Prevent overload       |
| Budget caps      | Enforce cost ceiling   |
| Circuit breakers | Stop failure cascades  |

---

### **11. Performance Testing Workflow**

```
Define SLA → Instrument Graph → Run Load Tests →
Collect Metrics → Identify Bottlenecks → Optimize → Re-test
```

---

### **12. Mental Model**

> **LangGraph performance profiling treats your AI system like a distributed service.**

You do not optimize prompts.
You optimize **systems**.

---


### Demonstration

In [1]:
# =========================
# LangGraph Performance Profiling Demo (Single Cell)
# =========================

from typing import TypedDict
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_classic.callbacks import LangChainTracer, get_openai_callback
import time

# -------- 1. Define shared state --------

class State(TypedDict):
    question: str
    answer: str
    step: int

# -------- 2. Instrumented Nodes --------

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def reason_node(state: State):
    start = time.time()
    response = llm.invoke(f"Answer briefly: {state['question']}")
    duration = time.time() - start
    print(f"[Reason Node] Time: {duration:.2f}s")
    return {
        "answer": response.content,
        "step": state["step"] + 1
    }

def check_node(state: State):
    print(f"[Check Node] Step: {state['step']}")
    if state["step"] >= 2:
        return {"done": True}
    return {"done": False}

# -------- 3. Build Cyclic Graph --------

builder = StateGraph(State)

builder.add_node("reason", reason_node)
builder.add_node("check", check_node)

builder.set_entry_point("reason")
builder.add_edge("reason", "check")

builder.add_conditional_edges(
    "check",
    lambda s: END if s["done"] else "reason",
    {"reason": "reason", END: END}
)

graph = builder.compile()

# -------- 4. Enable Tracing & Token Profiling --------

tracer = LangChainTracer()

with get_openai_callback() as cb:
    result = graph.invoke(
        {"question": "Why is the sky blue?", "step": 0},
        config={"callbacks": [tracer], "recursion_limit": 5}
    )

# -------- 5. Performance Report --------

print("\n=== FINAL RESULT ===")
print(result)

print("\n=== TOKEN & COST REPORT ===")
print(f"Total Tokens: {cb.total_tokens}")
print(f"Prompt Tokens: {cb.prompt_tokens}")
print(f"Completion Tokens: {cb.completion_tokens}")
print(f"Estimated Cost: ${cb.total_cost:.6f}")


Failed to send compressed multipart ingest: langsmith.utils.LangSmithAuthError: Authentication failed for https://api.smith.langchain.com/runs/multipart. HTTPError('401 Client Error: Unauthorized for url: https://api.smith.langchain.com/runs/multipart', '{"error":"Unauthorized"}\n')trace=019b6e0e-1369-7821-abe6-267a6454274f,id=019b6e0e-1369-7821-abe6-267a6454274f; trace=019b6e0e-1369-7821-abe6-267a6454274f,id=019b6e0e-1626-71a3-9247-635de9c1eb20; trace=019b6e0e-1369-7821-abe6-267a6454274f,id=019b6e0e-1627-7261-bb87-2b1b216824fc


[Reason Node] Time: 1.53s
[Check Node] Step: 1


Failed to send compressed multipart ingest: langsmith.utils.LangSmithAuthError: Authentication failed for https://api.smith.langchain.com/runs/multipart. HTTPError('401 Client Error: Unauthorized for url: https://api.smith.langchain.com/runs/multipart', '{"error":"Unauthorized"}\n')trace=019b6e0e-1369-7821-abe6-267a6454274f,id=019b6e0e-1627-7261-bb87-2b1b216824fc; trace=019b6e0e-1369-7821-abe6-267a6454274f,id=019b6e0e-1626-71a3-9247-635de9c1eb20; trace=019b6e0e-1369-7821-abe6-267a6454274f,id=019b6e0e-1c1f-73a1-9f0c-0b0eb3d1d464; trace=019b6e0e-1369-7821-abe6-267a6454274f,id=019b6e0e-1c20-7d01-8abe-a819c4329cc4; trace=019b6e0e-1369-7821-abe6-267a6454274f,id=019b6e0e-1c20-7d01-8abe-a819c4329cc4; trace=019b6e0e-1369-7821-abe6-267a6454274f,id=019b6e0e-1c1f-73a1-9f0c-0b0eb3d1d464; trace=019b6e0e-1369-7821-abe6-267a6454274f,id=019b6e0e-1c21-73f1-99da-19984461004e; trace=019b6e0e-1369-7821-abe6-267a6454274f,id=019b6e0e-1c22-7401-885c-54bd88f20ca7


[Reason Node] Time: 2.02s
[Check Node] Step: 2

=== FINAL RESULT ===
{'question': 'Why is the sky blue?', 'answer': "The sky appears blue due to Rayleigh scattering. When sunlight enters the Earth's atmosphere, shorter blue wavelengths are scattered in all directions by air molecules, making the sky look blue to our eyes.", 'step': 2}

=== TOKEN & COST REPORT ===
Total Tokens: 108
Prompt Tokens: 32
Completion Tokens: 76
Estimated Cost: $0.000050


Failed to send compressed multipart ingest: langsmith.utils.LangSmithAuthError: Authentication failed for https://api.smith.langchain.com/runs/multipart. HTTPError('401 Client Error: Unauthorized for url: https://api.smith.langchain.com/runs/multipart', '{"error":"Unauthorized"}\n')trace=019b6e0e-1369-7821-abe6-267a6454274f,id=019b6e0e-1c22-7401-885c-54bd88f20ca7; trace=019b6e0e-1369-7821-abe6-267a6454274f,id=019b6e0e-1c21-73f1-99da-19984461004e; trace=019b6e0e-1369-7821-abe6-267a6454274f,id=019b6e0e-240b-7920-8b6a-8aa28dc37dcb; trace=019b6e0e-1369-7821-abe6-267a6454274f,id=019b6e0e-240c-7be1-9d71-53433d0adc3c; trace=019b6e0e-1369-7821-abe6-267a6454274f,id=019b6e0e-240c-7be1-9d71-53433d0adc3c; trace=019b6e0e-1369-7821-abe6-267a6454274f,id=019b6e0e-240b-7920-8b6a-8aa28dc37dcb; trace=019b6e0e-1369-7821-abe6-267a6454274f,id=019b6e0e-1369-7821-abe6-267a6454274f
