```{contents}
```
## Metrics

In LangGraph, **metrics** are quantitative signals that measure the **health, performance, reliability, and cost-efficiency** of graph executions.
They are what lets you operate an LLM workflow as a **production-grade system** rather than an experimental pipeline.

---

### **1. Why Metrics Are Essential**

LangGraph workflows are often **long-running, stateful, multi-agent, and stochastic**.
Without metrics, you cannot answer questions such as:

| Question                     | Why It Matters       |
| ---------------------------- | -------------------- |
| Is the system reliable?      | Production stability |
| Where is time spent?         | Latency optimization |
| Which agents fail?           | Fault diagnosis      |
| How much does each run cost? | Budget control       |
| Is performance degrading?    | Regression detection |

---

### **2. Metric Categories**

| Category            | Purpose                 |
| ------------------- | ----------------------- |
| Execution Metrics   | Runtime behavior        |
| State Metrics       | Data evolution          |
| LLM Metrics         | Model efficiency        |
| Agent Metrics       | Multi-agent performance |
| Reliability Metrics | Failure behavior        |
| Cost Metrics        | Financial control       |
| User Metrics        | Experience quality      |

---

### **3. Execution Metrics**

| Metric        | Description                  |
| ------------- | ---------------------------- |
| Graph Latency | Total runtime per invocation |
| Node Latency  | Time per node                |
| Queue Time    | Scheduling delay             |
| Concurrency   | Parallel tasks               |
| Throughput    | Runs per second              |
| Step Count    | Total transitions            |
| Loop Count    | Cycles executed              |
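
As a minimal sketch of how these figures could be derived, the example below aggregates per-node timing records from one invocation into graph latency, per-node latency, step count, and loop count. The `NodeTiming` record and the `execution_metrics` helper are hypothetical names used for illustration, not part of LangGraph; in practice the timings would come from callbacks or traces.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class NodeTiming:
    """One node execution: which node ran and when (in seconds)."""
    node: str
    start: float
    end: float


def execution_metrics(timings: list[NodeTiming]) -> dict:
    """Aggregate node timings from a single graph invocation."""
    if not timings:
        return {"graph_latency": 0.0, "node_latency": {}, "step_count": 0, "loop_count": 0}
    per_node = defaultdict(float)
    for t in timings:
        per_node[t.node] += t.end - t.start
    visits = Counter(t.node for t in timings)
    return {
        # Wall-clock span from the first node start to the last node end.
        "graph_latency": max(t.end for t in timings) - min(t.start for t in timings),
        "node_latency": dict(per_node),
        "step_count": len(timings),
        # Repeat visits to the most-visited node approximate the cycles taken.
        "loop_count": max(visits.values()) - 1,
    }


timings = [
    NodeTiming("step", 0.0, 0.3), NodeTiming("check", 0.3, 0.31),
    NodeTiming("step", 0.31, 0.61), NodeTiming("check", 0.61, 0.62),
]
report = execution_metrics(timings)
```

Throughput then follows by dividing the number of completed invocations by the length of the observation window.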

---

### **4. State & Control Metrics**

| Metric               | Meaning              |
| -------------------- | -------------------- |
| State Size           | Memory footprint     |
| State Versions       | State evolution      |
| Checkpoint Frequency | Recovery granularity |
| Rollback Count       | Failure recovery     |
| Interrupt Count      | Human interventions  |
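
State size can be approximated without any framework support. The sketch below, which assumes the state is JSON-serializable (falling back to `str` for anything that is not), measures each snapshot and the growth between consecutive snapshots. Both helper names are illustrative.

```python
import json


def state_size_bytes(state: dict) -> int:
    """Approximate state size as the length of its UTF-8 JSON encoding."""
    return len(json.dumps(state, default=str).encode("utf-8"))


def size_growth(snapshots: list[dict]) -> list[int]:
    """Byte growth between consecutive state snapshots."""
    sizes = [state_size_bytes(s) for s in snapshots]
    return [later - earlier for earlier, later in zip(sizes, sizes[1:])]


# Three snapshots of a state whose message list keeps growing.
snapshots = [
    {"messages": []},
    {"messages": ["hi"]},
    {"messages": ["hi", "hello there"]},
]
sizes = [state_size_bytes(s) for s in snapshots]
growth = size_growth(snapshots)
```

A steadily positive `growth` series is a hint that the state accumulates data (for example, message history) and may need trimming or summarization.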

---

### **5. LLM & Tool Metrics**

| Metric             | Description           |
| ------------------ | --------------------- |
| Prompt Tokens      | Input tokens          |
| Completion Tokens  | Output tokens         |
| Total Tokens       | Cost driver           |
| LLM Latency        | Model response time   |
| Tool Calls         | External dependencies |
| Tool Failure Rate  | Tool reliability      |
| Hallucination Rate | Quality indicator     |
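
The token and tool rows above reduce to a handful of counters. The following is a sketch of one way to accumulate them; `LLMToolStats` is a hypothetical container, and the calls to `record_llm` and `record_tool` would be made from wherever your application observes model and tool results.

```python
from dataclasses import dataclass


@dataclass
class LLMToolStats:
    """Running totals for token usage and tool reliability."""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    tool_calls: int = 0
    tool_failures: int = 0

    def record_llm(self, prompt: int, completion: int) -> None:
        self.prompt_tokens += prompt
        self.completion_tokens += completion

    def record_tool(self, ok: bool) -> None:
        self.tool_calls += 1
        if not ok:
            self.tool_failures += 1

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens

    @property
    def tool_failure_rate(self) -> float:
        # Avoid dividing by zero before any tool has been called.
        return self.tool_failures / self.tool_calls if self.tool_calls else 0.0


stats = LLMToolStats()
stats.record_llm(prompt=120, completion=40)
stats.record_llm(prompt=200, completion=60)
for ok in (True, True, False, True):
    stats.record_tool(ok)
```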

---

### **6. Agent Metrics**

| Metric             | Description             |
| ------------------ | ----------------------- |
| Agent Utilization  | Load per agent          |
| Agent Failure Rate | Robustness              |
| Delegation Depth   | Coordination complexity |
| Conflict Rate      | Consensus difficulty    |
| Consensus Time     | Decision latency        |

---

### **7. Reliability Metrics**

| Metric                | Meaning            |
| --------------------- | ------------------ |
| Success Rate          | Completed runs     |
| Error Rate            | Failures           |
| Retry Count           | Fault tolerance    |
| Timeout Count         | Performance issues |
| Circuit Breaker Trips | System stress      |
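
Retry count and success rate are easiest to collect at the point where retries happen. The sketch below wraps an arbitrary callable in a simple retry loop and updates counters as it goes; `ReliabilityCounters` and `run_with_retries` are illustrative names, and a real deployment would likely add backoff and narrower exception handling.

```python
from dataclasses import dataclass


@dataclass
class ReliabilityCounters:
    successes: int = 0
    failures: int = 0
    retries: int = 0

    @property
    def success_rate(self) -> float:
        total = self.successes + self.failures
        return self.successes / total if total else 0.0


def run_with_retries(fn, counters: ReliabilityCounters, max_attempts: int = 3):
    """Call fn(), retrying on any exception and recording the outcome."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            result = fn()
        except Exception:
            if attempt == max_attempts:
                counters.failures += 1  # out of attempts: the run failed
                raise
            counters.retries += 1  # another attempt will follow
        else:
            counters.successes += 1
            return result


# A stand-in for a flaky node: fails twice, then succeeds.
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


counters = ReliabilityCounters()
result = run_with_retries(flaky, counters)
```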

---

### **8. Cost & Efficiency Metrics**

| Metric              | Meaning              |
| ------------------- | -------------------- |
| Cost per Run        | Economic efficiency  |
| Cost per Node       | Expensive components |
| Cost per Agent      | Optimization target  |
| Cache Hit Rate      | Cost reduction       |
| Compute Utilization | Resource efficiency  |
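
Cost per run is usually derived from token counts and a price table. In the sketch below the prices are hypothetical placeholders, not real provider rates, and the per-node token counts are made-up inputs chosen to keep the arithmetic easy to follow.

```python
# Hypothetical prices in USD per 1,000 tokens; substitute your provider's rates.
PRICE_PER_1K = {"prompt": 0.50, "completion": 1.50}


def run_cost(prompt_tokens: int, completion_tokens: int, prices=PRICE_PER_1K) -> float:
    """Cost of one run (or one node) given its token counts."""
    return (prompt_tokens / 1000) * prices["prompt"] + (
        completion_tokens / 1000
    ) * prices["completion"]


def cache_hit_rate(hits: int, misses: int) -> float:
    """Fraction of lookups served from cache."""
    total = hits + misses
    return hits / total if total else 0.0


# Token usage per node as (prompt_tokens, completion_tokens).
usage_by_node = {"planner": (2000, 1000), "writer": (1000, 2000)}
cost_by_node = {node: run_cost(p, c) for node, (p, c) in usage_by_node.items()}
total_cost = sum(cost_by_node.values())
```

Breaking cost down by node, as `cost_by_node` does, is what makes the "Cost per Node" row actionable: the most expensive node is the first candidate for caching or a smaller model.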

---

### **9. User & Quality Metrics**

| Metric              | Purpose         |
| ------------------- | --------------- |
| User Satisfaction   | Outcome quality |
| Completion Rate     | Task success    |
| Correction Rate     | Model accuracy  |
| Human Approval Rate | Compliance      |

---

### **10. Instrumentation Workflow**

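```python
from langchain_core.callbacks import StdOutCallbackHandler

# Prints chain start/end events to stdout: handy for a first look,
# but it does not store or aggregate metrics.
handler = StdOutCallbackHandler()

# `graph` is a compiled LangGraph graph; `inputs` is its initial state.
graph.invoke(inputs, config={"callbacks": [handler]})
```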

For production, integrate with:

* **Prometheus** → metrics storage
* **Grafana** → dashboards
* **OpenTelemetry** → distributed tracing
* **LangSmith** → LLM-specific telemetry
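
To make the Prometheus option more concrete, the sketch below renders counters in the Prometheus text exposition format using only the standard library. The helper `render_prometheus` and the metric name `langgraph_node_runs_total` are hypothetical; a real service would more likely use the `prometheus_client` package and expose the result on an HTTP endpoint.

```python
def render_prometheus(name: str, help_text: str, metric_type: str, samples) -> str:
    """Render one metric family as Prometheus text exposition format.

    `samples` is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        if label_str:
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


exposition = render_prometheus(
    "langgraph_node_runs_total",  # hypothetical metric name
    "Node executions.",
    "counter",
    [({"node": "step"}, 3), ({"node": "check"}, 3)],
)
print(exposition)
```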

---

### **11. Example Metric Extraction**

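```python
from langchain_core.tracers import LangChainTracer

# Sends run traces to LangSmith; requires LangSmith credentials
# to be configured in the environment.
tracer = LangChainTracer()
graph.invoke(data, config={"callbacks": [tracer]})

# `latest_run` stays None until a root run has been recorded.
run_id = tracer.latest_run.id if tracer.latest_run else None
```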

A trace captured this way typically includes:

* Node timings
* Token usage
* Tool calls
* Error events
* State transitions

---

### **12. Metric-Driven Optimization Loop**

```
Measure → Analyze → Optimize → Validate → Deploy → Repeat
```

Running this loop continuously is what moves a LangGraph deployment from a **workflow engine** toward a **production platform** that is tuned on evidence rather than intuition.
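
The "Validate" step of the loop can be as simple as comparing a latency percentile before and after a change. The sketch below uses a nearest-rank percentile and a relative tolerance; the function names, the 95th percentile, and the 10% tolerance are illustrative choices, not recommendations from LangGraph.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def has_regressed(baseline, candidate, pct: float = 95, tolerance: float = 0.10) -> bool:
    """True if the candidate's percentile latency exceeds the baseline's by more than `tolerance`."""
    return percentile(candidate, pct) > percentile(baseline, pct) * (1 + tolerance)


# Graph latencies in seconds, before and after a hypothetical change.
baseline = [0.9, 1.0, 1.1, 1.0, 1.2]
candidate = [1.0, 1.1, 1.5, 1.2, 1.3]
regressed = has_regressed(baseline, candidate)
```

A check like this can gate the "Deploy" step: if `regressed` is true, the change goes back to "Analyze" instead of shipping.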

---

### **13. Summary**

LangGraph metrics provide **observability for intelligence**:

> You cannot improve what you cannot measure.

Metrics enable:

* Reliability engineering
* Cost governance
* Performance tuning
* Safety monitoring
* Enterprise-grade operations


### Demonstration

```python
# One-cell LangGraph metrics demonstration

import time
from typing import TypedDict

from langchain_core.callbacks import BaseCallbackHandler
from langgraph.graph import StateGraph, END

# ----------------------------
# 1. Custom Metrics Collector
# ----------------------------

class MetricsCollector(BaseCallbackHandler):
    def __init__(self):
        self.starts = {}  # run_id -> (run name, start time)
        self.total_start = time.perf_counter()
        self.token_usage = {"prompt": 0, "completion": 0}

    def on_chain_start(self, serialized, inputs, *, run_id, **kwargs):
        # `serialized` can be None, so take the run name from kwargs when provided.
        name = kwargs.get("name") or (serialized or {}).get("name", "unknown")
        self.starts[run_id] = (name, time.perf_counter())

    def on_chain_end(self, outputs, *, run_id, **kwargs):
        # Match each end event to its start by run_id, not by insertion order.
        started = self.starts.pop(run_id, None)
        if started is None:
            return
        name, t0 = started
        print(f"[METRIC] Run '{name}' latency: {time.perf_counter() - t0:.3f}s")

    def on_llm_end(self, response, **kwargs):
        # `llm_output` is provider-specific and may be missing.
        usage = (response.llm_output or {}).get("token_usage") or {}
        self.token_usage["prompt"] += usage.get("prompt_tokens", 0)
        self.token_usage["completion"] += usage.get("completion_tokens", 0)

    def summary(self):
        total = time.perf_counter() - self.total_start
        print("\n=== METRICS SUMMARY ===")
        print(f"Total execution time: {total:.3f}s")
        print(f"Prompt tokens: {self.token_usage['prompt']}")
        print(f"Completion tokens: {self.token_usage['completion']}")
        print(f"Total tokens: {self.token_usage['prompt'] + self.token_usage['completion']}")

# ----------------------------
# 2. Graph Definition
# ----------------------------

class State(TypedDict):
    x: int

def step(state):
    time.sleep(0.3)  # simulate work so the latency is visible
    return {"x": state["x"] + 1}

def check(state):
    # `done` is not declared in State, so it does not appear in the final state.
    return {"done": state["x"] >= 3}

builder = StateGraph(State)
builder.add_node("step", step)
builder.add_node("check", check)

builder.set_entry_point("step")
builder.add_edge("step", "check")
builder.add_conditional_edges("check", lambda s: END if s["done"] else "step", {
    "step": "step",
    END: END
})

graph = builder.compile()

# ----------------------------
# 3. Run With Metrics
# ----------------------------

metrics = MetricsCollector()

result = graph.invoke({"x": 0}, config={"callbacks": [metrics]})
metrics.summary()

print("\nFinal State:", result)
```

Summary output from a recorded run (timings vary between runs; token counts are zero because this graph makes no LLM calls):

```
=== METRICS SUMMARY ===
Total execution time: 0.928s
Prompt tokens: 0
Completion tokens: 0
Total tokens: 0

Final State: {'x': 3}
```
