```{contents}
```
## **Cache in LangGraph**

In LangGraph, **caching** is an optimization technique that **stores and reuses intermediate computation results** (LLM responses, tool outputs, and node executions) so that repeated or equivalent computations are not run again.
On cache hits, it lowers **latency and cost**, and it makes the **reliability and scalability** of graph-based LLM systems easier to manage under repeated workloads.

---

### **1. Why Caching Is Essential in LangGraph**

LangGraph workflows are:

* **Stateful**
* **Iterative (often cyclic)**
* **Tool-heavy**
* **LLM-expensive**

Without caching:

| Problem              | Impact                                   |
| -------------------- | ---------------------------------------- |
| Repeated LLM calls   | High cost                                |
| Loops & retries      | Latency grows with every repeated call   |
| Multi-agent systems  | Unstable throughput                      |
| Production workloads | Unpredictable scaling                    |

Caching is one of the steps that moves a LangGraph application from a **research prototype** toward a **production system**.

---

### **2. What Can Be Cached**

| Layer           | What is Cached              |
| --------------- | --------------------------- |
| Node Output     | LLM responses, tool results |
| Graph Step      | Entire node execution       |
| Subgraph        | Multi-step execution        |
| State Snapshots | Checkpoints                 |
| LLM Calls       | Prompt → response           |
| Tool Calls      | Input → output              |
| Retrieval       | Query → documents           |

---

### **3. Where Cache Lives**

| Cache Location  | Use Case                  |
| --------------- | ------------------------- |
| In-Memory       | Fast, per-process         |
| Redis           | Distributed, multi-worker |
| Database        | Durable                   |
| Vector DB       | Semantic cache            |
| LangChain Cache | LLM-level caching         |

---

### **4. Cache in LangGraph Architecture**

```
Client
  |
LangGraph Runtime
  |
[Cache Layer] ──► LLM / Tools
  |
State Store
```

When a node executes:

1. Generate **cache key**
2. Check cache
3. If hit → return stored result
4. If miss → compute → store → return
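
The four steps above can be sketched in plain Python. This is a minimal illustration and not LangGraph's internal implementation: `make_cache_key`, `get_or_compute`, `slow_upper`, and the dict-backed `store` are hypothetical names introduced here.

```python
import hashlib
import json
from typing import Any, Callable


def make_cache_key(node_name: str, payload: dict) -> str:
    """Step 1: derive a deterministic key from the node name and its input."""
    # sort_keys makes the key independent of dict insertion order.
    canonical = json.dumps(payload, sort_keys=True, default=str)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{node_name}:{digest}"


def get_or_compute(store: dict, node_name: str, payload: dict,
                   compute: Callable[[dict], Any]) -> Any:
    key = make_cache_key(node_name, payload)
    if key in store:             # Steps 2-3: check the cache, return on a hit
        return store[key]
    result = compute(payload)    # Step 4: miss, so compute, store, return
    store[key] = result
    return result


# Usage: the second call with the same payload never reaches slow_upper.
store = {}
calls = []


def slow_upper(payload: dict) -> str:
    calls.append(payload["text"])  # record each real computation
    return payload["text"].upper()


first = get_or_compute(store, "upper", {"text": "hello"}, slow_upper)
second = get_or_compute(store, "upper", {"text": "hello"}, slow_upper)
```

Including the node name in the key keeps two nodes that receive the same input from sharing each other's results.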

---

### **5. LLM Caching in LangGraph (LangChain Cache)**

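```python
from langchain_core.globals import set_llm_cache
from langchain_community.cache import SQLiteCache

# Persist LLM responses in a local SQLite file.
set_llm_cache(SQLiteCache(database_path="llm_cache.db"))
```

LLM calls made through LangChain models inside the graph are now cached: a repeated call with the same prompt and model settings is answered from `llm_cache.db` instead of the provider.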

---

### **6. Node-Level Caching Pattern**

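```python
cache = {}

def heavy_computation(state):
    # Placeholder for the expensive work (LLM call, tool call, retrieval).
    # Like any node, it returns a partial state update.
    return {"result": state["query"].upper()}

def expensive_node(state):
    key = state["query"]          # cache key: the input this node depends on
    if key in cache:              # hit: reuse the stored update
        return cache[key]
    result = heavy_computation(state)
    cache[key] = result           # miss: store for the next call
    return result
```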

---

### **7. Production Cache with Redis**

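```python
import hashlib
import json

import redis

r = redis.Redis()  # assumes a Redis server on localhost:6379

def compute(state):
    # Placeholder for the expensive work; must return JSON-serializable data.
    return {"output": state["input"]}

def cached_node(state, ttl_seconds=3600):
    # Hash the input so arbitrary text yields a short, safe key.
    digest = hashlib.sha256(str(state["input"]).encode("utf-8")).hexdigest()
    key = f"node:{digest}"
    cached = r.get(key)  # one round trip; returns None on a miss
    if cached is not None:
        return json.loads(cached)
    result = compute(state)
    r.set(key, json.dumps(result), ex=ttl_seconds)  # expire stale entries
    return result
```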

---

### **8. Cache with Cyclic Graphs**

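In cyclic graphs, caching avoids recomputing a step when the loop revisits a state it has already processed. It does not end the loop by itself; the graph still needs a stop condition or a recursion limit.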

```
Reason → Act → Observe
   ↑                ↓
   └──── Cache ─────┘
```

Same state + same input → cached response, so a repeated iteration returns immediately instead of calling the LLM or tool again.
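
A toy sketch of how memoization behaves inside a loop. It uses plain Python rather than LangGraph, and `reason_act_observe`, `run_loop`, and `max_steps` are hypothetical names chosen for illustration. It shows that the cache removes repeated work while the step cap is what ends the loop.

```python
from functools import lru_cache

calls = {"count": 0}


@lru_cache(maxsize=None)
def reason_act_observe(observation: str) -> str:
    """Hypothetical stand-in for one Reason -> Act -> Observe pass."""
    calls["count"] += 1  # counts real computations only, not cache hits
    # Toy transition: the agent oscillates between two observations.
    return "B" if observation == "A" else "A"


def run_loop(start: str, max_steps: int) -> list:
    """Run the cycle with an explicit step cap; the cache alone does not stop it."""
    trace = [start]
    for _ in range(max_steps):
        trace.append(reason_act_observe(trace[-1]))
    return trace


trace = run_loop("A", max_steps=6)
```

Six iterations trigger only two real computations, one per distinct observation. In LangGraph, the `recursion_limit` configuration plays a role comparable to `max_steps` here.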

---

### **9. Semantic Cache (Advanced)**

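A semantic cache matches on vector similarity instead of exact text, so paraphrased queries can reuse the same answer.

```
New Query
   ↓
Vector Search
   ↓
Similar Prompt Found?
   ├─ yes → Return Cached Answer
   └─ no  → Call LLM → Store Answer
```

This suits chatbots and assistants, where users phrase the same question in many ways. The similarity threshold needs tuning: if it is too low, different questions receive the wrong cached answer.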
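
A minimal sketch of the lookup, assuming a toy bag-of-words embedding in place of a real embedding model and a linear scan in place of a vector database. `embed`, `cosine`, `SemanticCache`, and the `0.8` threshold are illustrative choices, not part of LangGraph.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # minimum similarity that counts as a hit
        self.entries = []           # list of (embedding, answer) pairs

    def store(self, query: str, answer: str) -> None:
        self.entries.append((embed(query), answer))

    def lookup(self, query: str):
        """Return the answer of the most similar stored query, or None on a miss."""
        q = embed(query)
        best_score, best_answer = 0.0, None
        for vec, answer in self.entries:
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None


semantic_cache = SemanticCache(threshold=0.8)
semantic_cache.store("what is a transformer model", "A neural network built on attention.")

hit = semantic_cache.lookup("what is a transformer model exactly")  # near-duplicate
miss = semantic_cache.lookup("how do I bake bread")                 # unrelated
```

The near-duplicate query shares five of its six words with the stored one and clears the threshold; the unrelated query shares none and falls through to a miss.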

---

### **10. Cache Invalidation Strategies**

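| Strategy          | Entry is invalidated when                         |
| ----------------- | ------------------------------------------------- |
| TTL               | A fixed time has elapsed                          |
| Versioning        | The graph, prompt, or model is updated            |
| State Hash Change | The node's input state changes                    |
| Manual Clear      | Someone flushes the cache, e.g. while debugging   |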
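
Three of these strategies (TTL, versioning, manual clear) can be combined in one small wrapper. The sketch below is an assumption-laden illustration: `TTLCache`, its `version` prefix, and the injectable `clock` are hypothetical and not a LangGraph API.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Dict-backed cache with time-based expiry and a version prefix on every key."""

    def __init__(self, ttl_seconds: float, version: str,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.version = version  # bump when the graph or its prompts change
        self.clock = clock      # injectable so expiry can be shown without sleeping
        self._data = {}

    def _key(self, key: str) -> str:
        return f"{self.version}:{key}"

    def get(self, key: str) -> Optional[Any]:
        full_key = self._key(key)
        entry = self._data.get(full_key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:  # TTL: entry is too old
            del self._data[full_key]
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._data[self._key(key)] = (value, self.clock() + self.ttl)

    def clear(self) -> None:
        self._data.clear()  # manual clear


# Usage with a fake clock so the example is deterministic.
now = [0.0]
ttl_cache = TTLCache(ttl_seconds=60, version="graph-v1", clock=lambda: now[0])

ttl_cache.set("q1", "answer")
fresh = ttl_cache.get("q1")       # still within the TTL
now[0] = 61.0
expired = ttl_cache.get("q1")     # past the TTL

ttl_cache.set("q2", "answer-2")
ttl_cache.version = "graph-v2"    # versioning: old entries are no longer addressed
after_bump = ttl_cache.get("q2")
```

State-hash invalidation needs no extra code when the key itself is derived from the node's input: a changed input produces a different key and therefore a miss.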

---

### **11. Performance Impact**

| Metric     | Without Cache | With Cache |
| ---------- | ------------- | ---------- |
| Latency    | High          | Low        |
| Cost       | High          | Low        |
| Throughput | Low           | High       |
| Stability  | Fragile       | Robust     |
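
How much of this improvement appears in practice depends on the hit rate. A back-of-the-envelope sketch, where `expected_latency_ms` is a hypothetical helper and the 10 ms and 1000 ms figures are made-up illustrative values, not measurements:

```python
def expected_latency_ms(hit_rate: float, hit_ms: float, miss_ms: float) -> float:
    """Average latency per call: hits cost hit_ms, misses cost miss_ms."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms


# Illustrative numbers only: 10 ms for a cache read, 1000 ms for an LLM call.
no_cache = expected_latency_ms(0.0, hit_ms=10, miss_ms=1000)
half_hits = expected_latency_ms(0.5, hit_ms=10, miss_ms=1000)
```

With no hits the average stays at the full miss cost; with half the calls served from cache it roughly halves. The same weighting applies to per-call cost.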

---

### **12. Mental Model**

Caching transforms LangGraph from:

> **Compute everything every time**

into:

> **Compute once, reuse everywhere**

This is fundamental for **scaling autonomous, cyclic, multi-agent workflows**.


### Demonstration

```python
# ==============================
# LangGraph Cache Demonstration
# ==============================

from typing import TypedDict
from langgraph.graph import StateGraph, END
from langchain_core.globals import set_llm_cache
from langchain_community.cache import SQLiteCache
from langchain_openai import ChatOpenAI

# -------- LLM Cache Setup --------
# Requires OPENAI_API_KEY in the environment for the first (uncached) call.
set_llm_cache(SQLiteCache(database_path="llm_cache.db"))

# -------- State Schema --------
class State(TypedDict):
    query: str
    response: str

# -------- Model --------
llm = ChatOpenAI(model="gpt-4o-mini")

# -------- Node-Level Cache --------
node_cache = {}

def cached_llm_node(state: State):
    key = state["query"]

    # Node cache check
    if key in node_cache:
        print("Node cache hit")
        return {"response": node_cache[key]}

    # On a node-cache miss the LLM-level SQLite cache may still answer this call.
    print("LLM called")
    result = llm.invoke(state["query"]).content

    node_cache[key] = result
    return {"response": result}

# -------- Build Graph --------
builder = StateGraph(State)
builder.add_node("llm_node", cached_llm_node)
builder.set_entry_point("llm_node")
builder.add_edge("llm_node", END)

graph = builder.compile()

# -------- Run Twice --------
print("\nFirst run:")
out1 = graph.invoke({"query": "Explain transformers in one sentence."})

print("\nSecond run:")
out2 = graph.invoke({"query": "Explain transformers in one sentence."})

print("\nFinal Output:")
print(out2["response"])
```