```{contents}
```
## **Deduplication in LangGraph**

**Deduplication** in LangGraph is a **correctness, reliability, and cost-control mechanism** that ensures the *same logical operation is not executed more than once*, even when retries, crashes, replays, or concurrent invocations occur.

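In production LLM systems, failures, retries, and distributed execution make duplicate execution **inevitable** unless explicitly prevented. LangGraph's checkpointing and thread-scoped state provide the hooks for building **idempotent workflows** whose side effects take place effectively once; the deduplication logic itself lives in your nodes and tools.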

---

### **1. Why Deduplication is Necessary**

| Failure Scenario    | Without Deduplication | With Deduplication |
| ------------------- | --------------------- | ------------------ |
| Network timeout     | Tool executes twice   | Executes once      |
| Worker crash        | LLM call re-issued    | Recovered safely   |
| Replay              | Side effects repeat   | No duplication     |
| Concurrent triggers | Same task runs twice  | Single execution   |
| Human correction    | Duplicate commits     | Safe update        |

Deduplication prevents:

* Double charging APIs
* Duplicate DB writes
* Repeated emails / payments / actions
* Corrupted state

---

### **2. Conceptual Model**

LangGraph execution is **replayable**: a run can resume from a saved checkpoint, and any node that had not finished when the run stopped may execute again.
Deduplication guarantees:

> **Each logical operation produces its effect at most once.**

This is achieved using:

* **Deterministic identifiers**
* **Idempotent state transitions**
* **Persistent execution records**

---

### **3. Where Deduplication Operates**

| Layer               | Purpose                      |
| ------------------- | ---------------------------- |
| Node execution      | Prevent double-running nodes |
| Tool invocation     | Avoid repeated side effects  |
| State updates       | Ensure safe merges           |
| Graph replay        | Prevent re-applying effects  |
| Distributed workers | Avoid duplicate tasks        |
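
For the concurrency rows in the table above, the core idea is an atomic "claim" on the operation ID. The following is a minimal in-process sketch using threads; `ClaimRegistry` is a hypothetical helper, not a LangGraph API. Across separate worker processes the same claim would have to live in a shared store, for example a database unique constraint.

```python
import threading


class ClaimRegistry:
    """Hypothetical helper: lets exactly one caller claim each operation ID."""

    def __init__(self):
        self._lock = threading.Lock()
        self._claimed = set()

    def try_claim(self, op_id: str) -> bool:
        # The lock makes check-and-add a single atomic step.
        with self._lock:
            if op_id in self._claimed:
                return False
            self._claimed.add(op_id)
            return True


registry = ClaimRegistry()
executions = []  # records which workers actually ran the task


def worker():
    # Every worker is triggered for the same task; only the claimant runs it.
    if registry.try_claim("task-7"):
        executions.append("ran task-7")


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Eight concurrent triggers result in a single recorded execution, because the check and the insert cannot be interleaved.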

---

### **4. Deduplication Pattern in LangGraph**

#### **A. Deterministic Operation ID**

```python
import hashlib

def op_id(state):
    raw = f"{state['thread_id']}:{state['step']}"
    return hashlib.sha256(raw.encode()).hexdigest()
```
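
An ID built only from `thread_id` and `step` cannot tell two different nodes or two different payloads apart at the same step. As a sketch of one possible refinement, the ID can also cover the node name and a canonical form of the input; `make_operation_id` and its parameters are illustrative names, not part of LangGraph.

```python
import hashlib
import json


def make_operation_id(thread_id: str, node: str, step: int, payload: dict) -> str:
    """Hypothetical helper: hash everything that defines the logical operation."""
    # sort_keys and fixed separators yield identical bytes for equal dicts,
    # regardless of key insertion order.
    canonical = json.dumps(
        {"thread_id": thread_id, "node": node, "step": step, "payload": payload},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = make_operation_id("A1", "process", 1, {"amount": 10, "currency": "USD"})
b = make_operation_id("A1", "process", 1, {"currency": "USD", "amount": 10})
c = make_operation_id("A1", "notify", 1, {"amount": 10, "currency": "USD"})
```

Here `a` and `b` are equal because only the key order differs, while `c` differs because it belongs to another node.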

#### **B. Check Execution Store**

```python
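executed_ops = set()  # in-memory only: lost on restart, not shared across workers

def expensive_call():
    """Placeholder for the real side-effecting work."""
    return "done"

def safe_node(state):
    oid = op_id(state)  # op_id() is defined in section A
    if oid in executed_ops:
        return {}  # duplicate: return an empty state update
    # Marked before the call, so a failure inside expensive_call() is not retried.
    executed_ops.add(oid)
    return {"result": expensive_call()}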
```

---

### **5. Production-Grade Deduplication with Persistence**

```python
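def safe_node(state):
    # `db` is a placeholder for a durable key-value store exposing exists/load/save.
    oid = state["operation_id"]

    # Retry or replay: return the recorded result instead of re-executing.
    if db.exists(oid):
        return db.load(oid)

    result = expensive_call()
    db.save(oid, result)  # persist before returning so later replays find it
    return result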
```

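Because the record survives restarts, a replay after a crash returns the stored result instead of re-executing. This is not yet strict **exactly-once semantics**: a crash between `expensive_call()` and `db.save()` still causes one repeat, so the operation itself should be idempotent wherever possible.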
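
The check-then-execute pattern above also leaves a window in which two workers can both see a missing record. One way to narrow it is to claim the ID atomically before executing. The sketch below uses SQLite from the standard library purely for illustration; the table layout and the `open_store` and `run_once` names are assumptions, not a LangGraph API.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) execution-record table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS executions ("
        " op_id TEXT PRIMARY KEY,"
        " status TEXT NOT NULL,"
        " result TEXT)"
    )
    conn.commit()
    return conn


def run_once(conn: sqlite3.Connection, op_id: str, operation):
    # Atomic claim: the primary key turns a second insert into a no-op.
    cur = conn.execute(
        "INSERT OR IGNORE INTO executions (op_id, status) VALUES (?, 'pending')",
        (op_id,),
    )
    conn.commit()
    if cur.rowcount == 0:
        # Someone already claimed this ID: return what was recorded
        # (None while the first execution is still pending).
        row = conn.execute(
            "SELECT result FROM executions WHERE op_id = ?", (op_id,)
        ).fetchone()
        return row[0]

    result = operation()
    conn.execute(
        "UPDATE executions SET status = 'done', result = ? WHERE op_id = ?",
        (result, op_id),
    )
    conn.commit()
    return result


calls = []


def charge():
    calls.append(1)
    return "charged"


conn = open_store()
first = run_once(conn, "op-1", charge)
second = run_once(conn, "op-1", charge)
```

The second call returns the stored result without running `charge` again. A crash after the claim leaves the row in `pending`, which a real system would need to detect and reconcile; that part is outside this sketch.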

---

### **6. Deduplication + Checkpointing**

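When a run resumes, LangGraph restarts from the last saved checkpoint, so a node that was interrupted mid-execution runs again.
Deduplication prevents that node's side effects from being reapplied during replay.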

```python
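from langgraph.checkpoint.memory import MemorySaver  # in-memory; use a durable saver in production

builder = StateGraph(State)
builder.add_node("process", safe_node)
# State and safe_node are defined elsewhere; MemorySaver stands in for a Postgres-backed saver.
graph = builder.compile(checkpointer=MemorySaver())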
```

---

### **7. Deduplication in Tool Calls**

```python
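def payment_tool(state):
    # `payments` and `charge` are placeholders for a payment store and a provider call.
    pid = state["payment_id"]
    if payments.exists(pid):
        return payments.get(pid)  # already charged: return the stored receipt
    receipt = charge()
    payments.store(pid, receipt)
    return receipt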

Without this, a retry may charge the customer twice.
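
Many payment APIs also accept a client-supplied idempotency key, which moves the duplicate check to the provider side. The following sketch illustrates that idea with a stand-in class; `FakeGateway` and its `charge` signature are hypothetical and do not correspond to any real provider SDK.

```python
class FakeGateway:
    """Stand-in for a payment provider that honors idempotency keys (hypothetical)."""

    def __init__(self):
        self._by_key = {}
        self.charges_made = 0

    def charge(self, amount_cents: int, idempotency_key: str) -> dict:
        # A repeated key returns the original receipt without charging again.
        if idempotency_key in self._by_key:
            return self._by_key[idempotency_key]
        self.charges_made += 1
        receipt = {"charge_id": f"ch_{self.charges_made}", "amount_cents": amount_cents}
        self._by_key[idempotency_key] = receipt
        return receipt


def payment_tool(state: dict, gateway: FakeGateway) -> dict:
    # Reusing payment_id as the key means a retry after a timeout
    # maps onto the same charge instead of creating a new one.
    receipt = gateway.charge(state["amount_cents"], idempotency_key=state["payment_id"])
    return {"receipt": receipt}


gateway = FakeGateway()
state = {"payment_id": "pay-42", "amount_cents": 1999}
first = payment_tool(state, gateway)
retry = payment_tool(state, gateway)
```

Both calls return the same receipt and the gateway records a single charge, even though the tool itself keeps no local record.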

---

### **8. Variants of Deduplication Strategies**

| Strategy    | Use Case              |
| ----------- | --------------------- |
| In-memory   | Single-process        |
| Redis-based | Distributed workers   |
| DB-based    | Financial, compliance |
| Hash-based  | Content processing    |
| Time-window | Event streams         |
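
As an illustration of the time-window row, the sketch below treats a key as a duplicate only if it was first seen within the last `window_seconds`. `TimeWindowDeduplicator` is a hypothetical helper, and the injectable `clock` exists only so the behavior can be shown without sleeping.

```python
import time
from typing import Callable, Dict


class TimeWindowDeduplicator:
    """Hypothetical helper: suppress repeats of a key within a sliding window."""

    def __init__(self, window_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self._first_seen: Dict[str, float] = {}

    def is_duplicate(self, key: str) -> bool:
        now = self.clock()
        # Evict keys whose window has elapsed so memory stays bounded.
        expired = [k for k, t in self._first_seen.items() if now - t >= self.window]
        for k in expired:
            del self._first_seen[k]
        if key in self._first_seen:
            return True
        self._first_seen[key] = now
        return False


fake_now = [0.0]  # mutable cell used as a controllable clock
dedup = TimeWindowDeduplicator(60, clock=lambda: fake_now[0])

r1 = dedup.is_duplicate("evt-1")  # first sighting
fake_now[0] = 30.0
r2 = dedup.is_duplicate("evt-1")  # inside the window
fake_now[0] = 61.0
r3 = dedup.is_duplicate("evt-1")  # window elapsed, treated as new
```

The trade-off is explicit: memory use is bounded by the window, but a repeat that arrives after the window expires is processed again.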

---

### **9. Deduplication + Idempotency**

| Property     | Meaning                   |
| ------------ | ------------------------- |
| Idempotent   | Safe to repeat            |
| Deduplicated | Never repeated            |
| Exactly-once | Idempotent + Deduplicated |

LangGraph workflows aim for **exactly-once side effects**.
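
The difference between the first two rows can be shown with plain state updates. In this sketch (all function and field names are illustrative), setting a field is idempotent by nature, while appending is not and needs an explicit record of applied event IDs.

```python
def set_status(record: dict, status: str) -> dict:
    # Idempotent: applying it twice gives the same record as applying it once.
    return {**record, "status": status}


def append_event(record: dict, event: str) -> dict:
    # Not idempotent: every repeat adds another entry.
    return {**record, "events": record["events"] + [event]}


def append_event_once(record: dict, event_id: str, event: str) -> dict:
    # Deduplicated: a repeat with an already-applied event_id is skipped.
    if event_id in record["applied"]:
        return record
    return {
        **record,
        "events": record["events"] + [event],
        "applied": record["applied"] | {event_id},
    }


record = {"status": "new", "events": [], "applied": frozenset()}

paid_once = set_status(record, "paid")
paid_twice = set_status(paid_once, "paid")

doubled = append_event(append_event(record, "email sent"), "email sent")
deduped = append_event_once(
    append_event_once(record, "e1", "email sent"), "e1", "email sent"
)
```

`paid_once` and `paid_twice` are equal, `doubled` holds two events, and `deduped` holds one.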

---

### **10. Real-World Use Cases**

| System             | Deduplication Purpose     |
| ------------------ | ------------------------- |
| Payment automation | Prevent double charge     |
| Email workflows    | Avoid duplicate sends     |
| Data pipelines     | Avoid duplicate writes    |
| Agent systems      | Prevent runaway actions   |
| Autonomous ops     | Avoid destructive repeats |

---

### **11. Failure Handling with Deduplication**

| Failure           | Result              |
| ----------------- | ------------------- |
| Crash mid-step    | Resume safely       |
| Network retry     | No duplicate effect |
| Human replay      | No corruption       |
| Worker reschedule | Safe execution      |

---

### **12. Mental Model**

> **LangGraph execution is replayable.
> Deduplication is what makes replay safe.**

It converts unreliable distributed execution into **deterministic, production-safe behavior**.


### Demonstration

```python
# Deduplication with a simulated execution store in a single LangGraph cell

from langgraph.graph import StateGraph, END
from typing import TypedDict
import hashlib

# ----- Persistent Execution Store (Simulated DB) -----
EXECUTION_DB = {}

# ----- State Schema -----
class State(TypedDict):
    thread_id: str
    step: int
    result: str

# ----- Deterministic Operation ID -----
def make_op_id(state: State):
    raw = f"{state['thread_id']}:{state['step']}"
    return hashlib.sha256(raw.encode()).hexdigest()

# ----- Deduplicated Node -----
def safe_node(state: State):
    op_id = make_op_id(state)

    # Deduplication check: reuse the recorded update, including the next step
    if op_id in EXECUTION_DB:
        print("⚠️ Duplicate execution prevented")
        return EXECUTION_DB[op_id]

    print("✅ Executing expensive operation")
    # Advance `step` so the router can eventually reach END
    result = {
        "result": f"processed step {state['step']}",
        "step": state["step"] + 1,
    }

    # Persist the update so a replay of this operation is skipped
    EXECUTION_DB[op_id] = result
    return result

# ----- Loop Control -----
def router(state: State):
    if state["step"] >= 3:
        return END
    return "process"

# ----- Graph Definition -----
builder = StateGraph(State)
builder.add_node("process", safe_node)
builder.set_entry_point("process")
builder.add_conditional_edges("process", router, {
    "process": "process",
    END: END
})

graph = builder.compile()

# ----- First Run -----
print("\n--- FIRST EXECUTION ---")
first = graph.invoke({"thread_id": "A1", "step": 1})

# ----- Simulated Crash + Replay -----
print("\n--- REPLAY EXECUTION (Should Deduplicate) ---")
replay = graph.invoke({"thread_id": "A1", "step": 1})
```

Printed output:

```text

--- FIRST EXECUTION ---
✅ Executing expensive operation
✅ Executing expensive operation

--- REPLAY EXECUTION (Should Deduplicate) ---
⚠️ Duplicate execution prevented
⚠️ Duplicate execution prevented
```

The node must advance `step`. If it returns the same `step` on every pass, `router` never returns `END`, every pass after the first hits the deduplication branch, and the run stops with `GraphRecursionError: Recursion limit of 25 reached without hitting a stop condition.` The limit can be raised with the `recursion_limit` config key, but for a loop that never terminates this only postpones the error. For troubleshooting, visit: https://docs.langchain.com/oss/python/langgraph/errors/GRAPH_RECURSION_LIMIT