```{contents}
```
## **Graceful Degradation in LangGraph**

**Graceful degradation** is a system design principle where an LLM workflow continues operating in a **reduced but stable mode** when parts of the system fail, rather than crashing or producing unsafe output.
LangGraph does not degrade automatically. You build graceful degradation from **graph structure, state management, and fault-handling nodes**.

---

### **1. Why Graceful Degradation Matters in LLM Systems**

LLM workflows can fail for many reasons, including:

* Model timeouts
* Tool API failures
* Network errors
* Cost limits
* Bad intermediate outputs

Without graceful degradation:

> **One failure collapses the entire workflow.**

With graceful degradation:

> **The system returns the best possible answer given current constraints.**

---

### **2. Core Mechanisms in LangGraph**

| Mechanism         | Purpose                             |
| ----------------- | ----------------------------------- |
| Fallback nodes    | Alternate logic when failure occurs |
| Conditional edges | Dynamic routing around failures     |
| State flags       | Record degraded mode                |
| Retry nodes       | Controlled reattempt                |
| Timeout nodes     | Prevent hangs                       |
| Circuit breakers  | Stop repeated failures              |
| Partial outputs   | Preserve progress                   |
| Checkpointing     | Recoverable execution               |
| Human interrupt   | Manual override                     |
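
Most of these mechanisms are ordinary Python wrapped around node functions. The following is a minimal sketch of a controlled retry. It assumes a hypothetical helper, `call_with_retries`, which is not a LangGraph API. The helper retries transient errors with exponential backoff and re-raises once the attempts run out, so the surrounding node can record the failure and reroute. Recent LangGraph versions also provide a built-in `RetryPolicy` for nodes; check the documentation for your version before relying on it.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    retry_on: tuple[type[BaseException], ...] = (TimeoutError, ConnectionError),
) -> T:
    """Call fn, retrying transient errors with exponential backoff.

    After the last attempt the exception is re-raised, so the calling node
    can record the failure and route to a fallback.
    """
    attempt = 1
    while True:
        try:
            return fn()
        except retry_on:
            if attempt >= max_attempts:
                raise  # out of attempts: let the caller degrade
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
            time.sleep(base_delay * 2 ** (attempt - 1))
            attempt += 1


# Usage: a hypothetical flaky model call that succeeds on the third attempt.
calls = {"n": 0}


def flaky_llm() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("transient timeout")
    return "full answer"


print(call_with_retries(flaky_llm, base_delay=0.01))  # full answer
```

Re-raising instead of returning a default keeps the retry concern separate from the degradation decision. The node, not the retry helper, decides what a degraded answer looks like.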

---

### **3. Conceptual Execution Model**

```
Primary Path
   ↓
[LLM Call] ──X── Failure
   ↓
[Fallback Model]
   ↓
[Reduced Capability Output]
```

Instead of crashing, the graph **reroutes** to a fallback path.

---

### **4. State Design for Degradation**

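```python
from typing import TypedDict


class State(TypedDict):
    query: str          # user input
    answer: str         # final (possibly degraded) answer
    degraded: bool      # True once any fallback path has run
    error: str | None   # failure context; `str | None` needs Python 3.10+
```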

The state explicitly records degradation:

* `degraded` is set to `True` whenever a fallback path runs
* `error` holds the failure context (for example, the exception message)
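
Because `degraded` and `error` live in state, a node can report a failure by returning a partial state update instead of raising. The following is a minimal sketch of that idea. It assumes nodes are plain functions that take and return dicts, as they do with a `TypedDict` state. The decorator name `degrade_on_error` is hypothetical.

```python
import functools
from typing import Callable

Node = Callable[[dict], dict]


def degrade_on_error(fallback_answer: str) -> Callable[[Node], Node]:
    """Wrap a node so that an exception becomes a degraded state update."""

    def decorator(node: Node) -> Node:
        @functools.wraps(node)
        def wrapper(state: dict) -> dict:
            try:
                return node(state)
            except Exception as exc:
                # Record the failure in state instead of letting it abort the run.
                return {
                    "answer": fallback_answer,
                    "degraded": True,
                    "error": f"{type(exc).__name__}: {exc}",
                }

        return wrapper

    return decorator


@degrade_on_error("Basic response due to system limitations.")
def primary_llm(state: dict) -> dict:
    raise TimeoutError("LLM timeout")  # simulated model failure


print(primary_llm({"query": "Explain quantum computing"}))
# {'answer': 'Basic response due to system limitations.', 'degraded': True, 'error': 'TimeoutError: LLM timeout'}
```

This version fills in the fallback answer directly. If the fallback needs its own logic, for example a second model, drop `answer` from the returned update and let a separate fallback node produce it.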

---

### **5. Minimal Graceful Degradation Example**

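```python
from typing import TypedDict

from langgraph.graph import StateGraph, END


class State(TypedDict):
    query: str
    answer: str
    degraded: bool
    error: str | None


def primary_llm(state: State) -> dict:
    # Simulated primary model call that always times out. The exception is
    # caught inside the node: an uncaught exception would abort graph.invoke().
    try:
        raise TimeoutError("LLM timeout")
    except TimeoutError as exc:
        return {"degraded": True, "error": str(exc)}


def fallback_llm(state: State) -> dict:
    # Reduced-capability answer used when the primary path fails.
    return {
        "answer": "Basic response due to system limitations.",
        "degraded": True,
    }


def router(state: State) -> str:
    # Route degraded runs to the fallback; otherwise finish.
    if state.get("degraded"):
        return "fallback"
    return END


builder = StateGraph(State)

builder.add_node("primary", primary_llm)
builder.add_node("fallback", fallback_llm)

builder.set_entry_point("primary")

# The router is a conditional edge, not a node: it returns the next node's name.
builder.add_conditional_edges("primary", router, {"fallback": "fallback", END: END})
builder.add_edge("fallback", END)

graph = builder.compile()

print(graph.invoke({"query": "Explain quantum computing"}))
```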
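
The `TimeoutError` in this example is simulated. A real call that might hang needs its own time budget inside the node. The following is a minimal sketch that uses only the standard library (`concurrent.futures`). It assumes a hypothetical helper, `call_with_timeout`. Many LLM client libraries also accept a request timeout, which is usually preferable when one is available.

```python
import concurrent.futures
import time
from typing import Callable

# A shared pool, so a timed-out call does not block the caller on shutdown.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_timeout(fn: Callable[[], str], timeout_s: float) -> dict:
    """Run fn in a worker thread and return a degraded update if it is too slow.

    The worker thread is not killed: it keeps running in the background until
    fn returns. This bounds how long the caller waits, not the work itself.
    """
    future = _POOL.submit(fn)
    try:
        return {"answer": future.result(timeout=timeout_s), "degraded": False}
    except concurrent.futures.TimeoutError:
        return {
            "answer": "Basic response due to system limitations.",
            "degraded": True,
            "error": f"timed out after {timeout_s}s",
        }


def slow_llm() -> str:
    time.sleep(0.5)  # simulates a hung model call
    return "full answer"


print(call_with_timeout(slow_llm, timeout_s=0.05))
print(call_with_timeout(lambda: "quick answer", timeout_s=1.0))
```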

---

### **6. Production-Grade Degradation Patterns**

| Pattern               | Use Case                        |
| --------------------- | ------------------------------- |
| Model fallback        | Switch to cheaper/smaller model |
| Tool fallback         | Use cached or partial data      |
| Answer simplification | Provide high-level response     |
| Agent bypass          | Skip expensive agent loops      |
| Feature shedding      | Disable non-critical steps      |
| Human escalation      | Request manual review           |

---

### **7. Graceful Degradation in Multi-Agent Graphs**

```
Supervisor
   ↓
Research Agent (fails)
   ↓
Summarizer Agent
   ↓
Final Answer (marked degraded)
```

Only the failing agent is bypassed; the remaining agents still run.
The system **still completes the task**, and the final answer is flagged as degraded.
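
The supervisor flow above can be sketched in plain Python. This minimal sketch makes two assumptions: each agent is a function that returns a partial state update, and the supervisor knows which agents are critical. All agent names are hypothetical. In LangGraph, the same decision would typically be a conditional edge out of the supervisor node.

```python
from typing import Callable

Agent = Callable[[dict], dict]


def run_agents(agents: list[tuple[str, Agent, bool]], state: dict) -> dict:
    """Run agents in order, bypassing failures of non-critical agents.

    Each entry is (name, agent, critical). A failing critical agent re-raises;
    a failing non-critical agent is skipped and the run is marked degraded.
    """
    state = {**state, "degraded": False, "skipped": []}
    for name, agent, critical in agents:
        try:
            state = {**state, **agent(state)}
        except Exception:
            if critical:
                raise
            state["degraded"] = True
            state["skipped"] = [*state["skipped"], name]
    return state


def research_agent(state: dict) -> dict:
    raise ConnectionError("search API down")  # simulated tool failure


def summarizer_agent(state: dict) -> dict:
    # Summarize whatever evidence exists; fall back to the bare query.
    notes = state.get("notes", state["query"])
    return {"answer": f"Summary based on: {notes}"}


final = run_agents(
    [("research", research_agent, False), ("summarizer", summarizer_agent, True)],
    {"query": "Explain quantum computing"},
)
print(final["answer"], final["degraded"], final["skipped"])
```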

---

### **8. Safety & Observability**

| Control          | Function                            |
| ---------------- | ----------------------------------- |
| Degradation flag | Makes degraded behavior visible     |
| Audit logs       | Record each failure and its context |
| Metrics          | Track the degradation rate          |
| Alerts           | Detect instability                  |
| Replay           | Enable debugging of degraded runs   |
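
The degradation flag is what makes the other controls possible: once every run ends with `degraded` recorded in state, logging, counting, and alerting become straightforward. The following is a minimal in-process sketch. `DegradationMetrics` is hypothetical. A production system would export these counts to its existing metrics and logging stack instead.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("degradation")


@dataclass
class DegradationMetrics:
    """In-process counters for degraded runs (a stand-in for a real metrics backend)."""

    total_runs: int = 0
    degraded_runs: int = 0
    error_counts: dict[str, int] = field(default_factory=dict)

    def record(self, final_state: dict) -> None:
        self.total_runs += 1
        if final_state.get("degraded"):
            self.degraded_runs += 1
            error = final_state.get("error") or "unknown"
            self.error_counts[error] = self.error_counts.get(error, 0) + 1
            # Audit log: one structured line per degraded run.
            logger.warning("degraded run: query=%r error=%r", final_state.get("query"), error)

    @property
    def degradation_rate(self) -> float:
        return self.degraded_runs / self.total_runs if self.total_runs else 0.0

    def should_alert(self, threshold: float = 0.2, min_runs: int = 20) -> bool:
        # Alert only once there is enough traffic for the rate to be meaningful.
        return self.total_runs >= min_runs and self.degradation_rate > threshold


metrics = DegradationMetrics()
metrics.record({"query": "q1", "degraded": False})
metrics.record({"query": "q2", "degraded": True, "error": "Primary model unavailable"})
print(metrics.degradation_rate)  # 0.5
```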

---

### **9. Why LangGraph Is Ideal for Graceful Degradation**

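A linear pipeline typically stops at the first unhandled error, and so does a LangGraph node that lets an exception escape.
With LangGraph, you can **route around failure** by catching errors inside nodes, recording them in state, and branching on that state with conditional edges.

> **With this pattern, failure becomes just another state transition.**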

---

### **10. Mental Model**

Graceful degradation in LangGraph behaves like a **self-stabilizing control system**:

> **Failure → Detect → Reroute → Continue → Report**
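
A circuit breaker follows the same loop. After repeated failures, it stops calling the broken dependency and reroutes straight to the degraded path. After a cooldown, it tries the dependency again. The following is a minimal sketch in which `CircuitBreaker` and `answer_node` are hypothetical. The breaker lives in process memory, so it is not shared across workers and is not persisted by a LangGraph checkpointer, which stores only graph state.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Closed until failure_threshold consecutive failures, then open for cooldown_s."""

    def __init__(
        self,
        failure_threshold: int = 3,
        cooldown_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        """Return True if the protected call should be attempted."""
        if self.opened_at is None:
            return True  # closed: call normally
        # Open: after the cooldown, allow trial calls (half-open).
        return self.clock() - self.opened_at >= self.cooldown_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None  # close the circuit

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # open, or re-open after a failed trial


breaker = CircuitBreaker(failure_threshold=2, cooldown_s=60.0)


def primary_llm(query: str) -> str:
    raise TimeoutError("Primary model timeout")  # simulated outage


def answer_node(state: dict) -> dict:
    # Detect -> Reroute: skip the primary entirely while the circuit is open.
    if not breaker.allow():
        return {"answer": "Basic answer (circuit open)", "degraded": True, "error": "circuit open"}
    try:
        answer = primary_llm(state["query"])
    except Exception as exc:
        breaker.record_failure()
        return {"answer": "Basic answer (primary failed)", "degraded": True, "error": str(exc)}
    breaker.record_success()
    return {"answer": answer, "degraded": False, "error": None}


for _ in range(3):
    print(answer_node({"query": "Explain quantum computing"})["error"])
```

The first two runs call the primary and fail. The third run is rerouted without touching the primary, because the breaker opened after the second failure.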


### Demonstration

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

# -------------------------
# 1. Define shared state
# -------------------------
class State(TypedDict):
    query: str
    answer: str
    degraded: bool
    error: str | None

# -------------------------
# 2. Primary node (fails)
# -------------------------
def primary_llm(state: State):
    raise TimeoutError("Primary model timeout")

# -------------------------
# 3. Fallback node
# -------------------------
def fallback_llm(state: State):
    # Note: this overwrites the error recorded by safe_primary
    # ("Primary model timeout") with a more general message.
    return {
        "answer": f"(Degraded Mode) Basic answer for: {state['query']}",
        "degraded": True,
        "error": "Primary model unavailable"
    }

# -------------------------
# 4. Safe wrapper node
# -------------------------
def safe_primary(state: State):
    try:
        return primary_llm(state)
    except Exception as e:
        return {
            "error": str(e),
            "degraded": True
        }

# -------------------------
# 5. Router
# -------------------------
def router(state: State):
    if state.get("degraded"):
        return "fallback"
    return END

# -------------------------
# 6. Build graph
# -------------------------
builder = StateGraph(State)

builder.add_node("primary", safe_primary)
builder.add_node("fallback", fallback_llm)

builder.set_entry_point("primary")

builder.add_conditional_edges("primary", router, {
    "fallback": "fallback",
    END: END
})

builder.add_edge("fallback", END)

graph = builder.compile()

# -------------------------
# 7. Run
# -------------------------
result = graph.invoke({
    "query": "Explain quantum computing simply",
    "answer": "",
    "degraded": False,
    "error": None
})

print(result)
```

```
{'query': 'Explain quantum computing simply', 'answer': '(Degraded Mode) Basic answer for: Explain quantum computing simply', 'degraded': True, 'error': 'Primary model unavailable'}
```
