```{contents}
```
## LLM Invocation

**LLM invocation** in LangGraph is the controlled execution of a language model as a **node** inside a **stateful execution graph**, where inputs, outputs, failures, retries, and side-effects are managed through the **graph’s state machine**.

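This design enables **deterministic orchestration of non-deterministic models**: the model's output may vary from run to run, but which node runs, what state it reads, and where its output goes are defined explicitly by the graph.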

---

### **1. Conceptual Role**

In LangGraph, an LLM is **not called ad hoc** from application code.
It is invoked inside a **node** whose behavior is governed by:

| Layer    | Responsibility               |
| -------- | ---------------------------- |
| Graph    | Controls *when* the LLM runs |
| State    | Controls *what data* it sees |
| Node     | Defines *how* it is invoked  |
| Edges    | Decide *what happens next*   |
| Policies | Enforce safety & reliability |

---

### **2. Basic LLM Node Anatomy**

```
State → [ LLM Node ] → Updated State
```

An LLM node is simply a function that:

1. Reads from the shared state
2. Calls the model
3. Returns partial state updates

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")
```

```python
def llm_node(state):
    response = llm.invoke(state["prompt"])
    return {"output": response.content}
```

---

### **3. Typed State Integration**

```python
from typing import TypedDict

class State(TypedDict):
    prompt: str   # input read by the LLM node
    output: str   # written by the LLM node
```

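LangGraph enforces a **structured flow**: the schema declares the keys that nodes read and update, and the LLM sees only what its node passes along from them.
`TypedDict` annotations document the expected types but are not validated at runtime.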

---

### **4. Graph Wiring**

```python
from langgraph.graph import StateGraph, END

builder = StateGraph(State)
builder.add_node("llm", llm_node)
builder.set_entry_point("llm")
builder.add_edge("llm", END)

graph = builder.compile()
```

```python
graph.invoke({"prompt": "Explain transformers in one sentence"})
```

---

### **5. Prompt Engineering Inside the Node**

```python
def llm_node(state):
    # Assumes a state schema with "question" (input) and "answer" (output) keys
    prompt = (
        "You are an expert teacher.\n"
        f"Question: {state['question']}\n"
        "Answer concisely."
    )
    result = llm.invoke(prompt)
    return {"answer": result.content}
```

This keeps **prompt logic local**, while **control flow remains global**.

---

### **6. Advanced Invocation Patterns**

| Pattern             | Purpose                     |
| ------------------- | --------------------------- |
| Multi-Model Routing | Cost / latency optimization |
| Tool-Calling        | Structured external actions |
| Streaming           | Real-time token flow        |
| Batch Inference     | High throughput             |
| Reflection          | Self-correction             |
| ReAct               | Reason → Act loops          |
| Fallback Models     | Reliability                 |

---

### **7. Conditional LLM Routing**

```python
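def router(state):
    # Path function: returns a key of the mapping below, not a state update
    if state["difficulty"] == "hard":
        return "gpt4"
    return "gpt35"
```

```python
from langgraph.graph import START

# Branch straight from the graph's entry; "llm_gpt4" and "llm_gpt35"
# must already be registered with builder.add_node(...)
builder.add_conditional_edges(START, router, {
    "gpt4": "llm_gpt4",
    "gpt35": "llm_gpt35",
})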
```

---

### **8. LLM + Tools Invocation**

```python
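from langchain_core.tools import tool

@tool
def calculator(x: int, y: int) -> int:
    """Add two integers."""  # @tool requires a docstring: it becomes the tool description
    return x + y
```

```python
# bind_tools attaches the tool schemas so the model can request calls to them
llm = ChatOpenAI(model="gpt-4o-mini").bind_tools([calculator])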
```

The model decides **when** to call tools, but it only *requests* them (as `tool_calls` on its reply);
LangGraph executes those requests and controls **how and where** their effects propagate.

---

### **9. Reliability Controls**

| Control         | Mechanism                  |
| --------------- | -------------------------- |
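| Retries         | Per-node retry policy (`RetryPolicy`)                       |
| Timeout         | Hard limits: client request timeouts, graph `recursion_limit` |
| Fallback        | Alternate model paths or `with_fallbacks`                    |
| Circuit Breaker | Custom routing away from failing nodes (not built in)        |
| Checkpointing   | Checkpointer persists state for safe resume                  |
| Audit Log       | Checkpoint history and tracing                               |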

```python
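# recursion_limit caps the number of graph steps; exceeding it raises GraphRecursionError
graph.invoke({"prompt": "Explain transformers in one sentence"}, config={"recursion_limit": 25})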
```

---

### **10. Why LangGraph Invocation Is Different**

| Traditional LLM Call | LangGraph Invocation |
| -------------------- | -------------------- |
| Ad-hoc               | State-governed       |
| Hidden side effects  | Explicit transitions |
| No memory            | Persistent memory (with a checkpointer) |
| No recovery          | Resumable from checkpoints              |
| Hard to debug        | Step-by-step traceable                  |

---

### **11. Mental Model**

> **LLM Invocation = A controlled, observable, recoverable computational step in a larger state machine**

This makes LangGraph suitable for **autonomous agents, enterprise systems, and long-running reasoning processes**.

### Demonstration

```python
# ===== Complete LangGraph LLM Invocation Demo =====

from typing import TypedDict
from langgraph.graph import StateGraph, START, END
from langchain_openai import ChatOpenAI

# ------------------------------
# 1. Define State Schema
# ------------------------------
class State(TypedDict):
    question: str
    difficulty: str
    answer: str

# ------------------------------
# 2. Initialize Models (ChatOpenAI reads OPENAI_API_KEY from the environment)
# ------------------------------
fast_model = ChatOpenAI(model="gpt-3.5-turbo")
strong_model = ChatOpenAI(model="gpt-4o-mini")

# ------------------------------
# 3. Define LLM Nodes
# ------------------------------
def fast_llm(state: State):
    response = fast_model.invoke(f"Answer simply: {state['question']}")
    return {"answer": response.content}

def strong_llm(state: State):
    response = strong_model.invoke(f"Provide a detailed answer: {state['question']}")
    return {"answer": response.content}

# ------------------------------
# 4. Routing Logic
# ------------------------------
def route(state: State):
    return "strong" if state["difficulty"] == "hard" else "fast"

# ------------------------------
# 5. Build Graph
# ------------------------------
builder = StateGraph(State)

builder.add_node("fast", fast_llm)
builder.add_node("strong", strong_llm)

# Route from START so that exactly one model runs per request
builder.add_conditional_edges(START, route, {
    "fast": "fast",
    "strong": "strong"
})

# Both model nodes finish the run
builder.add_edge("fast", END)
builder.add_edge("strong", END)

graph = builder.compile()

# ------------------------------
# 6. Execute
# ------------------------------
result = graph.invoke({
    "question": "Explain transformers in one paragraph.",
    "difficulty": "hard"
})

print(result["answer"])
```

```text
Transformers are a type of deep learning architecture introduced in the paper "Attention is All You Need" by Vaswani et al. in 2017, which primarily revolutionized the fields of natural language processing (NLP) and beyond. Unlike previous sequential models like RNNs and LSTMs, transformers leverage self-attention mechanisms to process data in parallel, enabling them to capture long-range dependencies in sequences more effectively. The architecture consists of an encoder-decoder structure: the encoder processes the input data to create a rich set of representations, while the decoder generates the output sequence based on the encoder's output. Transformers utilize multi-head attention and position-wise feedforward networks, allowing the model to focus on different parts of the input when generating outputs and to learn complex relationships within the data. Additionally, transformers use positional encodings to account for the order of input sequences, as the architecture itself does n
```