```{contents}
```
## **Streaming Tokens in LangGraph**

**Streaming tokens** in LangGraph refers to the ability to **emit partial model outputs incrementally as they are generated**, rather than waiting for the full response.
This enables **low-latency UX**, real-time agent monitoring, progressive rendering, and early intervention in long-running workflows.

---

### **1. Motivation and Intuition**

Without streaming, the system behaves as:

```
Input → [LLM computes silently] → Final Output
```

With streaming:

```
Input → token₁ → token₂ → token₃ → ... → Final Output
```

**Advantages**

| Property      | Impact                            |
| ------------- | --------------------------------- |
| Latency       | Immediate feedback                |
| UX            | Perceived responsiveness          |
| Control       | Can interrupt or modify execution |
| Observability | Inspect reasoning in real time    |
| Cost          | Early termination possible        |

---

### **2. Where Streaming Lives in LangGraph**

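Streaming originates at the **node execution layer**, in any node that invokes a chat model. In practice these are usually:

* **LLM nodes**
* **Agent nodes**
* **Tool-calling nodes**

The graph runtime **propagates these events** to the caller of `graph.stream()`, and the caller chooses what to receive through `stream_mode` (for example, `"updates"` for per-node state updates or `"messages"` for LLM tokens).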

---

### **3. Core Streaming Architecture**

```
Graph Runtime
   ↓
Node Executor
   ↓
LLM (streaming=True)
   ↓
Token Events
   ↓
LangGraph Stream Channel
   ↓
User / UI / Logger
```

Each token becomes an **event** on the execution channel.
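
To make the layering concrete, here is a toy model of the pipeline in plain Python with no LangGraph dependency. The names `TokenEvent`, `fake_llm`, `node_executor`, and `run_graph` are hypothetical stand-ins and not LangGraph APIs. The sketch only illustrates that each layer re-yields tokens as events instead of waiting for the full response.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class TokenEvent:
    """Hypothetical event record; LangGraph's real payloads look different."""
    node: str
    token: str


def fake_llm(prompt: str) -> Iterator[str]:
    # Stand-in for a streaming model: yields one token at a time.
    for token in ["Lang", "Graph", " streams", " tokens", "."]:
        yield token


def node_executor(node_name: str, prompt: str) -> Iterator[TokenEvent]:
    # Wraps each token in an event tagged with the node that produced it.
    for token in fake_llm(prompt):
        yield TokenEvent(node=node_name, token=token)


def run_graph(prompt: str) -> Iterator[TokenEvent]:
    # A one-node "graph": the runtime forwards node events to the consumer.
    yield from node_executor("chat", prompt)


events = list(run_graph("hi"))
text = "".join(event.token for event in events)
print(text)
```

Because every layer is a generator, the consumer sees the first token as soon as the model stand-in produces it.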

---

### **4. Minimal Working Example**

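```python
from typing import TypedDict

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END

class State(TypedDict):
    messages: list

# Requires OPENAI_API_KEY in the environment.
llm = ChatOpenAI(streaming=True)

def chat_node(state: State):
    # The model call happens inside the node; LangGraph captures its tokens.
    response = llm.invoke(state["messages"])
    return {"messages": state["messages"] + [response]}

builder = StateGraph(State)
builder.add_node("chat", chat_node)
builder.set_entry_point("chat")
builder.add_edge("chat", END)

graph = builder.compile()

# stream_mode="messages" yields (message_chunk, metadata) tuples per token.
# The default mode yields one state update per node instead of tokens.
inputs = {"messages": [HumanMessage(content="Hello!")]}
for chunk, metadata in graph.stream(inputs, stream_mode="messages"):
    print(chunk.content, end="", flush=True)
```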

---

### **5. Consuming Token Streams**

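With `stream_mode="messages"`, each streamed item is a `(message_chunk, metadata)` tuple:

```python
(
    AIMessageChunk(content="Hello"),   # partial model output
    {"langgraph_node": "chat", ...},   # run metadata, including the emitting node
)
```

Typical consumer logic:

```python
for chunk, metadata in graph.stream(inputs, stream_mode="messages"):
    if metadata.get("langgraph_node") == "chat" and chunk.content:
        render(chunk.content)  # render() stands in for your UI callback
```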
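
A consumer often needs both the live tokens and the final text. The sketch below assumes the stream yields `(chunk, metadata)` pairs, which is the shape LangGraph produces with `stream_mode="messages"`. `Chunk` is a stand-in for `AIMessageChunk` so the example runs without an API key, and `consume` and `on_token` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Chunk:
    """Stand-in for AIMessageChunk: only the `content` field is modelled."""
    content: str


def consume(stream: Iterable[Tuple[Chunk, dict]],
            on_token: Callable[[str], None]) -> str:
    """Render each non-empty chunk as it arrives and return the full text."""
    parts: List[str] = []
    for chunk, _metadata in stream:
        if chunk.content:  # skip empty chunks
            on_token(chunk.content)
            parts.append(chunk.content)
    return "".join(parts)


# Simulated stream; with LangGraph this would be
# graph.stream(inputs, stream_mode="messages").
fake_stream = [
    (Chunk("Hel"), {"langgraph_node": "chat"}),
    (Chunk(""), {"langgraph_node": "chat"}),
    (Chunk("lo"), {"langgraph_node": "chat"}),
]

rendered: List[str] = []
full_text = consume(fake_stream, rendered.append)
print(full_text)
```

Passing the renderer in as a callback keeps the accumulation logic independent of the UI.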

---

### **6. Streaming with Tools and Agents**

During tool-augmented generation:

```
Tokens → Tool call → Tool result → More tokens
```

LangGraph preserves this full sequence as a **continuous stream**, enabling:

* Real-time agent debugging
* Live tool visibility
* Partial answer display
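
One way to get live tool visibility is to label each streamed item by kind. The sketch below is an assumption-laden illustration: `Chunk` imitates the `content` and `tool_call_chunks` fields of `AIMessageChunk`, the node names `agent` and `tools` are invented for the example, and `label_events` is a hypothetical helper rather than a LangGraph function.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Tuple


@dataclass
class Chunk:
    """Stand-in for a streamed message chunk."""
    content: str = ""
    tool_call_chunks: List[dict] = field(default_factory=list)


def label_events(stream: Iterable[Tuple[Chunk, dict]],
                 tool_node: str = "tools") -> Iterator[Tuple[str, str, str]]:
    """Yield (node, kind, text) with kind in {'token', 'tool_call', 'tool_result'}."""
    for chunk, metadata in stream:
        node = metadata.get("langgraph_node", "?")
        for call in getattr(chunk, "tool_call_chunks", None) or []:
            # The tool name usually arrives first; arguments follow as partial JSON.
            yield node, "tool_call", call.get("name") or call.get("args") or ""
        if chunk.content:
            kind = "tool_result" if node == tool_node else "token"
            yield node, kind, chunk.content


# Simulated run: tokens, a tool call in two fragments, the tool result, more tokens.
fake_stream = [
    (Chunk(content="Let me check. "), {"langgraph_node": "agent"}),
    (Chunk(tool_call_chunks=[{"name": "search", "args": ""}]), {"langgraph_node": "agent"}),
    (Chunk(tool_call_chunks=[{"name": None, "args": '{"q": "weather"}'}]), {"langgraph_node": "agent"}),
    (Chunk(content="Sunny, 21 C"), {"langgraph_node": "tools"}),
    (Chunk(content="It is sunny."), {"langgraph_node": "agent"}),
]

timeline = list(label_events(fake_stream))
for row in timeline:
    print(row)
```

A UI could route `token` rows to the chat pane and the other kinds to a debug panel.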

---

### **7. Streaming in Cyclic Graphs**

In loops:

```
Reason → Act → Observe → (loop)
```

Streaming remains **continuous across iterations**, allowing:

* Live trace of reasoning evolution
* Intervention before infinite loops
* Progressive refinement
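
Intervening before a runaway loop can be done on the consumer side by watching how far the graph has advanced. The sketch below assumes each item's metadata carries a `langgraph_step` counter, as LangGraph's `"messages"` stream mode provides. `guard_steps`, `StepBudgetExceeded`, and the simulated `looping_stream` are hypothetical. LangGraph also has a runtime-side guard, the `recursion_limit` config key, which this example does not exercise.

```python
from typing import Iterable, Iterator, List, Tuple


class StepBudgetExceeded(RuntimeError):
    """Raised by the hypothetical guard below when a loop runs too long."""


def guard_steps(stream: Iterable[Tuple[str, dict]],
                max_steps: int) -> Iterator[Tuple[str, dict]]:
    """Pass items through until the step counter exceeds the budget."""
    for chunk, metadata in stream:
        step = metadata.get("langgraph_step", 0)
        if step > max_steps:
            raise StepBudgetExceeded(f"step {step} exceeds budget of {max_steps}")
        yield chunk, metadata


def looping_stream() -> Iterator[Tuple[str, dict]]:
    # Simulates Reason → Act → Observe repeating forever (chunks are plain strings here).
    step = 0
    while True:
        for node in ("reason", "act", "observe"):
            step += 1
            yield f"<{node}>", {"langgraph_node": node, "langgraph_step": step}


seen: List[str] = []
stopped = ""
try:
    for chunk, metadata in guard_steps(looping_stream(), max_steps=6):
        seen.append(chunk)
except StepBudgetExceeded as exc:
    stopped = str(exc)

print(seen, stopped)
```

Here the consumer sees two full iterations of the loop and then stops.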

---

### **8. Advanced Streaming Control**

| Feature             | Description                      |
| ------------------- | -------------------------------- |
| Backpressure        | Slow consumers regulate producer |
| Early termination   | Stop graph when condition met    |
| Selective streaming | Stream only chosen nodes         |
| Token filtering     | Remove sensitive tokens          |
| Batching            | Aggregate tokens for performance |
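
Several of these controls can be sketched as small generator wrappers around the token stream. The helpers below (`only_nodes`, `take_until`, `batch_tokens`) are hypothetical and not part of LangGraph. For brevity they treat each item as `(token_text, metadata)`, whereas real message chunks expose the text as `.content`.

```python
from typing import Callable, Iterable, Iterator, List, Set, Tuple

Item = Tuple[str, dict]  # (token text, metadata)


def only_nodes(stream: Iterable[Item], nodes: Set[str]) -> Iterator[Item]:
    """Selective streaming: pass through tokens from the chosen nodes only."""
    for token, meta in stream:
        if meta.get("langgraph_node") in nodes:
            yield token, meta


def take_until(stream: Iterable[Item],
               should_stop: Callable[[str], bool]) -> Iterator[Item]:
    """Early termination: stop pulling once the accumulated text satisfies should_stop."""
    text = ""
    for token, meta in stream:
        yield token, meta
        text += token
        if should_stop(text):
            return


def batch_tokens(stream: Iterable[Item], size: int) -> Iterator[str]:
    """Batching: join every `size` tokens into one string to reduce UI updates."""
    buffer: List[str] = []
    for token, _meta in stream:
        buffer.append(token)
        if len(buffer) >= size:
            yield "".join(buffer)
            buffer.clear()
    if buffer:  # flush the remainder
        yield "".join(buffer)


def fake_stream() -> Iterator[Item]:
    yield "plan", {"langgraph_node": "planner"}
    for token in ["The ", "answer ", "is ", "42", ". ", "Also", "..."]:
        yield token, {"langgraph_node": "chat"}


pipeline = batch_tokens(
    take_until(only_nodes(fake_stream(), {"chat"}), lambda text: text.endswith(". ")),
    size=2,
)
batches = list(pipeline)
print(batches)
```

Because generators are pull-based, a slow consumer in this sketch naturally throttles the wrappers upstream. Whether stopping early also cancels the underlying model call depends on the runtime and provider.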

---

### **9. Production Use Cases**

| Use Case          | Benefit                           |
| ----------------- | --------------------------------- |
| Chat UI           | Instant typing effect             |
| Long analysis     | User confidence & engagement      |
| Agent debugging   | Observe failures early            |
| Human-in-the-loop | Mid-execution corrections         |
| Monitoring        | Detect hallucination in real time |

---

### **10. Performance & Safety Considerations**

* Streaming adds a small per-token CPU overhead for event dispatch and rendering
* Sanitize tokens before exposing them to users
* Enforce output length & recursion limits
* Mask private data in live streams
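
Masking is harder in a stream than in a finished response, because a sensitive string can be split across several tokens. A minimal sketch, assuming the secret is a known literal string: hold back the last `len(secret) - 1` characters so a match spanning a token boundary is still caught. `redact_stream` is a hypothetical helper, and real PII detection would need pattern-based matching rather than one literal.

```python
from typing import Iterable, Iterator, List


def redact_stream(tokens: Iterable[str], secret: str,
                  mask: str = "[REDACTED]") -> Iterator[str]:
    """Mask `secret` even when it is split across token boundaries."""
    if not secret:
        raise ValueError("secret must be a non-empty string")
    hold = len(secret) - 1  # longest tail that could still begin the secret
    buffer = ""
    for token in tokens:
        buffer = (buffer + token).replace(secret, mask)
        cut = len(buffer) - hold
        if cut > 0:
            # Everything before the held-back tail is safe to emit.
            yield buffer[:cut]
            buffer = buffer[cut:]
    if buffer:  # the stream ended, so the tail can no longer grow into a match
        yield buffer


# The secret "sk-1234" arrives split over three tokens.
incoming = ["My key is sk-", "12", "34", " ok"]
pieces: List[str] = list(redact_stream(incoming, secret="sk-1234"))
safe_text = "".join(pieces)
print(safe_text)
```

The cost is a display delay of at most `len(secret) - 1` characters.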

---

### **11. Conceptual Summary**

LangGraph streaming transforms LLM execution from:

> **Batch inference → Event-driven computation**

This is essential for building **responsive, observable, and controllable AI systems**.


---

### **12. Demonstration**

```python
# ====== LangGraph Token Streaming Demo ======

from typing import List, TypedDict

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END

# ---- State Definition ----
class State(TypedDict):
    messages: List

# ---- Streaming LLM (requires OPENAI_API_KEY) ----
llm = ChatOpenAI(model="gpt-4o-mini", streaming=True)

# ---- Node Definition ----
def chat_node(state: State):
    response = llm.invoke(state["messages"])
    return {"messages": state["messages"] + [response]}

# ---- Build Graph ----
builder = StateGraph(State)
builder.add_node("chat", chat_node)
builder.set_entry_point("chat")
builder.add_edge("chat", END)

graph = builder.compile()

# ---- Execute with Streaming ----
print("Assistant:", end=" ", flush=True)

# stream_mode="messages" yields (message_chunk, metadata) tuples, one per token chunk.
# Without it, graph.stream() yields per-node state updates and no tokens are printed.
for chunk, metadata in graph.stream(
    {"messages": [HumanMessage(content="Explain LangGraph streaming in one sentence.")]},
    stream_mode="messages",
):
    if metadata.get("langgraph_node") == "chat" and chunk.content:
        print(chunk.content, end="", flush=True)

print("\n\nDone.")
```
