```{contents}
```
## **Streaming Node in LangGraph**

In LangGraph, a **Streaming Node** is an ordinary graph node whose output is consumed **incrementally**, as the underlying LLM or tool produces it, rather than after the entire computation has finished. It is not a separate node type: streaming is a property of how the graph is executed.
This enables **real-time feedback, low-latency UIs, conversational responsiveness, and monitoring of long-running tasks**.

---

### **1. Motivation: Why Streaming Exists**

Traditional LLM nodes behave as **batch functions**:

```
input → compute → full output → return
```

Streaming nodes change the model to:

```
input → compute → token₁ → token₂ → token₃ → … → final output
```

**Benefits**

| Problem                | Solved by Streaming         |
| ---------------------- | --------------------------- |
| High perceived latency | Immediate partial responses |
| Long responses         | Progressive rendering       |
| User trust             | Visible progress            |
| Monitoring             | Real-time execution insight |

---

### **2. Conceptual Model**

```
State_in
   ↓
Streaming Node
   ↓
[ token → token → token → ... ]
   ↓
State_out (final aggregated result)
```

Streaming does **not** change the graph structure; it changes **how a node produces output**.
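
The diagram can be approximated in plain Python. This is a minimal sketch, not LangGraph code: `fake_token_source`, `streaming_node`, and the `on_token` callback are hypothetical names chosen for illustration. It shows the two outputs a streaming node has: the tokens forwarded while it runs, and the aggregated state update it returns at the end.

```python
from typing import Callable, Dict, Iterator, List

def fake_token_source(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for an LLM: yields one word at a time."""
    for word in f"echo: {prompt}".split():
        yield word + " "

def streaming_node(state: Dict[str, str], on_token: Callable[[str], None]) -> Dict[str, str]:
    """Forward each token as it arrives, then return the aggregated update."""
    parts: List[str] = []
    for token in fake_token_source(state["query"]):
        on_token(token)        # a real consumer would render the token here
        parts.append(token)
    return {"answer": "".join(parts).strip()}

seen: List[str] = []
state_in = {"query": "hello streaming world"}
# Merge the node's update into the input state to obtain State_out.
state_out = {**state_in, **streaming_node(state_in, seen.append)}
print(seen)
print(state_out)
```

The graph structure is untouched: the node still maps `State_in` to `State_out`, and the token callback is a side channel.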

---

### **3. Where Streaming Fits in LangGraph**

Streaming nodes are typically:

* LLM calls
* Tool executions with long outputs
* Multi-step generators
* Agent reasoning traces

They integrate seamlessly with:

* Cyclic graphs
* Agent loops
* Human-in-the-loop systems
* UI applications

---

### **4. Core Streaming API in LangGraph**

LangGraph exposes streaming through **graph execution**, not special node types.

```python
for event in graph.stream(input_state):
    print(event)
```

With the default stream mode, each `event` is a dictionary keyed by node name that holds the **state update** the node returned.

---

### **5. Example: Streaming an LLM Node**

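```python
from typing import TypedDict

from langchain_openai import ChatOpenAI  # replaces the deprecated langchain.chat_models import
from langgraph.graph import StateGraph, END

class State(TypedDict):
    query: str
    answer: str

llm = ChatOpenAI(streaming=True)  # requires OPENAI_API_KEY in the environment

def llm_node(state: State) -> dict:
    # invoke() returns the complete message; tokens are surfaced by the stream mode below.
    response = llm.invoke(state["query"])
    return {"answer": response.content}

builder = StateGraph(State)
builder.add_node("llm", llm_node)
builder.set_entry_point("llm")
builder.add_edge("llm", END)

graph = builder.compile()

# In recent LangGraph releases, stream_mode="messages" yields
# (message_chunk, metadata) pairs token by token. The default mode
# would emit a single update once the node has finished.
for chunk, metadata in graph.stream({"query": "Explain transformers"}, stream_mode="messages"):
    print(chunk.content, end="", flush=True)
```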

**What Happens Internally**

| Step                      | Description           |
| ------------------------- | --------------------- |
| Model produces token      | LLM streams token     |
| LangGraph captures update | Partial state emitted |
| Downstream consumers      | UI / logs update      |
| Final token               | State converges       |

---

### **6. Streaming with Agents (ReAct Loop)**

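```python
# stream_mode="values" yields the full state after each step,
# so "messages" is a top-level key of every event.
for event in graph.stream(inputs, stream_mode="values"):
    if "messages" in event:
        print(event["messages"][-1].content, flush=True)
```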

Used for:

* Live chat interfaces
* Debugging agent reasoning
* Tool execution progress

---

### **7. Event Types in Streaming**

| Event        | Meaning              |
| ------------ | -------------------- |
| State update | Partial state change |
| Token event  | LLM token            |
| Node start   | Execution begins     |
| Node end     | Execution completes  |
| Error event  | Failure              |
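
A consumer usually routes each event category to a different handler. The sketch below is plain Python and assumes a hypothetical event shape, `{"type": ..., "payload": ...}`, with type names taken from the table above. LangGraph's actual payloads depend on the stream mode in use, so treat this as an illustration of the dispatch pattern only.

```python
from typing import Any, Callable, Dict, Iterable, List

def route_events(
    events: Iterable[Dict[str, Any]],
    handlers: Dict[str, Callable[[Any], None]],
) -> List[str]:
    """Dispatch each event to its handler; return the types that had no handler."""
    unhandled: List[str] = []
    for event in events:
        handler = handlers.get(event["type"])
        if handler is None:
            unhandled.append(event["type"])
        else:
            handler(event.get("payload"))
    return unhandled

tokens: List[str] = []
errors: List[str] = []
handlers = {
    "token": tokens.append,            # render tokens as they arrive
    "error": errors.append,            # collect failures for reporting
    "node_start": lambda payload: None,
    "node_end": lambda payload: None,
}

# Hypothetical event sequence for a single LLM node.
sample = [
    {"type": "node_start", "payload": "llm"},
    {"type": "token", "payload": "Hel"},
    {"type": "token", "payload": "lo"},
    {"type": "state_update", "payload": {"answer": "Hello"}},
    {"type": "node_end", "payload": "llm"},
]
unhandled = route_events(sample, handlers)
print(tokens, errors, unhandled)
```

Returning the unhandled types makes it easy to notice when a new event category appears that the consumer does not yet cover.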

---

### **8. Streaming in Cyclic Graphs**

Streaming becomes crucial in loops:

```
Reason → Act → Observe → Reason → …
```

Each cycle streams intermediate reasoning and tool output, enabling:

* Live visualization of agent behavior
* Early termination by a human
* Adaptive control
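
Early termination follows naturally from consuming the stream in a loop: the consumer can stop iterating as soon as it has seen enough. The sketch below is a plain-Python stand-in for an agent loop, not LangGraph code; `react_loop` and `run_until` are hypothetical names.

```python
from typing import Iterator, List, Tuple

def react_loop(max_cycles: int) -> Iterator[Tuple[int, str]]:
    """Hypothetical agent loop that yields (cycle, phase) after each phase."""
    for cycle in range(1, max_cycles + 1):
        for phase in ("reason", "act", "observe"):
            yield cycle, phase

def run_until(stop_after_cycle: int, max_cycles: int = 10) -> List[Tuple[int, str]]:
    """Consume the stream and stop once the requested cycle has completed."""
    seen: List[Tuple[int, str]] = []
    for cycle, phase in react_loop(max_cycles):
        seen.append((cycle, phase))
        if cycle == stop_after_cycle and phase == "observe":
            break  # stands in for a human pressing "stop"
    return seen

trace = run_until(stop_after_cycle=2)
print(trace)
```

Because the loop is driven by the consumer, the remaining cycles are never computed once the `break` is reached.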

---

### **9. Production Use Cases**

| Use Case            | Why Streaming                 |
| ------------------- | ----------------------------- |
| Chatbots            | Real-time conversation        |
| Research assistants | Progressive report generation |
| Code assistants     | Show code as it is written    |
| Monitoring agents   | Execution transparency        |
| Autonomous systems  | Live control & safety         |

---

### **10. Safety & Performance Controls**

| Control         | Purpose                |
| --------------- | ---------------------- |
| Token limits    | Prevent runaway output |
| Backpressure    | Avoid UI overload      |
| Throttling      | Rate control           |
| Human interrupt | Stop execution         |
| Checkpointing   | Resume after failure   |
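
Two of these controls, token limits and throttling, can be sketched as wrappers around any token iterator. This is an illustrative plain-Python example under stated assumptions: `limit_tokens` and `coalesce` are hypothetical helpers, and the throttle batches by count rather than by elapsed time to keep the example deterministic.

```python
from itertools import islice
from typing import Iterable, Iterator, List

def limit_tokens(tokens: Iterable[str], max_tokens: int) -> Iterator[str]:
    """Token limit: stop forwarding after max_tokens items."""
    return islice(tokens, max_tokens)

def coalesce(tokens: Iterable[str], batch_size: int) -> Iterator[str]:
    """Crude throttle: join tokens into batches so the UI receives fewer updates."""
    batch: List[str] = []
    for token in tokens:
        batch.append(token)
        if len(batch) == batch_size:
            yield "".join(batch)
            batch = []
    if batch:  # flush the final partial batch
        yield "".join(batch)

# Hypothetical source of 100 tokens, capped at 7 and delivered 3 at a time.
source = (f"t{i} " for i in range(100))
chunks = list(coalesce(limit_tokens(source, max_tokens=7), batch_size=3))
print(chunks)
```

Since both wrappers are lazy, the source is never asked for more than the capped number of tokens.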

---

### **11. Mental Model**

A Streaming Node converts LangGraph from:

> **"Batch workflow engine"**
> to
> **"Interactive real-time reasoning system"**

---

### **12. Comparison: Streaming vs Non-Streaming**

| Aspect    | Non-Streaming  | Streaming       |
| --------- | -------------- | --------------- |
| Latency   | High           | Low             |
| UX        | Blocking       | Interactive     |
| Debugging | Hard           | Transparent     |
| Autonomy  | Limited        | Strong          |
| Safety    | Low visibility | High visibility |


---

### **13. Demonstration**

```python
from typing import TypedDict

from langchain_openai import ChatOpenAI

class State(TypedDict):
    query: str
    answer: str

llm = ChatOpenAI(streaming=True)

def streaming_llm_node(state: State):
    response = llm.invoke(state["query"])
    return {"answer": response.content}
```

```python
from langgraph.graph import StateGraph, END

builder = StateGraph(State)
builder.add_node("llm", streaming_llm_node)
builder.set_entry_point("llm")
builder.add_edge("llm", END)

graph = builder.compile()
```

```python
# Default stream mode: one update per node, so the whole answer arrives in a single event.
for event in graph.stream({"query": "Explain attention mechanism"}):
    if "answer" in event["llm"]:
        print(event["llm"]["answer"], end="", flush=True)
```

Attention mechanism is a technique used in neural networks to improve the performance of machine learning models, particularly in tasks that involve sequential data or have long-range dependencies. 

At its core, attention mechanism allows the model to focus on specific parts of the input data that are more relevant for the task at hand, rather than considering the entire input sequence at once. This is inspired by the way human brains process information, where we selectively focus our attention on certain elements while ignoring others.

In practice, attention mechanism works by assigning a weight to each input element based on its importance for the current step of the computation. These weights are then used to calculate a context vector, which is a weighted sum of all input elements. This context vector is then used in the computation of the model's output, allowing it to effectively capture dependencies and relationships between elements in the input data.

Overall, attention mec