# Session 13: Streaming in LangGraph

https://youtu.be/D1PcZaeQ2eg?list=PLKnIA16_RmvYsvB8qkUQuJmJNuiCUJFPL

In LangChain, **streaming** means receiving the LLM’s output **incrementally (token by token or chunk by chunk)** as it’s generated, instead of waiting for the full response.

Think of it like **watching a YouTube video while it buffers** vs. **waiting for the full download**:

* **Without streaming** → The LLM finishes generating the whole response, then returns it to you all at once.
* **With streaming** → The LLM sends each token as soon as it’s generated, so you can process or display it immediately.
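
The contrast between the two bullets can be shown without calling a real model. This is a minimal plain-Python sketch: `fake_llm_tokens` is a hypothetical stand-in for an LLM that emits one word-sized "token" at a time, and the 0.05 s delay only simulates generation time.

```python
import time
from typing import Iterator


def fake_llm_tokens(text: str, delay: float = 0.05) -> Iterator[str]:
    """Hypothetical stand-in for an LLM: yields one word-sized "token" at a time."""
    for word in text.split():
        time.sleep(delay)  # simulate the time it takes to generate each token
        yield word + " "


def generate_without_streaming(text: str) -> str:
    """No streaming: wait until every token exists, then return them all at once."""
    return "".join(fake_llm_tokens(text))


def generate_with_streaming(text: str) -> Iterator[str]:
    """Streaming: hand each token to the caller as soon as it is produced."""
    yield from fake_llm_tokens(text)


if __name__ == "__main__":
    answer = "Streaming shows partial output immediately"

    start = time.perf_counter()
    full = generate_without_streaming(answer)
    print(f"[non-streaming] first text after {time.perf_counter() - start:.2f}s: {full}")

    start = time.perf_counter()
    for i, token in enumerate(generate_with_streaming(answer)):
        if i == 0:
            print(f"[streaming] first token after {time.perf_counter() - start:.2f}s")
        print(token, end="", flush=True)
    print()
```

The total generation time is the same in both cases. In the streaming version, the first token reaches the caller after roughly one delay instead of after all of them.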

### 🔹 Why use Streaming?

1. **Faster user experience (real-time feel)**

   * Users start seeing the answer right away instead of waiting for the whole response.
   * Especially useful in chatbots or apps where long responses are expected.

2. **Interactive / real-time applications**

   * You can update a web UI as text arrives (like ChatGPT rendering its answer progressively).
   * Great for conversational assistants, dashboards, or live coding assistants.

3. **Control and monitoring**

   * You can log tokens as they are generated.
   * You can interrupt generation early (e.g., stop if unwanted text appears).

4. **Scalability and resource efficiency**

   * Downstream steps can start processing partial results instead of waiting for, and holding, the complete output.
   * Useful for long outputs like summarization or document generation.
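
Points 3 and 4 both depend on consuming the stream as it arrives. The sketch below is plain Python and assumes only that the model output is available as an iterator of string tokens. The hard-coded token list and the `consume_with_guard` name are hypothetical. The sketch logs every token and stops reading once a banned phrase appears in the accumulated text. With a real model, breaking out of the loop only stops your code from pulling further chunks. Whether generation also stops on the provider side depends on the client library, and this sketch does not assume it does.

```python
import logging
from typing import Iterable

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("stream")


def consume_with_guard(tokens: Iterable[str], banned: str) -> str:
    """Log each token as it arrives and stop reading once `banned` appears.

    Returns the text received up to and including the token that triggered the stop.
    """
    text = ""
    for i, token in enumerate(tokens):
        log.info("token %d: %r", i, token)  # monitoring: record every token
        text += token
        if banned in text:  # check accumulated text, since a phrase can span tokens
            log.warning("found %r, stopping early", banned)
            break  # interrupt: remaining tokens are never pulled from the iterator
    return text


if __name__ == "__main__":
    # Hypothetical token stream standing in for real model output
    stream = iter(["The ", "secret ", "password ", "is ", "hunter2"])
    print(consume_with_guard(stream, banned="password"))
```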

### 🔹 Example in LangChain (Streaming with OpenAI)

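```python
from langchain_core.callbacks import BaseCallbackHandler
from langchain_openai import ChatOpenAI

# Callback that receives each new token as the model generates it
class MyStreamHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Print tokens on the same line, flushing so they appear immediately
        print(token, end="", flush=True)

handler = MyStreamHandler()

# streaming=True makes ChatOpenAI request a streamed response,
# so on_llm_new_token fires for each streamed token
llm = ChatOpenAI(
    model="gpt-4o-mini",
    streaming=True,
)

# invoke() still returns the complete AIMessage once generation finishes;
# the handler prints tokens along the way (requires OPENAI_API_KEY)
response = llm.invoke(
    "Write a short story about a robot learning to cook.",
    config={"callbacks": [handler]},
)
```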

👉 With this, you’ll see the story **printed token by token** (roughly word by word) as the LLM generates it.
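
LangChain chat models also expose a `.stream()` method that returns an iterator of message chunks, so you do not need a callback class. The helper below is a sketch. `stream_to_console` is a hypothetical name, and the only thing it assumes about `llm` is that `llm.stream(prompt)` yields objects with a string `.content`. `ChatOpenAI` behaves this way for plain-text replies.

```python
from typing import Any, List


def stream_to_console(llm: Any, prompt: str) -> str:
    """Print each streamed chunk as it arrives and return the full text."""
    parts: List[str] = []
    for chunk in llm.stream(prompt):  # for chat models, each chunk is a message chunk
        print(chunk.content, end="", flush=True)
        parts.append(chunk.content)
    print()
    return "".join(parts)

# Usage with a real model (requires langchain-openai and OPENAI_API_KEY):
# from langchain_openai import ChatOpenAI
# llm = ChatOpenAI(model="gpt-4o-mini")
# story = stream_to_console(llm, "Write a short story about a robot learning to cook.")
```

Returning the joined text lets you print the response live and still keep the complete answer, for example to store it in chat history.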

✅ **In short**: use streaming in LangChain when you want **real-time, token-by-token responses** for a better user experience, faster feedback, and more control over LLM output.

### Python generator

![image.png](attachment:image.png)

![image.png](attachment:image.png)
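
LangChain’s `.stream()` methods return Python iterators. A generator (a function that uses `yield`) is the simplest way to build such an iterator. Each `yield` hands one value to the caller and pauses the function until the next value is requested. The sketch below is plain Python, and `count_up_to` and `stream_words` are illustrative names, not LangChain APIs.

```python
from typing import Iterator


def count_up_to(n: int) -> Iterator[int]:
    """A generator: produces 1..n lazily instead of building a list first."""
    i = 1
    while i <= n:
        yield i  # hand one value to the caller and pause here
        i += 1   # execution resumes here on the next request


def stream_words(sentence: str) -> Iterator[str]:
    """Toy 'token stream': yields one word at a time, like a streamed LLM reply."""
    for word in sentence.split():
        yield word


gen = count_up_to(3)
print(next(gen))   # 1
print(next(gen))   # 2
print(list(gen))   # [3]  (only the values not yet consumed)

for word in stream_words("tokens arrive one at a time"):
    print(word, end=" ", flush=True)
print()
```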