# Streaming

<img src="https://github.com/jayyanar/agentic-ai-training/blob/lab-day-1/batch2/lca-langchainV1-essentials/assets/LC_streaming.png?raw=1" width="400">

Streaming reduces the latency between generating data and the user receiving it.
Two stream modes are frequently used with agents: `values` and `messages`.

## Setup

Load and/or check for the required environment variables.

What we're doing: Install required packages for streaming examples in Colab.

In [None]:
!pip install -qU langchain langchain-groq langgraph langchain-community

What we're doing: Load the GROQ API key from Colab userdata into the environment.

In [None]:
from google.colab import userdata
import os

os.environ["GROQ_API_KEY"] = userdata.get('GROQ_API_KEY')

What we're doing: Initialize the Groq LLM used in streaming examples.

In [None]:
from langchain.agents import create_agent
from langchain_groq import ChatGroq

# Initialize the Groq model
llm = ChatGroq(
    model="llama-3.1-8b-instant",
    temperature=0,
    max_retries=2,
)

What we're doing: Create an agent with a simple system prompt for streaming demonstrations.

In [None]:
agent = create_agent(
    #model="openai:gpt-5",
    model=llm,
    system_prompt="You are a full-stack comedian",
)

## No Streaming (invoke)

What we're doing: Invoke the agent synchronously (no streaming) to get a full response.

In [None]:
result = agent.invoke({"messages": [{"role": "user", "content": "Tell me a joke"}]})
# The last message in the returned state is the agent's final reply
print(result["messages"][-1].content)

## values
You have seen this streaming mode in our examples so far. It emits the full agent state after each step.

What we're doing: Stream in `values` mode and print the latest message after each step.

In [None]:
# Stream = values
for step in agent.stream(
    {"messages": [{"role": "user", "content": "Tell me a Dad joke"}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()
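Because `values` mode yields the whole state each time, every chunk repeats the messages from earlier steps. The sketch below shows one way to recover only what is new per step. It runs on simulated snapshots rather than a real agent: `simulated_snapshots` and `new_messages_per_step` are hypothetical names, plain strings stand in for message objects, and it assumes messages are only ever appended.

```python
from typing import Dict, Iterable, List


def new_messages_per_step(snapshots: Iterable[Dict[str, List[str]]]) -> List[List[str]]:
    """Return, for each state snapshot, only the messages added since the previous one."""
    seen = 0  # number of messages already reported
    steps = []
    for state in snapshots:
        messages = state["messages"]
        steps.append(messages[seen:])  # slice off what earlier snapshots already contained
        seen = len(messages)
    return steps


# Simulated `values` chunks: each one is the full state after a step (assumption for illustration).
simulated_snapshots = [
    {"messages": ["user: Tell me a Dad joke"]},
    {"messages": ["user: Tell me a Dad joke", "ai: Why did the scarecrow win an award?"]},
]

steps = new_messages_per_step(simulated_snapshots)
for i, new in enumerate(steps, start=1):
    print(f"step {i}: {new}")
```

Printing `step["messages"][-1]`, as the cell above does, is the simpler choice when each step adds a single message.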

## messages
The `messages` mode streams data token by token, which gives the lowest latency. This suits interactive applications such as chatbots.

What we're doing: Stream token-level `messages` to demonstrate lowest-latency output (token-by-token).

In [None]:
for token, metadata in agent.stream(
    {"messages": [{"role": "user", "content": "Write me a family friendly poem."}]},
    stream_mode="messages",
):
    # flush=True pushes each token to the output immediately instead of buffering
    print(token.content, end="", flush=True)
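An application often needs the complete text as well as the live token display. The following is a minimal sketch of collecting tokens while printing them. It uses a simulated stream so it runs without an API key: `FakeToken`, `fake_message_stream`, and `collect_stream` are hypothetical helpers, and only the `.content` attribute of a streamed chunk is modeled.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple


@dataclass
class FakeToken:
    """Hypothetical stand-in for a streamed message chunk; only `.content` is modeled."""
    content: str


def fake_message_stream(text: str) -> Iterator[Tuple[FakeToken, Dict[str, str]]]:
    """Yield (token, metadata) pairs, one per space-separated word."""
    for word in text.split(" "):
        yield FakeToken(content=word + " "), {"source": "simulated"}


def collect_stream(stream: Iterable[Tuple[FakeToken, Dict[str, str]]]) -> str:
    """Print each token as it arrives and return the concatenated text."""
    parts = []
    for token, _metadata in stream:
        print(token.content, end="", flush=True)
        parts.append(token.content)
    print()  # finish the line once the stream ends
    return "".join(parts)


full_text = collect_stream(fake_message_stream("Roses are red, violets are blue"))
```

With a real agent, the same loop could consume `agent.stream(..., stream_mode="messages")` in place of `fake_message_stream`, assuming each token's `content` is a string.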

## Tools can stream too!
Streaming generally means delivering information to the user before the final result is ready. This is useful in many cases, such as reporting progress from a long-running tool. The writer returned by `get_stream_writer` lets you stream `custom` data from sources you create.

What we're doing: Define a streaming tool (`get_weather`) that emits `custom` stream chunks and attach it to the agent.

In [None]:
from langchain.agents import create_agent
from langgraph.config import get_stream_writer


def get_weather(city: str) -> str:
    """Get weather for a given city."""
    writer = get_stream_writer()
    # stream any arbitrary data
    writer(f"Looking up data for city: {city}")
    writer(f"Acquired data for city: {city}")
    return f"It's always sunny in {city}!"


agent = create_agent(
    #model="openai:gpt-5-mini",
    model=llm,
    tools=[get_weather],
)

for chunk in agent.stream(
    {"messages": [{"role": "user", "content": "What is the weather in SF?"}]},
    stream_mode=["values", "custom"],
):
    print(chunk)

What we're doing: Stream only `custom` tool output to observe the tool's emitted chunks.

In [None]:
for chunk in agent.stream(
    {"messages": [{"role": "user", "content": "What is the weather in SF?"}]},
    stream_mode=["custom"],
):
    print(chunk)

## Try different modes on your own!
Modify the stream mode and the filter condition to produce different results.

What we're doing: Filter the streamed chunks to handle `custom` tool output differently.

In [None]:
for chunk in agent.stream(
    {"messages": [{"role": "user", "content": "What is the weather in SF?"}]},
    stream_mode=["values", "custom"],
):
    # With a list of stream modes, each chunk is a (mode, data) tuple
    if chunk[0] == "custom":
        print(chunk[1])
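The same filtering idea extends to handling each mode separately. The sketch below routes chunks by mode, assuming each chunk is a `(mode, data)` tuple as in the cell above. It runs on a hand-written list so it needs no model call: `simulated_chunks` and `split_by_mode` are hypothetical names, and the state dictionaries are simplified to plain strings.

```python
from typing import Any, Iterable, List, Tuple


def split_by_mode(chunks: Iterable[Tuple[str, Any]]) -> Tuple[List[Any], List[Any]]:
    """Separate `custom` tool updates from `values` state snapshots."""
    custom_updates: List[Any] = []
    state_snapshots: List[Any] = []
    for mode, data in chunks:
        if mode == "custom":
            custom_updates.append(data)   # progress text emitted by the tool's writer
        elif mode == "values":
            state_snapshots.append(data)  # full agent state after a step
    return custom_updates, state_snapshots


# Simulated output of a stream with stream_mode=["values", "custom"] (assumption for illustration).
simulated_chunks = [
    ("values", {"messages": ["What is the weather in SF?"]}),
    ("custom", "Looking up data for city: SF"),
    ("custom", "Acquired data for city: SF"),
    ("values", {"messages": ["What is the weather in SF?", "It's always sunny in SF!"]}),
]

custom_updates, state_snapshots = split_by_mode(simulated_chunks)
print("tool progress:", custom_updates)
print("final message:", state_snapshots[-1]["messages"][-1])
```

In an interactive UI, the `custom` updates could drive a progress indicator while the final `values` snapshot supplies the answer.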