<a href="https://colab.research.google.com/github/jayyanar/agentic-ai-training/blob/lab-day-1/batch2/lca-langchainV1-essentials/mandatory/output/L3_streaming.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Streaming

<img src="https://github.com/jayyanar/agentic-ai-training/blob/lab-day-1/batch2/lca-langchainV1-essentials/assets/LC_streaming.png?raw=1" width="400">

Streaming reduces the latency between generating data and the user receiving it.
Two stream modes are frequently used with agents: `values` and `messages`.

## Setup

Load and/or check for the needed environment variables.

What we're doing: Install required packages for streaming examples in Colab.

In [1]:
!pip install -qU langchain-groq langgraph langchain-community


What we're doing: Load the GROQ API key from Colab userdata into the environment.

In [2]:
from google.colab import userdata
import os

os.environ["GROQ_API_KEY"] = userdata.get('GROQ_API_KEY')
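The cell above only works inside Colab, because `google.colab` is not importable elsewhere. A minimal sketch of a more portable loader follows. The helper name `ensure_api_key` and its `prompt` parameter are hypothetical and not part of the original notebook. It checks the environment first, then Colab secrets, and finally falls back to an interactive prompt.

```python
import getpass
import os


def ensure_api_key(name: str, prompt=getpass.getpass) -> str:
    """Make sure os.environ[name] is set and return its value.

    Hypothetical helper: looks in the environment, then Colab secrets,
    then asks the user via `prompt` (getpass by default).
    """
    value = os.environ.get(name)
    if not value:
        try:
            # Only importable inside Google Colab.
            from google.colab import userdata
            value = userdata.get(name)
        except Exception:
            # Not in Colab, or the secret is missing / not accessible.
            value = None
    if not value:
        value = prompt(f"Enter {name}: ")
    os.environ[name] = value
    return value
```

Calling `ensure_api_key("GROQ_API_KEY")` could then stand in for the Colab-only cell when the notebook is run locally.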

What we're doing: Initialize the Groq LLM used in streaming examples.

In [3]:
from langchain.agents import create_agent
from langchain_groq import ChatGroq

# Initialize the Groq model
llm = ChatGroq(
    model="llama-3.1-8b-instant",
    temperature=0,
    max_retries=2,
)

What we're doing: Create an agent with a simple system prompt for streaming demonstrations.

In [4]:
agent = create_agent(
    #model="openai:gpt-5",
    model=llm,
    system_prompt="You are a full-stack comedian",
)

## No Streaming (invoke)

What we're doing: Invoke the agent synchronously (no streaming) to get a full response.

In [5]:
result = agent.invoke({"messages": [{"role": "user", "content": "Tell me a joke"}]})
print(result["messages"][-1].content)  # the last message is the agent's reply

Here's one:

You know what's wild? We spend the first year of a child's life teaching them to walk and talk, and the rest of their lives telling them to shut up and sit down.


## values
You have seen this streaming mode in our examples so far.

What we're doing: Stream in `values` mode, which emits the full agent state after each step; the loop prints the latest message from each snapshot.

In [6]:
# Stream = values
for step in agent.stream(
    {"messages": [{"role": "user", "content": "Tell me a Dad joke"}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()


Tell me a Dad joke

Here's one:

"I told my wife she was drawing her eyebrows too high. She looked surprised."

How was that? Did I make you groan?
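
Because each `values` chunk carries the full message list, printing only the last message works as long as exactly one message is added per step. A minimal sketch of a more general approach is to remember how many messages have already been shown. The helper `new_messages` is hypothetical, and plain strings stand in for message objects so the example runs without a model. It assumes the list only grows by appending.

```python
from typing import Iterable, Iterator


def new_messages(snapshots: Iterable[dict]) -> Iterator:
    """Yield each message once from a series of full-state snapshots.

    Assumes every snapshot has a "messages" list that only grows by
    appending, which is how `values` mode is used in this notebook.
    """
    seen = 0
    for state in snapshots:
        messages = state["messages"]
        for message in messages[seen:]:
            yield message
        seen = len(messages)


# Stand-in data shaped like `values` chunks (strings instead of messages).
fake_snapshots = [
    {"messages": ["Tell me a Dad joke"]},
    {"messages": ["Tell me a Dad joke", "Here's one: ..."]},
]
for message in new_messages(fake_snapshots):
    print(message)
```

With the real agent, the result of `agent.stream(..., stream_mode="values")` would be passed in place of `fake_snapshots`, and `message.pretty_print()` would replace `print`.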


## messages
The `messages` mode streams LLM output token by token, which gives the lowest latency of the modes shown here. This makes it well suited to interactive applications like chatbots.

What we're doing: Stream in `messages` mode to show token-by-token output. Each streamed item is a `(token, metadata)` tuple.

In [7]:
for token, metadata in agent.stream(
    {"messages": [{"role": "user", "content": "Write me a family friendly poem."}]},
    stream_mode="messages",
):
    print(token.content, end="", flush=True)  # flush so each token shows immediately

Here's a family-friendly poem for you:

There once was a family so bright,
Their laughter and love shone with delight.
They'd gather 'round, hand in hand,
And make memories that would forever stand.

Their house was a home, full of cheer,
Where hugs and kisses were always near.
Their table was filled with yummy treats,
And their hearts were full of love that skips beats.

The kids would play outside all day,
Chasing butterflies in a sunny way.
Their parents would watch with a smile so wide,
Proud of the little ones, side by side.

As the sun sets and the day grows old,
The family would snuggle up, young and bold.
They'd share stories and secrets, and dreams so bright,
And fill each other's hearts with love and light.

So here's to the family, a shining star,
A bundle of love that goes near and far.
May their laughter and joy be contagious and free,
A family's love, a treasure to see!
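
A chat UI usually needs both the live tokens and the final text. The sketch below prints tokens as they arrive and returns the assembled string. `collect_tokens` and the `SimpleNamespace` stand-ins are hypothetical. The only thing assumed about the real stream is what the cell above already relies on: `(token, metadata)` tuples whose token has a `.content` attribute. The sketch further assumes `.content` is a string and skips empty chunks.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_tokens(stream: Iterable[tuple]) -> str:
    """Print each token as it arrives and return the full text."""
    parts = []
    for token, _metadata in stream:
        text = token.content
        if not text:
            # Skip empty chunks (assumed to carry no visible text).
            continue
        print(text, end="", flush=True)
        parts.append(text)
    print()  # finish the line once the stream ends
    return "".join(parts)


# Stand-in stream shaped like `stream_mode="messages"` output.
fake_stream = [
    (SimpleNamespace(content="Roses "), {}),
    (SimpleNamespace(content=""), {}),
    (SimpleNamespace(content="are red."), {}),
]
full_text = collect_tokens(fake_stream)
```

Passing the real `agent.stream(..., stream_mode="messages")` iterator instead of `fake_stream` should behave the same way, provided the tokens carry string content.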

## Tools can stream too!
Streaming generally means delivering information to the user before the final result is ready, which is useful in many cases, such as reporting progress from a long-running tool. Calling `get_stream_writer` returns a writer that lets you stream `custom` data from sources you create.

What we're doing: Define a streaming tool (`get_weather`) that emits `custom` stream chunks and attach it to the agent.

In [8]:
from langchain.agents import create_agent
from langgraph.config import get_stream_writer


def get_weather(city: str) -> str:
    """Get weather for a given city."""
    writer = get_stream_writer()
    # stream any arbitrary data
    writer(f"Looking up data for city: {city}")
    writer(f"Acquired data for city: {city}")
    return f"It's always sunny in {city}!"


agent = create_agent(
    #model="openai:gpt-5-mini",
    model=llm,
    tools=[get_weather],
)

for chunk in agent.stream(
    {"messages": [{"role": "user", "content": "What is the weather in SF?"}]},
    stream_mode=["values", "custom"],
):
    print(chunk)

('values', {'messages': [HumanMessage(content='What is the weather in SF?', additional_kwargs={}, response_metadata={}, id='40fffc1f-d534-4a43-bdf7-f8bc44ee3545')]})
('values', {'messages': [HumanMessage(content='What is the weather in SF?', additional_kwargs={}, response_metadata={}, id='40fffc1f-d534-4a43-bdf7-f8bc44ee3545'), AIMessage(content='', additional_kwargs={'tool_calls': [{'id': '1gqnzrr9j', 'function': {'arguments': '{"city":"SF"}', 'name': 'get_weather'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 14, 'prompt_tokens': 219, 'total_tokens': 233, 'completion_time': 0.040678001, 'completion_tokens_details': None, 'prompt_time': 0.016393076, 'prompt_tokens_details': None, 'queue_time': 0.005322135, 'total_time': 0.057071077}, 'model_name': 'llama-3.1-8b-instant', 'system_fingerprint': 'fp_9ca2574dca', 'service_tier': 'on_demand', 'finish_reason': 'tool_calls', 'logprobs': None, 'model_provider': 'groq'}, id='lc_run--019bc75a-e4ea-7751-b48e-16

What we're doing: Stream only `custom` tool output to observe the tool's emitted chunks.

In [9]:
for chunk in agent.stream(
    {"messages": [{"role": "user", "content": "What is the weather in SF?"}]},
    stream_mode=["custom"],
):
    print(chunk)

('custom', 'Looking up data for city: SF')
('custom', 'Acquired data for city: SF')


## Try different modes on your own!
Modify the stream modes and the filter condition to produce different results.

What we're doing: Request both `values` and `custom`, then filter the chunks so that only the `custom` tool output is printed.

In [10]:
for chunk in agent.stream(
    {"messages": [{"role": "user", "content": "What is the weather in SF?"}]},
    stream_mode=["values", "custom"],
):
    if chunk[0] == "custom":
        print(chunk[1])

Looking up data for city: SF
Acquired data for city: SF
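
When several modes are requested, each chunk is a `(mode, payload)` tuple, as the outputs above show. One way to keep the handling tidy as more modes are added is a small dispatch table keyed by mode. This is a sketch only: `route_chunks` and the handler setup are hypothetical, and the stand-in chunks use plain dicts and strings rather than real messages.

```python
from typing import Callable, Dict, Iterable, Tuple


def route_chunks(
    stream: Iterable[Tuple[str, object]],
    handlers: Dict[str, Callable[[object], None]],
) -> None:
    """Send each (mode, payload) chunk to the handler registered for its mode."""
    for mode, payload in stream:
        handler = handlers.get(mode)
        if handler is not None:  # modes without a handler are ignored
            handler(payload)


progress = []       # custom tool updates, in arrival order
latest_state = {}   # most recent full state from `values`

handlers = {
    "custom": progress.append,
    "values": latest_state.update,
}

# Stand-in chunks shaped like stream_mode=["values", "custom"] output.
fake_chunks = [
    ("values", {"messages": ["What is the weather in SF?"]}),
    ("custom", "Looking up data for city: SF"),
    ("custom", "Acquired data for city: SF"),
    ("values", {"messages": ["What is the weather in SF?", "It's always sunny in SF!"]}),
]
route_chunks(fake_chunks, handlers)
print(progress)
```

Compared with an `if chunk[0] == ...` chain, the table keeps each mode's handling in one place, and unhandled modes are simply skipped.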
