# Streaming Agent Responses

This notebook demonstrates how to stream agent responses in real time, so you can watch the agent's intermediate steps (reasoning, tool calls, and tool results) as they happen instead of waiting for the final answer.

## Key Concepts
- **Real-time Feedback**: See responses as they develop
- **Better UX**: Improved user experience for long-running tasks
- **Intermediate Steps**: Observe reasoning and tool calls
- **Early Termination**: Stop consuming the stream once you have what you need
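
Early termination works because a stream is consumed like any other Python iterator: once the loop stops, no further chunks are requested. The sketch below is a minimal illustration, not part of the original notebook. `take_until` is a hypothetical helper and `fake_stream` is a stand-in for `agent.stream(...)`.

```python
from typing import Callable, Iterable, Iterator


def take_until(stream: Iterable[dict], should_stop: Callable[[dict], bool]) -> Iterator[dict]:
    """Yield chunks, stopping right after the first one that satisfies `should_stop`."""
    for chunk in stream:
        yield chunk
        if should_stop(chunk):
            break  # stop pulling from the underlying stream


def fake_stream() -> Iterator[dict]:
    """Hypothetical stand-in for agent.stream(...): one chunk per step."""
    for step in range(1, 6):
        yield {"step": step}


seen = list(take_until(fake_stream(), lambda chunk: chunk["step"] >= 3))
print(seen)  # [{'step': 1}, {'step': 2}, {'step': 3}]
```

With a real agent, the stop condition could check for a particular tool call or a step budget.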

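## Stream Modes
- **"values"**: Emits the complete state after each step, including the messages accumulated so far
- **"updates"**: Emits only the changes each step makes to the state, keyed by the name of the step that produced them
- **"messages"**: Emits model output token by token as `(message_chunk, metadata)` tuples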
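
The difference between `"values"` and `"updates"` is easiest to see on a toy pipeline. The sketch below is a simplified model written for illustration, not LangGraph's implementation. `toy_stream`, the step names, and the state keys are all hypothetical.

```python
from typing import Callable, Iterator

# Each step receives the current state and returns only the keys it changes.
Step = Callable[[dict], dict]


def toy_stream(state: dict, steps: dict[str, Step], stream_mode: str) -> Iterator[dict]:
    """Toy model of step-wise streaming in "values" or "updates" mode."""
    if stream_mode not in ("values", "updates"):
        raise ValueError(f"unsupported mode: {stream_mode}")
    state = dict(state)
    if stream_mode == "values":
        yield dict(state)  # the initial state is emitted first
    for name, step in steps.items():
        update = step(state)
        state.update(update)
        if stream_mode == "values":
            yield dict(state)     # full snapshot after the step
        else:
            yield {name: update}  # only what this step changed


steps = {
    "model": lambda s: {"plan": "call calculate"},
    "tools": lambda s: {"result": 25 * 4 + 100},
}
values = list(toy_stream({"query": "25 * 4 + 100"}, steps, "values"))
updates = list(toy_stream({"query": "25 * 4 + 100"}, steps, "updates"))

print(values[-1])  # {'query': '25 * 4 + 100', 'plan': 'call calculate', 'result': 200}
print(updates)     # [{'model': {'plan': 'call calculate'}}, {'tools': {'result': 200}}]
```

Snapshots grow with every step, while updates stay small, which is why `"updates"` tends to suit progress displays and `"values"` suits code that needs the whole state.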

## What You Can Stream
- Model thinking/reasoning
- Tool calls being made
- Tool results/observations
- Final answers
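
Each of these items arrives as a message, and LangChain message objects expose a `type` attribute (`"human"`, `"ai"`, `"tool"`) that can be used to tell them apart. A minimal sketch follows. `describe` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins that only mimic the attributes used here.

```python
from types import SimpleNamespace


def describe(message) -> str:
    """Label a message using its `type` and, for model messages, its tool calls."""
    kind = getattr(message, "type", None)
    tool_calls = getattr(message, "tool_calls", None) or []
    if kind == "ai" and tool_calls:
        names = [call["name"] for call in tool_calls]
        return f"Tool calls: {names}"
    if kind == "ai":
        return f"Model output: {message.content[:100]}"
    if kind == "tool":
        return f"Tool result: {message.content[:100]}"
    if kind == "human":
        return f"User input: {message.content[:100]}"
    return f"Other ({kind})"


# Stand-ins for real message objects (assumption: only these attributes matter)
sample = [
    SimpleNamespace(type="human", content="calculate 25 * 4 + 100"),
    SimpleNamespace(type="ai", content="", tool_calls=[{"name": "calculate", "args": {}}]),
    SimpleNamespace(type="tool", content="25 * 4 + 100 = 200"),
    SimpleNamespace(type="ai", content="The result is 200.", tool_calls=[]),
]
labels = [describe(m) for m in sample]
for label in labels:
    print(label)
```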

## Prerequisites

Make sure you have the required packages installed:

```bash
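pip install --pre langchain langchain-community langchain-core langchain-ollama langgraph pydantic
ollama serve        # start the Ollama server (skip if it already runs as a background service)
ollama pull qwen3   # in a second terminal: download the model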
```

In [1]:
# Import required modules
from langchain_ollama import ChatOllama
from langchain.agents import create_agent
import tools  # local module (tools.py) that provides web_search and calculate
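
The `tools` module is local to this notebook and its contents are not shown. A hypothetical `tools.py` might look like the sketch below. Both function bodies are assumptions: `web_search` is a placeholder with no real backend, and `calculate` mirrors the `expression = result` format visible in the recorded output. Plain functions with type hints and docstrings are used for illustration; the original module may instead wrap them with LangChain's `@tool` decorator.

```python
import ast
import operator

# Arithmetic operators the calculator is willing to evaluate
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node: ast.AST):
    """Recursively evaluate a parsed arithmetic expression."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_evaluate(node.operand)
    raise ValueError("unsupported expression")


def calculate(expression: str) -> str:
    """Evaluate a basic arithmetic expression such as '25 * 4 + 100'."""
    result = _evaluate(ast.parse(expression, mode="eval").body)
    return f"{expression} = {result}"


def web_search(query: str) -> str:
    """Search the web for a query (placeholder: returns a canned string)."""
    return f"No search backend configured; query was: {query}"
```

Parsing with `ast` instead of calling `eval` keeps the calculator limited to arithmetic, which matters when the expression is chosen by a model.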

## Basic Streaming Setup

Let's start with a basic streaming agent configuration:

In [2]:
print("=== Streaming Agent Responses ===")

model = ChatOllama(model="qwen3")
agent = create_agent(model, tools=[tools.web_search, tools.calculate])

print("✓ Streaming agent created with tools: web_search, calculate")
print("  Ready for real-time streaming responses")

=== Streaming Agent Responses ===
✓ Streaming agent created with tools: web_search, calculate
  Ready for real-time streaming responses


In [None]:
print("=== Testing Streaming Responses ===")

# Test streaming with a multi-step query
query = "Search for the latest AI news, then calculate 25 * 4 + 100"

print(f"\nStreaming response for: '{query}'")
print("Watching agent work in real-time:")
print("-" * 50)

try:
    # Stream the agent's execution; "values" mode yields the full state after each step
    for chunk in agent.stream({"messages": query}, stream_mode="values"):
        messages = chunk.get("messages")
        if not messages:
            continue
        latest_message = messages[-1]

        # Report tool calls first; otherwise show a preview of the text content.
        # In "values" mode the latest message can also be the user's input or a
        # tool result, so "thinking" is a loose label here.
        tool_calls = getattr(latest_message, "tool_calls", None)
        if tool_calls:
            tool_names = [tc["name"] for tc in tool_calls]
            print(f"Agent is calling tools: {tool_names}")
        elif getattr(latest_message, "content", None):
            print(f"Agent thinking: {latest_message.content[:100]}...")
    
    print("-" * 50)
    print("Streaming completed")

except Exception as e:
    print(f"Streaming demo error: {e}")
    # Fallback to regular invoke
    result = agent.invoke({"messages": query})
    print(f"Fallback response: {result['messages'][-1].content}")

print("In production, you'd see each step as it happens")

=== Testing Streaming Responses ===

Streaming response for: 'Search for the latest AI news, then calculate 25 * 4 + 100'
Watching agent work in real-time:
--------------------------------------------------
Agent thinking: Search for the latest AI news, then calculate 25 * 4 + 100...
Agent is calling tools: ['web_search', 'calculate']
Agent thinking: 25 * 4 + 100 = 200...
Agent thinking: <think>
Okay, let me process the user's query. They first asked to search for the latest AI news, wh...
--------------------------------------------------
Streaming completed
In production, you'd see each step as it happens
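
The final message in the output above begins with a `<think>` tag, so its 100-character preview shows reasoning rather than the answer. The sketch below separates the two, assuming the model wraps its reasoning in `<think>...</think>` inside the message content. The closing tag is an assumption, since the recorded output is truncated. `split_thinking` is a hypothetical helper.

```python
import re

_THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(content: str) -> tuple[str, str]:
    """Return (reasoning, answer) from text that may contain <think>...</think> blocks."""
    reasoning = "\n".join(part.strip() for part in _THINK_BLOCK.findall(content))
    answer = _THINK_BLOCK.sub("", content).strip()
    return reasoning, answer


reasoning, answer = split_thinking(
    "<think>\nAdd 100 to 25 * 4.\n</think>\n\nThe result is 200."
)
print(f"Reasoning: {reasoning}")
print(f"Answer: {answer}")
```

Text without a complete `<think>...</think>` pair is returned unchanged as the answer, so the helper is safe to call on messages from models that do not emit reasoning tags.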
