## LangGraph Open Deep Research - Supervisor-Researcher Architecture

In this notebook, we'll explore the **supervisor-researcher delegation architecture** for conducting deep research with LangGraph.

You can visit this repository to see the original application: [Open Deep Research](https://github.com/langchain-ai/open_deep_research)

<img style="max-width: 65%; height: 50%;" alt="full_diagram" src="https://github.com/user-attachments/assets/12a2371b-8be2-4219-9b48-90503eb43c69" />

Let's jump in!

## What We're Building

This implementation uses a **hierarchical delegation pattern** with six stages:

1. **User Clarification** - Optionally asks clarifying questions to understand the research scope
2. **Research Brief Generation** - Transforms user messages into a structured research brief
3. **Supervisor** - A lead researcher that analyzes the brief and delegates research tasks
4. **Parallel Researchers** - Multiple sub-agents that conduct focused research simultaneously
5. **Research Compression** - Each researcher synthesizes their findings
6. **Final Report** - All findings are combined into a comprehensive report


<img src="./detailed_agent_graph2.png" alt="Architecture Diagram" style="max-width: 50%; height: 50%;" />


This differs from a section-based approach by allowing dynamic task decomposition based on the research question, rather than predefined sections.

## Dependencies

You'll need API keys for Anthropic (for the LLM) and Tavily (for web search), plus a LangChain API key for the LangSmith tracing enabled below. We'll configure the system to use Anthropic's Claude Sonnet 4 exclusively.

In [1]:
import os
import getpass

os.environ["ANTHROPIC_API_KEY"] = getpass.getpass("Enter your Anthropic API key: ")
os.environ["TAVILY_API_KEY"] = getpass.getpass("Enter your Tavily API key: ")

In [6]:
from uuid import uuid4
# Enable LangSmith tracing
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("LangChain API Key:")
os.environ["LANGCHAIN_PROJECT"] = f"AIM - Open Deep Research - {uuid4().hex[0:8]}"

## Task 1: State Definitions

The state structure is hierarchical with three levels:

### Agent State (Top Level)
Contains the overall conversation messages, research brief, accumulated notes, and final report.

### Supervisor State (Middle Level)
Manages the research supervisor's messages, the research iteration count, and the coordination of parallel researchers.

### Researcher State (Bottom Level)
Each individual researcher has their own message history, tool call iterations, and research findings.

We also have structured outputs for tool calling:
- **ConductResearch** - Tool for supervisor to delegate research to a sub-agent
- **ResearchComplete** - Tool to signal research phase is done
- **ClarifyWithUser** - Structured output for asking clarifying questions
- **ResearchQuestion** - Structured output for the research brief

Let's import these from our library: [`open_deep_library/state.py`](open_deep_library/state.py)

In [2]:
# Import state definitions from the library
from open_deep_library.state import (
    # Main workflow states
    AgentState,           # Lines 65-72: Top-level agent state with messages, research_brief, notes, final_report
    AgentInputState,      # Lines 62-63: Input state is just messages
    
    # Supervisor states
    SupervisorState,      # Lines 74-81: Supervisor manages research delegation and iterations
    
    # Researcher states
    ResearcherState,      # Lines 83-90: Individual researcher with messages and tool iterations
    ResearcherOutputState, # Lines 92-96: Output from researcher (compressed research + raw notes)
    
    # Structured outputs for tool calling
    ConductResearch,      # Lines 15-19: Tool for delegating research to sub-agents
    ResearchComplete,     # Lines 21-22: Tool to signal research completion
    ClarifyWithUser,      # Lines 30-41: Structured output for user clarification
    ResearchQuestion,     # Lines 43-48: Structured output for research brief
)
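To make the three-level hierarchy concrete, here is a minimal, stdlib-only sketch of how such states could be declared and updated. It is not the library's code: the `*Sketch` class names, the `apply_update` helper, and any field not named in the import comments above (for example `research_iterations` and `research_topic`) are assumptions for illustration. The actual definitions live in `open_deep_library/state.py`.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class AgentStateSketch(TypedDict):
    """Top level: the whole conversation plus the research products."""
    messages: Annotated[list, operator.add]   # appended to, never overwritten
    research_brief: str
    notes: Annotated[list, operator.add]
    final_report: str


class SupervisorStateSketch(TypedDict):
    """Middle level: only what the supervisor needs to delegate (assumed fields)."""
    supervisor_messages: Annotated[list, operator.add]
    research_brief: str
    notes: Annotated[list, operator.add]
    research_iterations: int


class ResearcherStateSketch(TypedDict):
    """Bottom level: one researcher's private working context (assumed fields)."""
    researcher_messages: Annotated[list, operator.add]
    research_topic: str
    tool_call_iterations: int


def apply_update(state: dict, update: dict, schema: type) -> dict:
    """Merge a node's partial update into a state dict.

    Keys annotated with a reducer (the first Annotated metadata item) are
    combined with the existing value; all other keys are overwritten.
    """
    hints = get_type_hints(schema, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        metadata = getattr(hints[key], "__metadata__", ())
        if metadata and key in merged:
            merged[key] = metadata[0](merged[key], value)
        else:
            merged[key] = value
    return merged


agent_state = {"messages": [], "research_brief": "", "notes": ["n1"], "final_report": ""}
agent_state = apply_update(
    agent_state, {"notes": ["n2"], "research_brief": "brief"}, AgentStateSketch
)
print(agent_state["notes"])
```

Keeping each level's message list in its own state means a researcher never sees the supervisor's history, and only the fields the levels share (here `research_brief` and `notes`) cross the boundary.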

#### ❓ Question 1:

 Explain the interrelationships between the three states.  Why don't we just make a single huge state?

##### ✅ Answer:
The states are defined in a hierarchical structure:
 * Agent (top level)
 * Supervisor (middle level)
 * Researcher (bottom level)

Each agent has its own state so that it manages only the context relevant to its task and its own interactions with the LLM. This separation of concerns avoids context pollution, where actions taken by one agent confuse other agents at the same level.

## Task 2: Utility Functions and Tools

The system uses several key utilities:

### Search Tools
- **tavily_search** - Async web search with automatic summarization to stay within token limits
- Supports Anthropic native web search and Tavily API

### Reflection Tools
- **think_tool** - Allows researchers to reflect on their progress and plan next steps (ReAct pattern)

### Helper Utilities
- **get_all_tools** - Assembles the complete toolkit (search + MCP + reflection)
- **get_today_str** - Provides current date context for research
- Token limit handling utilities for graceful degradation

These are defined in [`open_deep_library/utils.py`](open_deep_library/utils.py)

In [3]:
# Import utility functions and tools from the library
from open_deep_library.utils import (
    # Search tool - Lines 43-136: Tavily search with automatic summarization
    tavily_search,
    
    # Reflection tool - Lines 219-244: Strategic thinking tool for ReAct pattern
    think_tool,
    
    # Tool assembly - Lines 569-597: Get all configured tools
    get_all_tools,
    
    # Date utility - Lines 872-879: Get formatted current date
    get_today_str,
    
    # Supporting utilities for error handling
    get_api_key_for_model,          # Lines 892-914: Get API keys from config or env
    is_token_limit_exceeded,         # Lines 665-701: Detect token limit errors
    get_model_token_limit,           # Lines 831-846: Look up model's token limit
    remove_up_to_last_ai_message,    # Lines 848-866: Truncate messages for retry
    anthropic_websearch_called,      # Lines 607-637: Detect Anthropic native search usage
    openai_websearch_called,         # Lines 639-658: Detect OpenAI native search usage
    get_notes_from_tool_calls,       # Lines 599-601: Extract notes from tool messages
)

#### ❓ Question 2:

What are the advantages and disadvantages of importing these components instead of including them in the notebook?

##### ✅ Answer:
This approach provides several benefits:
 * Better maintainability: the code is easy to read and change.
 * Code reuse: classes and functions defined this way can be reused across notebooks without duplication.
 * Code clarity: the notebook is clearer, better organized, and easier to follow.

I can't see any disadvantages to this import-based approach.

## Task 3: Configuration System

The configuration system controls:

### Research Behavior
- **allow_clarification** - Whether to ask clarifying questions before research
- **max_concurrent_research_units** - How many parallel researchers can run (default: 5)
- **max_researcher_iterations** - How many times supervisor can delegate research (default: 6)
- **max_react_tool_calls** - Tool call limit per researcher (default: 10)

### Model Configuration
- **research_model** - Model for research and supervision (we'll use Anthropic)
- **compression_model** - Model for synthesizing findings
- **final_report_model** - Model for writing the final report
- **summarization_model** - Model for summarizing web search results

### Search Configuration
- **search_api** - Which search API to use (ANTHROPIC, TAVILY, or NONE)
- **max_content_length** - Character limit before summarization

Defined in [`open_deep_library/configuration.py`](open_deep_library/configuration.py)

In [4]:
# Import configuration from the library
from open_deep_library.configuration import (
    Configuration,    # Lines 38-247: Main configuration class with all settings
    SearchAPI,        # Lines 11-17: Enum for search API options (ANTHROPIC, TAVILY, NONE)
)
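As a rough illustration of how settings like these can be resolved from a `{"configurable": {...}}` dictionary, here is a stdlib-only sketch. `ConfigurationSketch`, `SearchAPISketch`, and `from_config` are hypothetical names, and the defaults for `allow_clarification` and `search_api` are assumptions; only the three numeric defaults (5, 6, 10) come from the list above. The real `Configuration` class may resolve values differently.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import Optional


class SearchAPISketch(Enum):
    ANTHROPIC = "anthropic"
    TAVILY = "tavily"
    NONE = "none"


@dataclass
class ConfigurationSketch:
    allow_clarification: bool = True                       # assumed default
    max_concurrent_research_units: int = 5
    max_researcher_iterations: int = 6
    max_react_tool_calls: int = 10
    search_api: SearchAPISketch = SearchAPISketch.TAVILY   # assumed default

    @classmethod
    def from_config(cls, config: Optional[dict] = None) -> "ConfigurationSketch":
        """Build a configuration from a {'configurable': {...}} dict.

        Unknown keys (such as thread_id) are ignored; missing keys keep
        their defaults.
        """
        configurable = (config or {}).get("configurable", {})
        known = {f.name for f in fields(cls)}
        values = {k: v for k, v in configurable.items() if k in known}
        if "search_api" in values:
            values["search_api"] = SearchAPISketch(values["search_api"])
        return cls(**values)


cfg = ConfigurationSketch.from_config(
    {"configurable": {"max_concurrent_research_units": 1, "search_api": "tavily"}}
)
print(cfg.max_concurrent_research_units, cfg.max_researcher_iterations)
```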

## Task 4: Prompt Templates

The system uses carefully engineered prompts for each phase:

### Phase 1: Clarification
**clarify_with_user_instructions** - Analyzes if the research scope is clear or needs clarification

### Phase 2: Research Brief
**transform_messages_into_research_topic_prompt** - Converts user messages into a detailed research brief

### Phase 3: Supervisor
**lead_researcher_prompt** - System prompt for the supervisor that manages delegation strategy

### Phase 4: Researcher
**research_system_prompt** - System prompt for individual researchers conducting focused research

### Phase 5: Compression
**compress_research_system_prompt** - Prompt for synthesizing research findings without losing information

### Phase 6: Final Report
**final_report_generation_prompt** - Comprehensive prompt for writing the final report

All prompts are defined in [`open_deep_library/prompts.py`](open_deep_library/prompts.py)

In [5]:
# Import prompt templates from the library
from open_deep_library.prompts import (
    clarify_with_user_instructions,                    # Lines 3-41: Ask clarifying questions
    transform_messages_into_research_topic_prompt,     # Lines 44-77: Generate research brief
    lead_researcher_prompt,                            # Lines 79-136: Supervisor system prompt
    research_system_prompt,                            # Lines 138-183: Researcher system prompt
    compress_research_system_prompt,                   # Lines 186-222: Research compression prompt
    final_report_generation_prompt,                    # Lines 228-308: Final report generation
)

## Task 5: Node Functions - The Building Blocks

Now let's look at the node functions that make up our graph. We'll import them from the library and understand what each does.

### The Complete Research Workflow

The workflow consists of 8 key nodes spread across the main graph and two subgraphs:

1. **Main Graph Nodes:**
   - `clarify_with_user` - Entry point that checks if clarification is needed
   - `write_research_brief` - Transforms user input into structured research brief
   - `final_report_generation` - Synthesizes all research into final report

2. **Supervisor Subgraph Nodes:**
   - `supervisor` - Lead researcher that plans and delegates
   - `supervisor_tools` - Executes supervisor's tool calls (delegation, reflection)

3. **Researcher Subgraph Nodes:**
   - `researcher` - Individual researcher conducting focused research
   - `researcher_tools` - Executes researcher's tool calls (search, reflection)
   - `compress_research` - Synthesizes researcher's findings

All nodes are defined in [`open_deep_library/deep_researcher.py`](open_deep_library/deep_researcher.py)

### Node 1: clarify_with_user

**Purpose:** Analyzes user messages and asks clarifying questions if the research scope is unclear.

**Key Steps:**
1. Check if clarification is enabled in configuration
2. Use structured output to analyze if clarification is needed
3. If needed, end with a clarifying question for the user
4. If not needed, proceed to research brief with verification message

**Implementation:** [`open_deep_library/deep_researcher.py` lines 60-115](open_deep_library/deep_researcher.py#L60-L115)

In [7]:
# Import the clarify_with_user node
from open_deep_library.deep_researcher import clarify_with_user
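The key steps above amount to a small routing decision. The sketch below shows one way to express it in plain Python. `ClarifyDecision`, its fields, and `route_after_clarification` are hypothetical stand-ins for the structured output and the node's control flow; in the real node the decision comes from an LLM call.

```python
from dataclasses import dataclass


@dataclass
class ClarifyDecision:
    """Hypothetical shape of the structured output returned by the model."""
    need_clarification: bool
    question: str       # asked when the scope is unclear
    verification: str   # acknowledgement sent when the scope is clear


def route_after_clarification(
    allow_clarification: bool, decision: ClarifyDecision
) -> tuple[str, str]:
    """Return (next_node, message_to_user) for the clarification step."""
    if not allow_clarification:
        # Clarification disabled in configuration: go straight to the brief
        return ("write_research_brief", "")
    if decision.need_clarification:
        # End the run with a question and wait for the user's reply
        return ("END", decision.question)
    # Scope is clear: confirm and continue
    return ("write_research_brief", decision.verification)


decision = ClarifyDecision(False, "", "I have sufficient information to proceed.")
print(route_after_clarification(True, decision))
```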

### Node 2: write_research_brief

**Purpose:** Transforms user messages into a structured research brief for the supervisor.

**Key Steps:**
1. Use structured output to generate detailed research brief from messages
2. Initialize supervisor with system prompt and research brief
3. Set up supervisor messages with proper context

**Why this matters:** A well-structured research brief helps the supervisor make better delegation decisions.

**Implementation:** [`open_deep_library/deep_researcher.py` lines 118-175](open_deep_library/deep_researcher.py#L118-L175)

In [8]:
# Import the write_research_brief node
from open_deep_library.deep_researcher import write_research_brief
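A rough sketch of steps 2 and 3, seeding the supervisor's message history with a system prompt and the brief, might look like the following. The template text, `LEAD_PROMPT_TEMPLATE_SKETCH`, and `init_supervisor_messages` are invented for illustration. The real system prompt is `lead_researcher_prompt`, and the library uses LangChain message objects rather than plain dictionaries.

```python
from datetime import date
from typing import Optional

# Hypothetical, heavily shortened stand-in for the supervisor system prompt
LEAD_PROMPT_TEMPLATE_SKETCH = (
    "You are a research supervisor. Today's date is {date}. "
    "Delegate at most {max_concurrent_research_units} research tasks at a time."
)


def init_supervisor_messages(
    research_brief: str,
    max_concurrent_research_units: int = 5,
    today: Optional[date] = None,
) -> list[dict]:
    """Build the supervisor's starting context: system prompt, then the brief."""
    today = today or date.today()
    system_prompt = LEAD_PROMPT_TEMPLATE_SKETCH.format(
        date=today.isoformat(),
        max_concurrent_research_units=max_concurrent_research_units,
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": research_brief},
    ]


messages = init_supervisor_messages("Analyze how people use ChatGPT.", 1)
print(messages[0]["content"])
```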

### Node 3: supervisor

**Purpose:** Lead research supervisor that plans research strategy and delegates to sub-researchers.

**Key Steps:**
1. Configure model with three tools:
   - `ConductResearch` - Delegate research to a sub-agent
   - `ResearchComplete` - Signal that research is done
   - `think_tool` - Strategic reflection before decisions
2. Generate response based on current context
3. Increment research iteration count
4. Proceed to tool execution

**Decision Making:** The supervisor uses `think_tool` to reflect before delegating research, ensuring thoughtful decomposition of the research question.

**Implementation:** [`open_deep_library/deep_researcher.py` lines 178-223](open_deep_library/deep_researcher.py#L178-L223)

In [9]:
# Import the supervisor node (from supervisor subgraph)
from open_deep_library.deep_researcher import supervisor

### Node 4: supervisor_tools

**Purpose:** Executes the supervisor's tool calls, including strategic thinking and research delegation.

**Key Steps:**
1. Check exit conditions:
   - Exceeded maximum iterations
   - No tool calls made
   - `ResearchComplete` called
2. Process `think_tool` calls for strategic reflection
3. Execute `ConductResearch` calls in parallel:
   - Spawn researcher subgraphs for each delegation
   - Limit to `max_concurrent_research_units` (default: 5)
   - Gather all results asynchronously
4. Aggregate findings and return to supervisor

**Parallel Execution:** This is where the magic happens - multiple researchers work simultaneously on different aspects of the research question.

**Implementation:** [`open_deep_library/deep_researcher.py` lines 225-349](open_deep_library/deep_researcher.py#L225-L349)

In [10]:
# Import the supervisor_tools node
from open_deep_library.deep_researcher import supervisor_tools
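The parallel delegation in step 3 can be sketched with `asyncio.gather`. Here `fake_researcher` stands in for invoking a researcher subgraph, and `delegate_research` is a hypothetical helper. Reporting the overflow topics back as skipped is an assumption; the text above only states that delegations are limited to `max_concurrent_research_units`.

```python
import asyncio


async def fake_researcher(topic: str) -> dict:
    """Stand-in for running one researcher subgraph on a topic."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {"topic": topic, "compressed_research": f"findings about {topic}"}


async def delegate_research(
    topics: list[str], max_concurrent_research_units: int = 5
) -> tuple[list[dict], list[str]]:
    """Run up to the allowed number of researchers concurrently."""
    allowed = topics[:max_concurrent_research_units]
    overflow = topics[max_concurrent_research_units:]

    # gather() runs the coroutines concurrently and preserves input order
    results = await asyncio.gather(*(fake_researcher(t) for t in allowed))

    skipped = [
        f"Skipped '{t}': exceeds the limit of "
        f"{max_concurrent_research_units} parallel research units."
        for t in overflow
    ]
    return list(results), skipped


results, skipped = asyncio.run(
    delegate_research(["usage trends", "demographics", "startup ideas"], 2)
)
print([r["topic"] for r in results], len(skipped))
```

Because the researchers are awaited together, total latency is roughly that of the slowest researcher rather than the sum of all of them.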

### Node 5: researcher

**Purpose:** Individual researcher that conducts focused research on a specific topic.

**Key Steps:**
1. Load all available tools (search, MCP, reflection)
2. Configure model with tools and researcher system prompt
3. Generate response with tool calls
4. Increment tool call iteration count

**ReAct Pattern:** Researchers use `think_tool` to reflect after each search, deciding whether to continue or provide their answer.

**Available Tools:**
- Search tools (Tavily or Anthropic native search)
- `think_tool` for strategic reflection
- `ResearchComplete` to signal completion
- MCP tools (if configured)

**Implementation:** [`open_deep_library/deep_researcher.py` lines 365-424](open_deep_library/deep_researcher.py#L365-L424)

In [11]:
# Import the researcher node (from researcher subgraph)
from open_deep_library.deep_researcher import researcher

### Node 6: researcher_tools

**Purpose:** Executes the researcher's tool calls, including searches and strategic reflection.

**Key Steps:**
1. Check early exit conditions (no tool calls, native search used)
2. Execute all tool calls in parallel:
   - Search tools fetch and summarize web content
   - `think_tool` records strategic reflections
   - MCP tools execute external integrations
3. Check late exit conditions:
   - Exceeded `max_react_tool_calls` (default: 10)
   - `ResearchComplete` called
4. Continue research loop or proceed to compression

**Error Handling:** Safely handles tool execution errors and continues with available results.

**Implementation:** [`open_deep_library/deep_researcher.py` lines 435-509](open_deep_library/deep_researcher.py#L435-L509)

In [12]:
# Import the researcher_tools node
from open_deep_library.deep_researcher import researcher_tools
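The exit conditions above decide whether the researcher loops again or hands off to compression. A minimal sketch of that routing follows. `route_researcher` is a hypothetical function; it leaves out the native-search check and treats reaching the limit as exhausting the budget, which are simplifying assumptions.

```python
def route_researcher(
    tool_names: list[str],
    tool_call_iterations: int,
    max_react_tool_calls: int = 10,
) -> str:
    """Pick the next node after the researcher's latest response.

    tool_names: names of the tools the model asked to call this turn.
    tool_call_iterations: how many researcher turns have run so far.
    """
    # Early exit: no tool calls means the researcher has nothing more to do
    if not tool_names:
        return "compress_research"

    # (Tool execution would happen here in the real node.)

    # Late exits: budget used up, or the model signalled completion
    if tool_call_iterations >= max_react_tool_calls:
        return "compress_research"
    if "ResearchComplete" in tool_names:
        return "compress_research"

    # Otherwise continue the ReAct loop
    return "researcher"


print(route_researcher(["tavily_search", "think_tool"], 1, 3))
```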

### Node 7: compress_research

**Purpose:** Compresses and synthesizes research findings into a concise, structured summary.

**Key Steps:**
1. Configure compression model
2. Add compression instruction to messages
3. Attempt compression with retry logic:
   - If token limit exceeded, remove older messages
   - Retry up to 3 times
4. Extract raw notes from tool and AI messages
5. Return compressed research and raw notes

**Why Compression?** Researchers may accumulate lots of tool outputs and reflections. Compression ensures:
- All important information is preserved
- Redundant information is deduplicated
- Content stays within token limits for the final report

**Token Limit Handling:** Gracefully handles token limit errors by progressively truncating messages.

**Implementation:** [`open_deep_library/deep_researcher.py` lines 511-585](open_deep_library/deep_researcher.py#L511-L585)

In [13]:
# Import the compress_research node
from open_deep_library.deep_researcher import compress_research
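The retry logic in step 3 can be sketched as a loop that shortens the history whenever the model reports a token limit error. Everything here is a stand-in: `TokenLimitError` and `fake_compress` simulate a model with a tiny context window, and dropping the single oldest message per retry is an assumption about how "remove older messages" is carried out.

```python
class TokenLimitError(Exception):
    """Hypothetical stand-in for a provider's context-length error."""


def fake_compress(messages: list[str], max_messages: int = 3) -> str:
    """Pretend model call that fails when given too much context."""
    if len(messages) > max_messages:
        raise TokenLimitError(f"{len(messages)} messages exceed the limit")
    return " | ".join(messages)


def compress_with_retry(messages: list[str], max_attempts: int = 3) -> str:
    """Try to compress; on a token limit error drop the oldest message and retry."""
    current = list(messages)
    for _ in range(max_attempts):
        try:
            return fake_compress(current)
        except TokenLimitError:
            current = current[1:]  # assumption: trim from the oldest end
    return "Error synthesizing research report: maximum retries exceeded"


print(compress_with_retry(["m1", "m2", "m3", "m4", "m5"]))
```

With five messages and a limit of three, the first two attempts fail and the third succeeds on the three most recent messages. With seven messages, all three attempts fail and the error string is returned instead.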

### Node 8: final_report_generation

**Purpose:** Generates the final comprehensive research report from all collected findings.

**Key Steps:**
1. Extract all notes from completed research
2. Configure final report model
3. Attempt report generation with retry logic:
   - If token limit exceeded, truncate findings by 10%
   - Retry up to 3 times
4. Return final report or error message

**Token Limit Strategy:**
- First retry: Use model's token limit × 4 as character limit
- Subsequent retries: Reduce by 10% each time
- Graceful degradation with helpful error messages

**Report Quality:** The prompt guides the model to create well-structured reports with:
- Proper headings and sections
- Inline citations
- Comprehensive coverage of all findings
- Sources section at the end

**Implementation:** [`open_deep_library/deep_researcher.py` lines 607-697](open_deep_library/deep_researcher.py#L607-L697)

In [14]:
# Import the final_report_generation node
from open_deep_library.deep_researcher import final_report_generation
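The token limit strategy above is simple arithmetic, sketched below. `truncation_schedule` and `truncate_findings` are hypothetical helpers. The factor of 4 comes from the strategy described above and presumably reflects the common rule of thumb of about four characters per token. Integer arithmetic is used for the 10% reduction to avoid floating-point rounding.

```python
def truncation_schedule(model_token_limit: int, max_retries: int = 3) -> list[int]:
    """Character limits to apply to the findings on successive retries.

    The first limit is the token limit times 4; each later limit is
    90% of the previous one.
    """
    limits = []
    limit = model_token_limit * 4
    for _ in range(max_retries):
        limits.append(limit)
        limit = limit * 9 // 10  # reduce by 10%
    return limits


def truncate_findings(findings: str, char_limit: int) -> str:
    """Keep only the first char_limit characters of the findings."""
    return findings[:char_limit]


# Example with an assumed 200,000-token context window
print(truncation_schedule(200_000))  # [800000, 720000, 648000]
```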

## Task 6: Graph Construction - Putting It All Together

The system is organized into three interconnected graphs:

### 1. Researcher Subgraph (Bottom Level)
Handles individual focused research on a specific topic:
```
START → researcher → researcher_tools → compress_research → END
               ↑            ↓
               └────────────┘ (loops until max iterations or ResearchComplete)
```

### 2. Supervisor Subgraph (Middle Level)
Manages research delegation and coordination:
```
START → supervisor → supervisor_tools → END
            ↑              ↓
            └──────────────┘ (loops until max iterations or ResearchComplete)
            
supervisor_tools spawns multiple researcher_subgraphs in parallel
```

### 3. Main Deep Researcher Graph (Top Level)
Orchestrates the complete research workflow:
```
START → clarify_with_user → write_research_brief → research_supervisor → final_report_generation → END
                 ↓                                       (supervisor_subgraph)
               (may end early if clarification needed)
```

Let's import the compiled graphs from the library.

In [15]:
# Import the pre-compiled graphs from the library
from open_deep_library.deep_researcher import (
    # Bottom level: Individual researcher workflow
    researcher_subgraph,    # Lines 588-605: researcher → researcher_tools → compress_research
    
    # Middle level: Supervisor coordination
    supervisor_subgraph,    # Lines 351-363: supervisor → supervisor_tools (spawns researchers)
    
    # Top level: Complete research workflow
    deep_researcher,        # Lines 699-719: Main graph with all phases
)

## Why This Architecture?

### Advantages of Supervisor-Researcher Delegation

1. **Dynamic Task Decomposition**
   - Unlike section-based approaches with predefined structure, the supervisor can break down research based on the actual question
   - Adapts to different types of research (comparisons, lists, deep dives, etc.)

2. **Parallel Execution**
   - Multiple researchers work simultaneously on different aspects
   - Much faster than sequential section processing
   - Configurable parallelism (1-20 concurrent researchers)

3. **ReAct Pattern for Quality**
   - Researchers use `think_tool` to reflect after each search
   - Prevents excessive searching and improves search quality
   - Natural stopping conditions based on information sufficiency

4. **Flexible Tool Integration**
   - Easy to add MCP tools for specialized research
   - Supports multiple search APIs (Anthropic, Tavily)
   - Each researcher can use different tool combinations

5. **Graceful Token Limit Handling**
   - Compression prevents token overflow
   - Progressive truncation in final report generation
   - Research can scale to arbitrary depths

### Trade-offs

- **Complexity:** More moving parts than a section-based approach
- **Cost:** Parallel researchers use more tokens (but finish sooner)
- **Unpredictability:** Research structure emerges dynamically

## Task 7: Running the Deep Researcher

Now let's see the system in action! We'll use it to analyze a PDF document about how people use AI.

### Setup

We need to:
1. Load the PDF document
2. Configure the execution with Anthropic settings
3. Run the research workflow

In [16]:
# Load the PDF document
from pathlib import Path
import PyPDF2

def load_pdf(pdf_path: str) -> str:
    """Load and extract text from PDF."""
    pdf_text = ""
    with open(pdf_path, 'rb') as file:
        pdf_reader = PyPDF2.PdfReader(file)
        for page in pdf_reader.pages:
            # Guard against pages with no extractable text (e.g. scanned images)
            pdf_text += (page.extract_text() or "") + "\n\n"
    return pdf_text

# Load the PDF about how people use AI
pdf_path = "data/howpeopleuseai.pdf"
pdf_content = load_pdf(pdf_path)

print(f"Loaded PDF with {len(pdf_content)} characters")
print(f"First 500 characters:\n{pdf_content[:500]}...")

Loaded PDF with 112460 characters
First 500 characters:
NBER WORKING PAPER SERIES
HOW PEOPLE USE CHATGPT
Aaron Chatterji
Thomas Cunningham
David J. Deming
Zoe Hitzig
Christopher Ong
Carl Yan Shan
Kevin Wadman
Working Paper 34255
http://www.nber.org/papers/w34255
NATIONAL BUREAU OF ECONOMIC RESEARCH
1050 Massachusetts Avenue
Cambridge, MA 02138
September 2025
We acknowledge help and comments from Joshua Achiam, Hemanth Asirvatham, Ryan 
Beiermeister,  Rachel Brown, Cassandra Duchan Solis, Jason Kwon, Elliott Mokski, Kevin Rao, 
Harrison Satcher,  Gawe...


In [17]:
# Set up the graph with Anthropic configuration
from IPython.display import Markdown, display
import uuid

# Note: deep_researcher is already compiled from the library
# For this demo, we'll use it directly without additional checkpointing
graph = deep_researcher

print("✓ Graph ready for execution")
print("  (Note: The graph is pre-compiled from the library)")

✓ Graph ready for execution
  (Note: The graph is pre-compiled from the library)


### Configuration for Anthropic

We'll configure the system to use:
- **Claude Sonnet 4** for all research, supervision, and report generation
- **Tavily** for web search (you can also use Anthropic's native search)
- **Conservative limits** (1 concurrent researcher, 2 supervisor iterations, 3 tool calls per researcher)
- **Clarification enabled** (will ask if research scope is unclear)

In [18]:
# Configure for Anthropic with moderate settings
config = {
    "configurable": {
        # Model configuration - using Claude Sonnet 4 for everything
        "research_model": "anthropic:claude-sonnet-4-20250514",
        "research_model_max_tokens": 10000,
        
        "compression_model": "anthropic:claude-sonnet-4-20250514",
        "compression_model_max_tokens": 8192,
        
        "final_report_model": "anthropic:claude-sonnet-4-20250514",
        "final_report_model_max_tokens": 10000,
        
        "summarization_model": "anthropic:claude-sonnet-4-20250514",
        "summarization_model_max_tokens": 8192,
        
        # Research behavior
        "allow_clarification": True,
        "max_concurrent_research_units": 1,  # Only 1 researcher runs at a time
        "max_researcher_iterations": 2,      # Supervisor can delegate up to 2 times
        "max_react_tool_calls": 3,           # Each researcher can make up to 3 tool calls
        
        # Search configuration
        "search_api": "tavily",  # Using Tavily for web search
        "max_content_length": 50000,
        
        # Thread ID for this conversation
        "thread_id": str(uuid.uuid4())
    }
}

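print("✓ Configuration ready")
print("  - Research Model: Claude Sonnet 4")
# Read the values back from config so the summary cannot drift from the settings
print(f"  - Max Concurrent Researchers: {config['configurable']['max_concurrent_research_units']}")
print(f"  - Max Iterations: {config['configurable']['max_researcher_iterations']}")
print("  - Search API: Tavily")

✓ Configuration ready
  - Research Model: Claude Sonnet 4
  - Max Concurrent Researchers: 1
  - Max Iterations: 2
  - Search API: Tavily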


### Execute the Research

Now let's run the research! We'll ask the system to analyze the PDF and provide insights about how people use AI.

The workflow will:
1. **Clarify** - Check if the request is clear (may skip if obvious)
2. **Research Brief** - Transform our request into a structured brief
3. **Supervisor** - Plan research strategy and delegate to researchers
4. **Parallel Research** - Multiple researchers gather information simultaneously
5. **Compression** - Each researcher synthesizes their findings
6. **Final Report** - All findings combined into comprehensive report

In [20]:
# Create our research request with PDF context
# Only the first 10k characters of the PDF are included, to stay within limits
pdf_excerpt = pdf_content[:10000]

research_request = f"""
I have a PDF document about how people use AI. Please analyze this document and provide insights about:

1. What are the main findings about how people are using AI?
2. What are the most common use cases?
3. What trends or patterns emerge from the data?
4. Which opportunities are there for practical AI applications that can be explored by a startup?

Here's the PDF content:

{pdf_excerpt}

...[content truncated for context window]
"""

# Execute the graph
async def run_research():
    """Run the research workflow and display results."""
    print("Starting research workflow...\n")
    
    async for event in graph.astream(
        {"messages": [{"role": "user", "content": research_request}]},
        config,
        stream_mode="updates"
    ):
        # Display each step
        for node_name, node_output in event.items():
            print(f"\n{'='*60}")
            print(f"Node: {node_name}")
            print(f"{'='*60}")
            
            if node_name == "clarify_with_user":
                if "messages" in node_output:
                    last_msg = node_output["messages"][-1]
                    print(f"\n{last_msg.content}")
            
            elif node_name == "write_research_brief":
                if "research_brief" in node_output:
                    print(f"\nResearch Brief Generated:")
                    print(f"{node_output['research_brief'][:500]}...")
            
            elif node_name == "supervisor":
                print(f"\nSupervisor planning research strategy...")
                if "supervisor_messages" in node_output:
                    last_msg = node_output["supervisor_messages"][-1]
                    if hasattr(last_msg, 'tool_calls') and last_msg.tool_calls:
                        print(f"Tool calls: {len(last_msg.tool_calls)}")
                        for tc in last_msg.tool_calls:
                            print(f"  - {tc['name']}")
            
            elif node_name == "supervisor_tools":
                print(f"\nExecuting supervisor's tool calls...")
                if "notes" in node_output:
                    print(f"Research notes collected: {len(node_output['notes'])}")
            
            elif node_name == "final_report_generation":
                if "final_report" in node_output:
                    print(f"\n" + "="*60)
                    print("FINAL REPORT GENERATED")
                    print("="*60 + "\n")
                    display(Markdown(node_output["final_report"]))
    
    print("\n" + "="*60)
    print("Research workflow completed!")
    print("="*60)

# Run the research
await run_research()

Starting research workflow...


Node: clarify_with_user

I have sufficient information to proceed with your analysis. You've provided a comprehensive PDF document (NBER Working Paper) titled "How People Use ChatGPT" and requested specific insights about:

1. Main findings about AI usage patterns
2. Most common use cases  
3. Trends and patterns from the data
4. Startup opportunities for practical AI applications

The document contains detailed research data about ChatGPT usage from November 2022 through July 2025, including user demographics, work vs. non-work usage patterns, conversation topics, and growth statistics. I will now analyze this document thoroughly to provide you with actionable insights across all four areas you've requested.

Node: write_research_brief

Research Brief Generated:
I need a comprehensive analysis of the NBER Working Paper "How People Use ChatGPT" (Working Paper No. 34255, September 2025) by Aaron Chatterji et al. Please provide detailed insights on: (1) Th

# Comprehensive Analysis of NBER Working Paper "How People Use ChatGPT"

## Main Findings About AI Usage Patterns

The NBER Working Paper No. 34255 "How People Use ChatGPT" represents the first comprehensive academic study analyzing ChatGPT usage patterns from its November 2022 launch through July 2025. The research reveals unprecedented adoption velocity and significant demographic shifts that fundamentally challenge assumptions about AI usage patterns.

### Growth Trajectory from November 2022 to July 2025

ChatGPT's growth trajectory represents an unprecedented rate of technology adoption in human history. By July 2025, the platform had been adopted by approximately 10% of the world's adult population, representing roughly 700 million weekly active users sending 18 billion messages per week [1][2]. The platform reached critical milestones at extraordinary speed: 1 million users within just 5 days of launch in December 2022, 100 million weekly active users by November 2023, and weekly active users doubling every 7-8 months thereafter [7].

The message volume statistics are particularly striking. By June 2025, users were sending more than 2.6 billion messages daily, equivalent to over 30,000 messages per second [7]. This represents a 5.8x increase in message volume within just the last year of the study period [7]. To contextualize this growth, the researchers note that ChatGPT reached 1 billion messages in December 2024, less than two years after release, while Google Search took eight years to reach 1 billion daily searches after its 1999 public launch [7].

### Demographic Shifts in Adoption

The study documents a dramatic transformation in user demographics over the study period. Early adopters were overwhelmingly male, with over 80% male users at launch [7]. However, this gender gap has essentially closed by July 2025, with 52% of active users having typically female names, representing a shift from just 37% feminine names in January 2024 [4][7]. This rapid demographic balancing suggests that initial barriers to female adoption were overcome as the technology matured and use cases diversified.

Age distribution shows that nearly half of all adult messages come from users under 26, indicating strong adoption among younger demographics [2][6]. The user base has become increasingly representative of the global population rather than reflecting the typical early-adopter demographics of previous technology launches.

### Evolution from 53% to 70% Non-Work Usage

One of the most significant findings challenges the prevailing narrative about AI as primarily a workplace productivity tool. The research documents that non-work-related messages have grown from 53% in mid-2024 to over 70% by mid-2025, while work-related usage has remained relatively stable in absolute terms but declined as a percentage of total usage [1][2][3][4]. This shift represents both faster growth in non-work applications and changing usage patterns within existing user cohorts rather than simply compositional changes from new users.

The implications of this finding are profound for economic analysis. While most AI research has focused on workplace productivity impacts, the study suggests that consumer surplus and home production effects may be equally or more significant. This aligns with research by Collis and Brynjolfsson (2025), who estimate consumer surplus of at least $97 billion in 2024 alone in the US from generative AI [1].

### Geographic Distribution Across Income Levels

The study reveals surprising patterns in global adoption that contradict expectations about digital divide effects. Higher growth rates are consistently observed in lower-income countries, suggesting a democratization effect rather than exacerbation of digital inequality [1][3][4]. By May 2025, adoption growth rates in the lowest income countries were over 4 times those in the highest income countries [4][7].

The convergence in usage rates across income levels is particularly striking. Brazil, South Korea, and the United States show similar ChatGPT usage rates despite having GDP per capita of $10,000, $34,000, and $86,000 respectively [7]. Countries at the 50th versus 90th percentile of GDP per capita now demonstrate similar usage rates, with the fastest growth occurring in middle-income countries [7].

## Most Common Use Cases and Their Analysis

The researchers employed an automated classification system using privacy-preserving methods to categorize conversations without human review of content. This methodology revealed that nearly 80% of all ChatGPT conversations fall into three dominant categories: Practical Guidance, Seeking Information, and Writing [1][2][3][4][6].

### Practical Guidance (29% of All Conversations)

Practical Guidance represents the largest single category at approximately 29% of all conversations [2][6]. This category encompasses personalized advice across diverse domains including workout planning, tutoring, teaching, and decision-making support. The tutoring subcategory alone represents 10% of all messages, indicating that educational applications constitute a major use case [2][6].

The prevalence of Practical Guidance reflects ChatGPT's unique ability to provide personalized, contextual advice that adapts to individual circumstances. Unlike traditional information retrieval systems, this category leverages the conversational interface to deliver tailored recommendations and step-by-step guidance. The educational component is particularly significant, as it suggests ChatGPT is functioning as a democratized tutoring resource accessible to users regardless of geographic location or economic circumstances.

### Seeking Information (24% of All Conversations)

The Seeking Information category has grown substantially from 14% to 24% over the study period [6][8], functioning essentially as a conversational search engine. This category includes searches for information about people, current events, products, and recipes, positioning ChatGPT as a direct substitute for traditional web search in many contexts.

The growth trajectory of this category is particularly notable because it represents a shift in how users approach information retrieval. Rather than keyword-based searches that require users to formulate precise queries, ChatGPT allows for natural language information requests that can be refined through dialogue. This conversational approach to information seeking may explain the category's rapid growth as users discover its advantages over traditional search engines.

### Writing (24% of All Conversations, 40% of Work-Related Messages)

Writing represents 24% of overall conversations but dominates work-related tasks, accounting for 40% of work-related messages as of June 2025 [2]. This category includes automated production of emails, documents, and communications, as well as editing, critiquing, summarizing, and translating user-provided text.

Significantly, about two-thirds of all Writing messages ask ChatGPT to modify, edit, or translate existing user text rather than creating entirely new content from scratch [2][8]. This finding suggests that users primarily view ChatGPT as an enhancement tool for their own writing rather than a replacement for human creativity. The emphasis on modification and editing indicates that users maintain agency over content creation while leveraging AI for refinement and improvement.

Despite its prominence in work contexts, the Writing category has declined from 36% to 24% of overall usage [8], reflecting the faster growth of other categories rather than an absolute decline in writing applications.

## Trends and Patterns Emerging from the Data

### Shift in Work vs Non-Work Usage Patterns

The most significant trend documented in the study is the accelerating growth of non-work applications relative to workplace usage. This shift represents a fundamental change in how AI technology is being integrated into daily life, moving beyond the initial focus on workplace productivity to encompass broader life enhancement applications.

The researchers found that this trend reflects changing usage patterns within existing user cohorts rather than compositional effects from new users with different preferences. All user cohorts, including early adopters from Q1 2023, showed increased usage beginning in early 2025, with early adopters sending 40% more messages per day by July 2025 compared to two years earlier [7]. This suggests that ChatGPT has become substantially more capable and user-friendly, leading existing users to expand their usage into non-work domains.

### Changes in User Demographics Over Time

Beyond the gender convergence already discussed, the study reveals broader demographic trends that challenge assumptions about AI adoption patterns. The rapid expansion beyond traditional early-adopter demographics suggests that ChatGPT has achieved mainstream acceptance across diverse population segments.

The age distribution, with nearly half of adult messages coming from users under 26, indicates strong generational adoption that may have long-term implications for how AI tools are integrated into various life domains. This younger user base is likely experimenting with novel applications and use cases that older users might not explore, potentially driving innovation in non-work applications.

### Differences in Usage by Education and Occupation Levels

Work usage correlates strongly with higher education levels and professional occupations [1][3][6]. Users in knowledge-intensive jobs are more likely to use ChatGPT for work-related tasks, particularly writing and document creation. This finding suggests that current AI tools are most readily applicable to jobs involving significant text-based work and decision-making.

However, the growth in non-work usage across all demographic segments indicates that ChatGPT's value proposition extends well beyond professional applications. The democratization effect observed across income levels suggests that educational and occupational barriers to adoption are diminishing as the technology matures and use cases diversify.

### Evolution of Conversation Topics and User Intent

The researchers developed a user intent classification system revealing that 49% of messages are "Asking" (seeking information or advice), 40% are "Doing" (performing specific tasks), and 11% are "Expressing" (social or emotional content) [2]. The prominence of "Asking" behaviors indicates that users primarily value ChatGPT as an advisory tool rather than merely for task automation.

For work-related messages specifically, "Doing" comprises 56% of usage, with most focused on writing tasks [2]. This distinction between work and non-work intent patterns suggests that professional usage emphasizes task completion while personal usage emphasizes consultation and advice-seeking.

Surprising findings include the relatively small share of certain anticipated use cases. Only 4.2% of messages are programming-related [2][6], despite widespread assumptions about AI coding applications. Similarly, just 1.9% involve relationships or personal reflection, and only 0.4% involve games or role-play [8], contradicting narratives about AI companions or entertainment applications.

## Startup Opportunities for Practical AI Applications

The comprehensive usage data from this study reveals significant market opportunities for startups developing practical AI applications. The patterns suggest both underserved market segments and emerging needs that current generative platforms may not fully address.

### Gaps in Current Usage Patterns

Despite ChatGPT's broad adoption, several usage categories remain surprisingly underrepresented, suggesting market opportunities for specialized solutions. Programming represents only 4.2% of overall usage [2][6], indicating potential for developer-focused AI tools that provide more specialized coding assistance than general-purpose chatbots. The relatively low share of programming usage, compared to 33% of work-related Claude conversations, suggests room for more targeted development tools.

The small percentage of social and emotional content (11% "Expressing" category, 1.9% relationships/personal reflection) indicates opportunities for AI applications specifically designed for mental health, relationship advice, or emotional support. Current general-purpose tools may not provide the specialized approaches needed for these sensitive applications.

Educational applications, while representing 10% of overall usage through tutoring, could be significantly expanded with purpose-built educational AI tools. The strong adoption of Practical Guidance suggests demand for more sophisticated, curriculum-aligned educational applications that go beyond basic tutoring.

### Underserved Use Cases and Market Niches

The dominance of three categories (Practical Guidance, Seeking Information, Writing) accounting for nearly 80% of usage suggests that the remaining 20% represents numerous smaller use cases that might benefit from specialized tools. These could include:

**Specialized Professional Tools**: While writing dominates work usage at 40% of work-related messages, other professional functions may be underserved. Legal document analysis, medical decision support, financial planning, and technical consultation represent potential niches for industry-specific AI applications.

**Enhanced Decision Support Systems**: The research emphasizes ChatGPT's primary value as a decision-support tool, particularly for knowledge-intensive jobs [1][6]. Startups could develop more sophisticated decision support systems for specific industries or decision types, incorporating domain-specific data, regulatory requirements, or decision frameworks.

**Localized and Cultural Applications**: The global adoption patterns suggest opportunities for AI tools that better serve specific cultural contexts, languages, or regional needs. While ChatGPT shows broad international adoption, specialized tools addressing local customs, languages, or regulatory environments could capture specific market segments.

### Potential Market Opportunities Based on Demographic and Usage Data

The demographic evolution documented in the study reveals several market opportunities:

**Elderly User Applications**: With nearly half of adult messages coming from users under 26, there appears to be significant opportunity for AI tools specifically designed for older adults. These might emphasize ease of use, larger interfaces, voice interaction, or content relevant to retirement, health management, or family coordination.

**Educational Technology for Developing Markets**: The higher growth rates in lower-income countries suggest strong demand for educational AI tools in emerging markets. Startups could develop specialized educational applications optimized for mobile devices, low-bandwidth environments, or local educational systems.

**Small Business Productivity Tools**: The correlation between education/occupation levels and work usage suggests that less-educated workers or small business owners may be underserved by current AI tools. Simplified, task-specific AI applications for trades, retail, or service businesses could address this gap.

**Integration and Workflow Tools**: The growth in usage intensity among existing users suggests demand for AI tools that integrate more seamlessly with existing workflows and applications rather than requiring separate interactions with general-purpose chatbots.

### Areas of Unmet Demand and Emerging Needs

The study data suggests several areas where current AI tools may not fully meet user needs:

**Advanced Writing Collaboration**: While two-thirds of writing tasks involve modifying existing text rather than creating new content, current tools may not optimize for collaborative writing workflows. Specialized writing tools that better integrate with existing document systems or provide more sophisticated editing capabilities could serve this need.

**Complex Decision Support**: The prominence of Practical Guidance and the emphasis on decision support suggest demand for more sophisticated advisory tools that can handle complex, multi-factor decisions with higher stakes than general chatbots typically address.

**Privacy-Focused Applications**: The study's emphasis on privacy-preserving analysis methods suggests user concern about data privacy. Startups developing AI tools with enhanced privacy protections, local processing, or industry-specific compliance requirements could capture privacy-conscious market segments.

**Vertical Integration Opportunities**: The broad usage patterns suggest opportunities for AI tools that integrate multiple functions (information seeking, writing, guidance) within specific domains such as healthcare, finance, education, or legal services.

The research methodology itself, using automated classification and privacy-preserving analysis, suggests opportunities for startups developing AI analytics tools that help organizations understand their own AI usage patterns while maintaining privacy and compliance requirements.

### Sources

[1] How People Use ChatGPT | NBER: https://www.nber.org/papers/w34255
[2] [PDF] How People Use ChatGPT - National Bureau of Economic Research: https://www.nber.org/system/files/working_papers/w34255/w34255.pdf
[3] How People Use ChatGPT - SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5487080
[4] How people are using ChatGPT | OpenAI: https://openai.com/index/how-people-are-using-chatgpt/
[5] How People Actually Use ChatGPT — What 1.5M Conversations Tell Us About the Next Decade of Software: https://medium.com/@adnanmasood/how-people-actually-use-chatgpt-what-1-5m-conversations-tell-us-about-the-next-decade-of-software-ea603212b458
[6] How People Really Use ChatGPT: Findings from NBER Research: https://techmaniacs.com/2025/09/15/how-people-really-use-chatgpt-findings-from-nber-research/
[7] How People Use ChatGPT - by David Deming - Forked Lightning: https://forklightning.substack.com/p/how-people-use-chatgpt
[8] What Over 2.5 Billion Daily Messages Reveal About How People Use ChatGPT: https://c3.unu.edu/blog/what-over-2-5-billion-daily-messages-reveal-about-how-people-use-chatgpt


Research workflow completed!


## Understanding the Output

Let's break down what happened:

### Phase 1: Clarification
The system checked if your request was clear. Since you provided a PDF and specific questions, it likely proceeded without clarification.

### Phase 2: Research Brief
Your request was transformed into a detailed research brief that guides the supervisor's delegation strategy.

### Phase 3: Supervisor Delegation
The supervisor analyzed the brief and decided how to break down the research:
- Used `think_tool` to plan strategy
- Called `ConductResearch` multiple times to delegate to parallel researchers
- Each delegation specified a focused research topic
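
To make the fan-out concrete, here is a minimal sketch of delegation with a cap on how many researchers run at once. It uses only `asyncio`. The names `run_researcher` and `delegate` are hypothetical stand-ins for illustration, not the notebook's actual supervisor or `ConductResearch` code, and `max_concurrent` simply plays the role of a concurrency setting such as `max_concurrent_research_units`.

```python
import asyncio


async def run_researcher(topic: str) -> str:
    """Hypothetical stand-in for one sub-agent; a real one would search and compress."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"Findings on {topic}"


async def delegate(topics: list[str], max_concurrent: int = 3) -> list[str]:
    """Run one researcher per topic, with at most max_concurrent active at once."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(topic: str) -> str:
        async with semaphore:
            return await run_researcher(topic)

    # gather preserves input order, so findings line up with topics
    return await asyncio.gather(*(bounded(t) for t in topics))


findings = asyncio.run(delegate(["adoption growth", "use cases", "startup gaps"]))
```

Inside a notebook cell an event loop is already running, so there you would write `findings = await delegate([...])` instead of calling `asyncio.run`.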

### Phase 4: Parallel Research
Multiple researchers worked simultaneously:
- Each researcher used web search tools to gather information
- Used `think_tool` to reflect after each search
- Decided when they had enough information
- Compressed their findings into clean summaries
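
The search, reflect, and stop cycle described above can be sketched as a simple loop with a tool-call budget. This is an illustrative sketch under stated assumptions: `research_loop`, `fake_search`, and `enough_after_two` are hypothetical names, the "compression" here is a plain string join, and the real researchers rely on an LLM for both reflection and summarization.

```python
from typing import Callable


def research_loop(
    topic: str,
    search: Callable[[str], str],
    is_sufficient: Callable[[list[str]], bool],
    max_tool_calls: int = 5,
) -> str:
    """Search, reflect, and stop when the notes suffice or the budget runs out."""
    notes: list[str] = []
    for call_number in range(1, max_tool_calls + 1):
        notes.append(search(f"{topic} (query {call_number})"))
        # Reflection step: decide whether to keep searching
        if is_sufficient(notes):
            break
    # Compression step: a plain join here; the real system summarizes with an LLM
    return " | ".join(notes)


# Toy stand-ins so the sketch runs without any API keys
def fake_search(query: str) -> str:
    return f"result for {query}"


def enough_after_two(notes: list[str]) -> bool:
    return len(notes) >= 2


summary = research_loop("ChatGPT adoption", fake_search, enough_after_two)
```

The `max_tool_calls` argument mirrors the idea behind `max_react_tool_calls`: even if the reflection step never says "enough", the loop still terminates.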

### Phase 5: Final Report
All research findings were synthesized into a comprehensive report with:
- Well-structured sections
- Inline citations
- Sources listed at the end
- Balanced coverage of all findings
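
In the real workflow an LLM writes the report, but the citation bookkeeping behind "inline citations plus a source list" is easy to picture. The sketch below is a hypothetical helper (`number_sources` is not part of the notebook, and the URLs are placeholders) that gives each unique source one number in order of first appearance.

```python
def number_sources(findings: list[tuple[str, str]]) -> tuple[str, list[str]]:
    """Attach [n] markers to claims and build a deduplicated source list."""
    numbers: dict[str, int] = {}
    body_lines = []
    for claim, url in findings:
        # Reuse the existing number if this URL was already cited
        n = numbers.setdefault(url, len(numbers) + 1)
        body_lines.append(f"{claim} [{n}]")
    sources = [f"[{n}] {url}" for url, n in numbers.items()]
    return "\n".join(body_lines), sources


example_findings = [
    ("Claim A", "https://example.com/a"),
    ("Claim B", "https://example.com/b"),
    ("Claim C", "https://example.com/a"),  # same source as Claim A
]
body, sources = number_sources(example_findings)
```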

#### 🏗️ Activity #1: Try Different Configurations

You can experiment with different settings to see how they affect the research. Select three or more of the following settings (or invent your own experiments) and describe the results.

### Increase Parallelism
```python
"max_concurrent_research_units": 10  # More researchers working simultaneously
```

### Deeper Research
```python
"max_researcher_iterations": 8,  # Supervisor can delegate more times
"max_react_tool_calls": 15,      # Each researcher can search more
```

### Use Anthropic Native Search
```python
"search_api": "anthropic"  # Use Claude's built-in web search
```

### Disable Clarification
```python
"allow_clarification": False  # Skip clarification phase
```
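
If you want to run several of these experiments side by side, one option is to build each variant from a shared baseline rather than editing the config by hand. The helper below is a sketch, not part of the notebook: `with_overrides` is a hypothetical name, the baseline values are placeholders, and it assumes the settings live under a `"configurable"` key in the config dict you pass when invoking the graph.

```python
import copy


def with_overrides(base_config: dict, **overrides) -> dict:
    """Return a copy of base_config with settings replaced under "configurable"."""
    new_config = copy.deepcopy(base_config)  # leave the baseline untouched
    new_config.setdefault("configurable", {}).update(overrides)
    return new_config


# Placeholder baseline; substitute the config you actually run with
base_config = {
    "configurable": {
        "max_concurrent_research_units": 5,
        "allow_clarification": True,
    }
}

experiment = with_overrides(
    base_config,
    max_concurrent_research_units=10,
    allow_clarification=False,
)
```

Because the baseline is deep-copied, you can create as many variants as you like and compare their results without one experiment leaking into another.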

## Key Takeaways

### Architecture Benefits
1. **Dynamic Decomposition** - Research structure emerges from the question rather than being predefined
2. **Parallel Efficiency** - Multiple researchers work simultaneously
3. **ReAct Quality** - Strategic reflection improves search decisions
4. **Scalability** - Handles token limits gracefully through compression
5. **Flexibility** - Easy to add new tools and capabilities

### When to Use This Pattern
- **Complex research questions** that need multi-angle investigation
- **Comparison tasks** where parallel research on different topics is beneficial
- **Open-ended exploration** where structure should emerge dynamically
- **Time-sensitive research** where parallel execution speeds up results

### When to Use Section-Based Instead
- **Highly structured reports** with predefined format requirements
- **Template-based content** where sections are always the same
- **Sequential dependencies** where later sections depend on earlier ones
- **Budget constraints** where token efficiency is critical

## Next Steps

### Extend the System
1. **Add MCP Tools** - Integrate specialized tools for your domain
2. **Custom Prompts** - Modify prompts for specific research types
3. **Different Models** - Try different Claude versions or mix models
4. **Persistence** - Use a real database for checkpointing instead of memory

### Learn More
- [LangGraph Documentation](https://langchain-ai.github.io/langgraph/)
- [Open Deep Research Repo](https://github.com/langchain-ai/open_deep_research)
- [Anthropic Claude Documentation](https://docs.anthropic.com/)
- [Tavily Search API](https://tavily.com/)

### Deploy
- Use LangGraph Cloud for production deployment
- Add proper error handling and logging
- Implement rate limiting and cost controls
- Monitor research quality and costs
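
As a starting point for cost controls, a simple spending guard can stop a run before it exceeds an allowance. This is a minimal sketch with hypothetical names (`CostBudget`, `BudgetExceeded`) and made-up dollar amounts; how you estimate the cost of each model or search call is left to you.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a charge would push spending past the limit."""


class CostBudget:
    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, amount_usd: float) -> None:
        """Record a cost, refusing it if the total would exceed the limit."""
        if self.spent_usd + amount_usd > self.limit_usd:
            raise BudgetExceeded(f"limit of ${self.limit_usd:.2f} reached")
        self.spent_usd += amount_usd


budget = CostBudget(limit_usd=1.00)
budget.charge(0.40)  # e.g. estimated cost of one researcher call
budget.charge(0.40)
```

A third `budget.charge(0.40)` would raise `BudgetExceeded`, which the calling code can catch to end the research early and report what it has so far.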