[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain-academy/blob/main/module-4/research-assistant.ipynb) [![Open in LangChain Academy](https://cdn.prod.website-files.com/65b8cd72835ceeacd4449a53/66e9eba12c7b7688aa3dbb5e_LCA-badge-green.svg)](https://academy.langchain.com/courses/take/intro-to-langgraph/lessons/58239974-lesson-4-research-assistant)

# Research Assistant

## Review

We've covered a few major LangGraph themes:

* Memory
* Human-in-the-loop
* Controllability

Now, we'll bring these ideas together to tackle one of AI's most popular applications: research automation. 

Research is often laborious work offloaded to analysts. AI has considerable potential to assist with this.

However, research demands customization: raw LLM outputs are often poorly suited for real-world decision-making workflows. 

Customized, AI-based [research and report generation](https://jxnl.co/writing/2024/06/05/predictions-for-the-future-of-rag/#reports-over-rag) workflows are a promising way to address this.

## Goal

Our goal is to build a lightweight, multi-agent system around chat models that customizes the research process.

`Source Selection` 
* Users can choose any set of input sources for their research.
  
`Planning` 
* Users provide a topic, and the system generates a team of AI analysts, each focusing on one sub-topic.
* `Human-in-the-loop` will be used to refine these sub-topics before research begins.
  
`LLM Utilization`
* Each analyst will conduct in-depth interviews with an expert AI using the selected sources.
* The interview will be a multi-turn conversation to extract detailed insights as shown in the [STORM](https://arxiv.org/abs/2402.14207) paper.
* These interviews will be captured using `sub-graphs`, each with its own internal state.
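
A multi-turn interview needs a stopping rule. One minimal sketch, assuming a simple turn cap and that expert replies are tagged with the name `expert` (as `answer_node` in this notebook does), is to count the expert's answers and decide whether to ask again. The `Msg` class, `route_interview`, and `max_turns` are hypothetical names used for illustration only; they are not LangChain or LangGraph APIs.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    # Hypothetical stand-in for a chat message; not a LangChain class.
    content: str
    name: str = ""


def route_interview(messages: list[Msg], max_turns: int = 2) -> str:
    """Return "ask" to continue the interview, or "finish" once the
    expert has answered max_turns times."""
    expert_answers = sum(1 for m in messages if m.name == "expert")
    return "finish" if expert_answers >= max_turns else "ask"


history = [
    Msg("Begin research on: LangGraph best practices"),
    Msg("What is a checkpointer used for?"),
    Msg("It persists graph state between steps.", name="expert"),
]
first_decision = route_interview(history)

history.extend([
    Msg("How does that enable human-in-the-loop?"),
    Msg("Execution can pause and later resume from saved state.", name="expert"),
])
second_decision = route_interview(history)

print(first_decision, second_decision)
```

In a graph, a function of this shape could plausibly serve as the routing function of a conditional edge placed after the answer step.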
   
`Research Process`
* Experts will gather information to answer analyst questions in `parallel`.
* All interviews will be conducted simultaneously through `map-reduce`.
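
The map-reduce idea can be pictured in plain Python before any graph is involved. This is only a sketch under assumptions: `run_interview` is a hypothetical stand-in for one complete interview, and a thread pool stands in for the parallel branches that a graph framework would schedule.

```python
from concurrent.futures import ThreadPoolExecutor


def run_interview(analyst: str, topic: str) -> str:
    # Hypothetical stand-in for one interview; returns a report section.
    return f"## {analyst}\nFindings on {topic}."


def map_reduce(analysts: list[str], topic: str) -> str:
    # Map: run one interview per analyst concurrently.
    # pool.map returns results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=max(1, len(analysts))) as pool:
        sections = list(pool.map(lambda a: run_interview(a, topic), analysts))
    # Reduce: merge all sections into a single report body.
    return "\n\n".join(sections)


report = map_reduce(["Ada", "Grace"], "LangGraph best practices")
print(report)
```

The map step scales with the number of analysts, while the reduce step runs once after every branch has finished.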

`Output Format` 
* The gathered insights from each interview will be synthesized into a final report.
* We'll use customizable prompts for the report, allowing for a flexible output format. 

![Research assistant](https://cdn.prod.website-files.com/65b8cd72835ceeacd4449a53/66dbb164d61c93d48e604091_research-assistant1.png)

## Setup

We'll use [LangSmith](https://docs.langchain.com/langsmith/home) for [tracing](https://docs.langchain.com/langsmith/observability-concepts).

In [1]:
from dotenv import load_dotenv
load_dotenv()

True

### 1: Core Logic and Graph Definitions

In [2]:
import operator
from typing import List, Annotated, TypedDict
from pydantic import BaseModel, Field
from IPython.display import Image, display, Markdown

from langgraph.graph import START, END, StateGraph, MessagesState
from langgraph.checkpoint.memory import MemorySaver
from langgraph.types import Send
from langchain_ollama import ChatOllama
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_tavily import TavilySearch

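# --- 1. SETUP ---
# Needs a local Ollama server with the model pulled (`ollama pull qwen3`).
# No global format="json": the question, answer, and report nodes need free text;
# with_structured_output() handles the calls that must return a schema.
llm = ChatOllama(model="qwen3", temperature=0)
search_tool = TavilySearch(max_results=2)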

# --- 2. STATES & MODELS ---
class Analyst(BaseModel):
    affiliation: str
    name: str
    role: str
    description: str

class Perspectives(BaseModel):
    analysts: List[Analyst]

class SearchQuery(BaseModel):
    query: str

class InterviewState(MessagesState):
    analyst: Analyst # Must match key in Send()
    context: Annotated[list, operator.add]
    sections: list

class ResearchState(TypedDict):
    topic: str
    max_analysts: int
    human_analyst_feedback: str
    analysts: List[Analyst]
    sections: Annotated[list, operator.add]
    final_report: str

# --- 3. SUB-GRAPH NODES ---
def ask_question(state: InterviewState):
    analyst = state["analyst"]
    sys_msg = f"You are {analyst.name}, {analyst.role}. Ask a targeted research question."
    return {"messages": [llm.invoke([SystemMessage(content=sys_msg)] + state["messages"])]}

def search_node(state: InterviewState):
    structured_llm = llm.with_structured_output(SearchQuery)
    res = structured_llm.invoke([SystemMessage(content="Generate a search query JSON.")] + state["messages"])
    search_data = search_tool.invoke(res.query)
    content = "\n\n".join([f"Source: {r['url']}\nContent: {r['content']}" for r in search_data.get('results', [])])
    return {"context": [content]}

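def answer_node(state: InterviewState):
    analyst = state["analyst"]
    context_str = "\n".join(state["context"])
    sys_msg = f"Answer using ONLY this context: {context_str}. Cite sources [1], [2]."
    res = llm.invoke([SystemMessage(content=sys_msg)] + state["messages"])
    res.name = "expert"
    # "sections" is the only key shared with the parent ResearchState, so the
    # answer must be written there for compile_report to receive it.
    section = f"## {analyst.name} ({analyst.role})\n\n{res.content}"
    return {"messages": [res], "sections": [section]}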

# --- 4. BUILD SUB-GRAPH ---
itv_builder = StateGraph(InterviewState)
itv_builder.add_node("ask", ask_question)
itv_builder.add_node("search", search_node)
itv_builder.add_node("answer", answer_node)
itv_builder.add_edge(START, "ask")
itv_builder.add_edge("ask", "search")
itv_builder.add_edge("search", "answer")
itv_builder.add_edge("answer", END)
interview_graph = itv_builder.compile()

# --- 5. ORCHESTRATOR NODES ---
def create_analysts(state: ResearchState):
    prompt = f"Create a team of {state['max_analysts']} analysts for: {state['topic']}. Return JSON."
    res = llm.with_structured_output(Perspectives).invoke([SystemMessage(content=prompt)])
    return {"analysts": res.analysts}

def initiate_interviews(state: ResearchState):
    # The payload key "analyst" must match the field name declared on InterviewState.
    return [
        Send("conduct_interview", {
            "analyst": a, 
            "messages": [HumanMessage(content=f"Begin research on: {state['topic']}")]
        }) for a in state["analysts"]
    ]

def compile_report(state: ResearchState):
    all_sections = "\n\n".join(state["sections"])
    res = llm.invoke(f"Combine into a Markdown report:\n\n{all_sections}")
    return {"final_report": res.content}

# --- 6. BUILD PARENT GRAPH ---
builder = StateGraph(ResearchState)
builder.add_node("create_analysts", create_analysts)
builder.add_node("human_feedback", lambda s: None)
builder.add_node("conduct_interview", interview_graph) 
builder.add_node("write_report", compile_report)

builder.add_edge(START, "create_analysts")
builder.add_edge("create_analysts", "human_feedback")
# On approval, fan out one Send per analyst via initiate_interviews (the map step);
# otherwise loop back and regenerate the team.
builder.add_conditional_edges(
    "human_feedback",
    lambda s: initiate_interviews(s) if s.get("human_analyst_feedback") == "OK" else "create_analysts",
    ["conduct_interview", "create_analysts"],
)
builder.add_edge("conduct_interview", "write_report")
builder.add_edge("write_report", END)

graph = builder.compile(checkpointer=MemorySaver(), interrupt_before=["human_feedback"])
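
The `Annotated[list, operator.add]` annotation on `sections` is what lets parallel interviews each contribute a section without overwriting one another. The plain-Python sketch below imitates that merge rule. `ReportState` and `merge_update` are hypothetical illustrations of the idea, not LangGraph internals.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class ReportState(TypedDict):
    topic: str                                # no reducer: last write wins
    sections: Annotated[list, operator.add]   # reducer: lists are concatenated


def merge_update(state: dict, update: dict) -> dict:
    """Apply a partial update: keys annotated with a reducer are combined
    with the existing value, all other keys are overwritten."""
    hints = get_type_hints(ReportState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        # Annotated types expose their extra arguments via __metadata__.
        reducers = getattr(hints.get(key), "__metadata__", ())
        if reducers and key in merged:
            merged[key] = reducers[0](merged[key], value)
        else:
            merged[key] = value
    return merged


state = {"topic": "LangGraph best practices", "sections": []}
# Two parallel branches each return a one-element list of sections.
for update in ({"sections": ["## Analyst A"]}, {"sections": ["## Analyst B"]}):
    state = merge_update(state, update)

print(state["sections"])
```

Without a reducer, two branches writing the same key in one step would conflict, which is why each interview returns its section wrapped in a list.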

### 2: Execute Workflow

In [3]:
# Use a fresh thread_id so the checkpointer starts a new run instead of resuming an earlier, failed one
config = {"configurable": {"thread_id": "Harris_Fresh_Start"}}

# 1. Start the process
initial_input = {"topic": "LangGraph best practices", "max_analysts": 3}
for event in graph.stream(initial_input, config, stream_mode="values"):
    if "analysts" in event:
        print(f"Analysts Created: {[a.name for a in event['analysts']]}")

# 2. Approve manually
graph.update_state(config, {"human_analyst_feedback": "OK"}, as_node="human_feedback")

# 3. Finalize
print("Processing parallel interviews with Qwen3...")
for event in graph.stream(None, config, stream_mode="updates"):
    print(f"Finished: {list(event.keys())[0]}")

display(Markdown(graph.get_state(config).values['final_report']))

ResponseError: model 'qwen3' not found (status code: 404)

### 3: Provide Approval and Finalize

In [None]:
# 1. Update the state to trigger the conditional edge
graph.update_state(config, {"human_analyst_feedback": "OK"}, as_node="human_feedback")

print("Processing parallel interviews and compiling report...")

# 2. Resume execution
# Passing None as the input continues the checkpointed run from the interrupt
for event in graph.stream(None, config, stream_mode="updates"):
    node_name = list(event.keys())[0]
    print(f"-- Node: {node_name} completed --")

# 3. Display final output
final_state = graph.get_state(config)
report = final_state.values.get('final_report')

if report:
    display(Markdown(report))
else:
    # If report is missing, check if sections were actually gathered
    print("Report missing. Sections gathered:", len(final_state.values.get('sections', [])))