[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain-academy/blob/main/module-4/research-assistant.ipynb) [![Open in LangChain Academy](https://cdn.prod.website-files.com/65b8cd72835ceeacd4449a53/66e9eba12c7b7688aa3dbb5e_LCA-badge-green.svg)](https://academy.langchain.com/courses/take/intro-to-langgraph/lessons/58239974-lesson-4-research-assistant)

# Research Assistant

## Review

We've covered a few major LangGraph themes:

* Memory
* Human-in-the-loop
* Controllability

Now, we'll bring these ideas together to tackle one of AI's most popular applications: research automation. 

Research is often laborious work offloaded to analysts. AI has considerable potential to assist with this.

However, research demands customization: raw LLM outputs are often poorly suited for real-world decision-making workflows. 

Customized, AI-based [research and report generation](https://jxnl.co/writing/2024/06/05/predictions-for-the-future-of-rag/#reports-over-rag) workflows are a promising way to address this.

## Goal

Our goal is to build a lightweight, multi-agent system around chat models that customizes the research process.

`Source Selection` 
* Users can choose any set of input sources for their research.
  
`Planning` 
* Users provide a topic, and the system generates a team of AI analysts, each focusing on one sub-topic.
* `Human-in-the-loop` will be used to refine these sub-topics before research begins.
  
`LLM Utilization`
* Each analyst will conduct in-depth interviews with an expert AI using the selected sources.
* The interview will be a multi-turn conversation to extract detailed insights as shown in the [STORM](https://arxiv.org/abs/2402.14207) paper.
* These interviews will be captured using `sub-graphs`, each with its own internal state.
   
`Research Process`
* Experts will gather information to answer analyst questions in `parallel`.
* All interviews will be conducted simultaneously through `map-reduce`.

`Output Format` 
* The gathered insights from each interview will be synthesized into a final report.
* We'll use customizable prompts for the report, allowing for a flexible output format. 

![Research assistant overview](https://cdn.prod.website-files.com/65b8cd72835ceeacd4449a53/66dbb164d61c93d48e604091_research-assistant1.png)
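
The `Research Process` idea above (run every interview at once, then merge the results) can be sketched in plain Python. This is only an illustration of the map-reduce shape, not LangGraph's `Send` API: `run_interview` and `map_reduce_interviews` are hypothetical names, and a thread pool stands in for the graph runtime.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def run_interview(analyst: str, topic: str) -> list[str]:
    """Hypothetical stand-in for one interview sub-graph run; returns its sections."""
    return [f"{analyst}: findings on {topic}"]


def map_reduce_interviews(analysts: list[str], topic: str, max_workers: int = 3) -> list[str]:
    # Map: one interview per analyst, run concurrently; pool.map keeps input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_analyst = list(pool.map(lambda a: run_interview(a, topic), analysts))
    # Reduce: flatten the per-interview section lists into one list for the report.
    return list(chain.from_iterable(per_analyst))


sections = map_reduce_interviews(["Analyst A", "Analyst B"], "local LLMs")
print(sections)
```

Setting `max_workers=1` in this sketch would run the interviews one at a time, which may be the safer choice when a single local model serves every request.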

## Setup

We'll use [LangSmith](https://docs.langchain.com/langsmith/home) for [tracing](https://docs.langchain.com/langsmith/observability-concepts).

In [1]:
from dotenv import load_dotenv
load_dotenv()

True

## Core Logic and Graph Definitions

In [2]:
import operator
import json
from typing import List, Annotated, TypedDict
from pydantic import BaseModel, Field
from IPython.display import Image, display, Markdown

from langgraph.graph import START, END, StateGraph, MessagesState
from langgraph.checkpoint.memory import MemorySaver
from langgraph.types import Send
from langchain_ollama import ChatOllama
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_tavily import TavilySearch

# --- 1. SETUP ---
# format="json" constrains every reply from this model to JSON, including the final report
llm = ChatOllama(model="qwen3:8b", temperature=0, format="json")
search_tool = TavilySearch(max_results=2)

class Analyst(BaseModel):
    affiliation: str
    name: str
    role: str
    description: str

class Perspectives(BaseModel):
    analysts: List[Analyst]

class SearchQuery(BaseModel):
    query: str

class InterviewState(MessagesState):
    analyst: Analyst 
    context: Annotated[list, operator.add]
    sections: list

# --- 2. ROBUST NODES ---
def ask_question(state: InterviewState):
    analyst = state["analyst"]
    sys_msg = f"You are {analyst.name}, {analyst.role}. Ask a targeted research question."
    return {"messages": [llm.invoke([SystemMessage(content=sys_msg)] + state["messages"])]}

def search_node(state: InterviewState):
    sys_msg = SystemMessage(content="Output ONLY a JSON object with a 'query' key.")
    try:
        # Attempt structured output
        structured_llm = llm.with_structured_output(SearchQuery)
        res = structured_llm.invoke([sys_msg] + state["messages"])
        query_text = res.query
    except Exception:
        # Manual Fallback if JSON is messy
        res = llm.invoke([sys_msg] + state["messages"])
        clean = res.content.replace("```json", "").replace("```", "").strip()
        try:
            query_text = json.loads(clean)["query"]
        except (json.JSONDecodeError, KeyError, TypeError):
            query_text = "LangGraph agentic patterns"  # Last resort
    
    search_data = search_tool.invoke(query_text)
    content = "\n".join([f"Source: {r['url']}\nContent: {r['content']}" for r in search_data.get('results', [])])
    return {"context": [content]}

def answer_node(state: InterviewState):
    context_str = "\n".join(state["context"])
    sys_msg = f"Answer using ONLY this context: {context_str}. Be concise."
    res = llm.invoke([SystemMessage(content=sys_msg)] + state["messages"])
    return {"messages": [res], "sections": [res.content]}

# --- 3. BUILD SUB-GRAPH ---
itv_builder = StateGraph(InterviewState)
itv_builder.add_node("ask", ask_question)
itv_builder.add_node("search", search_node)
itv_builder.add_node("answer", answer_node)
# Linear flow: ask -> search -> answer
itv_builder.add_edge(START, "ask")
itv_builder.add_edge("ask", "search")
itv_builder.add_edge("search", "answer")
itv_builder.add_edge("answer", END)
interview_graph = itv_builder.compile()
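
The manual fallback in `search_node` strips Markdown code fences from the model's reply and then parses the remainder as JSON. A minimal standalone sketch of that step is below, so it can be tried without a model or a search call. The helper name `extract_query` is hypothetical, and the fence marker is built with string multiplication only so that it does not collide with the fence around this example.

```python
import json

FENCE = "`" * 3  # three backticks, the Markdown code-fence marker
FALLBACK_QUERY = "LangGraph agentic patterns"  # same last-resort query as search_node


def extract_query(raw: str, fallback: str = FALLBACK_QUERY) -> str:
    """Return the 'query' value from a model reply that may be wrapped in code fences."""
    clean = raw.replace(FENCE + "json", "").replace(FENCE, "").strip()
    try:
        query = json.loads(clean)["query"]
    except (json.JSONDecodeError, KeyError, TypeError):
        # Not JSON, no 'query' key, or JSON that is not an object
        return fallback
    # Guard against empty or non-string values
    return query if isinstance(query, str) and query.strip() else fallback


fenced_reply = FENCE + 'json\n{"query": "LangGraph map-reduce"}\n' + FENCE
print(extract_query(fenced_reply))
print(extract_query("Sorry, I cannot help with that."))
```

Catching only the parsing and lookup errors keeps unrelated failures (for example a network error from the search tool) visible instead of silently replacing them with the fallback query.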

## Execute Workflow

In [3]:
class ResearchState(TypedDict):
    topic: str
    max_analysts: int
    human_analyst_feedback: str
    analysts: List[Analyst]
    sections: Annotated[list, operator.add]  # parallel interviews append here
    final_report: str

def create_analysts(state: ResearchState):
    prompt = f"Create a team of {state['max_analysts']} analysts for: {state['topic']}. Return JSON."
    try:
        res = llm.with_structured_output(Perspectives).invoke([SystemMessage(content=prompt)])
        return {"analysts": res.analysts}
    except Exception:
        # Fallback for analyst creation
        return {"analysts": [Analyst(name="Lead Researcher", role="Expert", affiliation="Local", description="Generalist")]}

def initiate_interviews(state: ResearchState):
    return [Send("conduct_interview", {"analyst": a, "messages": [HumanMessage(content=f"Topic: {state['topic']}")]}) for a in state["analysts"]]

def compile_report(state: ResearchState):
    all_sections = "\n\n".join(state["sections"])
    res = llm.invoke(f"Write a Markdown report based on these interviews:\n\n{all_sections}")
    return {"final_report": res.content}

builder = StateGraph(ResearchState)
builder.add_node("create_analysts", create_analysts)
builder.add_node("human_feedback", lambda s: None)  # no-op node; exists only as an interrupt point
builder.add_node("conduct_interview", interview_graph)
builder.add_node("write_report", compile_report)

builder.add_edge(START, "create_analysts")
builder.add_edge("create_analysts", "human_feedback")
# Fan out to the interviews once feedback is "OK"; otherwise regenerate the analysts
builder.add_conditional_edges("human_feedback", 
    lambda s: initiate_interviews(s) if s.get("human_analyst_feedback") == "OK" else "create_analysts",
    {"conduct_interview": "conduct_interview", "create_analysts": "create_analysts"})
builder.add_edge("conduct_interview", "write_report")
builder.add_edge("write_report", END)

memory = MemorySaver()
graph = builder.compile(checkpointer=memory, interrupt_before=["human_feedback"])
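
`ResearchState` declares `sections` as `Annotated[list, operator.add]`, which is what lets several interviews each contribute a section without overwriting one another. The sketch below mimics that visible effect in plain Python. It is not LangGraph's internal implementation: `DemoState` and `apply_update` are hypothetical names used only for illustration.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class DemoState(TypedDict):
    topic: str
    sections: Annotated[list, operator.add]  # reducer: concatenate lists


def apply_update(state: dict, update: dict, schema=DemoState) -> dict:
    """Merge one node's update: use the annotated reducer if present, else overwrite."""
    hints = get_type_hints(schema, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        # Annotated[...] types expose their extra arguments via __metadata__
        metadata = getattr(hints.get(key), "__metadata__", ())
        if metadata and key in merged:
            merged[key] = metadata[0](merged[key], value)
        else:
            merged[key] = value
    return merged


state = {"topic": "demo", "sections": []}
for update in ({"sections": ["interview 1"]}, {"sections": ["interview 2"]}, {"topic": "revised"}):
    state = apply_update(state, update)
print(state)
```

Keys without a reducer, such as `topic` here, are simply replaced by the latest update, while `sections` accumulates one entry per interview.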

## Provide Approval and Finalize

In [4]:
# Unique ID to ensure we don't reload old broken states
config = {"configurable": {"thread_id": "Harris_Final_Stable_v1"}, "recursion_limit": 50}

# 1. Start
initial_input = {"topic": "Local LLM Parallelization with LangGraph", "max_analysts": 3}
for event in graph.stream(initial_input, config, stream_mode="values"):
    if "analysts" in event:
        print(f"Analysts Created: {[a.name for a in event['analysts']]}")

# 2. Update status for the gatekeeper
graph.update_state(config, {"human_analyst_feedback": "OK"}, as_node="human_feedback")

# 3. Resume (The Map-Reduce step)
print("\nProcessing interviews one by one (Safer for CPU)...")
# max_concurrency is a run-config key, so it is passed inside the config dict
for event in graph.stream(None, {**config, "max_concurrency": 1}, stream_mode="updates"):
    node_name = list(event.keys())[0]
    print(f"-- Completed Node: {node_name} --")

# 4. Show final output
final_state = graph.get_state(config)
if 'final_report' in final_state.values:
    display(Markdown(final_state.values['final_report']))
else:
    print("Report generation in progress or failed.")

Analysts Created: ['Dr. Elena Martinez', 'Dr. Raj Patel', 'Dr. Sarah Kim']

Processing interviews one by one (Safer for CPU)...
-- Completed Node: conduct_interview --
-- Completed Node: conduct_interview --
-- Completed Node: conduct_interview --
-- Completed Node: write_report --


{
  "report": {
    "title": "Optimizing Parallelization of Local LLM Workflows with LangGraph",
    "introduction": "LangGraph is a powerful framework for orchestrating and managing workflows in large language models (LLMs). As the demand for efficient and scalable LLM inference grows, optimizing parallelization becomes critical. This report explores how LangGraph can be leveraged to enhance the parallelization of local LLM workflows, addressing key factors such as task dependencies, resource allocation, and performance trade-offs. The analysis is based on insights from interviews with experts in the field.",
    "sections": [
      {
        "title": "1. Leveraging LangGraph for Parallelization of Local LLM Workflows",
        "content": "LangGraph's ability to model task dependencies as directed acyclic graphs (DAGs) enables efficient parallelization of LLM workflows. By explicitly defining dependencies between tasks, LangGraph ensures that tasks are executed in the correct order while maximizing concurrency. Key considerations include:\n\n- **Task Dependency Management**: LangGraph's stateful execution model allows for dynamic dependency resolution, ensuring that tasks are only executed when their prerequisites are met.\n- **Resource Allocation**: The framework supports dynamic resource allocation, enabling the system to scale compute resources (e.g., CPU/GPU) based on workload demands. This reduces idle time and optimizes throughput.\n- **Performance Trade-offs**: While parallelization improves throughput, it may introduce latency due to task scheduling overhead. LangGraph's prioritization of critical paths and caching mechanisms helps mitigate these trade-offs.\n\n**Recommendation**: Use LangGraph's DAG-based workflow management to balance task dependencies and resource allocation, while implementing caching and prioritization strategies to minimize latency."
      },
      {
        "":"",
        "content": "Optimizing the parallelization of local LLM inference tasks requires careful consideration of task granularity, resource contention, and throughput. LangGraph's modular architecture allows for fine-grained task decomposition, enabling parallel execution of independent subtasks. Key trade-offs include:\n\n- **Task Granularity**: Smaller tasks improve parallelism but may increase overhead due to task scheduling. LangGraph's ability to dynamically adjust granularity based on workload ensures optimal performance.\n- **Resource Contention**: Parallel execution can lead to resource contention (e.g., GPU memory, CPU cores). LangGraph's resource-aware scheduling and load balancing mechanisms help mitigate this by prioritizing high-priority tasks and distributing workloads evenly.\n- **Throughput Optimization**: Batch processing and pipelining techniques can be integrated into LangGraph workflows to maximize throughput while maintaining low latency.\n\n**Recommendation**: Implement dynamic task granularity and resource-aware scheduling in LangGraph to balance parallelism and resource utilization, while leveraging batch processing for throughput optimization."
      },
      {
        "title": "3. Optimizing LangGraph's Architecture for Parallelization Efficiency",
        "content": "To enhance the parallelization efficiency of local LLMs, LangGraph's architecture must be optimized for scalability, latency, and model accuracy. Key strategies include:\n\n- **Modular Architecture**: LangGraph's modular design allows for the integration of specialized components (e.g., model quantization, caching layers) that improve performance without sacrificing accuracy.\n- **Latency-Accuracy Trade-offs**: Techniques such as model pruning, quantization, and caching can reduce inference latency while maintaining acceptable accuracy. LangGraph's support for hybrid execution modes enables these optimizations.\n- **Distributed Execution**: For large-scale workflows, LangGraph can be extended to support distributed execution across multiple nodes, balancing computational load and minimizing bottlenecks.\n\n**Recommendation**: Optimize LangGraph's architecture by integrating model-specific optimizations (e.g., quantization) and extending it for distributed execution to balance latency, accuracy, and resource utilization."
      }
    ],
    "conclusion": "LangGraph offers a robust foundation for optimizing the parallelization of local LLM workflows. By addressing task dependencies, resource allocation, and performance trade-offs, LangGraph can significantly enhance throughput and reduce latency. Future work should focus on extending LangGraph's capabilities for distributed execution and integrating advanced model optimization techniques to further improve efficiency.",
    "references": [
      "LangGraph Documentation",
      "LLM Parallelization Best Practices",
      "Workflow Management Systems for AI"
    ]
  }
}

## Professionally Formatted Report

In [8]:
# 1. Extract the report from the final state of the graph
final_state = graph.get_state(config)
report_raw = final_state.values.get('final_report', '')

# 2. Try to parse and format the JSON report
try:
    # Handle the case where the LLM wrapped it in Markdown code blocks
    clean_json = report_raw.replace("```json", "").replace("```", "").strip()
    data = json.loads(clean_json)
    
    # Unwrap the top-level 'report' key if present (as in the raw output above)
    content = data.get('report', data)
    
    markdown_out = f"# {content.get('title', 'Research Report')}\n\n"
    markdown_out += f"## Introduction\n{content.get('introduction', '')}\n\n"
    
    for section in content.get('sections', []):
        title = section.get('title') or "Analysis"
        markdown_out += f"### {title}\n{section.get('content', '')}\n\n"
        
    markdown_out += f"## Conclusion\n{content.get('conclusion', '')}\n\n"
    
    if 'references' in content:
        markdown_out += "## References\n* " + "\n* ".join(content['references'])
    
    display(Markdown(markdown_out))

except Exception:
    # Fallback: Just print the raw text if JSON parsing fails
    print("Parsing failed or report already in Markdown. Displaying raw output:")
    display(Markdown(report_raw))

# Optimizing Parallelization of Local LLM Workflows with LangGraph

## Introduction
LangGraph is a powerful framework for orchestrating and managing workflows in large language models (LLMs). As the demand for efficient and scalable LLM inference grows, optimizing parallelization becomes critical. This report explores how LangGraph can be leveraged to enhance the parallelization of local LLM workflows, addressing key factors such as task dependencies, resource allocation, and performance trade-offs. The analysis is based on insights from interviews with experts in the field.

### 1. Leveraging LangGraph for Parallelization of Local LLM Workflows
LangGraph's ability to model task dependencies as directed acyclic graphs (DAGs) enables efficient parallelization of LLM workflows. By explicitly defining dependencies between tasks, LangGraph ensures that tasks are executed in the correct order while maximizing concurrency. Key considerations include:

- **Task Dependency Management**: LangGraph's stateful execution model allows for dynamic dependency resolution, ensuring that tasks are only executed when their prerequisites are met.
- **Resource Allocation**: The framework supports dynamic resource allocation, enabling the system to scale compute resources (e.g., CPU/GPU) based on workload demands. This reduces idle time and optimizes throughput.
- **Performance Trade-offs**: While parallelization improves throughput, it may introduce latency due to task scheduling overhead. LangGraph's prioritization of critical paths and caching mechanisms helps mitigate these trade-offs.

**Recommendation**: Use LangGraph's DAG-based workflow management to balance task dependencies and resource allocation, while implementing caching and prioritization strategies to minimize latency.

### Analysis
Optimizing the parallelization of local LLM inference tasks requires careful consideration of task granularity, resource contention, and throughput. LangGraph's modular architecture allows for fine-grained task decomposition, enabling parallel execution of independent subtasks. Key trade-offs include:

- **Task Granularity**: Smaller tasks improve parallelism but may increase overhead due to task scheduling. LangGraph's ability to dynamically adjust granularity based on workload ensures optimal performance.
- **Resource Contention**: Parallel execution can lead to resource contention (e.g., GPU memory, CPU cores). LangGraph's resource-aware scheduling and load balancing mechanisms help mitigate this by prioritizing high-priority tasks and distributing workloads evenly.
- **Throughput Optimization**: Batch processing and pipelining techniques can be integrated into LangGraph workflows to maximize throughput while maintaining low latency.

**Recommendation**: Implement dynamic task granularity and resource-aware scheduling in LangGraph to balance parallelism and resource utilization, while leveraging batch processing for throughput optimization.

### 3. Optimizing LangGraph's Architecture for Parallelization Efficiency
To enhance the parallelization efficiency of local LLMs, LangGraph's architecture must be optimized for scalability, latency, and model accuracy. Key strategies include:

- **Modular Architecture**: LangGraph's modular design allows for the integration of specialized components (e.g., model quantization, caching layers) that improve performance without sacrificing accuracy.
- **Latency-Accuracy Trade-offs**: Techniques such as model pruning, quantization, and caching can reduce inference latency while maintaining acceptable accuracy. LangGraph's support for hybrid execution modes enables these optimizations.
- **Distributed Execution**: For large-scale workflows, LangGraph can be extended to support distributed execution across multiple nodes, balancing computational load and minimizing bottlenecks.

**Recommendation**: Optimize LangGraph's architecture by integrating model-specific optimizations (e.g., quantization) and extending it for distributed execution to balance latency, accuracy, and resource utilization.

## Conclusion
LangGraph offers a robust foundation for optimizing the parallelization of local LLM workflows. By addressing task dependencies, resource allocation, and performance trade-offs, LangGraph can significantly enhance throughput and reduce latency. Future work should focus on extending LangGraph's capabilities for distributed execution and integrating advanced model optimization techniques to further improve efficiency.

## References
* LangGraph Documentation
* LLM Parallelization Best Practices
* Workflow Management Systems for AI