
# Building a Powerful AI Research Agent with LangChain

This notebook demonstrates how to build an AI research agent using the LangChain framework. The agent will be able to:

1. **Plan Research:** Generate a structured research plan based on a user-provided topic.
2. **Generate Queries:** Create specific search queries for each section of the plan.
3. **Gather Information:** Use a web search tool (Tavily) to find relevant information.
4. **Analyze Data:** Perform basic data analysis (if applicable).
5. **Generate Report:** Compile the research into a well-formatted report.

We will break down the code into logical cells with explanations, allowing you to run each part step-by-step and understand the process.

---

### What are AI Agents?

AI agents are autonomous or semi-autonomous systems that can perceive their environment, make decisions, and take actions to achieve specific goals. LLM-based agents take inspiration from traditional reinforcement learning (RL) agents: they are built on large language models (LLMs) and can be equipped with tools to interact with the external world (e.g., search the web, access databases, or use APIs).

**Key Features of AI Agents:**

-   **Reasoning:** Agents can use LLMs to reason about a task, break it down into smaller steps, and decide on the best course of action.
-   **Tool Use:** Agents can be given access to tools, allowing them to perform actions beyond generating text (e.g., searching the web, analyzing data, interacting with APIs).
-   **Memory:** Agents can have memory to store information from previous interactions, enabling them to learn and adapt over time.
-   **Autonomy:** Agents can operate independently to a certain degree, making decisions and taking actions without constant human intervention.

**In this lab, we will build a research agent that demonstrates these capabilities.**

---

### The Research Agent's Workflow

Here's a diagram illustrating the workflow of our research agent:

```mermaid
graph TD
    A[User Input: Research Topic] --> B{Planning};
    B -- Research Plan (JSON) --> C{Query Generation};
    C -- Plan with Queries --> D{Information Gathering};
    D -- Gathered Data & Sources --> E{Report Generation};
    E -- Research Report (JSON) --> F[Output: Formatted Report];
    
    style B fill:#f9f,stroke:#333,stroke-width:2px
    style C fill:#ff9,stroke:#333,stroke-width:2px
    style D fill:#9ff,stroke:#333,stroke-width:2px
    style E fill:#f9f,stroke:#333,stroke-width:2px
    
    subgraph "Information Gathering (Simulated Parallel)"
        D --> D1[Search with Query 1];
        D --> D2[Search with Query 2];
        D --> D3[Search with Query 3];
        D1 --> D;
        D2 --> D;
        D3 --> D;
    end
```

**Step-by-Step Breakdown:**

1. **Planning:**
    
    -   The agent receives a research topic from the user.
    -   It uses the LLM to create a research plan in JSON format, outlining the report's title, sections, and placeholders for key findings and sources.
    -   Each section in the plan has a name, a description, and a flag (`research`) indicating whether web research is needed.
2. **Query Generation:**
    
    -   The agent examines the research plan.
    -   For each section marked with `research=True`, it generates 2-3 specific search queries designed to find relevant information from high-quality sources.
    -   These queries are added to the `queries` field of each section.
3. **Information Gathering (Simulated Parallel):**
    
    -   The agent iterates through the sections of the plan.
    -   For each section requiring research:
        -   It uses the `tavily_search_results_json` tool (which wraps the Tavily Search API) to perform web searches for each of the section's queries.
        -   To simulate parallel research, the agent makes multiple calls to the search tool within this step, one per query. The calls run sequentially but are grouped together in this phase. Other agent libraries (LangGraph, PydanticAI, SmolAgents, etc.) offer more direct support for running research branches in parallel; this notebook keeps the plain LangChain `AgentExecutor` loop for simplicity.
        -   The agent then compiles the search results into the `content` field of the section.
        -   It also keeps track of the sources used, storing them in the `sources` field of the plan.
4. **Report Generation:**
    
    -   The agent uses the LLM to generate the final research report in JSON format.
    -   For sections that required research, it uses the gathered content.
    -   For sections that didn't require research (e.g., Introduction, Conclusion), it synthesizes information from the other sections and writes in an appropriate style.
    -   The report includes the title, sections with content, a list of sources, and key findings.
5. **Output:**
    
    -   The JSON report is formatted and presented to the user.
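
To make the data flow concrete, here is a rough sketch of how the plan dictionary might look as it moves through the first three phases. The field names follow the plan described above; the topic, queries, and placeholder content are invented for illustration.

```python
import json

# Phase 1: the planner returns a skeleton with empty placeholders.
plan = {
    "title": "Urban Population Growth",  # illustrative topic
    "key_findings": [],
    "sources": [],
    "sections": [
        {"name": "Introduction", "description": "Background.", "research": False, "queries": []},
        {"name": "Key Factors", "description": "Main drivers.", "research": True, "queries": []},
    ],
}

# Phase 2: only sections flagged for research receive queries.
for section in plan["sections"]:
    if section["research"]:
        section["queries"] = ["urban population growth drivers", "city migration statistics"]

# Phase 3: researched sections gain a 'content' field; the rest stay empty.
for section in plan["sections"]:
    section["content"] = "(summary of search results)" if section["research"] else ""

print(json.dumps(plan, indent=2))
```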

---


**1: Import Libraries and Define Data Models**


-   **Import Statements:** We import the required modules:
    
    -   **`langchain.agents`:** For creating and managing AI agents.
    -   **`langchain_openai`:** For using OpenAI language models.
    -   **`langchain_core.prompts`:** For creating prompts to guide the language model.
    -   **`langchain_community.tools.tavily_search`:** For using the Tavily search tool.
    -   **`langchain_core.output_parsers`:** For parsing the language model's output into structured formats (like JSON).
    -   **`langchain_core.exceptions`:** For handling potential errors.
    -   **`pydantic`:** For creating data models with validation.
    -   **`typing`:** For type hinting.
    
-   **`ResearchReport` Class:** This Pydantic model defines the structure of the final research report.
    
    -   **`title`:** The main research question or title (string).
    -   **`sections`:** A list of dictionaries, where each dictionary represents a section and contains the section's title and content.
    -   **`sources`:** A list of URLs or identifiers for cited sources.
    -   **`key_findings`:** A list of key findings in bullet-point format.
    
    Using Pydantic models like this helps ensure that the generated report has the correct format and data types.
    
-   **`Section` Class:** This Pydantic model defines the structure of a single section within the research plan.
    
    -   **`name`:** The name or title of the section (string).
    -   **`description`:** A brief overview of the section's content (string).
    -   **`research`:** A boolean flag indicating whether web research is required for this section.
    -   **`queries`:** A list of search queries specifically tailored to this section (used if `research` is `True`).





In [1]:
import os
import json
from langchain.agents import AgentExecutor, create_openai_tools_agent, tool
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
from langchain_core.exceptions import OutputParserException
from pydantic import BaseModel, Field, ValidationError
from typing import List, Dict

In [2]:
# ---- Structured Report Model ----
class ResearchReport(BaseModel):
    title: str = Field(..., description="Main research question/title")
    sections: List[Dict] = Field(..., description="Organized report sections. Each section should have a 'title' and 'content'")
    sources: List[str] = Field(..., description="Cited references (URLs or clear identifiers)")
    key_findings: List[str] = Field(..., description="Bullet-point summary of most important findings")

# ---- Section Definition ----
class Section(BaseModel):
    name: str = Field(..., description="Name of the section")
    description: str = Field(..., description="Brief overview of the section's content")
    research: bool = Field(..., description="Whether web research is required for this section")
    queries: List[str] = Field(default_factory=list, description="Search queries specific to this section, if research is needed. Max of 3 queries for each section.")

**2: Configure the Language Model**


-   We initialize the language model using `ChatOpenAI` from `langchain_openai`.
-   **`model="gpt-4o"`:** Specifies the OpenAI model to use. Here, we're using `gpt-4o`. You can change this to other models like `gpt-3.5-turbo` if needed, but `gpt-4o` generally provides better reasoning and output quality.
-   **API keys:** Replace `"<your-api-key>"` in `os.environ["TAVILY_API_KEY"] = "<your-api-key>"` with your Tavily API key, and replace `"sk-proj-..."` with your actual OpenAI service account API key. You need an OpenAI account and an API key to use their models.
-   **`temperature=0.7`:** This parameter controls the randomness or creativity of the language model's output.
    -   A temperature of 0 makes the output nearly deterministic and focused on the most likely responses.
    -   A temperature of 1 makes the output more creative and varied, but potentially less coherent.
    -   A value of 0.7 provides a good balance between creativity and coherence for many tasks.
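
As a rough illustration of what the temperature parameter does, the sketch below applies temperature scaling to a softmax over made-up token scores. This is the textbook formulation, not OpenAI's actual sampling code, and the `logits` values are invented.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0.2, 0.7, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring token. The formula divides by the temperature, so exactly 0 is undefined here and is conventionally treated as always picking the top token.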


In [3]:

os.environ["TAVILY_API_KEY"] = "<your-api-key>"

llm = ChatOpenAI(
    model="gpt-4o",
    api_key="sk-proj-..",  # Replace with your service account API key
    temperature=0.7,
)
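
Hard-coding keys in a notebook makes them easy to leak when the file is shared. One alternative, sketched below, is to read them from environment variables and fail early when one is missing. `require_env` is a hypothetical helper, not part of LangChain; `OPENAI_API_KEY` is the variable `ChatOpenAI` falls back to when no `api_key` argument is passed.

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running this cell.")
    return value

# Example usage (commented out so the sketch runs without real keys):
# tavily_key = require_env("TAVILY_API_KEY")
# openai_key = require_env("OPENAI_API_KEY")
```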

**3: Define Research Tools**

-   **`SafeTavilySearchResults` Class:**
    
    -   This class inherits from `TavilySearchResults`, which is a LangChain tool for interacting with the Tavily Search API.
    -   The `_run` method is overridden to add error handling. It wraps the original `_run` method (which performs the actual search) in a `try-except` block.
    -   If any exception occurs during the search, it prints an error message and returns an empty list (`[]`) to prevent the agent from crashing. This makes the agent more robust to network issues or problems with the Tavily API.
-   **`search_tool`:**
    
    -   An instance of the `SafeTavilySearchResults` tool is created.
    -   `max_results=5` limits the number of search results returned to 5. You can adjust this number based on your needs.
-   **`data_analyzer` Function:**
    
    -   This function is decorated with `@tool`, which makes it available to the LangChain agent as a tool it can use.
    -   It takes a string `data` as input, which is expected to be in JSON format.
    -   **Data Parsing and Validation:** It first tries to parse the input `data` as JSON using `json.loads()`. If successful, it checks if the parsed data is a list of numbers (integers or floats).
    -   **Statistical Analysis:** If the data is a list of numbers, it calculates the mean and median. You can extend this part to perform more sophisticated statistical analysis (e.g., standard deviation, variance, percentiles) or use a dedicated statistical library like `NumPy` or `SciPy`.
    -   **Error Handling:** It handles two types of errors:
        -   `json.JSONDecodeError`: If the input data is not valid JSON, it returns an error message.
        -   `Exception`: It catches any other exceptions that might occur during data analysis and returns a generic error message.
    -   **Return Value:** The function returns a string summarizing the analysis or an error message if something went wrong.
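
Returning an empty list on the first failure is the simplest form of fault tolerance. For transient network errors, a retry before falling back may recover more results. The sketch below shows that pattern in plain Python; `call_with_fallback` and `flaky_search` are hypothetical names used only for illustration and are not part of the notebook or of LangChain.

```python
import time

def call_with_fallback(func, *args, retries=2, delay=0.0, fallback=None, **kwargs):
    """Call func, retrying on any exception; return fallback if every attempt fails."""
    for attempt in range(retries + 1):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            print(f"Attempt {attempt + 1} failed: {exc}")
            if attempt < retries:
                time.sleep(delay)
    return fallback

# Hypothetical stand-in for a search call that fails once, then succeeds.
calls = {"count": 0}

def flaky_search(query):
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionError("temporary network error")
    return [{"url": "https://example.com", "content": f"result for {query}"}]

results = call_with_fallback(flaky_search, "ai agents", fallback=[])
print(results)
```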


In [4]:
# ---- Web Research Tool ----
class SafeTavilySearchResults(TavilySearchResults):
    def _run(self, query: str, **kwargs):
        try:
            return super()._run(query, **kwargs)
        except Exception as e:
            print(f"Error in Tavily Search: {e}")
            return []

search_tool = SafeTavilySearchResults(max_results=5)

@tool
def data_analyzer(data: str) -> str:
    """
    Analyze provided data using statistical methods and summarize findings.
    
    Args:
        data: A string containing data to analyze. Ideally in JSON or a structured format.
    
    Returns:
        A string summarizing the analysis and any key insights.
    """
    try:
        # Attempt to parse data as JSON
        data_json = json.loads(data)
        
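        # Check if data is a non-empty list of numbers for basic statistical analysis
        if isinstance(data_json, list) and data_json and all(isinstance(item, (int, float)) for item in data_json):
            # Perform basic statistical analysis
            mean_value = sum(data_json) / len(data_json)
            # Median: the middle value, or the average of the two middle values for even-length lists
            ordered = sorted(data_json)
            mid = len(ordered) // 2
            median_value = ordered[mid] if len(ordered) % 2 else (ordered[mid - 1] + ordered[mid]) / 2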
            
            # Prepare a summary of the analysis
            summary = f"Data analysis summary:\n- Mean: {mean_value}\n- Median: {median_value}"
            
            # Add more statistical measures as needed
            
            return summary
        else:
            return "Data is not in a suitable format for statistical analysis. Provide a list of numbers."
            
    except json.JSONDecodeError:
        # If data is not JSON, handle as a plain string or other formats
        # Here you can implement other data analysis methods or return an error message
        return "Data is not in JSON format. Provide data in JSON format for analysis."
    except Exception as e:
        return f"An error occurred during data analysis: {e}"
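
As one way to extend the analysis, the sketch below uses Python's standard `statistics` module to report a few more measures. `summarize_numbers` is a hypothetical standalone function (no `@tool` decorator) so that it runs without LangChain; the choice of measures is illustrative.

```python
import json
import statistics

def summarize_numbers(data: str) -> str:
    """Parse a JSON list of numbers and report a few descriptive statistics."""
    try:
        values = json.loads(data)
    except json.JSONDecodeError:
        return "Data is not in JSON format."
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_numeric = (
        isinstance(values, list)
        and len(values) > 0
        and all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)
    )
    if not is_numeric:
        return "Provide a non-empty JSON list of numbers."
    lines = [
        f"- Count: {len(values)}",
        f"- Mean: {statistics.mean(values)}",
        f"- Median: {statistics.median(values)}",
        f"- Min: {min(values)}",
        f"- Max: {max(values)}",
    ]
    if len(values) > 1:  # sample standard deviation needs at least two points
        lines.append(f"- Std dev: {statistics.stdev(values):.3f}")
    return "Data analysis summary:\n" + "\n".join(lines)

print(summarize_numbers("[1, 2, 3, 4]"))
```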


**4: Define the Research Planning Prompt and Chain**


-   **`planning_prompt`:**
    
    -   This is a `ChatPromptTemplate` that defines the instructions for the language model when generating the research plan.
    -   **System Message:** The system message sets the role of the language model as a "senior research analyst" and provides detailed instructions on what the research plan should contain and its format (JSON). It specifies the fields for the overall plan (`title`, `key_findings`, `sources`) and for each section (`name`, `description`, `research`, `queries`).
    -   **Human Message:** The human message is simply `"{input}"`, which will be replaced with the user's research topic.
-   **`planning_chain`:**
    
    -   This chain combines the `planning_prompt`, the language model (`llm`), and a `JsonOutputParser`.
    -   **`planning_prompt | llm`:** This part sends the prompt to the language model to generate the research plan.
    -   **`| JsonOutputParser()`:** This part takes the output from the language model (which is expected to be a JSON string) and parses it into a Python dictionary using the `JsonOutputParser`. This makes it easier to work with the structured data.


In [5]:
# ---- Phase 1: Research Planning ----
planning_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a senior research analyst.
    
    Create a detailed research plan in JSON format for the given topic. 
    
    The plan should include:
    - title: (string) The main research question or title of the report.
    - key_findings: (list of strings) Placeholder for key findings (leave empty for now).
    - sources: (list of strings) Placeholder for sources (leave empty for now).
    - sections: (list of objects) An outline of the report's sections.
        - Each section should have:
            - name: (string) A concise title for the section.
            - description: (string) A brief overview of what will be covered in the section.
            - research: (boolean) Indicate whether this section requires web research (True) or not (False).
            - queries: (list of strings) Placeholder for section-specific search queries (leave empty for now).
    
    Example Sections:
    -   {{
            "name": "Introduction",
            "description": "Introduce the topic and provide background information.",
            "research": false,
            "queries": []
        }}
    -   {{
            "name": "Key Factors of Population Growth",
            "description": "Examine the main factors influencing population growth in urban areas.",
            "research": true,
            "queries": []
        }}
    -   {{
            "name": "Conclusion",
            "description": "Summarize the findings and provide concluding remarks.",
            "research": false,
            "queries": []
        }}
    
    Ensure the plan is comprehensive and well-structured, setting the stage for in-depth research."""),
    ("human", "{input}")
])

planning_chain = planning_prompt | llm | JsonOutputParser()
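
Models often wrap JSON replies in Markdown code fences, which is one reason to use a parser rather than calling `json.loads` on the raw reply. The sketch below is a simplified stand-in that illustrates the idea; it is not `JsonOutputParser`'s actual implementation, and `parse_json_reply` is a hypothetical helper.

```python
import json
import re

FENCE = "`" * 3  # built programmatically to avoid a literal fence inside this example

def parse_json_reply(text: str) -> dict:
    """Extract and parse a JSON object from a model reply, with or without a code fence."""
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)\s*" + re.escape(FENCE)
    match = re.search(pattern, text, flags=re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload.strip())

reply = f'Here is the plan:\n{FENCE}json\n{{"title": "Demo", "sections": []}}\n{FENCE}'
plan = parse_json_reply(reply)
print(plan["title"])
```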


**5: Define the Query Generation Prompt and Chain**


-   **`query_generation_prompt`:**
    
    -   This `ChatPromptTemplate` defines the instructions for generating search queries.
    -   **System Message:** The system message sets the role of the language model as a "research assistant" and instructs it to generate 2-3 specific search queries for each section that has `research=True`. It emphasizes creating queries that are likely to yield high-quality results from reliable sources.
    -   **Human Message:** The human message provides the research plan (generated in the previous step) as context using `"Research Plan: {plan}"`.
-   **`QueryGenerationOutputParser`:**
    
    -   This is a Pydantic model that defines the expected structure of the output from the query generation chain.
    -   It expects a list of `Section` objects, where each `Section` now includes the generated `queries`.
-   **`query_generation_chain`:**
    
    -   This chain combines the `query_generation_prompt`, the language model (`llm`), and the `QueryGenerationOutputParser`.
    -   **`query_generation_prompt | llm`:** This sends the prompt and the research plan to the language model.
    -   **`.with_structured_output(QueryGenerationOutputParser)`:** This part tells the language model to generate output that conforms to the `QueryGenerationOutputParser` schema. This helps ensure that the output is structured correctly and can be easily parsed.


In [6]:
# ---- Phase 2: Query Generation (for sections requiring research) ----
query_generation_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a research assistant skilled in crafting precise search queries.
    
    For each section in the provided research plan that requires web research (research=True), generate 2-3 specific search queries that will help gather relevant information.
    
    Consider the section's name and description when creating queries. Aim for queries that target high-quality sources like academic papers, reputable news outlets, and official reports.
    
    Update the 'queries' field within each relevant section with the generated search queries."""),
    ("user", "Research Plan: {plan}")
])

# Modify output parser to handle the updated structure
class QueryGenerationOutputParser(BaseModel):
    sections: List[Section] = Field(..., description="List of sections with generated queries")

query_generation_chain = query_generation_prompt | llm.with_structured_output(QueryGenerationOutputParser)
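
Because the model returns a fresh list of sections, it can occasionally drop, rename, or reorder entries. A defensive option is to merge the generated queries back into the original plan by section name rather than replacing the sections wholesale. The helper below is a hypothetical sketch in plain Python; `merge_queries` and the sample data are illustrative and not part of the notebook's code.

```python
from typing import Dict, List

def merge_queries(plan_sections: List[Dict], generated: List[Dict], max_queries: int = 3) -> List[Dict]:
    """Copy queries from generated sections onto matching plan sections, by name."""
    by_name = {s["name"]: s.get("queries", []) for s in generated}
    merged = []
    for section in plan_sections:
        updated = dict(section)  # shallow copy so the original plan is untouched
        if section.get("research"):
            # Keep at most max_queries, matching the limit stated in the Section model.
            updated["queries"] = list(by_name.get(section["name"], []))[:max_queries]
        else:
            updated["queries"] = []
        merged.append(updated)
    return merged

plan_sections = [
    {"name": "Introduction", "description": "Background", "research": False, "queries": []},
    {"name": "Key Factors", "description": "Drivers", "research": True, "queries": []},
]
generated = [
    {"name": "Key Factors", "queries": ["q1", "q2", "q3", "q4"]},
]
merged_sections = merge_queries(plan_sections, generated)
print(merged_sections)
```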


**6: Define the Information Gathering Prompt**


-   **`research_prompt`:**
    
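    -   This `ChatPromptTemplate` defines the instructions for the information gathering phase. Note that the `ResearchAgent` class defined in section 8 constructs its agent from a separate, shorter `agent_prompt` rather than from `research_prompt`.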
    -   **System Message:** The system message sets the role of the language model as a "diligent research assistant" and provides step-by-step instructions:
        1. **Focus on Sections with Research:** Emphasizes that only sections with `research=True` should be processed.
        2. **Use Provided Queries:** Instructs the agent to use the queries generated in the previous step.
        3. **Utilize Search Tool:** Explicitly tells the agent to use the `tavily_search_results_json` tool (which we defined earlier).
        4. **Compile Information:** Instructs the agent to gather the search results and compile them into the `content` field of each section.
        5. **Cite Sources:** Tells the agent to keep track of the sources and store them in the `sources` field of the research plan.
    -   **Human Message:**
        -   `"Research Plan with Queries: {plan_with_queries}"`: Provides the research plan (now with generated queries) as context.
        -   `MessagesPlaceholder(variable_name="agent_scratchpad")`: This is a placeholder for the agent's "scratchpad" – a space where the agent can keep track of its intermediate thoughts, actions, and observations during the execution process. It's a crucial part of how LangChain agents work.


In [7]:
# ---- Phase 3: Information Gathering (simulated parallel research) ----
research_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a diligent research assistant.

Your task is to gather information for each section in the research plan that requires web research (research=True).

1. **Focus on Sections with Research**: Only gather information for sections where 'research' is set to 'True'.
2. **Use Provided Queries**: For each of these sections, use the queries provided in the 'queries' field to conduct web searches.
3. **Utilize Search Tool**: Use the 'tavily_search_results_json' tool to perform web searches for each query.
4. **Compile Information**: Gather the information from the search results and compile it in the 'content' field for each respective section.
5. **Cite Sources**: Keep track of the sources used and store them in the 'sources' field of the research plan.

Remember to ignore sections that do not require research (research=False). Focus solely on sections where web research is marked as needed.

Current date: {current_date}"""),
    ("user", "Research Plan with Queries: {plan_with_queries}"),
    MessagesPlaceholder(variable_name="agent_scratchpad")
])
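
The `{current_date}` placeholder is filled in when the prompt is invoked. Below is a small sketch of computing that value from the system clock instead of hard-coding a string, using only the standard library. The `template` string is a trimmed stand-in for the system prompt, used here so the example runs without LangChain.

```python
from datetime import date

# ISO 8601 format (YYYY-MM-DD): the value to pass as "current_date".
current_date = date.today().isoformat()

# Trimmed stand-in for the system prompt, to show the substitution.
template = "You are a diligent research assistant.\n\nCurrent date: {current_date}"
rendered = template.format(current_date=current_date)
print(rendered)
```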


**7: Define the Report Generation Prompt**


-   **`report_prompt`:**
    
    -   This `ChatPromptTemplate` defines the instructions for generating the final research report.
    -   **System Message:**
        -   Sets the role of the language model as a "research report writer."
        -   Provides a detailed JSON structure that the report should follow, including `title`, `sections` (with `title` and `content`), `sources`, and `key_findings`.
        -   Gives specific instructions on how to handle different types of sections:
            -   For sections with `research=True`, use the content gathered during the information gathering phase.
            -   For sections with `research=False` (like Introduction and Conclusion), synthesize information from other sections and write in a style appropriate for that section type. It also suggests using Markdown formatting for clarity when needed.
        -   Emphasizes citing sources accurately and including key findings.
    -   **Human Message:**
        -   `"Research Question: {input}"`: Provides the original research question.
        -   `"Research Plan (with Content):\n{research_plan}"`: Provides the research plan, which now includes the gathered content in the `content` field of each section.



In [8]:
# ---- Phase 4: Report Generation ----
report_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a research report writer.
    
    Compile the gathered data and analysis into a structured JSON report.
    
    The report should follow this structure:
    {{
        "title": "Main research question/title",
        "sections": [
            {{"title": "Section Title 1", "content": "Content of section 1, including data analysis results"}},
            {{"title": "Section Title 2", "content": "Content of section 2, including data analysis results"}},
            ...
        ],
        "sources": ["Source URL 1", "Source URL 2", ...],
        "key_findings": ["Key finding 1", "Key finding 2", ...]
    }}
    
    Use the provided research plan as a guide.
    
    - For sections that required research (research=True), use the content you gathered and stored in the 'content' field.
    - For sections that did not require research (research=False), like the Introduction and Conclusion:
        - Synthesize information from the other sections.
        - Write in a style appropriate for that section type (e.g., narrative for Introduction, summary and comparison for Conclusion).
        - You can use Markdown formatting for tables or lists if it helps present information clearly (especially in the Conclusion).
    
    - Cite sources accurately using URLs or clear identifiers.
    - Include the key findings as a bullet-point list.
    - Write in a clear, concise, and objective style."""),
    ("user", "Research Question: {input}\n\nResearch Plan (with Content):\n{research_plan}\n\n")
])
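
The doubled braces in the JSON example are escapes. By default `ChatPromptTemplate` uses f-string-style formatting, so `{{` and `}}` render as literal braces while single-brace names such as `{input}` are treated as variables. Python's built-in `str.format` follows the same convention, which allows a quick demonstration without LangChain (the template text here is shortened for illustration):

```python
# Doubled braces survive formatting as literal braces; {input} is substituted.
template = 'Return JSON like {{"title": "..."}} for the question: {input}'
rendered = template.format(input="What drives urban growth?")
print(rendered)
```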

**8: Create the Research Agent Class**



-   **`ResearchAgent` Class:** This class encapsulates the entire research agent, bringing together all the components we've defined so far.
    
    -   **`__init__`:**
        
        -   **`self.tools = [search_tool, data_analyzer]`:** Initializes the agent with the tools it can use: `search_tool` (for web research) and `data_analyzer` (for basic data analysis).
        -   **`self.memory = []`:** Creates a simple list-based memory to store the conversation history (we'll keep it basic in this example, but you could use more sophisticated memory mechanisms provided by LangChain).
        -   **`self.agent_prompt`:** Defines the main prompt that will be used to instruct the agent during the information gathering phase. It sets the agent's role as a "helpful research assistant," tells it to use tools, think step-by-step, and keep track of sources.
        -   **`self.agent = create_openai_tools_agent(...)`:** Creates the agent using `create_openai_tools_agent`. This function takes the language model (`llm`), the tools (`tools`), and the prompt (`agent_prompt`) as input and returns an agent object.
        -   **`self.executor = AgentExecutor(...)`:** Creates an `AgentExecutor`, which is responsible for running the agent and managing the conversation flow. It takes the agent, the tools, and several configuration options:
            -   **`verbose=True`:** Enables verbose logging, so you can see the agent's thought process and actions.
            -   **`handle_parsing_errors=True`:** Tells the executor to handle errors that might occur when parsing the agent's output.
            -   **`max_iterations=10`:** Limits the number of agent iterations to prevent infinite loops. You might need to adjust this based on the complexity of the task.
            -   **`return_intermediate_steps=True`:** Tells the executor to return the intermediate steps taken by the agent, which is useful for debugging and understanding the agent's reasoning process.
        
    -   **`_extract_sources`:**
        
        -   This is a helper function that extracts source URLs from the agent's intermediate steps.
        -   It iterates through the steps and looks for actions performed by the `tavily_search_results_json` tool.
        -   If it finds such an action, it extracts the URLs from the tool's output and adds them to a list of sources.
        
    -   **`execute_research`:**
        
        -   This is the main method of the `ResearchAgent` class. It orchestrates the entire research workflow.
        -   **Phase 1: Planning:** It invokes the `planning_chain` to generate the initial research plan based on the user's `query`.
        -   **Phase 2: Query Generation:** It invokes the `query_generation_chain` to generate search queries for sections that require research. The plan is updated with these queries.
        -   **Phase 3: Information Gathering:**
            -   It iterates through each section of the research plan.
            -   If a section requires research (`research=True`):
                -   It invokes the `executor` (which runs the agent) to gather information for that section using the generated queries.
                -   It extracts the content and sources from the agent's output and intermediate steps.
                -   It updates the section in the plan with the gathered content and sources.
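            -   If a section does not require research (`research=False`), its content is left empty.
        -   **Phase 4: Report Generation:** It pipes `report_prompt` into `llm.with_structured_output(ResearchReport)`, invokes the chain with the original query and the updated plan, replaces the report's `sources` with the deduplicated URLs collected in Phase 3, and checks that the report has sections and key findings.
        -   **Error Handling:** If validation or parsing fails, or any other exception occurs, it returns a fallback `ResearchReport` whose single section contains the error message.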
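
With `return_intermediate_steps=True`, the executor's result includes a list of `(action, observation)` pairs, which is what `_extract_sources` walks through. The sketch below imitates that shape with a `namedtuple` standing in for the real action object (an assumption made so the example runs without LangChain) and collects URLs while preserving their first-seen order. The URLs and queries are made up.

```python
from collections import namedtuple

# Stand-in for the agent action object; only the fields used here are modeled.
FakeAction = namedtuple("FakeAction", ["tool", "tool_input"])

steps = [
    (FakeAction("tavily_search_results_json", {"query": "ai agents"}),
     [{"url": "https://example.com/a", "content": "..."},
      {"url": "https://example.com/b", "content": "..."}]),
    (FakeAction("data_analyzer", {"data": "[1, 2, 3]"}), "Data analysis summary: ..."),
    (FakeAction("tavily_search_results_json", {"query": "llm tools"}),
     [{"url": "https://example.com/a", "content": "..."}]),
]

def extract_urls(steps):
    """Collect result URLs from search steps, skipping other tools."""
    urls = []
    for action, observation in steps:
        if action.tool != "tavily_search_results_json":
            continue
        for result in observation:
            if isinstance(result, dict) and "url" in result:
                urls.append(result["url"])
    # dict.fromkeys removes duplicates while keeping insertion order
    return list(dict.fromkeys(urls))

print(extract_urls(steps))
```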

In [9]:
# ==== 4. INTEGRATED RESEARCH AGENT ====
class ResearchAgent:
    def __init__(self):
        self.tools = [search_tool, data_analyzer]
        self.memory = []

        # Agent and Executor
        self.agent_prompt = ChatPromptTemplate.from_messages([
            ("system", """You are a helpful research assistant.
             Use the available tools to answer user queries as effectively as possible.
             Think step-by-step.
             Keep track of the sources you consult.
             If the user asks for data analysis, provide the data in JSON format to the data_analyzer tool."""),
            ("user", "{input}"),
            MessagesPlaceholder(variable_name="agent_scratchpad")
        ])
        
        self.agent = create_openai_tools_agent(
            llm=llm,
            tools=self.tools,
            prompt=self.agent_prompt
        )

        self.executor = AgentExecutor(
            agent=self.agent,
            tools=self.tools,
            verbose=True,
            handle_parsing_errors=True,
            max_iterations=10, # Increased for potentially more complex tasks
            return_intermediate_steps=True
        )
        
    def _extract_sources(self, steps):
        """Extract sources from agent actions, handling potential errors."""
        sources = []
        for step in steps:
            if step[0].tool == "tavily_search_results_json":
                try:
                    # step[1] is the tool's observation: a list of result dicts (already parsed).
                    # The tool input only carries the 'query' string, so it is not needed here.
                    for result in step[1]:
                        if isinstance(result, dict) and 'url' in result:
                            sources.append(result['url'])
                except (TypeError, KeyError) as e:
                    print(f"Error extracting sources: {e}")
        return sources

    def execute_research(self, query: str) -> ResearchReport:
        """End-to-end research workflow with error handling and validation"""
        try:
            # Phase 1: Planning
            plan = planning_chain.invoke({"input": query})
            self.memory.append({"role": "user", "content": f"Research Plan: {plan}"})
            
            # Phase 2: Query Generation
            plan_with_queries = query_generation_chain.invoke({"plan": plan})
            plan_with_queries = plan_with_queries.model_dump()
            sections = plan_with_queries['sections']

            # Update the plan with the sections that now contain queries
            plan['sections'] = sections

            self.memory.append({"role": "user", "content": f"Research Plan with Queries: {plan_with_queries}"})

            # Phase 3: Information Gathering
            # Create a new list to store updated sections
            updated_sections = []
            sources = []

            # Iterate through each section in the plan
            for section in plan["sections"]:
                updated_section = section.copy()  # Create a copy to modify

                # Check if the section requires research
                if section["research"]:
                    # Gather information for sections that require research
                    section_research_result = self.executor.invoke({
                        "input": f"Gather information for the section: {section['name']}. Use these queries: {', '.join(section['queries'])}",
                        "current_date": "2025-02-03",
                        "plan_with_queries": plan_with_queries,
                        "agent_scratchpad": [],
                    })

                    # Extract content and sources from the research result
                    section_content = section_research_result["output"]
                    intermediate_steps = section_research_result["intermediate_steps"]
                    section_sources = self._extract_sources(intermediate_steps)
                    
                    # Update the section with the gathered information and sources
                    updated_section["content"] = section_content
                    sources.extend(section_sources)
                else:
                    # For sections that do not require research, leave the content empty
                    updated_section["content"] = ""

                # Add the updated section to the list
                updated_sections.append(updated_section)

            # Update the plan with the new sections that have content
            plan["sections"] = updated_sections

            # Phase 4: Report Generation
            report_chain = report_prompt | llm.with_structured_output(ResearchReport)
            report = report_chain.invoke({
                "input": query,
                "research_plan": plan,
            })

            # Add sources to report
            report.sources = list(dict.fromkeys(sources))  # Remove duplicates, keeping first-seen order

            # Basic Validation (could be expanded)
            if not report.sections or not report.key_findings:
                raise ValueError("Report is missing key sections or findings.")
            
            self.memory.append({"role": "assistant", "content": f"Generated Report: {report.model_dump_json()}"})
            return report

        except (ValidationError, ValueError, OutputParserException) as e:
            print(f"Error during research execution: {e}")
            # Fallback: Return a basic report with the error message
            return ResearchReport(
                title=f"Error in Research: {query}",
                sections=[{"title": "Error", "content": str(e)}],
                sources=[],
                key_findings=[]
            )
        except Exception as e:
            print(f"An unexpected error occurred: {e}")
            return ResearchReport(
                title=f"Error in Research: {query}",
                sections=[{"title": "Unexpected Error", "content": str(e)}],
                sources=[],
                key_findings=[]
            )

**9: Execute the Research and Print the Report**


-   **`agent = ResearchAgent()`:** Creates an instance of the `ResearchAgent` class. This initializes the agent with its tools, prompt, and the language model.
-   **`report = agent.execute_research(...)`:** Calls the `execute_research` method of the `ResearchAgent` instance, passing in the research query as a string. This triggers the entire research workflow:
    1. **Planning:** The agent generates a research plan based on the query.
    2. **Query Generation:** The agent generates search queries for sections requiring research.
    3. **Information Gathering:** The agent uses the `tavily_search_results_json` tool to perform web searches and gather information.
    4. **Report Generation:** The agent compiles the gathered information into a structured `ResearchReport` object.
-   **`display_report_as_markdown(report)`:** The cell after the execution example takes the generated `report` (a `ResearchReport` object) and renders it as Markdown in the notebook:
    -   It renders the report's title as a top-level heading.
    -   It lists the key findings as bullet points.
    -   It iterates through the sections of the report and renders each section's title and content.
    -   It does not render `report.sources` separately; the cited URLs appear as inline links in the section content.
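If a plain-text printout that also lists the sources is preferred, something along the following lines could work. This is only a sketch: `ReportStub` and `format_report_text` are hypothetical names introduced for illustration, and the stub merely mirrors the four fields (`title`, `sections`, `sources`, `key_findings`) that `ResearchReport` is constructed with above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReportStub:
    """Hypothetical stand-in with the same four fields as ResearchReport."""
    title: str
    sections: List[Dict[str, str]] = field(default_factory=list)
    sources: List[str] = field(default_factory=list)
    key_findings: List[str] = field(default_factory=list)


def format_report_text(report) -> str:
    """Build a plain-text view: title, key findings, sections, then sources."""
    lines = [report.title, "", "Key Findings:"]
    lines += [f"- {finding}" for finding in report.key_findings]
    for section in report.sections:
        # .get() tolerates a section dict that is missing a key
        lines += ["", section.get("title", "Untitled"), section.get("content", "")]
    lines += ["", "Sources:"]
    lines += [f"- {url}" for url in report.sources]
    return "\n".join(lines)


# Placeholder data, not real research output
demo = ReportStub(
    title="Demo report",
    sections=[{"title": "Overview", "content": "Placeholder text."}],
    sources=["https://example.com/a"],
    key_findings=["First finding"],
)
print(format_report_text(demo))
```

Because the function only reads attributes, it should accept a real `ResearchReport` instance as well as the stub.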





In [14]:
# ==== 5. EXECUTION EXAMPLE ====
agent = ResearchAgent()
report = agent.execute_research(
    "Analyze population growth patterns in Paris vs Tokyo since 2020, "
    "including economic and environmental factors. "
    "Focus on data related to housing, employment, and pollution levels."
)

> Entering new AgentExecutor chain...

Invoking: `tavily_search_results_json` with `{'query': 'Paris population growth trends 2020 to 2023'}`

[{'url': 'https://www.macrotrends.net/global-metrics/cities/20985/paris/population', 'content': 'United Nations population projections are also included through the year 2035. The current metro area population of Paris in 2025 is 11,347,000, a 0.62% increase from 2024. The metro area population of Paris in 2024 was 11,277,000, a 0.62% increase from 2023. The metro area population of Paris in 2023 was 11,208,000, a 0.59% increase from 2022.'}, {'url': 'https://www.macrotrends.net/global-metrics/countries/ROU/france/population-growth-rate', 'content': 'Chart and table of France population from 1950 to 2024. United Nations projections are also included through the year 2100. The current population of France in 2024 is 64,881,830, a 0.19% increase from 2023.; The population of France in 2023 was 64,756,584, a

In [15]:
from IPython.display import display, Markdown

def display_report_as_markdown(report: ResearchReport):
    """Displays the research report as Markdown in the notebook."""
    markdown_output = f"# {report.title}\n\n"
    markdown_output += f"## Key Findings\n"
    for finding in report.key_findings:
        markdown_output += f"- {finding}\n"
    markdown_output += "\n## Detailed Analysis\n"
    for section in report.sections:
        markdown_output += f"### {section['title']}\n{section['content']}\n\n"
    

    display(Markdown(markdown_output))

display_report_as_markdown(report)

# Analysis of Population Growth Patterns in Paris vs Tokyo Since 2020: Economic and Environmental Factors

## Key Findings
- Paris has experienced modest population growth since 2020, while Tokyo has seen a slight decline in overall population numbers despite attracting young professionals and students.
- Economic factors, including housing availability and employment opportunities, significantly influence population dynamics in both cities, with Tokyo facing housing shortages and Paris experiencing high housing costs.
- The housing market in Paris has seen a slowdown in transactions, with a focus on energy-efficient properties, while Tokyo's real estate market remains resilient with increasing property prices.
- Employment trends show a recovery in Paris post-pandemic with a decline in unemployment rates, whereas Tokyo's labor market remains stable with marginal increases in employment.
- Environmental factors, such as pollution and green initiatives, play a crucial role in urban development, with Paris facing higher pollution levels compared to Tokyo.
- Local government policies, including urban planning and economic incentives, are pivotal in managing population growth and sustainability in both cities.

## Detailed Analysis
### Demographic Trends
### Paris Population Growth Trends (2020-2023)
- The metro area population of Paris in 2023 was approximately 11,208,000, marking a 0.59% increase from 2022. In 2024, it was 11,277,000, a 0.62% rise from 2023. By 2025, the population is projected to be 11,347,000, maintaining a 0.62% growth rate from the previous year. [Source](https://www.macrotrends.net/global-metrics/cities/20985/paris/population)

### Tokyo Population Growth Trends (2020-2023)
- The Tokyo metropolitan area experienced a population decline over these years. In 2023, the population was approximately 37,194,000, reflecting a 0.21% decline from 2022. In 2024, the population is expected to be 37,115,000, continuing a similar decline. [Source](https://www.macrotrends.net/global-metrics/cities/21671/tokyo/population)

### Demographic Changes in Paris Post-2020
- Paris has seen a slowing of population growth, influenced by various factors including the impact of the COVID-19 pandemic. The demographic dynamics in Paris show changes in long-term trends, with a noted decline in population growth on a metropolitan scale. [Source](https://www.apur.org/en/our-works/demographic-and-social-dynamics-paris)

### Demographic Changes in Tokyo Post-2020
- Despite a general population decline, Tokyo has experienced a net population influx, recovering to pre-pandemic levels. This influx is driven by migration into the city, particularly among young people for work and studies. [Source](https://japantoday.com/category/national/net-population-influx-into-tokyo-recovers-to-pre-pandemic-level)

### Comparison of Population Growth in Paris vs Tokyo
- Paris is experiencing a modest population growth, whereas Tokyo is seeing a slight decline in population numbers. However, Tokyo continues to attract a high influx of residents, particularly young professionals and students, which contrasts with the overall population decline. The Greater Tokyo Area remains one of the most populous metropolitan areas globally, with a significant urban concentration. [Source](https://visitfranceguide.com/is-tokyo-bigger-than-paris/)

### Economic Factors
### Economic Factors Affecting Population Growth in Paris and Tokyo

1. **Tokyo**:
   - Tokyo's overpopulation is significantly influenced by rural-to-urban migration, natural population increase, and international migration. The consequences include housing shortages, infrastructure strain, and increased competition for resources, affecting the quality of life ([source](https://www.ncesc.com/geographic-pedia/why-is-tokyo-facing-overpopulation/)).
   - The Tokyo metropolitan area fits the rank size rule, indicating balanced urban distribution, unlike Paris, which is larger than expected for its rank. Human capital, indicated by wages, correlates with urban population dynamics ([source](https://www.sciencedirect.com/science/article/pii/S0166046297800051)).

2. **Paris**:
   - Paris is a significant global city with diverse tradable industries, but challenges include low SME participation in trade and a competitive global business environment ([source](https://www.brookings.edu/articles/global-paris-profiling-the-regions-international-competitiveness-and-connections/)).

### Impact of Economic Policies on Population Growth in Tokyo and Paris

1. **Tokyo**:
   - Economic policies addressing population growth focus on managing overconcentration by promoting regional development and improving housing affordability. The aging population affects labor input and economic growth ([source](https://www.japantimes.co.jp/news/2024/06/11/japan/tokyo-overconcentration/)).
   - The Greater Tokyo Area's GDP growth is moderate compared to other Japanese cities, influenced by its size and economic policies ([source](https://www.oxfordeconomics.com/resource/tokyo-hotspots-and-some-optimism-in-a-slow-growing-city/)).

2. **Paris**: 
   - Paris's economic policies focus on enhancing global competitiveness through trade, innovation, talent, infrastructure, and governance. The region's prosperity depends on effectively deploying these factors ([source](https://www.brookings.edu/articles/global-paris-profiling-the-regions-international-competitiveness-and-connections/)).

### Paris and Tokyo: Housing and Employment Influence on Population Growth

1. **Tokyo**:
   - Tokyo's housing demand is driven by demographic transitions, with challenges in managing rental housing demands due to finance-led growth and an aging population ([source](https://www.sciencedirect.com/science/article/pii/S0264837719306921)).
   - The inner city districts of Tokyo have seen housing and population growth post-1995, contrasting with earlier employment contractions ([source](https://journals.sagepub.com/doi/10.1111/j.1540-6040.2006.00181.x)).

2. **Paris**: 
   - Comparative studies on housing in major cities, including Paris, indicate that housing supply and policy affect urban growth and population dynamics ([source](https://www.jlgc.org.uk/en/international-cooperation/housing-in-four-world-cities-london-new-york-paris-and-tokyo/)).

### Housing Market Analysis
### Housing Market Trends (2020-2023)

#### Paris
1. **Market Trends**: Paris experienced a significant slowdown in real estate transactions in 2023, with a 22% decline compared to 2022. Despite this, Paris fared better than other French markets during the downturn. Prices of old properties in Paris slightly decreased in 2022. ([Paris Perfect](https://www.parisperfect.com/blog/2024/03/paris-real-estate-insights-from-2023-and-forecasts-for-2024/))
   
2. **Energy Efficiency**: There was a notable increase in the sale of energy-inefficient properties, with apartments rated F or G rising sharply in 2022 and continuing in 2023. ([Adrian Leeds](https://adrianleeds.com/subscribe-to-our-publications/french-property-insider/fpi-archives/the-real-estate-market/))

3. **Economic Factors**: The rise in building costs due to inflation and new environmental standards has put pressure on the housing market. ([Notaires de France](https://www.notaires.fr/en/housing-tax-system/french-property-market/french-property-market-analysis))

#### Tokyo
1. **Market Trends**: Tokyo's real estate market has shown resilience, with property prices on an upward trend. The vacancy rate for offices increased from 2020 to 2022, influencing market dynamics. ([Tokyo Portfolio](https://tokyoportfolio.com/japan-real-estate-market-prices-trends-forecasts-2023/))

2. **Investment and Prices**: The market is influenced by economic, demographic, and global factors, with property prices rising due to increased building material costs and foreign investments. ([Real Estate Tokyo](https://www.realestate-tokyo.com/news/market-trends-japan-2023/))

3. **Housing Supply**: 2023 saw a peak in the number of housing units, driven by various economic and demographic factors. ([Akiyaz.io](https://akiyaz.io/japan-housing-market-2023-report/))

### Affordability and Availability

#### Paris
1. **Social Housing Initiatives**: Paris aims to construct over 4,000 social and affordable housing units annually to address affordability issues. ([Dormakaba](https://blog.dormakaba.com/how-paris-plans-to-turbo-charge-affordable-housing/))

2. **Market Prices**: Average property prices in certain Paris neighborhoods offer relatively affordable options, with prices around €7,500 to €8,000 per square meter. ([Kurby](https://blog.kurby.ai/the-10-most-affordable-neighborhoods-in-paris-france-for-first-time-homebuyers/))

#### Tokyo
1. **Housing Affordability**: Tokyo remains affordable by rapidly increasing housing supply and maintaining flexible construction rules. Two full-time workers on minimum wage can afford rent in several wards. ([NY Times](https://www.nytimes.com/2023/09/11/opinion/editorials/tokyo-housing.html))

2. **Neighborhoods**: Some Tokyo neighborhoods offer affordable housing options, with prices around ¥600,000 per square meter being common in specific areas. ([Kurby](https://blog.kurby.ai/the-10-most-affordable-neighborhoods-in-tokyo-japan-for-first-time-homebuyers/))

### Impact of Housing Policies on Population Growth

#### Paris
1. **Population Dynamics**: High housing costs and a lack of rental properties have driven residents to the suburbs, leading to a population decline in Paris. ([RFI](https://www.rfi.fr/en/france/20250103-paris-population-drops-as-housing-costs-drive-residents-to-the-suburbs))

2. **Policy Influence**: The "Grand Paris Law" aims to build 70,000 housing units annually, expected to increase the population of Île-de-France significantly by 2035. ([APUR](https://www.apur.org/en/our-works/conjoined-evolution-housing-stock-and-population-greater-paris-ile-france))

#### Tokyo
1. **Urban Development**: Neoliberal housing policies in the 1990s, including deregulation and urban development, facilitated population recovery in central Tokyo. ([Springer](https://link.springer.com/chapter/10.1007/978-981-10-7799-9_7))

2. **Demographic Changes**: The aging population and low birth rates significantly impact Tokyo's housing market, leading to a surplus of housing in some areas. ([E-Housing](https://e-housing.jp/post/tokyo-residential-real-estate-market-analysis-2024-impact-of-low-birth-rate))

### Employment Trends
### Paris
1. **Employment Trends:**
   - Paris saw a 5% increase in employment numbers in 2022 compared to 2021, indicating a strong demand for workers post-pandemic ([London Property Alliance](https://www.londonpropertyalliance.com/press-release-paris-edging-ahead-of-london-as-economic-indicators-point-to-stronger-recovery/)).
   - The economic crisis and lockdowns in 2020 significantly impacted employment, leading to a decrease in jobs, especially in market services like tourism and culture ([Apur](https://www.apur.org/en/our-works/social-impact-crisis-paris-statistical-trends-and-field-work-feedback)).

2. **Job Growth & Unemployment Rates:**
   - The unemployment rate in Paris was around 6.9% in 2021, showing a decrease from previous years ([Statista](https://www.statista.com/statistics/1049920/unemployment-rate-paris-france/)).
   - In 2020, there was a significant loss of 45,700 jobs, with the market service sector being the hardest hit ([Apur](https://www.apur.org/en/our-works/impact-crisis-parisian-economy-2020)).

3. **Sectoral Employment Changes:**
   - The Paris region has seen a shift towards more independent jobs and a decline in salaried employment in recent years ([Institut Paris Region](https://en.institutparisregion.fr/resources/publications/dynamics-of-the-paris-region-economy/)).
   - There is a notable employment dynamic in sectors like renewable energy and local economies post-COVID ([Racetozero UNFCCC](https://racetozero.unfccc.int/wp-content/uploads/2021/06/The-Paris-Effect_SYSTEMIQ_Executive-Summary_December-2020.pdf)).

### Tokyo
1. **Employment Trends:**
   - Tokyo experienced a sharp drop in job availability in 2020 due to the pandemic, marking the biggest decline since 1975 ([Kyodo News](https://english.kyodonews.net/news/2021/01/e472be931d3d-update2-japans-2020-job-availability-logs-sharpest-drop-in-45-yrs.html)).
   - The unemployment rate was relatively low at about 2.5% in 2023, reflecting a stable labor market post-pandemic ([Statista](https://www.statista.com/statistics/1330451/japan-unemployment-rate-tokyo-prefecture/)).

2. **Job Growth & Unemployment Rates:**
   - Japan's unemployment rate hovered around 2.8% in 2020, with a gradual decline in subsequent years as the economy recovered ([Macrotrends](https://www.macrotrends.net/global-metrics/countries/JPN/japan/unemployment-rate)).
   - The pandemic led to an increase in redundancies, affecting the labor force significantly ([GlobalData](https://www.globaldata.com/data-insights/macroeconomic/the-unemployment-rate-of-japan-220055/)).

3. **Sectoral Employment Changes:**
   - The labor market in Tokyo showed a divide between regular and non-regular workers, with non-regular workers facing more precarious conditions ([Tandfonline](https://www.tandfonline.com/doi/full/10.1080/18692729.2022.2028229)).
   - There has been a noticeable shift in employment patterns, with an increased focus on maintaining long-term stable employment ([JIL](https://www.jil.go.jp/english/jli/documents/2024/Series_01.2017-046.2024.pdf)).

### Environmental Factors
### Environmental Impact on Population Growth

1. **General Insights:**
   - Population growth impacts environmental degradation, such as resource depletion and increased greenhouse gas emissions. Urban areas like Paris and Tokyo face challenges in infrastructure and environmental sustainability due to population density ([BiologyInsights](https://biologyinsights.com/global-population-dynamics-environmental-and-resource-impacts/)).

2. **Specific Effects:**
   - Population growth in urban areas leads to increased pollution and resource consumption, affecting biodiversity and contributing to climate change ([Enviroliteracy](https://enviroliteracy.org/how-does-population-growth-affect-the-environment/)).
   - Studies in Europe, including Paris, show that urban land use and CO2 emissions grow alongside population increases ([PMC](https://pmc.ncbi.nlm.nih.gov/articles/PMC6497702/)).

### Green Initiatives and Population Trends

1. **Tokyo:**
   - Tokyo's green initiatives include projects like "Tokyo Green Biz" aimed at sustainable urban living and enhancing green spaces. These initiatives seek to address social issues and climate adaptation ([Guidable](https://guidable.co/society/tokyos-green-initiatives-exploring-urban-nature-in-the-metropolis/), [Tokyo Metropolitan Government](https://www.metro.tokyo.lg.jp/english/media/factsheets/documents/en20240403_01_01.pdf)).

2. **Paris:**
   - Although specific details on Paris' initiatives were less highlighted, Paris, along with Tokyo, is part of a global movement towards urban development and sustainability ([CNN](https://sponsorcontent.cnn.com/int/tmg-green-biz/tokyo-evolution/)).

### Pollution and Population Growth Correlation

1. **Comparative Pollution Levels:**
   - Pollution indices show that Paris has a higher pollution index compared to Tokyo, with significant differences in air and water pollution levels ([Numbeo](https://www.numbeo.com/pollution/compare_cities.jsp?country1=Japan&city1=Tokyo&country2=France&city2=Paris)).

2. **Correlation Insights:**
   - The relationship between population growth and pollution is complex, influenced by consumption patterns and technological advancements. This relationship varies based on city size, with negligible correlations in large cities like Paris and Tokyo ([Enviroliteracy](https://enviroliteracy.org/how-does-population-growth-affect-pollution/), [ScienceDirect](https://www.sciencedirect.com/science/article/pii/S2212095524001317)).

### Pollution Levels
### Pollution Level Changes (2020 to 2023)
1. **Paris**:
   - The air quality in Paris has seen fluctuations, with a slight increase in NO2, PM10, and PM2.5 levels compared to 2020, a year heavily impacted by COVID-19 lockdowns which had temporarily reduced pollution levels. [Source](https://www.rfi.fr/en/france/20220405-the-level-of-air-pollution-is-falling-in-paris-but-it-s-still-too-high-research-shows).
   - In 2023, about 70% of the population in the Ile-de-France region was exposed to fine particle levels exceeding WHO recommendations. [Source](https://www.connexionfrance.com/news/13-stations-in-paris-above-recommended-pollution-levels-which-are-they/683883).

2. **Tokyo**:
   - There is limited specific data from 2020 to 2023, but historical data from 2005-2021 indicates variable trends in pollutants like NO2, influenced by urban activities and policy measures. [Source](https://airquality.gsfc.nasa.gov/no2/world/east-asia/tokyo).

### Impact of Pollution on Population Dynamics
1. **General Insights**:
   - Urban areas worldwide, including Paris and Tokyo, face challenges of increasing pollution levels due to population concentration. This affects infrastructure, housing, and environmental sustainability. [Source](https://biologyinsights.com/global-population-dynamics-environmental-and-resource-impacts/).
   - Exposure to pollution, especially air and noise pollution, affects a significant portion of urban populations, impacting health and quality of life. In Paris, nearly 80% of the population is affected. [Source](https://www.lemonde.fr/en/environment/article/2024/05/28/nearly-10-million-paris-region-residents-are-exposed-to-noise-and-air-pollution-exceeding-recommendations_6672938_114.html).

### Air Quality Trends and Population Growth
1. **Tokyo vs. Paris**:
   - Comparatively, Paris exhibits higher pollution indices than Tokyo, with significant concerns about air pollution despite efforts to manage it. [Source](https://www.numbeo.com/pollution/compare_cities.jsp?country1=Japan&city1=Tokyo&country2=France&city2=Paris).
   - Urban growth contributes to worsening air quality, although various factors including policy interventions and technological advancements can mediate these effects. Urban resilience is crucial to manage rapid changes in environmental quality. [Source](https://www.sciencedirect.com/science/article/pii/S0048969708005068).

### Policy Impact Analysis
### Impact of Urban Planning on Population Growth

1. **Paris**:
   - **Strategic Urban Planning**: Paris has undergone strategic urban planning to address post-war changes, including strong population growth and urban sprawl. Deficient infrastructure has been a key issue, prompting the need for developed plans and strategies to balance economic globalization with local needs, ecological sustainability, and urban density ([ResearchGate](https://www.researchgate.net/publication/373439770_Strategic_Urban_Planning_in_the_Paris_Metropolitan_Region_A_historic_overview_of_the_applied_instruments), [Institut Paris Region](https://en.institutparisregion.fr/know-how/urban-planning/cities-change-the-world/transforming-xxl-cities-strategies-and-projects/)).

2. **Tokyo**:
   - **Neoliberal Housing Policies**: In the 1990s, Tokyo's urban planning was shaped by neoliberal housing policies, which included deregulation of urban development and the mortgage market. These policies promoted private-oriented urban development, impacting condominium growth and population recovery in central Tokyo ([Springer](https://link.springer.com/chapter/10.1007/978-981-10-7799-9_7)).

### Government Policies Affecting Population Growth

1. **Tokyo**:
   - **Population Decline**: Japan's government faces challenges with population decline, projecting a significant reduction in the working-age population by 2070. Government policies have been aimed at addressing slow economic growth and encouraging population movement to rural areas ([Tokyo Foundation](https://www.tokyofoundation.org/research/detail.php?id=958), [Japan Times](https://www.japantimes.co.jp/news/2024/06/11/japan/tokyo-overconcentration/)).

2. **General**:
   - **Population Policies**: Various international policies have been implemented to control population growth, often through programs aimed at improving the quality of the population ([JSTOR](https://www.jstor.org/stable/2736579)).

### Economic Incentives and Environmental Regulations

1. **Paris**:
   - **Paris Agreement**: Economic incentives are crucial for achieving sustainability goals, as highlighted by the Paris Agreement, which calls for radical changes to meet environmental targets ([Brookings](https://www.brookings.edu/articles/global-economic-and-environmental-outcomes-of-the-paris-agreement/)).

2. **Tokyo**:
   - **Tax Incentives**: Japan offers tax incentives for companies expanding operations outside the Tokyo Metropolitan Area, aiming to strengthen local business facilities and promote regional economic growth ([JETRO](https://www.jetro.go.jp/en/invest/support_programs/incentive/)).



---

# Chatbot

Now that we have a working research agent, let's integrate it into a chatbot.

The code below builds on the research agent by wrapping it in a conversational chatbot interface. The goal is to let users interact with the research agent in natural language and receive structured research reports in a readable format (Markdown). Here's a breakdown of the key components and how each contributes to the chatbot:

**1. Refactored Research Agent Chains with `@chain`:**

- The code uses the `@chain` decorator from `langchain_core.runnables` to define the core components of the research agent as modular chains. Each chain (planning, query generation, research, report generation) is a clearly named unit that can be read, tested, and modified independently.
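To give a feel for what the decorator buys you, here is a toy, LangChain-free sketch. It is not LangChain's implementation: `ToyRunnable` and `toy_chain` are hypothetical names that only mimic the two behaviours this notebook relies on, namely calling a step with `.invoke(...)` and composing steps with `|`.

```python
from typing import Any, Callable


class ToyRunnable:
    """Toy stand-in for a runnable: wraps a function, supports .invoke and |."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: "ToyRunnable") -> "ToyRunnable":
        # a | b runs a first, then feeds its output into b
        return ToyRunnable(lambda value: other.invoke(self.invoke(value)))


def toy_chain(func: Callable[[Any], Any]) -> ToyRunnable:
    """Hypothetical decorator; deliberately not named `chain` to avoid confusion."""
    return ToyRunnable(func)


@toy_chain
def make_plan(topic: str) -> dict:
    return {"title": topic, "sections": ["Background", "Findings"]}


@toy_chain
def count_sections(plan: dict) -> str:
    return f"{plan['title']}: {len(plan['sections'])} sections"


pipeline = make_plan | count_sections
print(pipeline.invoke("Urban growth"))  # prints "Urban growth: 2 sections"
```

Each decorated function stays individually callable through `.invoke`, which is what makes step-by-step testing straightforward.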

**2. Clearer Data Models:**

- The `ResearchReport` and `Section` Pydantic models define the structure of the final report and of each planned section, so data passed between chains has a predictable shape.

**3. LLM Configuration:**

- Separate `ChatOpenAI` instances are created for the chatbot and the underlying research agent (`chatbot_llm` and `research_llm`). This allows for different configurations (e.g., temperature, model) for the conversational interface versus the research tasks. Streaming is enabled for the chatbot LLM, making it more responsive.

**4. Safe Tools:**

- The custom `SafeTavilySearchResults` class wraps the `TavilySearchResults` tool to handle potential exceptions during web searches. This prevents the entire chatbot from crashing if the search tool encounters an error.
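The same idea can be sketched without any LangChain dependency. In the snippet below, `make_safe_search` and `flaky_search` are hypothetical names used only for illustration; the point is that a failure is returned as data the agent can read, rather than raised as an exception.

```python
from typing import Callable, List


def make_safe_search(search_fn: Callable[[str], List[dict]]) -> Callable[[str], list]:
    """Wrap a search function so failures come back as data, not exceptions."""

    def safe_search(query: str) -> list:
        try:
            return search_fn(query)
        except Exception as exc:  # broad on purpose: any tool failure is reported
            return [f"Search error: {exc}"]

    return safe_search


def flaky_search(query: str) -> List[dict]:
    """Hypothetical backend that fails on empty queries."""
    if not query:
        raise ValueError("empty query")
    return [{"url": "https://example.com", "content": f"Result for {query}"}]


safe = make_safe_search(flaky_search)
print(safe("tokyo housing"))  # normal result list
print(safe(""))               # error message wrapped in a list
```

Returning a list in both cases keeps the return type uniform, so downstream code does not need a separate error path.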

**5. Research Agent Chains:**

- **`planning_chain`:** Takes the user input and generates a research plan in JSON format, outlining the sections of the report, whether research is needed for each section, and initial descriptions.
- **`query_generation_chain`:** Takes the research plan as input and generates specific search queries for each section that requires web research.
- **`research_agent_executor`:** This is the core research execution component, built with `create_openai_tools_agent` and `AgentExecutor`. It takes the plan with queries and runs web searches with the `search_tool` for the sections that need research. It then extracts and summarizes the search results and returns, for each section, its title, description, gathered data, and source URLs.
- **`report_generation_chain`:** Takes all of the research results and produces a structured `ResearchReport` object ready for the user. It fills in the title, the section content, the source URLs, and the key findings.

**6. Markdown Report Formatting:**

- The `_format_report_as_markdown` function converts the structured `ResearchReport` into a Markdown format, making it easier to read and display in a chatbot interface.

**7. Chatbot Class:**

- The `Chatbot` class encapsulates the chatbot's logic.
    - It uses `ConversationBufferMemory` to store the conversation history, allowing the chatbot to maintain context across multiple turns.
    - The `chat` method is the core of the chatbot. It takes the user input, checks if research is required, invokes the research agent chains if necessary, and returns the final report as a Markdown string. If no research is needed, it simply answers the user's query.
    - The initial prompt instructs the LLM to respond with "[RESEARCH]" if research is requested, so detecting research intent reduces to checking the response for that marker instead of interpreting free-form text.
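The routing described above can be sketched with plain callables standing in for the LLM and the research chains. Everything here is hypothetical (`ToyChatbot`, `fake_llm`, `fake_research`), and a simple list replaces `ConversationBufferMemory`; the sketch only shows the marker check and the bookkeeping around it.

```python
from typing import Callable, List, Tuple

RESEARCH_MARKER = "[RESEARCH]"


class ToyChatbot:
    """Hypothetical, LLM-free sketch of the routing inside a chat method."""

    def __init__(self, llm: Callable[[str], str], research: Callable[[str], str]):
        self.llm = llm
        self.research = research
        self.history: List[Tuple[str, str]] = []  # (user, bot) turns

    def chat(self, user_input: str) -> str:
        first_reply = self.llm(user_input)
        if RESEARCH_MARKER in first_reply:
            # The model asked for research: hand the query to the research pipeline
            reply = self.research(user_input)
        else:
            reply = first_reply
        self.history.append((user_input, reply))
        return reply


def fake_llm(text: str) -> str:
    """Stub model: emits the marker whenever the word 'research' appears."""
    return RESEARCH_MARKER if "research" in text.lower() else f"Echo: {text}"


def fake_research(text: str) -> str:
    """Stub pipeline: returns a tiny Markdown report."""
    return f"# Report\n\nFindings for: {text}"


bot = ToyChatbot(fake_llm, fake_research)
print(bot.chat("Hello"))
print(bot.chat("Please research Tokyo housing"))
```

The marker costs one extra LLM call per turn, but it keeps the branching logic in ordinary code rather than in prompt interpretation.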

**8. Main Chat Loop:**

- The main chat loop continuously prompts the user for input, sends the input to the chatbot, and prints the chatbot's response. It handles the "exit" command to terminate the program.

**Overall Flow:**

1.  **User Input:** The user enters a question or request.
2.  **Chatbot Response:** The chatbot determines if the question requires research by analyzing the response of an LLM call to the user input.
3.  **Research Agent (if needed):**
    - If research is required:
        - The `planning_chain` creates a research plan.
        - The `query_generation_chain` generates search queries for the sections needing research.
        - The `research_agent_executor` performs web searches using the tools.
        - The `report_generation_chain` compiles the search results into a structured report.
    - The report is formatted as Markdown.
4.  **Chatbot Output:** The chatbot displays the research report in Markdown format or a regular chat response, depending on the user request.
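To make the data flow in step 3 concrete, the sketch below runs the four stages with stub functions in place of LLM and search calls. The stage functions are hypothetical; only the section keys (`name`, `research`, `queries`, `content`) follow the models and code used in this notebook.

```python
from typing import Dict


def plan_stage(topic: str) -> Dict:
    """Stub planner: one section without research, one with."""
    return {
        "title": topic,
        "sections": [
            {"name": "Introduction", "research": False, "queries": []},
            {"name": "Housing", "research": True, "queries": []},
        ],
    }


def query_stage(plan: Dict) -> Dict:
    """Attach search queries only to sections flagged for research."""
    for section in plan["sections"]:
        if section["research"]:
            section["queries"] = [f"{plan['title']} {section['name'].lower()}"]
    return plan


def research_stage(plan: Dict) -> Dict:
    """Stub researcher: a real agent would call the search tool here."""
    for section in plan["sections"]:
        section["content"] = "; ".join(f"notes on '{q}'" for q in section["queries"])
    return plan


def report_stage(plan: Dict) -> Dict:
    """Reshape the researched plan into report-style sections."""
    return {
        "title": plan["title"],
        "sections": [
            {"title": s["name"], "content": s["content"]} for s in plan["sections"]
        ],
    }


def run_pipeline(topic: str) -> Dict:
    return report_stage(research_stage(query_stage(plan_stage(topic))))


report_demo = run_pipeline("Tokyo")
for section in report_demo["sections"]:
    print(section["title"], "->", repr(section["content"]))
```

Sections that do not need research pass through with empty content, mirroring how the agent above leaves them blank.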




In [3]:
import os
import json
from langchain.agents import AgentExecutor, create_openai_tools_agent, tool
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
from langchain_core.exceptions import OutputParserException
from pydantic import BaseModel, Field, ValidationError
from typing import List, Dict
from langchain.memory import ConversationBufferMemory
from IPython.display import display, Markdown
import sys
from langchain_core.runnables import chain  # Import @chain decorator



# ---- Data Models ----
class ResearchReport(BaseModel):
    title: str = Field(..., description="Main research question/title")
    sections: List[Dict] = Field(default_factory=list, description="Sections: title, content")
    sources: List[str] = Field(default_factory=list, description="Cited references (URLs)")
    key_findings: List[str] = Field(default_factory=list, description="Bullet-point summary")


class Section(BaseModel):
    name: str = Field(..., description="Section name")
    description: str = Field(..., description="Section overview")
    research: bool = Field(..., description="Requires research?")
    queries: List[str] = Field(default_factory=list, description="Search queries (max 3)")

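# Placeholder value: replace "tvly-" with your own Tavily API key.
os.environ["TAVILY_API_KEY"] = "tvly-"


# ---- LLMs (Chatbot and Research LLM) ----
# The api_key values are placeholders: supply your own OpenAI API key.
chatbot_llm = ChatOpenAI(model="gpt-4o", api_key="sk-proj-", temperature=0.7, streaming=True)
research_llm = ChatOpenAI(model="gpt-4o", api_key="sk-proj-", temperature=0.5)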


# ---- Tools ----
class SafeTavilySearchResults(TavilySearchResults):
    """Tavily search tool that reports errors as results instead of raising."""

    def _run(self, query: str, **kwargs):
        try:
            return super()._run(query, **kwargs)
        except Exception as e:
            # Return the error as a one-item list so the agent can keep going.
            return [f"Tavily error: {e}"]

search_tool = SafeTavilySearchResults(max_results=5)

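@tool
def data_analyzer(data: str) -> str:
    """Analyze data (basic stats if JSON list of numbers)."""
    try:
        values = json.loads(data)
        is_number_list = isinstance(values, list) and all(
            isinstance(item, (int, float)) and not isinstance(item, bool) for item in values
        )
        # An empty list has no mean or median, so treat it like invalid input.
        if not is_number_list or not values:
            return "Not a non-empty list of numbers."
        ordered = sorted(values)
        mid = len(ordered) // 2
        # Average the two middle values when the list has an even length.
        median = ordered[mid] if len(ordered) % 2 else (ordered[mid - 1] + ordered[mid]) / 2
        mean = sum(ordered) / len(ordered)
        return f"Data analysis:\n- Mean: {mean}\n- Median: {median}"
    except Exception as e:
        return f"Error: {e}"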

# ---- Research Agent Functions (using @chain for clarity) ----
@chain
def planning_chain(user_input, llm=research_llm):
    
    prompt = ChatPromptTemplate.from_messages([
        ("system", """You are a senior research analyst. Your goal is to create a well-structured research plan to answer the user's question thoroughly.

        Create a detailed research plan in JSON format for the given topic.

        The plan should include:
        - title: (string) The main research question or title of the report.  This should directly reflect the user's input.
        - key_findings: (list of strings) Placeholder for key findings (leave empty for now).
        - sources: (list of strings) Placeholder for sources (leave empty for now).
        - sections: (list of objects) An outline of the report's sections.
            - Each section should have:
                - name: (string) A concise title for the section.
                - description: (string) A brief overview of what will be covered in the section. Be specific about the information you expect to find.
                - research: (boolean) Indicate whether this section requires web research (True) or not (False).  Base this on whether you need external information to complete the section.
                - queries: (list of strings) Placeholder for section-specific search queries (leave empty for now).

        Example Sections:
        -   {{
                "name": "Introduction",
                "description": "Introduce the topic and provide background information. Define key terms and concepts.",
                "research": false,
                "queries": []
            }}
        -   {{
                "name": "Key Factors",
                "description": "Examine the main factors influencing the topic. Provide evidence and examples for each factor.",
                "research": true,
                "queries": []
            }}
        -   {{
                "name": "Conclusion",
                "description": "Summarize the findings and provide concluding remarks.  Highlight any limitations or areas for further research.",
                "research": false,
                "queries": []
            }}

        Ensure the plan is comprehensive and well-structured, setting the stage for in-depth research. The sections should logically flow and cover all aspects of the user's question.
        """),
        ("human", "{input}")
    ])
    result = (prompt | llm | JsonOutputParser()).invoke({"input": user_input})
    
    return result

# Structured-output schema for the query generation step, defined before the chain that uses it.
class QueryGenerationOutputParser(BaseModel):
    sections: List[Section] = Field(..., description="Sections with queries")

@chain
def query_generation_chain(plan, llm=research_llm):
    prompt = ChatPromptTemplate.from_messages([
        ("system", """You are an expert search query generator.  For each section in the research plan that requires web research (research=True), generate 2-3 highly specific and effective search queries.

        Consider the section's name and description when creating queries. Aim for queries that target high-quality, reliable sources such as:
        - Academic papers and research journals
        - Reputable news outlets and media organizations
        - Official reports and publications from government or international organizations
        - Expert blogs and industry publications

        The queries should be precise and focused, avoiding broad or ambiguous terms.  Think about keywords, synonyms, and related concepts to formulate the best possible queries.

        Update the 'queries' field within each relevant section with the generated search queries.  If a section does not require research (research=False), leave the 'queries' field empty.
        """),
        ("user", "Research Plan: {plan}")
    ])
    # with_structured_output makes the LLM return a QueryGenerationOutputParser instance.
    return (prompt | llm.with_structured_output(QueryGenerationOutputParser)).invoke({"plan": plan})

@chain
def research_agent_executor(plan_with_queries, llm=research_llm):
    
    research_prompt = ChatPromptTemplate.from_messages([
        ("system", """You are a research assistant tasked with gathering information for specific sections of a research report.

        Follow these steps carefully:

        1. **Focus on Sections with Research:** Only gather information for sections where 'research' is set to 'True'. Ignore sections where 'research' is 'False'.
        2. **Use Provided Queries:** For each section that requires research, use the queries provided in the 'queries' field to conduct web searches.
        3. **Utilize Search Tool:** Use the 'tavily_search_results_json' tool to perform web searches for each query.
        4. **Synthesize Information:**  Carefully read the search results and extract the most relevant information that directly addresses the section's description.  Do not simply copy and paste content; synthesize the information and write a coherent summary.
        5. **Cite Sources:**  Keep track of the URLs of the sources you use and include them in the 'sources' list for each section.
        6. **Analyze Data:**  If you find data that requires analysis (e.g., statistics, numbers), use the 'data_analyzer' tool to perform basic statistical analysis and include the results in the section's content.
        7. **Be Concise:**  Focus on providing the most important information in a clear and concise manner.  Avoid unnecessary details or jargon.

        Remember to only gather information for sections that require research.  For sections that do not require research, the 'content' field should be left empty.
        """),
        ("user", "Research these queries: {queries}"),
        MessagesPlaceholder(variable_name="agent_scratchpad")
    ])
    agent = create_openai_tools_agent(llm, [search_tool, data_analyzer], research_prompt)
    executor = AgentExecutor(agent=agent, tools=[search_tool, data_analyzer], verbose=True, return_intermediate_steps=True)

    results = []
    for section in plan_with_queries['sections']:
        if section["research"]:
            
            result = executor.invoke({
                "section_name": section["name"],
                "queries": ", ".join(section["queries"]),
            })
            sources = [
                r['url'] for step in result['intermediate_steps']
                if step[0].tool == "tavily_search_results_json"
                for r in step[1] if isinstance(r, dict) and 'url' in r
            ]
            results.append({"section": section["name"], "content": result["output"], "sources": sources})
            
        else:
            results.append({"section": section["name"], "content": "", "sources": []})
    
    return results

@chain
def report_generation_chain(inp: dict, llm=research_llm):
    
    research_results = inp.get("research_results")
    user_input = inp.get("user_input")
    
    # Ensure research_results is in the correct format
    if isinstance(research_results, str):
        try:
            research_results = json.loads(research_results)
        except json.JSONDecodeError:
            research_results = [{"title": "Research Results", "content": research_results}]
    elif not isinstance(research_results, (list, dict)):
        research_results = [{"title": "Research Results", "content": str(research_results)}]
    
    # Convert to dictionary format if needed
    if isinstance(research_results, list):
        research_results = {"results": research_results}
    
    report_prompt = ChatPromptTemplate.from_messages([
        ("system", """You are a research report writer tasked with creating a comprehensive and well-structured report based on the research results provided.

        Your goal is to synthesize the information from the research results and create a coherent and informative report that answers the user's research question.

        Follow these guidelines:

        1. **Report Structure:** The report should be in JSON format with the following keys:
            - "title": (string) The research question/title.  This should be the same as the user's input.
            - "sections": (array of objects) An array of sections, each with a "title" and "content".  The sections should be organized logically and cover all aspects of the research question.
            - "sources": (array of strings) An array of cited URLs.  Include all sources used in the research.
            - "key_findings": (array of strings) An array of bullet-point summaries of the most important findings.  These should be concise and highlight the key takeaways from the research.

        2. **Section Content:**  For each section, synthesize the information from the research results and write a coherent and informative summary.  Use clear and concise language, and avoid jargon.  Cite your sources appropriately.

        3. **Key Findings:**  Identify the most important findings from the research and summarize them in a bullet-point list.  These should be the key takeaways from the report.

        4. **Introduction and Conclusion:**  Write a brief introduction that provides background information on the research question.  Write a conclusion that summarizes the findings and provides concluding remarks.

        5. **No Plagiarism:**  Do not simply copy and paste content from the research results.  Synthesize the information and write it in your own words.

        Ensure all keys are present in the output, even if the corresponding arrays are empty. The report should be well-organized, informative, and easy to understand.
        """),
        ("user", "Research Question: {user_input}\n\nResearch Results:\n{research_results}\n\n")
    ])
    result = (report_prompt | llm.with_structured_output(ResearchReport)).invoke({
        "user_input": user_input,
        "research_results": json.dumps(research_results) if not isinstance(research_results, str) else research_results
    })
    
    return result


def _format_report_as_markdown(report: ResearchReport) -> str:
    """Render a ResearchReport as a Markdown document."""
    md = f"# {report.title}\n\n## Key Findings\n" + '\n'.join(f"- {f}" for f in report.key_findings)
    # Sections are plain dicts, so use .get() in case the LLM omitted a key.
    md += "\n\n## Detailed Analysis\n" + ''.join(
        f"### {s.get('title', 'Untitled')}\n{s.get('content', '')}\n\n" for s in report.sections
    )
    md += "### Sources Cited\n" + '\n'.join(f"- {s}" for s in report.sources)
    return md

# ---- Chatbot ----
class Chatbot:
    def __init__(self):
        
        self.memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
        self.prompt = ChatPromptTemplate.from_messages([
            ("system", """You are a helpful AI assistant. Answer concisely. If the user requests research, respond *only* with '[RESEARCH]'. Do NOT answer the research question directly."""),
            MessagesPlaceholder(variable_name="chat_history"),
            ("user", "{input}"),
        ])
        self.chain = self.prompt | chatbot_llm
        

    def chat(self, user_input: str) -> str:
        try:
            response = self.chain.invoke({"input": user_input, "chat_history": self.memory.load_memory_variables({})['chat_history']})

            if "[RESEARCH]" not in response.content:
                # Ordinary chat turn: remember it and return the reply.
                self.memory.save_context({"input": user_input}, {"output": response.content})
                return response.content

            # Research turn: plan -> queries -> web research -> report.
            plan = planning_chain.invoke(user_input)
            plan_with_queries = query_generation_chain.invoke(plan).model_dump()
            research_results = research_agent_executor.invoke(plan_with_queries)

            # Use each section's description as its heading, falling back to its name.
            descriptions = {s["name"]: s["description"] for s in plan_with_queries["sections"]}
            report_data = []
            all_sources = []
            for result in research_results:
                all_sources.extend(result["sources"])
                report_data.append({
                    "title": descriptions.get(result["section"], result["section"]),
                    "content": result["content"],
                })

            report = report_generation_chain.invoke({
                "research_results": report_data,
                "user_input": user_input
            })

            # De-duplicate sources while keeping first-seen order.
            report.sources = list(dict.fromkeys(all_sources))
            markdown_report = _format_report_as_markdown(report)
            # Save the turn once, with the final report rather than the '[RESEARCH]' marker.
            self.memory.save_context({"input": user_input}, {"output": markdown_report})
            return markdown_report

        except Exception as e:
            error_msg = f"Error: {type(e).__name__}: {e}"
            self.memory.save_context({"input": user_input}, {"output": error_msg})
            return error_msg

# ---- Main Chat Loop ----
chatbot = Chatbot()

print("Welcome to the Research Chatbot! Ask anything, or type 'exit'.")

while True:
    user_input = input("You: ")
    if user_input.lower() == 'exit':
        
        break

    
    response_stream = chatbot.chat(user_input)

    print("\nAI Assistant:", end="")
    if isinstance(response_stream, str):
        print(response_stream)
    else:
        for chunk in response_stream:
            print(chunk, end="", flush=True)

    print("\n\n" + "-" * 50 + "\n")

Welcome to the Research Chatbot! Ask anything, or type 'exit'.


You:  hi



AI Assistant:Hello! How can I assist you today?


--------------------------------------------------



You:  who are you



AI Assistant:I’m an AI assistant designed to help answer questions and provide information. How can I assist you today?


--------------------------------------------------



You:  Analyze population growth patterns in Paris vs Tokyo since 2020, including economic and environmental factors. Focus on data related to housing, employment, and pollution levels.



> Entering new AgentExecutor chain...

Invoking: `tavily_search_results_json` with `{'query': 'Paris Tokyo population growth trends 2020-2023 site:.edu OR site:.gov'}`

[{'url': 'https://www.iese.edu/media/research/pdfs/ST-0649-E.pdf', 'content': 'Paris and Tokyo also stand out for their frequent pres- ence among the top 10 cities in various rankings. The. Japanese capital, for example, appears in'}, {'url': 'https://www.cia.gov/the-world-factbook/about/whats-new/', 'content': "The World Factbook is pleased to announce the addition of charts showing urban growth rate and total population growth rate over time for all of the world's"}, {'url': 'https://www.johnson.cornell.edu/wp-content/uploads/sites/3/2024/03/EMR_2023_Master_File_20230906_finalv2_lc-vd.pdf', 'content': 'Growth performance: recent trends. Following a significant rebound in 2021, with global growth surging to 6% following the COVID--induced crisis trough in.'}, {'url': 'https://ww

You:  exit

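The search tool output in this run is a list of dictionaries with `url` and `content` keys, and the agent collects the `url` values as report sources. The following is a minimal, self-contained sketch of that step under stated assumptions: `collect_sources` is a hypothetical helper (not part of the notebook or of LangChain), and the sample batches use made-up `example.com` URLs.

```python
from typing import Any, Dict, Iterable, List


def collect_sources(result_batches: Iterable[Any]) -> List[str]:
    """Gather unique URLs from batches of search results, keeping first-seen order."""
    seen: Dict[str, None] = {}
    for batch in result_batches:
        if not isinstance(batch, list):
            continue  # e.g. a bare error string instead of a result list
        for item in batch:
            # Skip error messages and any entry without a URL.
            if isinstance(item, dict) and "url" in item:
                seen.setdefault(item["url"], None)
    return list(seen)


# Illustrative data only: two result batches plus one failed search.
batches = [
    [{"url": "https://example.com/a", "content": "..."},
     {"url": "https://example.com/b", "content": "..."}],
    ["Tavily error: timeout"],
    [{"url": "https://example.com/a", "content": "..."}],
]
print(collect_sources(batches))
```

Using a dictionary for de-duplication keeps the order in which sources were first seen, which a plain `set` would not guarantee.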

---

### Conclusion

Congratulations! You have now built a fully functional AI research agent using LangChain. This agent can plan research, generate search queries, gather information from the web, and compile a structured report.

**Further Exploration:**

-   **Experiment with different research topics:** Try running the agent with various queries to see how it performs.
-   **Customize the prompts:** Modify the prompts (planning, query generation, research, report generation) to fine-tune the agent's behavior and output quality.
-   **Add more tools:** Explore other LangChain tools and integrate them into your agent to enhance its capabilities (e.g., tools for interacting with databases, APIs, or other data sources).
-   **Implement more sophisticated memory:** Replace the `ConversationBufferMemory` used here, which stores the full conversation verbatim, with a more advanced LangChain memory mechanism (for example, one that summarizes older turns) so the chatbot can handle longer conversations.
-   **Explore different agent architectures:** LangChain offers various agent types beyond `create_openai_tools_agent`. Investigate other agent architectures like ReAct agents or conversational agents to see which one best suits your needs.
-   **Consider using LangGraph:** For more complex workflows and finer control over the agent's execution flow, explore LangGraph, which allows you to define state machines and manage agent operations using a graph-based approach.
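
To make the graph-based idea concrete, here is a plain-Python sketch of the research workflow as a small state machine. It deliberately does not use LangGraph's API: the node functions, the `NODES` and `EDGES` tables, and `run_graph` are hypothetical names chosen for illustration, and the node bodies are stubs rather than real LLM or search calls.

```python
from typing import Callable, Dict, Optional

State = Dict[str, object]


def plan(state: State) -> State:
    # Stub: a real node would call the planning chain.
    return {**state, "sections": ["Introduction", "Key Factors"]}


def research(state: State) -> State:
    # Stub: a real node would run web searches for each section.
    notes = {name: f"notes on {name}" for name in state["sections"]}
    return {**state, "notes": notes}


def report(state: State) -> State:
    body = "\n".join(f"## {name}\n{text}" for name, text in state["notes"].items())
    return {**state, "report": f"# {state['topic']}\n{body}"}


# Each node maps to a function; each edge names the node to run next (None ends the run).
NODES: Dict[str, Callable[[State], State]] = {"plan": plan, "research": research, "report": report}
EDGES: Dict[str, Optional[str]] = {"plan": "research", "research": "report", "report": None}


def run_graph(topic: str, start: str = "plan") -> State:
    """Walk the graph from `start`, threading a shared state dict through each node."""
    state: State = {"topic": topic}
    node: Optional[str] = start
    while node is not None:
        state = NODES[node](state)
        node = EDGES[node]
    return state


final = run_graph("Urban growth")
print(final["report"])
```

Keeping the transitions in a table separate from the node functions is what makes it easy to add branches or retries later, which is the kind of control a graph-based framework is meant to provide.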

By experimenting and building upon this foundation, you can create even more powerful and sophisticated AI agents for various tasks. Remember that the field of AI agents is rapidly evolving, so stay up-to-date with the latest developments in LangChain and the broader AI community.