<a href="https://colab.research.google.com/github/nischay1100/OpenDeepResearcher/blob/main/Open_deep_researcher_task2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## 1 — Setup & Installation

1(a) — Install required libraries

In [None]:
# Install necessary Python libraries for the project
!pip install langgraph google-generativeai tavily-python



1(b) — API Keys Setup

In [None]:
# Import libraries for API clients and environment
import os
from tavily import TavilyClient
import google.generativeai as genai
from getpass import getpass

# Take API keys from user input (hidden)
tavily_key = getpass("Enter your TAVILY API Key: ")
gemini_key = getpass("Enter your GEMINI API Key: ")

# Set API keys as environment variables
os.environ["TAVILY_API_KEY"] = tavily_key
os.environ["GEMINI_API_KEY"] = gemini_key

# Initialize clients
tavily_client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])
genai.configure(api_key=os.environ["GEMINI_API_KEY"])


Enter your TAVILY API Key: ··········
Enter your GEMINI API Key: ··········
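
The cell above always prompts for both keys. As a minimal sketch, assuming you sometimes have the keys in the environment already (for example when re-running the notebook), a small helper can check the environment first and prompt only when a key is missing. The name `load_api_key` and its `prompt` parameter are hypothetical and not part of the notebook.

```python
import os
from getpass import getpass
from typing import Callable


def load_api_key(env_name: str, prompt: Callable[[str], str] = getpass) -> str:
    """Return the key from the environment, prompting only if it is missing."""
    key = os.environ.get(env_name, "").strip()
    if not key:
        # Hidden input, same as the getpass calls in the cell above
        key = prompt(f"Enter your {env_name}: ").strip()
    if not key:
        raise ValueError(f"{env_name} is empty; cannot initialize the client.")
    # Store the key so later cells can read it from os.environ
    os.environ[env_name] = key
    return key
```

With this helper, the client setup could read `TavilyClient(api_key=load_api_key("TAVILY_API_KEY"))`, and an empty key fails early instead of at the first API call.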


## 2 — Define Research State

2(a) — ResearchState class

In [None]:
from typing import List, Optional
from pydantic import BaseModel

# This class keeps track of the research session
class ResearchState(BaseModel):
    query: str  # User's original query
    clarification_needed: bool = False  # Does the query need clarification?
    follow_up_questions: Optional[List[str]] = None  # Questions for clarification
    clarified_query: Optional[str] = None  # Query after clarification
    research_brief: Optional[dict] = None  # Structured research brief
    search_results: Optional[List[dict]] = None  # Results from web search
    summary: Optional[str] = None  # Final research summary

    # 🆕 Added for Reflection Tool
    reflection_feedback: Optional[str] = None  # AI feedback before/after reflection
    reflection_runs: int = 0  # How many times Tavily ran during reflection


## 3 — Utilities

3(a) — Helper functions

In [None]:
# Estimate number of tokens roughly for LLM
def estimate_tokens(text: str) -> int:
    return int(len(text.split()) * 1.3)

# Pretty-print data as JSON; values that are not JSON-serializable fall back to str()
def safe_print_json(data):
    import json
    print(json.dumps(data, indent=2, ensure_ascii=False, default=str))
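
One possible use of `estimate_tokens` is to keep search context within a rough token budget before it is sent to the model. The sketch below repeats the notebook's heuristic so that it runs on its own; `trim_to_token_budget` is a hypothetical helper, and the 1.3 tokens-per-word factor is only an approximation, not a real tokenizer count.

```python
def estimate_tokens(text: str) -> int:
    # Same rough heuristic as above: about 1.3 tokens per whitespace-separated word
    return int(len(text.split()) * 1.3)


def trim_to_token_budget(text: str, max_tokens: int) -> str:
    """Drop trailing words until the rough estimate fits within max_tokens."""
    words = text.split()
    while words and int(len(words) * 1.3) > max_tokens:
        words.pop()
    return " ".join(words)


sample = "one two three four"
print(estimate_tokens(sample))          # 5
print(trim_to_token_budget(sample, 3))  # one two three
```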


## 4 — Clarification Agent (Task 1)

4(a) — Function

In [None]:
def clarification_agent(state: ResearchState) -> ResearchState:
    """
    Check if user's query is clear.
    If not, ask max 2 follow-up questions.
    """
    model = genai.GenerativeModel("gemini-2.5-flash")

    prompt = f"""
You are a helpful research assistant.
Given this user query:

"{state.query}"

Decide if clarification is needed.
If yes, ask **max 2 clear and specific follow-up questions**.
If not, confirm it's sufficient.

Respond in JSON:
{{
  "clarification_needed": true/false,
  "follow_up_questions": [..] or null
}}
"""
    import json, re

    response = model.generate_content(prompt)
    try:
        parsed = response.text.strip()

        # Extract the outermost {...} span from the reply and parse it as JSON
        json_match = re.search(r"\{.*\}", parsed, re.DOTALL)
        parsed_json = json.loads(json_match.group()) if json_match else {}
        questions = parsed_json.get("follow_up_questions")
        state.clarification_needed = bool(parsed_json.get("clarification_needed", False))
        # Enforce the "max 2" limit stated in the prompt
        state.follow_up_questions = (
            questions[:2] if isinstance(questions, list) and questions else None
        )
    except Exception:
        # If the reply is blocked or malformed, proceed without clarification
        state.clarification_needed = False
        state.follow_up_questions = None

    return state
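
The greedy pattern `\{.*\}` spans from the first `{` to the last `}` in the reply, so any trailing text that contains braces makes `json.loads` fail. A minimal sketch of a more tolerant extractor, using only the standard library, is shown below. The helper name `extract_json_object` is hypothetical; it relies on `json.JSONDecoder.raw_decode`, which parses one JSON value starting at a given index and ignores whatever follows.

```python
import json
from typing import Optional


def extract_json_object(text: str) -> Optional[dict]:
    """Return the first parseable JSON object found in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # Parse one JSON value beginning at this "{"; trailing text is ignored
            obj, _ = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from here; try the next opening brace
            start = text.find("{", start + 1)
    return None


reply = 'Here you go:\n{"clarification_needed": false, "follow_up_questions": null}\nLet me know {if} that helps.'
print(extract_json_object(reply))
# {'clarification_needed': False, 'follow_up_questions': None}
```

The same helper could serve the brief and reflection agents, which repeat the regex-then-`json.loads` pattern.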


## 5 — Research Brief Agent (Task 2)

5(a) — Function

In [None]:
def research_brief_agent(state: ResearchState) -> ResearchState:
    """
    Generate a structured research brief based on clarified query.
    """
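    model = genai.GenerativeModel("gemini-2.5-flash")

    # Fall back to the original query if no clarified query has been set yet,
    # so the prompt never contains the literal string "None"
    if not state.clarified_query:
        state.clarified_query = state.query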

    prompt = f"""
You are an AI research planner. The user has clarified their topic as:

"{state.clarified_query}"

Create a structured research brief including:
- Research Objective
- Key Questions or Subtopics
- Relevant Domains
- Suggested Data or Sources
- Expected Output

Respond ONLY in JSON:
{{
  "objective": "",
  "key_questions": [],
  "domains": [],
  "suggested_sources": [],
  "expected_output": ""
}}
"""
    response = model.generate_content(prompt)
    import json, re
    parsed = response.text.strip()
    match = re.search(r"\{.*\}", parsed, re.DOTALL)
    if match:
        try:
            state.research_brief = json.loads(match.group())
        except Exception:
            state.research_brief = {"objective": state.clarified_query}
    else:
        state.research_brief = {"objective": state.clarified_query}

    return state
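
Even when the reply parses as JSON, the model may omit keys or return a string where a list was requested. As a sketch under that assumption, the brief can be normalized against the schema given in the prompt before it is stored. `BRIEF_DEFAULTS` and `normalize_brief` are hypothetical names, not part of the notebook.

```python
from typing import Any, Dict

# Expected keys and their empty defaults, copied from the prompt's JSON template
BRIEF_DEFAULTS: Dict[str, Any] = {
    "objective": "",
    "key_questions": [],
    "domains": [],
    "suggested_sources": [],
    "expected_output": "",
}


def normalize_brief(raw: Dict[str, Any], fallback_objective: str) -> Dict[str, Any]:
    """Keep only the expected keys and coerce each one to the expected type."""
    brief: Dict[str, Any] = {}
    for key, default in BRIEF_DEFAULTS.items():
        value = raw.get(key)
        if isinstance(default, list):
            # Accept a single string where a list was expected
            if isinstance(value, str):
                value = [value]
            brief[key] = [str(v) for v in value] if isinstance(value, list) else []
        else:
            brief[key] = value.strip() if isinstance(value, str) else ""
    # Mirror the notebook's fallback: use the query when no objective was returned
    if not brief["objective"]:
        brief["objective"] = fallback_objective
    return brief
```

In `research_brief_agent`, this would wrap the parsed result, for example `normalize_brief(json.loads(match.group()), state.clarified_query)`.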


## 6 — Query Generator

6(a) — Function

In [None]:
# If user clarified query exists, use it. Otherwise use original query
def query_generator(state: ResearchState) -> ResearchState:
    if not state.clarified_query:
        state.clarified_query = state.query
    return state


## 7 — Research Pipeline

7(a) — Function

In [None]:
def research_pipeline(state: ResearchState) -> ResearchState:
    """
    If query needs research (keywords present), search online via Tavily.
    Then summarize results using Gemini LLM.
    """
    keywords = ["latest", "statistics", "research", "compare", "trends", "report"]
    if any(kw in state.clarified_query.lower() for kw in keywords):
        print("🔎 Running Tavily search...")
        search = tavily_client.search(state.clarified_query, max_results=3)
        state.search_results = search.get("results", [])
    else:
        print("⚡ Skipping web search (not needed).")
        state.search_results = []

    model = genai.GenerativeModel("gemini-2.5-flash")
    context = ""
    if state.search_results:
        for r in state.search_results:
            context += f"- {r.get('title','')} :: {r.get('content','')[:200]}\n"

    summary_prompt = f"""
User query: {state.clarified_query}

Context (may be empty):
{context}

Provide a clear, agent-like research summary.
"""
    response = model.generate_content(summary_prompt)
    state.summary = response.text.strip()
    return state
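
The keyword check in `research_pipeline` uses substring matching, so a query containing "reporter" or "researcher" also triggers a web search. A sketch of a whole-word variant is shown below, reusing the same keyword list. `needs_web_search` is a hypothetical helper name.

```python
import re

SEARCH_KEYWORDS = ["latest", "statistics", "research", "compare", "trends", "report"]

# \b anchors make each keyword match only as a whole word, case-insensitively
_KEYWORD_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, SEARCH_KEYWORDS)) + r")\b", re.IGNORECASE
)


def needs_web_search(query: str) -> bool:
    """Return True when the query contains one of the keywords as a whole word."""
    return _KEYWORD_PATTERN.search(query) is not None


print(needs_web_search("Compare the 2025 EV market share in India"))  # True
print(needs_web_search("Who was the first reporter on the scene?"))   # False
```

The trade-off is that inflected forms such as "reports" or "comparing" no longer match; if those should trigger a search, they would need to be added to the list explicitly.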


7(b) — Reflection Agent

In [None]:
def reflection_agent(state: ResearchState) -> ResearchState:
    """
    Evaluates the first summary and decides if it’s complete.
    If not, re-runs Tavily two more times to improve the result.
    Returns updated state with improved summary and reflection feedback.
    """
    model = genai.GenerativeModel("gemini-2.5-flash")
    reflection_feedback = ""
    reflection_runs = 0  # count how many times Tavily re-runs

    # Step 1: Ask Gemini to critique the first summary
    critique_prompt = f"""
You are a reflection evaluator.
Here is the user's clarified query:
"{state.clarified_query}"

And here is the research summary generated earlier:
"{state.summary}"

Critically evaluate this summary.
Say whether it is clear, complete, and accurate.
If it lacks depth or certainty, say it needs re-research.
Respond in JSON:
{{
  "needs_reflection": true/false,
  "feedback": "string explaining the decision"
}}
"""
    critique_response = model.generate_content(critique_prompt)
    import json, re
    parsed = critique_response.text.strip()
    match = re.search(r"\{.*\}", parsed, re.DOTALL)

    needs_reflection = False
    if match:
        try:
            data = json.loads(match.group())
            needs_reflection = data.get("needs_reflection", False)
            reflection_feedback = data.get("feedback", "")
        except Exception:
            reflection_feedback = "Reflection JSON parse error."

    # Step 2: If Gemini says it needs improvement, re-run Tavily 2 more times
    if needs_reflection:
        new_context = ""
        for i in range(2):  # exactly 2 additional searches
            reflection_runs += 1
            print(f"🔁 Reflection pass {i+1}: running Tavily again...")
            search = tavily_client.search(state.clarified_query, max_results=3)
            results = search.get("results", [])
            for r in results:
                new_context += f"- {r.get('title','')} :: {r.get('content','')[:200]}\n"

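        # Step 3: Combine old + new context and re-summarize
        old_context = "".join(
            f"- {r.get('title','')} :: {r.get('content','')[:200]}\n"
            for r in (state.search_results or [])
        )
        combined_context = old_context + new_context
        reflection_prompt = f"""
User query: {state.clarified_query}

Context from the initial and reflection searches: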
{combined_context}

Write an improved, well-rounded, and up-to-date research summary.
"""
        improved_response = model.generate_content(reflection_prompt)
        state.summary = improved_response.text.strip()
    else:
        reflection_feedback += " (No reflection needed.)"

    # Save reflection details in state (optional)
    state.reflection_feedback = reflection_feedback
    state.reflection_runs = reflection_runs
    return state
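
Both reflection passes send the same query to Tavily, so they are likely to return overlapping results. The sketch below shows one way to merge several passes while skipping repeated URLs. It assumes each result is a dict with `title`, `url`, and `content` keys, and it takes the search function as a parameter so the logic can be tried without network access. `collect_reflection_context` and its parameters are hypothetical names.

```python
from typing import Callable, Dict, List

Result = Dict[str, str]


def collect_reflection_context(
    queries: List[str],
    search_fn: Callable[[str], List[Result]],
    max_passes: int = 2,
) -> str:
    """Run at most max_passes searches and merge results, skipping repeated URLs."""
    seen_urls = set()
    lines = []
    for query in queries[:max_passes]:
        for r in search_fn(query):
            url = r.get("url", "")
            # Deduplicate only when a URL is present
            if url and url in seen_urls:
                continue
            seen_urls.add(url)
            # Same line format as the context built in research_pipeline
            lines.append(f"- {r.get('title', '')} :: {r.get('content', '')[:200]}")
    return "\n".join(lines)
```

In the notebook, `search_fn` could be `lambda q: tavily_client.search(q, max_results=3).get("results", [])`, and `queries` could hold the clarified query plus a variant that incorporates the critique feedback, so the second pass is not a repeat of the first.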


## 8 — LangGraph Workflow

8(a) — Define workflow

In [None]:
from langgraph.graph import StateGraph, END

workflow = StateGraph(ResearchState)

workflow.add_node("clarification", clarification_agent)
workflow.add_node("query_gen", query_generator)
workflow.add_node("brief_agent", research_brief_agent)   # added before pipeline
workflow.add_node("pipeline", research_pipeline)
workflow.add_node("reflection", reflection_agent)

workflow.set_entry_point("clarification")
workflow.add_edge("clarification", "query_gen")   # set clarified_query first
workflow.add_edge("query_gen", "brief_agent")     # so the brief has a query to work from
workflow.add_edge("brief_agent", "pipeline")
workflow.add_edge("pipeline", "reflection")       # reflection runs last, as in chatbot()
workflow.add_edge("reflection", END)

graph = workflow.compile()


## 9 — Chatbot Loop

9(a) — Function

In [None]:
# Chatbot loop (with reflection)

def chatbot():
    print("🤖 Research Agent Ready (with Reflection Tool)\n")
    while True:
        query = input("You: ").strip()
        if query.lower() in ["exit", "quit"]:
            print("👋 Ending session.")
            break

        state = ResearchState(query=query)

        # Step 1: Clarification
        state = clarification_agent(state)
        if state.clarification_needed and state.follow_up_questions:
            print("\n🤖 I need a bit more info:")
            for q in state.follow_up_questions:
                print(" -", q)
            ans = input("\nYour clarification: ").strip()
            state.clarified_query = f"{state.query} | Clarified: {ans}"
        else:
            state.clarified_query = state.query

        # Step 2: Research Brief
        state = research_brief_agent(state)
        # (brief generation)
        # safe_print_json(state.research_brief)

        # Step 3: Query Generation
        state = query_generator(state)

        # Step 4: First Research Run
        state = research_pipeline(state)

        print("\n🧩 Before Reflection:")
        print(state.summary)

        # Step 5: Reflection Agent
        print("\n💭 Running Reflection Tool...")
        state = reflection_agent(state)

        print("\n💬 Reflection Feedback:")
        print(state.reflection_feedback)

        print("\n✅ After Reflection:")
        print(state.summary)

        print(f"\n🔁 Tavily ran {state.reflection_runs} extra time(s) during reflection.\n")
        print("-------------------------------------------------------\n")


# Start the interactive session
chatbot()

🤖 Research Agent Ready (with Reflection Tool)

You: Compare the 2025 EV market share between Tata, Mahindra, and Tesla in India.

🤖 I need a bit more info:
 - Are you interested in the overall EV market share (including both passenger and commercial vehicles), or specifically the passenger EV market share?

Your clarification: Passenger cars, based on unit sales volume.
🔎 Running Tavily search...

🧩 Before Reflection:
Based on the provided context, here's a comparison of the 2025 EV passenger car market share between Tata, Mahindra, and Tesla in India:

*   **Dominance of Tata, Mahindra (and MG):** For **May 2025**, Tata, MG, and Mahindra **together accounted for 87.3%** of the total Indian EV passenger car market, which stood at 12,197 units for that month. This indicates that Tata and Mahindra are significant players, holding a vast majority of the market share alongside MG Motor India.
*   **Individual Shares (Not Specified):** While their combined share is high, the provided contex