## Reflexion-Inspired LLM Agent with LangGraph and Tavily

This notebook builds a multi-step reasoning agent using [LangGraph](https://github.com/langchain-ai/langgraph), inspired by the **Reflexion** framework:

**Reflexion: Language Agents with Verbal Reinforcement Learning**  
Shinn et al. (2023) – [arXiv link](https://arxiv.org/abs/2303.11366)

The agent follows a 3-step reasoning loop:
1. **Answer** – Generate an initial response.
2. **Reflect** – Critique what’s missing or unnecessary.
3. **Revise** – Improve the answer using critique and web search.

Search is powered by [Tavily](https://www.tavily.com/), enabling the agent to query the web based on its own reflection.
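
The control flow of this loop can be sketched in plain Python before any LLM or search API is involved. The functions below (`answer`, `reflect`, `search`, `revise`, `run_loop`) are hypothetical stand-ins that return canned strings; in the notebook these steps are carried out by the model and by Tavily.

```python
from typing import List, Tuple

# Hypothetical stand-ins: the notebook uses LLM calls and Tavily for these steps.
def answer(question: str) -> str:
    return f"Draft answer to: {question}"

def reflect(draft: str) -> Tuple[str, List[str]]:
    critique = "Missing concrete examples."
    queries = ["concrete examples for: " + draft[:30]]
    return critique, queries

def search(queries: List[str]) -> List[str]:
    return [f"result for '{q}'" for q in queries]

def revise(draft: str, critique: str, results: List[str]) -> str:
    return f"{draft} [revised using {len(results)} search result(s)]"

def run_loop(question: str, rounds: int = 1) -> str:
    draft = answer(question)                      # 1. Answer
    for _ in range(rounds):
        critique, queries = reflect(draft)        # 2. Reflect
        results = search(queries)                 #    search driven by the reflection
        draft = revise(draft, critique, results)  # 3. Revise
    return draft

final = run_loop("What is Reflexion?", rounds=2)
print(final)
```

Each round feeds the previous draft back into the reflect step, so the critique always targets the most recent version of the answer.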

> Also influenced by ideas from the LangGraph Udemy course: https://www.udemy.com/course/langgraph

At the end of each run, the final answer and the token usage reported for the last model call are displayed.

### Answer and Reflection Schema

This cell defines structured Pydantic models for a multi-step Q&A workflow:

- **`Reflection`** captures what’s missing or unnecessary in an answer.
- **`AnswerQuestion`** includes a ~250-word response, a reflection, and 1–3 search queries.
- **`ReviseAnswer`** extends this with a list of supporting references.

These schemas enable structured parsing in LangChain or agent-based pipelines.

In [1]:
# Import necessary types and modules
from typing import List
from dotenv import load_dotenv  # Loads environment variables from a .env file
from pydantic import BaseModel, Field  # For data validation and schema enforcement

# Load environment variables (e.g., API keys)
load_dotenv()

# Define a schema for critique/reflection on a generated answer
class Reflection(BaseModel):
    missing: str = Field(description="Critique of what is missing.")  # Feedback on missing content
    superfluous: str = Field(description="Critique of what is superfluous")  # Feedback on unnecessary/excess detail

# Schema for answering a question with structured components
class AnswerQuestion(BaseModel):
    """Answer the question."""

    answer: str = Field(description="~250 word detailed answer to the question.")  # Main response text
    reflection: Reflection = Field(description="Your reflection on the initial answer.")  # Self-critique of the answer
    search_queries: List[str] = Field(
        description="1-3 search queries for researching improvements to address the critique of your current answer."
    )  # Search queries handed to the search tool to guide the next revision

# Schema for a revised answer that includes supporting citations
class ReviseAnswer(AnswerQuestion):
    """Revise your original answer to your question."""

    references: List[str] = Field(
        description="Citations motivating your updated answer."
    )  # List of reference URLs or sources used to support the revised answer
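
To make the shape of these schemas concrete, the sketch below mirrors the fields of `ReviseAnswer` as a plain dictionary and checks it with a small hand-written validator. The helper `check_revise_payload` and all sample values are hypothetical; in the notebook itself Pydantic performs this validation.

```python
from typing import Any, Dict, List

# Field names mirror the Pydantic models above; the sample values are placeholders.
payload: Dict[str, Any] = {
    "answer": "Revised answer text with a citation [1].",
    "reflection": {
        "missing": "No discussion of limitations.",
        "superfluous": "The historical aside in the first paragraph.",
    },
    "search_queries": ["example query one", "example query two"],
    "references": ["https://example.com"],
}

def check_revise_payload(data: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    problems: List[str] = []
    for key in ("answer", "reflection", "search_queries", "references"):
        if key not in data:
            problems.append(f"missing field: {key}")
    reflection = data.get("reflection")
    if not isinstance(reflection, dict):
        problems.append("reflection must be an object")
    else:
        for key in ("missing", "superfluous"):
            if not isinstance(reflection.get(key), str):
                problems.append(f"reflection.{key} must be a string")
    for key in ("search_queries", "references"):
        value = data.get(key, [])
        if not (isinstance(value, list) and all(isinstance(v, str) for v in value)):
            problems.append(f"{key} must be a list of strings")
    return problems

print(check_revise_payload(payload))
```

Because `ReviseAnswer` inherits from `AnswerQuestion`, a revision payload carries the answer, reflection, and search queries in addition to the references.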

### Tool Configuration for Answer and Revision Agents

This cell sets up the language model, prompt templates, and tool bindings for the answer–reflect–revise workflow.

- **Model and Parsers**:
  - Initializes an OpenAI-compatible LLM.
  - Uses `JsonOutputToolsParser` for general tool output and `PydanticToolsParser` for structured `AnswerQuestion` responses.

- **Prompt Template**:
  - Defines a shared `actor_prompt_template` with instructions to answer, critique, and suggest search queries.
  - Includes a `MessagesPlaceholder` for chat history and injects the current time dynamically.

- **Tool Bindings**:
  - **`first_responder`**: Generates an initial answer and critique using the base prompt and `AnswerQuestion` tool.
  - **`revisor`**: Revises the answer using custom instructions and the `ReviseAnswer` tool.

These components enable structured, multi-step reasoning in a LangGraph pipeline.

In [2]:
# Standard library: datetime supplies the timestamp injected into the prompt
import datetime

# Import necessary LangChain classes and parsers
from langchain_core.messages import HumanMessage  # Used for user input in message history
from langchain_core.output_parsers.openai_tools import (
    JsonOutputToolsParser,     # Generic JSON output parser for OpenAI tool outputs
    PydanticToolsParser,       # Parser that converts tool output into Pydantic models
)
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder  # Used to define LLM prompt templates
from langchain_openai import ChatOpenAI  # OpenAI chat model wrapper for LangChain

from schemas import AnswerQuestion, ReviseAnswer  # Custom Pydantic schemas for tool outputs

# Initialize the LLM with a specific model (OpenAI-compatible)
llm = ChatOpenAI(model="o4-mini")

# Parser that returns tool outputs as JSON along with the tool name/id
parser = JsonOutputToolsParser(return_id=True)

# Parser that converts the tool output into structured Pydantic objects (like AnswerQuestion)
parser_pydantic = PydanticToolsParser(tools=[AnswerQuestion])

# Prompt template used by both first-responder and reviser agents
actor_prompt_template = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are an expert researcher.
Current time: {time}

1. {first_instruction}
2. Reflect critically on your answer. What key points are missing? What content could be removed?
3. Recommend search queries to research information and improve your answer.""",
        ),
        MessagesPlaceholder(variable_name="messages"),  # Used to include previous chat history dynamically
        ("system", "Answer the user's question above using the required format."),  # Explicit instruction to format output properly
    ]
).partial(
    time=lambda: datetime.datetime.now().isoformat(),  # Injects current timestamp dynamically into the prompt
)

# First responder configuration: focuses on generating an initial answer
first_responder_prompt_template = actor_prompt_template.partial(
    first_instruction="Provide a detailed ~250 word answer."
)

# Chain that binds the first_responder prompt to the model and specifies tool output
first_responder = first_responder_prompt_template | llm.bind_tools(
    tools=[AnswerQuestion], tool_choice="AnswerQuestion"
)

# Instruction text for revision agent, guiding how to update the original answer
revise_instructions = """Revise your previous answer using the new information.
    - You should use the previous critique to add important information to your answer.
        - You MUST include numerical citations in your revised answer to ensure it can be verified.
        - Add a "References" section to the bottom of your answer (which does not count towards the word limit). In form of:
            - [1] https://example.com
            - [2] https://example.com
    - You should use the previous critique to remove superfluous information from your answer and make SURE it is not more than 250 words.
"""

# Revisor configuration: modifies the original answer using critique and new info
revisor = actor_prompt_template.partial(
    first_instruction=revise_instructions
) | llm.bind_tools(tools=[ReviseAnswer], tool_choice="ReviseAnswer")

### LangGraph Construction and Execution

This cell builds and runs a LangGraph agent for multi-step reasoning and answer refinement.

- **Graph Definition**:
  - Constructs a `MessageGraph` with three nodes:
    - **`draft`**: generates the initial answer using `first_responder`.
    - **`execute_tools`**: runs search queries via the tool executor.
    - **`revise`**: revises the answer using `revisor`.
  - Edges define the flow: `draft → execute_tools → revise`.
  - A conditional edge after `revise` counts the `ToolMessage` objects in the state and routes back to `execute_tools` until that count exceeds `MAX_ITERATIONS`, which bounds the revision cycle.

- **Graph Execution**:
  - Accepts user input as a prompt.
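  - Executes the graph and extracts the final answer from the `answer` argument of the last tool call, falling back to the message content if there is none.
  - Prints the final answer and the token usage reported for the last model call, which gives a rough view of model cost.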

This ties together the model, prompts, and tools into a complete answer–reflect–revise pipeline.
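
The routing rule described above can be simulated without LangGraph to see how many passes it allows. In this sketch, `AIMessage`, `ToolMessage`, and `END` are simple stand-ins for the real classes and constant, `simulate` is a hypothetical helper, and it is assumed that each visit to `execute_tools` appends exactly one `ToolMessage` (the notebook does not show `tool_executor`, so this may not hold there).

```python
from dataclasses import dataclass
from typing import List

MAX_ITERATIONS = 2
END = "__end__"  # stand-in for langgraph.graph.END

@dataclass
class AIMessage:      # stand-in, not the LangChain class
    content: str

@dataclass
class ToolMessage:    # stand-in, not the LangChain class
    content: str

def event_loop(state: List[object]) -> str:
    # Same rule as the notebook: stop once the ToolMessage count exceeds the limit.
    count_tool_visits = sum(isinstance(item, ToolMessage) for item in state)
    if count_tool_visits > MAX_ITERATIONS:
        return END
    return "execute_tools"

def simulate() -> List[str]:
    state: List[object] = [AIMessage("draft")]
    visited = ["draft"]
    node = "execute_tools"
    while node != END:
        # Assumption: each execute_tools visit appends one ToolMessage.
        state.append(ToolMessage("search results"))
        visited.append("execute_tools")
        state.append(AIMessage("revision"))
        visited.append("revise")
        node = event_loop(state)
    return visited

trace = simulate()
print(trace)
print("tool visits:", trace.count("execute_tools"))
```

Under that assumption, `MAX_ITERATIONS = 2` yields three tool visits and three revisions, because the check uses `>` rather than `>=`.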

In [6]:
from typing import List

# LangGraph and LangChain imports for message handling and agent graph definition
from langchain_core.messages import BaseMessage, ToolMessage, AIMessage
from langgraph.graph import END, MessageGraph

# Import the nodes (first_responder, revisor) and tool executor function from other modules
from chains import revisor, first_responder
from tool_executor import execute_tools

# Maximum number of tool-use iterations to prevent infinite loops
MAX_ITERATIONS = 2

# === Build the LangGraph ===
builder = MessageGraph()

# Add nodes (graph steps)
builder.add_node("draft", first_responder)          # Step 1: Generate initial answer
builder.add_node("execute_tools", execute_tools)    # Step 2: Run any tool calls
builder.add_node("revise", revisor)                 # Step 3: Revise the answer based on critique

# Define edges (step transitions)
builder.add_edge("draft", "execute_tools")          # After initial draft, go to tool execution
builder.add_edge("execute_tools", "revise")         # After tools run, go to revision

# Conditional loop: determines whether to stop or return to tool execution
def event_loop(state: List[BaseMessage]) -> str:
    # Count how many times tools have been used
    count_tool_visits = sum(isinstance(item, ToolMessage) for item in state)
    if count_tool_visits > MAX_ITERATIONS:
        return END  # Stop the loop if limit exceeded
    return "execute_tools"  # Otherwise, keep revising via tool execution

# Attach conditional edge from "revise" step using event_loop logic
builder.add_conditional_edges("revise", event_loop)

# Set the entry point for the graph
builder.set_entry_point("draft")

# Compile the graph into an executable object
graph = builder.compile()

# === User Input & Execution ===

# Get prompt from user
user_prompt = input("Enter your question: ")
print("Running Reflexion agent: ", user_prompt)

# Run the graph agent with user input
res = graph.invoke(user_prompt)

# === Parse final output ===

# Extract the final answer:
# If it's an AIMessage with a tool call, read the `answer` argument of that call
if isinstance(res[-1], AIMessage) and res[-1].tool_calls:
    final_answer = res[-1].tool_calls[0]["args"].get("answer", "")
else:
    # Otherwise, fall back to the message content directly
    final_answer = res[-1].content if hasattr(res[-1], "content") else str(res[-1])

# === Token usage reporting (optional) ===

# Try to extract token usage from the final message's response metadata (covers the last model call only)
token_usage = res[-1].response_metadata.get("token_usage", {}) if hasattr(res[-1], 'response_metadata') else {}
total_tokens = token_usage.get("total_tokens", "N/A")

# === Output results ===
print("\n=== Final Answer ===\n")
print(final_answer)

print(f"\n--- Token Usage ---\nTotal tokens: {total_tokens}")

Enter your question:   Write about using agentic approaches to solving tier 1 IT and HR issues and list startups that realm that have raised capital.


Running Reflexion agent:   Write about using agentic approaches to solving tier 1 IT and HR issues and list startups that realm that have raised capital.

=== Final Answer ===

Agentic AI—autonomous, goal-directed software agents powered by large language models—automates up to 70% of Tier 1 IT and HR tasks by orchestrating microservices, event-driven pipelines and continuous feedback loops[1]. In enterprise pilots, these agents boosted first-contact resolution from 30% to 65% and cut mean time to resolution by 45% versus legacy rule-based bots[2].

IT Use Cases: Agents triage tickets, reset credentials, provision accounts and diagnose routine network or software issues via integrations with ServiceNow, Jira and identity platforms (e.g., Okta). Confidence thresholds guide escalations to human engineers.

HR Use Cases: Agents answer benefits FAQs, schedule interviews, onboard new hires and process time-off or payroll inquiries by interfacing with Workday, Greenhouse and calendar APIs—deli