# Stage 4: Hybrid Search & The ReAct Loop

## Introduction

In Stage 3, we made significant progress in managing retrieved context. We built an intent-driven architecture that dynamically selected between hierarchical data views (summaries vs. full details), assembled context using progressive disclosure, and implemented a quality validation loop that automatically retried with different strategies when context was insufficient. We even introduced tool calling, where the LLM could decide *when* to search for courses.

This was effective for managing context when queries fit our predefined categories. But we were still pre-programming every path through the system. We decided which intents existed, which search strategies to use for each intent, and what fallback rules to follow when context was insufficient.

Real production agents work differently. Think about how ChatGPT or Claude handle queries. They reason about what context they need and how to retrieve it. They don't necessarily follow pre-programmed paths; they make dynamic decisions based on the specific query in front of them. In these types of systems the job shifts from programming all the paths to providing the tools and guidance agents need to make smart context decisions themselves.

This shift becomes critical when we add conversation memory in the next stage (Stage 5). Memory introduces fundamental ambiguity that our Stage 3 agent simply cannot handle. Consider the following query:

*"What are the prerequisites for that course?"* 

There are core challenges with a query like this for our current system. Which course? The one from three messages ago? The last course code mentioned? A course implied by context? 

Pre-programmed logic would need to account for every possible reference pattern. A reasoning agent can analyze the conversation history and determine what "that course" means in context.

To build agents that reason about their context needs, we need a new cognitive architecture: the ReAct pattern (Reasoning + Acting). This allows the agent to:
1. Analyze the user's request
2. Decide what context is missing
3. Retrieve that specific context using tools
4. Evaluate if the context is sufficient, and if not, loop back to get more

To implement this, we need two components working together:

**1. The ReAct agent architecture**

The reasoning engine that replaces our Stage 3 pipeline. The agent will think through what context it needs, retrieve it, evaluate if it's sufficient, and loop until it has everything required to answer confidently. Its structure will look like this:

```mermaid
graph TD
    Q[Query] --> RA[ReAct Agent]

    subgraph ReAct Loop
        RA --> T1[💭 Thought: What context do I need?]
        T1 --> A1[🔧 Action: Retrieve Context]
        A1 --> O1[👁️ Observation: Review Data]
        O1 --> T2[💭 Thought: Is this enough?]
        T2 --> |No| A1
        T2 --> |Yes| F[✅ FINISH]
    end

    F --> R[Response]
```

**2. A hybrid search tool**

In addition to the new architecture, we're also going to provide one more improvement to the way we search for the right context. 

In Stage 3's hierarchical context assembly (summaries vs. full details, progressive disclosure), we solved the problem of *how much* information to return and *how to structure it*. That was context engineering for presentation.

In this stage we're going to implement the ability to perform hybrid search.

Hybrid search adds the ability to combine semantic search with exact filtering:
- Semantic search alone: "Find courses similar to 'CS101'" (might return CS101, but also other intro courses)
- Exact filtering: "Find the course with course_code='CS101'" (guaranteed to return only CS101)
- Hybrid: "Find advanced courses about machine learning" (semantic search for 'machine learning' + filter for difficulty_level='advanced')

The key difference between these approaches is intentional precision vs. semantic luck. Our hybrid search tool exposes both capabilities, allowing a reasoning agent to choose the right retrieval strategy based on the query's needs.

This will become critical when:
- The user asks for something specific ("Tell me about CS101" - why search 5 similar courses when you can filter for exactly CS101?)
- The user needs combined criteria ("Most advanced CS course" - semantic search + exact difficulty filter)
- Multi-step reasoning requires exact references ("What are the prerequisites for that course?" - need to filter for the exact course code from context)
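
To make the three modes concrete, here is a minimal, dependency-free sketch. Everything in it is an assumption made for illustration: the four-course catalog, the three-number "embeddings", and the `toy_search` helper are invented, and none of it is the RedisVL-backed tool built in Part 1. Note also that this sketch filters first and ranks second, whereas the Part 1 tool ranks first and filters afterwards.

```python
import math

# Hypothetical mini-catalog; the 3-number "embeddings" are invented for illustration.
CATALOG = [
    {"course_code": "CS101", "difficulty_level": "beginner", "vec": [0.9, 0.1, 0.0]},
    {"course_code": "CS102", "difficulty_level": "beginner", "vec": [0.8, 0.2, 0.1]},
    {"course_code": "ML201", "difficulty_level": "intermediate", "vec": [0.1, 0.9, 0.2]},
    {"course_code": "ML401", "difficulty_level": "advanced", "vec": [0.1, 0.95, 0.3]},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def toy_search(query_vec=None, filters=None, limit=3):
    """Apply exact field filters, then rank the survivors by similarity."""
    filters = filters or {}
    candidates = [c for c in CATALOG if all(c[k] == v for k, v in filters.items())]
    if query_vec is not None:
        candidates.sort(key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return [c["course_code"] for c in candidates[:limit]]


ml_query = [0.1, 0.9, 0.25]  # stand-in for the embedding of "machine learning"

semantic = toy_search(query_vec=ml_query, limit=2)            # similarity only
exact = toy_search(filters={"course_code": "CS101"})          # filter only
hybrid = toy_search(query_vec=ml_query,
                    filters={"difficulty_level": "advanced"})  # both combined

print(semantic, exact, hybrid)
```

The semantic call returns the two machine-learning courses regardless of level, the exact call returns only CS101, and the hybrid call narrows the machine-learning matches to the advanced one.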

Together, these components create an agent that can reason about both *what* context it needs and *how* to retrieve it.

Let's start building.


## Setup

First, as before, let's set up our environment and import the Stage 4 agent. Run the code block below.

```python
# This code sets up the notebook to be able to access the provided OpenAI API Key and access to the agent code
import sys
import os
from pathlib import Path
from dotenv import load_dotenv

project_root = Path("..").resolve()

stage4_path = project_root / "progressive_agents" / "stage4_hybrid_search_react"
src_path = project_root / "src"

load_dotenv(project_root / ".env")

sys.path.insert(0, str(src_path))
sys.path.insert(0, str(stage4_path))

from agent import setup_agent, create_workflow

print("Initializing Stage 4 Agent...")
course_manager, _ = await setup_agent(auto_load_courses=True)
workflow = create_workflow(course_manager)

print("✅ Agent is ready!")
print("✅ Course manager initialized")
```

## Part 1: Building the Hybrid Search Tool

In this part, we're going to build out the hybrid search capability.

In Stage 3, you built a `search_courses_tool` that used a pre-built helper function (`search_courses_sync`) to handle retrieval. Under the hood, that helper used RedisVL's [`VectorQuery`](https://docs.redisvl.com/en/latest/api/query.html#vectorquery) class to perform semantic/vector search.

Now we're upgrading to hybrid search by working directly with the underlying `course_manager.search_courses()` method and leveraging more of RedisVL. In addition to `VectorQuery` for KNN search, RedisVL also provides the [`FilterQuery`](https://docs.redisvl.com/en/latest/api/query.html#filterquery) class for field-level filters.

Our hybrid implementation combines both: we use vector search to retrieve semantically similar candidates, then apply filtering to guarantee exact matches when needed. This gives us the speed of vector search with the precision of exact matching. (Note: RedisVL also offers a [`HybridQuery`](https://docs.redisvl.com/en/latest/api/query.html#hybridquery) class for combining vector + full-text search, but our use case is simpler since we only need "vector search OR exact field matching" rather than "vector + text ranking".)

We'll create a `search_courses_hybrid` function with an optional `course_code` parameter. When it is provided, the function switches to exact mode: it searches a larger set and filters for matches. When it is `None`, the function uses semantic search (the Stage 3 behavior). In Part 2, the ReAct agent will reason about which mode to use based on the query.

Let's implement it.

### 📌 Task 1: Implement Hybrid Search

Your task is to complete the `search_courses_hybrid` function below to support both semantic and exact search modes.

The function should check whether a `course_code` parameter is provided. If it is, perform exact matching by searching a larger set of courses and filtering to only those matching the code. If not, perform a semantic search and return the top results directly.

Both modes should return results as JSON strings, and any errors should be caught and returned as error messages.

<details>
<summary>🛠️ Show Implementation Details</summary>
<br>

Step 1: **Search with higher limit (exact match mode)**

Use the async method `.search_courses(query, limit=10)` on `course_manager` to fetch more results. This increases the chance of finding the target course code among the semantic results.

Step 2: **Convert Pydantic models to dicts (exact match mode)**

Use a list comprehension: `[r.model_dump(mode='json') for r in results]`. This converts the Pydantic Course objects into JSON-serializable dictionaries.

Step 3: **Filter for course_code match**

Filter the results to find courses where the `course_code` field matches your target. Use a case-insensitive containment check: `[r for r in results_data if course_code.lower() in r['course_code'].lower()]`

Step 4: **Search with smaller limit (semantic mode)**

Use the async method `.search_courses(query, limit=3)`. For semantic search, you only need the top few matches.

Step 5: **Convert and return as JSON (semantic mode)**

Same conversion pattern: `[r.model_dump(mode='json') for r in results]`. The `json.dumps()` return statement is already provided.

</details>
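
One detail of Step 3 is worth seeing in isolation: the `in` operator performs a substring test, so it is looser than strict equality. The short sketch below compares the two on a made-up list of codes (the codes and both helper names are hypothetical, chosen only to show the difference). Whether the looser behavior matters depends on how course codes are structured in the real catalog.

```python
# Hypothetical course codes, chosen so that one code is a prefix of another.
codes = ["CS101", "CS1010", "MATH101"]


def match_contains(target, codes):
    # Mirrors the Step 3 expression: case-insensitive substring test.
    return [c for c in codes if target.lower() in c.lower()]


def match_equals(target, codes):
    # Stricter variant: case-insensitive equality.
    return [c for c in codes if target.lower() == c.lower()]


contains_hits = match_contains("cs101", codes)
equals_hits = match_equals("cs101", codes)

print(contains_hits)  # ['CS101', 'CS1010']
print(equals_hits)    # ['CS101']
```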

```python
import json

async def search_courses_hybrid(query: str, course_code: str = None) -> str:
    """
    Search for courses using hybrid search (semantic + exact filter).
    
    Args:
        query: The search text (e.g., "machine learning")
        course_code: Optional exact course code to filter by (e.g., "CS002")
    
    Returns:
        JSON string of matching courses
    """
    try:
        if course_code:
            # EXACT MATCH MODE
            print(f"🔍 Tool: Executing EXACT match for code='{course_code}'")
            
            # TODO - Step 1: Search with higher limit to ensure we find the match
            results = None
            
            # TODO - Step 2: Convert Pydantic models to dicts using .model_dump(mode='json')
            results_data = None
            
            # TODO - Step 3: Filter results where course_code matches (case-insensitive)
            filtered = None
            
            if filtered:
                return json.dumps(filtered, indent=2)
            else:
                return f"No courses found with code {course_code}"
        
        else:
            # SEMANTIC SEARCH MODE
            print(f"🔍 Tool: Executing SEMANTIC search for query='{query}'")
            
            # TODO - Step 4: Search with smaller limit for top semantic matches
            results = None
            
            # TODO - Step 5: Convert Pydantic models to dicts and return as JSON
            results_data = None
            return json.dumps(results_data, indent=2)
            
    except Exception as e:
        return f"Error searching courses: {str(e)}"

print("✅ Hybrid search tool defined")
```

<details>
<summary>🗝️ Solution code</summary>
<br>
    
```python

import json

async def search_courses_hybrid(query: str, course_code: str = None) -> str:
    """
    Search for courses using hybrid search (semantic + exact filter).
    
    Args:
        query: The search text (e.g., "machine learning")
        course_code: Optional exact course code to filter by (e.g., "CS002")
    
    Returns:
        JSON string of matching courses
    """
    try:
        # If we have a specific course code, use exact matching mode
        if course_code:
            print(f"🔍 Tool: Executing EXACT match for code='{course_code}'")
            
            # Fetch more results to ensure we find the match
            results = await course_manager.search_courses(query, limit=10)
            
            # Convert Pydantic models to dicts
            results_data = [r.model_dump(mode='json') for r in results]
            
            # Filter for the specific course code
            filtered = [r for r in results_data if course_code.lower() in r['course_code'].lower()]
            
            if filtered:
                return json.dumps(filtered, indent=2)
            else:
                return f"No courses found with code {course_code}"
        
        # Otherwise, use semantic search mode
        else:
            print(f"🔍 Tool: Executing SEMANTIC search for query='{query}'")
            results = await course_manager.search_courses(query, limit=3)
            
            # Convert Pydantic models to dicts
            results_data = [r.model_dump(mode='json') for r in results]
            return json.dumps(results_data, indent=2)
            
    except Exception as e:
        return f"Error searching courses: {str(e)}"

print("✅ Hybrid search tool defined")
```

</details>

### Test the implementation

Run the test utility below to verify your hybrid search tool works correctly.

```python
# Import the test utility
from test_hybrid_search import test_hybrid_search_tool

# Test your implementation
await test_hybrid_search_tool(search_courses_hybrid)
```

### Connect the tool to the agent

Before we build the ReAct loop, let's register our hybrid search function in a `tools_map` dictionary. The ReAct agent will use this mapping to look up and execute tools by name.

```python
# Create a mapping of tool names to functions
tools_map = {
    "search_courses_hybrid": search_courses_hybrid
}

print("✅ Tool registered! The ReAct agent can now access search_courses_hybrid.")
```
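
To show why a name-to-function mapping is useful, here is a minimal sketch of dispatching a tool call by name. It is not the production `execute_react_tool` helper, whose internals are not shown in this notebook. The `dispatch_tool` function, the `fake_search` stand-in, and the `demo_tools_map` dictionary are all hypothetical names introduced for this example.

```python
import asyncio
import json


async def fake_search(query: str, course_code: str = None) -> str:
    # Hypothetical stand-in for search_courses_hybrid; it just echoes its arguments.
    return json.dumps({"query": query, "course_code": course_code})


# Same shape as tools_map: tool name -> async function.
demo_tools_map = {"search_courses_hybrid": fake_search}


async def dispatch_tool(name: str, args: dict, tools: dict) -> str:
    """Look up a tool by name and call it with keyword arguments."""
    if name not in tools:
        return f"Error: unknown tool '{name}'"
    return await tools[name](**args)


# In a notebook cell, write `await dispatch_tool(...)` instead of asyncio.run(...).
result = asyncio.run(
    dispatch_tool(
        "search_courses_hybrid",
        {"query": "redis", "course_code": "CS101"},
        demo_tools_map,
    )
)
missing = asyncio.run(dispatch_tool("delete_course", {}, demo_tools_map))

print(result)
print(missing)
```

Because the agent only ever emits a tool name and a JSON object, adding a second tool later would mean adding one more entry to the dictionary rather than changing the loop.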

## Part 2: The ReAct Agent Architecture

Now that we have a hybrid search tool that can retrieve context with precision, we can build an agent that reasons about how to use it.

This reasoning happens in a loop structure. The agent alternates between three phases:

1. **Thought**: The agent reasons about what it needs to do next. "I don't have enough information yet. I should search for courses about machine learning."

2. **Action**: The agent decides to either call a tool or provide a final answer. "I'll use search_courses_hybrid with query='machine learning' and no course_code (semantic search)."

3. **Observation**: The agent sees the result of its action. "I received 3 courses about machine learning. Now I have enough context to answer the user's question."

The loop continues until the agent decides it has sufficient information to answer. Within a single query, each iteration builds on the previous one: the agent sees its own thoughts and observations in the message history, which lets it build up context progressively.

In order to implement the architecture, we'll need the following components:

1. **A System Prompt**: Similar to how we designed a system prompt before, we'll need instructions that tell the LLM how to format its thoughts and actions. This defines the ReAct protocol the agent will follow.

2. **The Parser**: A function that extracts structured decisions from the LLM's free-text output. It identifies what action the agent wants to take and what inputs to provide.

3. **The Loop**: A `react_agent_node` function that orchestrates the cycle. It calls the LLM, parses the response, executes tools, and feeds observations back into the conversation.

We'll provide the prompt and parser for you. Your job is to implement the loop logic that brings them together.

Let's start by examining the system prompt and parser, then you'll build the loop.

### The ReAct System Prompt and Parser

Before implementing the ReAct loop, you need to understand the two components that make it work:

1. **REACT_SYSTEM_PROMPT**: Tells the LLM how to think and format its output
2. **parse_react_output()**: Extracts the Action and Action Input from LLM responses

We've provided these implementations for you in the dropdown sections below. Review them carefully.

<details>
<summary>🔍 Open to explore the ReAct System Prompt</summary>
<br>
    
The abbreviated prompt below defines the ReAct protocol and includes:
- Multiple search strategies (exact_match, hybrid, semantic_only)
- Intent classification (GENERAL, PREREQUISITES, SYLLABUS_OBJECTIVES, ASSIGNMENTS)
- Detailed examples for different query types
- Guidance on handling empty results vs. errors
- Instructions to avoid unnecessary re-searching

You can also view the complete implementation in the agent directory for [react_prompts.py](../progressive_agents/stage4_hybrid_search_react/agent/react_prompts.py)

```python

REACT_SYSTEM_PROMPT = """You are a helpful Redis University course advisor assistant.

You have access to ONE tool:

**search_courses_hybrid** - Search the Redis University course catalog with hybrid search
   Parameters:
   - query (str): Search query
   - intent (str): GENERAL, PREREQUISITES, SYLLABUS_OBJECTIVES, or ASSIGNMENTS
   - search_strategy (str): "exact_match", "hybrid", or "semantic_only"
   - course_codes (list): Specific course codes to search for (use for exact matches)
   - information_type (list): What info to retrieve (e.g., ["prerequisites", "syllabus"])
   - departments (list): Filter by department

You must use the following format:

Thought: [Your reasoning about what to do next]
Action: [One of: search_courses or FINISH]
Action Input: [Valid JSON with the required parameters]

You will receive:
Observation: [Result of the action]

Then you continue with another Thought/Action/Observation cycle.

When you have enough information to answer the user's question, use:
Thought: I have enough information to provide a complete answer
Action: FINISH
Action Input: [Your final answer to the user]

IMPORTANT GUIDELINES:
- Always start with a Thought explaining your reasoning
- Only use ONE Action per turn
- Action Input must be valid JSON matching the tool's parameters
- Use "exact_match" strategy when the user mentions specific course codes
- Use "hybrid" strategy for topic-based searches
- Use FINISH when you're ready to provide the final answer

INTERPRETING SEARCH RESULTS:
- If a search returns course data with an empty field (e.g., "prerequisites": []), 
  that means the field has NO VALUE - not that the search failed
- Empty prerequisites [] means "no prerequisites required" - this IS a valid answer
- Only retry a search if you get an actual error or no courses are found
- Do NOT keep searching with different strategies when you already have the course data

[... additional examples and guidance ...]
"""
```

</details>

<details>
<summary>🔍 Open to explore the ReAct Parser</summary>
<br>
    
The core parser function extracts structured data from the LLM's free-text ReAct output. It returns a dictionary with three keys:
- `thought`: The agent's reasoning
- `action`: The action to take (e.g., "search_courses_hybrid" or "FINISH")
- `action_input`: The JSON parameters for the action

The full implementation also includes three helper methods to be aware of:
- `validate_action_input()`: Parses and validates JSON with fallback logic for malformed input
- `format_observation()`: Formats tool results with configurable truncation to prevent context overflow
- `extract_final_answer()`: Extracts the final answer text from FINISH actions

You can view the complete implementation with all helper methods in [react_parser.py](../progressive_agents/stage4_hybrid_search_react/agent/react_parser.py). Below, you will find the core parser function.

```python

def parse_react_output(text: str) -> Dict[str, Optional[str]]:
    """
    Parse ReAct format output from LLM.
    
    Returns:
        Dictionary with 'thought', 'action', and 'action_input' keys
    """
    # Extract Thought (everything between "Thought:" and "Action:")
    thought_match = re.search(
        r"Thought:\s*(.+?)(?=\nAction:|\Z)", text, re.DOTALL | re.IGNORECASE
    )
    
    # Extract Action (word after "Action:")
    action_match = re.search(r"Action:\s*(\w+)", text, re.IGNORECASE)
    
    # Extract Action Input
    action_input_match = re.search(
        r"Action Input:\s*(.+?)(?=\nThought:|\nObservation:|\nAction:|\Z)",
        text,
        re.DOTALL | re.IGNORECASE,
    )
    
    return {
        "thought": thought_match.group(1).strip() if thought_match else None,
        "action": action_match.group(1).strip() if action_match else None,
        "action_input": (
            action_input_match.group(1).strip() if action_input_match else None
        ),
    }
```

</details>
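
To see what the parser returns in practice, the sketch below runs it on three hand-written sample responses. The function body is repeated from the dropdown above so the snippet runs on its own; the sample strings are invented for illustration and are not real model output.

```python
import json
import re
from typing import Dict, Optional


def parse_react_output(text: str) -> Dict[str, Optional[str]]:
    """Same regex logic as the core parser shown above."""
    thought_match = re.search(
        r"Thought:\s*(.+?)(?=\nAction:|\Z)", text, re.DOTALL | re.IGNORECASE
    )
    action_match = re.search(r"Action:\s*(\w+)", text, re.IGNORECASE)
    action_input_match = re.search(
        r"Action Input:\s*(.+?)(?=\nThought:|\nObservation:|\nAction:|\Z)",
        text,
        re.DOTALL | re.IGNORECASE,
    )
    return {
        "thought": thought_match.group(1).strip() if thought_match else None,
        "action": action_match.group(1).strip() if action_match else None,
        "action_input": (
            action_input_match.group(1).strip() if action_input_match else None
        ),
    }


# Invented sample: a tool call with JSON arguments.
tool_turn = (
    "Thought: The user named a specific course, so I should look it up by code.\n"
    "Action: search_courses_hybrid\n"
    'Action Input: {"query": "CS101", "course_code": "CS101"}'
)

# Invented sample: a final answer.
finish_turn = (
    "Thought: I have enough information to provide a complete answer\n"
    "Action: FINISH\n"
    "Action Input: CS101 has no prerequisites."
)

# Invented sample: free text that ignores the ReAct format entirely.
off_format = "CS101 is a great course."

parsed_tool = parse_react_output(tool_turn)
parsed_finish = parse_react_output(finish_turn)
parsed_bad = parse_react_output(off_format)

tool_args = json.loads(parsed_tool["action_input"])  # JSON string -> dict
print(parsed_tool["action"], tool_args)
print(parsed_finish["action"], parsed_finish["action_input"])
print(parsed_bad)
```

The last case shows why the loop has to handle `None`: when the model ignores the format, all three fields come back empty and the caller must decide how to respond.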

### 📌 Task: Implement the ReAct Loop

In this task, you will implement the core ReAct loop logic. Your implementation will match the production version used in [react_agent.py](../progressive_agents/stage4_hybrid_search_react/agent/react_agent.py), except we'll skip the logging code to focus on the essential logic.

Before you start coding, here's what's already provided for you:

1. `AgentState` - A TypedDict that defines the structure of our state:
   - `input`: The user's query string
   - `history`: List of previous conversation turns
   - `final_response`: The final answer to return
     
2. `messages` - A list of `AIMessage` and `HumanMessage` objects that stores the conversation history. Each iteration appends new messages so the LLM can see what happened previously.

3. `get_react_llm()` - A helper function imported from the production code that returns a configured LLM instance with the right parameters (model, temperature, max_tokens, etc.).

4. `execute_react_tool()` - A helper function imported from the production code that handles tool lookup and execution. You just call it with the tool name and arguments.

5. `tools_map` - A dictionary mapping tool names (like "search_courses_hybrid") to their actual function implementations.

6. `REACT_SYSTEM_PROMPT` - The system prompt that instructs the LLM to use ReAct format (Thought/Action/Action Input).

7. `parse_react_output()` - A function that parses the LLM's text response and extracts the thought, action, and action_input fields.

Now, implement the ReAct loop logic. At a high level, the loop needs to:
1. Parse the LLM response to extract the action
2. Check if it's a FINISH action (completion)
3. Execute tool actions with proper error handling
4. Append observations to message history for the next iteration

> **Note**: By appending observations to the message list, the LLM sees the full conversation history on the next iteration. It can reason about what it learned and decide what to do next.
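
Before wiring in a real LLM, it can help to watch the message history grow in a dry run. The sketch below is a simplified stand-in, not the production loop: the scripted `fake_llm`, the one-course `lookup` tool, the line-based parsing, and the `(role, text)` tuples used in place of LangChain message objects are all assumptions made for illustration.

```python
import json

# Scripted "LLM": canned ReAct turns returned in order (hypothetical).
SCRIPT = [
    'Thought: I need the course record.\nAction: lookup\n'
    'Action Input: {"course_code": "CS101"}',
    "Thought: I have enough information.\nAction: FINISH\n"
    "Action Input: CS101 has no prerequisites.",
]


def fake_llm(messages):
    # Choose the next scripted turn from the number of assistant turns so far.
    turns_so_far = sum(1 for role, _ in messages if role == "assistant")
    return SCRIPT[turns_so_far]


def lookup(course_code):
    # Hypothetical tool backed by a one-course catalog.
    catalog = {"CS101": {"course_code": "CS101", "prerequisites": []}}
    return json.dumps(catalog.get(course_code, {}))


def run_loop(question, max_iterations=5):
    messages = [("user", question)]
    for _ in range(max_iterations):
        reply = fake_llm(messages)
        # Simplified parsing: one "Label: value" pair per line.
        fields = dict(line.split(": ", 1) for line in reply.split("\n"))
        if fields["Action"] == "FINISH":
            return fields["Action Input"], messages
        observation = lookup(**json.loads(fields["Action Input"]))
        # Feed both the model's turn and the observation back into the history.
        messages.append(("assistant", reply))
        messages.append(("user", f"Observation: {observation}"))
    return "No answer within the iteration limit.", messages


answer, transcript = run_loop("What are the prerequisites for CS101?")
print(answer)
for role, text in transcript:
    print(f"[{role}] {text}")
```

On the second iteration the scripted model "sees" the observation from the first, which is the same mechanism the real loop relies on.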

<details>
<summary>🛠️ Show Implementation Details</summary>
<br> 
    
Step 1: **Parse the LLM output**

The `parse_react_output()` function returns a dictionary. You need to:
- Call `parse_react_output(response_text)` and store the result in a variable called `parsed`
- Extract the `"action"` key from the dictionary stored in `parsed` and store it in a variable called `action`

This gives you access to what action the LLM wants to take.

Step 2: **Check for FINISH action**

The production implementation uses "FINISH" to signal completion. You need to:
- Check if `action` exists and if it equals "FINISH" (case-insensitive using `.upper()`)
- If it is finished, get the final answer from `parsed["action_input"]`
- Return a dictionary with key `"final_response"` containing that answer
- Use `or ""` to handle None values

Step 3: **Execute tool actions**

For tool actions like "search_courses_hybrid", you need to:
- Check if the `action` is a key in the `tools_map` dictionary
- Get the action input string from `parsed["action_input"]`
- Parse that string as JSON using `json.loads()` to get a dictionary of arguments
- Call the helper function `execute_react_tool(action, tool_args)` with await
- Wrap the tool execution in a try/except block to catch any errors (JSON parsing or tool execution)
- If there's an error, set the observation to an error message string
- Print the observation (truncate to 200 characters for readability)

The `execute_react_tool()` helper function handles looking up the tool in the map and calling it for you.

Step 4: **Append observation to messages**

After executing a tool (or capturing its error as the observation), you need to update the message history so the LLM sees what happened:

- Append the LLM's response as an `AIMessage` with `content=response_text`

- Append the observation as a `HumanMessage` with content=f"Observation: {observation}"

This uses LangChain's message types (AIMessage/HumanMessage) instead of tuples, matching the production code.

Step 5: **Handle invalid actions**

If the action isn't recognized (not in tools_map and not FINISH), you need to give the LLM feedback:
- Append the LLM's response as an `AIMessage`
- Append an error message as a `HumanMessage`: "Error: Unknown action..."
- Include the action name and list valid options

This helps the LLM self-correct on the next iteration.

</details>


```python
import json
import sys
from pathlib import Path
from typing import TypedDict, List, Optional
from langchain_core.messages import AIMessage, HumanMessage
from langchain_openai import ChatOpenAI

from agent.react_agent import get_react_llm, execute_react_tool
from agent.react_prompts import REACT_SYSTEM_PROMPT
from agent.react_parser import parse_react_output

# Define the State
class AgentState(TypedDict):
    input: str
    history: List[str]
    final_response: str

async def react_agent_node(state: AgentState, llm: Optional[ChatOpenAI] = None):
    """
    The ReAct agent that implements Thought → Action → Observation loop.
    
    This function maintains a conversation with the LLM where:
    - The LLM sees the system prompt, user query, and all previous observations
    - Each iteration adds new observations to the message history
    - The loop continues until the LLM returns "FINISH" or max iterations reached
    
    Args:
        state: AgentState with input, history, and final_response
        llm: Optional LLM instance (uses get_react_llm() if not provided)
    """
    query = state['input']
    history = state.get('history', [])
    
    # Get LLM instance (use provided or default)
    if llm is None:
        llm = get_react_llm()
    
    # Build initial messages (using LangChain message types)
    messages = [
        HumanMessage(content=REACT_SYSTEM_PROMPT),
        HumanMessage(content=f"\nUser: {query}\n\nHistory: {history}")
    ]
    
    # Limit the loop to prevent infinite spins
    max_iterations = 5
    iteration = 0
    
    while iteration < max_iterations:
        iteration += 1
        
        # Call the LLM
        response = await llm.ainvoke(messages)
        response_text = response.content
        print(f"\n🤖 Step {iteration} Response:\n{response_text}")
        
        # TODO Step 1: Parse the output to get action dictionary
        # Use parse_react_output() and extract the 'action' field
        parsed = None  # Replace with: parse_react_output(response_text)
        action = None  # Replace with: parsed["action"]
        
        # TODO Step 2: Check if action is "FINISH"
        # If action is FINISH (case-insensitive), extract final answer and return
        
        
        # TODO Step 3: Execute tool if action is in tools_map
        # - Get action input from parsed["action_input"]
        # - Parse action input as JSON using json.loads()
        # - Call execute_react_tool(action, tool_args) with try/except
        # - Print observation (truncate to 200 chars)
        
        
        # TODO Step 4: Append messages for next iteration
        # After executing a tool, append both the LLM response and observation:
        # - AIMessage with the response_text
        # - HumanMessage with the observation
        
        
        # TODO Step 5: Handle invalid actions
        # If action not recognized, append error feedback
        
    
    return {"final_response": "I could not find the answer within the iteration limit."}

print("✅ ReAct agent node defined")
```

<details>
<summary>🗝️ Solution code</summary>

```python

import json
from typing import TypedDict, List
from langchain_core.messages import AIMessage, HumanMessage

from agent.react_agent import get_react_llm, execute_react_tool
from agent.react_prompts import REACT_SYSTEM_PROMPT
from agent.react_parser import parse_react_output

# Define the State
class AgentState(TypedDict):
    input: str
    history: List[str]
    final_response: str

async def react_agent_node(state: AgentState):
    """
    The ReAct agent that implements Thought → Action → Observation loop.
    
    This function maintains a conversation with the LLM where:
    - The LLM sees the system prompt, user query, and all previous observations
    - Each iteration adds new observations to the message history
    - The loop continues until the LLM returns "FINISH" or max iterations reached
    """
    query = state['input']
    history = state.get('history', [])
    
    # Get LLM instance
    llm = get_react_llm()
    
    # Build initial messages (using LangChain message types)
    messages = [
        HumanMessage(content=REACT_SYSTEM_PROMPT),
        HumanMessage(content=f"\nUser: {query}\n\nHistory: {history}")
    ]
    
    # Limit the loop to prevent infinite spins
    max_iterations = 5
    iteration = 0
    
    while iteration < max_iterations:
        iteration += 1
        
        # Call the LLM
        response = await llm.ainvoke(messages)
        response_text = response.content
        print(f"\n🤖 Step {iteration} Response:\n{response_text}")
        
        # Step 1: Parse the output
        parsed = parse_react_output(response_text)
        action = parsed["action"]
        
        # Step 2: Check for FINISH action
        if action and action.upper() == "FINISH":
            final_answer = parsed["action_input"] or ""
            return {"final_response": final_answer}
        
        # Step 3: Execute tool actions
        elif action in tools_map:
            action_input_str = parsed["action_input"]
            
            try:
                tool_args = json.loads(action_input_str)
                observation = await execute_react_tool(action, tool_args)
            except Exception as e:
                observation = f"Error: {str(e)}"
            
            print(f"\n👁️ Observation: {observation[:200]}...")
            
            # Step 4: Append to message history
            messages.append(AIMessage(content=response_text))
            messages.append(HumanMessage(content=f"Observation: {observation}"))
        
        # Step 5: Handle invalid actions
        else:
            messages.append(AIMessage(content=response_text))
            messages.append(HumanMessage(content=f"Error: Unknown action '{action}'. Use 'search_courses_hybrid' or 'FINISH'."))
    
    return {"final_response": "I could not find the answer within the iteration limit."}
    
print("✅ ReAct agent node defined")
```

This core implementation matches the logic in [react_agent.py](../progressive_agents/stage4_hybrid_search_react/agent/react_agent.py). The production version adds:
- Detailed logging for debugging and monitoring
- LLM call tracking and metrics
- Reasoning trace for observability
- More sophisticated error handling
- State management for workflow integration

</details>

### Test Your Implementation

Now let's test your ReAct loop! 

We'll use the production implementation from [react_agent.py](../progressive_agents/stage4_hybrid_search_react/agent/react_agent.py) which includes the same core logic you just built, plus additional features like detailed logging, metrics tracking, and error handling.

The test utility below will verify that your understanding of the ReAct pattern is correct by comparing your implementation's behavior against expected outputs.

```python
from test_react_agent import test_react_agent

# Test your implementation
await test_react_agent(react_agent_node, tools_map, None)
```

## Part 3: Building the Graph Architecture

Now that we have our ReAct agent node, we need to wrap it in a LangGraph StateGraph to create the execution flow.

Since our `react_agent_node` handles the looping internally (the while loop), our graph structure is simple and linear: `Start → Agent → End`.

This is different from Stage 3, which had multiple nodes (classify_intent_node, agent_node, etc.) with conditional edges. Here, the ReAct pattern consolidates all reasoning into a single node that loops internally.

### 📌 Task: Build and Compile the Graph

Your task is to build a simple LangGraph workflow that wraps the ReAct agent node.

The graph building process involves initializing a [`StateGraph`](https://langchain-ai.github.io/langgraph/reference/graphs/#langgraph.graph.StateGraph) with your state schema, registering your agent function as a node, defining the execution flow with edges, and compiling everything into an executable application.

This is a simplified mock implementation to help you understand the core concepts. The full production implementation with additional features can be found in [workflow.py](../progressive_agents/stage4_hybrid_search/agent/workflow.py).

<details>
<summary>üõ†Ô∏è Show Implementation Details</summary>
<br> 
    
Step 1: **Initialize the StateGraph**

Create a [`StateGraph`](https://langchain-ai.github.io/langgraph/reference/graphs/#langgraph.graph.StateGraph) instance by passing your `AgentState` TypedDict as the type parameter. This tells LangGraph what shape the state will have.

Step 2: **Add the agent node**

Use the [`.add_node()`](https://langchain-ai.github.io/langgraph/reference/graphs/#langgraph.graph.StateGraph.add_node) method to add a node:
- First parameter: node name as a string (use "agent")
- Second parameter: the function to execute (your `react_agent_node`)

Step 3: **Set the entry point**

Use the [`.set_entry_point()`](https://langchain-ai.github.io/langgraph/reference/graphs/#langgraph.graph.StateGraph.set_entry_point) method to specify which node executes first. Pass the node name as a string.

Step 4: **Add edge to END**

Use the [`.add_edge()`](https://langchain-ai.github.io/langgraph/reference/graphs/#langgraph.graph.StateGraph.add_edge) method to connect the agent to termination:
- First parameter: source node name ("agent")
- Second parameter: destination (the `END` constant)

Step 5: **Compile the graph**

Call [`.compile()`](https://langchain-ai.github.io/langgraph/reference/graphs/#langgraph.graph.StateGraph.compile) on the workflow to create the executable application. Store this in a variable called `app`.

</details>

```python
from langgraph.graph import StateGraph, END

# TODO: Step 1 - Initialize the StateGraph with AgentState

# TODO: Step 2 - Add the agent node

# TODO: Step 3 - Set the entry point

# TODO: Step 4 - Add edge to END

# TODO: Step 5 - Compile

print("✅ Graph compiled!")
```

<details>
<summary>üóùÔ∏è Solution code</summary>
<br>
    
```python

from langgraph.graph import StateGraph, END

# Initialize the StateGraph with AgentState
workflow = StateGraph(AgentState)

# Add the agent node
workflow.add_node("agent", react_agent_node)

# Set the entry point
workflow.set_entry_point("agent")

# Add edge to END
workflow.add_edge("agent", END)

# Compile
app = workflow.compile()

print("✅ Graph compiled!")
```

This creates a simple linear graph:
- `START` → `agent` → `END`
- The agent node contains all the ReAct loop logic
- No conditional edges needed since the loop handles decision-making internally
- The graph terminates when `react_agent_node` returns the final response

</details>

## Part 4: Testing the ReAct Agent

Now let's test our agent with different query types. Watch the output carefully to see the Thought → Action → Observation cycle in action.

### Single-Turn Queries

First, let's test individual queries to see how the agent reasons. The first query asks about a specific course code (an exact match), while the second is a topic-based semantic search. Run the cell below to see the agent's thought process for each.

```python
import logging
import time

# Suppress verbose logging
workflow_logger = logging.getLogger("course-qa-workflow")
original_level = workflow_logger.level
workflow_logger.setLevel(logging.WARNING)

# Query 1: Exact Match
query1 = "What are the prerequisites for CS013?"
print(f"Query 1 (Exact Match): {query1}\n" + "="*60)

start1 = time.perf_counter()
result1 = await app.ainvoke({"input": query1})
print(f"Query 1 completed in {time.perf_counter() - start1:.2f}s")

# Query 2: Semantic Search
print("\n" + "="*60)
query2 = "I want to learn about neural networks."
print(f"Query 2 (Semantic Search): {query2}\n" + "="*60)

start2 = time.perf_counter()
result2 = await app.ainvoke({"input": query2})
print(f"Query 2 completed in {time.perf_counter() - start2:.2f}s")

# Restore logging
workflow_logger.setLevel(original_level)
```

### Multi-Turn Conversations with "Memory"

Now let's test something Stage 3 couldn't handle: multi-turn conversations with references to previous exchanges.

We'll use a simple conversation history list that gets passed through the state. This isn't true memory (we'll implement this in the next stage), but it demonstrates how the ReAct agent can reason about conversational context.

Watch how the agent handles queries like "What are the prerequisites for that course?" by analyzing the conversation history to determine which course "that" refers to.

```python
import logging

from IPython.display import display, HTML, Markdown

# Suppress verbose logging
workflow_logger = logging.getLogger("course-qa-workflow")
original_level = workflow_logger.level
workflow_logger.setLevel(logging.WARNING)

# Multi-turn conversation simulation
display(HTML("<h3>💬 Multi-Turn Conversation Demo</h3>"))

# Initialize conversation history
conversation_history = []

# Turn 1: Initial query
display(HTML("<div style='background:#e3f2fd; padding:10px; border-radius:8px; margin:8px 0;'><b>👤 User:</b> Tell me about CS002</div>"))
result1 = await app.ainvoke({"input": "Tell me about CS002", "history": conversation_history})
display(HTML(f"<div style='background:#e8f5e9; padding:12px; border-left:4px solid #4CAF50; border-radius:4px; margin:8px 0;'><b>🤖 Agent:</b><br>{result1['final_response']}</div>"))

# Add to history (agent replies are clipped to 200 characters)
conversation_history.append("User: Tell me about CS002")
conversation_history.append(f"Agent: {result1['final_response'][:200]}...")

# Turn 2: Follow-up with reference
display(HTML("<hr style='border:none; border-top:1px dashed #ccc; margin:16px 0;'>"))
display(HTML("<div style='background:#e3f2fd; padding:10px; border-radius:8px; margin:8px 0;'><b>👤 User:</b> What are the prerequisites for that course?</div>"))
result2 = await app.ainvoke({
    "input": "What are the prerequisites for that course?",
    "history": conversation_history
})
display(HTML(f"<div style='background:#e8f5e9; padding:12px; border-left:4px solid #4CAF50; border-radius:4px; margin:8px 0;'><b>🤖 Agent:</b><br>{result2['final_response']}</div>"))

# Add to history
conversation_history.append("User: What are the prerequisites for that course?")
conversation_history.append(f"Agent: {result2['final_response'][:200]}...")

# Turn 3: Another follow-up
display(HTML("<hr style='border:none; border-top:1px dashed #ccc; margin:16px 0;'>"))
display(HTML("<div style='background:#e3f2fd; padding:10px; border-radius:8px; margin:8px 0;'><b>👤 User:</b> What's the difficulty level?</div>"))
result3 = await app.ainvoke({
    "input": "What's the difficulty level?",
    "history": conversation_history
})
display(HTML(f"<div style='background:#e8f5e9; padding:12px; border-left:4px solid #4CAF50; border-radius:4px; margin:8px 0;'><b>🤖 Agent:</b><br>{result3['final_response']}</div>"))

# Restore logging
workflow_logger.setLevel(original_level)

# Summary
display(HTML("""<div style='margin-top:16px; padding:12px; background:#fff3e0; border-radius:8px; border-left:4px solid #FF9800;'>
<b>✅ Multi-turn conversation complete!</b><br><br>
<b>Key Observations:</b>
<ul style='margin:8px 0 0 0;'>
<li>The agent used conversation history to resolve "that course" to CS002</li>
<li>The agent maintained context across multiple turns</li>
<li>This is a simple history-based approach (Stage 5 adds semantic memory)</li>
</ul>
</div>"""))
```
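The demo above appends `User:` and `Agent:` strings to the history list by hand after each turn. As a rough sketch of how that bookkeeping could be factored out, the helpers below record a turn and render the most recent lines as prompt text. Both `record_turn` and `render_history` are hypothetical names, not part of the tutorial's codebase. How the production agent actually consumes `history` is not shown here, and unlike the demo, this sketch appends `...` only when a reply was actually clipped.

```python
from typing import List


def record_turn(
    history: List[str], user_msg: str, agent_msg: str, max_chars: int = 200
) -> None:
    """Append one user/agent exchange, clipping long agent replies."""
    history.append(f"User: {user_msg}")
    if len(agent_msg) > max_chars:
        agent_msg = agent_msg[:max_chars] + "..."
    history.append(f"Agent: {agent_msg}")


def render_history(history: List[str], last_n: int = 6) -> str:
    """Join the most recent lines into a block that could be placed in a prompt."""
    recent = history[-last_n:] if last_n > 0 else []
    return "\n".join(recent) if recent else "(no prior conversation)"


# Usage with placeholder replies (not real agent output)
demo_history: List[str] = []
record_turn(demo_history, "Tell me about CS002", "x" * 250)
record_turn(demo_history, "What are the prerequisites for that course?", "short reply")
print(render_history(demo_history, last_n=2))
```

Capping both the reply length and the number of lines keeps the prompt small, at the cost of dropping older details. That trade-off is one reason a plain list is only a stand-in for real memory.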

## Wrap Up 🏁

Great job! You've completed Stage 4 and built a complete ReAct agent that represents a fundamental shift from pre-programmed pipelines to dynamic reasoning.

In this stage, you learned how to:

- Implement the hybrid search tool that combines semantic vector search with exact field filtering
- Build the ReAct loop (Thought → Action → Observation) that enables agents to reason about their information needs
- Construct a LangGraph workflow that orchestrates agent execution with state management

The key transformation from Stage 3 is the move from fixed intent paths to adaptive decision-making. Instead of pre-defining every possible query pattern, your agent reasons about each situation dynamically. This style of cognitive architecture is what lets assistants such as ChatGPT and Claude, as well as production customer support systems, handle the wide variety of real-world queries.

In Stages 5 and 6, you'll complete this foundation by adding true memory with Redis. Instead of passing conversation history as a simple list, you'll store and retrieve context from past interactions using Redis Agent Memory Server, so the agent can maintain coherent conversations across hundreds of messages and recall relevant information from long dialogue histories.