# Stage 3: Hierarchical Retrieval & Intent-Driven Context

## Introduction

Recall that at the beginning of this course, we defined context engineering as the practice of deliberately shaping what information reaches the LLM, and in what form. It's not just about *what* we retrieve; it's also about *how we structure it*, *when we include it*, and *how much detail we provide*.

In our Stage 1 naive RAG, we saw that including *everything* in the context (the entire course catalog) overwhelms the LLM, wastes tokens, and, according to research from top AI companies, eventually leads to issues like context rot.

In Stage 2, we tackled how to structure the context, shaping the data to be both LLM-friendly and task-appropriate. We built two representations of the same data:
- `transform_course_to_text` gave us a full, human-readable format (structured for comprehension)
- `optimize_course_text` gave us a compressed, essentials-only version (optimized for token efficiency)

This was our way of presenting the same piece of information at different granularities, and *how* we format it matters as much as what we include.

But this implementation was simple because it had a critical limitation: the context was static. We chose one representation strategy (full or optimized) at the start, and every query, whether *"What machine learning courses exist?"* or *"What is the week 1 assignment for CS101?"*, received the same treatment. The pipeline didn't adapt to the question.

In this stage, we'll add intelligence: instead of choosing one strategy for all queries, we'll build a system that analyzes the user's intent and dynamically adjusts the context's structure and depth in response.

Here's what we'll build:

1. **Hierarchical Data Models (Structure)**  
   Instead of choosing between "full" or "optimized," we'll formalize two distinct *views* of the data:
   - Summary View: Lightweight, ~50-100 tokens per item. Perfect for browsing.
   - Details View: Comprehensive, ~500-1000 tokens per item. Contains the full syllabus, assignments, and learning objectives.
   
   This is a layered context design, providing the system with multiple lenses through which to view the same underlying data.

2. **Context Assembly with Progressive Disclosure (Presentation)**  
   We'll use research-backed techniques to structure context for maximum LLM comprehension:
   - Place summaries at the beginning (high-level map)
   - Place details at the end (specifics are "fresh" when generating)
   
   This is positional context optimization: it counters the ["Lost in the Middle"](https://arxiv.org/abs/2307.03172) phenomenon, in which models make less use of information placed in the middle of a long context.

3. **Intent Classification (Decision)**  
   Not all questions require the same information depth. We'll teach the system to classify user queries by information need:
   - *"What courses are there?"* → Need: breadth (summaries only)
   - *"What will I learn in CS101?"* → Need: depth (full details)
   
   This will serve as a query-aware context selection strategy where the context we provide adapts to the question.

4. **Quality Evaluation (Validation)**  
   Before sending context to the LLM, we'll validate: *"Is this information actually sufficient to answer the question?"*  
   If not, we automatically retrieve again with a different strategy.
   
   This is feedback-driven context refinement: the system detects insufficient context and corrects it at query time.

By the end of Stage 3, we'll have built a system that practices adaptive, validated context engineering:
- It analyzes the question to determine the information depth needed
- It assembles context hierarchically (breadth-first, then depth)
- It validates quality before use
- It automatically corrects insufficient context

Let's get started.

## Setup and Agent Overview

Let's set up our environment and import the stage 3 agent. Run the code block below.

In [None]:
# Set up the notebook so it can access the provided OpenAI API key and the agent code

import sys
import os
from pathlib import Path

if "OPENAI_API_BASE" in os.environ:
    os.environ["OPENAI_BASE_URL"] = os.environ["OPENAI_API_BASE"]

project_root = Path("..").resolve()

stage3_path = project_root / "progressive_agents" / "stage3_hierarchical_retrieval"
src_path = project_root / "src"

sys.path.insert(0, str(src_path))
sys.path.insert(0, str(stage3_path))

from agent import setup_agent, create_workflow

print("Initializing Stage 3 Agent...")
# This reuses the same Redis data from previous stages
course_manager = await setup_agent(auto_load_courses=True)
workflow = create_workflow(course_manager)

print("✓ Agent is ready!")
print("✓ Course manager initialized")

Before we start building, let's understand where these components reside in the actual Stage 3 agent workflow, and how it differs from what we built in Stage 2.

In Stage 2, the workflow was linear:

<br>

```mermaid
graph LR
    Start([User Query]) --> Research["Research Node<br/>Retrieve & assemble context"]
    Research --> Synthesize["Synthesize Node<br/>Generate answer"]
    Synthesize --> End([Final Response])

    style Research fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
    style Synthesize fill:#fff9c4,stroke:#f57f17,stroke-width:2px
```
<br>


In Stage 3, the workflow will have three main nodes. The agent node is where the magic happens:

<br>

```mermaid
graph TD
    Start([User Query]) --> Classify["Classify Intent<br/>(Node)"]

    Classify -->|GREETING| Greeting["Handle Greeting<br/>(Node)"]
    Classify -->|GENERAL or DETAIL| Agent["Agent with Tool Calling<br/>(Node)"]

    Greeting --> End1([Return Greeting])

    Agent -->|Use search_courses tool| Tool["search_courses Tool<br/>- Hierarchical models<br/>- Context assembler<br/>- Quality evaluator<br/>.."]
    
    Tool -->|Good quality| Return["Return Answer"]
    Tool -->|Poor quality| Tool
    
    style Classify fill:#ffe0b2,stroke:#e65100,stroke-width:3px
    style Greeting fill:#c8e6c9,stroke:#2e7d32,stroke-width:2px
    style Agent fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
    style Tool fill:#fff9c4,stroke:#f57f17,stroke-width:2px
```

The agent node is where we will call tools. The node will have access to call a `search_courses` tool, which internally will use:
- The `CourseSummary` & `CourseDetails` models
- The `ContextAssembler` for progressive disclosure
- An `evaluate_context_quality` function to evaluate quality
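
The `Poor quality` loop in the diagram can be pictured as a small retry routine. The sketch below is illustrative only: the strategy names, `retrieve_with_quality_check`, and the toy `fake_*` helpers are hypothetical stand-ins, and the real `evaluate_context_quality` function may take different arguments.

```python
from typing import Callable, List, Tuple

# Hypothetical strategy names, ordered from cheapest to most thorough.
STRATEGIES: List[str] = ["summaries_only", "summaries_plus_details"]


def retrieve_with_quality_check(
    query: str,
    retrieve: Callable[[str, str], str],
    is_sufficient: Callable[[str, str], bool],
) -> Tuple[str, str]:
    """Try each strategy in order; return the first context that passes the check.

    Returns (strategy_used, context). If no attempt passes, the last one is returned.
    """
    strategy, context = STRATEGIES[0], ""
    for strategy in STRATEGIES:
        context = retrieve(query, strategy)
        if is_sufficient(query, context):
            break
    return strategy, context


# Toy stand-ins so the loop runs without Redis or an LLM.
def fake_retrieve(query: str, strategy: str) -> str:
    if strategy == "summaries_only":
        return "CS101: Intro to Programming"
    return "CS101: Intro to Programming\nWeek 1: Variables and types"


def fake_is_sufficient(query: str, context: str) -> bool:
    # Pretend a syllabus question needs week-level information.
    return "syllabus" not in query.lower() or "Week" in context


used, ctx = retrieve_with_quality_check(
    "Show me the syllabus for CS101", fake_retrieve, fake_is_sufficient
)
print(used)
```

Ordering the strategies from cheapest to most thorough means a broad question stops after the first attempt, while a detail question escalates only when the check fails.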

Now that you're familiar with the overall flow, let's explore the hierarchical models we'll be working with.

## Part 1: The Data Structure (Hierarchical Models)

To start, we're going to formalize two distinct views of our data:
1.  `CourseSummary`: A lightweight model (~50-100 tokens) for browsing and broad search.
2.  `CourseDetails`: A comprehensive model (~500-1000 tokens) containing the full syllabus, assignments, and prerequisites.

We'll keep the summary view fairly straightforward. It will simply be a flat list of fields, such as title, code, and instructor. 

On the other hand, the details view is a bit more complex. To represent a full course faithfully, we can't just use a giant string of text. We need structured data to allow for precise rendering and querying. A real university course has a syllabus composed of weekly plans, assignments with specific metadata (including due dates and points), and prerequisites.

If we flattened all this into a single string, we'd lose the ability to format it dynamically or query specific parts (e.g., "What is due in Week 3?").

To keep the agent organized, the data models are split into two files (which you can feel free to explore):
*   `src/redis_context_course/models.py`: Contains the core domain entities and enums (like `DifficultyLevel`, `CourseFormat`) that are used everywhere.
*   `src/redis_context_course/hierarchical_models.py`: Contains the specialized structures (like `WeekPlan`, `Assignment`) that are specifically designed for this hierarchical retrieval strategy.

For example, here is what the `WeekPlan` specialized structure looks like:

```python
class WeekPlan(BaseModel):
    """Detailed plan for a single week of the course."""
    week_number: int
    topic: str
    subtopics: List[str] = Field(default_factory=list)
    readings: List[str] = Field(default_factory=list)
    assignments: List[str] = Field(default_factory=list)
    learning_objectives: List[str] = Field(default_factory=list)
```

In addition to `WeekPlan`, there are also models for the following:
*   `CourseSyllabus`: A collection of `WeekPlan` objects.
*   `Assignment`: Structured details like `due_week`, `points`, and `type` (Exam vs Project).
*   `Prerequisite`: Links to other course codes.

These models act as the "schema" for the detailed view. The `CourseDetails` model will utilize them as fields (e.g., `syllabus: CourseSyllabus`), effectively serving as a container for this rich information.
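
To make the "What is due in Week 3?" example concrete, here is a minimal sketch of querying structured assignment data. It uses a plain dataclass as a stand-in for the Pydantic `Assignment` model; the `title` field and the `due_in_week` helper are assumptions for illustration, not part of the course code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Assignment:
    # due_week, points, and type follow the description above;
    # title is a hypothetical field added for readability.
    title: str
    due_week: int
    points: int
    type: str  # e.g. "exam" or "project"


def due_in_week(assignments: List[Assignment], week: int) -> List[Assignment]:
    """Return the assignments whose due_week matches the requested week."""
    return [a for a in assignments if a.due_week == week]


assignments = [
    Assignment(title="Problem Set 1", due_week=2, points=50, type="homework"),
    Assignment(title="Midterm Project", due_week=3, points=150, type="project"),
    Assignment(title="Quiz 1", due_week=3, points=25, type="exam"),
]

week3 = due_in_week(assignments, 3)
week3_titles = [a.title for a in week3]
week3_points = sum(a.points for a in week3)
print(week3_titles, week3_points)
```

With a single flattened string, the same question would require text parsing or another LLM call; with typed fields it is a one-line filter.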

### The Two-Tier Storage Architecture

Understanding how these models are stored in Redis is key to understanding why they're designed this way:

```
┌─────────────────────────────────────────────────────────┐
│      USER QUERY: "machine learning courses"             │
└─────────────────┬───────────────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────────────┐
│  TIER 1: Vector Search on CourseSummary                 │
│  ─────────────────────────────────────────────────      │
│  Storage: Redis Vector Index                            │
│  Data: CourseSummary + Embeddings                       │
│  Purpose: Fast semantic search across ALL courses       │
│  Cost: ~50-100 tokens per course                        │
│                                                         │
│  Query → Embedding → Vector Search → Top 5 Course IDs   │
└─────────────────┬───────────────────────────────────────┘
                  │
                  │ IDs: ["CS401", "CS501", "CS601"]
                  ▼
┌─────────────────────────────────────────────────────────┐
│  TIER 2: Direct Lookup of CourseDetails                 │
│  ─────────────────────────────────────────────────      │
│  Storage: Redis Hash (key-value)                        │
│  Data: CourseDetails (no embeddings needed)             │
│  Purpose: Fetch full details by ID                      │
│  Cost: ~500-1000 tokens per course                      │
│                                                         │
│  Course IDs → Direct Hash Lookup → Full Details         │
└─────────────────────────────────────────────────────────┘
```
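
The two tiers can be mimicked in memory to see the control flow. This is a rough sketch, not the agent's implementation: the dictionaries stand in for the Redis vector index and hash, the 3-dimensional vectors are made-up embeddings, and `search_summaries` and `fetch_details` are hypothetical names.

```python
import math
from typing import Dict, List, Tuple

# Tier 1 stand-in: course ID -> (summary text, toy embedding).
SUMMARY_INDEX: Dict[str, Tuple[str, List[float]]] = {
    "CS401": ("CS401: Machine Learning", [0.9, 0.1, 0.0]),
    "CS501": ("CS501: Deep Learning", [0.8, 0.2, 0.1]),
    "HIST101": ("HIST101: World History", [0.0, 0.1, 0.9]),
}

# Tier 2 stand-in: course ID -> full details (no embeddings stored here).
DETAILS_STORE: Dict[str, dict] = {
    "CS401": {"course_code": "CS401", "weeks": 12},
    "CS501": {"course_code": "CS501", "weeks": 14},
    "HIST101": {"course_code": "HIST101", "weeks": 10},
}


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_summaries(query_vec: List[float], top_k: int) -> List[str]:
    """Tier 1: rank every summary by similarity and return the top course IDs."""
    ranked = sorted(
        SUMMARY_INDEX,
        key=lambda cid: cosine(query_vec, SUMMARY_INDEX[cid][1]),
        reverse=True,
    )
    return ranked[:top_k]


def fetch_details(course_ids: List[str]) -> List[dict]:
    """Tier 2: direct key lookup, no search involved."""
    return [DETAILS_STORE[cid] for cid in course_ids if cid in DETAILS_STORE]


query_vec = [1.0, 0.0, 0.0]  # pretend embedding of "machine learning courses"
top_ids = search_summaries(query_vec, top_k=2)
details = fetch_details(top_ids)
print(top_ids)
```

Only the small summaries are ever compared against the query; the large detail records are touched solely for the few IDs that survive Tier 1.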

### The Two Model Implementations

The dropdowns below contain the complete model definitions. Explore the code for both to get familiar with how they work.

<details>

<summary>🔍 Open to explore <code>CourseSummary</code></summary>

<br>

The lightweight "card catalog" view is used for browsing and initial search results. It contains only essential course information (~50-100 tokens per course).

Note that it has a `generate_embedding_text()` method, which creates a searchable text representation that is converted to a vector embedding and stored alongside the course data.

<br>

```python

class CourseSummary(BaseModel):
    """
    Lightweight course overview for initial search.

    This is the first tier in hierarchical retrieval.
    Contains just enough information for users to decide
    if they want more details.
    """

    # Core identification
    course_code: str
    title: str

    # Basic info
    department: str
    credits: int
    difficulty_level: DifficultyLevel
    format: CourseFormat
    instructor: str

    # Brief description (1-2 sentences)
    short_description: str

    # Prerequisites (just course codes for brevity)
    prerequisite_codes: List[str] = Field(default_factory=list)

    # Metadata for search
    tags: List[str] = Field(default_factory=list)

    # For vector search
    embedding_text: Optional[str] = None

    def generate_embedding_text(self) -> str:
        """Generate text for vector embedding."""
        parts = [
            f"{self.course_code}: {self.title}",
            f"Department: {self.department}",
            f"Level: {self.difficulty_level.value}",
            self.short_description,
        ]
        if self.tags:
            parts.append(f"Topics: {', '.join(self.tags)}")

        self.embedding_text = " | ".join(parts)
        return self.embedding_text
```
</details>

<details>
<summary>🔍 Open to explore <code>CourseDetails</code></summary>

<br>

The comprehensive "full book" view with complete syllabus and assignments. Retrieved only for the most relevant courses (~500-1000 tokens per course). 

This model is stored as plain JSON in a Redis hash. It's never searched; it's fetched directly by course ID after the summary search finds relevant courses. Since we already know the exact ID we want, there's no need for vector search or embeddings on this data.

<br>

```python

class CourseDetails(BaseModel):
    """
    Full course details with syllabus and assignments.

    This is the second tier in hierarchical retrieval.
    Retrieved only for the most relevant courses after
    initial summary search.
    """

    # All fields from CourseSummary
    course_code: str
    title: str
    department: str
    credits: int
    difficulty_level: DifficultyLevel
    format: CourseFormat
    instructor: str

    # Full description (multiple paragraphs)
    full_description: str

    # Prerequisites (full details)
    prerequisites: List[Prerequisite] = Field(default_factory=list)

    # Learning outcomes
    learning_objectives: List[str] = Field(default_factory=list)

    # Detailed syllabus
    syllabus: CourseSyllabus

    # Assignments
    assignments: List[Assignment] = Field(default_factory=list)

    # Additional metadata
    semester: Semester
    year: int
    max_enrollment: int
    tags: List[str] = Field(default_factory=list)

    def to_summary(self) -> CourseSummary:
        """Convert full details to summary view."""
        # Create short description from first 2 sentences of full description
        sentences = self.full_description.split(". ")
        short_desc = ". ".join(sentences[:2])
        if not short_desc.endswith("."):
            short_desc += "."

        return CourseSummary(
            course_code=self.course_code,
            title=self.title,
            department=self.department,
            credits=self.credits,
            difficulty_level=self.difficulty_level,
            format=self.format,
            instructor=self.instructor,
            short_description=short_desc,
            prerequisite_codes=[p.course_code for p in self.prerequisites],
            tags=self.tags,
        )
```
</details>
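
The short-description logic inside `to_summary()` is easy to try in isolation. The standalone function below mirrors that logic as shown in the dropdown; the name `make_short_description` and the `max_sentences` parameter are introduced here purely for illustration.

```python
def make_short_description(full_description: str, max_sentences: int = 2) -> str:
    """Keep the first few sentences, splitting naively on '. '."""
    sentences = full_description.split(". ")
    short_desc = ". ".join(sentences[:max_sentences])
    # The split removes the period after every sentence except the last,
    # so restore it when the result was truncated.
    if not short_desc.endswith("."):
        short_desc += "."
    return short_desc


long_text = "Intro to ML. Covers regression. Includes weekly labs."
short_text = make_short_description(long_text)
print(short_text)
```

Splitting on `". "` is a deliberate simplification: abbreviations such as "e.g. " would be treated as sentence boundaries, which is usually acceptable for a browsing summary.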

Now that you've seen the model definitions, let's import them and verify that everything is working as expected. Run the cell below to import the models from the source code.

In [None]:
from redis_context_course.models import DifficultyLevel, CourseFormat
from redis_context_course.hierarchical_models import (
    CourseSummary,
    CourseDetails,
    Prerequisite, 
    WeekPlan, 
    CourseSyllabus, 
    Assignment, 
    AssignmentType
)
from pydantic import Field
from typing import List

print("✅ Models imported")

With our hierarchical data models now imported, we're ready to move on to the next critical component: how we present this data to the LLM.

## Part 2: The Context Assembler (Progressive Disclosure)

Having established our two-tier data structure, we now face a new challenge: how do we arrange this information to maximize the LLM's ability to understand and use it effectively? This is where progressive disclosure comes in.

Progressive disclosure means showing only the information needed, when it's needed, in the optimal order. In traditional interfaces, this means starting with high-level categories, then allowing users to drill down into specifics.

This matters because of how LLMs process context. Research shows they pay more attention to information at the beginning and end of their context window (the "Lost in the Middle" problem), while processing hierarchically structured information more effectively than flat lists. By placing summaries at the start, we create navigational landmarks. By placing details at the end, we leverage recency bias during generation. This is context engineering in action: deliberately shaping not just what information is included, but where it's positioned and how densely it's packed.

For our course search system, we structure context in two layers. The overview layer places all `CourseSummary` objects at the start to create a high-level map, answering the question "What options exist?" The deep dive layer places full `CourseDetails` for the most relevant courses at the very end, providing comprehensive information where the LLM's attention is naturally high.

### 📌 Task 1: Implement the Context Assembler

Below you'll find a `ContextAssembler` class. Your task is to implement the two methods that decide *how* to arrange course information for the LLM.

The parent class `HierarchicalContextAssembler` (located in `src/redis_context_course/hierarchical_context.py`) provides two helper methods you can call:
- `self._format_summary(summary, index)` - Formats a single course summary with numbering
- `self._format_details(details)` - Formats full course details including syllabus and assignments

You'll need to complete the two main assembly strategies:

1. **`assemble_summary_only_context(summaries, query)`**: For general queries like "What CS courses exist?"
   - Return a string with: header, count, and all summaries formatted with `self._format_summary()`
   - This is the "breadth-first" strategy (overview only)

<details><summary>🛠️ Show Implementation Details</summary>
<br>
    
Build a string by populating the `sections` list with the following:
 
1. Add a header using `sections.append()`: `f"# Course Search Results for: {query}\n"`
2. Add a count: `f"Found {len(summaries)} relevant courses:\n"`
3. Loop through summaries using `enumerate(summaries, 1)` to get index `i` starting from 1
4. For each summary, call `self._format_summary(summary, i)` and append to sections
5. Join all sections with `"\n".join(sections)` and return the result

</details>

2. **`assemble_hierarchical_context(summaries, details, query)`**: For detailed queries like "What will I learn in CS101?"
   - Return a string with: header, "Overview" section with all summaries, "Detailed Information" section with full details
   - This is the "progressive disclosure" strategy (breadth first, then depth)

<details><summary>🛠️ Show Implementation Details</summary>
<br>    
    
Build a string by populating the `sections` list with:
 
1. Add a header: `f"# Course Search Results for: {query}\n"`
2. Add an overview section:
   - Header: `"## Overview of All Matches\n"`
   - Count: `f"Found {len(summaries)} relevant courses:\n"`
   - Loop through summaries with `enumerate(summaries, 1)` and append `self._format_summary(summary, i)`
3. Add a details section (if details exist):
   - Header: `f"\n## Detailed Information (Top {len(details)} Courses)\n"`
   - Intro: `"Full syllabi and assignments for the most relevant courses:\n"`
   - Loop through details and append `self._format_details(detail)` for each
4. Join all sections with `"\n".join(sections)` and return the result
</details>

If you get stuck, reference the solution dropdown below the implementation cell.

In [None]:
from redis_context_course.hierarchical_models import CourseSummary, CourseDetails
from redis_context_course.hierarchical_context import HierarchicalContextAssembler
from typing import List

class ContextAssembler(HierarchicalContextAssembler):
    """
    Assembles context using a progressive disclosure pattern.
    
    Note: The formatting helpers (_format_summary and _format_details) are
    inherited from HierarchicalContextAssembler.
    """
    
    def assemble_summary_only_context(
        self,
        summaries: List[CourseSummary],
        query: str,
    ) -> str:
        """Assemble context with ONLY summaries (no details)."""
        sections = []
        
        # TODO: Add header with query
        
        # TODO: Add count of summaries found
        
        # TODO: Loop through summaries with enumerate(summaries, 1)
        
        return "\n".join(sections)
    
    def assemble_hierarchical_context(
        self,
        summaries: List[CourseSummary],
        details: List[CourseDetails],
        query: str,
    ) -> str:
        """Assemble context with progressive disclosure (summaries + details)."""
        sections = []
        
        # TODO: Add header with query
        
        # TODO: Add "Overview of All Matches" section

        # TODO: Add "Detailed Information" section if details exist

        
        return "\n".join(sections)

# Create the assembler instance
assembler = ContextAssembler()

print("✅ Context assembler created")

<details>
<summary>🗝️ Solution code</summary>
    
<br>
    
```python

from redis_context_course.hierarchical_models import CourseSummary, CourseDetails
from redis_context_course.hierarchical_context import HierarchicalContextAssembler
from typing import List

class ContextAssembler(HierarchicalContextAssembler):
    """
    Assembles context using a progressive disclosure pattern.
    
    Note: The formatting helpers (_format_summary and _format_details) are
    inherited from HierarchicalContextAssembler.
    """
    
    def assemble_summary_only_context(
        self,
        summaries: List[CourseSummary],
        query: str,
    ) -> str:
        """Assemble context with ONLY summaries (no details)."""
        sections = []
        
        # Header
        sections.append(f"# Course Search Results for: {query}\n")
        
        # Summary count and list
        sections.append(f"Found {len(summaries)} relevant courses:\n")
        
        # Format each summary
        for i, summary in enumerate(summaries, 1):
            sections.append(self._format_summary(summary, i))
        
        return "\n".join(sections)
    
    def assemble_hierarchical_context(
        self,
        summaries: List[CourseSummary],
        details: List[CourseDetails],
        query: str,
    ) -> str:
        """Assemble context with progressive disclosure (summaries + details)."""
        sections = []
        
        # Header
        sections.append(f"# Course Search Results for: {query}\n")
        
        # Section 1: Overview of ALL matches (breadth-first)
        sections.append("## Overview of All Matches\n")
        sections.append(f"Found {len(summaries)} relevant courses:\n")
        
        for i, summary in enumerate(summaries, 1):
            sections.append(self._format_summary(summary, i))
        
        # Section 2: Detailed Information for TOP matches (depth at end)
        if details:
            sections.append(f"\n## Detailed Information (Top {len(details)} Courses)\n")
            sections.append("Full syllabi and assignments for the most relevant courses:\n")
            
            for detail in details:
                sections.append(self._format_details(detail))
        
        return "\n".join(sections)

assembler = ContextAssembler()

print("✅ Context assembler created")
```

</details>

### Test Your Implementation

Before connecting to the agent, let's verify your assembly strategies work correctly. Run the test utility below:

In [None]:
from test_context_assembler import test_assembler

# Test your implementation with real course data
test_assembler(assembler)

### Connect to the agent

Now that we've tested the implementation, let's connect it to the agent system. Run the code block below.

In [None]:
# Connect your implementation to the agent
import redis_context_course.hierarchical_context as hc_module

# Your ContextAssembler inherits from HierarchicalContextAssembler,
# so it has the formatting helpers built-in and your custom assembly strategies.
# We just need to tell the agent module to use your instance instead of the default.
hc_module.context_assembler = assembler

print("✅ ContextAssembler is now connected to the agent!")

## Part 3: Intent Classification (Query-Aware Context Selection)

We now have hierarchical data models (Part 1) and progressive disclosure assembly (Part 2). We're just missing the piece that allows the agent to adapt to the question being asked. This is where intent classification comes in.

Here's the context engineering problem we're solving:

If someone asks *"What is CS101?"* → They need a summary (cheap, fast, ~100 tokens)  
If someone asks *"Show me the syllabus for CS101"* → They need full details (expensive, thorough, ~1000 tokens)

Without intent classification, we'd face a context engineering dilemma:
1. Always return full details → wasteful, slow, expensive, and risks overwhelming the LLM with unnecessary information
2. Always return summaries → incomplete, frustrating, and fails to provide sufficient context for detailed queries

Intent Classification is the decision layer in our agent's flow. It analyzes the query to determine what level of detail the LLM needs to generate a quality response.

We'll use a system that routes queries using five intent categories, each mapped to a retrieval strategy:

- GREETING → Skip search entirely, return a friendly response
- GENERAL → Return course summaries only (~500 tokens for 5 courses)
- PREREQUISITES → Return course summaries only (prerequisite codes are included in summaries)
- SYLLABUS_OBJECTIVES → Return summaries + full details (~2000 tokens)
- ASSIGNMENTS → Return summaries + full details (~2000 tokens)

The classifier acts as an early filter. It determines whether to run the full agent workflow or short-circuit with a simple greeting response. This saves unnecessary API calls and retrieval operations for queries that don't need them.
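
The mapping from intent to retrieval depth can be expressed as a small lookup table. The sketch below is an assumption-laden illustration: `RetrievalPlan`, `plan_for`, and `estimate_tokens` are hypothetical names, and the per-item figures (100 tokens per summary, 750 per detail record, 2 detailed courses) are picked from within the ranges quoted in this stage so the totals match the approximate budgets above.

```python
from typing import Dict, NamedTuple


class RetrievalPlan(NamedTuple):
    run_search: bool       # False means short-circuit without retrieval
    include_details: bool  # True means append CourseDetails after the summaries


INTENT_TO_PLAN: Dict[str, RetrievalPlan] = {
    "GREETING": RetrievalPlan(run_search=False, include_details=False),
    "GENERAL": RetrievalPlan(run_search=True, include_details=False),
    "PREREQUISITES": RetrievalPlan(run_search=True, include_details=False),
    "SYLLABUS_OBJECTIVES": RetrievalPlan(run_search=True, include_details=True),
    "ASSIGNMENTS": RetrievalPlan(run_search=True, include_details=True),
}


def plan_for(intent: str) -> RetrievalPlan:
    """Look up the plan, falling back to GENERAL for unknown labels."""
    return INTENT_TO_PLAN.get(intent, INTENT_TO_PLAN["GENERAL"])


def estimate_tokens(
    plan: RetrievalPlan,
    n_summaries: int = 5,
    n_details: int = 2,        # assumed number of detailed courses
    summary_tokens: int = 100,
    detail_tokens: int = 750,  # assumed midpoint of the 500-1000 range
) -> int:
    """Rough context-size estimate for a plan."""
    if not plan.run_search:
        return 0
    total = n_summaries * summary_tokens
    if plan.include_details:
        total += n_details * detail_tokens
    return total


for name in INTENT_TO_PLAN:
    print(name, estimate_tokens(plan_for(name)))
```

Under these assumptions a GENERAL query costs about 500 tokens and a SYLLABUS_OBJECTIVES query about 2000, a fourfold difference that the classifier lets us pay only when the question calls for it.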

### 📌 Task 2: Implement the Intent Classifier

Your goal is to build a query classification function that analyzes user questions and determines the appropriate retrieval strategy. This classifier serves as the decision layer in your agent's workflow, routing queries to either skip search entirely (for greetings) or determining the depth of information needed (summaries vs. full details).

The function will use an LLM to evaluate the query against five intent categories (GREETING, GENERAL, SYLLABUS_OBJECTIVES, ASSIGNMENTS, PREREQUISITES) and return the most appropriate category. We've provided the classification prompt with clear category definitions and examples.

<details>
<summary> 🛠️ Show Implementation Details </summary>
<br>


1: **Get the LLM instance**

Call `get_analysis_llm()` (already imported) and assign it to a variable named `llm`.

2: **Send the prompt to the LLM**

Use the `llm` variable to call `.ainvoke()` with a message list. Wrap the `intent_prompt` (already provided in the code) in a `HumanMessage` object: `[HumanMessage(content=intent_prompt)]`. 

Store the result in a variable named `response`.

3: **Extract the text**

Get the response text using `response.content.strip()` and store it in a variable named `response_content`.

4: **Parse the intent**

Loop through the lines in `response_content.split("\n")` to find the line that starts with `"INTENT:"`, then extract the category name after the colon using `.split(":", 1)[1].strip()`.

Store the extracted intent in a variable named `intent`, defaulting to `"GENERAL"` if no match is found.

</details>

If you get stuck, reference the solution dropdown after the code block.

In [None]:
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from agent.nodes import get_analysis_llm

async def classify_intent_node(query: str) -> str:
    """
    Classify user query intent to determine appropriate retrieval strategy.
    
    Args:
        query: The user's question
        
    Returns:
        Intent category string (GREETING, GENERAL, SYLLABUS_OBJECTIVES, ASSIGNMENTS, PREREQUISITES)
    """
    
    # Classification prompt (provided for you - defining good categories is the hard part!)
    intent_prompt = f"""You are a query intent classifier for a course information system.

TASK: Analyze the query and return ONLY the most appropriate intent category.

Query: {query}

INTENT CATEGORIES:

1. GREETING
   - Greetings, acknowledgments, pleasantries
   - Examples: "hello", "hi there", "thank you", "thanks"

2. GENERAL
   - Broad course information requests
   - Course descriptions and overviews
   - "What is [course]?" questions
   - Example: "What is CS002?"

3. SYLLABUS_OBJECTIVES
   - Syllabus requests
   - Course structure and topics covered
   - Learning objectives and outcomes
   - Examples: "Show me the syllabus for CS002", "What will I learn?", "What topics are covered?", "Give me details about this course"

4. ASSIGNMENTS
   - Homework, projects, exams
   - Assessment types and workload
   - Grading information
   - Examples: "What are the assignments?", "How many exams?", "What's the workload?"

5. PREREQUISITES
   - Course requirements
   - Prior knowledge needed
   - Examples: "What are the prerequisites?", "What do I need before taking this?"

CLASSIFICATION RULES:
- Choose the MOST SPECIFIC category that matches
- If multiple categories apply, prioritize based on the primary intent
- Default to GENERAL for ambiguous queries
- Ignore filler words and focus on core intent

OUTPUT FORMAT (respond with exactly this structure):
INTENT: <category_name>
"""
    
    # TODO: Step 1 - Get the LLM instance
    
    # TODO: Step 2 - Send the prompt to the LLM
    
    # TODO: Step 3 - Extract the response text
    
    # TODO: Step 4 - Parse the intent from the response
    
    return intent

print("✅ Intent classification node created")

<details>
<summary>🗝️ Solution code</summary>

```python

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from agent.nodes import get_analysis_llm

async def classify_intent_node(query: str) -> str:
    """
    Classify user query intent to determine appropriate retrieval strategy.
    
    Args:
        query: The user's question
        
    Returns:
        Intent category string (GREETING, GENERAL, SYLLABUS_OBJECTIVES, ASSIGNMENTS, PREREQUISITES)
    """
    
    # Classification prompt (provided for you - defining good categories is the hard part!)
    intent_prompt = f"""You are a query intent classifier for a course information system.

TASK: Analyze the query and return ONLY the most appropriate intent category.

Query: {query}

INTENT CATEGORIES:

1. GREETING
   - Greetings, acknowledgments, pleasantries
   - Examples: "hello", "hi there", "thank you", "thanks"

2. GENERAL
   - Broad course information requests
   - Course descriptions and overviews
   - "What is [course]?" questions
   - Example: "What is CS002?"

3. SYLLABUS_OBJECTIVES
   - Syllabus requests
   - Course structure and topics covered
   - Learning objectives and outcomes
   - Examples: "Show me the syllabus for CS002", "What will I learn?", "What topics are covered?", "Give me details about this course"

4. ASSIGNMENTS
   - Homework, projects, exams
   - Assessment types and workload
   - Grading information
   - Examples: "What are the assignments?", "How many exams?", "What's the workload?"

5. PREREQUISITES
   - Course requirements
   - Prior knowledge needed
   - Examples: "What are the prerequisites?", "What do I need before taking this?"

CLASSIFICATION RULES:
- Choose the MOST SPECIFIC category that matches
- If multiple categories apply, prioritize based on the primary intent
- Default to GENERAL for ambiguous queries
- Ignore filler words and focus on core intent

OUTPUT FORMAT (respond with exactly this structure):
INTENT: <category_name>
"""
    
    # Step 1 - Get the LLM instance
    llm = get_analysis_llm()
    
    # Step 2 - Send the prompt to the LLM
    response = await llm.ainvoke([HumanMessage(content=intent_prompt)])
    
    # Step 3 - Extract the response text
    response_content = response.content.strip()
    
    # Step 4 - Parse the intent from the response
    intent = "GENERAL"  # Default fallback
    for line in response_content.split("\n"):
        if line.startswith("INTENT:"):
            intent = line.split(":", 1)[1].strip()
            break
    
    return intent

print("✅ Intent classification node created")
```


</details>
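
The parsing step in the solution accepts whatever text follows `INTENT:`, so an unexpected label would be passed straight to the router. A minimal sketch of a stricter parser is shown below, assuming we want to fall back to `GENERAL` for anything outside the five categories. The names `VALID_INTENTS` and `parse_intent` are hypothetical helpers for illustration and are not part of the agent code.

```python
# Hypothetical helper: validate the classifier's reply against the known categories.
VALID_INTENTS = {
    "GREETING",
    "GENERAL",
    "SYLLABUS_OBJECTIVES",
    "ASSIGNMENTS",
    "PREREQUISITES",
}


def parse_intent(response_text: str, default: str = "GENERAL") -> str:
    """Return a known intent category from the classifier's raw reply."""
    for line in response_text.splitlines():
        label, sep, value = line.strip().partition(":")
        if sep and label.strip().upper() == "INTENT":
            # Drop stray brackets, quotes, and punctuation around the label.
            candidate = value.strip().strip("<>\"'`*. ").upper()
            if candidate in VALID_INTENTS:
                return candidate
    # No recognizable "INTENT: <category>" line was found.
    return default


print(parse_intent("INTENT: ASSIGNMENTS"))
print(parse_intent("  intent: <prerequisites>"))
print(parse_intent("INTENT: SCHEDULING"))
```

Inside `classify_intent_node`, Step 4 could then be reduced to `intent = parse_intent(response_content)`.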

### Test Your Implementation

Now let's test your intent classifier with the test utility. It will run 17 test cases covering all 5 categories.

In [None]:
from test_intent_classifier import test_intent_classifier

# Run tests
await test_intent_classifier(classify_intent_node)

### Connect to the Agent

Now let's inject your implementation into the Stage 3 agent. This replaces the default classification logic in `classify_intent_node` with your function.

In [None]:
# Import the agent module
from agent import set_classify_intent_function

# Inject your implementation
set_classify_intent_function(classify_intent_node)

print("✅ Your intent classifier has been injected into the agent!")

## Part 4: The Search Tool

We now have hierarchical data models (Part 1), the progressive disclosure assembly (Part 2), and an intent classification node (Part 3). In this section, we define a tool that takes the intent returned by our classification node and passes it to the context assembler, so that the returned context has the right amount of *depth* for the query.

If you're unfamiliar with tools, a tool (we are using LangChain to define one) is simply a function that the LLM can invoke to perform a specific task. To make our retrieval logic accessible to the agent, we need to define two things:
1.  **The Input Schema (`SearchCoursesInput`)**: A Pydantic model that tells the LLM *what arguments* it can provide.
2.  **The Tool Function (`search_courses_tool`)**: The actual Python function that executes the logic.

In the agent code, the complex logic for connecting to Redis and assembling the context is already defined by a function called `search_courses_sync` (located in `agent/tools.py`). This function connects the intent (determined by the LLM via the intent classifier node) to the context assembler. For example, here is a snippet from that function: 

```python
# Inside search_courses_sync:
if intent == "GENERAL":
    # Use the assembler to return summaries only
    return context_assembler.assemble_summary_only_context(...)
else:
    # Use the assembler to return full details
    return context_assembler.assemble_hierarchical_context(...)
```

#### 📌 **Task: Define the Search Courses Tool**

Your task is to define the `SearchCoursesInput` schema and the `search_courses_tool` function.

In the starter code below, you'll find we have imported the following:
- The `BaseModel` class and the `Field` function from `pydantic`. These are used to define structured input schemas. Don't worry if you haven't used Pydantic before; think of it as defining a form that the LLM has to fill out. The most important part is the `description` argument, which tells the LLM what each field means and what values are allowed.
- The `tool` decorator from `langchain_core.tools`, which converts our Python function into a LangChain tool.

<details>
<summary>🛠️ Show Implementation Details</summary>

1. **Define the intent field in the input schema (`SearchCoursesInput`)**:

   Create the intent field. You must list the valid categories (GENERAL, PREREQUISITES, SYLLABUS_OBJECTIVES, ASSIGNMENTS) in the description so the LLM knows which one to pick.

2. **Add the Tool Decorator**:

   Add the LangChain `@tool` decorator to the function. Pass the decorator two arguments: the tool name (`"search_courses"`) and the schema (`args_schema=SearchCoursesInput`).

3. **Call the pre-defined retrieval function in the tool (`search_courses_tool`)**:

   Call the `search_courses_sync` function inside the `try` block with the correct parameters:
    - `query`: The user's query
    - `intent`: The intent selected by the LLM
    - `top_k`: Set to `5`
    - `use_optimized_format`: Set to `False` (we are using hierarchical format now)
      
</details>

If you get stuck, reference the solution dropdown below the implementation cell.

In [None]:
from pydantic import BaseModel, Field
from langchain_core.tools import tool
from agent.tools import search_courses_sync


# Define the Input Schema (provided for you - defining good schemas is the hard part!)
class SearchCoursesInput(BaseModel):
    """Input schema for search_courses tool."""

    query: str = Field(description="The search query for finding courses")
    
    # TODO: Define the 'intent' field
    # Use Field() with default="GENERAL" and a description listing valid categories:
    # GENERAL, PREREQUISITES, SYLLABUS_OBJECTIVES, ASSIGNMENTS


# TODO: Add the @tool decorator
# Pass the tool name "search_courses" and args_schema=SearchCoursesInput
def search_courses_tool(query: str, intent: str = "GENERAL") -> str:
    """
    Search for courses with intent-based hierarchical retrieval.
    
    Args:
        query: The user's search query
        intent: The intent category (GENERAL, PREREQUISITES, SYLLABUS_OBJECTIVES, ASSIGNMENTS)
        
    Returns:
        str: Formatted context with course information
    """
    
    # TODO: Check if course_manager is initialized
    # If not, return "Course search not available - CourseManager not initialized"
    
    
    try:
        # TODO: Call search_courses_sync
        # Call with: query, top_k=5, use_optimized_format=False, intent
        # Store result in 'result' variable
        
        return result
        
    except Exception as e:
        return f"Search failed: {str(e)}"

print("✅ Tool schema defined!")

<details>
<summary>🗝️ Solution code</summary>

```python

from pydantic import BaseModel, Field
from langchain_core.tools import tool
from agent.tools import search_courses_sync

# Step 1 - Define the Input Schema
class SearchCoursesInput(BaseModel):
    """Input schema for search_courses tool."""

    query: str = Field(description="The search query for finding courses")
    intent: str = Field(
        default="GENERAL",
        description="Intent category: GENERAL (summaries only), PREREQUISITES (summaries only - prereq codes included), SYLLABUS_OBJECTIVES (full details), ASSIGNMENTS (full details)",
    )


# Step 2 - Define the Tool Function
@tool("search_courses", args_schema=SearchCoursesInput)
async def search_courses_tool(query: str, intent: str = "GENERAL") -> str:
    """
    Search for courses with intent-based hierarchical retrieval.
    """
    # Check if dependencies are ready (good practice)
    if not course_manager:
        return "Course search not available - CourseManager not initialized"

    try:
        result = search_courses_sync(
            query=query,
            top_k=5,
            use_optimized_format=False,
            intent=intent,
        )
        return result
    except Exception as e:
        return f"Search failed: {str(e)}"

print("✅ Tool defined with intent-based retrieval!")
```

</details>
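
The `intent` description in the schema doubles as a small routing table: two intents ask for summaries only and two ask for full details. A minimal sketch of that mapping as a plain dictionary follows. `RETRIEVAL_MODE` and `pick_retrieval_mode` are hypothetical names used for illustration; the real branching lives inside `search_courses_sync` and may differ.

```python
# Hypothetical mapping, read off the intent descriptions in SearchCoursesInput.
RETRIEVAL_MODE = {
    "GENERAL": "summary",
    "PREREQUISITES": "summary",
    "SYLLABUS_OBJECTIVES": "hierarchical",
    "ASSIGNMENTS": "hierarchical",
}


def pick_retrieval_mode(intent: str) -> str:
    """Map an intent label to a retrieval depth, using the cheaper mode for unknown labels."""
    return RETRIEVAL_MODE.get(intent.strip().upper(), "summary")


for label in ["GENERAL", "ASSIGNMENTS", "something else"]:
    print(label, "->", pick_retrieval_mode(label))
```

Defaulting unknown labels to the summary mode mirrors the `default="GENERAL"` choice in the schema: when in doubt, spend fewer tokens.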

### Test Your Implementation

Run the test utility below to verify your tool works correctly with different intents.

In [None]:
# Import the test utility
import sys
from pathlib import Path

from test_search_tool import test_search_courses_tool

# Test your implementation
await test_search_courses_tool(search_courses_tool)

### Connect to the Agent

Now let's inject your tool implementation into the Stage 3 agent. This replaces the default search tool in the `agent_node` with your function.

In [None]:
# Import the agent module
from agent import set_search_tool

# Inject your implementation
set_search_tool(search_courses_tool)

print("✅ Your search tool has been injected into the agent!")

## Part 5: Quality Evaluation (Context Validation)

We now have all the pieces to retrieve context intelligently (hierarchical data, intent classification, tool calling). But we're missing something critical: validation.

According to LangChain's [2025 State of Agent Engineering survey](https://www.langchain.com/state-of-agent-engineering#biggest-barriers-to-production), output quality and reliability remain the #1 barrier to putting agents into production. Teams can build sophisticated retrieval pipelines, but if they can't consistently validate the quality of what they're retrieving, they can't trust their systems in production.

What if the search returns irrelevant results? What if the retrieved context doesn't actually answer the question? In a production system, we can't afford to send poor-quality context to the LLM and hope for the best.

This is where quality evaluation comes in. It's a validation gate that checks: *"Is the retrieved context good enough?"*

To track quality in our agent, we'll use a technique called LLM-as-a-Judge, one of the most widely adopted evaluation techniques (according to the same LangChain survey).

If you're unfamiliar with the LLM-as-a-Judge technique, it's essentially using a separate LLM call specifically designed to evaluate quality against explicit criteria.

Here's how it works:

After our system retrieves context, we ask an LLM (serving the role of an evaluator) to score it on four dimensions:
- **Completeness**: Does this context fully answer the question?
- **Accuracy**: Is the information correct and relevant?
- **Relevance**: Does it directly address what was asked?
- **Grounding**: Does it reference specific courses (not generic knowledge)?

The evaluator returns a quality score (0.0 to 1.0):
- **Score ≥ 0.7**: "Good enough" → Proceed to synthesis
- **Score < 0.7**: "Insufficient" → Trigger another search

This creates a feedback loop where poor results automatically trigger re-retrieval.
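
To make the loop concrete, here is a minimal, synchronous sketch of a quality gate. It is an illustration under stated assumptions rather than the agent's actual node logic: `MAX_ATTEMPTS`, `retrieve_with_quality_gate`, and the two stand-in functions are hypothetical, and only the 0.7 threshold comes from the description above.

```python
from typing import Callable, Tuple

QUALITY_THRESHOLD = 0.7  # threshold described above
MAX_ATTEMPTS = 3         # hypothetical cap so the loop always terminates


def retrieve_with_quality_gate(
    question: str,
    search: Callable[[str, int], str],
    evaluate: Callable[[str, str], float],
) -> Tuple[str, float, int]:
    """Search until the evaluator's score reaches the threshold or attempts run out."""
    best_context, best_score = "", -1.0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        context = search(question, attempt)
        score = evaluate(question, context)
        if score > best_score:
            best_context, best_score = context, score
        if score >= QUALITY_THRESHOLD:
            break  # good enough: proceed to synthesis
    return best_context, best_score, attempt


# Stand-ins so the sketch runs without Redis or an LLM.
def fake_search(question: str, attempt: int) -> str:
    return "CS002 syllabus: weeks 1-12" if attempt >= 2 else "unrelated result"


def fake_evaluate(question: str, context: str) -> float:
    return 0.9 if "CS002" in context else 0.3


context, score, attempts = retrieve_with_quality_gate(
    "Show me the syllabus for CS002", fake_search, fake_evaluate
)
print(context, score, attempts)
```

Capping the number of attempts is a design choice worth keeping in any real version: without it, a query that can never score above the threshold would loop indefinitely.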

Let's now implement it in the agent.

### 📌 Task: Build the Quality Evaluator

Your task is to implement the `evaluate_context_quality` function, which determines whether the retrieved context is sufficient to answer the user's question.

You'll build a function that takes a `question` and `context` as input, uses an LLM to evaluate the context quality on 4 criteria (Completeness, Accuracy, Relevance, Grounding), and returns a tuple: `(score: float, reasoning: str)` where score is between 0.0 and 1.0.

The implementation cell below includes starter code with a pre-written evaluation prompt that instructs the LLM on the 4 criteria, and tasks to guide your implementation.

<details>
<summary>🛠️ Show Detailed Implementation Steps</summary>
    
1. **Get the LLM Instance**

   The agent uses a pre-configured LLM for analysis tasks. Call `get_analysis_llm()` and assign it to the `llm` variable (already created in starter code).

2. **Send Prompt to LLM**

   Use the `llm` variable from the first step to call `.ainvoke()` with a message list. Wrap the `evaluation_prompt` (already provided) in a `HumanMessage` object (already imported): `[HumanMessage(content=evaluation_prompt)]`. Store the result in the `response` variable (already created in starter code).

3. **Extract and Parse the Score**

   Extract the text from `response.content` and strip whitespace, storing it in the `score_text` variable. Then, convert `score_text` to a float and store it in the `score` variable. Lastly, clamp `score` between 0.0 and 1.0 using `max(0.0, min(1.0, score))`

4. **Generate Reasoning**

   Use the `score` variable to create a human-readable explanation and store it in the `reasoning` variable. The threshold for "adequate" quality is 0.7 - scores at or above this are good enough to proceed.

5. **Error Handling**

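   The try/except structure is already provided. In the except block, if parsing fails (`ValueError`), assign fallback values to the variables: `score = 0.8` and `reasoning = "⚠️ Parsing error, defaulting to 0.8"`. Note that 0.8 is above the 0.7 threshold, so a reply that cannot be parsed lets the context through instead of triggering another search.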


</details>

If you get stuck, reference the full solution dropdown below the test section.

In [None]:
from langchain_core.messages import HumanMessage
from agent.nodes import get_analysis_llm

async def evaluate_context_quality(question: str, context: str) -> tuple:
    """
    Evaluate the quality of retrieved context.
    
    Args:
        question: The user's original question
        context: The retrieved context to evaluate
    
    Returns:
        tuple: (score: float between 0.0-1.0, reasoning: str)
    
    This function implements the quality evaluation gate that decides
    whether context is good enough to use or if we need to search again.
    """
    
    # Starter variables
    llm = None
    response = None
    score_text = ""
    score = 0.0
    reasoning = ""
    
    # Evaluation prompt (already provided for you!)
    evaluation_prompt = f"""Evaluate the quality of this course search answer on a scale of 0.0 to 1.0.

Question: {question}
Answer: {context}

Criteria:
- Completeness: Does it fully answer the question?
- Accuracy: Is the course information correct and relevant?
- Relevance: Does it directly address what was asked?
- Grounding: Does it provide specific course details and stick to facts?

Respond with ONLY a number between 0.0 and 1.0 (e.g., 0.85)
"""
    
    try:
        # TODO: Get the LLM instance
        
        # TODO: Send prompt to LLM with HumanMessage
        
        # TODO: Extract response content and parse as float
        
        # TODO: Clamp score between 0.0 and 1.0
        
        # Generate reasoning based on score
        if score >= 0.7:
            reasoning = f"✅ Adequate quality (score: {score:.2f})"
        else:
            reasoning = f"⚠️ Needs improvement (score: {score:.2f})"
        
    except ValueError:
        # Handle parsing errors with safe defaults
        score = 0.8
        reasoning = "⚠️ Parsing error, defaulting to 0.8"
    
    return score, reasoning

print("✅ Evaluation function created!")

<details>
<summary>🗝️ Solution code</summary>

```python

from langchain_core.messages import HumanMessage
from agent.nodes import get_analysis_llm

async def evaluate_context_quality(question: str, context: str) -> tuple:
    """
    Evaluate the quality of retrieved context.
    
    Args:
        question: The user's original question
        context: The retrieved context to evaluate
    
    Returns:
        tuple: (score: float between 0.0-1.0, reasoning: str)
    
    This function implements the quality evaluation gate that decides
    whether context is good enough to use or if we need to search again.
    """
    
    # Starter variables
    llm = None
    response = None
    score_text = ""
    score = 0.0
    reasoning = ""
    
    # Evaluation prompt (already provided for you!)
    evaluation_prompt = f"""Evaluate the quality of this course search answer on a scale of 0.0 to 1.0.

Question: {question}
Answer: {context}

Criteria:
- Completeness: Does it fully answer the question?
- Accuracy: Is the course information correct and relevant?
- Relevance: Does it directly address what was asked?
- Grounding: Does it provide specific course details and stick to facts?

Respond with ONLY a number between 0.0 and 1.0 (e.g., 0.85)
"""
    
    try:
        # Step 1 - Get the LLM instance
        llm = get_analysis_llm()
        
        # Step 2 - Send prompt to LLM
        response = await llm.ainvoke([HumanMessage(content=evaluation_prompt)])
        
        # Step 3 - Extract and parse the score
        score_text = response.content.strip()
        score = float(score_text)
        score = max(0.0, min(1.0, score))
        
        # Step 4 - Generate reasoning
        if score >= 0.7:
            reasoning = f"✅ Adequate quality (score: {score:.2f})"
        else:
            reasoning = f"⚠️ Needs improvement (score: {score:.2f})"
        
    except ValueError:
        # Step 5 - Error handling (0.8 is above the 0.7 threshold, so parse failures pass the gate)
        score = 0.8
        reasoning = "⚠️ Parsing error, defaulting to 0.8"
    
    return score, reasoning

print("✅ Evaluation function created!")
```

</details>
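
The solution's `float(score_text)` raises `ValueError` as soon as the model wraps the number in any extra text (for example `Score: 0.85`). A minimal sketch of a more tolerant parser is shown below. `parse_score` is a hypothetical helper, and its fallback of `0.0` is a deliberate assumption that differs from the notebook's `0.8` default: it treats an unreadable reply as a failed check, so another search is triggered.

```python
import re

# Matches an optionally signed decimal such as "0.85", "1", "-0.2", or ".85".
NUMBER = re.compile(r"-?(?:\d+(?:\.\d+)?|\.\d+)")


def parse_score(raw: str, fallback: float = 0.0) -> float:
    """Extract the first number in the reply and clamp it to the range 0.0-1.0."""
    match = NUMBER.search(raw)
    if match is None:
        return fallback
    return max(0.0, min(1.0, float(match.group())))


for reply in ["0.85", "Score: 0.9 (good)", "1.5", "no number here"]:
    print(repr(reply), "->", parse_score(reply))
```

Which fallback is appropriate depends on the cost of each mistake: a high default keeps latency down, while a low default favors retrieval quality at the price of extra searches.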

### Test Your Implementation

Run the test utility below to verify your quality evaluator works correctly with different context quality levels.

In [None]:
# Import the test utility
import sys
from pathlib import Path

from test_quality_evaluator import test_quality_evaluator

# Test your implementation
await test_quality_evaluator(evaluate_context_quality)

### Connect to the Agent

Now, let's inject your quality evaluator into the Stage 3 agent. This replaces the default evaluation logic in `evaluate_quality_node` with your function.

In [None]:
# Import the agent module
from agent import set_evaluate_quality_function

# Inject your implementation
set_evaluate_quality_function(evaluate_context_quality)

print("✅ Your quality evaluator has been injected into the agent!")
print("🎯 The evaluate_quality_node will now use your implementation")

## Part 6: Running the complete agent

Great job. At this point you have explored and built the five core components of the Stage 3 agent:

1. The hierarchical data models (CourseSummary & CourseDetails)  
2. The context assembler with progressive disclosure  
3. The intent classifier for query-aware routing  
4. The search tool with intent-based retrieval  
5. The quality evaluator using an LLM-as-a-Judge pattern

Now it's time to see everything work together in the full workflow. This workflow includes one additional component we haven't explicitly built: a greeting handler node that responds to pleasantries without triggering any search or retrieval (pure cost optimization). You can find the code for this pre-built node in the `nodes.py` file of the agent.
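
Conceptually, the greeting shortcut is a single conditional branch taken right after classification. The sketch below is an illustration only: the node names `handle_greeting` and `agent` are taken from the routes listed in the tests that follow, while `route_after_intent` and the dictionary-shaped state are assumptions, not the code in `nodes.py`.

```python
def route_after_intent(state: dict) -> str:
    """Return the name of the next node based on the classified intent."""
    if state.get("intent", "GENERAL") == "GREETING":
        # Pleasantries need no retrieval, tool call, or quality check.
        return "handle_greeting"
    # Every other intent goes through the agent and its search tool.
    return "agent"


print(route_after_intent({"intent": "GREETING"}))
print(route_after_intent({"intent": "ASSIGNMENTS"}))
```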

Let's now run the full agent and test it with a variety of queries to see how your implementations handle different scenarios and optimize token usage.

### Test 1: Greeting Query (Zero-Cost Route)

First, let's test the greeting handler. The intent classifier will identify this as a GREETING, and the workflow will route directly to the greeting node, skipping all retrieval, tool calling, and quality evaluation.

Expected behavior:
- Intent: GREETING
- Route: classify_intent → handle_greeting → END
- Retrieval: None
- Token cost: ~100 (just the greeting response)

Run the code block below.

In [None]:
# Import the test utility to run tests
from test_util import run_agent_test

# Query 1: Greeting (tests the greeting handler)
query1 = "Hello! How are you doing?"
await run_agent_test(workflow, query1)

### Test 2: General Query (Summary-Only Retrieval)

Now, let's test a **GENERAL** intent query. Your intent classifier will detect the need for course overviews, and your search tool will retrieve summaries only (not full details).

Expected behavior:
- Intent: GENERAL
- Route: classify_intent → agent → search_courses_tool (summary mode) → evaluate_quality → synthesize
- Retrieval: CourseSummary objects only
- Token cost: ~1500

Run the code block below.

In [None]:
# Query 2: General question (low cost, summaries only)
query2 = "What computer science courses are available?"
await run_agent_test(workflow, query2)

### Test 3: Detailed Query (Hierarchical Retrieval)

Now a **SYLLABUS_OBJECTIVES** query. With this query, the intent classifier will detect that it requires detailed information, and the search tool will utilize hierarchical retrieval (summaries and full details with progressive disclosure).

Expected behavior:
- Intent: SYLLABUS_OBJECTIVES
- Route: classify_intent → agent → search_courses_tool (hierarchical mode) → evaluate_quality → synthesize
- Retrieval: 5 CourseSummary + 2-3 CourseDetails with full syllabi
- Token cost: ~4000-5000 (summaries + detailed syllabi + LLM generation)
  
Run the code block below.

In [None]:
# Query 3: Detailed question (hierarchical retrieval)
query3 = "What are the learning objectives for CS002?"
await run_agent_test(workflow, query3)

### Test 4: Assignment Query (Another Detailed Scenario)

Let's also test an **ASSIGNMENTS** intent to see hierarchical retrieval in action again.

Expected behavior:
- Intent: ASSIGNMENTS
- Route: classify_intent → agent → search_courses_tool (hierarchical mode) → evaluate_quality → synthesize
- Retrieval: 5 CourseSummary + 2-3 CourseDetails with full syllabi
- Token cost: ~4000-5000 (summaries + detailed syllabi + LLM generation)
  
Run the code block below.

In [None]:
# Query 4: Assignment-specific question
query4 = "What assignments are in CS002?"
await run_agent_test(workflow, query4)

### Final Comparison: Adaptive Retrieval in Action

Now let's run all four test queries to see how your hierarchical retrieval system adapts to different query types. This will demonstrate the token savings and quality improvements from intent-driven context selection.

We'll test:
1. **Greeting** → No retrieval (minimal tokens)
2. **General overview** → Summaries only (~1500 tokens)
3. **Learning objectives** → Hierarchical retrieval (~4000-5000 tokens)
4. **Assignments** → Hierarchical retrieval (~4000-5000 tokens)

Run the code block below to see the full comparison and token analysis.

In [None]:
# Import the comparison test function
from test_comparison import run_comparison_tests

# Run the comparison (pass the already-created workflow)
await run_comparison_tests(workflow)
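
As a back-of-the-envelope check, the rough per-query estimates listed above can be compared against a static pipeline. The arithmetic below is a sketch under two assumptions that are not measurements: the hierarchical cost is taken as the midpoint (4500) of the ~4000-5000 range, and the hypothetical static baseline sends the full hierarchical context for every query, greetings included. Your actual numbers from `run_comparison_tests` will differ.

```python
# Approximate per-query token costs quoted in this section.
ADAPTIVE_COST = {
    "greeting": 100,
    "general": 1500,
    "learning_objectives": 4500,  # assumed midpoint of ~4000-5000
    "assignments": 4500,          # assumed midpoint of ~4000-5000
}

# Hypothetical baseline: every query gets full hierarchical retrieval.
STATIC_COST_PER_QUERY = 4500

adaptive_total = sum(ADAPTIVE_COST.values())
static_total = STATIC_COST_PER_QUERY * len(ADAPTIVE_COST)
reduction = 1 - adaptive_total / static_total

print(f"adaptive={adaptive_total}, static={static_total}, reduction={reduction:.0%}")
# prints: adaptive=10600, static=18000, reduction=41%
```

The saving comes almost entirely from the two cheap routes, so the real reduction depends heavily on how many greetings and general questions appear in your traffic.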

## Looking Ahead: The Limits of Semantic Search

With hierarchical retrieval and intent classification, we've made significant progress in shaping *what* information reaches the LLM and *how much* of it. But context engineering isn't just about structure and volume; it's also about precision. Before we wrap up, let's examine a subtle limitation in how we *select* context.

When users ask about a specific course code, vector search finds it through semantic similarity, which does not guarantee an exact match. Run this experiment to see what happens:

In [None]:
# Experiment: How reliably does vector search find an exact course code?
from redisvl.query import VectorQuery
import re

test_queries = [
    "What are the learning objectives for CS002?",
    "Tell me about CS014",
    "What are the prerequisites for ARCH050?",
]

for query in test_queries:
    query_embedding = await course_manager.embeddings.aembed_query(query)
    
    vector_query = VectorQuery(
        vector=query_embedding,
        vector_field_name="content_vector",
        return_fields=["course_code", "title"],
        num_results=5,
    )
    
    results = course_manager.vector_index.query(vector_query)
    result_list = results if isinstance(results, list) else results.docs
    
    # Extract the course code from the query
    mentioned_code = re.search(r'([A-Z]{2,4}\d{3})', query)
    target_code = mentioned_code.group(1) if mentioned_code else "?"
    
    # Results may be dicts or objects depending on the client, so handle both
    codes = [
        r.get("course_code", "") if isinstance(r, dict) else getattr(r, "course_code", "")
        for r in result_list
    ]
    
    rank = codes.index(target_code) + 1 if target_code in codes else "Not in top 5"
    
    print(f"Query: '{query}'")
    print(f"  Target: {target_code} → Rank: {rank}")
    print(f"  Top 3: {codes[:3]}")
    print()

Notice how the target course isn't always the #1 result. Sometimes it's 2nd, 3rd, or even lower. When a user explicitly mentions "CS002", they expect information about CS002, not whatever course happens to be semantically closest to their phrasing. Relying on "semantic luck" to surface the right course creates an unpredictable user experience. In Stage 4, we'll introduce hybrid search, a way of searching that combines exact matching with semantic search, ensuring that when users mention specific course codes, we find exactly what they asked for.
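
To give a feel for the idea, here is a deliberately simplified stand-in: if the query names a course code, that code is moved to the front of the semantically ranked list. This is not how Stage 4 implements hybrid search; `promote_exact_match` is a hypothetical helper, the ranked list is made-up example data, and the sketch assumes a mentioned code can always be fetched directly even when vector search missed it.

```python
import re
from typing import List

# Same pattern as the experiment above: 2-4 capital letters followed by 3 digits.
COURSE_CODE = re.compile(r"[A-Z]{2,4}\d{3}")


def promote_exact_match(query: str, ranked_codes: List[str]) -> List[str]:
    """Put a course code mentioned in the query first, keeping the semantic order otherwise."""
    match = COURSE_CODE.search(query)
    if match is None:
        return list(ranked_codes)  # nothing to match exactly; keep semantic ranking
    target = match.group()
    rest = [code for code in ranked_codes if code != target]
    return [target] + rest


# Made-up semantic ranking in which the requested course came second.
print(promote_exact_match("Tell me about CS014", ["CS002", "CS014", "CS020"]))
# prints: ['CS014', 'CS002', 'CS020']
```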

## Wrap Up 🏁

You've completed Stage 3 and transformed your RAG system from static retrieval into an adaptive, intent-driven agent.

In this stage, you:

- Implemented the Context Assembler that provides different levels of detail (summaries vs. full course details) based on query requirements
- Implemented the Intent Classifier that analyzes queries and routes them to appropriate retrieval strategies
- Created the Search Tool that combines intent classification with progressive disclosure (breadth-first, then depth)
- Used the Quality Evaluator with LLM-as-a-Judge to check that retrieved context meets the quality threshold before synthesis

The key transformation: your agent now adapts retrieval depth to query complexity, achieving a ~30-50% token reduction while maintaining quality.

In Stage 4, you'll level up with hybrid search (combining semantic + exact matching) and the ReAct agent architecture (giving your agent visible reasoning and tool-calling capabilities).