# Assignment 2: Advanced RAG Techniques
## Day 6 Session 2 - Advanced RAG Fundamentals

**OBJECTIVE:** Implement advanced RAG techniques including postprocessors, response synthesizers, and structured outputs.

**LEARNING GOALS:**
- Understand and implement node postprocessors for filtering and reranking
- Learn different response synthesis strategies (TreeSummarize, Refine)
- Create structured outputs using Pydantic models
- Build advanced retrieval pipelines with multiple processing stages

**DATASET:** Use the same data folder as Assignment 1 (`data/`)

**PREREQUISITES:** Complete Assignment 1 first

**INSTRUCTIONS:**
1. Configure your OpenAI API key when prompted
2. Run each cell in order
3. Each technique builds on the previous one
4. Functions are already implemented - focus on understanding the concepts

---
## 🔑 Setup: Configure Your OpenAI API Key

**REQUIRED for this assignment:** Advanced RAG techniques use LLM operations that require an API key.

### Get Your API Key:
1. Go to: https://platform.openai.com/api-keys
2. Sign up or log in
3. Create a new API key
4. Copy the key (starts with `sk-proj-...` or `sk-...`)

### Cost Estimate:
- Model: GPT-4o-mini (~$0.15 per 1M input tokens, ~$0.60 per 1M output tokens)
- This assignment: ~10-20 queries × ~500 tokens each = **$0.01 - $0.02 total cost**
- Very affordable for learning!
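
**Worked example (assumed workload):** The estimate above can be sanity-checked with a few lines of arithmetic. This is a rough sketch only: the prices are the approximate rates quoted above and may change, and the per-query token counts (2,000 prompt tokens including retrieved context, 500 completion tokens) are illustrative assumptions, not measurements.

```python
# Approximate per-million-token prices quoted above (assumed; check current pricing).
INPUT_PRICE_PER_M = 0.15
OUTPUT_PRICE_PER_M = 0.60


def estimate_cost(queries: int, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for `queries` calls of the given size."""
    input_cost = queries * input_tokens * INPUT_PRICE_PER_M / 1_000_000
    output_cost = queries * output_tokens * OUTPUT_PRICE_PER_M / 1_000_000
    return input_cost + output_cost


# Hypothetical workload: 20 queries, 2,000 prompt tokens and 500 completion tokens each.
cost = estimate_cost(queries=20, input_tokens=2_000, output_tokens=500)
print(f"Estimated cost: ${cost:.4f}")
```

Under these assumptions the total lands at roughly one cent, in line with the range above.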

### How to Enter Your API Key:
Run the cell below and paste your API key when prompted. It will be securely stored for this session only.

In [None]:
# OpenAI API Key Configuration (REQUIRED)
import os
from getpass import getpass

# Check if API key is already set in environment
if not os.getenv("OPENAI_API_KEY"):
    print("\n🔑 OpenAI API Key Required")
    print("=" * 50)
    print("This assignment uses OpenAI GPT-4o-mini for LLM operations.")
    print("\nGet your API key from: https://platform.openai.com/api-keys")
    print("Expected cost: ~$0.01-0.02 for this entire assignment\n")
    
    api_key = getpass("Paste your OpenAI API key: ").strip()
    
    if api_key:
        os.environ["OPENAI_API_KEY"] = api_key
        print("\n✅ OpenAI API key configured successfully!")
        print("   You're ready for advanced RAG operations.")
    else:
        print("\n⚠️  No API key entered. LLM operations will fail.")
        print("   Please run this cell again and enter your API key.")
else:
    print("✅ OpenAI API key already configured in environment")
    print("   Ready for advanced RAG operations!")

---
## 📚 Step 1: Import Advanced RAG Libraries

**What this does:**
- Imports all necessary components for advanced RAG techniques
- Includes postprocessors, response synthesizers, and output parsers
- Imports Pydantic for structured outputs

**New Components (vs Assignment 1):**
- `SimilarityPostprocessor`: Filters low-quality results
- `TreeSummarize`, `Refine`: Different ways to synthesize answers
- `PydanticOutputParser`: Creates structured, validated outputs
- `OpenAI`: LLM integration for generating responses

In [None]:
# Import required libraries for advanced RAG
import os
from pathlib import Path
from typing import Dict, List, Optional, Any
from pydantic import BaseModel, Field

# Core LlamaIndex components
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, StorageContext, Settings
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import VectorIndexRetriever

# Vector store
from llama_index.vector_stores.lancedb import LanceDBVectorStore

# Embeddings and LLM (Using OpenAI)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.openai import OpenAI

# Advanced RAG components
from llama_index.core.postprocessor import SimilarityPostprocessor
from llama_index.core.response_synthesizers import TreeSummarize, Refine, CompactAndRefine
from llama_index.core.output_parsers import PydanticOutputParser
from llama_index.core.program import LLMTextCompletionProgram

print("✅ Advanced RAG libraries imported successfully!")
print("   Using OpenAI for LLM operations")

---
## ⚙️ Step 2: Configure Advanced RAG Settings

**What this does:**
- Configures OpenAI GPT-4o-mini as the LLM (for generating responses)
- Uses local HuggingFace embeddings (same as Assignment 1, free!)
- Sets optimized chunk size for better precision

**Why GPT-4o-mini?**
- ✅ Cost-effective (more than 10x cheaper per token than GPT-4)
- ✅ Fast responses (~1-2 seconds)
- ✅ Good quality for learning and many applications
- ✅ Perfect for this assignment (~$0.01-0.02 total)

**Temperature = 0.1:**
- Low temperature = More consistent, focused responses
- Good for factual RAG applications
- Less creative randomness
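
**Sketch - why low temperature is more consistent:** A common way to describe temperature is as a divisor applied to the model's raw token scores before they are turned into probabilities. The snippet below illustrates that textbook formulation with made-up scores; it is an assumption about the general mechanism, not a description of how OpenAI implements sampling.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities, sharpened or flattened by temperature."""
    scaled = [x / temperature for x in logits]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - max_scaled) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
low = softmax_with_temperature(logits, 0.1)
high = softmax_with_temperature(logits, 1.0)
print("T=0.1:", [round(p, 3) for p in low])
print("T=1.0:", [round(p, 3) for p in high])
```

At 0.1 almost all probability mass sits on the top-scoring token, so repeated runs tend to pick the same continuation; at 1.0 the alternatives keep a meaningful share.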

**Chunk Size = 512:**
- Smaller chunks = Better precision (find exact relevant parts)
- Assignment 1 used default (~1024)
- 512 is optimized for detailed retrieval
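
**Sketch - what chunk size and overlap mean:** The snippet below splits a list of tokens into fixed windows of 512 that share 50 tokens with their neighbor, matching the chunk size and overlap values used in the settings cell. It is a simplified illustration: `chunk_tokens` is a hypothetical helper, and LlamaIndex's own splitter also takes sentence boundaries into account, so its chunks will not line up exactly with these.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of `chunk_size` that share `chunk_overlap` tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


tokens = [f"t{i}" for i in range(1200)]  # stand-in for a tokenized document
chunks = chunk_tokens(tokens, chunk_size=512, chunk_overlap=50)
print(len(chunks), [len(c) for c in chunks])
```

The last 50 tokens of each chunk reappear at the start of the next one, which is what keeps a sentence that straddles a boundary retrievable from either side.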

In [None]:
# Configure Advanced RAG Settings (Using OpenAI)
def setup_advanced_rag_settings():
    """
    Configure LlamaIndex with optimized settings for advanced RAG.
    Uses local embeddings and OpenAI for LLM operations.
    """
    # Check for OpenAI API key
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        print("⚠️  OPENAI_API_KEY not found!")
        print("   Please run the API key configuration cell above.")
        print("   LLM operations will fail without an API key.")
        return False
    
    print("✅ OpenAI API key found - configuring advanced RAG...")
    
    # Configure OpenAI LLM
    Settings.llm = OpenAI(
        api_key=api_key,
        model="gpt-4o-mini",  # Cost-effective model for learning
        temperature=0.1  # Lower temperature for more consistent responses
    )
    print("   Using model: gpt-4o-mini (cost-optimized)")
    print("   Temperature: 0.1 (consistent, factual responses)")
    
    # Configure local embeddings (no API key required, same as Assignment 1)
    print("\n🔄 Loading local embedding model...")
    Settings.embed_model = HuggingFaceEmbedding(
        model_name="BAAI/bge-small-en-v1.5",
        trust_remote_code=True
    )
    
    # Advanced RAG configuration
    Settings.chunk_size = 512  # Smaller chunks for better precision
    Settings.chunk_overlap = 50
    
    print("✅ Advanced RAG settings configured successfully!")
    print("   - Chunk size: 512 (optimized for precision)")
    print("   - Chunk overlap: 50 (maintains context across chunks)")
    print("   - Using local embeddings (free, 384 dimensions)")
    print("   - OpenAI LLM ready for response synthesis")
    return True

# Setup the configuration
config_success = setup_advanced_rag_settings()

if not config_success:
    print("\n❌ Configuration failed. Please configure API key above and retry.")

---
## 📂 Step 3: Create Basic Index (Reuse from Assignment 1)

**What this does:**
- Creates the foundational vector index that we'll enhance with advanced techniques
- Reuses the same concepts from Assignment 1 (document loading, vector store, indexing)
- Creates a separate database (`advanced_rag_vectordb`) so it doesn't conflict with Assignment 1

**Why a separate database?**
- Assignment 1 database: `./assignment_vectordb/`
- Assignment 2 database: `./advanced_rag_vectordb/`
- Keeps assignments independent
- Uses optimized chunk size (512 vs default)

**This is the foundation** - Advanced techniques in the following cells will enhance this basic index with:
- Similarity filtering
- Better response synthesis
- Structured outputs

In [None]:
# Setup: Create index from Assignment 1 (reuse the basic functionality)
def setup_basic_index(data_folder: str = "data", force_rebuild: bool = False):
    """
    Create a basic vector index that we'll enhance with advanced techniques.
    This reuses the concepts from Assignment 1.
    """
    # Create vector store
    vector_store = LanceDBVectorStore(
        uri="./advanced_rag_vectordb",
        table_name="documents"
    )
    
    # Load documents
    if not Path(data_folder).exists():
        print(f"❌ Data folder not found: {data_folder}")
        print("   Make sure you're in the correct directory with the 'data' folder.")
        return None
        
    print(f"📂 Loading documents from: {data_folder}")
    reader = SimpleDirectoryReader(input_dir=data_folder, recursive=True)
    documents = reader.load_data()
    print(f"   Loaded {len(documents)} documents")
    
    # Create storage context and index
    print("\n🔗 Creating vector index...")
    print("   (This may take 30-60 seconds for ~39 documents...)")
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    index = VectorStoreIndex.from_documents(
        documents, 
        storage_context=storage_context,
        show_progress=True
    )
    
    print(f"\n✅ Basic index created with {len(documents)} documents")
    print("   Ready for advanced RAG techniques!")
    return index

# Create the basic index
print("🚀 Setting up basic index for advanced RAG...")
print("=" * 50)
index = setup_basic_index()

if index:
    print("\n" + "=" * 50)
    print("✅ Ready to implement advanced RAG techniques!")
    print("   The following cells will add:")
    print("   1. Similarity filtering (remove irrelevant results)")
    print("   2. TreeSummarize (better response synthesis)")
    print("   3. Structured outputs (Pydantic models)")
    print("   4. Combined advanced pipeline")
else:
    print("\n❌ Failed to create index - check data folder path")

---
## 🎯 Technique 1: Similarity Filtering (Postprocessor)

**What this technique does:**
- Filters out retrieved chunks that score below a relevance threshold
- Improves response quality by removing "noise"
- Reduces API costs (fewer tokens sent to LLM)

**Key Concept - Postprocessors:**
Postprocessors refine retrieval results **after** the initial vector search but **before** sending to the LLM. Think of it as a quality control step.

**How Similarity Filtering Works:**
1. Vector search retrieves top 10 chunks
2. Each chunk has a similarity score (0.0 to 1.0)
3. SimilarityPostprocessor filters out chunks below threshold (e.g., 0.3)
4. Only high-quality chunks (score ≥ 0.3) go to the LLM

**Example:**
```
Query: "AI agent architectures"

Initial retrieval (10 chunks):
- Chunk 1: Score 0.85 ✅ (about AI agents - VERY RELEVANT)
- Chunk 2: Score 0.72 ✅ (about agent frameworks - RELEVANT)
- Chunk 3: Score 0.65 ✅ (about system design - RELEVANT)
- Chunk 4: Score 0.28 ❌ (about cooking recipes - NOT RELEVANT)
- Chunk 5: Score 0.15 ❌ (about finance - NOT RELEVANT)
- ... (5 more low-scoring chunks)

After SimilarityPostprocessor (cutoff=0.3):
- Only Chunks 1, 2, 3 passed (scores ≥ 0.3)
- Result: Cleaner context for LLM, better answers
```

**Why it matters:**
- ✅ Removes irrelevant results that confuse the LLM
- ✅ Reduces API costs (fewer tokens)
- ✅ Improves answer quality and focus
- ✅ Typical cutoff: 0.3 (adjustable based on your needs)

**Parameters:**
- `similarity_cutoff`: Minimum score (0.0-1.0). Common: 0.3-0.5
- `top_k`: How many chunks to retrieve initially (before filtering)
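
**Sketch - the filtering rule in plain Python:** The cutoff logic described above amounts to a single comparison per chunk. The snippet below reproduces the worked example with a hypothetical `ScoredChunk` type and `filter_by_similarity` helper; it illustrates the idea only and is not the `SimilarityPostprocessor` implementation.

```python
from dataclasses import dataclass


@dataclass
class ScoredChunk:
    text: str
    score: float


def filter_by_similarity(chunks, cutoff):
    """Keep only chunks whose score is at or above the cutoff."""
    return [c for c in chunks if c.score >= cutoff]


# Scores taken from the example above.
retrieved = [
    ScoredChunk("about AI agents", 0.85),
    ScoredChunk("about agent frameworks", 0.72),
    ScoredChunk("about system design", 0.65),
    ScoredChunk("about cooking recipes", 0.28),
    ScoredChunk("about finance", 0.15),
]
kept = filter_by_similarity(retrieved, cutoff=0.3)
print([c.text for c in kept])
```

Note that a cutoff set too high can remove every chunk, leaving the LLM with no context at all, so it is worth checking how many chunks survive when tuning `similarity_cutoff`.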

In [None]:
def create_query_engine_with_similarity_filter(index, similarity_cutoff: float = 0.3, top_k: int = 10):
    """
    Create a query engine that filters results based on similarity scores.
    
    Args:
        index: Vector index to query
        similarity_cutoff: Minimum similarity score (0.0 to 1.0)
        top_k: Number of initial results to retrieve before filtering
        
    Returns:
        Query engine with similarity filtering
    """
    # Create similarity postprocessor with the cutoff threshold
    similarity_processor = SimilarityPostprocessor(similarity_cutoff=similarity_cutoff)
    
    # Create query engine with similarity filtering
    query_engine = index.as_query_engine(
        similarity_top_k=top_k,
        node_postprocessors=[similarity_processor]
    )
    
    return query_engine

# Test the function
if index:
    print("🔧 Creating query engine with similarity filtering...")
    filtered_engine = create_query_engine_with_similarity_filter(index, similarity_cutoff=0.3)
    
    if filtered_engine:
        print("✅ Query engine with similarity filtering created")
        print("   Settings: Retrieve 10, filter out scores < 0.3")
        
        # Test query
        test_query = "What are the benefits of AI agents?"
        print(f"\n🔍 Testing query: '{test_query}'")
        print("   (This will make an OpenAI API call - ~$0.001 cost)\n")
        
        # Test the response
        response = filtered_engine.query(test_query)
        print(f"📝 Filtered Response:\n{response}")
        
        print("\n💡 Notice: Only high-quality, relevant chunks were used!")
    else:
        print("❌ Failed to create filtered query engine")
else:
    print("❌ No index available - run previous cells first")

---
## 🌳 Technique 2: TreeSummarize (Response Synthesizer)

**What this technique does:**
- Changes **how** the LLM combines multiple retrieved chunks into a final answer
- Uses hierarchical summarization (like building a tree from bottom to top)
- Better for complex analytical questions

**Key Concept - Response Synthesizers:**
Response synthesizers control how retrieved information becomes the final answer. Different strategies work better for different query types.

**Available Synthesizers:**
1. **TreeSummarize** (this cell):
   - Builds response hierarchically
   - Summarizes groups of chunks (pairs in the example below), then summarizes the summaries
   - Good for: Comprehensive analysis, "compare X and Y", long responses

2. **Refine** (not shown here):
   - Iteratively improves answer chunk by chunk
   - Good for: Detailed explanations, evolving answers

3. **CompactAndRefine** (not shown here):
   - Combines chunks first, then refines
   - Good for: Balance between quality and speed
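
**Sketch - the Refine idea:** To make the contrast with TreeSummarize concrete, the snippet below shows the sequential pattern behind Refine: answer from the first chunk, then revise that answer once for each further chunk. `refine_answer`, the prompt wording, and `fake_llm` are hypothetical stand-ins for illustration; the library's actual prompts and control flow differ.

```python
def refine_answer(query, chunks, llm):
    """Build an answer from the first chunk, then revise it once per remaining chunk."""
    answer = None
    for chunk in chunks:
        if answer is None:
            prompt = f"Context: {chunk}\nQuestion: {query}\nAnswer:"
        else:
            prompt = (
                f"Existing answer: {answer}\nNew context: {chunk}\n"
                f"Question: {query}\nRefined answer:"
            )
        answer = llm(prompt)
    return answer


calls = []


def fake_llm(prompt):
    """Hypothetical stand-in for a model call: records the prompt, returns a tag."""
    calls.append(prompt)
    return f"answer-v{len(calls)}"


result = refine_answer("What is X?", ["chunk A", "chunk B", "chunk C"], fake_llm)
print(result, "after", len(calls), "calls")
```

In this sketch the number of model calls grows linearly with the number of chunks, and each call depends on the previous one, so the calls cannot run in parallel.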

**How TreeSummarize Works:**
```
Retrieved Chunks: [A, B, C, D]

Level 1 (pair summaries):
  Summary_AB = Summarize(A, B)
  Summary_CD = Summarize(C, D)

Level 2 (combine summaries):
  Final_Answer = Summarize(Summary_AB, Summary_CD)
```

**Example Query Types:**
- ✅ "Compare the advantages and disadvantages of X"
- ✅ "Explain the evolution of Y from early to modern"
- ✅ "Analyze the relationship between A and B"
- ❌ "What is X?" (simple factual - default synthesizer is fine)

**Why it matters:**
- ✅ More comprehensive answers for complex queries
- ✅ Better synthesis across multiple sources
- ✅ Maintains context across many chunks
- ⚠️ Slightly more API calls (but better quality)
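
**Sketch - the tree in code:** The level-by-level diagram above can be written as a short loop. This is a toy version under stated assumptions: `tree_summarize` and `fake_summarize` are hypothetical names, the summarizer just joins strings instead of calling an LLM, and grouping is fixed at pairs, whereas the library may group chunks differently (for example by how much text fits in one prompt).

```python
def tree_summarize(chunks, summarize, fan_in=2):
    """Recursively merge groups of `fan_in` texts until a single summary remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not chunks:
        return ""
    level = list(chunks)
    while len(level) > 1:
        # Summarize each group at this level to form the next, shorter level.
        level = [
            summarize(level[i:i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
    return level[0]


calls = []


def fake_summarize(texts):
    """Hypothetical stand-in for an LLM summary: records the call, joins the inputs."""
    calls.append(list(texts))
    return "(" + "+".join(texts) + ")"


final = tree_summarize(["A", "B", "C", "D"], fake_summarize)
print(final)
print("summarize calls:", len(calls))
```

The two Level 1 calls are independent of each other, which is what distinguishes this shape from a strictly sequential approach.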

In [None]:
def create_query_engine_with_tree_summarize(index, top_k: int = 5):
    """
    Create a query engine that uses TreeSummarize for comprehensive responses.
    
    Args:
        index: Vector index to query
        top_k: Number of results to retrieve
        
    Returns:
        Query engine with TreeSummarize synthesis
    """
    # Create TreeSummarize response synthesizer
    tree_synthesizer = TreeSummarize()
    
    # Create query engine with the synthesizer
    query_engine = index.as_query_engine(
        similarity_top_k=top_k,
        response_synthesizer=tree_synthesizer
    )
    
    return query_engine

# Test the function
if index:
    print("🌳 Creating query engine with TreeSummarize...")
    tree_engine = create_query_engine_with_tree_summarize(index)
    
    if tree_engine:
        print("✅ Query engine with TreeSummarize created")
        print("   Best for: Analytical queries, comparisons, comprehensive answers")
        
        # Test with a complex analytical query
        analytical_query = "Compare the advantages and disadvantages of different AI agent frameworks"
        print(f"\n🔍 Testing analytical query: '{analytical_query}'")
        print("   (This will make OpenAI API calls for hierarchical summarization)\n")
        
        # Test the response
        response = tree_engine.query(analytical_query)
        print(f"📝 TreeSummarize Response:\n{response}")
        
        print("\n💡 Notice: More comprehensive analysis by building answer hierarchically!")
    else:
        print("❌ Failed to create TreeSummarize query engine")
else:
    print("❌ No index available - run previous cells first")

---
## 📊 Technique 3: Structured Outputs (Pydantic Models)

**What this technique does:**
- Forces LLM to return data in a specific, validated structure
- Uses Pydantic models to define the exact output format
- Essential for API endpoints, databases, and data pipelines

**Key Concept - Structured Outputs:**
Instead of free-text responses, you get type-safe, validated data structures that applications can reliably process.

**Problem with Free-Text Responses:**
```python
# Free-text response (unpredictable)
response = "AI agents are systems that can reason. Key capabilities include planning, tool use..."

# How do you extract:
# - The title?
# - List of key points? (parsing is error-prone)
# - Applications? (where do they start/end?)
```

**Solution with Structured Outputs:**
```python
# Structured response (predictable)
response = ResearchPaperInfo(
    title="AI Agents and Their Capabilities",
    key_points=["reasoning", "planning", "tool execution"],
    applications=["autonomous systems", "financial analysis"],
    summary="AI agents are autonomous systems..."
)

# Easy to use:
print(response.title)  # Direct access
for point in response.key_points:  # Iterate list
    print(point)
```

**Pydantic Model Example:**
```python
class ResearchPaperInfo(BaseModel):
    title: str  # Must be a string
    key_points: List[str]  # Must be a list of strings
    applications: List[str]  # Must be a list of strings
    summary: str  # Must be a string
```

**Why it matters:**
- ✅ **Predictable outputs** - Always the same structure
- ✅ **Type safety** - Pydantic validates data types
- ✅ **Easy integration** - Works with databases, APIs, JSON
- ✅ **Error prevention** - Catches invalid outputs early

**Use Cases:**
- REST API endpoints (return JSON)
- Database inserts (structured records)
- Data pipelines (consistent format)
- Frontend applications (predictable data)
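
**Sketch - what "catches invalid outputs early" means:** The snippet below parses JSON text from a model and rejects it when a field has the wrong type. It is a hand-rolled, standard-library stand-in for the validation step so the idea is visible without extra dependencies; `PaperInfo` and `parse_paper_info` are hypothetical names, and this is not how Pydantic or `PydanticOutputParser` is implemented.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class PaperInfo:
    title: str
    key_points: List[str]
    applications: List[str]
    summary: str


def parse_paper_info(raw: str) -> PaperInfo:
    """Parse JSON text and check each field's type before building the record."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name in ("title", "summary"):
        if not isinstance(data.get(name), str):
            raise ValueError(f"'{name}' must be a string")
    for name in ("key_points", "applications"):
        value = data.get(name)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"'{name}' must be a list of strings")
    return PaperInfo(data["title"], data["key_points"], data["applications"], data["summary"])


good = '{"title": "AI Agents", "key_points": ["planning"], "applications": ["analysis"], "summary": "Short."}'
info = parse_paper_info(good)
print(info.title, info.key_points)

# key_points is a bare string here, so the record is rejected before it reaches any consumer.
bad = '{"title": "AI Agents", "key_points": "planning", "applications": [], "summary": "Short."}'
try:
    parse_paper_info(bad)
except ValueError as err:
    print(f"Rejected: {err}")
```

With a Pydantic model the per-field checks are derived from the type annotations, so the model definition alone replaces the manual loops shown here.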

In [None]:
# First, define the Pydantic model for structured outputs  
class ResearchPaperInfo(BaseModel):
    """Structured information about a research paper or AI concept."""
    title: str = Field(description="The main title or concept name")
    key_points: List[str] = Field(description="3-5 main points or findings")
    applications: List[str] = Field(description="Practical applications or use cases")
    summary: str = Field(description="Brief 2-3 sentence summary")

def create_structured_output_program(output_model: type[BaseModel] = ResearchPaperInfo):
    """
    Create a structured output program using Pydantic models.
    
    Args:
        output_model: Pydantic model class for structured output
        
    Returns:
        LLMTextCompletionProgram that returns structured data
    """
    # Create output parser with the Pydantic model
    output_parser = PydanticOutputParser(output_cls=output_model)
    
    # Create the structured output program
    prompt_template_str = """
    Based on the following context and query, extract structured information.
    
    Context: {context}
    Query: {query}
    
    {format_instructions}
    """
    
    program = LLMTextCompletionProgram.from_defaults(
        output_parser=output_parser,
        prompt_template_str=prompt_template_str,
        verbose=True
    )

    return program

# Test the function
if index:
    print("📊 Creating structured output program...")
    structured_program = create_structured_output_program(ResearchPaperInfo)
    
    if structured_program:
        print("✅ Structured output program created")
        print("   Output format: ResearchPaperInfo (Pydantic model)")
        print("   Fields: title, key_points, applications, summary")
        
        # Test with retrieval and structured extraction
        structure_query = "Tell me about AI agents and their capabilities"
        print(f"\n🔍 Testing structured query: '{structure_query}'")
        
        # Get context for structured extraction
        print("   Step 1: Retrieving relevant context...")
        retriever = VectorIndexRetriever(index=index, similarity_top_k=3)
        nodes = retriever.retrieve(structure_query)
        context = "\n".join([node.text for node in nodes])
        print(f"   Retrieved {len(nodes)} relevant chunks")
        
        # Generate structured response
        print("\n   Step 2: Generating structured output...")
        print("   (This will make an OpenAI API call)\n")
        response = structured_program(context=context, query=structure_query)
        
        print(f"📊 Structured Response:")
        print(f"\n   Title: {response.title}")
        print(f"\n   Key Points:")
        for i, point in enumerate(response.key_points, 1):
            print(f"      {i}. {point}")
        print(f"\n   Applications:")
        for i, app in enumerate(response.applications, 1):
            print(f"      {i}. {app}")
        print(f"\n   Summary: {response.summary}")
        
        print("\n💡 Output format validated:")
        print(f"   ✅ Type: {type(response).__name__}")
        print(f"   ✅ Title: {type(response.title).__name__}")
        print(f"   ✅ Key points: List with {len(response.key_points)} items")
        print(f"   ✅ Applications: List with {len(response.applications)} items")
        print(f"   ✅ Summary: {len(response.summary)} characters")
    else:
        print("❌ Failed to create structured output program")
else:
    print("❌ No index available - run previous cells first")

---
## 🚀 Technique 4: Advanced RAG Pipeline (Combining All Techniques)

**What this technique does:**
- Combines multiple advanced techniques into a single powerful query engine
- Similarity filtering **+** TreeSummarize response synthesis
- Best of both worlds: Clean results + comprehensive answers

**Key Concept - Production RAG Systems:**
In real-world applications, you rarely use just one technique. Production RAG systems combine multiple techniques for optimal results.

**How the Advanced Pipeline Works:**
```
User Query: "Analyze AI agent architectures"
    ↓
Step 1: Vector Search
    → Retrieve top 10 chunks from vector database
    ↓
Step 2: Similarity Filtering (Postprocessor)
    → Filter out chunks with score < 0.3
    → Result: 5-7 high-quality chunks
    ↓
Step 3: TreeSummarize (Response Synthesizer)
    → Build hierarchical summary of chunks
    → Level 1: Pair-wise summaries
    → Level 2: Combine into final answer
    ↓
Final Response: Comprehensive, relevant, well-synthesized answer
```

**Benefits of Combining Techniques:**
1. **Similarity Filtering** removes noise → Cleaner input for LLM
2. **TreeSummarize** builds comprehensive answer → Better output quality
3. **Together** → High-quality results + comprehensive analysis

**When to use this:**
- ✅ Production applications (where quality matters)
- ✅ Complex analytical queries
- ✅ When you need both precision and comprehensiveness
- ✅ API endpoints serving end users

**When NOT to use this:**
- ❌ Simple factual queries ("What is X?") - basic RAG is fine
- ❌ Extremely cost-sensitive applications - more API calls
- ❌ Real-time systems needing <100ms response - adds latency
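
**Sketch - how the stages compose:** The pipeline above is a chain of three steps where each stage consumes the previous stage's output. The snippet below wires stub functions together to show that shape, including a fallback for the case where filtering leaves nothing to synthesize. All names here (`run_pipeline`, `fake_retrieve`, `cutoff_filter`, `fake_synthesize`) and the fallback message are hypothetical; this is not how the LlamaIndex query engine is implemented.

```python
def run_pipeline(query, retrieve, postprocessors, synthesize):
    """Retrieve candidates, apply each postprocessor in order, then synthesize."""
    nodes = retrieve(query)
    for postprocess in postprocessors:
        nodes = postprocess(nodes)
    if not nodes:
        # Nothing survived filtering, so skip the (costly) synthesis step.
        return "No sufficiently relevant context found."
    return synthesize(query, nodes)


def fake_retrieve(query):
    """Stub retriever returning (text, score) pairs."""
    return [("agents plan and use tools", 0.81), ("pasta recipe", 0.12)]


def cutoff_filter(nodes, cutoff=0.3):
    """Stub postprocessor: drop pairs scoring below the cutoff."""
    return [n for n in nodes if n[1] >= cutoff]


def fake_synthesize(query, nodes):
    """Stub synthesizer: join the surviving texts instead of calling an LLM."""
    return f"{query} -> " + " | ".join(text for text, _ in nodes)


answer = run_pipeline("What do agents do?", fake_retrieve, [cutoff_filter], fake_synthesize)
print(answer)

strict = run_pipeline(
    "What do agents do?",
    fake_retrieve,
    [lambda nodes: cutoff_filter(nodes, cutoff=0.95)],
    fake_synthesize,
)
print(strict)
```

Keeping postprocessors as a list makes it straightforward to add further stages later without changing the retrieval or synthesis code.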

In [None]:
def create_advanced_rag_pipeline(index, similarity_cutoff: float = 0.3, top_k: int = 10):
    """
    Create a comprehensive advanced RAG pipeline combining multiple techniques.
    
    Args:
        index: Vector index to query
        similarity_cutoff: Minimum similarity score for filtering
        top_k: Number of initial results to retrieve
        
    Returns:
        Advanced query engine with filtering and synthesis combined
    """
    # Create similarity postprocessor
    similarity_processor = SimilarityPostprocessor(similarity_cutoff=similarity_cutoff)
    
    # Create TreeSummarize for comprehensive responses
    tree_synthesizer = TreeSummarize()
    
    # Create the comprehensive query engine combining both techniques
    advanced_engine = index.as_query_engine(
        similarity_top_k=top_k,
        node_postprocessors=[similarity_processor],
        response_synthesizer=tree_synthesizer
    )
    
    return advanced_engine

# Test the comprehensive pipeline
if index:
    print("🚀 Creating advanced RAG pipeline...")
    print("   Combining:")
    print("   - Similarity filtering (remove noise)")
    print("   - TreeSummarize (comprehensive synthesis)")
    
    advanced_pipeline = create_advanced_rag_pipeline(index)
    
    if advanced_pipeline:
        print("\n✅ Advanced RAG pipeline created successfully!")
        print("   🔧 Similarity filtering: ✅ (cutoff 0.3)")
        print("   🌳 TreeSummarize synthesis: ✅")
        
        # Test with complex query
        complex_query = "Analyze the current state and future potential of AI agent technologies"
        print(f"\n🔍 Testing complex query: '{complex_query}'")
        print("   (This combines both techniques for best results)\n")
        
        # Test the response
        response = advanced_pipeline.query(complex_query)
        print(f"🚀 Advanced RAG Response:\n{response}")
        
        print("\n🎯 This response provides:")
        print("   ✅ Filtered relevant results only (no noise)")
        print("   ✅ Comprehensive analytical response (hierarchical synthesis)")
        print("   ✅ Production-quality output")
    else:
        print("❌ Failed to create advanced RAG pipeline")
else:
    print("❌ No index available - run previous cells first")

---
## 🆚 Final Test: Compare Basic vs Advanced RAG

**What this cell does:**
- Tests the same queries with **basic RAG** vs **advanced RAG**
- Shows you the quality improvements from advanced techniques
- Validates that all 5 components work correctly

**Components to Test:**
1. ✅ Basic Index (foundation)
2. ✅ Similarity Filter (postprocessor)
3. ✅ TreeSummarize (response synthesizer)
4. ✅ Structured Output (Pydantic models)
5. ✅ Advanced Pipeline (combined techniques)

**Test Queries:**
- Query 1: Key capabilities (factual)
- Query 2: Evaluation metrics (analytical)
- Query 3: Benefits and challenges (comparative)

**What to look for:**
- Basic RAG: Functional answers
- Advanced RAG: More focused, comprehensive, better-synthesized answers

**Expected differences:**
- Advanced responses should be more relevant (filtered)
- Advanced responses should be more comprehensive (TreeSummarize)
- Less irrelevant information in advanced responses

In [None]:
# Final comparison: Basic vs Advanced RAG
print("🚀 Advanced RAG Techniques Assignment - Final Test")
print("=" * 60)

# Test queries for comparison
test_queries = [
    "What are the key capabilities of AI agents?",
    "How do you evaluate agent performance metrics?",
    "Explain the benefits and challenges of multimodal AI systems"
]

# Check if all components were created
components_status = {
    "Basic Index": index is not None,
    "Similarity Filter": 'filtered_engine' in locals() and filtered_engine is not None,
    "TreeSummarize": 'tree_engine' in locals() and tree_engine is not None,
    "Structured Output": 'structured_program' in locals() and structured_program is not None,
    "Advanced Pipeline": 'advanced_pipeline' in locals() and advanced_pipeline is not None
}

print("\n📊 Component Status:")
for component, status in components_status.items():
    status_icon = "✅" if status else "❌"
    print(f"   {status_icon} {component}")

# Create basic query engine for comparison
if index:
    print("\n🔍 Creating basic query engine for comparison...")
    basic_engine = index.as_query_engine(similarity_top_k=5)
    
    print("\n" + "=" * 60)
    print("🆚 COMPARISON: Basic vs Advanced RAG")
    print("=" * 60)
    print("\n⏱️  Note: This will make multiple OpenAI API calls (~$0.03-0.05 total)")
    
    for i, query in enumerate(test_queries, 1):
        print(f"\n📋 Test Query {i}: '{query}'")
        print("-" * 50)
        
        # Basic RAG
        print("🔹 Basic RAG:")
        if basic_engine:
            basic_response = basic_engine.query(query)
            print(f"   {str(basic_response)[:200]}...")
        
        # Advanced RAG (if implemented)
        print("\n🔸 Advanced RAG:")
        if components_status["Advanced Pipeline"]:
            advanced_response = advanced_pipeline.query(query)
            print(f"   {str(advanced_response)[:200]}...")
        else:
            print("   Complete the advanced pipeline function to test")

# Final status
print("\n" + "=" * 60)
print("🎯 Assignment Status:")
completed_count = sum(components_status.values())
total_count = len(components_status)

print(f"   Completed: {completed_count}/{total_count} components")

if completed_count == total_count:
    print("\n🎉 Congratulations! You've mastered Advanced RAG Techniques!")
    print("   ✅ Node postprocessors for result filtering")
    print("   ✅ Response synthesizers for better answers")
    print("   ✅ Structured outputs for reliable data")
    print("   ✅ Advanced pipelines combining all techniques")
    print("\n🚀 You're ready for production RAG systems!")
    print("\n📚 Key Takeaways:")
    print("   • Postprocessors filter noise → Better input quality")
    print("   • TreeSummarize builds comprehensive answers → Better output quality")
    print("   • Structured outputs enable system integration → Production-ready")
    print("   • Combining techniques → Professional RAG applications")
else:
    missing = total_count - completed_count
    print(f"\n📝 {missing} component(s) need attention:")
    for component, status in components_status.items():
        if not status:
            print(f"   ❌ {component}")

print("\n💡 Advanced RAG vs Basic RAG:")
print("   Basic: Good for simple queries, fast responses")
print("   Advanced: Better quality, comprehensive answers, production-ready")