# Assignment 2: Advanced RAG Techniques
## Day 6 Session 2 - Advanced RAG Fundamentals

**OBJECTIVE:** Implement advanced RAG techniques including postprocessors, response synthesizers, and structured outputs.

**LEARNING GOALS:**
- Understand and implement node postprocessors for filtering and reranking
- Learn different response synthesis strategies (TreeSummarize, Refine)
- Create structured outputs using Pydantic models
- Build advanced retrieval pipelines with multiple processing stages

**DATASET:** Use the same data folder as Assignment 1 (`Day_6/session_2/data/`)

**PREREQUISITES:** Complete Assignment 1 first

**INSTRUCTIONS:**
1. Complete each function by replacing the TODO comments with actual implementation
2. Run each cell after completing the function to test it
3. The answers can be found in the `03_advanced_rag_techniques.ipynb` notebook
4. Each technique builds on the previous one


In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
# If it's in a specific folder (e.g., "Projects/MyProject/")
!pip install -r '/content/drive/MyDrive/ai-accelerator-C2-main/Day_6/session_2/requirements.txt'



In [None]:
# Import required libraries for advanced RAG
import os
from pathlib import Path
from typing import Dict, List, Optional, Any
from pydantic import BaseModel, Field

# Core LlamaIndex components
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, StorageContext, Settings
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import VectorIndexRetriever

# Vector store
from llama_index.vector_stores.lancedb import LanceDBVectorStore

# Embeddings and LLM
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.openrouter import OpenRouter

# Advanced RAG components (we'll use these in the assignments)
from llama_index.core.postprocessor import SimilarityPostprocessor
from llama_index.core.response_synthesizers import TreeSummarize, Refine, CompactAndRefine
from llama_index.core.output_parsers import PydanticOutputParser

print("✅ Advanced RAG libraries imported successfully!")


✅ Advanced RAG libraries imported successfully!


In [None]:
# Configure Advanced RAG Settings (Using OpenRouter)
def setup_advanced_rag_settings():
    """
    Configure LlamaIndex with optimized settings for advanced RAG.
    Uses local embeddings and OpenRouter for LLM operations.
    """
    # Read the OpenRouter API key from Colab secrets
    from google.colab import userdata

    try:
        api_key = userdata.get('OPEN_ROUTER')  # the name you gave your Colab secret
        print("✅ OpenRouter API key found in Colab secrets")
    except Exception:
        api_key = None
        print("ℹ️  OPEN_ROUTER secret not found - that's OK for this assignment!")
        print("   This assignment only uses local embeddings for vector operations.")

    # Configure the OpenRouter LLM only when a key is available
    if api_key:
        Settings.llm = OpenRouter(
            api_key=api_key,
            model="gpt-4o",
            temperature=0.1  # Lower temperature for more consistent responses
        )

    # Configure local embeddings (no API key required)
    Settings.embed_model = HuggingFaceEmbedding(
        model_name="BAAI/bge-small-en-v1.5",
        trust_remote_code=True
    )

    # Advanced RAG configuration
    Settings.chunk_size = 512  # Smaller chunks for better precision
    Settings.chunk_overlap = 50

    print("✅ Advanced RAG settings configured")
    print("   - Chunk size: 512 (optimized for precision)")
    print("   - Using local embeddings for cost efficiency")
    if api_key:
        print("   - OpenRouter LLM ready for response synthesis")

# Setup the configuration
setup_advanced_rag_settings()


✅ OpenRouter API key found in Colab secrets
✅ Advanced RAG settings configured
   - Chunk size: 512 (optimized for precision)
   - Using local embeddings for cost efficiency
   - OpenRouter LLM ready for response synthesis


In [None]:
from google.colab import userdata
import os

# Get the token from Colab secrets
hf_token = userdata.get('HF_TOKEN')

# Set as environment variable (optional)
os.environ['HF_TOKEN'] = hf_token

# Use with Hugging Face libraries
from huggingface_hub import login
login(token=hf_token)

Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured.


In [None]:
# Setup: Create index from Assignment 1 (reuse the basic functionality)
def setup_basic_index(data_folder: str = "/content/drive/MyDrive/ai-accelerator-C2-main/Day_6/session_2/data", force_rebuild: bool = False):
    """
    Create a basic vector index that we'll enhance with advanced techniques.
    This reuses the concepts from Assignment 1.
    """
    # Create vector store
    vector_store = LanceDBVectorStore(
        uri="./advanced_rag_vectordb",
        table_name="documents"
    )

    # Load documents
    if not Path(data_folder).exists():
        print(f"❌ Data folder not found: {data_folder}")
        return None

    reader = SimpleDirectoryReader(input_dir=data_folder, recursive=True)
    documents = reader.load_data()

    # Create storage context and index
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    index = VectorStoreIndex.from_documents(
        documents,
        storage_context=storage_context,
        show_progress=True
    )

    print(f"✅ Basic index created with {len(documents)} documents")
    print("   Ready for advanced RAG techniques!")

    print("📁 Setting up basic index for advanced RAG...")


    return index

# Create the basic index
index = setup_basic_index()
if index:
    print("🚀 Ready to implement advanced RAG techniques!")
else:
    print("❌ Failed to create index - check data folder path")




Parsing nodes:   0%|          | 0/42 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/93 [00:00<?, ?it/s]

✅ Basic index created with 42 documents
   Ready for advanced RAG techniques!
📁 Setting up basic index for advanced RAG...
🚀 Ready to implement advanced RAG techniques!


## 1. Node Postprocessors - Similarity Filtering

**Concept:** Postprocessors refine retrieval results after the initial vector search. The `SimilarityPostprocessor` filters out chunks that fall below a relevance threshold.

**Why it matters:** Raw vector search often returns some irrelevant results. Filtering improves precision and response quality.
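
The filtering idea can be sketched without any library. This is only an illustration of a score cutoff, not LlamaIndex's implementation; `ScoredChunk` and `filter_by_similarity` are hypothetical names, and the scores are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class ScoredChunk:
    """Hypothetical stand-in for a retrieved chunk and its similarity score."""
    text: str
    score: float


def filter_by_similarity(chunks, cutoff):
    """Keep only the chunks whose score is at least the cutoff."""
    return [c for c in chunks if c.score >= cutoff]


# Made-up retrieval results, ordered by score
retrieved = [
    ScoredChunk("Agents plan and call tools.", 0.82),
    ScoredChunk("Agents can collaborate in teams.", 0.41),
    ScoredChunk("Unrelated cooking recipe.", 0.12),
]

kept = filter_by_similarity(retrieved, cutoff=0.3)
print([c.text for c in kept])
```

A cutoff that is too high can remove every chunk, so it is worth checking how many results survive before relying on the answer.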

Complete the function below to create a query engine with similarity filtering.


In [None]:
# First install the huggingface integration
!pip install llama-index-llms-huggingface



In [None]:
import os
from google.colab import userdata
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings
from llama_index.core.postprocessor import SimilarityPostprocessor

# Get OpenAI API key from Colab secrets
try:
    openai_api_key = userdata.get('OPENAI_API')  # Your secret name
    os.environ["OPENAI_API_KEY"] = openai_api_key  # What OpenAI expects
    print("✅ OpenAI API key loaded from Colab secrets")
except Exception as e:
    print(f"❌ Error loading OpenAI API key from secrets: {e}")
    print("💡 Make sure you have added 'OPENAI_API' to your Colab secrets")
    print("   Go to the key icon (🔑) in the left sidebar and add your key")

# Use OpenAI which handles long contexts much better
Settings.llm = OpenAI(model="gpt-3.5-turbo", max_tokens=256)
print("✅ OpenAI LLM configured successfully")

def create_query_engine_with_similarity_filter(index, similarity_cutoff: float = 0.3, top_k: int = 5):
    """
    Create a query engine that filters results based on similarity scores.

    TODO: Complete this function to create a query engine with similarity postprocessing.
    HINT: Use index.as_query_engine() with node_postprocessors parameter containing SimilarityPostprocessor

    Args:
        index: Vector index to query
        similarity_cutoff: Minimum similarity score (0.0 to 1.0)
        top_k: Number of initial results to retrieve before filtering

    Returns:
        Query engine with similarity filtering
    """
    try:
        # TODO: Create similarity postprocessor with the cutoff threshold
        similarity_processor = SimilarityPostprocessor(similarity_cutoff=similarity_cutoff)

        # TODO: Create query engine with similarity filtering
        query_engine = index.as_query_engine(
            similarity_top_k=top_k,
            node_postprocessors=[similarity_processor]
        )

        print(f"✅ Query engine with similarity cutoff {similarity_cutoff} created")
        return query_engine

    except Exception as e:
        print(f"❌ Error creating query engine: {e}")
        return None

# Test the function with error handling
if 'index' in locals() and index:
    filtered_engine = create_query_engine_with_similarity_filter(index, similarity_cutoff=0.3, top_k=3)

    if filtered_engine:
        print("✅ Query engine with similarity filtering created")

        # Test query
        test_query = "What are the benefits of AI agents?"
        print(f"\n🔍 Testing query: '{test_query}'")

        try:
            response = filtered_engine.query(test_query)
            print(f"📝 Response: {response}")
        except Exception as e:
            print(f"❌ Error during query: {e}")
            print("💡 Try using a different model or check your data preprocessing")
    else:
        print("❌ Failed to create filtered query engine")
else:
    print("❌ No index available - run previous cells first")

✅ OpenAI API key loaded from Colab secrets
✅ OpenAI LLM configured successfully
✅ Query engine with similarity cutoff 0.3 created
✅ Query engine with similarity filtering created

🔍 Testing query: 'What are the benefits of AI agents?'
📝 Response: The benefits of AI agents include their enhanced reasoning, planning, and tool execution capabilities, which enable them to achieve complex goals efficiently. Additionally, AI agents can communicate effectively, adapt to different scenarios, and work collaboratively in both single-agent and multi-agent architectures.


## 2. Response Synthesizers - TreeSummarize

**Concept:** Response synthesizers control how retrieved information becomes final answers. `TreeSummarize` builds responses hierarchically, ideal for complex analytical questions.

**Why it matters:** Different synthesis strategies work better for different query types. TreeSummarize excels at comprehensive analysis and long-form responses.
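
To make "hierarchically" concrete, here is a minimal sketch of bottom-up merging. It assumes a fixed group size (`fan_in`) and uses a placeholder `fake_combine` in place of an LLM call; both are hypothetical. The real `TreeSummarize` decides its grouping from the prompt and context-window size, so treat this as the general shape only.

```python
def tree_summarize(chunks, combine, fan_in=2):
    """Merge groups of `fan_in` texts level by level until one text remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2, or the tree never shrinks")
    if not chunks:
        return ""
    level = list(chunks)
    while len(level) > 1:
        # Each pass builds the next (smaller) level of the tree
        level = [
            combine(level[i:i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
    return level[0]


def fake_combine(texts):
    """Placeholder for an LLM summarization call; brackets expose the tree shape."""
    return "(" + " + ".join(texts) + ")"


print(tree_summarize(["A", "B", "C", "D", "E"], fake_combine))
```

The nesting of the brackets in the printed string shows which chunks were merged at each level. With a real LLM in place of `fake_combine`, each bracket pair would correspond to one summarization call.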

Complete the function below to create a query engine with TreeSummarize response synthesis.


In [None]:
def create_query_engine_with_tree_summarize(index, top_k: int = 5):
    """
    Create a query engine that uses TreeSummarize for comprehensive responses.

    TODO: Complete this function to create a query engine with TreeSummarize synthesis.
    HINT: Create a TreeSummarize instance, then use index.as_query_engine() with response_synthesizer parameter

    Args:
        index: Vector index to query
        top_k: Number of results to retrieve

    Returns:
        Query engine with TreeSummarize synthesis
    """
    # TODO: Create TreeSummarize response synthesizer
    tree_synthesizer = TreeSummarize()

    # TODO: Create query engine with the synthesizer
    query_engine = index.as_query_engine(
        response_synthesizer=tree_synthesizer,
        similarity_top_k=top_k
    )

    print("✅ Create query engine with TreeSummarize synthesis")

    return query_engine


# Test the function
if index:
    tree_engine = create_query_engine_with_tree_summarize(index)

    if tree_engine:
        print("✅ Query engine with TreeSummarize created")

        # Test with a complex analytical query
        analytical_query = "Compare the advantages and disadvantages of different AI agent frameworks"
        print(f"\n🔍 Testing analytical query: '{analytical_query}'")

        try:
            response = tree_engine.query(analytical_query)
            print(f"📝 TreeSummarize Response:\n{response}")
        except Exception as e:
            print(f"❌ Error during query: {e}")
    else:
        print("❌ Failed to create TreeSummarize query engine")
else:
    print("❌ No index available - run previous cells first")


✅ Create query engine with TreeSummarize synthesis
✅ Query engine with TreeSummarize created

🔍 Testing analytical query: 'Compare the advantages and disadvantages of different AI agent frameworks'
📝 TreeSummarize Response:
Advantages and disadvantages of different AI agent frameworks can be compared based on factors such as complexity, learning curve, best use case, performance considerations, and suitability for different tasks. Frameworks like LangChain offer a moderate complexity level and are suitable for general LLM applications, while AutoGPT is known for its high complexity and steep learning curve, making it ideal for autonomous tasks. CrewAI, on the other hand, has a medium complexity level with an easy learning curve, making it suitable for team collaboration. LlamaIndex stands out with low complexity and ease of use, making it a good fit for document Q&A tasks. Performance considerations show that single agents typically have lower latency compared to multi-agent 

## 3. Structured Outputs with Pydantic Models

**Concept:** Structured outputs ensure predictable, parseable responses using Pydantic models. This is essential for API endpoints and data pipelines.

**Why it matters:** Instead of free-text responses, you get type-safe, validated data structures that applications can reliably process.
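
As a small illustration of what "validated" means here, the sketch below parses two made-up JSON strings against a Pydantic model. `PaperSummary` and `parse_llm_output` are hypothetical names for this example only, and the JSON strings stand in for text an LLM might return.

```python
import json
from typing import List

from pydantic import BaseModel, ValidationError


class PaperSummary(BaseModel):
    """Hypothetical, trimmed-down model used only for this illustration."""
    title: str
    key_points: List[str]


def parse_llm_output(raw: str) -> PaperSummary:
    """Parse raw JSON text and validate it against the model."""
    return PaperSummary(**json.loads(raw))


good = '{"title": "AI Agents", "key_points": ["planning", "tool use"]}'
bad = '{"title": "AI Agents"}'  # key_points is missing

parsed = parse_llm_output(good)
print(parsed.title, parsed.key_points)

try:
    parse_llm_output(bad)
    rejected = False
except ValidationError:
    # Pydantic refuses to build the object when a required field is absent
    rejected = True
print("bad output rejected:", rejected)
```

Downstream code can then rely on `parsed.key_points` being a list of strings, instead of re-checking free text on every call.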

Complete the function below to create a structured output system for extracting research paper information.


In [None]:
# First, define the Pydantic models for structured outputs
class ResearchPaperInfo(BaseModel):
    """Structured information about a research paper or AI concept."""
    title: str = Field(description="The main title or concept name")
    key_points: List[str] = Field(description="3-5 main points or findings")
    applications: List[str] = Field(description="Practical applications or use cases")
    summary: str = Field(description="Brief 2-3 sentence summary")

# Import the missing component
from llama_index.core.program import LLMTextCompletionProgram

def create_structured_output_program(output_model: type[BaseModel] = ResearchPaperInfo):
    """
    Create a structured output program using Pydantic models.

    TODO: Complete this function to create a structured output program.
    HINT: Use LLMTextCompletionProgram.from_defaults() with PydanticOutputParser and a prompt template

    Args:
        output_model: Pydantic model class for structured output

    Returns:
        LLMTextCompletionProgram that returns structured data
    """
    # TODO: Create output parser with the Pydantic model
    output_parser = PydanticOutputParser(output_model)

    # TODO: Create the structured output program
    program = LLMTextCompletionProgram.from_defaults(
        output_parser=output_parser,
        prompt_template_str=(
            "Extract structured information from the following context:\n"
            "{context}\n\n"
            "Question: {query}\n\n"
            "Provide the information in the specified JSON format."
        )
    )

    print(f"✅: Create structured output program with {output_model.__name__}")

    return program



# Test the function
if index:
    structured_program = create_structured_output_program(ResearchPaperInfo)

    if structured_program:
        print("✅ Structured output program created")

        # Test with retrieval and structured extraction
        structure_query = "Tell me about AI agents and their capabilities"
        print(f"\n🔍 Testing structured query: '{structure_query}'")

        # Retrieve context for structured extraction
        retriever = VectorIndexRetriever(index=index, similarity_top_k=3)
        nodes = retriever.retrieve(structure_query)
        context = "\n".join([node.text for node in nodes])

        response = structured_program(context=context, query=structure_query)
        print(f"📊 Structured Response:\n{response}")

    else:
        print("❌ Failed to create structured output program")
else:
    print("❌ No index available - run previous cells first")


✅: Create structured output program with ResearchPaperInfo
✅ Structured output program created

🔍 Testing structured query: 'Tell me about AI agents and their capabilities'
📊 Structured Response:
title='AI Agents and Their Capabilities' key_points=['Architectures leveraging advanced techniques are more effective across various benchmarks and problem types', 'Current AI-driven agents show promise but have notable limitations and areas for improvement', 'Challenges around agent benchmarks, real-world applicability, and mitigating harmful biases need to be addressed for reliable agents'] applications=[] summary='The survey explores the progression from static language models to dynamic, autonomous agents, providing a comprehensive understanding of the current AI agent landscape and insights for developers.'


## 4. Advanced Pipeline - Combining All Techniques

**Concept:** Combine multiple advanced techniques into a single query engine. The pipeline in this section chains similarity filtering with TreeSummarize synthesis; structured output (Section 3) stays a separate step that runs on retrieved context.

**Why it matters:** Production RAG systems often need multiple techniques working together for optimal results.
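
The order of the stages matters: retrieve first, filter next, synthesize last. The sketch below shows that ordering with plain functions. `run_pipeline`, `toy_retrieve`, and `toy_synthesize` are hypothetical stand-ins with made-up scores, and the fallback message for an empty result is an assumption of this sketch rather than LlamaIndex behavior.

```python
from typing import Callable, List, Tuple

Scored = Tuple[str, float]  # (chunk text, similarity score)


def run_pipeline(
    query: str,
    retrieve: Callable[[str], List[Scored]],
    cutoff: float,
    synthesize: Callable[[str, List[str]], str],
) -> str:
    """Retrieve candidates, drop low-scoring chunks, then synthesize an answer."""
    candidates = retrieve(query)
    kept = [text for text, score in candidates if score >= cutoff]
    if not kept:
        # Assumed fallback for this sketch when filtering removes everything
        return "No sufficiently relevant context found."
    return synthesize(query, kept)


def toy_retrieve(query: str) -> List[Scored]:
    """Stand-in for vector search; returns fixed, made-up results."""
    return [("Agents plan tasks.", 0.7), ("Agents use tools.", 0.5), ("Off-topic.", 0.1)]


def toy_synthesize(query: str, texts: List[str]) -> str:
    """Stand-in for an LLM synthesizer; simply joins the surviving chunks."""
    return f"{query} -> " + " ".join(texts)


print(run_pipeline("What do agents do?", toy_retrieve, 0.3, toy_synthesize))
print(run_pipeline("What do agents do?", toy_retrieve, 0.9, toy_synthesize))
```

The second call shows why `similarity_cutoff` and `top_k` should be tuned together: a strict cutoff on a small candidate set can leave the synthesizer with nothing to work from.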

Complete the function below to create a comprehensive advanced RAG pipeline.


In [None]:
def create_advanced_rag_pipeline(index, similarity_cutoff: float = 0.3, top_k: int = 5):
    """
    Create a comprehensive advanced RAG pipeline combining multiple techniques.

    TODO: Complete this function to create the ultimate advanced RAG query engine.
    HINT: Combine SimilarityPostprocessor + TreeSummarize using index.as_query_engine()

    Args:
        index: Vector index to query
        similarity_cutoff: Minimum similarity score for filtering
        top_k: Number of initial results to retrieve

    Returns:
        Advanced query engine with filtering and synthesis combined
    """
    # TODO: Create similarity postprocessor
    similarity_processor = SimilarityPostprocessor(similarity_cutoff=similarity_cutoff)

    # TODO: Create TreeSummarize for comprehensive responses
    tree_synthesizer = TreeSummarize()

    # TODO: Create the comprehensive query engine combining both techniques
    advanced_engine = index.as_query_engine(
        response_synthesizer=tree_synthesizer,
        node_postprocessors=[similarity_processor],
        similarity_top_k=top_k
    )

    print("✅ : Create advanced RAG pipeline with all techniques")

    return advanced_engine

# Test the comprehensive pipeline
if index:
    advanced_pipeline = create_advanced_rag_pipeline(index)

    if advanced_pipeline:
        print("✅ Advanced RAG pipeline created successfully!")
        print("   🔧 Similarity filtering: ✅")
        print("   🌳 TreeSummarize synthesis: ✅")

        # Test with complex query
        complex_query = "Analyze the current state and future potential of AI agent technologies"
        print(f"\n🔍 Testing complex query: '{complex_query}'")

        response = advanced_pipeline.query(complex_query)
        print(f"🚀 Advanced RAG Response:\n{response}")

        print("\n🎯 This should provide:")
        print("   - Filtered relevant results only")
        print("   - Comprehensive analytical response")
        print("   - Combined postprocessing and synthesis")
    else:
        print("❌ Failed to create advanced RAG pipeline")
else:
    print("❌ No index available - run previous cells first")


✅ : Create advanced RAG pipeline with all techniques
✅ Advanced RAG pipeline created successfully!
   🔧 Similarity filtering: ✅
   🌳 TreeSummarize synthesis: ✅

🔍 Testing complex query: 'Analyze the current state and future potential of AI agent technologies'
🚀 Advanced RAG Response:
The current state of AI agent technologies shows promising advancements in achieving complex goals that require enhanced reasoning, planning, and tool execution capabilities. Architectures leveraging these techniques have demonstrated effectiveness across various benchmarks and problem types. However, there are notable limitations that need to be addressed for future improvement. Challenges such as comprehensive agent benchmarks, real-world applicability, and mitigating harmful biases in language models are areas that require attention in the near term to enable the development of reliable agents. By transitioning from static language models to more dynamic, autonomous agents, the AI ag

## 5. Final Test - Compare Basic vs Advanced RAG

Once you've completed all the functions above, run this cell to compare basic RAG with your advanced techniques.


In [None]:
# Final comparison: Basic vs Advanced RAG
print("🚀 Advanced RAG Techniques Assignment - Final Test")
print("=" * 60)

# Test queries for comparison
test_queries = [
    "What are the key capabilities of AI agents?",
    "How do you evaluate agent performance metrics?",
    "Explain the benefits and challenges of multimodal AI systems"
]

# Check if all components were created
components_status = {
    "Basic Index": index is not None,
    "Similarity Filter": 'filtered_engine' in locals() and filtered_engine is not None,
    "TreeSummarize": 'tree_engine' in locals() and tree_engine is not None,
    "Structured Output": 'structured_program' in locals() and structured_program is not None,
    "Advanced Pipeline": 'advanced_pipeline' in locals() and advanced_pipeline is not None
}

print("\n📊 Component Status:")
for component, status in components_status.items():
    status_icon = "✅" if status else "❌"
    print(f"   {status_icon} {component}")

# Create basic query engine for comparison
if index:
    print("\n🔍 Creating basic query engine for comparison...")
    basic_engine = index.as_query_engine(similarity_top_k=5)

    print("\n" + "=" * 60)
    print("🆚 COMPARISON: Basic vs Advanced RAG")
    print("=" * 60)

    for i, query in enumerate(test_queries, 1):
        print(f"\n📋 Test Query {i}: '{query}'")
        print("-" * 50)

        # Basic RAG: standard vector search with the default synthesizer
        print("🔹 Basic RAG:")
        if basic_engine:
            basic_response = basic_engine.query(query)
            print(f"   Response: {str(basic_response)[:200]}...")

        # Advanced RAG (if implemented): similarity filtering + TreeSummarize
        print("\n🔸 Advanced RAG:")
        if components_status["Advanced Pipeline"]:
            advanced_response = advanced_pipeline.query(query)
            print(f"   Response: {advanced_response}")
        else:
            print("   Complete the advanced pipeline function to test")

# Final status
print("\n" + "=" * 60)
print("🎯 Assignment Status:")
completed_count = sum(components_status.values())
total_count = len(components_status)

print(f"   Completed: {completed_count}/{total_count} components")

if completed_count == total_count:
    print("\n🎉 Congratulations! You've mastered Advanced RAG Techniques!")
    print("   ✅ Node postprocessors for result filtering")
    print("   ✅ Response synthesizers for better answers")
    print("   ✅ Structured outputs for reliable data")
    print("   ✅ Advanced pipelines combining all techniques")
    print("\n🚀 You're ready for production RAG systems!")
else:
    missing = total_count - completed_count
    print(f"\n📝 Complete {missing} more components to finish the assignment:")
    for component, status in components_status.items():
        if not status:
            print(f"   - {component}")

print("\n💡 Key learnings:")
print("   - Postprocessors improve result relevance and precision")
print("   - Different synthesizers work better for different query types")
print("   - Structured outputs enable reliable system integration")
print("   - Advanced techniques can be combined for production systems")


🚀 Advanced RAG Techniques Assignment - Final Test

📊 Component Status:
   ✅ Basic Index
   ✅ Similarity Filter
   ✅ TreeSummarize
   ✅ Structured Output
   ✅ Advanced Pipeline

🔍 Creating basic query engine for comparison...

🆚 COMPARISON: Basic vs Advanced RAG

📋 Test Query 1: 'What are the key capabilities of AI agents?'
--------------------------------------------------
🔹 Basic RAG:
   Response: The key capabilities of AI agents include strong performance on complex tasks involving reasoning and tool execution, the ability to work iteratively towards goals, opportunities for human feedback, c...

🔸 Advanced RAG:
   Response: The key capabilities of AI agents include strong performance on complex tasks involving reasoning and tool execution, the ability to work iteratively towards goals, opportunities for human feedback, clear leadership, defined planning phases with opportunities for plan refinement, intelligent message filtering, and dynamic teams wit