# 🤖 Agentic RAG: Complete Demo & Tutorial

## Building an Intelligent RAG Pipeline with Local Documents and Web Search

Welcome to this comprehensive demonstration of **Agentic RAG** - a powerful approach that combines the decision-making capabilities of AI agents with the adaptability of Retrieval-Augmented Generation (RAG).

### What You'll Learn:
- 🔍 How to build a smart routing system that decides between local and web search
- 📚 Setting up vector databases for local document retrieval
- 🌐 Creating web search and scraping agents for external information
- 🧠 Implementing document relevance grading and query optimization
- 🔄 Assembling everything into a cohesive, intelligent pipeline

### Architecture Overview:
```
User Query → Router → Local DB Search OR Web Search → Context Assembly → Final Answer
```

For each query, the router checks whether the local documents can answer it. If they can, context comes from the vector database; if not, the query goes to web search. The chosen context is then passed to the LLM to generate the final answer.
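
Before wiring in real models, the flow in the diagram can be sketched with plain functions. This is a minimal illustration, not the pipeline built later: `answer_query`, `toy_router`, and the lambda stand-ins are hypothetical names introduced here, and the keyword-based router only mimics what the LLM router does.

```python
from typing import Callable


def answer_query(
    query: str,
    can_answer_locally: Callable[[str], bool],
    search_local: Callable[[str], str],
    search_web: Callable[[str], str],
    generate: Callable[[str, str], str],
) -> dict:
    """Route a query to one source, then generate an answer from its context."""
    route = "LOCAL" if can_answer_locally(query) else "WEB"
    context = search_local(query) if route == "LOCAL" else search_web(query)
    return {"route": route, "answer": generate(context, query)}


# Toy stand-ins so the flow runs without any API keys (assumptions, not real components).
LOCAL_TOPICS = ("agentic rag", "machine learning", "vector database")


def toy_router(query: str) -> bool:
    """Pretend router: route locally when the query mentions a known topic."""
    return any(topic in query.lower() for topic in LOCAL_TOPICS)


result = answer_query(
    "What is Agentic RAG?",
    can_answer_locally=toy_router,
    search_local=lambda q: "local context",
    search_web=lambda q: "web context",
    generate=lambda ctx, q: f"Answer to '{q}' using {ctx}",
)
print(result["route"])
```

Passing the components in as callables keeps the routing logic independent of any particular LLM or search provider.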

## 🛠️ Section 1: Environment Setup and API Configuration

Before we start building our Agentic RAG system, we need to install the required packages and configure our API keys.

### Required API Keys:
- **Groq API**: For fast LLM responses
- **Gemini API**: For web scraping agents  
- **Serper.dev API**: For web search functionality

In [None]:
# Install required packages
!pip install langchain-groq faiss-cpu crewai crewai_tools serper pypdf python-dotenv setuptools sentence-transformers huggingface_hub langchain langchain-community langchain-text-splitters langchain-huggingface

In [None]:
# Import required libraries
import os
import warnings
import getpass
from dotenv import load_dotenv
from typing import List, Dict, Any

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore')

# Load environment variables
load_dotenv()

# Function to set API keys securely
def set_api_key(key_name: str, description: str):
    """Set API key either from environment or user input"""
    if key_name not in os.environ:
        os.environ[key_name] = getpass.getpass(f"Enter your {description}: ")
    return os.environ[key_name]

# Set up API keys
print("🔑 Setting up API keys...")
GROQ_API_KEY = set_api_key("GROQ_API_KEY", "Groq API Key")
GEMINI_API_KEY = set_api_key("GEMINI_API_KEY", "Gemini API Key") 
SERPER_API_KEY = set_api_key("SERPER_API_KEY", "Serper.dev API Key")

print("✅ API keys configured successfully!")

In [None]:
# Import all necessary libraries for Agentic RAG
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface.embeddings import HuggingFaceEmbeddings
from langchain_groq import ChatGroq
from crewai_tools import SerperDevTool, ScrapeWebsiteTool
from crewai import Agent, Task, Crew, LLM
from pydantic import BaseModel, Field
import time

print("📚 All libraries imported successfully!")
print("🚀 Ready to build our Agentic RAG system!")

## 🧠 Section 2: Initialize Language Models

We'll set up two different language models:
- **ChatGroq (Llama)**: For general tasks like routing and answer generation
- **Gemini LLM**: For web scraping agents with higher temperature for creativity

In [None]:
# Initialize Language Models
print("🤖 Initializing Language Models...")

# Main LLM for routing and general tasks (Groq - Fast responses)
llm = ChatGroq(
    model="llama-3.3-70b-versatile",  # "llama-3.3-70b-specdec" may no longer be served by Groq
    temperature=0,
    max_tokens=500,
    timeout=None,
    max_retries=2,
    api_key=GROQ_API_KEY
)

# LLM for web scraping agents (Gemini - More creative)
crew_llm = LLM(
    model="gemini/gemini-1.5-flash",
    api_key=GEMINI_API_KEY,
    max_tokens=500,
    temperature=0.7
)

print("✅ Language models initialized!")
print(f"   - Main LLM: {llm.model_name}")
print(f"   - Crew LLM: {crew_llm.model}")

## 📚 Section 3: Load and Process Local Documents

We'll create three sample documents and process them for our vector database. In a real scenario, you would load your own PDF documents.
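
For real PDFs, a loader along these lines could replace the hard-coded samples. It is a sketch under assumptions: `load_pdf_pages` and the `./docs` directory are hypothetical, and it relies on `PyPDFLoader` from `langchain_community` (which needs the `pypdf` package) returning one `Document` per page.

```python
from pathlib import Path


def load_pdf_pages(pdf_dir: str) -> list:
    """Load every PDF in `pdf_dir` into LangChain Document objects (one per page)."""
    pdf_paths = sorted(Path(pdf_dir).glob("*.pdf"))
    if not pdf_paths:
        return []  # Nothing to load

    # Imported lazily so that defining this helper does not require the package.
    from langchain_community.document_loaders import PyPDFLoader

    pages = []
    for path in pdf_paths:
        pages.extend(PyPDFLoader(str(path)).load())
    return pages


# Hypothetical usage: documents = load_pdf_pages("./docs")
```

The returned list can be passed to the text splitter in the same way as the sample `documents` list used in this section.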

In [None]:
# Create sample documents for demonstration
# In practice, you would load your PDF files using PyPDFLoader

sample_documents = [
    {
        "content": """
        Agentic RAG: Advanced Retrieval-Augmented Generation
        
        Agentic RAG represents a significant evolution in AI systems, combining the decision-making 
        capabilities of AI agents with traditional RAG approaches. Unlike standard RAG systems that 
        simply retrieve and generate, Agentic RAG systems can:
        
        1. Make intelligent routing decisions
        2. Evaluate retrieval quality
        3. Rewrite queries for better results
        4. Combine multiple information sources
        
        Key Components:
        - Decision Router: Determines information source
        - Document Grader: Evaluates relevance
        - Query Rewriter: Optimizes search terms
        - Multi-source Retrieval: Local and web sources
        """,
        "metadata": {"source": "agentic_rag_guide.pdf", "page": 1}
    },
    {
        "content": """
        Machine Learning Fundamentals
        
        Machine learning is a subset of artificial intelligence that enables computers to learn
        and improve from experience without being explicitly programmed. There are three main
        types of machine learning:
        
        1. Supervised Learning: Uses labeled training data
        2. Unsupervised Learning: Finds patterns in unlabeled data  
        3. Reinforcement Learning: Learns through interaction and rewards
        
        Popular algorithms include:
        - Linear Regression
        - Decision Trees
        - Neural Networks
        - Support Vector Machines
        """,
        "metadata": {"source": "ml_basics.pdf", "page": 1}
    },
    {
        "content": """
        Vector Databases and Embeddings
        
        Vector databases are specialized databases designed to store and query high-dimensional
        vectors efficiently. They are crucial for:
        
        - Semantic search capabilities
        - Similarity matching
        - Recommendation systems
        - RAG implementations
        
        Popular vector databases include:
        - FAISS (Facebook AI Similarity Search)
        - Pinecone
        - Weaviate
        - Chroma
        
        Embeddings convert text into numerical vectors that capture semantic meaning.
        """,
        "metadata": {"source": "vector_db_guide.pdf", "page": 1}
    }
]

print("📄 Sample documents created!")
print(f"   - Total documents: {len(sample_documents)}")
for i, doc in enumerate(sample_documents):
    print(f"   - Document {i+1}: {doc['metadata']['source']}")

In [None]:
# Process documents into chunks
from langchain_core.documents import Document

print("⚙️ Processing documents into chunks...")

# Convert sample documents to LangChain Document objects
documents = []
for doc_data in sample_documents:
    doc = Document(
        page_content=doc_data["content"].strip(),
        metadata=doc_data["metadata"]
    )
    documents.append(doc)

# Split documents into smaller chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,        # Size of each chunk
    chunk_overlap=50,       # Overlap between chunks
    length_function=len,    # Function to measure chunk length
    separators=["\n\n", "\n", " ", ""]  # Split on paragraphs first, then lines, words, characters
)

# Split all documents
document_chunks = text_splitter.split_documents(documents)

print(f"✅ Documents processed!")
print(f"   - Original documents: {len(documents)}")
print(f"   - Document chunks: {len(document_chunks)}")
print(f"   - Average chunk size: {sum(len(chunk.page_content) for chunk in document_chunks) // len(document_chunks)} characters")

# Show a sample chunk
print(f"\n📝 Sample chunk:")
print(f"Content: {document_chunks[0].page_content[:200]}...")
print(f"Metadata: {document_chunks[0].metadata}")

## 🗄️ Section 4: Create Vector Database for Document Retrieval

Now we'll create a FAISS vector database using HuggingFace embeddings to enable semantic search over our document chunks.
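
Conceptually, semantic search ranks stored vectors by how close they are to the query vector. The toy example below shows that idea with cosine similarity in plain Python; the three-dimensional vectors are made up for illustration, and `top_k` is a hypothetical helper, not part of FAISS. With normalized embeddings (as configured in the cell below), ranking by L2 distance and ranking by cosine similarity give the same order.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=2):
    """Return the k labels whose vectors are most similar to query_vec."""
    ranked = sorted(
        indexed,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [label for label, _ in ranked[:k]]


# Made-up 3-dimensional "embeddings"; real models produce hundreds of dimensions.
toy_index = [
    ("agentic_rag_guide.pdf", [0.9, 0.1, 0.0]),
    ("ml_basics.pdf", [0.1, 0.9, 0.1]),
    ("vector_db_guide.pdf", [0.2, 0.1, 0.9]),
]

print(top_k([1.0, 0.2, 0.0], toy_index, k=2))
# → ['agentic_rag_guide.pdf', 'ml_basics.pdf']
```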

In [None]:
# Create Vector Database
print("🔍 Creating vector database...")

# Initialize embeddings model
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2",
    model_kwargs={'device': 'cpu'},
    encode_kwargs={'normalize_embeddings': True}
)

# Create FAISS vector database from documents
vector_db = FAISS.from_documents(document_chunks, embeddings)

print("✅ Vector database created successfully!")
print(f"   - Embedding model: sentence-transformers/all-mpnet-base-v2")
print(f"   - Total vectors: {vector_db.index.ntotal}")

# Function to retrieve relevant content from local database
def get_local_content(query: str, k: int = 5) -> str:
    """Retrieve relevant content from the vector database"""
    try:
        docs = vector_db.similarity_search(query, k=k)
        content = " ".join([doc.page_content for doc in docs])
        return content
    except Exception as e:
        print(f"Error retrieving local content: {e}")
        return ""

# Test the retrieval function
test_query = "What is Agentic RAG?"
test_result = get_local_content(test_query, k=2)
print(f"\n🧪 Test retrieval for '{test_query}':")
print(f"Retrieved content length: {len(test_result)} characters")
print(f"Sample content: {test_result[:200]}...")

## 🧭 Section 5: Build Decision Router Function

This is the brain of our Agentic RAG system! The router decides whether a query can be answered using local documents or needs external web search.

In [None]:
# Decision Router Function
def check_local_knowledge(query: str, context: str) -> bool:
    """
    Router function to determine if we can answer from local knowledge
    Returns True if local documents contain sufficient information
    """
    
    router_prompt = '''Role: Intelligent Query Router
Task: Determine whether the local documents contain sufficient information to answer the user's question.

Instructions:
- Analyze the provided context and user question carefully
- Respond with ONLY "Yes" or "No" 
- "Yes" if the local context contains relevant information to answer the question
- "No" if external search is needed for complete information

Examples:
Question: "What is machine learning?"
Context: "Machine learning is a subset of artificial intelligence..."
Answer: Yes

Question: "What's the latest news about AI?"
Context: "Machine learning is a subset of artificial intelligence..."
Answer: No

Current Question: {query}
Available Context: {context}

Answer:'''

    try:
        formatted_prompt = router_prompt.format(query=query, context=context)
        response = llm.invoke(formatted_prompt)
        decision = response.content.strip().lower()
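        # The prompt asks for only "Yes" or "No", so check how the reply starts
        # rather than searching for "yes" anywhere in the text
        return decision.startswith("yes")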
    except Exception as e:
        print(f"Router error: {e}")
        return False

# Test the router function
print("🧭 Testing the decision router...")

# Test cases
test_cases = [
    "What is Agentic RAG?",
    "What are the latest AI news from 2024?", 
    "Explain machine learning types",
    "What's the weather today?"
]

for query in test_cases:
    # Retrieve context for this specific query so the router judges relevant material
    local_context = get_local_content(query, k=3)
    decision = check_local_knowledge(query, local_context)
    route = "LOCAL" if decision else "WEB"
    print(f"   Query: '{query}' → Route: {route}")

print("\n✅ Router function is working!")
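
LLM replies do not always consist of a bare "Yes" or "No"; they may add punctuation, markdown emphasis, or an explanation. A small parser can make the routing decision more predictable. The helper below is a sketch, and `parse_yes_no` is a hypothetical name, not part of LangChain or Groq.

```python
import re


def parse_yes_no(text: str, default: bool = False) -> bool:
    """Interpret the first word of an LLM reply as a yes/no decision.

    Leading punctuation such as quotes or asterisks is skipped. If the reply
    does not start with "yes" or "no", `default` is returned.
    """
    match = re.match(r"\W*(yes|no)\b", text.strip(), flags=re.IGNORECASE)
    if match is None:
        return default
    return match.group(1).lower() == "yes"


print(parse_yes_no("Yes"))                  # → True
print(parse_yes_no("No, but yes in part"))  # → False
print(parse_yes_no("**Yes.**"))             # → True
```

Returning a default for unparseable replies keeps the fallback explicit: with `default=False`, an ambiguous reply sends the query to web search.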

## 🌐 Section 6: Implement Web Search and Scraping Agent

When local knowledge isn't sufficient, our CrewAI agents will search the web and scrape relevant content to provide up-to-date information.

In [None]:
# Web Search and Scraping Agents
print("🌐 Setting up web search and scraping agents...")

def setup_web_scraping_agent():
    """Setup the web scraping agent and related components"""
    
    # Initialize search and scraping tools
    # SerperDevTool reads SERPER_API_KEY from the environment (set in Section 1)
    search_tool = SerperDevTool()
    scrape_tool = ScrapeWebsiteTool()
    
    # Define the web search agent
    web_search_agent = Agent(
        role="Expert Web Search Specialist",
        goal="Find the most relevant and up-to-date web sources for user queries",
        backstory="""You are an expert at identifying valuable web sources and finding 
        the most relevant articles, papers, and resources for any given topic. You have 
        a keen eye for quality and relevance.""",
        allow_delegation=False,
        verbose=False,
        llm=crew_llm
    )
    
    # Define the web scraping agent  
    web_scraper_agent = Agent(
        role="Expert Content Analyzer and Summarizer",
        goal="Extract and analyze key information from web pages",
        backstory="""You are highly skilled at analyzing web content, extracting key 
        insights, and summarizing complex information in a clear and concise manner. 
        You focus on the most relevant information for the user's needs.""",
        allow_delegation=False,
        verbose=False,
        llm=crew_llm
    )
    
    # Define the web search task
    search_task = Task(
        description="""
        Search for the most relevant and recent information about: '{topic}'.
        Find authoritative sources, recent articles, or academic papers.
        Provide the URL and a brief summary of why this source is valuable.
        """,
        expected_output="""
        The URL of the most relevant web source and a brief explanation of its relevance to '{topic}'.
        Include key points that make this source valuable.
        """,
        tools=[search_tool],
        agent=web_search_agent,
    )
    
    # Define the web scraping task
    scraping_task = Task(
        description="""
        Extract and analyze the content from the web source found in the previous task.
        Focus on information relevant to: '{topic}'.
        Summarize the key findings in a clear and structured way.
        """,
        expected_output="""
        A comprehensive summary of the web content related to '{topic}', including:
        - Key facts and insights
        - Important details and explanations
        - Relevant examples or case studies
        Ensure the summary is accurate and well-organized.
        """,
        tools=[scrape_tool],
        agent=web_scraper_agent,
    )
    
    # Create the crew
    crew = Crew(
        agents=[web_search_agent, web_scraper_agent],
        tasks=[search_task, scraping_task],
        verbose=False,
        memory=False,
    )
    
    return crew

def get_web_content(query: str) -> str:
    """Get content from web scraping"""
    try:
        print(f"   🔍 Searching web for: '{query}'")
        crew = setup_web_scraping_agent()
        result = crew.kickoff(inputs={"topic": query})
        return result.raw
    except Exception as e:
        print(f"   ❌ Web search error: {e}")
        return f"Web search unavailable. Using fallback information about: {query}"

print("✅ Web agents configured successfully!")

# Test the web search (commented out to avoid API calls during demo)
# Uncomment the following lines to test web search functionality
# test_web_query = "latest AI developments 2024"
# web_result = get_web_content(test_web_query)
# print(f"🧪 Web search test completed for: '{test_web_query}'")
# print(f"Result length: {len(web_result)} characters")

## ⚖️ Section 7: Create Document Relevance Grader

The grader evaluates whether retrieved documents are relevant to the user's question and decides the next step in our pipeline.

In [None]:
# Document Relevance Grader
print("⚖️ Setting up document relevance grader...")

class GradeDocuments(BaseModel):
    """Grade documents for relevance to user question"""
    binary_score: str = Field(
        description="Relevance score: 'yes' if relevant, 'no' if not relevant"
    )
    reasoning: str = Field(
        description="Brief explanation for the grading decision"
    )

def grade_documents(question: str, context: str) -> dict:
    """
    Determine whether retrieved documents are relevant to the question
    Returns: Dictionary with 'relevant' (bool), 'reasoning' (str), and 'next_action' (str)
    """
    
    grade_prompt = f"""
    You are an expert document grader. Assess whether the retrieved document 
    is relevant to the user question.
    
    Retrieved Document: {context}
    
    User Question: {question}
    
    Instructions:
    - If the document contains keywords or semantic meaning related to the question, grade as 'yes'
    - If the document is completely unrelated or doesn't help answer the question, grade as 'no'
    - Put the grade ('yes' or 'no') alone on the first line
    - Provide brief reasoning for your decision on the following lines
    """
    
    try:
        # Create a simpler grading approach using regular LLM
        response = llm.invoke(grade_prompt)
        lines = response.content.strip().splitlines()
        
        # Parse only the first line: a substring check on the whole reply would
        # also match "not relevant" or "irrelevant"
        verdict = lines[0].strip().strip("'\"*.").lower() if lines else ""
        relevant = verdict.startswith("yes")
            
        reasoning = response.content[:100] + "..." if len(response.content) > 100 else response.content
        
        return {
            "relevant": relevant,
            "reasoning": reasoning,
            "next_action": "generate_answer" if relevant else "rewrite_question"
        }
        
    except Exception as e:
        print(f"Grading error: {e}")
        return {
            "relevant": True,  # Default to relevant on error
            "reasoning": "Error in grading, defaulting to relevant",
            "next_action": "generate_answer"
        }

# Test the grader
print("🧪 Testing document grader...")

test_contexts = [
    ("What is machine learning?", "Machine learning is a subset of artificial intelligence that enables computers to learn..."),
    ("What's the weather today?", "Machine learning is a subset of artificial intelligence that enables computers to learn..."),
    ("Explain vector databases", "Vector databases are specialized databases designed to store and query high-dimensional vectors...")
]

for question, context in test_contexts:
    grade_result = grade_documents(question, context)
    print(f"   Q: '{question}' → Relevant: {grade_result['relevant']} → Action: {grade_result['next_action']}")

print("✅ Document grader is working!")
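
The `GradeDocuments` schema above describes a structured verdict with `binary_score` and `reasoning` fields. One way to obtain such a verdict, assuming the model is prompted to reply in JSON with those two keys, is to parse and validate the reply explicitly. The sketch below uses only the standard library; `Grade` and `parse_grade` are hypothetical names introduced for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class Grade:
    relevant: bool
    reasoning: str


def parse_grade(raw_reply: str) -> Grade:
    """Parse a JSON grading reply; treat malformed replies as not relevant."""
    try:
        data = json.loads(raw_reply)
        score = str(data["binary_score"]).strip().lower()
        return Grade(relevant=(score == "yes"), reasoning=str(data.get("reasoning", "")))
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
        # Not JSON, not an object, or missing the expected key
        return Grade(relevant=False, reasoning="Unparseable grader reply")


print(parse_grade('{"binary_score": "yes", "reasoning": "mentions ML"}'))
# → Grade(relevant=True, reasoning='mentions ML')
```

Defaulting to "not relevant" on a parse failure is a design choice made for this sketch: it sends the query down the rewrite path instead of answering from context that was never verified.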

## ✏️ Section 8: Build Query Rewriter for Better Retrieval

When initial retrieval returns irrelevant documents, the query rewriter improves the search terms for better results.

In [None]:
# Query Rewriter
print("✏️ Setting up query rewriter...")

def rewrite_question(original_question: str) -> str:
    """
    Rewrite the original question to improve retrieval results
    """
    
    rewrite_prompt = f"""
    You are a query optimization expert. Your task is to rewrite the user's question 
    to improve information retrieval.
    
    Original Question: {original_question}
    
    Instructions:
    - Analyze the underlying intent of the question
    - Rewrite to be more specific and searchable
    - Include relevant keywords and context
    - Make it clearer and more precise
    - Keep the core meaning unchanged
    
    Provide only the rewritten question, nothing else.
    """
    
    try:
        response = llm.invoke(rewrite_prompt)
        rewritten = response.content.strip()
        return rewritten
    except Exception as e:
        print(f"Query rewriting error: {e}")
        return original_question  # Return original on error

# Test the query rewriter
print("🧪 Testing query rewriter...")

test_questions = [
    "What is AI?",
    "Tell me about ML",
    "How does it work?",
    "Explain the concept",
    "What are the types?"
]

for question in test_questions:
    rewritten = rewrite_question(question)
    print(f"   Original: '{question}'")
    print(f"   Rewritten: '{rewritten}'")
    print()

print("✅ Query rewriter is working!")

## 💬 Section 9: Generate Final Answers with Context

This function combines retrieved context with user queries to produce comprehensive, accurate responses.

In [None]:
# Final Answer Generation
print("💬 Setting up answer generation...")

def generate_final_answer(context: str, query: str, source_type: str = "unknown") -> str:
    """
    Generate final answer using retrieved context and user query
    """
    
    answer_prompt = f"""
    You are a helpful AI assistant. Use the provided context to answer the user's question accurately and comprehensively.
    
    Context Information:
    {context}
    
    User Question: {query}
    
    Instructions:
    - Base your answer primarily on the provided context
    - Be accurate and informative
    - If the context doesn't fully answer the question, acknowledge this
    - Provide a clear and well-structured response
    - Include relevant details from the context
    - Keep the answer focused and helpful
    
    Answer:
    """
    
    try:
        response = llm.invoke(answer_prompt)
        answer = response.content.strip()
        
        # Add source information
        source_note = f"\n\n📚 *Source: {source_type.title()} knowledge base*"
        return answer + source_note
        
    except Exception as e:
        print(f"Answer generation error: {e}")
        return f"I apologize, but I encountered an error while generating the answer. The query was: {query}"

# Test answer generation
print("🧪 Testing answer generation...")

test_context = """
Agentic RAG represents a significant evolution in AI systems, combining the decision-making 
capabilities of AI agents with traditional RAG approaches. Unlike standard RAG systems that 
simply retrieve and generate, Agentic RAG systems can make intelligent routing decisions, 
evaluate retrieval quality, rewrite queries for better results, and combine multiple information sources.
"""

test_query = "What makes Agentic RAG different from traditional RAG?"
test_answer = generate_final_answer(test_context, test_query, "local")

print(f"Query: {test_query}")
print(f"Answer: {test_answer}")

print("\n✅ Answer generation is working!")

## 🔗 Section 10: Assemble the Complete Agentic RAG Pipeline

Now we'll integrate all components into a cohesive pipeline with proper routing logic and error handling.

In [None]:
# Complete Agentic RAG Pipeline
print("🔗 Assembling the complete Agentic RAG pipeline...")

class AgenticRAGPipeline:
    """
    Complete Agentic RAG Pipeline that intelligently routes queries
    between local documents and web search
    """
    
    def __init__(self, vector_db, max_retries=2):
        self.vector_db = vector_db
        self.max_retries = max_retries
        
    def process_query(self, query: str, verbose: bool = True) -> dict:
        """
        Main pipeline function to process user queries
        
        Returns:
            dict: Contains answer, source_type, processing_steps, and metadata
        """
        
        start_time = time.time()
        processing_steps = []
        
        if verbose:
            print(f"\n🔍 Processing query: '{query}'")
        
        try:
            # Step 1: Get local context for the routing decision
            processing_steps.append("Getting local context for routing")
            if verbose:
                print("   📚 Getting local context for routing...")
            
            # Retrieve with the actual query so the router sees the most relevant chunks
            local_context = get_local_content(query, k=3)
            
            # Step 2: Route the query
            processing_steps.append("Making routing decision")
            if verbose:
                print("   🧭 Making routing decision...")
            
            can_answer_locally = check_local_knowledge(query, local_context)
            route = "LOCAL" if can_answer_locally else "WEB"
            
            if verbose:
                print(f"   🎯 Routing decision: {route}")
            
            # Step 3: Retrieve context based on routing decision
            if can_answer_locally:
                # Local retrieval path
                processing_steps.append("Retrieving from local documents")
                if verbose:
                    print("   📖 Retrieving from local documents...")
                
                context = get_local_content(query, k=5)
                source_type = "local"
                
                # Step 4: Grade the retrieved documents
                processing_steps.append("Grading document relevance")
                if verbose:
                    print("   ⚖️ Grading document relevance...")
                
                grade_result = grade_documents(query, context)
                
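                # Step 5: Rewrite the query and retry while documents are not relevant
                rewritten_query = query
                retries = 0
                while not grade_result["relevant"] and retries < self.max_retries:
                    retries += 1
                    processing_steps.append("Documents not relevant, rewriting query")
                    if verbose:
                        print("   ✏️ Documents not relevant, rewriting query...")
                    
                    # Rewrite, retrieve again, and re-grade (at most max_retries times)
                    rewritten_query = rewrite_question(rewritten_query)
                    if verbose:
                        print(f"   🔄 Rewritten query: '{rewritten_query}'")
                    
                    context = get_local_content(rewritten_query, k=5)
                    grade_result = grade_documents(query, context)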
                    
            else:
                # Web search path
                processing_steps.append("Searching web for external information")
                if verbose:
                    print("   🌐 Searching web for external information...")
                
                context = get_web_content(query)
                source_type = "web"
            
            # Step 6: Generate final answer
            processing_steps.append("Generating final answer")
            if verbose:
                print("   💬 Generating final answer...")
            
            answer = generate_final_answer(context, query, source_type)
            
            # Calculate processing time
            processing_time = time.time() - start_time
            
            # Return comprehensive result
            result = {
                "query": query,
                "answer": answer,
                "source_type": source_type,
                "processing_steps": processing_steps,
                "processing_time": f"{processing_time:.2f}s",
                "context_length": len(context),
                "route_decision": route,
                "success": True
            }
            
            if verbose:
                print(f"   ✅ Processing completed in {processing_time:.2f}s")
            
            return result
            
        except Exception as e:
            error_result = {
                "query": query,
                "answer": f"I apologize, but I encountered an error while processing your query: {str(e)}",
                "source_type": "error",
                "processing_steps": processing_steps + [f"Error: {str(e)}"],
                "processing_time": f"{time.time() - start_time:.2f}s",
                "context_length": 0,        # Keys below are expected by display_result
                "route_decision": "ERROR",
                "success": False
            }
            
            if verbose:
                print(f"   ❌ Error: {e}")
            
            return error_result

# Initialize the pipeline
agentic_rag = AgenticRAGPipeline(vector_db)

print("✅ Agentic RAG Pipeline assembled successfully!")
print("🚀 Ready to process queries intelligently!")

## 🎮 Section 11: Demo - Interactive Query Processing

Let's test our Agentic RAG system with various types of queries to see how it intelligently routes and processes them.

In [None]:
# Interactive Demo - Test Various Query Types
print("🎮 Starting Interactive Agentic RAG Demo!\n")

# Define test queries that should demonstrate different routing behaviors
demo_queries = [
    {
        "query": "What is Agentic RAG and how does it work?",
        "expected_route": "LOCAL",
        "description": "Query about content in our local documents"
    },
    {
        "query": "Explain the different types of machine learning",
        "expected_route": "LOCAL", 
        "description": "Another local knowledge query"
    },
    {
        "query": "What are vector databases used for?",
        "expected_route": "LOCAL",
        "description": "Technical query covered in our documents"
    },
    {
        "query": "What are the latest AI news and developments in 2024?",
        "expected_route": "WEB",
        "description": "Current events requiring web search"
    },
    {
        "query": "What's the weather forecast for tomorrow?",
        "expected_route": "WEB", 
        "description": "Real-time information not in local docs"
    }
]

# Helper function to display results nicely
def display_result(result: dict, query_info: dict):
    """Display query results in a formatted way"""
    print("="*80)
    print(f"🔍 QUERY: {result['query']}")
    print(f"📝 Description: {query_info['description']}")
    print(f"🎯 Expected Route: {query_info['expected_route']} | Actual Route: {result['route_decision']}")
    print(f"📊 Source: {result['source_type'].upper()} | Time: {result['processing_time']}")
    print(f"📏 Context Length: {result['context_length']} characters")
    print("\n📋 Processing Steps:")
    for i, step in enumerate(result['processing_steps'], 1):
        print(f"   {i}. {step}")
    print(f"\n💬 ANSWER:")
    print(result['answer'])
    print("="*80)
    print()

# Run the demo
print("🚀 Running Agentic RAG Demo with different query types...\n")

for i, query_info in enumerate(demo_queries, 1):
    print(f"📌 Demo Query #{i}")
    
    # Process the query
    result = agentic_rag.process_query(query_info["query"], verbose=False)
    
    # Display results
    display_result(result, query_info)
    
    # Add a small delay between queries for readability
    time.sleep(1)

print("🎉 Demo completed! The Agentic RAG system successfully:")
print("   ✅ Routed queries to appropriate sources")  
print("   ✅ Retrieved relevant information")
print("   ✅ Generated comprehensive answers")
print("   ✅ Provided detailed processing insights")

## 🔄 Section 12: Compare Responses - Local vs Web-Enhanced

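Let's compare a local answer with a web-style answer for the same query. To avoid API calls, the web context in this cell is simulated placeholder text, so the comparison illustrates the flow rather than real answer quality.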
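
The comparison cell reports context and answer lengths only. To compare latency as well, each call could be wrapped in a small timing helper. This is a sketch: `timed` is a hypothetical helper, and the lambda stands in for a real call such as `generate_final_answer`.

```python
import time
from typing import Callable, Tuple


def timed(fn: Callable[[], str]) -> Tuple[str, float]:
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Stand-in for a real call, e.g. lambda: generate_final_answer(context, query, "local")
answer, seconds = timed(lambda: "stub answer")
print(f"{len(answer)} characters in {seconds:.4f}s")
```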

In [None]:
# Comparison Analysis: Local vs Web Responses
print("🔄 Comparison Analysis: Local vs Web Enhanced Responses\n")

def compare_responses(query: str):
    """
    Force both local and web responses for the same query to compare approaches
    """
    print(f"🔍 Analyzing Query: '{query}'\n")
    
    # Force local response
    print("📚 LOCAL RESPONSE:")
    print("-" * 50)
    local_context = get_local_content(query, k=5)
    local_answer = generate_final_answer(local_context, query, "local")
    print(f"Context Length: {len(local_context)} characters")
    print(f"Answer: {local_answer}\n")
    
    # Force web response (simulate)
    print("🌐 WEB-ENHANCED RESPONSE:")
    print("-" * 50)
    # For demo purposes, we'll simulate web content since we don't want to make actual API calls
    web_context = f"""
    [Simulated Web Content for: {query}]
    This would contain up-to-date information from web search about {query.lower()}, 
    including recent developments, current statistics, and latest research findings 
    that might not be available in the local document database.
    """
    web_answer = generate_final_answer(web_context, query, "web")
    print(f"Context Length: {len(web_context)} characters")
    print(f"Answer: {web_answer}\n")
    
    print("📊 COMPARISON SUMMARY:")
    print("-" * 50)
    print(f"• Local response length: {len(local_answer)} characters")
    print(f"• Web response length: {len(web_answer)} characters") 
    print(f"• Local context length: {len(local_context)} characters")
    print(f"• Web context length: {len(web_context)} characters")
    print("\n" + "="*80 + "\n")

# Test queries for comparison
comparison_queries = [
    "What is Agentic RAG?",
    "Explain machine learning fundamentals", 
    "How do vector databases work?"
]

print("Running side-by-side comparisons...\n")

for query in comparison_queries:
    compare_responses(query)

print("🎯 KEY TAKEAWAYS (expected trade-offs; the web response above is simulated):")
print("""
1. 📚 LOCAL RESPONSES:
   • Typically faster (no web API calls)
   • Consistent with our curated knowledge base
   • Limited to information available in local documents
   • Ideal for domain-specific knowledge

2. 🌐 WEB-ENHANCED RESPONSES:
   • Access to current and broader information
   • Typically slower due to search and scraping
   • Can surface more recent developments
   • Better suited to current events and topics outside the local documents

3. 🧠 AGENTIC ROUTING BENEFITS:
   • Selects a source automatically for each query
   • Balances speed against completeness
   • Avoids web searches when local documents suffice
   • Provides a consistent user experience
""")

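The comparison summary above reports only character counts, and its web context is simulated, so the speed difference between the two paths is not measured there. A minimal sketch of how latency could be measured for each path follows. The helper `timed` and the stand-ins `fake_local_path` and `fake_web_path` are hypothetical names introduced for illustration; they replace the notebook's real retrieval and generation calls so the example runs on its own.

```python
import time
from typing import Callable, Tuple


def timed(fn: Callable[[str], str], query: str) -> Tuple[str, float]:
    """Run fn(query) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(query)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Hypothetical stand-ins: swap these for the real local and web paths.
def fake_local_path(query: str) -> str:
    return f"local answer for: {query}"


def fake_web_path(query: str) -> str:
    time.sleep(0.05)  # Simulate network latency from search and scraping
    return f"web answer for: {query}"


local_answer, local_seconds = timed(fake_local_path, "What is Agentic RAG?")
web_answer, web_seconds = timed(fake_web_path, "What is Agentic RAG?")

print(f"Local path: {local_seconds:.3f}s")
print(f"Web path:   {web_seconds:.3f}s")
```

To time the real local path, one option is to pass a lambda that chains the calls used in `compare_responses`, for example `timed(lambda q: generate_final_answer(get_local_content(q, k=5), q, "local"), query)`.
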
## 🎯 Conclusion and Next Steps

### What We've Built

Congratulations! You've successfully built a complete **Agentic RAG system** that:

✅ **Intelligently Routes Queries** - Decides between local and web sources  
✅ **Processes Local Documents** - Chunks, embeds, and searches vector database  
✅ **Searches and Scrapes the Web** - Uses AI agents for external information gathering  
✅ **Grades Document Relevance** - Evaluates retrieval quality  
✅ **Rewrites Queries** - Optimizes search terms for better results  
✅ **Generates Contextual Answers** - Synthesizes the final answer from the retrieved context  

### Key Components Recap

| Component | Purpose | Technology |
|-----------|---------|------------|
| 🧭 **Router** | Decides information source | Groq LLM + Prompt Engineering |
| 📚 **Vector DB** | Local document search | FAISS + HuggingFace Embeddings |
| 🌐 **Web Agents** | External information gathering | CrewAI + Serper + Web Scraping |
| ⚖️ **Grader** | Relevance evaluation | LLM-based scoring |
| ✏️ **Rewriter** | Query optimization | LLM-based rewriting |
| 💬 **Generator** | Final answer synthesis | Context-aware generation |

### Architecture Benefits

🚀 **Performance**: Fast local retrieval when possible  
🎯 **Accuracy**: Web search for up-to-date information  
🧠 **Intelligence**: Automatic routing and optimization  
🔄 **Adaptability**: Query rewriting to improve search results  
📈 **Scalability**: Easy to add new data sources  

### Next Steps & Extensions

1. **📊 Add Analytics**: Track routing decisions and performance metrics
2. **🔧 Optimize Embeddings**: Fine-tune for your specific domain
3. **🌍 Multi-Language Support**: Extend to other languages
4. **📱 Build UI**: Create a web interface for user interaction
5. **🔗 Database Integration**: Connect to live databases
6. **🧪 A/B Testing**: Compare different routing strategies
7. **📈 Monitoring**: Add logging and performance tracking
8. **🔒 Security**: Implement authentication and rate limiting
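
As a possible starting point for the analytics and monitoring items, here is a minimal sketch of a routing tracker. `RoutingTracker` and its methods are hypothetical names, not part of the notebook. The sketch assumes each processed query yields a route label and a processing time as a float number of seconds, which may differ from the format the notebook's `process_query` returns.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RoutingTracker:
    """Hypothetical helper that records which route each query took."""
    route_counts: Counter = field(default_factory=Counter)
    durations: Dict[str, List[float]] = field(default_factory=dict)

    def record(self, route: str, seconds: float) -> None:
        """Store one routing decision and how long the query took."""
        self.route_counts[route] += 1
        self.durations.setdefault(route, []).append(seconds)

    def summary(self) -> Dict[str, Dict[str, float]]:
        """Return the query count and mean processing time per route."""
        return {
            route: {
                "count": count,
                "mean_seconds": sum(self.durations[route]) / count,
            }
            for route, count in self.route_counts.items()
        }


# Illustrative values only; these are not measurements from the notebook.
tracker = RoutingTracker()
tracker.record("local", 0.4)
tracker.record("local", 0.6)
tracker.record("web", 3.0)

for route, stats in tracker.summary().items():
    print(f"{route}: {stats['count']} queries, mean {stats['mean_seconds']:.2f}s")
```

Calling `tracker.record(...)` after each processed query would build up a picture of how often the router picks each source and what each choice costs in time.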

### Try It Yourself!

Use the cell below to test with your own queries, or modify the code to:
- Add your own PDF documents
- Experiment with different LLM models
- Adjust routing logic and prompts
- Integrate additional data sources
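
One way to experiment with routing logic is to put a cheap rule in front of the LLM router. The sketch below is an assumption for illustration only: `route_with_rules`, the keyword set, and the `llm_router` callable are hypothetical and do not reflect how the notebook's router is implemented.

```python
import re
from typing import Callable

# Hypothetical keywords suggesting the query needs fresh information.
TIME_SENSITIVE_KEYWORDS = frozenset({"latest", "today", "current", "news", "recent"})


def route_with_rules(query: str, llm_router: Callable[[str], str]) -> str:
    """Return "web" for time-sensitive queries, otherwise defer to llm_router."""
    # Match whole words so that "concurrent" does not trigger on "current".
    words = set(re.findall(r"[a-z]+", query.lower()))
    if words & TIME_SENSITIVE_KEYWORDS:
        return "web"
    return llm_router(query)


# Stand-in for an LLM-based router; it always answers "local" in this demo.
def always_local(query: str) -> str:
    return "local"


print(route_with_rules("What is the latest news on vector databases?", always_local))
print(route_with_rules("What is Agentic RAG?", always_local))
```

A rule like this skips one LLM call for queries that obviously need the web, at the cost of maintaining a keyword list by hand.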

**Happy building with Agentic RAG!** 🚀

In [None]:
# 🎮 Try Your Own Query!
# Remove the surrounding triple quotes and modify the code below to test with your own questions

"""
# Example: Test with your own query
my_query = "Your question here"

# Process the query
result = agentic_rag.process_query(my_query, verbose=True)

# Display the result
print("\n" + "="*80)
print(f"🔍 YOUR QUERY: {result['query']}")
print(f"🎯 Route Decision: {result['route_decision']}")
print(f"📊 Source: {result['source_type'].upper()}")
print(f"⏱️ Processing Time: {result['processing_time']}")
print(f"\n💬 ANSWER:")
print(result['answer'])
print("="*80)
"""

print("🎉 Agentic RAG Demo Complete!")
print("📝 Edit this cell to test your own queries")
print("🚀 The system is ready for your experiments!")

# Optional: a simple interactive loop. Defining the function does not start it;
# uncomment the call at the bottom of this cell to run it.
def interactive_demo():
    """Prompt for queries in a loop until the user types 'quit', 'exit', or 'q'."""
    print("🎮 Interactive Agentic RAG Demo")
    print("Type 'quit' to exit\n")

    while True:
        query = input("💬 Ask me anything: ").strip()
        if query.lower() in ['quit', 'exit', 'q']:
            print("👋 Goodbye!")
            break
        if not query:
            continue  # Ignore empty input instead of sending it to the pipeline

        result = agentic_rag.process_query(query, verbose=False)
        print(f"\n🤖 Answer ({result['source_type']} source):")
        print(result['answer'])
        print(f"\n⏱️ Processed in {result['processing_time']}")
        print("-" * 50 + "\n")

# Uncomment the line below to start interactive mode
# interactive_demo()