# RAG with Milvus, LangChain & Anthropic Claude

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/behroozazarkhalili/Milvus-RAG/blob/master/RAG_Milvus_LangChain_Anthropic.ipynb)

**Note**: If the repository is private, you'll need to make it public for the Colab badge to work, or manually upload the notebook to Colab.

**Architecture**: Document Processing → Embedding Generation → Vector Storage → Retrieval → Generation

- **Milvus**: Vector database for similarity search
- **LangChain**: LLM application framework  
- **Anthropic Claude**: Response generation
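
The five stages can be traced end to end with a toy, dependency-free sketch. Everything in it is a stand-in chosen for illustration, not the notebook's implementation: `embed` counts words instead of calling a sentence-transformer, `store` is a plain list instead of Milvus, and `generate` echoes the retrieved context instead of calling Claude.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(store, question: str, top_k: int = 1):
    """Rank stored chunks by similarity to the question and keep the best top_k."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


def generate(question: str, context: list) -> str:
    """Stand-in for the LLM call: show the context the model would receive."""
    return f"Q: {question} | context: {' / '.join(context)}"


# Document processing, embedding generation, vector storage
chunks = ["milvus stores vectors", "claude writes answers"]
store = [(chunk, embed(chunk)) for chunk in chunks]

# Retrieval, then generation
top = retrieve(store, "where are vectors stored")
answer = generate("where are vectors stored", top)
```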

## Setup & Dependencies

In [None]:
# Install Dependencies
!pip install -q pymilvus langchain langchain-community langchain-text-splitters anthropic sentence-transformers python-dotenv pypdf

## Prerequisites & Environment Setup

### 🔧 **System Requirements**
- **Python**: 3.8+ with pip package manager
- **Docker**: For Milvus vector database server
- **Memory**: 4GB+ RAM recommended for embedding processing
- **Storage**: 2GB+ available space for vector indices

### 🚀 **Quick Start Setup**

#### 1. **Start Milvus Vector Database**

**Step-by-Step Milvus Installation:**

```bash
# Download the official Milvus standalone script
curl -sfL https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh -o milvus.sh

# Make the script executable  
chmod +x milvus.sh

# Start Milvus server with all required services
bash milvus.sh start
```

**What Each Command Does:**

**🔽 Download Script:**
- `curl -sfL`: Download silently with redirect following and failure on HTTP errors
- Downloads the official Milvus standalone deployment script
- Saves as `milvus.sh` in your current directory

**🔧 Make Executable:**
- `chmod +x`: Grants execution permissions to the script
- Required for bash script execution

**🚀 Start Services:**
- `bash milvus.sh start`: Launches Milvus standalone in a single Docker container
  - **Milvus Server**: Vector database engine (port 19530)
  - **Embedded etcd**: Metadata storage, running inside the same container
  - **Local storage**: Vector data persisted to a `volumes/milvus` directory next to the script

**✅ Verification:**
```bash
# Check that the Milvus container is running
docker ps | grep milvus

# Expected output: 1 running container
# - milvus-standalone
```

*Milvus will be accessible at `localhost:19530` for your RAG application*
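
Before moving on, it can help to confirm that something is actually listening on that port. The following is a minimal sketch using only the standard library; `port_is_open` is a hypothetical helper name, and a successful TCP connect only shows that the port is reachable, not that Milvus is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts
        return False
```

Calling `port_is_open("localhost", 19530)` right after `bash milvus.sh start` may return `False` for a few seconds while the container is still starting.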

#### 2. **Configure API Access**
Create a `.env` file in the project root:
```bash
# Required: Anthropic Claude API
ANTHROPIC_API_KEY=your-anthropic-api-key-here

# Optional: Milvus connection settings
# (RAGConfig below hard-codes the same defaults and does not read these)
MILVUS_HOST=localhost
MILVUS_PORT=19530
```
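
Once `load_dotenv()` has run, these values are ordinary environment variables. A minimal sketch of reading them with fallbacks follows; `read_settings` is a hypothetical helper that is not part of the notebook, and its defaults simply mirror the optional values listed above.

```python
import os
from typing import Dict, Mapping, Optional


def read_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect settings from an environment mapping, failing fast if the API key is missing."""
    env = os.environ if env is None else env
    api_key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not api_key:
        raise ValueError("ANTHROPIC_API_KEY is not set")
    return {
        "api_key": api_key,
        "milvus_host": env.get("MILVUS_HOST", "localhost"),
        "milvus_port": env.get("MILVUS_PORT", "19530"),
    }
```

Passing a plain dictionary instead of `os.environ` makes the function easy to try without touching the real environment.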

#### 3. **Prepare Documents**
```bash
# Create directories for your documents
mkdir -p pdf_files sample_docs

# Put PDF files in pdf_files/ and plain-text samples in sample_docs/
# File types the loader handles: .pdf, .txt, .md
```

### 🎯 **Ready to Run?**
✅ Milvus server running  
✅ Environment variables set  
✅ Documents in place  
✅ Dependencies installed  

**→ Proceed to next cell to import libraries and start building your RAG system!**

In [None]:
# Import Libraries & Setup Logging
import os
import json
from typing import List, Dict, Optional, Tuple, Any
from dataclasses import dataclass
from pathlib import Path
import logging
from dotenv import load_dotenv

# LangChain imports (current releases split these across separate packages)
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.documents import Document
from langchain_community.document_loaders import TextLoader, PyPDFLoader, DirectoryLoader

# Milvus imports - using modern MilvusClient only
from pymilvus import MilvusClient

# Embedding and LLM imports
from sentence_transformers import SentenceTransformer
import anthropic
import numpy as np

# Setup logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Load environment variables
load_dotenv()

## System Configuration

In [None]:
# RAG Configuration
@dataclass
class RAGConfig:
    """Configuration class for RAG application."""
    
    # Milvus configuration
    milvus_host: str = "localhost"
    milvus_port: str = "19530"
    collection_name: str = "rag_documents"
    
    # Embedding configuration
    embedding_model: str = "all-MiniLM-L6-v2"
    embedding_dim: int = 384
    
    # Text processing configuration
    chunk_size: int = 1000
    chunk_overlap: int = 200
    
    # Retrieval configuration
    top_k: int = 5
    
    # Anthropic configuration
    # Leave as None so __post_init__ reads ANTHROPIC_API_KEY from the environment
    ANTHROPIC_API_KEY: Optional[str] = None
    model_name: str = "claude-sonnet-4-20250514"
    max_tokens: int = 1000
    
    def __post_init__(self):
        """Post-initialization to set API key from environment if not provided."""
        if self.ANTHROPIC_API_KEY is None:
            self.ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY")

        if not self.ANTHROPIC_API_KEY:
            raise ValueError("ANTHROPIC_API_KEY must be set in environment or config")

# Initialize configuration
config = RAGConfig()

## Core Components

In [None]:
# Document Processor
class DocumentProcessor:
    """Handles document loading and text chunking."""
    
    def __init__(self, chunk_size: int = 1000, chunk_overlap: int = 200):
        """
        Initialize document processor.
        
        Args:
            chunk_size: Maximum size of text chunks
            chunk_overlap: Overlap between consecutive chunks
        """
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=chunk_size,
            chunk_overlap=chunk_overlap,
            length_function=len,
            separators=["\n\n", "\n", " ", ""]
        )
        
    def load_documents_from_directory(self, directory_path: str, file_types: Optional[List[str]] = None) -> List[Document]:
        """
        Load documents from a directory.
        
        Args:
            directory_path: Path to directory containing documents
            file_types: List of file extensions to load (e.g., ['.txt', '.pdf'])
            
        Returns:
            List of loaded documents
        """
        if file_types is None:
            file_types = [".txt", ".pdf", ".md"]
            
        documents = []
        
        for file_type in file_types:
            if file_type == ".pdf":
                loader = DirectoryLoader(
                    directory_path,
                    glob=f"**/*{file_type}",
                    loader_cls=PyPDFLoader
                )
            else:
                loader = DirectoryLoader(
                    directory_path,
                    glob=f"**/*{file_type}",
                    loader_cls=TextLoader
                )
            
            docs = loader.load()
            documents.extend(docs)
            
        logger.info(f"Loaded {len(documents)} documents from {directory_path}")
        return documents
    
    def load_single_document(self, file_path: str) -> List[Document]:
        """
        Load a single document.
        
        Args:
            file_path: Path to the document file
            
        Returns:
            List containing the loaded document
        """
        file_extension = Path(file_path).suffix.lower()
        
        if file_extension == ".pdf":
            loader = PyPDFLoader(file_path)
        else:
            loader = TextLoader(file_path)
            
        documents = loader.load()
        logger.info(f"Loaded document from {file_path}")
        return documents
    
    def chunk_documents(self, documents: List[Document]) -> List[Document]:
        """
        Split documents into smaller chunks.
        
        Args:
            documents: List of documents to chunk
            
        Returns:
            List of document chunks
        """
        chunks = self.text_splitter.split_documents(documents)
        logger.info(f"Split {len(documents)} documents into {len(chunks)} chunks")
        return chunks
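
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-width sliding window in plain Python. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that one prefers to break on the configured separators); `sliding_chunks` is a hypothetical helper for illustration only.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


pieces = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
```

With the notebook's defaults of 1000 and 200, each new window in this simplified scheme would start 800 characters after the previous one.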

### Embedding Generator

In [None]:
# Embedding Generator
class EmbeddingGenerator:
    """Handles text embedding generation using sentence transformers."""
    
    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        """
        Initialize embedding generator.
        
        Args:
            model_name: Name of the sentence transformer model
        """
        self.model = SentenceTransformer(model_name)
        self.embedding_dim = self.model.get_sentence_embedding_dimension()
        logger.info(f"Initialized embedding model: {model_name} (dim: {self.embedding_dim})")
    
    def embed_text(self, text: str) -> np.ndarray:
        """
        Generate embedding for a single text.
        
        Args:
            text: Input text to embed
            
        Returns:
            Embedding vector as numpy array
        """
        embedding = self.model.encode(text, convert_to_numpy=True)
        return embedding
    
    def embed_documents(self, documents: List[Document]) -> Tuple[List[str], List[np.ndarray], List[Dict[str, Any]]]:
        """
        Generate embeddings for a list of documents.
        
        Args:
            documents: List of documents to embed
            
        Returns:
            Tuple of (texts, embeddings, metadata)
        """
        texts = [doc.page_content for doc in documents]
        metadata = [doc.metadata for doc in documents]
        
        logger.info(f"Generating embeddings for {len(texts)} documents...")
        embeddings = self.model.encode(texts, convert_to_numpy=True, show_progress_bar=True)
        
        return texts, embeddings, metadata
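
These vectors are later compared with cosine similarity, since the collection is created with the `COSINE` metric. A minimal pure-Python sketch of that computation follows, with `cosine_similarity` as a hypothetical helper; in practice NumPy or Milvus itself does this work.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

The result depends only on direction, not length, so `[1.0, 0.0]` and `[2.0, 0.0]` are treated as identical.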

### Vector Store

In [None]:
class MilvusVectorStore:
    """Handles Milvus vector database operations using modern MilvusClient."""
    
    def __init__(self, host: str = "localhost", port: str = "19530", collection_name: str = "rag_documents"):
        """
        Initialize Milvus vector store.
        
        Args:
            host: Milvus server host
            port: Milvus server port
            collection_name: Name of the collection to use
        """
        self.host = host
        self.port = port
        self.collection_name = collection_name
        
        # Connect to Milvus using the modern MilvusClient
        self._connect()
    
    def _connect(self) -> None:
        """
        Establish connection to Milvus server using MilvusClient.
        """
        try:
            # Use the modern MilvusClient with uri endpoint
            uri = f"http://{self.host}:{self.port}"
            self.client = MilvusClient(uri=uri)
            logger.info(f"Connected to Milvus at {uri}")
        except Exception as e:
            logger.error(f"Failed to connect to Milvus: {e}")
            raise
    
    def create_collection(self, embedding_dim: int) -> None:
        """
        Create a new collection using MilvusClient's simplified approach.
        
        Args:
            embedding_dim: Dimension of the embedding vectors
        """
        try:
            # Drop existing collection if it exists
            if self.client.has_collection(self.collection_name):
                self.client.drop_collection(self.collection_name)
                logger.info(f"Dropped existing collection: {self.collection_name}")
            
            # Quick-setup mode: MilvusClient creates an "id" primary key and a "vector"
            # field, enables dynamic fields (used here for "text" and "metadata"),
            # and builds its own default index with the given metric.
            self.client.create_collection(
                collection_name=self.collection_name,
                dimension=embedding_dim,
                metric_type="COSINE"
            )
            
            logger.info(f"Created collection: {self.collection_name} with dimension {embedding_dim}")
            
        except Exception as e:
            logger.error(f"Error creating collection: {e}")
            raise
    
    def load_collection(self) -> None:
        """
        Load existing collection.
        """
        if self.client.has_collection(self.collection_name):
            logger.info(f"Collection {self.collection_name} exists and is ready")
        else:
            raise ValueError(f"Collection {self.collection_name} does not exist")
    
    def add_documents(self, texts: List[str], embeddings: List[np.ndarray], metadata: List[Dict[str, Any]]) -> Dict[str, Any]:
        """
        Add documents to the collection using MilvusClient.
        
        Args:
            texts: List of document texts
            embeddings: List of embedding vectors
            metadata: List of metadata dictionaries
            
        Returns:
            Insert result reported by MilvusClient
        """
        # Prepare data for MilvusClient insertion (include id field)
        data = []
        for i in range(len(texts)):
            data.append({
                "id": i,  # Add required id field
                "text": texts[i],
                "vector": embeddings[i].tolist(),
                "metadata": json.dumps(metadata[i])
            })
        
        # Insert data using MilvusClient
        result = self.client.insert(
            collection_name=self.collection_name,
            data=data
        )
        
        logger.info(f"Added {len(texts)} documents to collection")
        return result
    
    def search(self, query_embedding: np.ndarray, top_k: int = 5) -> List[Dict[str, Any]]:
        """
        Search for similar documents using MilvusClient.
        
        Args:
            query_embedding: Query embedding vector
            top_k: Number of top results to return
            
        Returns:
            List of search results with text, metadata, and similarity scores
        """
        # Ensure collection exists
        if not self.client.has_collection(self.collection_name):
            raise ValueError(f"Collection {self.collection_name} does not exist")
        
        # Use MilvusClient search method with proper vector field specification
        results = self.client.search(
            collection_name=self.collection_name,
            data=[query_embedding.tolist()],
            anns_field="vector",  # Specify the vector field name
            search_params={"metric_type": "COSINE", "params": {"nprobe": 10}},
            limit=top_k,
            output_fields=["text", "metadata"]
        )
        
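        # Format results. Each hit is a dict with "id", "distance", and an
        # "entity" dict that holds the requested output fields.
        formatted_results = []
        for hit in results[0]:
            entity = hit["entity"]
            formatted_results.append({
                "text": entity["text"],
                "metadata": json.loads(entity["metadata"]),
                "score": hit["distance"],  # With COSINE, Milvus reports the similarity itself (higher is closer)
                "id": hit["id"]
            })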
        
        return formatted_results
    
    def close(self) -> None:
        """
        Close the MilvusClient connection.
        """
        if hasattr(self, 'client'):
            self.client.close()
            logger.info("Closed MilvusClient connection")
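
`add_documents` sends every row in a single `insert` call. For larger corpora it is common to insert in batches instead; the sketch below shows only the batching logic. `batched` is a hypothetical helper, and the batch size of 500 in the usage note is an arbitrary assumption, not a Milvus recommendation.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(rows: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of rows with at most batch_size items each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield list(rows[start:start + batch_size])
```

Inside `add_documents`, the single call could then become a loop such as `for batch in batched(data, 500): self.client.insert(collection_name=self.collection_name, data=batch)`.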

### Language Model

In [None]:
class ClaudeGenerator:
    """Handles text generation using Anthropic Claude."""
    
    def __init__(self, api_key: str, model_name: str = "claude-sonnet-4-20250514", max_tokens: int = 1000):
        """
        Initialize Claude generator.
        
        Args:
            api_key: Anthropic API key
            model_name: Claude model to use
            max_tokens: Maximum tokens to generate
        """
        self.client = anthropic.Anthropic(api_key=api_key)
        self.model_name = model_name
        self.max_tokens = max_tokens
        logger.info(f"Initialized Claude generator with model: {model_name}")
    
    def generate_response(self, query: str, context_documents: List[Dict[str, Any]], 
                         system_prompt: Optional[str] = None) -> str:
        """
        Generate response based on query and retrieved context.
        
        Args:
            query: User query
            context_documents: Retrieved documents from vector search
            system_prompt: Optional system prompt to guide the model
            
        Returns:
            Generated response text
        """
        # Prepare context from retrieved documents
        context = "\n\n".join([
            f"Document {i+1}:\n{doc['text']}"
            for i, doc in enumerate(context_documents)
        ])
        
        # Default system prompt
        if system_prompt is None:
            system_prompt = (
                "You are a helpful AI assistant that answers questions based on the provided context. "
                "Use only the information from the context to answer questions. "
                "If the context doesn't contain enough information to answer the question, "
                "say so explicitly and suggest what additional information might be needed."
            )
        
        # Construct the prompt
        user_prompt = f"""
Context:
{context}

Question: {query}

Please provide a comprehensive answer based on the context above.
"""
        
        try:
            # Generate response using Claude
            response = self.client.messages.create(
                model=self.model_name,
                max_tokens=self.max_tokens,
                system=system_prompt,
                messages=[
                    {"role": "user", "content": user_prompt}
                ]
            )
            
            return response.content[0].text
            
        except Exception as e:
            logger.error(f"Error generating response: {e}")
            raise
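
The context string grows with `top_k` and `chunk_size`. One way to keep the prompt bounded is to stop adding documents once a character budget is reached. This is a sketch under assumptions: `build_context` and its `max_chars` budget are hypothetical, and character count is only a rough proxy for tokens.

```python
from typing import Any, Dict, List


def build_context(docs: List[Dict[str, Any]], max_chars: int = 8000) -> str:
    """Join retrieved documents in rank order, stopping before max_chars would be exceeded."""
    parts: List[str] = []
    used = 0
    for i, doc in enumerate(docs, start=1):
        block = f"Document {i}:\n{doc['text']}"
        cost = len(block) + (2 if parts else 0)  # 2 accounts for the "\n\n" separator
        if used + cost > max_chars:
            break
        parts.append(block)
        used += cost
    return "\n\n".join(parts)
```

Because search results arrive best match first, cutting from the end drops the weakest matches.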

## RAG Pipeline

In [None]:
class RAGPipeline:
    """Main RAG pipeline that orchestrates all components."""
    
    def __init__(self, config: RAGConfig):
        """
        Initialize RAG pipeline with configuration.
        
        Args:
            config: RAG configuration object
        """
        self.config = config
        
        # Initialize components
        self.doc_processor = DocumentProcessor(
            chunk_size=config.chunk_size,
            chunk_overlap=config.chunk_overlap
        )
        
        self.embedding_generator = EmbeddingGenerator(config.embedding_model)
        
        self.vector_store = MilvusVectorStore(
            host=config.milvus_host,
            port=config.milvus_port,
            collection_name=config.collection_name
        )
        
        self.llm = ClaudeGenerator(
            api_key=config.ANTHROPIC_API_KEY,
            model_name=config.model_name,
            max_tokens=config.max_tokens
        )
        
        logger.info("RAG pipeline initialized successfully")
    
    def index_documents(self, document_source: str, is_directory: bool = True) -> None:
        """
        Index documents into the vector store.
        
        Args:
            document_source: Path to documents (file or directory)
            is_directory: Whether the source is a directory or single file
        """
        logger.info(f"Starting document indexing from: {document_source}")
        
        # Load documents
        if is_directory:
            documents = self.doc_processor.load_documents_from_directory(document_source)
        else:
            documents = self.doc_processor.load_single_document(document_source)
        
        if not documents:
            logger.warning("No documents found to index")
            return
        
        # Chunk documents
        chunks = self.doc_processor.chunk_documents(documents)
        
        # Generate embeddings
        texts, embeddings, metadata = self.embedding_generator.embed_documents(chunks)
        
        # Create collection with correct embedding dimension
        self.vector_store.create_collection(self.embedding_generator.embedding_dim)
        
        # Add documents to vector store
        self.vector_store.add_documents(texts, embeddings, metadata)
        
        logger.info(f"Successfully indexed {len(chunks)} document chunks")
    
    def query(self, question: str, top_k: Optional[int] = None, 
              system_prompt: Optional[str] = None) -> Dict[str, Any]:
        """
        Query the RAG system.
        
        Args:
            question: User question
            top_k: Number of documents to retrieve (uses config default if None)
            system_prompt: Optional system prompt for the LLM
            
        Returns:
            Dictionary containing answer, retrieved documents, and metadata
        """
        if top_k is None:
            top_k = self.config.top_k
        
        logger.info(f"Processing query: {question[:100]}...")
        
        try:
            # Ensure collection exists
            if not self.vector_store.client.has_collection(self.vector_store.collection_name):
                raise ValueError(f"Collection {self.vector_store.collection_name} does not exist. Please index documents first.")
            
            # Generate query embedding
            query_embedding = self.embedding_generator.embed_text(question)
            
            # Retrieve relevant documents
            retrieved_docs = self.vector_store.search(query_embedding, top_k=top_k)
            
            if not retrieved_docs:
                return {
                    "answer": "No relevant documents found for your query.",
                    "retrieved_documents": [],
                    "num_retrieved": 0
                }
            
            # Generate answer using Claude
            answer = self.llm.generate_response(
                query=question,
                context_documents=retrieved_docs,
                system_prompt=system_prompt
            )
            
            logger.info("Query processed successfully")
            
            return {
                "answer": answer,
                "retrieved_documents": retrieved_docs,
                "num_retrieved": len(retrieved_docs)
            }
            
        except Exception as e:
            logger.error(f"Error processing query: {e}")
            return {
                "answer": f"Error processing query: {str(e)}",
                "retrieved_documents": [],
                "num_retrieved": 0
            }
    
    def get_collection_stats(self) -> Dict[str, Any]:
        """
        Get statistics about the current collection using MilvusClient.
        
        Returns:
            Dictionary with collection statistics
        """
        try:
            # Check if collection exists
            if not self.vector_store.client.has_collection(self.vector_store.collection_name):
                return {
                    "error": f"Collection {self.vector_store.collection_name} does not exist. Please index documents first."
                }
            
            # Get schema information and the row count using MilvusClient
            collection_info = self.vector_store.client.describe_collection(self.vector_store.collection_name)
            row_stats = self.vector_store.client.get_collection_stats(self.vector_store.collection_name)
            
            # row_count can lag behind recent inserts until Milvus flushes them
            stats = {
                "collection_name": self.config.collection_name,
                "num_entities": row_stats.get("row_count", 0),
                "embedding_dim": self.embedding_generator.embedding_dim,
                "embedding_model": self.config.embedding_model,
                "collection_exists": True,
                "schema": collection_info if collection_info else "Schema information not available"
            }
            
            return stats
            
        except Exception as e:
            logger.error(f"Error getting collection stats: {e}")
            return {"error": str(e)}
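
`query` passes every hit to Claude, however weak the match. A possible refinement is to drop hits below a similarity threshold first. `filter_by_score` and its `0.3` default are hypothetical; a useful cut-off depends on the embedding model and the data.

```python
from typing import Any, Dict, List


def filter_by_score(docs: List[Dict[str, Any]], min_score: float = 0.3) -> List[Dict[str, Any]]:
    """Keep only retrieved documents whose 'score' is at least min_score, preserving order."""
    return [doc for doc in docs if doc.get("score", 0.0) >= min_score]
```

It would slot in between the `search` call and `generate_response`, operating on the same list of result dictionaries.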

## Demo: Load Sample Documents

In [None]:
# Load Sample Documents from Files
sample_documents_dir = "sample_docs"

# Verify sample documents exist
import glob
sample_files = glob.glob(f"{sample_documents_dir}/*.txt")

if sample_files:
    print(f"Found {len(sample_files)} sample documents:")
    for file in sample_files:
        print(f"  📄 {file}")
else:
    print(f"⚠️ No sample documents found in {sample_documents_dir}/")
    print("📝 Expected files: ai_overview.txt, vector_databases.txt, rag_systems.txt")

In [None]:
# Initialize RAG Pipeline
rag = RAGPipeline(config)

print("RAG Pipeline initialized successfully!")

In [None]:
# Index Sample Documents & Get Stats
rag.index_documents(sample_documents_dir, is_directory=True)

# Get collection statistics
stats = rag.get_collection_stats()
print("\nCollection Statistics:")
for key, value in stats.items():
    print(f"{key}: {value}")

In [None]:
# Test Sample Queries
queries = [
    "What is artificial intelligence?",
    "How does Milvus work as a vector database?",
    "Explain the RAG pipeline steps",
    "What are the differences between machine learning and deep learning?"
]

print("Testing RAG Pipeline with sample queries...\n")

for i, query in enumerate(queries, 1):
    print(f"Query {i}: {query}")
    print("-" * 50)
    
    result = rag.query(query, top_k=3)
    
    print(f"Answer: {result['answer']}")
    print(f"\nRetrieved {result['num_retrieved']} documents:")
    
    for j, doc in enumerate(result['retrieved_documents'], 1):
        print(f"  {j}. Score: {doc['score']:.4f}")
        print(f"     Text: {doc['text'][:100]}...")
        print(f"     Source: {doc['metadata'].get('source', 'Unknown')}")
    
    print("\n" + "=" * 80 + "\n")

### Custom System Prompts

In [None]:
# Example with Custom System Prompt
custom_system_prompt = """
You are an expert AI researcher and educator. When answering questions:
1. Provide detailed, technical explanations
2. Include relevant examples when possible
3. Mention any limitations or caveats
4. Cite the specific documents used in your response
5. If asked about comparisons, provide a structured analysis
"""

query = "Compare machine learning and deep learning approaches"
result = rag.query(query, top_k=2, system_prompt=custom_system_prompt)

print(f"Query: {query}")
print(f"Answer with custom prompt: {result['answer']}")

## Utility Functions

In [None]:
# Interactive Session & Utility Functions
def interactive_rag_session(rag_pipeline: RAGPipeline) -> None:
    """Run an interactive RAG session."""
    print("Interactive RAG Session Started!")
    print("Type 'quit' to exit, 'stats' for collection statistics")
    print("-" * 50)
    
    while True:
        query = input("\nEnter your question: ").strip()
        
        if query.lower() in ['quit', 'exit', 'q']:
            print("Goodbye!")
            break
        
        if query.lower() == 'stats':
            stats = rag_pipeline.get_collection_stats()
            print("\nCollection Statistics:")
            for key, value in stats.items():
                print(f"  {key}: {value}")
            continue
        
        if not query:
            print("Please enter a valid question.")
            continue
        
        print("\nProcessing...")
        result = rag_pipeline.query(query)
        
        print(f"\nAnswer: {result['answer']}")
        print(f"\nBased on {result['num_retrieved']} retrieved documents.")

# Uncomment to run interactive session
# interactive_rag_session(rag)

## Cleanup

In [None]:
# Resource Cleanup Functions
def cleanup_resources() -> None:
    """Clean up resources and connections."""
    try:
        # Close MilvusClient connections if they exist
        if 'rag' in globals() and hasattr(rag.vector_store, 'client'):
            rag.vector_store.close()
            
        if 'pdf_rag' in globals() and hasattr(pdf_rag.vector_store, 'client'):
            pdf_rag.vector_store.close()
            
        print("Closed MilvusClient connections")
    except Exception as e:
        print(f"Error during cleanup: {e}")

# Perform cleanup
cleanup_resources()

print("Cleanup completed!")

## PDF RAG Workflow

**Overview**: Complete workflow for processing PDF documents using the RAG pipeline.

**Requirements**: 
- PDF files placed in `pdf_files/` directory
- Milvus server running (`bash milvus.sh start`)
- Environment variables configured

**Workflow Steps**:
1. **Setup PDF Pipeline** - Initialize PDF-specific RAG configuration
2. **Index PDF Documents** - Process and embed PDF content into vector store  
3. **Test PDF Queries** - Validate system with sample questions
4. **Interactive Session** - Live question-answering interface
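
Indexing picks up every PDF under the directory, including subfolders, because the loader's glob is `**/*.pdf`. A sketch of listing the same files up front with `pathlib` follows; `find_pdfs` is a hypothetical helper and matches the lowercase `.pdf` extension only.

```python
from pathlib import Path
from typing import List


def find_pdfs(directory: str) -> List[Path]:
    """Return all .pdf files under directory (recursively), sorted for stable output."""
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*.pdf") if p.is_file())
```

An empty list from `find_pdfs("pdf_files")` means there is nothing to index yet.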

In [None]:
# Step 1: Setup PDF RAG Pipeline
print("🚀 Setting up PDF RAG Pipeline...")

# Create PDF-specific configuration and pipeline
pdf_config = RAGConfig(collection_name="pdf_documents")
pdf_rag = RAGPipeline(pdf_config)

# Check for PDF files
import glob
pdf_files = glob.glob("pdf_files/*.pdf")
print(f"📁 Found {len(pdf_files)} PDF files:")
for pdf_file in pdf_files:
    print(f"  📄 {pdf_file}")

if pdf_files:
    # Index the PDF documents
    print("\n📚 Indexing PDF documents...")
    pdf_rag.index_documents("pdf_files/", is_directory=True)
    
    # Get statistics
    pdf_stats = pdf_rag.get_collection_stats()
    print("\n📊 PDF Collection Statistics:")
    for key, value in pdf_stats.items():
        print(f"  {key}: {value}")
    
    print(f"\n✅ PDF RAG ready! Indexed {pdf_stats.get('num_entities', 0)} chunks.")
else:
    print("\n⚠️ No PDF files found in pdf_files/ directory.")
    print("📝 Please add PDF files to test PDF RAG functionality.")

In [None]:
# Step 2: Test PDF RAG Queries
if 'pdf_rag' in locals() and pdf_files:
    pdf_queries = [
        "What is Milvus?",
        "What are the key features of RAG systems?", 
        "How do vector databases work?",
        "What are the benefits of using Milvus for AI applications?"
    ]

    print("🧪 Testing PDF RAG with sample queries...\n")

    for i, query in enumerate(pdf_queries, 1):
        print(f"🔍 PDF Query {i}: {query}")
        print("-" * 50)
        
        result = pdf_rag.query(query, top_k=3)
        
        # Display answer
        answer = result['answer']
        print(f"💡 Answer: {answer}")
        print(f"\n📚 Retrieved {result['num_retrieved']} documents:")
        
        for j, doc in enumerate(result['retrieved_documents'], 1):
            print(f"  {j}. 📊 Score: {doc['score']:.4f}")
            print(f"     📄 Source: {doc['metadata'].get('source', 'Unknown')}")
            print(f"     📝 Preview: {doc['text'][:100]}...")

        print("\n" + "=" * 60 + "\n")

    print("🎯 PDF RAG testing completed!")
else:
    print("⚠️ PDF RAG pipeline not available.")
    print("📝 Run the setup cell above to initialize PDF indexing first.")

In [None]:
# Step 3: Interactive PDF RAG Session
def interactive_pdf_rag_session(pdf_rag_pipeline: RAGPipeline) -> None:
    """Run an interactive RAG session with PDF documents."""
    print("🔍 PDF RAG Interactive Session Started!")
    print("Commands: 'quit'/'exit'/'q' to exit, 'stats' for collection statistics")
    print("-" * 60)
    
    while True:
        query = input("\n📄 Ask about your PDFs: ").strip()
        
        if query.lower() in ['quit', 'exit', 'q']:
            print("👋 Goodbye!")
            break
        
        if query.lower() == 'stats':
            stats = pdf_rag_pipeline.get_collection_stats()
            print("\n📊 PDF Collection Statistics:")
            for key, value in stats.items():
                print(f"  {key}: {value}")
            continue
        
        if not query:
            print("⚠️ Please enter a valid question.")
            continue
        
        print("\n🔄 Processing...")
        result = pdf_rag_pipeline.query(query)
        
        print(f"\n💡 Answer: {result['answer']}")
        print(f"📚 Based on {result['num_retrieved']} document chunks.")
        
        # Show source information
        sources = sorted({doc['metadata'].get('source', 'Unknown') for doc in result['retrieved_documents']})
        print(f"🗂️ Sources: {', '.join(sources)}")

# Ready to run interactive session
print("🎯 Interactive PDF RAG function ready!")
print("💡 To start: interactive_pdf_rag_session(pdf_rag)")

## Extensions & Features

**Current Implementation:**
- ✅ Modular RAG pipeline design
- ✅ Support for both sample docs and PDF files  
- ✅ Interactive query sessions
- ✅ Comprehensive error handling & logging
- ✅ Configurable via RAGConfig class

**Possible Extensions:**
- 🔄 Response caching system
- 📊 Advanced retrieval & generation metrics
- 🖼️ Multi-modal support (images, tables)
- 🔍 Hybrid dense/sparse search
- 🌐 Web interface for RAG system
- 📈 Performance monitoring & analytics
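
As one illustration of the first extension, a response cache can be as small as a dictionary keyed by the normalized question and `top_k`. This is a sketch only: `CachedQuery` is a hypothetical wrapper, it assumes the wrapped callable accepts `(question, top_k=...)` like `RAGPipeline.query`, and it never invalidates entries when the collection is re-indexed.

```python
from typing import Any, Callable, Dict, Tuple


class CachedQuery:
    """Memoize a query function on (normalized question, top_k)."""

    def __init__(self, query_fn: Callable[..., Dict[str, Any]]):
        self._query_fn = query_fn
        self._cache: Dict[Tuple[str, int], Dict[str, Any]] = {}
        self.hits = 0  # number of calls answered from the cache

    def __call__(self, question: str, top_k: int = 5) -> Dict[str, Any]:
        # Normalize case and whitespace so trivially different spellings share an entry
        key = (" ".join(question.lower().split()), top_k)
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        result = self._query_fn(question, top_k=top_k)
        self._cache[key] = result
        return result
```

Usage would look like `cached = CachedQuery(rag.query)` followed by `cached("What is Milvus?", top_k=3)`. Since `query` returns error messages as ordinary results, a real cache would probably skip storing those.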