# Lab 2: Basic RAG Implementation

**Week 4 - RAG Fundamentals**

**Provided by:** ADC ENGINEERING & CONSULTING LTD

## Objectives

In this lab, you will:
- Understand Retrieval-Augmented Generation (RAG) architecture
- Implement document chunking strategies
- Build a complete RAG pipeline from scratch
- Integrate retrieval with LLM generation
- Handle different document types
- Implement context assembly techniques
- Evaluate RAG system quality
- Optimize the balance between retrieval and generation
- Build a question-answering system

## Prerequisites

- Completed Week 4 Lab 1 (Embeddings & Semantic Search)
- Understanding of vector databases
- OpenAI API key configured
- Python 3.9+

## Setup and Installation

In [None]:
# Install required packages
!pip install openai python-dotenv tiktoken numpy scikit-learn pandas pypdf python-docx --quiet

In [None]:
import os
import json
import re
import numpy as np
from typing import List, Dict, Optional, Tuple, Any, Callable
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path
import pickle

from openai import OpenAI
from dotenv import load_dotenv
import tiktoken

# For similarity
from sklearn.metrics.pairwise import cosine_similarity

# Load environment variables
load_dotenv()

# Initialize OpenAI client
client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

print("✓ Setup complete!")

## Part 1: Understanding RAG Architecture

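Retrieval-Augmented Generation (RAG) grounds an LLM's answers in your own documents: it first retrieves the passages most relevant to a query, then passes them to the model as context for generation.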

### RAG Pipeline:
1. **Document Ingestion**: Load and preprocess documents
2. **Chunking**: Split documents into manageable pieces
3. **Embedding**: Generate vector embeddings for chunks
4. **Indexing**: Store embeddings in vector database
5. **Retrieval**: Find relevant chunks for a query
6. **Context Assembly**: Combine retrieved chunks
7. **Generation**: Generate answer using LLM + context
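
The seven steps above can be sketched end to end in a few lines. This is a toy illustration only: it assumes word counts as a stand-in for real embeddings, an in-memory list instead of a vector database, and it stops at building the prompt rather than calling an LLM. The names `embed`, `cosine`, and the `toy_*` variables are made up for this sketch.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Steps 1-2. Ingestion and chunking: here, one chunk per sentence
document = "Python was first released in 1991. REST APIs use HTTP methods. Deep learning uses neural networks."
toy_chunks = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]

# Steps 3-4. Embedding and indexing: a plain list plays the role of the vector database
toy_index = [(chunk, embed(chunk)) for chunk in toy_chunks]

# Step 5. Retrieval: rank chunks by similarity to the query and keep the best one
toy_query = "When was Python released?"
query_vec = embed(toy_query)
ranked = sorted(toy_index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
top_chunks = [chunk for chunk, _ in ranked[:1]]

# Step 6. Context assembly: number the sources so the answer can refer to them
toy_context = "\n\n".join(f"[Source {i + 1}]: {c}" for i, c in enumerate(top_chunks))

# Step 7. Generation: this prompt is what would be sent to the LLM
toy_prompt = f"Context:\n{toy_context}\n\nQuestion: {toy_query}\n\nAnswer:"
print(toy_prompt)
```

Each stage is replaced by a more realistic component in the parts that follow.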

Let's build each component:

## Part 2: Document Chunking

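Chunk size and chunk boundaries directly affect RAG quality: chunks that are too large dilute the embedding and waste context space, while chunks that are too small lose the surrounding meaning.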
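
To see how chunk size and overlap interact, it helps to work out where each window starts. The sketch below assumes fixed-size windows that advance by `chunk_size - chunk_overlap` tokens; `window_starts` is a hypothetical helper written for this illustration, not part of the lab code.

```python
def window_starts(n_tokens, chunk_size, chunk_overlap):
    """Start offsets of fixed-size windows that advance by chunk_size - chunk_overlap."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    starts = []
    start = 0
    while start < n_tokens:
        starts.append(start)
        if start + chunk_size >= n_tokens:
            break  # this window already reaches the end of the document
        start += step
    return starts

starts = window_starts(n_tokens=250, chunk_size=100, chunk_overlap=20)
print(starts)  # [0, 80, 160]
```

For a document longer than one chunk, the number of windows is `1 + ceil((n_tokens - chunk_size) / step)`. With 250 tokens, a chunk size of 100, and an overlap of 20, that is `1 + ceil(150 / 80) = 3`.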

In [None]:
@dataclass
class Chunk:
    """A document chunk."""
    content: str
    chunk_id: str
    document_id: str
    chunk_index: int
    metadata: Dict[str, Any] = field(default_factory=dict)
    start_char: int = 0
    end_char: int = 0

class DocumentChunker:
    """
    Chunk documents using various strategies.
    """
    
    def __init__(
        self,
        chunk_size: int = 500,
        chunk_overlap: int = 50,
        tokenizer_name: str = "cl100k_base"
    ):
        """
        Initialize chunker.
        
        Args:
            chunk_size: Target chunk size in tokens
            chunk_overlap: Overlap between chunks in tokens
            tokenizer_name: Tokenizer to use
        """
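        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        if not 0 <= chunk_overlap < chunk_size:
            # An overlap >= chunk_size would stop chunk_by_tokens from advancing
            raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
        
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap
        self.tokenizer = tiktoken.get_encoding(tokenizer_name)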
    
    def chunk_by_tokens(
        self,
        text: str,
        document_id: str = "doc"
    ) -> List[Chunk]:
        """
        Chunk text by token count.
        
        Args:
            text: Text to chunk
            document_id: Document identifier
        
        Returns:
            List of chunks
        """
        # Tokenize
        tokens = self.tokenizer.encode(text)
        
        chunks = []
        start_idx = 0
        chunk_index = 0
        
        while start_idx < len(tokens):
            # Get chunk tokens
            end_idx = min(start_idx + self.chunk_size, len(tokens))
            chunk_tokens = tokens[start_idx:end_idx]
            
            # Decode back to text
            chunk_text = self.tokenizer.decode(chunk_tokens)
            
            # Create chunk
            chunk = Chunk(
                content=chunk_text,
                chunk_id=f"{document_id}_chunk_{chunk_index}",
                document_id=document_id,
                chunk_index=chunk_index,
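                metadata={
                    "token_count": len(chunk_tokens),
                    "char_count": len(chunk_text),
                    # Token offsets into the encoded document
                    "start_token": start_idx,
                    "end_token": end_idx
                },
                # Character offsets, approximated by decoding the token prefix
                start_char=len(self.tokenizer.decode(tokens[:start_idx])),
                end_char=len(self.tokenizer.decode(tokens[:end_idx]))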
            )
            
            chunks.append(chunk)
            
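            # Stop once the end of the document is reached; otherwise the next
            # window would only repeat tokens already covered by this chunk
            if end_idx == len(tokens):
                break
            
            # Move to next chunk with overlap
            start_idx += self.chunk_size - self.chunk_overlap
            chunk_index += 1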
        
        return chunks
    
    def chunk_by_sentences(
        self,
        text: str,
        document_id: str = "doc"
    ) -> List[Chunk]:
        """
        Chunk text by sentences, respecting token limits.
        
        Args:
            text: Text to chunk
            document_id: Document identifier
        
        Returns:
            List of chunks
        """
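        # Simple sentence splitting: break after ., ! or ? followed by whitespace,
        # and drop empty pieces caused by leading or trailing whitespace
        sentences = [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]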
        
        chunks = []
        current_chunk = []
        current_tokens = 0
        chunk_index = 0
        
        for sentence in sentences:
            sentence_tokens = len(self.tokenizer.encode(sentence))
            
            # If adding this sentence exceeds limit, save current chunk
            if current_tokens + sentence_tokens > self.chunk_size and current_chunk:
                chunk_text = " ".join(current_chunk)
                
                chunk = Chunk(
                    content=chunk_text,
                    chunk_id=f"{document_id}_chunk_{chunk_index}",
                    document_id=document_id,
                    chunk_index=chunk_index,
                    metadata={
                        "sentence_count": len(current_chunk),
                        "token_count": current_tokens
                    }
                )
                
                chunks.append(chunk)
                chunk_index += 1
                
                # Start new chunk with overlap. Overlap here is sentence-level:
                # any chunk_overlap > 0 carries over exactly one sentence
                if self.chunk_overlap > 0 and len(current_chunk) > 1:
                    # Keep last sentence for overlap
                    current_chunk = [current_chunk[-1]]
                    current_tokens = len(self.tokenizer.encode(current_chunk[0]))
                else:
                    current_chunk = []
                    current_tokens = 0
            
            current_chunk.append(sentence)
            current_tokens += sentence_tokens
        
        # Add final chunk
        if current_chunk:
            chunk_text = " ".join(current_chunk)
            chunk = Chunk(
                content=chunk_text,
                chunk_id=f"{document_id}_chunk_{chunk_index}",
                document_id=document_id,
                chunk_index=chunk_index,
                metadata={
                    "sentence_count": len(current_chunk),
                    "token_count": current_tokens
                }
            )
            chunks.append(chunk)
        
        return chunks
    
    def chunk_by_paragraphs(
        self,
        text: str,
        document_id: str = "doc"
    ) -> List[Chunk]:
        """
        Chunk text by paragraphs.
        
        Args:
            text: Text to chunk
            document_id: Document identifier
        
        Returns:
            List of chunks
        """
        # Split by double newlines
        paragraphs = re.split(r'\n\s*\n', text)
        
        chunks = []
        current_chunk = []
        current_tokens = 0
        chunk_index = 0
        
        for para in paragraphs:
            para = para.strip()
            if not para:
                continue
            
            para_tokens = len(self.tokenizer.encode(para))
            
            # If paragraph alone exceeds limit, chunk it separately
            if para_tokens > self.chunk_size:
                # Save current chunk if any
                if current_chunk:
                    chunk_text = "\n\n".join(current_chunk)
                    chunks.append(Chunk(
                        content=chunk_text,
                        chunk_id=f"{document_id}_chunk_{chunk_index}",
                        document_id=document_id,
                        chunk_index=chunk_index,
                        metadata={"paragraph_count": len(current_chunk)}
                    ))
                    chunk_index += 1
                    current_chunk = []
                    current_tokens = 0
                
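                # Chunk the large paragraph by sentences. Pass the real document_id
                # so the sub-chunks stay attributed to the right document; their
                # chunk IDs are renumbered below
                para_chunks = self.chunk_by_sentences(para, document_id)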
                for pc in para_chunks:
                    pc.chunk_id = f"{document_id}_chunk_{chunk_index}"
                    pc.chunk_index = chunk_index
                    chunks.append(pc)
                    chunk_index += 1
                
                continue
            
            # If adding this paragraph exceeds limit, save current chunk
            if current_tokens + para_tokens > self.chunk_size and current_chunk:
                chunk_text = "\n\n".join(current_chunk)
                chunks.append(Chunk(
                    content=chunk_text,
                    chunk_id=f"{document_id}_chunk_{chunk_index}",
                    document_id=document_id,
                    chunk_index=chunk_index,
                    metadata={"paragraph_count": len(current_chunk)}
                ))
                chunk_index += 1
                current_chunk = []
                current_tokens = 0
            
            current_chunk.append(para)
            current_tokens += para_tokens
        
        # Add final chunk
        if current_chunk:
            chunk_text = "\n\n".join(current_chunk)
            chunks.append(Chunk(
                content=chunk_text,
                chunk_id=f"{document_id}_chunk_{chunk_index}",
                document_id=document_id,
                chunk_index=chunk_index,
                metadata={"paragraph_count": len(current_chunk)}
            ))
        
        return chunks

# Test chunking strategies
sample_text = """
Artificial intelligence (AI) is transforming the world. Machine learning algorithms can now recognize patterns in data with unprecedented accuracy.

Deep learning, a subset of machine learning, uses neural networks with multiple layers. These networks can learn hierarchical representations of data.

Natural language processing (NLP) enables computers to understand human language. Recent advances in transformer models have revolutionized NLP tasks.

Computer vision allows machines to interpret visual information. Applications range from facial recognition to autonomous vehicles.

The future of AI holds both promise and challenges. Ethical considerations must guide AI development and deployment.
"""

print("="*80)
print("DOCUMENT CHUNKING STRATEGIES")
print("="*80)

chunker = DocumentChunker(chunk_size=100, chunk_overlap=20)

# Test each strategy
strategies = [
    ("Token-based", lambda: chunker.chunk_by_tokens(sample_text, "sample")),
    ("Sentence-based", lambda: chunker.chunk_by_sentences(sample_text, "sample")),
    ("Paragraph-based", lambda: chunker.chunk_by_paragraphs(sample_text, "sample"))
]

for strategy_name, chunk_func in strategies:
    chunks = chunk_func()
    print(f"\n{strategy_name} Chunking:")
    print(f"  Total chunks: {len(chunks)}")
    
    for i, chunk in enumerate(chunks[:2]):  # Show first 2 chunks
        print(f"\n  Chunk {i+1}:")
        print(f"    Content: {chunk.content[:100]}...")
        print(f"    Metadata: {chunk.metadata}")

### Exercise 2.1: Implement Smart Chunking

Create an intelligent chunking strategy:

In [None]:
# TODO: Implement smart chunking

class SmartChunker(DocumentChunker):
    """
    TODO: Implement intelligent chunking that:
    
    1. Detects document structure (headings, sections)
    2. Keeps related content together
    3. Avoids splitting mid-sentence or mid-paragraph
    4. Handles code blocks specially (don't split code)
    5. Preserves markdown/formatting
    6. Adds context from headings to chunks
    """
    
    def chunk_with_structure(
        self,
        text: str,
        document_id: str = "doc"
    ) -> List[Chunk]:
        """
        TODO: Chunk while preserving document structure.
        
        Should handle:
        - Markdown headings (# ## ###)
        - Code blocks (```)
        - Lists
        - Tables
        """
        pass
    
    def add_context_to_chunks(
        self,
        chunks: List[Chunk],
        context_window: int = 2
    ) -> List[Chunk]:
        """
        TODO: Add surrounding context to each chunk.
        
        For each chunk, add snippets from previous/next chunks
        to provide better context.
        """
        pass

# Test your smart chunker
# smart_chunker = SmartChunker(chunk_size=300)
# markdown_text = """
# # Introduction
# This is the introduction.
# 
# ## Technical Details
# Here are the technical details...
# """
# chunks = smart_chunker.chunk_with_structure(markdown_text)
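
As a starting point for the structure-detection part of this exercise, the sketch below splits markdown into heading and body pairs while ignoring `#` lines inside fenced code blocks. It is a hint under simple assumptions (ATX-style headings only, fences marked by three backticks), not a full solution; `split_by_headings` and `FENCE` are hypothetical names.

```python
import re
from typing import List, Tuple

FENCE = "`" * 3  # three backticks, the marker that opens or closes a code block

def split_by_headings(markdown: str) -> List[Tuple[str, str]]:
    """Split markdown into (heading, body) sections; text before the first heading gets ''."""
    sections = []
    heading = ""
    body_lines = []
    in_code = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_code = not in_code  # toggle when entering or leaving a fenced block
        is_heading = (not in_code) and re.match(r"^#{1,6}\s+\S", line) is not None
        if is_heading:
            # Close the previous section before starting a new one
            if heading or any(ln.strip() for ln in body_lines):
                sections.append((heading, "\n".join(body_lines).strip()))
            heading = line.lstrip("#").strip()
            body_lines = []
        else:
            body_lines.append(line)
    if heading or any(ln.strip() for ln in body_lines):
        sections.append((heading, "\n".join(body_lines).strip()))
    return sections

hint_doc = "# Introduction\nThis is the introduction.\n\n## Technical Details\nHere are the details.\n"
for section_heading, section_body in split_by_headings(hint_doc):
    print(f"{section_heading!r}: {section_body!r}")
```

Each section could then be chunked separately, with its heading prepended to every chunk to supply context.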

## Part 3: Building the RAG System

This part assembles a complete RAG implementation: document ingestion, embedding, similarity-based retrieval, and answer generation.
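
The retrieval step comes down to scoring every stored vector against the query vector and keeping the best matches. The sketch below shows that idea in plain Python with tiny two-dimensional vectors. It is an illustration only: real embeddings have hundreds or thousands of dimensions, and `cosine_sim` and `top_k_indices` are hypothetical helper names.

```python
import math
from typing import List, Sequence, Tuple

def cosine_sim(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_indices(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    top_k: int = 3,
    min_similarity: float = 0.0,
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the top_k most similar vectors above the threshold."""
    scored = [(i, cosine_sim(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(i, s) for i, s in scored[:top_k] if s >= min_similarity]

toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hits = top_k_indices([1.0, 0.0], toy_vectors, top_k=2, min_similarity=0.5)
print(hits)  # index 0 scores 1.0, index 2 scores about 0.707
```

Note that the threshold is applied after the top-k cut, so a strict `min_similarity` can return fewer than `top_k` results.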

In [None]:
def get_embedding(text: str, model: str = "text-embedding-3-small") -> List[float]:
    """Get embedding for text."""
    text = text.replace("\n", " ")
    response = client.embeddings.create(input=[text], model=model)
    return response.data[0].embedding

class RAGSystem:
    """
    Complete Retrieval-Augmented Generation system.
    """
    
    def __init__(
        self,
        embedding_model: str = "text-embedding-3-small",
        llm_model: str = "gpt-3.5-turbo",
        chunk_size: int = 500,
        chunk_overlap: int = 50
    ):
        """
        Initialize RAG system.
        
        Args:
            embedding_model: Model for embeddings
            llm_model: Model for generation
            chunk_size: Chunk size in tokens
            chunk_overlap: Overlap between chunks
        """
        self.embedding_model = embedding_model
        self.llm_model = llm_model
        self.chunker = DocumentChunker(chunk_size, chunk_overlap)
        
        # Storage
        self.chunks: List[Chunk] = []
        self.embeddings: Optional[np.ndarray] = None
        
        # Statistics
        self.query_count = 0
        self.total_retrieval_time = 0.0
        self.total_generation_time = 0.0
    
    def ingest_document(
        self,
        text: str,
        document_id: str,
        chunking_strategy: str = "sentences",
        metadata: Optional[Dict] = None
    ):
        """
        Ingest a document into the system.
        
        Args:
            text: Document text
            document_id: Unique document identifier
            chunking_strategy: Strategy to use ('tokens', 'sentences', 'paragraphs')
            metadata: Optional metadata for the document
        """
        print(f"Ingesting document: {document_id}")
        
        # Chunk document
        if chunking_strategy == "tokens":
            chunks = self.chunker.chunk_by_tokens(text, document_id)
        elif chunking_strategy == "sentences":
            chunks = self.chunker.chunk_by_sentences(text, document_id)
        elif chunking_strategy == "paragraphs":
            chunks = self.chunker.chunk_by_paragraphs(text, document_id)
        else:
            raise ValueError(f"Unknown strategy: {chunking_strategy}")
        
        # Add metadata to chunks
        if metadata:
            for chunk in chunks:
                chunk.metadata.update(metadata)
        
        print(f"  Created {len(chunks)} chunks")
        
        # Generate embeddings
        print(f"  Generating embeddings...")
        for chunk in chunks:
            embedding = get_embedding(chunk.content, self.embedding_model)
            chunk.metadata['embedding'] = embedding
        
        # Add to storage
        self.chunks.extend(chunks)
        self._rebuild_embeddings()
        
        print(f"✓ Document ingested. Total chunks: {len(self.chunks)}")
    
    def ingest_documents(self, documents: List[Dict[str, Any]]):
        """
        Ingest multiple documents.
        
        Args:
            documents: List of dicts with 'text', 'id', and optional 'metadata'
        """
        for doc in documents:
            self.ingest_document(
                text=doc['text'],
                document_id=doc['id'],
                metadata=doc.get('metadata')
            )
    
    def _rebuild_embeddings(self):
        """Rebuild embeddings matrix."""
        if self.chunks:
            embeddings_list = [chunk.metadata['embedding'] for chunk in self.chunks]
            self.embeddings = np.array(embeddings_list)
    
    def retrieve(
        self,
        query: str,
        top_k: int = 3,
        min_similarity: float = 0.0
    ) -> List[Tuple[Chunk, float]]:
        """
        Retrieve relevant chunks for a query.
        
        Args:
            query: Search query
            top_k: Number of chunks to retrieve
            min_similarity: Minimum similarity threshold
        
        Returns:
            List of (chunk, similarity_score) tuples
        """
        import time
        start_time = time.time()
        
        if not self.chunks:
            return []
        
        # Get query embedding
        query_embedding = np.array(get_embedding(query, self.embedding_model))
        
        # Calculate similarities
        similarities = cosine_similarity(
            query_embedding.reshape(1, -1),
            self.embeddings
        )[0]
        
        # Get top-k indices
        top_indices = np.argsort(similarities)[::-1][:top_k]
        
        # Filter by minimum similarity
        results = []
        for idx in top_indices:
            score = similarities[idx]
            if score >= min_similarity:
                results.append((self.chunks[idx], float(score)))
        
        # Update stats
        self.total_retrieval_time += time.time() - start_time
        
        return results
    
    def generate_answer(
        self,
        query: str,
        context_chunks: List[Chunk],
        temperature: float = 0.7,
        max_tokens: int = 500
    ) -> str:
        """
        Generate answer using LLM with retrieved context.
        
        Args:
            query: User query
            context_chunks: Retrieved chunks
            temperature: LLM temperature
            max_tokens: Maximum tokens in response
        
        Returns:
            Generated answer
        """
        import time
        start_time = time.time()
        
        # Assemble context
        context = "\n\n".join([
            f"[Source {i+1}]: {chunk.content}"
            for i, chunk in enumerate(context_chunks)
        ])
        
        # Create prompt
        prompt = f"""Answer the question based on the context provided. If the answer cannot be found in the context, say "I don't have enough information to answer that."

Context:
{context}

Question: {query}

Answer:"""
        
        # Generate response
        response = client.chat.completions.create(
            model=self.llm_model,
            messages=[
                {"role": "system", "content": "You are a helpful assistant that answers questions based on the provided context."},
                {"role": "user", "content": prompt}
            ],
            temperature=temperature,
            max_tokens=max_tokens
        )
        
        answer = response.choices[0].message.content
        
        # Update stats
        self.total_generation_time += time.time() - start_time
        
        return answer
    
    def query(
        self,
        question: str,
        top_k: int = 3,
        min_similarity: float = 0.0,
        return_sources: bool = True,
        verbose: bool = False
    ) -> Dict[str, Any]:
        """
        Query the RAG system.
        
        Args:
            question: User question
            top_k: Number of chunks to retrieve
            min_similarity: Minimum similarity threshold
            return_sources: Include source chunks in response
            verbose: Print intermediate steps
        
        Returns:
            Dict with answer and metadata
        """
        self.query_count += 1
        
        if verbose:
            print(f"\n{'='*80}")
            print(f"Query: {question}")
            print(f"{'='*80}")
        
        # Retrieve relevant chunks
        retrieved = self.retrieve(question, top_k, min_similarity)
        
        if verbose:
            print(f"\nRetrieved {len(retrieved)} chunks:")
            for i, (chunk, score) in enumerate(retrieved, 1):
                print(f"  {i}. Score: {score:.4f}")
                print(f"     {chunk.content[:100]}...")
        
        if not retrieved:
            return {
                "answer": "I don't have any relevant information to answer that question.",
                "sources": [],
                "num_sources": 0
            }
        
        # Generate answer
        chunks = [chunk for chunk, _ in retrieved]
        answer = self.generate_answer(question, chunks)
        
        if verbose:
            print(f"\nAnswer: {answer}")
        
        # Prepare response
        result = {
            "answer": answer,
            "num_sources": len(retrieved)
        }
        
        if return_sources:
            result["sources"] = [
                {
                    "chunk_id": chunk.chunk_id,
                    "document_id": chunk.document_id,
                    "content": chunk.content,
                    "score": score,
                    "metadata": {k: v for k, v in chunk.metadata.items() if k != 'embedding'}
                }
                for chunk, score in retrieved
            ]
        
        return result
    
    def get_statistics(self) -> Dict[str, Any]:
        """Get system statistics."""
        return {
            "total_chunks": len(self.chunks),
            "total_queries": self.query_count,
            "avg_retrieval_time": self.total_retrieval_time / max(self.query_count, 1),
            "avg_generation_time": self.total_generation_time / max(self.query_count, 1),
            "embedding_model": self.embedding_model,
            "llm_model": self.llm_model
        }
    
    def save(self, filepath: str):
        """Save RAG system to file."""
        with open(filepath, 'wb') as f:
            pickle.dump(self, f)
        print(f"✓ RAG system saved to {filepath}")
    
    @staticmethod
    def load(filepath: str) -> 'RAGSystem':
        """Load RAG system from file. Only load files you trust: unpickling can execute arbitrary code."""
        with open(filepath, 'rb') as f:
            rag = pickle.load(f)
        print(f"✓ RAG system loaded from {filepath}")
        return rag

# Test RAG system
print("\n" + "="*80)
print("BUILDING RAG SYSTEM")
print("="*80)

# Create RAG system
rag = RAGSystem(chunk_size=300, chunk_overlap=50)

# Sample knowledge base
knowledge_base = [
    {
        "id": "ai_basics",
        "text": """
Artificial Intelligence (AI) is the simulation of human intelligence by machines. 
It encompasses various subfields including machine learning, deep learning, and natural language processing.

Machine learning algorithms learn patterns from data without explicit programming. 
They improve their performance through experience.

Deep learning uses neural networks with multiple layers to learn hierarchical representations. 
It has achieved remarkable results in image recognition, speech recognition, and language translation.

Natural Language Processing (NLP) enables computers to understand, interpret, and generate human language. 
Modern NLP uses transformer architectures like BERT and GPT.
        """,
        "metadata": {"topic": "AI", "category": "fundamentals"}
    },
    {
        "id": "python_programming",
        "text": """
Python is a high-level, interpreted programming language known for its simplicity and readability. 
It was created by Guido van Rossum and first released in 1991.

Python features dynamic typing and automatic memory management. 
It supports multiple programming paradigms including procedural, object-oriented, and functional programming.

Python's extensive standard library and third-party packages make it suitable for various applications. 
Popular libraries include NumPy for numerical computing, Pandas for data analysis, and TensorFlow for machine learning.

Python is widely used in data science, web development, automation, and scientific computing. 
Its syntax emphasizes code readability and allows programmers to express concepts in fewer lines of code.
        """,
        "metadata": {"topic": "Programming", "category": "languages"}
    },
    {
        "id": "web_development",
        "text": """
Web development involves building applications that run on the internet. 
It consists of front-end development (user interface) and back-end development (server-side logic).

Front-end technologies include HTML for structure, CSS for styling, and JavaScript for interactivity. 
Popular frameworks include React, Vue, and Angular.

Back-end development handles server logic, databases, and APIs. 
Common technologies include Node.js, Python (Django/Flask), and Java (Spring).

RESTful APIs enable communication between front-end and back-end systems. 
They use HTTP methods (GET, POST, PUT, DELETE) to perform operations on resources.

Modern web development emphasizes responsive design, security, and performance optimization. 
Progressive Web Apps (PWAs) combine the best of web and mobile applications.
        """,
        "metadata": {"topic": "Web Dev", "category": "technology"}
    }
]

# Ingest documents
rag.ingest_documents(knowledge_base)

# Query the system
test_queries = [
    "What is machine learning?",
    "Tell me about Python programming",
    "How do RESTful APIs work?",
    "What is quantum computing?"  # Not in knowledge base
]

print("\n" + "="*80)
print("TESTING RAG SYSTEM")
print("="*80)

for query in test_queries:
    result = rag.query(query, top_k=2, verbose=True)
    print("\n" + "-"*80 + "\n")

# Show statistics
print("\n" + "="*80)
print("SYSTEM STATISTICS")
print("="*80)
stats = rag.get_statistics()
for key, value in stats.items():
    if isinstance(value, float):
        print(f"{key}: {value:.4f}")
    else:
        print(f"{key}: {value}")
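
The `generate_answer` method above joins every retrieved chunk into the prompt, which can overflow the model's context window when `top_k` or the chunk size grows. One possible refinement is to assemble context under a token budget. The sketch below assumes the texts arrive sorted by relevance and, to stay dependency-free, counts whitespace-separated words as a rough stand-in for tokens; `assemble_context` and `word_count` are hypothetical names.

```python
from typing import Callable, List

def word_count(text: str) -> int:
    """Rough token estimate: number of whitespace-separated words."""
    return len(text.split())

def assemble_context(
    texts: List[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = word_count,
) -> str:
    """Join ranked texts as numbered sources, stopping before the budget is exceeded."""
    parts = []
    used = 0
    for text in texts:
        cost = count_tokens(text)
        if used + cost > max_tokens:
            break  # texts are assumed sorted by relevance, so drop the rest
        parts.append(f"[Source {len(parts) + 1}]: {text}")
        used += cost
    return "\n\n".join(parts)

budgeted = assemble_context(
    ["alpha beta gamma", "delta epsilon", "zeta eta theta iota"],
    max_tokens=5,
)
print(budgeted)
```

In the lab's setting, a tokenizer-based counter such as `lambda t: len(tokenizer.encode(t))` could be passed as `count_tokens` to match the model's real token counts.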

### Exercise 3.1: Enhance the RAG System

Add features to improve RAG quality:

In [None]:
# TODO: Enhance RAG system

class EnhancedRAGSystem(RAGSystem):
    """
    TODO: Add these enhancements:
    
    1. Re-ranking
       - Re-rank retrieved chunks using cross-encoder
       - Improve relevance of final context
    
    2. Query expansion
       - Generate multiple variations of user query
       - Retrieve with all variations and combine
    
    3. Chunk deduplication
       - Remove duplicate or near-duplicate chunks
       - Avoid redundant context
    
    4. Citation generation
       - Track which chunks contributed to answer
       - Generate citations/references
    
    5. Confidence scoring
       - Estimate confidence in the answer
       - Flag low-confidence responses
    """
    
    def rerank_chunks(
        self,
        query: str,
        chunks: List[Tuple[Chunk, float]],
        top_k: int = 3
    ) -> List[Tuple[Chunk, float]]:
        """
        TODO: Re-rank chunks using more sophisticated scoring.
        
        Could use:
        - Query-chunk relevance (semantic)
        - Chunk quality metrics
        - Source diversity
        - Recency (if timestamps available)
        """
        pass
    
    def expand_query(self, query: str) -> List[str]:
        """
        TODO: Generate query variations.
        
        Strategies:
        - Rephrase using LLM
        - Add synonyms
        - Break complex queries into sub-queries
        """
        pass
    
    def generate_with_citations(
        self,
        query: str,
        context_chunks: List[Chunk]
    ) -> Tuple[str, List[str]]:
        """
        TODO: Generate answer with citations.
        
        Return:
        - Answer with [1], [2] markers
        - List of source citations
        """
        pass

# Test your enhancements
# enhanced_rag = EnhancedRAGSystem()
# enhanced_rag.ingest_documents(knowledge_base)
# result = enhanced_rag.query("What is AI?", top_k=3)
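
For the chunk deduplication enhancement, one simple approach is to compare word sets and drop any text that is too similar to one already kept. The sketch below uses Jaccard similarity with an arbitrary threshold of 0.8. Both the metric and the threshold are assumptions chosen for illustration (embedding similarity would work too), and `jaccard` and `deduplicate` are hypothetical names.

```python
import re
from typing import List

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two texts' lowercase word sets."""
    set_a = set(re.findall(r"\w+", a.lower()))
    set_b = set(re.findall(r"\w+", b.lower()))
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0

def deduplicate(texts: List[str], threshold: float = 0.8) -> List[str]:
    """Keep each text only if it is not too similar to any text kept before it."""
    kept: List[str] = []
    for text in texts:
        if all(jaccard(text, other) < threshold for other in kept):
            kept.append(text)
    return kept

unique = deduplicate([
    "Python is a high-level language.",
    "Python is a high-level language!",
    "REST APIs use HTTP methods.",
])
print(unique)
```

Because the input order is preserved, passing chunks sorted by retrieval score keeps the highest-scoring copy of each near-duplicate group.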

## Part 4: Multi-Document RAG

Handle multiple document formats and sources.
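
One common way to support several formats is to map each file extension to a loader function, so adding a format means adding one entry. The sketch below illustrates that dispatch pattern under simple assumptions: JSON files are objects with a `text` field, and unknown extensions raise an error. `LOADERS`, `load_any`, `read_text`, and `read_json_text` are hypothetical names.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict

def read_text(path: Path) -> str:
    """Read a plain text or markdown file."""
    return path.read_text(encoding="utf-8")

def read_json_text(path: Path) -> str:
    """Read the 'text' field of a JSON object (assumed layout for this sketch)."""
    return json.loads(path.read_text(encoding="utf-8"))["text"]

# Hypothetical registry: file suffix -> function that returns the document text
LOADERS: Dict[str, Callable[[Path], str]] = {
    ".txt": read_text,
    ".md": read_text,
    ".json": read_json_text,
}

def load_any(path: Path) -> Dict[str, str]:
    """Pick a loader by file extension and return a small document dict."""
    suffix = path.suffix.lower()
    if suffix not in LOADERS:
        raise ValueError(f"Unsupported file type: {suffix}")
    return {"id": path.stem, "text": LOADERS[suffix](path), "type": suffix.lstrip(".")}

# Usage: write a temporary JSON file and load it back
with tempfile.TemporaryDirectory() as tmp:
    note = Path(tmp) / "note.json"
    note.write_text(json.dumps({"text": "hello"}), encoding="utf-8")
    loaded = load_any(note)
print(loaded)
```

Failing loudly on unknown extensions is a design choice for this sketch; treating unknown files as plain text is the other common option.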

In [None]:
class DocumentLoader:
    """
    Load documents from various sources.
    """
    
    @staticmethod
    def load_text_file(filepath: str) -> Dict[str, Any]:
        """Load plain text file."""
        with open(filepath, 'r', encoding='utf-8') as f:
            text = f.read()
        
        return {
            "id": Path(filepath).stem,
            "text": text,
            "metadata": {
                "source": filepath,
                "type": "text"
            }
        }
    
    @staticmethod
    def load_json_file(filepath: str, text_field: str = "text") -> Any:
        """Load JSON file. Returns one document dict, or a list of dicts for a JSON array."""
        with open(filepath, 'r', encoding='utf-8') as f:
            data = json.load(f)
        
        # Handle list of documents
        if isinstance(data, list):
            return [
                {
                    # Prefix fallback IDs with the file name so they stay unique across files
                    "id": item.get("id", f"{Path(filepath).stem}_{i}"),
                    "text": item.get(text_field, str(item)),
                    "metadata": {k: v for k, v in item.items() if k not in ["id", text_field]}
                }
                for i, item in enumerate(data)
            ]
        
        # Single document
        return {
            "id": data.get("id", Path(filepath).stem),
            "text": data.get(text_field, str(data)),
            "metadata": {k: v for k, v in data.items() if k not in ["id", text_field]}
        }
    
    @staticmethod
    def load_markdown(filepath: str) -> Dict[str, Any]:
        """Load markdown file."""
        with open(filepath, 'r', encoding='utf-8') as f:
            text = f.read()
        
        # Extract title from first heading
        title_match = re.search(r'^#\s+(.+)$', text, re.MULTILINE)
        title = title_match.group(1) if title_match else Path(filepath).stem
        
        return {
            "id": Path(filepath).stem,
            "text": text,
            "metadata": {
                "source": filepath,
                "type": "markdown",
                "title": title
            }
        }
    
    @staticmethod
    def load_directory(
        dirpath: str,
        extensions: Tuple[str, ...] = ('.txt', '.md', '.json')
    ) -> List[Dict[str, Any]]:
        """
        Load all documents from a directory.
        
        Args:
            dirpath: Directory path
            extensions: File extensions to load
        
        Returns:
            List of document dicts
        """
        documents = []
        dir_path = Path(dirpath)
        
        for ext in extensions:
            for filepath in dir_path.glob(f"**/*{ext}"):
                try:
                    if ext == '.json':
                        doc = DocumentLoader.load_json_file(str(filepath))
                    elif ext == '.md':
                        doc = DocumentLoader.load_markdown(str(filepath))
                    else:
                        doc = DocumentLoader.load_text_file(str(filepath))
                    
                    # Handle list of documents
                    if isinstance(doc, list):
                        documents.extend(doc)
                    else:
                        documents.append(doc)
                except Exception as e:
                    print(f"Error loading {filepath}: {e}")
        
        return documents

class MultiDocumentRAG(RAGSystem):
    """
    RAG system with multi-document support.
    """
    
    def __init__(self, *args, **kwargs):
        """Initialize multi-document RAG."""
        super().__init__(*args, **kwargs)
        self.document_metadata: Dict[str, Dict] = {}
    
    def ingest_from_file(self, filepath: str):
        """Ingest document from file."""
        ext = Path(filepath).suffix.lower()
        
        if ext == '.json':
            doc = DocumentLoader.load_json_file(filepath)
        elif ext == '.md':
            doc = DocumentLoader.load_markdown(filepath)
        else:
            doc = DocumentLoader.load_text_file(filepath)
        
        # Handle list of documents
        if isinstance(doc, list):
            for d in doc:
                self.ingest_document(d['text'], d['id'], metadata=d.get('metadata'))
                self.document_metadata[d['id']] = d.get('metadata', {})
        else:
            self.ingest_document(doc['text'], doc['id'], metadata=doc.get('metadata'))
            self.document_metadata[doc['id']] = doc.get('metadata', {})
    
    def ingest_from_directory(self, dirpath: str):
        """Ingest all documents from directory."""
        documents = DocumentLoader.load_directory(dirpath)
        
        for doc in documents:
            self.ingest_document(doc['text'], doc['id'], metadata=doc.get('metadata'))
            self.document_metadata[doc['id']] = doc.get('metadata', {})
    
    def query_with_filters(
        self,
        question: str,
        filters: Optional[Dict[str, Any]] = None,
        top_k: int = 3
    ) -> Dict[str, Any]:
        """
        Query with metadata filters.
        
        Args:
            question: User question
            filters: Metadata filters (e.g., {"topic": "AI"})
            top_k: Number of results
        
        Returns:
            Query results
        """
        # Retrieve chunks
        retrieved = self.retrieve(question, top_k=top_k * 2)  # Get more for filtering
        
        # Apply filters
        if filters:
            filtered = []
            for chunk, score in retrieved:
                match = True
                for key, value in filters.items():
                    if chunk.metadata.get(key) != value:
                        match = False
                        break
                if match:
                    filtered.append((chunk, score))
            retrieved = filtered[:top_k]
        else:
            retrieved = retrieved[:top_k]
        
        # Generate answer
        if not retrieved:
            return {
                "answer": "No relevant information found matching the filters.",
                "sources": [],
                "num_sources": 0
            }
        
        chunks = [chunk for chunk, _ in retrieved]
        answer = self.generate_answer(question, chunks)
        
        return {
            "answer": answer,
            "num_sources": len(retrieved),
            "sources": [
                {
                    "chunk_id": chunk.chunk_id,
                    "document_id": chunk.document_id,
                    "content": chunk.content,
                    "score": score,
                    "metadata": {k: v for k, v in chunk.metadata.items() if k != 'embedding'}
                }
                for chunk, score in retrieved
            ]
        }
    
    def list_documents(self) -> List[Dict[str, Any]]:
        """List all ingested documents."""
        return [
            {
                "document_id": doc_id,
                "metadata": metadata
            }
            for doc_id, metadata in self.document_metadata.items()
        ]

# Test multi-document RAG
print("\n" + "="*80)
print("MULTI-DOCUMENT RAG")
print("="*80)

multi_rag = MultiDocumentRAG()

# Ingest our previous knowledge base
multi_rag.ingest_documents(knowledge_base)

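# List documents
# Note: document_metadata is only filled by ingest_from_file() and
# ingest_from_directory(), so documents added via ingest_documents()
# will not appear in this listing.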
print("\nIngested documents:")
for doc in multi_rag.list_documents():
    print(f"  - {doc['document_id']}: {doc['metadata']}")

# Query with filters
print("\n" + "="*80)
print("Querying with filters")
print("="*80)

result = multi_rag.query_with_filters(
    "Tell me about programming",
    filters={"topic": "Programming"},
    top_k=2
)

print("\nQuestion: Tell me about programming")
print("Filter: topic='Programming'")
print(f"\nAnswer: {result['answer']}")
print(f"\nSources used: {result['num_sources']}")
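
`query_with_filters` over-fetches `top_k * 2` candidates and filters them afterwards, so a selective filter can return fewer than `top_k` results even when more matching chunks exist in the index. One alternative is to filter on metadata first and rank only the survivors. The sketch below illustrates the idea on plain NumPy arrays. The `filter_then_rank` helper and the toy data are hypothetical and do not use the `RAGSystem` API.

```python
import numpy as np
from typing import Any, Dict, List, Tuple


def filter_then_rank(
    query_vec: np.ndarray,
    chunk_vecs: np.ndarray,
    chunk_metadata: List[Dict[str, Any]],
    filters: Dict[str, Any],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (chunk_index, cosine_similarity) for the best matching chunks."""
    # Keep only chunks whose metadata matches every filter key.
    candidates = [
        i for i, meta in enumerate(chunk_metadata)
        if all(meta.get(key) == value for key, value in filters.items())
    ]
    if not candidates:
        return []

    # Cosine similarity between the query and each surviving chunk.
    vecs = chunk_vecs[candidates]
    sims = vecs @ query_vec / (
        np.linalg.norm(vecs, axis=1) * np.linalg.norm(query_vec)
    )

    # Sort by descending similarity and keep the top_k.
    order = np.argsort(-sims)[:top_k]
    return [(candidates[i], float(sims[i])) for i in order]


# Toy data: 2-d "embeddings" for four chunks (illustrative values only).
toy_vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
toy_meta = [
    {"topic": "AI"},
    {"topic": "AI"},
    {"topic": "Programming"},
    {"topic": "Programming"},
]
toy_query = np.array([1.0, 0.0])

hits = filter_then_rank(toy_query, toy_vecs, toy_meta, {"topic": "Programming"}, top_k=2)
print(hits)
```

Here both "Programming" chunks are returned even though the two "AI" chunks score higher against the query. With post-filtering and a small candidate pool they could have been crowded out.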

### Exercise 4.1: Build a Document Management System

Create a system to manage the documents in a RAG system:

In [None]:
# TODO: Build document management system

class DocumentManager:
    """
    TODO: Implement document management features:
    
    1. Document versioning
       - Track document versions
       - Update documents without losing history
    
    2. Document deletion
       - Remove documents and their chunks
       - Update embeddings index
    
    3. Document search
       - Find documents by metadata
       - List documents by filters
    
    4. Document statistics
       - Track document usage (how often retrieved)
       - Identify most/least useful documents
    
    5. Document refresh
       - Detect when source documents change
       - Auto-refresh embeddings
    """
    
    def __init__(self, rag_system: RAGSystem):
        """Initialize document manager."""
        self.rag = rag_system
        self.document_versions: Dict[str, List[str]] = {}
    
    def update_document(self, document_id: str, new_text: str):
        """TODO: Update document and maintain version history."""
        pass
    
    def delete_document(self, document_id: str):
        """TODO: Remove document and its chunks."""
        pass
    
    def get_document_stats(self, document_id: str) -> Dict:
        """TODO: Get statistics for a document."""
        pass

# Test document management
# manager = DocumentManager(multi_rag)
# manager.update_document("ai_basics", "Updated content...")
# stats = manager.get_document_stats("ai_basics")
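
As a possible starting point for the "Document refresh" item, change detection can be reduced to comparing content hashes. The sketch below is an assumption about how you might approach it, not part of `RAGSystem`. The `content_hash` and `has_changed` helpers are hypothetical names.

```python
import hashlib
from typing import Dict


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def has_changed(document_id: str, text: str, known_hashes: Dict[str, str]) -> bool:
    """Return True if the document is new or its text differs from the stored hash.

    Updates known_hashes in place so the next call sees the latest version.
    """
    new_hash = content_hash(text)
    changed = known_hashes.get(document_id) != new_hash
    known_hashes[document_id] = new_hash
    return changed


hashes: Dict[str, str] = {}
print(has_changed("ai_basics", "AI is ...", hashes))            # True (first time seen)
print(has_changed("ai_basics", "AI is ...", hashes))            # False (unchanged)
print(has_changed("ai_basics", "AI is ... (updated)", hashes))  # True (text changed)
```

Only documents for which `has_changed` returns `True` would need re-chunking and re-embedding, which avoids paying for embeddings of unchanged text.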

## Part 5: RAG Evaluation

Evaluate RAG system quality and performance.

In [None]:
class RAGEvaluator:
    """
    Evaluate RAG system performance.
    """
    
    def __init__(self, rag_system: RAGSystem):
        """Initialize evaluator."""
        self.rag = rag_system
    
    def evaluate_retrieval(
        self,
        test_cases: List[Dict[str, Any]]
    ) -> Dict[str, float]:
        """
        Evaluate retrieval quality.
        
        Args:
            test_cases: List of dicts with 'query' and 'relevant_doc_ids'
        
        Returns:
            Metrics dict
        """
        total_precision = 0.0
        total_recall = 0.0
        total_f1 = 0.0
        
        for test_case in test_cases:
            query = test_case['query']
            relevant_ids = set(test_case['relevant_doc_ids'])
            
            # Retrieve chunks
            retrieved = self.rag.retrieve(query, top_k=5)
            retrieved_doc_ids = set(chunk.document_id for chunk, _ in retrieved)
            
            # Calculate metrics
            if not retrieved_doc_ids:
                precision = 0.0
                recall = 0.0
                f1 = 0.0
            else:
                true_positives = len(relevant_ids & retrieved_doc_ids)
                precision = true_positives / len(retrieved_doc_ids)
                recall = true_positives / len(relevant_ids) if relevant_ids else 0.0
                f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0.0
            
            total_precision += precision
            total_recall += recall
            total_f1 += f1
        
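        n = len(test_cases)
        if n == 0:
            # Avoid division by zero when no test cases are supplied
            return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
        return {
            "precision": total_precision / n,
            "recall": total_recall / n,
            "f1": total_f1 / n
        }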
    
    def evaluate_generation(
        self,
        test_cases: List[Dict[str, Any]],
        use_llm: bool = True
    ) -> Dict[str, Any]:
        """
        Evaluate generation quality.
        
        Args:
            test_cases: List of dicts with 'query' and 'expected_answer'
            use_llm: Use LLM to evaluate answer quality
        
        Returns:
            Metrics dict
        """
        if not use_llm:
            # No non-LLM scoring is implemented; return a placeholder instead
            return {"note": "LLM evaluation disabled"}
        
        total_score = 0.0
        scores = []
        
        for test_case in test_cases:
            query = test_case['query']
            expected = test_case['expected_answer']
            
            # Get actual answer
            result = self.rag.query(query, verbose=False)
            actual = result['answer']
            
            # Use LLM to evaluate
            eval_prompt = f"""Evaluate how well the actual answer matches the expected answer.
Rate from 0-10 where:
- 10: Perfect match in meaning
- 7-9: Mostly correct with minor differences
- 4-6: Partially correct
- 1-3: Mostly incorrect
- 0: Completely incorrect

Expected answer: {expected}
Actual answer: {actual}

Rating (just the number):"""
            
            response = client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": eval_prompt}],
                temperature=0.0,
                max_tokens=10
            )
            
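            try:
                score = float(response.choices[0].message.content.strip())
            except (ValueError, AttributeError):
                # Unparseable or empty rating: count it as 0
                score = 0.0
            scores.append(score)
            total_score += score
        
        if not scores:
            # No test cases supplied
            return {"avg_score": 0.0, "min_score": 0.0, "max_score": 0.0, "scores": []}
        
        return {
            "avg_score": total_score / len(scores),
            "min_score": min(scores),
            "max_score": max(scores),
            "scores": scores
        }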
    
    def evaluate_latency(
        self,
        queries: List[str],
        num_runs: int = 10
    ) -> Dict[str, float]:
        """
        Evaluate system latency.
        
        Args:
            queries: List of test queries
            num_runs: Number of runs per query
        
        Returns:
            Latency metrics
        """
        import time
        
        retrieval_times = []
        generation_times = []
        total_times = []
        
        for query in queries:
            for _ in range(num_runs):
                # perf_counter() is a monotonic, high-resolution clock suited to timing
                start = time.perf_counter()
                
                # Retrieval
                retrieval_start = time.perf_counter()
                retrieved = self.rag.retrieve(query, top_k=3)
                retrieval_time = time.perf_counter() - retrieval_start
                retrieval_times.append(retrieval_time)
                
                # Generation
                generation_start = time.perf_counter()
                chunks = [chunk for chunk, _ in retrieved]
                if chunks:
                    self.rag.generate_answer(query, chunks)
                generation_time = time.perf_counter() - generation_start
                generation_times.append(generation_time)
                
                total_time = time.perf_counter() - start
                total_times.append(total_time)
        
        return {
            "avg_retrieval_ms": np.mean(retrieval_times) * 1000,
            "avg_generation_ms": np.mean(generation_times) * 1000,
            "avg_total_ms": np.mean(total_times) * 1000,
            "p50_total_ms": np.percentile(total_times, 50) * 1000,
            "p95_total_ms": np.percentile(total_times, 95) * 1000,
            "p99_total_ms": np.percentile(total_times, 99) * 1000
        }

# Test RAG evaluation
print("\n" + "="*80)
print("RAG EVALUATION")
print("="*80)

evaluator = RAGEvaluator(rag)

# Retrieval evaluation
retrieval_test_cases = [
    {
        "query": "What is machine learning?",
        "relevant_doc_ids": ["ai_basics"]
    },
    {
        "query": "How do I use Python for data analysis?",
        "relevant_doc_ids": ["python_programming"]
    },
    {
        "query": "What are RESTful APIs?",
        "relevant_doc_ids": ["web_development"]
    }
]

print("\nRetrieval Evaluation:")
retrieval_metrics = evaluator.evaluate_retrieval(retrieval_test_cases)
for metric, value in retrieval_metrics.items():
    print(f"  {metric}: {value:.4f}")

# Latency evaluation
print("\nLatency Evaluation:")
test_queries = ["What is AI?", "Tell me about Python"]
latency_metrics = evaluator.evaluate_latency(test_queries, num_runs=5)
for metric, value in latency_metrics.items():
    print(f"  {metric}: {value:.2f}")
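
To make the retrieval numbers easier to interpret, here is the same document-level arithmetic that `evaluate_retrieval` performs, worked through on one hand-made case. The `precision_recall_f1` helper is a hypothetical standalone function written for this illustration.

```python
from typing import Iterable, Tuple


def precision_recall_f1(
    retrieved_ids: Iterable[str],
    relevant_ids: Iterable[str],
) -> Tuple[float, float, float]:
    """Document-level precision, recall and F1 over sets of document IDs."""
    retrieved, relevant = set(retrieved_ids), set(relevant_ids)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


# Five retrieved chunks that come from two distinct documents, one of them relevant.
p, r, f1 = precision_recall_f1(
    retrieved_ids=[
        "ai_basics", "ai_basics", "python_programming",
        "ai_basics", "python_programming",
    ],
    relevant_ids=["ai_basics"],
)
print(p, r, round(f1, 4))  # 0.5 1.0 0.6667
```

Because each test case above lists a single relevant document and chunks are collapsed to document IDs, any retrieval that touches a second document caps precision at 0.5. A low precision value here therefore does not necessarily mean the ranking is poor.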

### Exercise 5.1: Build Comprehensive Evaluation Suite

Create a complete evaluation framework:

In [None]:
# TODO: Build evaluation suite

class ComprehensiveEvaluator(RAGEvaluator):
    """
    TODO: Add comprehensive evaluation metrics:
    
    1. Retrieval metrics
       - Precision@K, Recall@K, F1@K
       - Mean Average Precision (MAP)
       - Normalized Discounted Cumulative Gain (NDCG)
    
    2. Generation metrics
       - Faithfulness (answer grounded in context?)
       - Relevance (answers the question?)
       - Coherence
       - BLEU/ROUGE scores
    
    3. End-to-end metrics
       - User satisfaction (simulated)
       - Answer completeness
       - Citation quality
    
    4. Efficiency metrics
       - Tokens used per query
       - Cost per query
       - Cache hit rate
    """
    
    def evaluate_faithfulness(
        self,
        answer: str,
        context: str
    ) -> float:
        """
        TODO: Evaluate if answer is grounded in context.
        
        Use LLM to check if answer contains information
        not present in the context.
        """
        pass
    
    def calculate_ndcg(
        self,
        query: str,
        retrieved: List[Tuple[Chunk, float]],
        relevance_scores: List[float]
    ) -> float:
        """TODO: Calculate NDCG score."""
        pass
    
    def evaluate_cost_efficiency(
        self,
        queries: List[str]
    ) -> Dict[str, float]:
        """
        TODO: Calculate cost efficiency.
        
        Track:
        - Tokens used
        - API calls made
        - Estimated cost
        """
        pass

# Test comprehensive evaluation
# comp_eval = ComprehensiveEvaluator(rag)
# metrics = comp_eval.evaluate_faithfulness(answer, context)
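
For reference, NDCG can be computed from a list of graded relevance scores given in retrieved order. The sketch below uses the linear-gain variant (`rel / log2(rank + 1)`); other formulations use `2**rel - 1` as the gain. The `dcg` and `ndcg` functions are hypothetical helpers, and wiring the metric into `calculate_ndcg` is left to the exercise.

```python
import math
from typing import List, Optional


def dcg(relevances: List[float]) -> float:
    """Discounted cumulative gain: rel_i / log2(i + 1), with ranks starting at 1."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances: List[float], k: Optional[int] = None) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ranked = relevances[:k] if k is not None else relevances
    ideal = sorted(relevances, reverse=True)
    ideal = ideal[:k] if k is not None else ideal
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


print(ndcg([3.0, 2.0, 1.0]))  # ranking is already ideal: 1.0
print(ndcg([0.0, 0.0, 0.0]))  # nothing relevant: 0.0
print(ndcg([1.0, 3.0]))       # best result ranked second: below 1.0
```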

## Challenge Projects

### Challenge 1: Conversational RAG

Build a RAG system with conversation memory:

In [None]:
class ConversationalRAG(RAGSystem):
    """
    RAG system with conversation history.
    
    TODO: Implement:
    1. Maintain conversation history
    2. Use history for context in queries
    3. Handle follow-up questions ("What about its applications?")
    4. Conversation summarization for long conversations
    5. Multi-turn query understanding
    6. Conversation branching/reset
    """
    
    def __init__(self, *args, **kwargs):
        """Initialize conversational RAG."""
        super().__init__(*args, **kwargs)
        self.conversations: Dict[str, List[Dict]] = {}
    
    def chat(
        self,
        user_message: str,
        conversation_id: str = "default",
        top_k: int = 3
    ) -> str:
        """
        TODO: Chat with conversation memory.
        
        Should:
        - Understand context from previous messages
        - Resolve pronouns/references
        - Handle follow-ups naturally
        """
        pass
    
    def reset_conversation(self, conversation_id: str = "default"):
        """TODO: Reset conversation history."""
        pass

# Usage:
# conv_rag = ConversationalRAG()
# conv_rag.ingest_documents(knowledge_base)
# conv_rag.chat("What is AI?")
# conv_rag.chat("Tell me more about its applications")  # Follow-up

### Challenge 2: Hybrid Search RAG

Combine semantic and keyword search:

In [None]:
class HybridSearchRAG(RAGSystem):
    """
    RAG with hybrid semantic + keyword search.
    
    TODO: Implement:
    1. BM25 for keyword search
    2. Combine BM25 and semantic scores
    3. Reciprocal Rank Fusion (RRF)
    4. Query analysis to choose strategy
    5. Per-query weight adjustment
    """
    
    def __init__(self, *args, **kwargs):
        """Initialize hybrid search RAG."""
        super().__init__(*args, **kwargs)
        self.bm25_index = None
    
    def build_bm25_index(self):
        """TODO: Build BM25 index for keyword search."""
        pass
    
    def hybrid_retrieve(
        self,
        query: str,
        top_k: int = 3,
        semantic_weight: float = 0.7
    ) -> List[Tuple[Chunk, float]]:
        """
        TODO: Retrieve using hybrid search.
        
        Combine:
        - Semantic similarity scores
        - BM25 keyword scores
        """
        pass

# Usage:
# hybrid_rag = HybridSearchRAG()
# hybrid_rag.ingest_documents(knowledge_base)
# hybrid_rag.build_bm25_index()
# results = hybrid_rag.hybrid_retrieve("machine learning algorithms", semantic_weight=0.6)
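
Reciprocal Rank Fusion (item 3 in the TODO list) merges rankings using only rank positions, so semantic and BM25 scores do not need to be put on a common scale. A minimal sketch follows. The `reciprocal_rank_fusion` function is a hypothetical helper, the chunk IDs are made up, and `k=60` is a commonly used constant rather than a tuned value.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def reciprocal_rank_fusion(
    rankings: List[List[str]],
    k: int = 60,
) -> List[Tuple[str, float]]:
    """Fuse ranked lists of chunk IDs; each list is ordered best-first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every chunk it contains.
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


semantic_ranking = ["c1", "c2", "c3"]  # e.g. from embedding similarity
bm25_ranking = ["c3", "c1", "c4"]      # e.g. from keyword search

fused = reciprocal_rank_fusion([semantic_ranking, bm25_ranking])
for chunk_id, score in fused:
    print(chunk_id, round(score, 5))
```

Chunks that appear near the top of both lists (`c1`, `c3`) rise above chunks found by only one retriever.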

### Challenge 3: Adaptive RAG

Build a RAG system that adapts based on its performance:

In [None]:
class AdaptiveRAG(RAGSystem):
    """
    Self-improving RAG system.
    
    TODO: Implement:
    1. Track query success/failure
    2. Learn optimal top_k per query type
    3. Adjust chunking strategy based on performance
    4. A/B test different retrieval strategies
    5. User feedback integration
    6. Automatic reindexing when performance degrades
    """
    
    def __init__(self, *args, **kwargs):
        """Initialize adaptive RAG."""
        super().__init__(*args, **kwargs)
        self.performance_history = []
        self.optimal_params = {}
    
    def query_with_feedback(
        self,
        question: str,
        user_feedback: Optional[Dict] = None
    ) -> Dict[str, Any]:
        """
        TODO: Query and learn from feedback.
        
        Feedback could include:
        - Was answer helpful? (yes/no)
        - What was wrong? (irrelevant/incomplete/incorrect)
        - Better answer (for training)
        """
        pass
    
    def optimize_parameters(self):
        """TODO: Analyze history and adjust parameters."""
        pass
    
    def suggest_improvements(self) -> List[str]:
        """TODO: Suggest system improvements based on performance."""
        pass

# Usage:
# adaptive_rag = AdaptiveRAG()
# adaptive_rag.ingest_documents(knowledge_base)
# result = adaptive_rag.query_with_feedback("What is AI?", user_feedback={"helpful": True})
# adaptive_rag.optimize_parameters()
# suggestions = adaptive_rag.suggest_improvements()

## Summary

In this lab, you've learned:

1. ✅ RAG architecture and pipeline
2. ✅ Document chunking strategies
3. ✅ Building complete RAG systems
4. ✅ Integrating retrieval with generation
5. ✅ Multi-document support
6. ✅ RAG evaluation metrics
7. ✅ Performance optimization
8. ✅ Quality assessment

### Key Takeaways

**RAG Benefits:**
- Grounds LLM responses in factual data
- Enables knowledge updates without retraining
- Reduces hallucinations
- Provides source attribution
- Cost-effective for domain-specific applications

**Critical Components:**
1. **Chunking**: Balance between context and specificity
2. **Retrieval**: Find truly relevant information
3. **Context Assembly**: Provide coherent context to LLM
4. **Generation**: Produce accurate, grounded answers

**Best Practices:**

**Chunking:**
- 300-500 tokens per chunk (adjust for use case)
- Use overlap to preserve context
- Respect document structure (paragraphs, sections)
- Don't split mid-sentence

**Retrieval:**
- Start with top_k=3-5, adjust based on testing
- Set minimum similarity threshold
- Consider multiple retrieval strategies
- Deduplicate similar chunks
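
The threshold and deduplication points can be sketched together as a post-processing step on retrieval results. This is an illustration on toy NumPy data. The `filter_and_deduplicate` helper is hypothetical, and the values `0.3` and `0.95` are placeholders to tune, not recommendations.

```python
import numpy as np
from typing import List, Tuple


def filter_and_deduplicate(
    scored: List[Tuple[int, float]],
    embeddings: np.ndarray,
    min_score: float = 0.3,
    dedup_threshold: float = 0.95,
) -> List[Tuple[int, float]]:
    """scored: (chunk_index, query_similarity) pairs in any order."""
    # Normalise rows once so dot products are cosine similarities.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

    kept: List[Tuple[int, float]] = []
    for idx, score in sorted(scored, key=lambda pair: pair[1], reverse=True):
        if score < min_score:
            break  # remaining scores are lower still
        # Skip chunks that are near-duplicates of one we already kept.
        if any(float(unit[idx] @ unit[j]) >= dedup_threshold for j, _ in kept):
            continue
        kept.append((idx, score))
    return kept


toy_embeddings = np.array([[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [0.5, 0.5]])
toy_scored = [(0, 0.92), (1, 0.90), (2, 0.41), (3, 0.12)]
print(filter_and_deduplicate(toy_scored, toy_embeddings))
# [(0, 0.92), (2, 0.41)]
```

Chunk 1 is dropped as a near-duplicate of chunk 0, and chunk 3 falls below the minimum score.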

**Generation:**
- Clear instructions in prompt
- Explicit grounding requirement
- Handle "no information" cases
- Include source attribution
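
As one way to put these four points into a single prompt, the sketch below builds a grounded prompt from retrieved chunks. The wording, the `build_grounded_prompt` helper, and the `NO_INFO_REPLY` constant are assumptions for illustration and are not the prompt used by `RAGSystem.generate_answer`.

```python
from typing import List, Tuple

NO_INFO_REPLY = "I don't have enough information to answer that."


def build_grounded_prompt(question: str, chunks: List[Tuple[str, str]]) -> str:
    """chunks: (source_id, text) pairs retrieved for the question."""
    # Number each chunk so the model can cite it as [1], [2], ...
    context = "\n\n".join(
        f"[{i}] (source: {source_id})\n{text}"
        for i, (source_id, text) in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using ONLY the context below.\n"
        f'If the context does not contain the answer, reply exactly: "{NO_INFO_REPLY}"\n'
        "Cite the sources you used by their bracketed number, e.g. [1].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_grounded_prompt(
    "What is machine learning?",
    [("ai_basics", "Machine learning is a subset of AI that learns patterns from data.")],
)
print(prompt)
```

Using a fixed "no information" reply makes that case easy to detect in code, for example when counting unanswered queries during evaluation.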

**Evaluation:**
- Test with real queries
- Measure retrieval quality (precision/recall)
- Assess answer faithfulness
- Monitor latency and costs

### Common Pitfalls

1. **Chunks too large**: Retrieved context carries irrelevant information that dilutes the answer
2. **Chunks too small**: Individual chunks lack the context needed to answer
3. **No overlap**: Information that straddles a chunk boundary is lost
4. **Wrong top_k**: Too few chunks miss information; too many add noise
5. **Poor prompts**: The LLM ignores the context or hallucinates
6. **No evaluation**: Quality issues go undetected

### Next Steps

- Complete the challenge projects
- Experiment with different chunking strategies
- Test with your own documents
- Move on to Lab 3: Enterprise RAG System
- Explore production RAG frameworks (LangChain, LlamaIndex)
