# RAG Application - Complete Interactive Notebook
## PDF-based Retrieval Augmented Generation System

This notebook provides an interactive interface for:
- 📄 PDF document upload and processing
- 🔍 Vector-based document storage using FAISS
- 🧠 Semantic search and retrieval with sentence transformers
- 💬 Question answering with Ollama LLM
- 📊 Document management and analytics

## 🎯 Prerequisites
- Ollama installed and running
- Python 3.8+
- Model: llama2 (or your preferred model)

**Based on files:** controller.py, rag_engine.py, server.py, utils.py, requirements.txt

## 1. Install Dependencies
Run this cell first to install all required packages.

In [None]:
# Install all required packages
!pip install fastapi==0.109.0 -q
!pip install uvicorn==0.27.0 -q
!pip install PyPDF2==3.0.1 -q
!pip install sentence-transformers==2.3.1 -q
!pip install faiss-cpu==1.7.4 -q
!pip install numpy==1.26.4 -q
!pip install ollama==0.1.6 -q
!pip install python-multipart==0.0.6 -q
!pip install reportlab==4.0.9 -q
!pip install pandas==2.2.0 -q
!pip install matplotlib==3.8.3 -q
!pip install ipywidgets==8.1.1 -q

print("✅ All dependencies installed successfully!")
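
Because the packages above are pinned to exact versions, it can help to confirm what is actually installed before continuing. A minimal sketch, assuming Python 3.8+ (where `importlib.metadata` is part of the standard library); the helper name `installed_version` is hypothetical and not part of the original files:

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as used in the pip commands above
for package in ["fastapi", "PyPDF2", "sentence-transformers", "faiss-cpu", "ollama"]:
    print(f"{package}: {installed_version(package) or 'not installed'}")
```

A package reported as "not installed" usually means the install cell failed quietly, since `-q` hides most pip output.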

## 2. Import Required Libraries

In [None]:
import io
import os
import uuid
import json
import pickle
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from typing import List, Dict, Any, Optional
from PyPDF2 import PdfReader
from sentence_transformers import SentenceTransformer
import faiss
import ollama
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas
from IPython.display import display, HTML, clear_output
import ipywidgets as widgets

# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

print("✅ All libraries imported successfully!")

## 3. Utility Functions (from utils.py)

In [None]:
class TemplateManager:
    """Simple template manager for RAG prompts"""
    
    # Prompt Templates
    DEFAULT = """Based on the following context, answer the question.

Context:
{context}

Question: {query}

Answer:"""
    
    DETAILED = """You are a helpful assistant. Use the following context to answer the question in detail.

Context:
{context}

Question: {query}

Provide a comprehensive answer:"""
    
    CONCISE = """Answer briefly using only the context provided.

Context:
{context}

Question: {query}

Brief Answer:"""
    
    @staticmethod
    def get(template_type: str = "default") -> str:
        """Get template by type"""
        templates = {
            "default": TemplateManager.DEFAULT,
            "detailed": TemplateManager.DETAILED,
            "concise": TemplateManager.CONCISE
        }
        return templates.get(template_type, TemplateManager.DEFAULT)


def process_pdf_file(pdf_content: bytes, chunk_size: int = 500) -> List[str]:
    """Process PDF file and extract text chunks"""
    pdf_file = io.BytesIO(pdf_content)
    pdf_reader = PdfReader(pdf_file)
    
    # Extract text from all pages, separating pages so words do not run together
    text = ""
    for page in pdf_reader.pages:
        text += (page.extract_text() or "") + "\n"
    
    # Split into chunks
    chunks = []
    words = text.split()
    
    for i in range(0, len(words), chunk_size):
        chunk = " ".join(words[i:i + chunk_size])
        chunks.append(chunk)
    
    return chunks

print('✅ Utility functions loaded!')
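
The word-based chunking in `process_pdf_file` can be tried without a PDF. The sketch below reproduces the same splitting on a plain string. The helper name `chunk_words` and its `overlap` parameter are assumptions added for illustration; the original code has no overlap between chunks.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 500, overlap: int = 0) -> List[str]:
    """Split text into chunks of at most chunk_size words.

    Hypothetical helper: `overlap` repeats the last `overlap` words of one
    chunk at the start of the next (0 matches the notebook's behavior).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk has reached the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
print(chunk_words(sample, chunk_size=4))
print(chunk_words(sample, chunk_size=4, overlap=1))
```

With no overlap, a sentence that straddles a chunk boundary is split across two chunks; a small overlap is one common way to soften that, at the cost of storing some words twice.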

## 4. RAG Engine (from rag_engine.py)

In [None]:
class RAGEngine:
    def __init__(self, model_name="all-MiniLM-L6-v2", llm_model="llama3.2"):
        print("🔄 Initializing RAG Engine with FAISS...")
        self.embedding_model = SentenceTransformer(model_name)
        self.llm_model = llm_model
        # Read the embedding size from the model (384 for all-MiniLM-L6-v2)
        self.dimension = self.embedding_model.get_sentence_embedding_dimension()
        self.index = faiss.IndexFlatL2(self.dimension)
        self.chunks = []
        self.metadata = []
        print("✅ RAG Engine initialized with FAISS")
    
    def add_document(self, doc_id: str, content: str, filename: str):
        """Add a document chunk to FAISS vector store"""
        # Generate embedding
        embedding = self.embedding_model.encode([content])[0]
        
        # Add to FAISS index
        self.index.add(np.array([embedding], dtype=np.float32))
        
        # Store chunk and metadata
        self.chunks.append(content)
        self.metadata.append({
            'id': doc_id,
            'filename': filename
        })
        
        print(f"✅ Added chunk from: {filename} (Total chunks: {len(self.chunks)})")
    
    def search(self, query: str, top_k: int = 3, filter_filenames: List[str] = None) -> List[Dict]:
        """Search for relevant chunks based on query with optional filename filtering"""
        if not self.chunks:
            return []
        
        # Encode query
        query_embedding = self.embedding_model.encode([query])
        query_vector = np.array(query_embedding, dtype=np.float32)
        
        # When filtering, rank every chunk so matches are not cut off before the filter runs
        search_k = len(self.chunks) if filter_filenames else top_k
        
        # Search in FAISS
        distances, indices = self.index.search(query_vector, min(search_k, len(self.chunks)))
        
        results = []
        for i, idx in enumerate(indices[0]):
            if 0 <= idx < len(self.chunks):  # FAISS returns -1 for missing results
                # Filter by filename if specified
                if filter_filenames and self.metadata[idx]['filename'] not in filter_filenames:
                    continue
                
                # Map the squared L2 distance to a similarity score in (0, 1]
                similarity = 1 / (1 + distances[0][i])
                results.append({
                    'id': self.metadata[idx]['id'],
                    'filename': self.metadata[idx]['filename'],
                    'content': self.chunks[idx],
                    'score': float(similarity)
                })
                
                # Stop when we have enough results
                if len(results) >= top_k:
                    break
        
        return results
    
    def search_with_id_filter(self, query: str, top_k: int = 3, filter_filenames: List[str] = None) -> List[Dict]:
        """Search with FAISS ID filtering for better performance"""
        if not self.chunks:
            return []
        
        # Encode query
        query_embedding = self.embedding_model.encode([query])
        query_vector = np.array(query_embedding, dtype=np.float32)
        
        if filter_filenames:
            # Create ID selector for filtering
            valid_ids = [i for i, meta in enumerate(self.metadata) 
                         if meta['filename'] in filter_filenames]
        
            if not valid_ids:
                return []
        
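            # Use IDSelectorBatch so FAISS only considers the matching chunks.
            # The index is flat, so the generic SearchParameters is used here.
            id_selector = faiss.IDSelectorBatch(valid_ids)
            params = faiss.SearchParameters(sel=id_selector)
            distances, indices = self.index.search(query_vector, min(top_k, len(valid_ids)), params=params)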
        else:
            distances, indices = self.index.search(query_vector, min(top_k, len(self.chunks)))
        
        results = []
        for i, idx in enumerate(indices[0]):
            if idx >= 0 and idx < len(self.chunks):
                similarity = 1 / (1 + distances[0][i])
                results.append({
                    'id': self.metadata[idx]['id'],
                    'filename': self.metadata[idx]['filename'],
                    'content': self.chunks[idx],
                    'score': float(similarity)
                })
        
        return results
    
    def create_prompt(self, query: str, context: str, template_type: str = "default") -> str:
        """Create a prompt for the LLM from query and context"""
        template = TemplateManager.get(template_type)
        return template.format(context=context, query=query)
    
    def generate_answer(self, query: str, top_k: int = 3, template_type: str = "default", filter_filenames: List[str] = None) -> Dict:
        """Generate answer using RAG with optional document filtering"""
        # Search for relevant chunks with optional filtering
        results = self.search(query, top_k, filter_filenames=filter_filenames)
        
        if not results:
            filter_msg = f" in documents: {', '.join(filter_filenames)}" if filter_filenames else ""
            return {
                'answer': f'No relevant information found{filter_msg}.',
                'sources': [],
                'filenames': [],
                'num_sources': 0
            }
        
        # Build context from results
        context = "\n\n".join([r['content'] for r in results])
        
        # Create prompt using template manager
        prompt = self.create_prompt(query, context, template_type)
        
        # Generate answer using Ollama
        try:
            response = ollama.generate(
                model=self.llm_model,
                prompt=prompt
            )
            answer = response['response']
        except Exception as e:
            answer = f"Error generating answer: {str(e)}"
        
        return {
            'answer': answer,
            'sources': [r['content'][:200] + '...' for r in results],
            'filenames': list(set([r['filename'] for r in results])),
            'num_sources': len(results)
        }
    
    def get_all_documents(self) -> List[Dict]:
        """Get all stored chunks"""
        return [
            {
                'id': meta['id'],
                'filename': meta['filename'],
                'content': content
            }
            for meta, content in zip(self.metadata, self.chunks)
        ]
    
    def save_state(self, filepath: str = "data/rag_state.pkl"):
        """Save FAISS index and data to disk"""
        directory = os.path.dirname(filepath)
        if directory:  # a bare filename has no directory to create
            os.makedirs(directory, exist_ok=True)
        
        # Save FAISS index
        faiss.write_index(self.index, filepath.replace('.pkl', '.faiss'))
        
        # Save metadata and chunks
        state = {
            'chunks': self.chunks,
            'metadata': self.metadata
        }
        with open(filepath, 'wb') as f:
            pickle.dump(state, f)
        
        print(f"💾 Saved state to {filepath}")
    
    def load_state(self, filepath: str = "data/rag_state.pkl"):
        """Load FAISS index and data from disk"""
        faiss_path = filepath.replace('.pkl', '.faiss')
        
        if os.path.exists(filepath) and os.path.exists(faiss_path):
            # Load FAISS index
            self.index = faiss.read_index(faiss_path)
            
            # Load metadata and chunks
            with open(filepath, 'rb') as f:
                state = pickle.load(f)
                self.chunks = state['chunks']
                self.metadata = state['metadata']
            
            print(f"📂 Loaded {len(self.chunks)} chunks from {filepath}")
            return True
        return False
    
    def clear(self):
        """Clear all data"""
        self.index = faiss.IndexFlatL2(self.dimension)
        self.chunks = []
        self.metadata = []
        print("🗑️ Cleared all data")

print('✅ RAG Engine class loaded!')
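
The `score` field returned by `search` comes from the expression `1 / (1 + distance)`. `IndexFlatL2` reports squared Euclidean distances, so the mapping can be illustrated with plain Python lists instead of real embeddings. The helper names below are hypothetical and exist only for this sketch:

```python
from typing import Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distance_to_score(distance: float) -> float:
    """Map a non-negative distance to a score in (0, 1]; 1.0 means identical."""
    return 1.0 / (1.0 + distance)


query = [1.0, 0.0]
examples = [("identical", [1.0, 0.0]), ("close", [0.5, 0.0]), ("far", [-1.0, 0.0])]
for name, vec in examples:
    d = squared_l2(query, vec)
    print(f"{name}: distance={d:.2f} score={distance_to_score(d):.2f}")
```

The score is monotonic in the distance, so it preserves the ranking, but it is not a cosine similarity and should not be read as a probability.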

## 5. Controller (from controller.py)

In [None]:
import faiss
import numpy as np


class RAGController:
    """Controller for RAG operations"""
    
    def __init__(self, llm_model: str = "llama2"):
        self.rag = RAGEngine(llm_model=llm_model)
        self.rag.load_state()
    
    def upload_pdf(self, pdf_content: bytes, filename: str) -> Dict[str, Any]:
        """Process and store PDF content"""
        # Validate file extension
        if not filename.lower().endswith('.pdf'):
            raise ValueError("Only PDF files are allowed")
        
        # Process PDF and get chunks
        chunks = process_pdf_file(pdf_content)
        
        # Add each chunk to RAG engine
        for chunk in chunks:
            doc_id = str(uuid.uuid4())
            self.rag.add_document(doc_id, chunk, filename)
        
        # Save state
        self.rag.save_state()
        
        return {
            "message": "PDF processed successfully",
            "filename": filename,
            "chunks_created": len(chunks),
            "total_chunks": len(self.rag.chunks)
        }
    
    def query_documents(self, query: str, top_k: int = 3, template_type: str = "default", 
                       filter_filenames: Optional[List[str]] = None) -> Dict[str, Any]:
        """Query the RAG system with optional document filtering"""
        if len(self.rag.chunks) == 0:
            raise ValueError("No documents uploaded yet")
        
        # Validate filter_filenames if provided
        if filter_filenames:
            available_docs = self.get_document_list()
            invalid_docs = [f for f in filter_filenames if f not in available_docs]
            if invalid_docs:
                raise ValueError(f"Documents not found: {', '.join(invalid_docs)}")
        
        result = self.rag.generate_answer(query, top_k, template_type, filter_filenames)
        return result
    
    def get_stats(self) -> Dict[str, Any]:
        """Get statistics about the RAG system"""
        return {
            "total_chunks": len(self.rag.chunks),
            "total_documents": len(set([m['filename'] for m in self.rag.metadata])),
            "index_size": self.rag.index.ntotal
        }
    
    def get_document_list(self) -> List[str]:
        """Get list of all uploaded documents"""
        if not self.rag.metadata:
            return []
        return list(set([m['filename'] for m in self.rag.metadata]))
    
    def get_document_details(self) -> List[Dict[str, Any]]:
        """Get detailed information about each document"""
        if not self.rag.metadata:
            return []
        
        doc_info = {}
        for meta in self.rag.metadata:
            filename = meta['filename']
            if filename not in doc_info:
                doc_info[filename] = {
                    'filename': filename,
                    'chunk_count': 0
                }
            doc_info[filename]['chunk_count'] += 1
        
        return list(doc_info.values())
    
    def get_document_chunks(self, filename: str) -> Dict[str, Any]:
        """Get all chunks from a specific document"""
        chunks = [
            {
                'id': meta['id'],
                'content': content,
                'preview': content[:200] + '...' if len(content) > 200 else content
            }
            for meta, content in zip(self.rag.metadata, self.rag.chunks)
            if meta['filename'] == filename
        ]
        
        if not chunks:
            raise ValueError(f"Document not found: {filename}")
        
        return {
            'filename': filename,
            'chunk_count': len(chunks),
            'chunks': chunks
        }
    
    def delete_document(self, filename: str) -> Dict[str, Any]:
        """Delete a specific document and its chunks"""
        if filename not in self.get_document_list():
            raise ValueError(f"Document not found: {filename}")
        
        # Count chunks before deletion
        chunks_before = len(self.rag.chunks)
        
        # Filter out chunks from the specified document
        new_chunks = []
        new_metadata = []
        for meta, chunk in zip(self.rag.metadata, self.rag.chunks):
            if meta['filename'] != filename:
                new_chunks.append(chunk)
                new_metadata.append(meta)
        
        # Rebuild FAISS index
        self.rag.chunks = new_chunks
        self.rag.metadata = new_metadata
        self.rag.index = faiss.IndexFlatL2(self.rag.dimension)
        
        # Re-add all remaining chunks to index
        if new_chunks:
            embeddings = self.rag.embedding_model.encode(new_chunks)
            self.rag.index.add(np.array(embeddings, dtype=np.float32))
        
        # Save state
        self.rag.save_state()
        
        chunks_deleted = chunks_before - len(new_chunks)
        
        return {
            "message": f"Document '{filename}' deleted successfully",
            "chunks_deleted": chunks_deleted,
            "remaining_chunks": len(new_chunks)
        }
    
    def clear_all(self) -> Dict[str, str]:
        """Clear all data from the system"""
        self.rag.clear()
        # Persist the empty state so old data is not reloaded on the next start
        self.rag.save_state()
        return {"message": "All data cleared"}
    
    def has_documents(self) -> bool:
        """Check if any documents are loaded"""
        return len(self.rag.chunks) > 0

print('✅ RAG Controller class loaded!')
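
`get_document_details` counts chunks per file by walking the metadata list. The same grouping can be sketched with `collections.Counter` on a hand-written metadata list; the function name `chunk_counts` and the sample data are hypothetical:

```python
from collections import Counter
from typing import Dict, List, Union


def chunk_counts(metadata: List[Dict[str, str]]) -> List[Dict[str, Union[str, int]]]:
    """Count chunks per filename, keeping first-seen order of the files."""
    counts = Counter(meta["filename"] for meta in metadata)
    return [{"filename": name, "chunk_count": n} for name, n in counts.items()]


# Hypothetical metadata in the same shape RAGEngine stores it
metadata = [
    {"id": "1", "filename": "ai_overview.pdf"},
    {"id": "2", "filename": "python_guide.pdf"},
    {"id": "3", "filename": "ai_overview.pdf"},
]
print(chunk_counts(metadata))
```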

## 6. Initialize RAG System

**Important:** Make sure Ollama is installed and running with the llama2 model.

Install Ollama: https://ollama.ai/

Pull the model with `ollama pull llama2`.
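
Before initializing the controller, it may be worth checking that the Ollama server is reachable and which models it has. The sketch below assumes Ollama is listening on its default local address, `http://localhost:11434`, and uses its `/api/tags` endpoint, which lists locally available models. The function name `list_ollama_models` is hypothetical.

```python
import json
import urllib.request
from typing import List, Optional


def list_ollama_models(base_url: str = "http://localhost:11434",
                       timeout: float = 2.0) -> Optional[List[str]]:
    """Return local model names, or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON reply
        return None
    return [model.get("name", "") for model in payload.get("models", [])]


models = list_ollama_models()
if models is None:
    print("Ollama does not appear to be running")
else:
    print("Available models:", models)
```

If the list does not contain a `llama2` entry, pulling the model first avoids an error at the first query.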


In [None]:
# Initialize the RAG Controller
controller = RAGController(llm_model="llama2")

print("\n" + "="*60)
print("🚀 RAG System Initialized!")
print("="*60)

# Display current stats
stats = controller.get_stats()
print(f"\n📊 Current Statistics:")
print(f"  - Total Documents: {stats['total_documents']}")
print(f"  - Total Chunks: {stats['total_chunks']}")
print(f"  - Index Size: {stats['index_size']}")

## 7. Create Sample PDF Documents

In [None]:
def create_sample_pdf(filename: str, title: str, content: str):
    """Create a sample PDF file for testing"""
    c = canvas.Canvas(filename, pagesize=letter)
    width, height = letter
    
    # Add title
    c.setFont("Helvetica-Bold", 16)
    c.drawString(50, height - 50, title)
    
    # Add content
    c.setFont("Helvetica", 11)
    text_object = c.beginText(50, height - 100)
    
    # Split content into lines and wrap
    lines = content.split('\n')
    for line in lines:
        if len(line) > 85:
            words = line.split()
            current_line = ""
            for word in words:
                if len(current_line + word) < 85:
                    current_line += word + " "
                else:
                    text_object.textLine(current_line.strip())
                    current_line = word + " "
            if current_line:
                text_object.textLine(current_line.strip())
        else:
            text_object.textLine(line)
    
    c.drawText(text_object)
    c.save()
    print(f"✅ Created: {filename}")

# Create sample PDFs directory
os.makedirs("sample_pdfs", exist_ok=True)

# Sample content
ai_content = """Artificial Intelligence (AI) Overview

Artificial Intelligence is the simulation of human intelligence processes by machines, 
especially computer systems. These processes include learning, reasoning, and self-correction.

Key Applications:
- Natural Language Processing (NLP)
- Computer Vision and Image Recognition
- Robotics and Autonomous Systems
- Expert Systems and Decision Support
- Speech Recognition and Synthesis

Machine Learning is a subset of AI that provides systems the ability to automatically 
learn and improve from experience without being explicitly programmed.

Deep Learning is a subset of machine learning based on artificial neural networks.
"""

python_content = """Python Programming Language Guide

Python is a high-level, interpreted programming language known for its simplicity 
and readability. Created by Guido van Rossum and first released in 1991.

Key Features:
- Easy to learn and use with clean syntax
- Extensive standard library
- Cross-platform compatibility
- Strong community support

Popular Use Cases:
1. Web Development (Django, Flask, FastAPI)
2. Data Science and Machine Learning
3. Automation and Scripting
"""

faiss_content = """FAISS: Facebook AI Similarity Search

FAISS is a library for efficient similarity search and clustering of dense vectors.

Key Features:
- Fast similarity search in high-dimensional spaces
- Supports billions of vectors
- GPU acceleration available
- Multiple index types for different use cases
"""

# Create the PDFs
create_sample_pdf("sample_pdfs/ai_overview.pdf", "Artificial Intelligence Overview", ai_content)
create_sample_pdf("sample_pdfs/python_guide.pdf", "Python Programming Guide", python_content)
create_sample_pdf("sample_pdfs/faiss_guide.pdf", "FAISS Library Guide", faiss_content)

print("\n✅ Sample PDFs created in 'sample_pdfs' directory")
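
The manual line-wrapping loop inside `create_sample_pdf` can be approximated with the standard library's `textwrap` module. This is a sketch of an alternative, not the notebook's method, and `wrap_for_pdf` is a hypothetical helper name:

```python
import textwrap
from typing import List


def wrap_for_pdf(content: str, width: int = 85) -> List[str]:
    """Wrap each input line to at most `width` characters, keeping blank lines."""
    lines: List[str] = []
    for line in content.split("\n"):
        # textwrap.wrap returns [] for an empty line, so keep it explicitly
        lines.extend(textwrap.wrap(line, width=width) or [""])
    return lines


text = "Title\n\n" + "word " * 40
for line in wrap_for_pdf(text):
    print(repr(line))
```

Each returned line could then be passed to `text_object.textLine(...)` in place of the hand-written loop.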

## 8. Upload PDF Documents

In [None]:
def upload_pdf_from_file(filepath: str):
    """Upload a PDF file to the RAG system"""
    try:
        with open(filepath, 'rb') as f:
            pdf_content = f.read()
        
        filename = os.path.basename(filepath)
        result = controller.upload_pdf(pdf_content, filename)
        
        print(f"\n✅ {result['message']}")
        print(f"   Filename: {result['filename']}")
        print(f"   Chunks Created: {result['chunks_created']}")
        print(f"   Total Chunks in System: {result['total_chunks']}")
        
        return result
    except Exception as e:
        print(f"❌ Error uploading PDF: {str(e)}")
        return None

# Upload sample PDFs
print("📤 Uploading sample PDFs...\n")
upload_pdf_from_file("sample_pdfs/ai_overview.pdf")
upload_pdf_from_file("sample_pdfs/python_guide.pdf")
upload_pdf_from_file("sample_pdfs/faiss_guide.pdf")

# Display updated stats
stats = controller.get_stats()
print(f"\n📊 Updated Statistics:")
print(f"   Total Documents: {stats['total_documents']}")
print(f"   Total Chunks: {stats['total_chunks']}")

## 9. List All Documents

In [None]:
# Get list of all documents
documents = controller.get_document_list()
print(f"📚 Total Documents: {len(documents)}\n")

for i, doc in enumerate(documents, 1):
    print(f"{i}. {doc}")

# Get detailed information
print("\n" + "="*60)
print("Document Details:")
print("="*60)

details = controller.get_document_details()
if details:
    df = pd.DataFrame(details)
    display(df)
else:
    print("No documents found")

## 10. Query Documents

In [None]:
def query_rag_system(query: str, top_k: int = 3, template_type: str = "default", 
                     filter_filenames: List[str] = None):
    """Query the RAG system and display results"""
    try:
        print(f"\n🔍 Query: {query}")
        print(f"   Top K: {top_k}")
        print(f"   Template: {template_type}")
        if filter_filenames:
            print(f"   Filtering by: {', '.join(filter_filenames)}")
        print("\n" + "="*60)
        
        result = controller.query_documents(
            query=query,
            top_k=top_k,
            template_type=template_type,
            filter_filenames=filter_filenames
        )
        
        print(f"\n💡 Answer:\n{result['answer']}")
        print(f"\n📄 Sources Used: {result['num_sources']}")
        print(f"📁 Files: {', '.join(result['filenames'])}")
        
        if result['sources']:
            print("\n📖 Source Excerpts:")
            for i, source in enumerate(result['sources'], 1):
                print(f"\n{i}. {source}")
        
        return result
    except Exception as e:
        print(f"❌ Error: {str(e)}")
        return None

# Example query
query_rag_system("What is Artificial Intelligence?", top_k=3, template_type="default")

## 11. Query with Document Filtering

In [None]:
# Query only from specific documents
query_rag_system(
    query="What are the key features of Python?",
    top_k=2,
    template_type="concise",
    filter_filenames=["python_guide.pdf"]
)
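
Filename filtering in `RAGEngine.search` happens after retrieval: FAISS returns a ranked list, and hits from other files are then skipped. How many candidates are fetched before filtering therefore limits what the filter can find. The sketch below shows this on a hand-written ranking; `filter_ranked` and the sample hits are hypothetical.

```python
from typing import Dict, List, Optional


def filter_ranked(ranked: List[Dict[str, str]], top_k: int,
                  allowed: Optional[List[str]] = None) -> List[Dict[str, str]]:
    """Keep the first top_k hits whose filename is allowed (all files if None)."""
    results = []
    for hit in ranked:
        if allowed and hit["filename"] not in allowed:
            continue
        results.append(hit)
        if len(results) >= top_k:
            break
    return results


# Hypothetical ranking, best match first
ranked = [
    {"filename": "ai_overview.pdf", "content": "chunk A"},
    {"filename": "ai_overview.pdf", "content": "chunk B"},
    {"filename": "faiss_guide.pdf", "content": "chunk C"},
    {"filename": "python_guide.pdf", "content": "chunk D"},
]

# Filtering only the first two candidates misses python_guide.pdf entirely
print(filter_ranked(ranked[:2], top_k=1, allowed=["python_guide.pdf"]))
# Filtering the full ranking finds it
print(filter_ranked(ranked, top_k=1, allowed=["python_guide.pdf"]))
```

An empty result for a filtered query can therefore mean "not among the fetched candidates" rather than "not in the document".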

## 12. Run Multiple Queries

In [None]:
# Run multiple queries
queries = [
    "What is Machine Learning?",
    "What are Python's use cases?",
    "Explain FAISS and its applications"
]

results = []
for query in queries:
    print("\n" + "#"*70)
    result = query_rag_system(query, top_k=2, template_type="concise")
    if result:
        results.append({
            'query': query,
            'answer': result['answer'][:150] + '...' if len(result['answer']) > 150 else result['answer'],
            'sources': result['num_sources'],
            'files': ', '.join(result['filenames'])
        })

# Display summary
print("\n" + "="*70)
print("Query Summary:")
print("="*70)
if results:
    df_results = pd.DataFrame(results)
    display(df_results)

## 13. View Document Chunks

In [None]:
def view_document_chunks(filename: str, max_display: int = 3):
    """View chunks from a specific document"""
    try:
        result = controller.get_document_chunks(filename)
        print(f"\n📄 Document: {result['filename']}")
        print(f"📊 Total Chunks: {result['chunk_count']}")
        print("\n" + "="*60)
        
        for i, chunk in enumerate(result['chunks'][:max_display], 1):
            print(f"\nChunk {i}:")
            print(f"ID: {chunk['id']}")
            print(f"Preview: {chunk['preview']}")
            print("-" * 60)
        
        if result['chunk_count'] > max_display:
            print(f"\n... and {result['chunk_count'] - max_display} more chunks")
        
        return result
    except Exception as e:
        print(f"❌ Error: {str(e)}")
        return None

# View chunks from a document
view_document_chunks("ai_overview.pdf", max_display=2)

## 14. Interactive Query Interface

In [None]:
# Create interactive widgets
query_input = widgets.Textarea(
    value='What is Deep Learning?',
    placeholder='Enter your question here',
    description='Query:',
    layout=widgets.Layout(width='90%', height='80px')
)

top_k_slider = widgets.IntSlider(
    value=3,
    min=1,
    max=10,
    step=1,
    description='Top K:',
    continuous_update=False
)

template_dropdown = widgets.Dropdown(
    options=['default', 'detailed', 'concise'],
    value='default',
    description='Template:'
)

# Document filter
doc_list = controller.get_document_list()
filter_select = widgets.SelectMultiple(
    options=['All'] + doc_list,
    value=['All'],
    description='Filter Docs:',
    rows=min(5, len(doc_list) + 1)
)

query_button = widgets.Button(
    description='🔍 Search',
    button_style='success',
    icon='search'
)

output_area = widgets.Output()

def on_query_button_clicked(b):
    with output_area:
        clear_output()
        filter_files = None if 'All' in filter_select.value else list(filter_select.value)
        query_rag_system(
            query=query_input.value,
            top_k=top_k_slider.value,
            template_type=template_dropdown.value,
            filter_filenames=filter_files
        )

query_button.on_click(on_query_button_clicked)

# Display widgets
display(widgets.VBox([
    widgets.HTML("<h3>🔍 Interactive Query Interface</h3>"),
    query_input,
    widgets.HBox([top_k_slider, template_dropdown]),
    filter_select,
    query_button,
    output_area
]))

## 15. System Statistics and Visualization

In [None]:
# Get and display statistics
stats = controller.get_stats()
details = controller.get_document_details()

print("📊 System Statistics")
print("="*60)
print(f"Total Documents: {stats['total_documents']}")
print(f"Total Chunks: {stats['total_chunks']}")
print(f"Index Size: {stats['index_size']}")

# Visualize document distribution
if details:
    df_details = pd.DataFrame(details)
    
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
    
    # Bar chart
    ax1.bar(df_details['filename'], df_details['chunk_count'], color='skyblue', edgecolor='navy')
    ax1.set_xlabel('Document', fontsize=12)
    ax1.set_ylabel('Number of Chunks', fontsize=12)
    ax1.set_title('Chunks per Document', fontsize=14, fontweight='bold')
    ax1.tick_params(axis='x', rotation=45)
    ax1.grid(axis='y', alpha=0.3)
    
    # Pie chart
    colors = plt.cm.Set3(range(len(df_details)))
    ax2.pie(df_details['chunk_count'], labels=df_details['filename'], 
            autopct='%1.1f%%', colors=colors, startangle=90)
    ax2.set_title('Chunk Distribution', fontsize=14, fontweight='bold')
    
    plt.tight_layout()
    plt.show()
    
    # Display table
    print("\nDocument Details:")
    display(df_details)
else:
    print("\nNo documents found")

## 16. Delete Document

In [None]:
def delete_document(filename: str):
    """Delete a document from the system"""
    try:
        result = controller.delete_document(filename)
        print(f"\n✅ {result['message']}")
        print(f"   Chunks Deleted: {result['chunks_deleted']}")
        print(f"   Remaining Chunks: {result['remaining_chunks']}")
        return result
    except Exception as e:
        print(f"❌ Error: {str(e)}")
        return None

# Example: Delete a document (uncomment to use)
# delete_document("faiss_guide.pdf")

print("Delete function ready. Use: delete_document('filename.pdf')")
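
Search results are mapped back to chunks by their position in the index, so removing a document shifts the position of every chunk after it. That is presumably why `RAGController.delete_document` rebuilds the index from the remaining chunks instead of removing entries in place. The sketch below shows the shift on plain lists; `drop_document` and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple


def drop_document(chunks: List[str], metadata: List[Dict[str, str]],
                  filename: str) -> Tuple[List[str], List[Dict[str, str]]]:
    """Return copies of chunks and metadata without entries from `filename`."""
    kept = [(c, m) for c, m in zip(chunks, metadata) if m["filename"] != filename]
    return [c for c, _ in kept], [m for _, m in kept]


chunks = ["a1", "b1", "a2", "c1"]
metadata = [{"filename": f} for f in ["a.pdf", "b.pdf", "a.pdf", "c.pdf"]]

new_chunks, new_metadata = drop_document(chunks, metadata, "a.pdf")
print(new_chunks)
# The same chunk now sits at a different position
print(chunks.index("c1"), "->", new_chunks.index("c1"))
```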

## 17. Save and Load System State

In [None]:
# Save current state
controller.rag.save_state()
print("💾 System state saved!")

# To load state (automatically done on initialization)
# controller.rag.load_state()
# print("📂 System state loaded!")
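
`save_state` writes two files: a pickle with chunks and metadata, and a sibling `.faiss` file whose name is derived from the pickle path. The round trip can be sketched with the standard library alone. The FAISS part is left out here, and `index_path_for` is a hypothetical helper that derives the sibling name with `os.path.splitext` rather than the notebook's `str.replace`.

```python
import os
import pickle
import tempfile


def index_path_for(state_path: str) -> str:
    """Swap only the final extension of the state path for '.faiss'."""
    root, _ext = os.path.splitext(state_path)
    return root + ".faiss"


with tempfile.TemporaryDirectory() as tmp:
    state_path = os.path.join(tmp, "data", "rag_state.pkl")
    os.makedirs(os.path.dirname(state_path), exist_ok=True)

    state = {"chunks": ["chunk one"], "metadata": [{"id": "1", "filename": "a.pdf"}]}
    with open(state_path, "wb") as f:
        pickle.dump(state, f)

    with open(state_path, "rb") as f:
        restored = pickle.load(f)

    print(restored == state)
    print(os.path.basename(index_path_for(state_path)))
```

Pickle files can execute code when loaded, so state files should only be loaded from locations you trust.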

## 18. Clear All Data

In [None]:
# Clear all data (use with caution!)
def clear_all_data():
    """Clear all data from the system"""
    result = controller.clear_all()
    print(f"\n{result['message']}")
    print("⚠️  All documents and chunks have been removed")

# Uncomment to clear all data
# clear_all_data()

print("Clear function ready. Use: clear_all_data()")

## 19. Quick Start Guide

### Step-by-Step Instructions:

1. **Make sure Ollama is running** with the llama2 model

2. **Run sections 1-6** to set up the system

3. **Create sample PDFs** (section 7) or upload your own

4. **Upload PDFs** to the system (section 8)

5. **Query documents** using sections 10-14

6. **Use the interactive interface** (section 14) for easy querying

### Common Operations:

- **List documents**: `controller.get_document_list()`
- **Get stats**: `controller.get_stats()`
- **Query**: `query_rag_system("your question")`
- **Delete document**: `delete_document("filename.pdf")`
- **Clear all**: `clear_all_data()`

### Tips:

- Adjust `top_k` to control number of retrieved chunks
- Use different templates: 'default', 'detailed', 'concise'
- Filter by specific documents for focused queries
- Save state regularly with `controller.rag.save_state()`
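
To see how `top_k` and the template choice shape what the LLM receives, the prompt can be built by hand. This sketch copies the text of the `concise` template and uses made-up chunks; `build_prompt` is a hypothetical stand-in for the combination of `search` and `create_prompt`.

```python
from typing import List

CONCISE = """Answer briefly using only the context provided.

Context:
{context}

Question: {query}

Brief Answer:"""


def build_prompt(ranked_chunks: List[str], query: str, top_k: int) -> str:
    """Join the top_k best chunks into one context and fill the template."""
    context = "\n\n".join(ranked_chunks[:top_k])
    return CONCISE.format(context=context, query=query)


# Made-up chunks, best match first
ranked_chunks = [
    "Python was first released in 1991.",
    "Python has an extensive standard library.",
    "FAISS supports GPU acceleration.",
]
print(build_prompt(ranked_chunks, "When was Python released?", top_k=2))
```

A larger `top_k` gives the model more context but also more unrelated text, and with 500-word chunks the prompt grows quickly.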


## 🎉 Congratulations!

You now have a fully functional RAG system. Explore the cells above to:
- Upload your own PDFs
- Query documents with natural language
- Manage your document collection
- Visualize system statistics

**Happy querying! 🚀**