# 🚀 RAG Lab: Build Your Own Retrieval Augmented Generation System

## What is RAG?
Retrieval Augmented Generation (RAG) is a technique that combines:
- **Retrieval**: Finding relevant information from a knowledge base
- **Generation**: Using an AI model to generate responses based on retrieved context

## What You'll Learn:
1. How to chunk documents for better retrieval
2. How to store document embeddings in ChromaDB
3. How to retrieve relevant chunks based on user queries
4. How to generate contextual responses using OpenAI
5. How to build an interactive interface with Gradio

## Architecture:
Upload File → Chunk Text → Create Embeddings → Store in ChromaDB → Query → Retrieve → Generate Response

In [None]:
import gdown
from IPython.display import Image

# URL format to download from Google Drive
file_id = "1v9eDyAFqlUv-7_SRdpVp9iOeNMEmBcxs"
url = f"https://drive.google.com/uc?export=download&id={file_id}"

# Download the image
gdown.download(url, "image.png", quiet=False)

# Display the image
Image("image.png")


## Cell 2 - Install Required Dependencies


In [None]:

"""
Installing all necessary packages for our RAG system
"""
! pip install gradio chromadb openai tiktoken python-docx PyPDF2 sentence-transformers


## Cell 3 - Import Libraries


- `import gradio as gr`  
  *Builds interactive web-based user interfaces for ML apps.*

- `import chromadb`  
  *Vector database for storing and querying embeddings.*

- `import openai`  
  *Accesses OpenAI models and APIs.*

- `import os`  
  *Interacts with the operating system (files, paths, env vars).*

- `import io`  
  *Handles streams and in-memory files.*

- `import json`  
  *Parses and generates JSON data.*

- `from typing import List, Dict, Tuple`  
  *Provides type hints for lists, dictionaries, and tuples.*

- `import tiktoken`  
  *Tokenizes text, often used with OpenAI models.*

- `from sentence_transformers import SentenceTransformer`  
  *Generates sentence embeddings with pre-trained models.*

- `import PyPDF2`  
  *Reads and manipulates PDF files.*

- `import docx`  
  *Reads and writes Microsoft Word (.docx) files.*

- `import re`  
  *Performs pattern matching and text manipulation with regular expressions.*

- `from datetime import datetime`  
  *Works with dates and times.*


In [2]:
"""
Importing all required libraries
"""
import gradio as gr
import chromadb
import openai
import os
import io
import json
from typing import List, Dict, Tuple
import tiktoken
from sentence_transformers import SentenceTransformer
import PyPDF2
import docx
import re
from datetime import datetime

print("✅ All libraries imported successfully!")

✅ All libraries imported successfully!




## Cell 4 - Configuration Settings for the RAG System

This section defines and explains the main settings used to configure your Retrieval-Augmented Generation (RAG) system.

---

### Configuration Dictionary

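- **chunk_size**: The number of words in each text chunk (the chunking function in Cell 6 splits on whitespace and counts words, not characters). This controls how much text is embedded and retrieved as one unit.
- **chunk_overlap**: The number of words shared between consecutive chunks so that context is not lost at chunk boundaries. It must be smaller than `chunk_size`.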
- **max_chunks_to_retrieve**: The maximum number of relevant text chunks to retrieve for answering each query.
- **embedding_model**: The name of the sentence transformer model used to convert text into embeddings (numerical vector representations).
- **openai_model**: The specific OpenAI model (such as GPT-4o) used for generating responses.
- **collection_name**: The name of the collection where your documents are stored in the vector database.
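
To see how `chunk_size` and `chunk_overlap` interact, the sketch below estimates how many chunks a sliding window produces for a document of a given length, counted in words as the chunking code in Cell 6 does. It assumes the window advances by `chunk_size - chunk_overlap` per step and stops once a window reaches the end of the text. `estimate_chunk_count` is a hypothetical helper for illustration, not part of the lab code.

```python
import math

def estimate_chunk_count(total_words: int, chunk_size: int, chunk_overlap: int) -> int:
    """Estimate the number of sliding-window chunks (hypothetical helper)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if total_words <= 0:
        return 0
    if total_words <= chunk_size:
        return 1
    step = chunk_size - chunk_overlap  # how far the window moves each time
    return math.ceil((total_words - chunk_size) / step) + 1

# A 2000-word document with the default settings (500 / 50)
print(estimate_chunk_count(2000, 500, 50))  # 5
```

A larger overlap shrinks the step, so the same document yields more chunks and more embeddings to store.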

---

### Embedding Model Initialization

- Loads the specified sentence transformer embedding model (e.g., `all-MiniLM-L6-v2`).
- This model is used to convert text into embeddings, which are essential for semantic search and retrieval in the RAG pipeline.

---

### OpenAI API Key Setup

- Stores your OpenAI API key, which is required to access OpenAI's language models.
- Checks if the API key is set and provides feedback:
  - If the key is present, it confirms successful loading.
  - If the key is missing, it prompts you to set it and provides instructions.

> **Note:**  
> Provide your actual OpenAI API key in place of the `'your-api-key-here'` placeholder.  
> You can set the key as an environment variable using:  
> `os.environ['OPENAI_API_KEY'] = 'your-api-key-here'`
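
The key check described above can be sketched with the standard library alone. This is a minimal example that assumes the key lives in the `OPENAI_API_KEY` environment variable; `load_api_key` is a hypothetical helper name, not something the lab defines.

```python
import os

def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the key from the environment, or an empty string if it is unset."""
    return os.environ.get(env_var, "").strip()

api_key = load_api_key()
if api_key:
    print("API key found in environment")
else:
    print("API key missing; set OPENAI_API_KEY before calling the API")
```

Reading the key from the environment keeps it out of the notebook itself, which matters if you share or commit the file.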


In [20]:
"""
Configuration settings for our RAG system
"""
# Configuration
CONFIG = {
    "chunk_size": 500,
    "chunk_overlap": 50,
    "max_chunks_to_retrieve": 5,
    "embedding_model": "all-MiniLM-L6-v2",
    "openai_model": "gpt-4o",
    "collection_name": "rag_documents"
}

# Initialize the sentence transformer model for embeddings
def load_embedding_model():
    return SentenceTransformer(CONFIG["embedding_model"])

embedding_model = load_embedding_model()

# Read the OpenAI API key from the environment instead of hard-coding it.
# Set it beforehand, e.g. os.environ['OPENAI_API_KEY'] = 'your-api-key-here'
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")
if OPENAI_API_KEY:
    print("✅ OpenAI API key loaded from environment")
else:
    print("⚠️ Please set your OPENAI_API_KEY environment variable")
    print("You can do this by running: os.environ['OPENAI_API_KEY'] = 'your-api-key-here'")


✅ OpenAI API key loaded from environment


## Cell 5 - Functions to Extract Text from Different File Formats

This section describes the logic and purpose of functions used to extract text from various document formats, making your RAG system flexible and file-type agnostic.

---

### PDF Extraction

- Extracts text from PDF files by reading each page and concatenating the text.
- Handles errors gracefully and notifies if extraction fails.

---

### DOCX Extraction

- Extracts text from Microsoft Word (.docx) files by reading each paragraph and joining them.
- Handles errors and prints a message if extraction fails.

---

### TXT Extraction

- Reads plain text files (.txt) and returns their content.
- Handles errors and notifies if extraction fails.

---

### General File Extraction

- Determines file type based on the file extension.
- Calls the appropriate extraction function for PDF, DOCX, or TXT files.
- Returns a helpful message if the file format is unsupported.

---

> **Note:**  
> These functions help your system automatically process and extract text from uploaded files, regardless of whether they're PDFs, Word documents, or plain text files.
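
As an illustration of extension-based dispatch, here is a small sketch using `pathlib`. It differs from the lab code in one deliberate way: it raises an exception for unsupported formats instead of returning a message string, so an error can never be mistaken for document text. `detect_file_type` and `SUPPORTED_EXTENSIONS` are hypothetical names introduced for this example.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt"}

def detect_file_type(file_path: str) -> str:
    """Return the lowercase extension if supported, else raise ValueError."""
    suffix = Path(file_path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file format: {suffix or '(none)'}")
    return suffix

print(detect_file_type("Report.PDF"))  # .pdf
```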


In [5]:
"""
Functions to extract text from different file formats
"""
def extract_text_from_pdf(file_path: str) -> str:
    """Extract text from PDF file"""
    text = ""
    try:
        with open(file_path, 'rb') as file:
            pdf_reader = PyPDF2.PdfReader(file)
            for page in pdf_reader.pages:
                # extract_text() may yield nothing for image-only pages
                text += (page.extract_text() or "") + "\n"
    except Exception as e:
        print(f"Error reading PDF: {e}")
    return text

def extract_text_from_docx(file_path: str) -> str:
    """Extract text from DOCX file"""
    text = ""
    try:
        doc = docx.Document(file_path)
        for paragraph in doc.paragraphs:
            text += paragraph.text + "\n"
    except Exception as e:
        print(f"Error reading DOCX: {e}")
    return text

def extract_text_from_txt(file_path: str) -> str:
    """Extract text from TXT file"""
    try:
        with open(file_path, 'r', encoding='utf-8') as file:
            return file.read()
    except Exception as e:
        print(f"Error reading TXT: {e}")
        return ""

def extract_text_from_file(file_path: str) -> str:
    """Extract text based on file extension"""
    file_extension = file_path.lower().split('.')[-1]

    if file_extension == 'pdf':
        return extract_text_from_pdf(file_path)
    elif file_extension == 'docx':
        return extract_text_from_docx(file_path)
    elif file_extension == 'txt':
        return extract_text_from_txt(file_path)
    else:
        return "Unsupported file format. Please upload PDF, DOCX, or TXT files."

## Cell 6 - Functions to Split Text into Chunks for Better Retrieval

This section explains the logic and purpose of functions that prepare long texts for efficient retrieval by splitting them into manageable, overlapping chunks.

---

### Text Cleaning

- **Purpose:**  
  Cleans and normalizes the input text by removing extra whitespace and trimming leading/trailing spaces.
- **Benefit:**  
  Ensures that the text is tidy and consistent before further processing.
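
A quick illustration of the whitespace normalization described above, using only the standard library. The sample string is made up for this example.

```python
import re

raw = "  RAG   combines\n\nretrieval\tand   generation.  "
# Collapse every run of whitespace (spaces, tabs, newlines) into one space
cleaned = re.sub(r"\s+", " ", raw).strip()
print(cleaned)  # RAG combines retrieval and generation.
```

Note that this also removes paragraph breaks, so any later step that relies on paragraph structure would need a different cleaning rule.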

---

### Splitting Text into Overlapping Chunks

- **Purpose:**  
  Splits the cleaned text into overlapping chunks based on a specified chunk size and overlap.
- **How it works:**  
  - The text is split into words.
  - Chunks are created by moving a sliding window across the words, with each chunk overlapping the previous one by a set number of words.
  - Each chunk is stored with metadata: the chunk text, a unique chunk ID, word count, and character count.
- **Benefit:**  
  Overlapping chunks help maintain context between segments, which improves retrieval accuracy in downstream tasks.
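
The sliding window is easier to picture on a tiny input. The sketch below is a toy version of the idea, with a window of 4 words and an overlap of 1; `sliding_window` is a hypothetical helper that returns plain word lists without the metadata the lab function attaches.

```python
from typing import List

def sliding_window(words: List[str], size: int, overlap: int) -> List[List[str]]:
    """Toy word-level sliding window (illustrative only)."""
    step = size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than size")
    windows = []
    for start in range(0, len(words), step):
        windows.append(words[start:start + size])
        if start + size >= len(words):  # this window reached the end
            break
    return windows

words = "a b c d e f g h i j".split()
for window in sliding_window(words, size=4, overlap=1):
    print(window)
# ['a', 'b', 'c', 'd']
# ['d', 'e', 'f', 'g']
# ['g', 'h', 'i', 'j']
```

Each window starts with the last word of the previous one, which is the overlap at work.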

---

### Chunk Statistics

- **Purpose:**  
  Calculates and returns useful statistics about the generated chunks, such as:
  - Total number of chunks
  - Average words and characters per chunk
  - Minimum and maximum words per chunk
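
For a feel of what these statistics look like, here is a small sketch over made-up word counts. The numbers are hypothetical and only show the shape of the result.

```python
from statistics import mean

word_counts = [500, 500, 320]  # hypothetical per-chunk word counts

stats = {
    "total_chunks": len(word_counts),
    "avg_words_per_chunk": mean(word_counts),
    "min_words": min(word_counts),
    "max_words": max(word_counts),
}
print(stats)
```

A minimum far below the maximum usually just reflects a short final chunk at the end of the document.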

---

> **Note:**  
> Chunking keeps each retrieved passage small enough to fit in the prompt and focused enough to match a specific query. Overlap reduces the chance that a relevant sentence is cut in half at a chunk boundary.


In [6]:
"""
Functions to split text into chunks for better retrieval
"""
def clean_text(text: str) -> str:
    """Clean and normalize text"""
    # Remove extra whitespace and normalize
    text = re.sub(r'\s+', ' ', text)
    text = text.strip()
    return text

def split_text_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> List[Dict]:
    """Split text into overlapping chunks"""
    text = clean_text(text)
    words = text.split()
    chunks = []

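    # Guard against a non-positive step, which would break or skip the loop
    step = chunk_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")

    for i in range(0, len(words), step):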
        chunk_words = words[i:i + chunk_size]
        chunk_text = ' '.join(chunk_words)

        chunk_info = {
            "text": chunk_text,
            "chunk_id": f"chunk_{len(chunks)}",
            "word_count": len(chunk_words),
            "char_count": len(chunk_text)
        }
        chunks.append(chunk_info)

        # Break if we've processed all words
        if i + chunk_size >= len(words):
            break

    return chunks

def get_chunk_statistics(chunks: List[Dict]) -> Dict:
    """Get statistics about the chunks"""
    if not chunks:
        return {}

    word_counts = [chunk["word_count"] for chunk in chunks]
    char_counts = [chunk["char_count"] for chunk in chunks]

    return {
        "total_chunks": len(chunks),
        "avg_words_per_chunk": sum(word_counts) / len(word_counts),
        "avg_chars_per_chunk": sum(char_counts) / len(char_counts),
        "min_words": min(word_counts),
        "max_words": max(word_counts)
    }

## Cell 7 - ChromaDB Vector Database Setup

This section outlines the class and methods used to manage your vector database (ChromaDB) for storing and retrieving document embeddings in a RAG system.

---

### RAGVectorStore Class

#### Initialization

- **Purpose:**  
  Sets up a ChromaDB client and initializes a collection for storing document chunks and their embeddings.
- **Features:**  
  - Deletes any existing collection with the same name for a fresh start.
  - Creates a new collection with metadata describing its purpose.
  - Prints a success or error message based on the outcome.

#### Adding Chunks

- **Purpose:**  
  Adds processed text chunks and their corresponding embeddings to the vector store.
- **Features:**  
  - Prepares data (IDs, documents, metadata) for insertion.
  - Stores each chunk with metadata such as word count, character count, and timestamp.
  - Prints how many chunks were added or an error message if the process fails.
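
The data preparation step amounts to building parallel lists in which position `i` of every list describes the same chunk. The sketch below shows that alignment without touching a database. It also prefixes IDs with a document name, since the lab's chunk IDs restart at `chunk_0` for every document and would repeat on a second upload. `prepare_records` and the `source` metadata field are hypothetical additions for this example.

```python
from datetime import datetime, timezone
from typing import Dict, List

def prepare_records(doc_name: str, chunks: List[Dict],
                    embeddings: List[List[float]]) -> Dict[str, list]:
    """Build aligned ID/document/metadata/embedding lists (hypothetical helper)."""
    if len(chunks) != len(embeddings):
        raise ValueError("each chunk needs exactly one embedding")
    timestamp = datetime.now(timezone.utc).isoformat()
    return {
        # Prefixing with the document name keeps IDs distinct across uploads
        "ids": [f"{doc_name}:{chunk['chunk_id']}" for chunk in chunks],
        "documents": [chunk["text"] for chunk in chunks],
        "metadatas": [
            {"source": doc_name, "word_count": chunk["word_count"], "timestamp": timestamp}
            for chunk in chunks
        ],
        "embeddings": embeddings,
    }

chunks = [
    {"chunk_id": "chunk_0", "text": "first chunk", "word_count": 2},
    {"chunk_id": "chunk_1", "text": "second chunk", "word_count": 2},
]
records = prepare_records("notes.txt", chunks, [[0.1, 0.2], [0.3, 0.4]])
print(records["ids"])  # ['notes.txt:chunk_0', 'notes.txt:chunk_1']
```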

#### Searching for Similar Chunks

- **Purpose:**  
  Searches the vector store for chunks most similar to a given query embedding.
- **Features:**  
  - Returns the top N most relevant chunks, along with their metadata and similarity distances.
  - Handles errors gracefully and returns empty results if the search fails.
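
Conceptually, a similarity search ranks stored vectors by their distance to the query vector and keeps the closest N. The brute-force sketch below shows that idea with toy 2-D vectors and cosine distance. It is a stand-in for illustration only: ChromaDB uses an approximate nearest-neighbor index rather than comparing against every vector, and the function names here are hypothetical. The sketch assumes no vector is all zeros.

```python
import math
from typing import Dict, List, Tuple

def cosine_distance(a: List[float], b: List[float]) -> float:
    """1 - cosine similarity; 0 means same direction (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def top_n(query: List[float], stored: Dict[str, List[float]], n: int = 2) -> List[Tuple[str, float]]:
    """Return the n stored IDs closest to the query, nearest first."""
    scored = [(chunk_id, cosine_distance(query, vec)) for chunk_id, vec in stored.items()]
    return sorted(scored, key=lambda pair: pair[1])[:n]

stored = {
    "chunk_0": [1.0, 0.0],
    "chunk_1": [0.7, 0.7],
    "chunk_2": [0.0, 1.0],
}
for chunk_id, distance in top_n([1.0, 0.1], stored):
    print(chunk_id, round(distance, 3))
```

The query points almost exactly along `chunk_0`, so that chunk comes back first with a distance near zero.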

#### Collection Information

- **Purpose:**  
  Provides information about the current collection.
- **Features:**  
  - Returns the collection's name, the number of stored documents, and its status (active or empty).
  - Handles errors and reports the status accordingly.

---

### Usage

- The `RAGVectorStore` class is initialized with the configured collection name, setting up your vector database for immediate use in retrieval-augmented workflows.


In [7]:

"""
ChromaDB vector database setup and management
"""
class RAGVectorStore:
    def __init__(self, collection_name: str = "rag_documents"):
        """Initialize ChromaDB client and collection"""
        try:
            self.client = chromadb.Client()
            self.collection_name = collection_name

            # Delete existing collection if it exists (for fresh start)
            try:
                self.client.delete_collection(collection_name)
            except Exception:
                # Collection did not exist yet, so there is nothing to delete
                pass

            # Create new collection
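            self.collection = self.client.create_collection(
                name=collection_name,
                # "hnsw:space": "cosine" makes the collection report cosine distances,
                # so the "1 - distance" similarity shown in the pipeline is meaningful
                metadata={"description": "RAG document chunks", "hnsw:space": "cosine"}
            )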
            print(f"✅ ChromaDB collection '{collection_name}' created successfully")

        except Exception as e:
            print(f"❌ Error initializing ChromaDB: {e}")
            self.collection = None

    def add_chunks(self, chunks: List[Dict], embeddings: List[List[float]]) -> bool:
        """Add chunks and their embeddings to the vector store"""
        if not self.collection:
            return False

        try:
            # Prepare data for ChromaDB
            ids = [chunk["chunk_id"] for chunk in chunks]
            documents = [chunk["text"] for chunk in chunks]
            metadatas = [
                {
                    "word_count": chunk["word_count"],
                    "char_count": chunk["char_count"],
                    "timestamp": datetime.now().isoformat()
                }
                for chunk in chunks
            ]

            # Add to collection
            self.collection.add(
                embeddings=embeddings,
                documents=documents,
                metadatas=metadatas,
                ids=ids
            )

            print(f"✅ Added {len(chunks)} chunks to vector store")
            return True

        except Exception as e:
            print(f"❌ Error adding chunks to vector store: {e}")
            return False

    def search(self, query: str, query_embedding: List[float], n_results: int = 5) -> Dict:
        """Search for similar chunks"""
        if not self.collection:
            return {"chunks": [], "distances": []}

        try:
            results = self.collection.query(
                query_embeddings=[query_embedding],
                n_results=n_results
            )

            # Format results
            chunks = []
            for i, doc in enumerate(results['documents'][0]):
                chunk_info = {
                    "text": doc,
                    "metadata": results['metadatas'][0][i],
                    "distance": results['distances'][0][i],
                    "id": results['ids'][0][i]
                }
                chunks.append(chunk_info)

            return {
                "chunks": chunks,
                "distances": results['distances'][0]
            }

        except Exception as e:
            print(f"❌ Error searching vector store: {e}")
            return {"chunks": [], "distances": []}

    def get_collection_info(self) -> Dict:
        """Get information about the collection"""
        if not self.collection:
            return {}

        try:
            count = self.collection.count()
            return {
                "name": self.collection_name,
                "document_count": count,
                "status": "active" if count > 0 else "empty"
            }
        except Exception:
            return {"status": "error"}

# Initialize vector store
vector_store = RAGVectorStore(CONFIG["collection_name"])

✅ ChromaDB collection 'rag_documents' created successfully


## Cell 8 - Functions to Generate Embeddings for Text Chunks

This section explains the functions responsible for converting text or text chunks into numerical vector representations (embeddings), which are essential for semantic search and retrieval in a RAG system.

---

### Batch Embedding Generation

- **Purpose:**  
  Generates embeddings for a list of text chunks at once.
- **How it works:**  
  - Uses the preloaded sentence transformer model to encode each text chunk.
  - Returns a list of embeddings, where each embedding is a list of floating-point numbers.
- **Benefit:**  
  Efficiently processes multiple chunks in a single call, speeding up large-scale document processing.

---

### Single Embedding Generation

- **Purpose:**  
  Generates an embedding for a single text input.
- **How it works:**  
  - Encodes the input text using the same sentence transformer model.
  - Returns the embedding as a list of floating-point numbers.
- **Benefit:**  
  Useful for generating an embedding for a user query or a single document segment.

---

> **Note:**  
> Embeddings capture the semantic meaning of text, allowing for effective similarity search and retrieval in your vector database.


In [8]:

"""
Functions to generate embeddings for text chunks
"""
def generate_embeddings(texts: List[str]) -> List[List[float]]:
    """Generate embeddings for a list of texts"""
    try:
        embeddings = embedding_model.encode(texts, convert_to_tensor=False)
        return embeddings.tolist()
    except Exception as e:
        print(f"❌ Error generating embeddings: {e}")
        return []

def generate_single_embedding(text: str) -> List[float]:
    """Generate embedding for a single text"""
    try:
        embedding = embedding_model.encode([text], convert_to_tensor=False)
        return embedding[0].tolist()
    except Exception as e:
        print(f"❌ Error generating single embedding: {e}")
        return []


## Cell 9 - OpenAI Integration for Response

This section describes how the system integrates with OpenAI's Chat Completions API to generate answers based on retrieved context chunks.

---

### Purpose

- Generates a context-aware response to a user's query by passing the retrieved chunks to an OpenAI chat model (such as GPT-4o).

---

### How It Works

- **API Key Check:**  
  Ensures that the OpenAI API key is set before proceeding. If not, it returns an error message prompting the user to configure the key.
- **Context Preparation:**  
  Combines the retrieved context chunks into a formatted string, clearly labeling each chunk for reference.
- **Prompt Construction:**  
  - **System Prompt:** Instructs the model to act as a helpful assistant, use the provided context, and cite which chunks were used in the answer.
  - **User Prompt:** Presents the combined context and the user's question, asking for a comprehensive answer.
- **OpenAI Client Initialization:**  
  Initializes the OpenAI client with the provided API key.
- **API Call:**  
  Sends the constructed prompt to the Chat Completions API, specifying the model, message structure, maximum tokens, and temperature for response creativity.
- **Response Extraction:**  
  Extracts and returns the generated answer from the API's response.
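
The context preparation and prompt construction steps can be tried without calling the API. The sketch below builds the message list only; `build_messages` is a hypothetical helper, and its system prompt is a shortened paraphrase rather than the lab's exact wording.

```python
from typing import Dict, List

def build_messages(query: str, context_chunks: List[Dict]) -> List[Dict[str, str]]:
    """Assemble chat messages from retrieved chunks (no API call is made)."""
    # Label each chunk so the model can cite it by number
    context = "\n\n".join(
        f"Chunk {i}:\n{chunk['text']}" for i, chunk in enumerate(context_chunks, start=1)
    )
    system_prompt = "Answer using only the provided context and cite the chunks you used."
    user_prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages(
    "What is RAG?",
    [{"text": "RAG combines retrieval and generation."}],
)
print(messages[1]["content"])
```

Printing the user message before sending it is a cheap way to check exactly what context the model will see.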

---

> **Note:**  
> Response generation requires a valid OpenAI API key. The system prompt instructs the model to answer from the retrieved chunks and to say so clearly when they do not contain enough information.


In [17]:
"""
OpenAI integration for response generation using Chat Completions API
"""
from openai import OpenAI

def generate_response(query: str, context_chunks: List[Dict], model: str = "gpt-4o") -> str:
    """Generate response using OpenAI Chat Completions API with retrieved context."""

    if not OPENAI_API_KEY:
        return "❌ OpenAI API key not set. Please configure your API key."

    try:
        # Prepare context from retrieved chunks
        context = "\n\n".join([f"Chunk {i+1}:\n{chunk['text']}" for i, chunk in enumerate(context_chunks)])

        # Create the prompt
        system_prompt = (
            "You are a helpful AI assistant that answers questions based on the provided context. "
            "Use the context information to provide accurate and relevant answers. "
            "If the context doesn't contain enough information to answer the question, say so clearly. "
            "Always cite which chunks you used in your response."
        )

        user_prompt = f"""Context:
{context}

Question: {query}

Please provide a comprehensive answer based on the context above."""

        # Initialize OpenAI client
        client = OpenAI(api_key=OPENAI_API_KEY)

        # Call the Chat Completions API (current standard API)
        response = client.chat.completions.create(
            model=model,  # e.g., "gpt-4o", "gpt-4", "gpt-3.5-turbo"
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt}
            ],
            max_tokens=1000,
            temperature=0.7
        )

        # Extract and return the model's answer
        return response.choices[0].message.content

    except Exception as e:
        return f"❌ Error generating response: {str(e)}"

## Cell 10 - Main RAG Pipeline

This section describes the core pipeline class that ties together all components of the Retrieval-Augmented Generation (RAG) system, handling document processing, storage, retrieval, and response generation.

---

### RAGPipeline Class

#### Initialization

- **Purpose:**  
  Sets up the pipeline with access to the vector store and initializes containers for document chunks and statistics.

---

#### Document Processing

- **Purpose:**  
  Handles the entire workflow for uploading and processing a document.
- **Steps:**
  1. **Extracts text** from the uploaded file using the appropriate extraction function.
  2. **Splits the text into chunks** using the configured chunk size and overlap.
  3. **Generates embeddings** for each chunk.
  4. **Stores the chunks and embeddings** in the ChromaDB vector database.
  5. **Calculates and stores chunk statistics** for reporting and optimization.
- **Feedback:**  
  Returns a success status, a detailed status message with statistics, and the chunk stats. If any step fails, it returns an error message.

---

#### Query Processing

- **Purpose:**  
  Handles user queries by retrieving relevant information from the stored document chunks and generating a context-aware response.
- **Steps:**
  1. **Generates an embedding** for the user's question.
  2. **Searches the vector store** for the most relevant chunks using the query embedding.
  3. **Generates a response** using the OpenAI model, providing the retrieved chunks as context.
  4. **Formats the retrieved chunks** for display, including similarity scores and brief previews.
  5. **Creates a summary** of the retrieved chunks for transparency.
- **Feedback:**  
  Returns the generated answer, a list of retrieved chunk details, and a formatted summary. Handles errors gracefully and provides clear messages if any step fails.
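
How a distance becomes a similarity score depends on the distance metric the collection uses. The sketch below covers two common cases and is an illustration under stated assumptions, not a description of the lab's configuration: for cosine distance, similarity is `1 - d`; for squared L2 distance between unit-length vectors, the identity `||a - b||^2 = 2 - 2*cos` gives `cos = 1 - d/2`. Both function names are hypothetical.

```python
import math

def similarity_from_cosine_distance(distance: float) -> float:
    """Cosine distance is d = 1 - cos, so cos = 1 - d."""
    return 1.0 - distance

def similarity_from_squared_l2(distance: float) -> float:
    """For unit-length vectors, ||a - b||^2 = 2 - 2*cos, so cos = 1 - d/2."""
    return 1.0 - distance / 2.0

a = [1.0, 0.0]
b = [0.5, math.sqrt(3) / 2]  # unit vector at 60 degrees from a

cos = sum(x * y for x, y in zip(a, b))
cosine_distance = 1.0 - cos
squared_l2 = sum((x - y) ** 2 for x, y in zip(a, b))

print(round(similarity_from_cosine_distance(cosine_distance), 3))  # 0.5
print(round(similarity_from_squared_l2(squared_l2), 3))            # 0.5
```

Applying `1 - d` to a squared L2 distance would give 0.0 for this pair instead of 0.5, so it is worth confirming which metric your collection reports before interpreting the scores.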

---

### Usage

- The `RAGPipeline` class is instantiated and ready for use, enabling seamless document ingestion and intelligent, context-driven question answering.

---

> **Note:**  
> This pipeline coordinates extraction, chunking, embedding, storage, retrieval, and generation. It returns the retrieved chunks alongside each answer so you can inspect exactly what context the model was given.


In [18]:
"""
Main RAG pipeline that orchestrates the entire process
"""
class RAGPipeline:
    def __init__(self):
        self.vector_store = vector_store
        self.chunks = []
        self.chunk_stats = {}

    def process_document(self, file_path: str) -> Tuple[bool, str, Dict]:
        """Process uploaded document and store in vector database"""
        try:
            # Extract text
            text = extract_text_from_file(file_path)
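            if not text or not text.strip():
                # Avoid returning an empty message when extraction yields nothing
                return False, "❌ No text could be extracted from the document.", {}
            if text.startswith("Unsupported"):
                return False, text, {}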

            # Create chunks
            self.chunks = split_text_into_chunks(
                text,
                CONFIG["chunk_size"],
                CONFIG["chunk_overlap"]
            )

            if not self.chunks:
                return False, "No chunks created from document", {}

            # Generate embeddings
            chunk_texts = [chunk["text"] for chunk in self.chunks]
            embeddings = generate_embeddings(chunk_texts)

            if not embeddings:
                return False, "Failed to generate embeddings", {}

            # Store in vector database
            success = self.vector_store.add_chunks(self.chunks, embeddings)

            if not success:
                return False, "Failed to store chunks in vector database", {}

            # Get statistics
            self.chunk_stats = get_chunk_statistics(self.chunks)

            status_message = f"""✅ Document processed successfully!

📊 **Processing Statistics:**
- Total chunks created: {self.chunk_stats['total_chunks']}
- Average words per chunk: {self.chunk_stats['avg_words_per_chunk']:.1f}
- Average characters per chunk: {self.chunk_stats['avg_chars_per_chunk']:.1f}
- Word count range: {self.chunk_stats['min_words']} - {self.chunk_stats['max_words']} words

🗃️ **Database Status:**
- Chunks stored in ChromaDB: {len(self.chunks)}
- Embeddings generated: ✅
- Ready for queries: ✅"""

            return True, status_message, self.chunk_stats

        except Exception as e:
            return False, f"❌ Error processing document: {str(e)}", {}

    def query(self, question: str) -> Tuple[str, List[Dict], str]:
        """Process query and return response with retrieved chunks"""
        if not self.chunks:
            return "❌ No document loaded. Please upload a document first.", [], ""

        try:
            # Generate query embedding
            query_embedding = generate_single_embedding(question)
            if not query_embedding:
                return "❌ Failed to generate query embedding.", [], ""

            # Search for relevant chunks
            search_results = self.vector_store.search(
                question,
                query_embedding,
                CONFIG["max_chunks_to_retrieve"]
            )

            retrieved_chunks = search_results["chunks"]
            if not retrieved_chunks:
                return "❌ No relevant chunks found.", [], ""

            # Generate response
            response = generate_response(question, retrieved_chunks, CONFIG["openai_model"])

            # Format retrieved chunks for display
            chunks_display = []
            for i, chunk in enumerate(retrieved_chunks):
                chunks_display.append({
                    "chunk_number": i + 1,
                    "text": chunk["text"][:200] + "..." if len(chunk["text"]) > 200 else chunk["text"],
                    "full_text": chunk["text"],
                    "similarity_score": f"{1 - chunk['distance']:.3f}",
                    "word_count": chunk["metadata"].get("word_count", "N/A")
                })

            # Create chunks summary
            chunks_summary = f"""🔍 **Retrieved {len(retrieved_chunks)} relevant chunks:**

""" + "\n".join([
    f"**Chunk {chunk['chunk_number']}** (Similarity: {chunk['similarity_score']}):\n{chunk['text']}\n"
    for chunk in chunks_display
])

            return response, chunks_display, chunks_summary

        except Exception as e:
            return f"❌ Error processing query: {str(e)}", [], ""

# Initialize RAG pipeline
rag_pipeline = RAGPipeline()

## Cell 11 - Gradio Interface Functions

This section explains the functions that power the user interface for your RAG system using Gradio, enabling file uploads, question answering, and system status checks.

---

### File Upload and Processing

- **Purpose:**  
  Handles the uploading and processing of documents through the Gradio interface.
- **How it works:**  
  - Checks if a file is uploaded.
  - Processes the document using the RAG pipeline.
  - Formats and displays document processing statistics and a preview of the first few chunks.
  - Provides feedback if an error occurs or if no file is uploaded.

---

### Query Processing

- **Purpose:**  
  Handles user-submitted questions.
- **How it works:**  
  - Checks if the question is non-empty.
  - Passes the question to the RAG pipeline for retrieval and response generation.
  - Displays the answer, a summary of the retrieved chunks, and a timestamp for when the query was processed.

---

### System Status

- **Purpose:**  
  Provides a real-time overview of the system's configuration and readiness.
- **How it works:**  
  - Retrieves information about the vector database, such as status and document count.
  - Displays configuration details, including the embedding model, OpenAI model, chunk size, chunk overlap, and retrieval settings.
  - Indicates whether the system is ready, based on whether an OpenAI API key has been set (the key itself is not validated).

---

> **Note:**  
> These functions enable a smooth and interactive user experience, allowing users to upload documents, ask questions, and monitor the system's health directly from the Gradio web interface.


In [11]:

"""
Gradio interface functions
"""
def upload_and_process_file(file):
    """Handle file upload and processing"""
    if file is None:
        return "❌ Please upload a file first.", "{}", ""

    try:
        success, message, stats = rag_pipeline.process_document(file.name)

        # Format stats for display
        stats_json = json.dumps(stats, indent=2) if stats else "{}"

        # Create chunks preview
        chunks_preview = ""
        if success and rag_pipeline.chunks:
            chunks_preview = "📄 **Document Chunks Preview:**\n\n"
            for i, chunk in enumerate(rag_pipeline.chunks[:3]):  # Show first 3 chunks
                preview_text = chunk["text"][:150] + "..." if len(chunk["text"]) > 150 else chunk["text"]
                chunks_preview += f"**Chunk {i+1}:**\n{preview_text}\n\n"

            if len(rag_pipeline.chunks) > 3:
                chunks_preview += f"... and {len(rag_pipeline.chunks) - 3} more chunks"

        return message, stats_json, chunks_preview

    except Exception as e:
        return f"❌ Error: {str(e)}", "{}", ""

def process_query(question):
    """Handle user queries"""
    if not question or not question.strip():
        return "❌ Please enter a question.", "", ""

    response, chunks, chunks_summary = rag_pipeline.query(question.strip())

    return response, chunks_summary, f"Query processed at {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}"

def get_system_status():
    """Get current system status"""
    db_info = vector_store.get_collection_info()

    status = f"""🖥️ **System Status:**

📊 **Database:** {db_info.get('status', 'unknown').title()}
📁 **Documents in DB:** {db_info.get('document_count', 0)}
🧠 **Embedding Model:** {CONFIG['embedding_model']}
🤖 **OpenAI Model:** {CONFIG['openai_model']}
📏 **Chunk Size:** {CONFIG['chunk_size']} words
🔄 **Chunk Overlap:** {CONFIG['chunk_overlap']} words
🎯 **Max Retrieval:** {CONFIG['max_chunks_to_retrieve']} chunks

⚙️ **Configuration Ready:** {'✅' if OPENAI_API_KEY else '❌ (API Key Required)'}"""

    return status

## Cell 12 - Creating the Gradio Interface

This section describes the layout and functionality of the Gradio-based web interface for your RAG system, enabling users to upload documents, process them, ask questions, and view results in an interactive, user-friendly way.


In [13]:
"""
Create the Gradio interface
"""
def create_gradio_interface():
    """Create and configure the Gradio interface"""

    with gr.Blocks(
        title="🚀 RAG Lab - Retrieval Augmented Generation",
        theme=gr.themes.Soft(),
        css="""
        .gradio-container {
            max-width: 1200px !important;
        }
        .panel {
            border: 1px solid #ddd;
            border-radius: 8px;
            padding: 15px;
            margin: 10px 0;
        }
        """
    ) as app:

        # Header
        gr.Markdown("""
        # 🚀 RAG Lab - Retrieval Augmented Generation System

        **Learn and experiment with RAG technology!** Upload documents, explore chunking strategies, and see how retrieval-augmented generation works in real-time.

        ---
        """)

        # System Status
        with gr.Row():
            with gr.Column():
                status_display = gr.Markdown(get_system_status())
                gr.Button("🔄 Refresh Status").click(
                    fn=get_system_status,
                    outputs=status_display
                )

        gr.Markdown("---")

        # Main Interface
        with gr.Row():
            # Left Column - Document Upload and Processing
            with gr.Column(scale=1):
                gr.Markdown("## 📤 **Step 1: Upload Document**")

                file_upload = gr.File(
                    label="Upload Document (PDF, DOCX, TXT)",
                    file_types=[".pdf", ".docx", ".txt"]
                )

                process_btn = gr.Button("🔄 Process Document", variant="primary")

                gr.Markdown("## 📊 **Processing Results**")

                processing_status = gr.Markdown("Upload a document to see processing results...")

                with gr.Accordion("📈 Chunk Statistics", open=False):
                    chunk_stats = gr.Code(
                        label="Statistics (JSON)",
                        language="json",
                        value="{}"
                    )

                with gr.Accordion("📄 Chunks Preview", open=False):
                    chunks_preview = gr.Markdown("Process a document to see chunks preview...")

            # Right Column - Query Interface
            with gr.Column(scale=1):
                gr.Markdown("## 🤔 **Step 2: Ask Questions**")

                query_input = gr.Textbox(
                    label="Enter your question",
                    placeholder="What would you like to know about the document?",
                    lines=2
                )

                query_btn = gr.Button("🔍 Search & Generate Answer", variant="secondary")

                gr.Markdown("## 💬 **AI Response**")

                response_output = gr.Markdown("Ask a question to get an AI-generated response...")

                gr.Markdown("## 🔍 **Retrieved Context**")

                with gr.Accordion("📋 Retrieved Chunks", open=True):
                    retrieved_chunks = gr.Markdown("Retrieved chunks will appear here...")

                query_timestamp = gr.Markdown("")

        # Instructions
        gr.Markdown("""
        ---
        ## 📚 **How to Use This RAG Lab:**

        1. **Upload a Document**: Choose a PDF, DOCX, or TXT file containing the information you want to query
        2. **Process the Document**: Click "Process Document" to chunk the text and create embeddings
        3. **Review the Results**: Check the statistics and preview the chunks created
        4. **Ask Questions**: Enter questions about the document content
        5. **Explore Results**: See both the AI response and the retrieved chunks that informed the answer

        ## 🔧 **What's Happening Behind the Scenes:**

        - **Text Chunking**: Documents are split into overlapping chunks for better retrieval
        - **Embeddings**: Each chunk is converted to a vector representation using SentenceTransformers
        - **Vector Storage**: Chunks and embeddings are stored in ChromaDB for fast similarity search
        - **Retrieval**: User queries are embedded and matched against stored chunks
        - **Generation**: OpenAI generates responses using retrieved chunks as context

        ## ⚙️ **Current Configuration:**
        - Chunk Size: {chunk_size} words
        - Overlap: {chunk_overlap} words
        - Max Retrieved: {max_retrieve} chunks
        - Embedding Model: {embed_model}
        - Generation Model: {gen_model}
        """.format(
            chunk_size=CONFIG["chunk_size"],
            chunk_overlap=CONFIG["chunk_overlap"],
            max_retrieve=CONFIG["max_chunks_to_retrieve"],
            embed_model=CONFIG["embedding_model"],
            gen_model=CONFIG["openai_model"]
        ))

        # Event Handlers
        process_btn.click(
            fn=upload_and_process_file,
            inputs=[file_upload],
            outputs=[processing_status, chunk_stats, chunks_preview]
        )

        query_btn.click(
            fn=process_query,
            inputs=[query_input],
            outputs=[response_output, retrieved_chunks, query_timestamp]
        )

        # Allow Enter key to submit query
        query_input.submit(
            fn=process_query,
            inputs=[query_input],
            outputs=[response_output, retrieved_chunks, query_timestamp]
        )

    return app


# ==========================================
# CELL 13: Launch the Application
# ==========================================

# Test Document
Download the test document from: [Test Document Link](https://drive.google.com/file/d/1izzVVeydCl3UAqZvIL8Vkx1DYM77ZjSp/view?usp=sharing)

> **Note:** This is the same document that failed to process correctly in Lab 2.1. The implementation in this lab processes it and retrieves from it successfully.

In [14]:

"""
Launch the Gradio application
"""
if __name__ == "__main__":
    # Create and launch the interface
    app = create_gradio_interface()

    print("""
    🚀 Starting RAG Lab...

    📋 **Pre-launch Checklist:**
    ✅ ChromaDB initialized
    ✅ Embedding model loaded
    ✅ Gradio interface created
    {} OpenAI API configured

    📡 **Launching Application...**
    """.format("✅" if OPENAI_API_KEY else "⚠️"))

    # Launch with public sharing enabled for Colab
    app.launch(
        share=True,  # Creates public link for Colab
        server_name="0.0.0.0",  # Allow external connections
        server_port=7860,  # Default Gradio port
        show_error=True,  # Show detailed errors
        quiet=False  # Show startup messages
    )


    🚀 Starting RAG Lab...
    
    📋 **Pre-launch Checklist:**
    ✅ ChromaDB initialized
    ✅ Embedding model loaded
    ✅ Gradio interface created
    ✅ OpenAI API configured
    
    📡 **Launching Application...**
    
Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://d9d76c7a7cba7609f4.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
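
The startup log above suggests passing `debug=True` to `launch()` so that errors show up in the Colab cell output. One way to handle this is to choose the launch options based on where the notebook is running. The sketch below assumes that checking for the `google.colab` module is a good enough heuristic for detecting Colab; `running_in_colab` and `build_launch_kwargs` are hypothetical helpers, while `share`, `debug`, `server_name`, `server_port`, `show_error`, and `quiet` are existing `launch()` parameters.

```python
import sys
from typing import Any, Dict


def running_in_colab() -> bool:
    """Heuristic: the google.colab package is already imported inside a Colab runtime."""
    return "google.colab" in sys.modules


def build_launch_kwargs(in_colab: bool, port: int = 7860) -> Dict[str, Any]:
    """Hypothetical helper that picks keyword arguments for app.launch()."""
    kwargs: Dict[str, Any] = {"show_error": True, "quiet": False}
    if in_colab:
        # Colab needs a public share link; debug=True surfaces errors in the cell output
        kwargs.update(share=True, debug=True)
    else:
        # Local run: bind to localhost on a fixed port, no public link
        kwargs.update(share=False, server_name="127.0.0.1", server_port=port)
    return kwargs


colab_kwargs = build_launch_kwargs(in_colab=True)
local_kwargs = build_launch_kwargs(in_colab=False)
print(colab_kwargs)
print(local_kwargs)

# Usage, assuming `app` is the object returned by create_gradio_interface():
# app.launch(**build_launch_kwargs(running_in_colab()))
```

Note that `debug=True` keeps the cell running until it is interrupted, which is usually what you want while experimenting in Colab.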
