# RAG (Retrieval-Augmented Generation) Step-by-Step Tutorial

This notebook provides a comprehensive, step-by-step explanation of how Retrieval-Augmented Generation works in our All-About-RAG system.

## What is RAG?

RAG combines the power of large language models with external knowledge retrieval. Instead of relying only on the model's training data, RAG:

1. **Retrieves** relevant information from a knowledge base
2. **Augments** the query with this context
3. **Generates** accurate, grounded responses
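
The three steps above can be sketched with a toy example. This is a minimal illustration, not the implementation used in this notebook: word overlap stands in for vector search, and `generate` is a hypothetical stub where a real system would call an LLM.

```python
def retrieve(query, knowledge_base, top_k=1):
    """Rank passages by shared words (a toy stand-in for vector search)."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(p.lower().split())), p) for p in knowledge_base]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored[:top_k]]


def augment(query, passages):
    """Prepend the retrieved passages to the question."""
    context = "\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {query}"


def generate(prompt):
    """Hypothetical stub: a real system would send the prompt to an LLM here."""
    return f"[LLM would answer using a prompt of {len(prompt)} characters]"


knowledge_base = [
    "FAISS is a library for similarity search over dense vectors.",
    "Chunk overlap keeps context that spans a chunk boundary.",
]
query = "What is FAISS used for?"
passages = retrieve(query, knowledge_base)
prompt = augment(query, passages)
print(generate(prompt))
```

The rest of the notebook replaces each toy function with a real component: embeddings and FAISS for retrieval, and an LLM for generation.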

## Pipeline Overview

Our RAG system follows this pipeline:
1. **Document Loading** - Load documents from various formats
2. **Text Chunking** - Split documents into manageable pieces
3. **Embedding Generation** - Convert text to numerical vectors
4. **Vector Storage** - Store vectors in a searchable database
5. **Query Processing** - Handle user questions with retrieval + generation

## Step 1: Setup and Environment

First, let's set up our environment and import the necessary components.

In [3]:
# Import all RAG components
import sys
import os
sys.path.append(os.path.abspath('..'))  # Add project root to path

from src.rag.data_loader import load_all_documents
from src.rag.chunking import ChunkingPipeline
from src.rag.embedding import EmbeddingPipeline
from src.rag.vectorstore import FaissVectorStore
from src.rag.search import RAGSearch

# Additional imports for demonstration
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

print("✅ All RAG components imported successfully!")
print("📚 Ready to explore the RAG pipeline")

✅ All RAG components imported successfully!
📚 Ready to explore the RAG pipeline


## Step 2: Document Loading

The first step is loading documents from various formats (PDF, TXT, CSV, Excel, Word, JSON).

**What happens here:**
- Recursively scan the data directory
- Use specialized loaders for each file type
- Convert files into LangChain Document objects
- Handle errors gracefully for corrupted files
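
The scanning and error-handling steps above might look roughly like the sketch below. It is not the code inside `load_all_documents`; the helper names `find_supported_files` and `load_text_file` are hypothetical, only plain-text loading is shown, and plain dictionaries stand in for LangChain `Document` objects.

```python
import tempfile
from pathlib import Path

# Extensions taken from the "Supported formats" debug line in the output below.
SUPPORTED = {".pdf", ".txt", ".csv", ".xlsx", ".docx", ".json"}


def find_supported_files(data_dir):
    """Recursively collect supported files, grouped by lowercase extension."""
    found = {ext: [] for ext in sorted(SUPPORTED)}
    for path in sorted(Path(data_dir).rglob("*")):
        ext = path.suffix.lower()
        if path.is_file() and ext in SUPPORTED:
            found[ext].append(path)
    return found


def load_text_file(path):
    """Load one plain-text file; return None instead of raising on a read error."""
    try:
        content = Path(path).read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError) as exc:
        print(f"[WARN] Skipping {path}: {exc}")
        return None
    return {"page_content": content, "metadata": {"source": str(path)}}


# Small self-contained demo on a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "notes.txt").write_text("hello RAG", encoding="utf-8")
    (Path(tmp) / "nested").mkdir()
    (Path(tmp) / "nested" / "ignored.md").write_text("skip me", encoding="utf-8")
    files = find_supported_files(tmp)
    demo_docs = [load_text_file(p) for p in files[".txt"]]

print(len(files[".txt"]), demo_docs[0]["page_content"])
```

Returning `None` for an unreadable file lets the caller skip it and keep loading the rest of the directory.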

In [11]:
# Load documents from the data directory
print("🔍 Loading documents from data/ directory...")

docs = load_all_documents("../data")  # Use relative path from notebooks folder

print(f"✅ Loaded {len(docs)} document objects")
print(f"📄 Document types found: {[type(doc).__name__ for doc in docs[:3]]}")

# Show a sample document
if docs:
    print(f"\n📖 Sample document content (first 200 chars):")
    print(f"'{docs[0].page_content[:200]}...'")
    print(f"📊 Metadata: {docs[0].metadata}")
else:
    print("⚠️ No documents found. Please ensure you have files in the data/ directory.")

🔍 Loading documents from data/ directory...
[DEBUG] Data path: D:\GITHUB\All-About-RAG\data
[DEBUG] Supported formats: ['.pdf', '.txt', '.csv', '.xlsx', '.docx', '.json']
[DEBUG] Found 2 PDF files: ['D:\\GITHUB\\All-About-RAG\\data\\Neeraj_Tiwari_CV_Oct25 (1).pdf', 'D:\\GITHUB\\All-About-RAG\\data\\The Ultimate Python Cheat Sheet.pdf']
[DEBUG] Loading PDF: D:\GITHUB\All-About-RAG\data\Neeraj_Tiwari_CV_Oct25 (1).pdf
[DEBUG] Loaded 2 PDF docs from D:\GITHUB\All-About-RAG\data\Neeraj_Tiwari_CV_Oct25 (1).pdf
[DEBUG] Loading PDF: D:\GITHUB\All-About-RAG\data\The Ultimate Python Cheat Sheet.pdf
[DEBUG] Loaded 1 PDF docs from D:\GITHUB\All-About-RAG\data\The Ultimate Python Cheat Sheet.pdf
[DEBUG] Found 0 TXT files: []
[DEBUG] Found 0 CSV files: []
[DEBUG] Found 0 Excel files: []
[DEBUG] Found 0 Word files: []
[DEBUG] Found 0 JSON files: []
[DEBUG] Total loaded documents: 3
✅ Loaded 3 document objects
📄 Document types found: ['Document', 'Document', 'Document']

📖 Sample document content (fir

## Step 3: Text Chunking

Large documents need to be split into smaller chunks for effective processing.

**Why chunking?**
- LLMs have token limits
- Smaller chunks improve retrieval precision
- Overlap preserves context that spans chunk boundaries

**Our approach:**
- Chunk size: 1000 characters
- Overlap: 200 characters
- Recursive splitting on natural boundaries
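
To see how chunk size and overlap interact, here is a simplified sliding-window splitter. It is a sketch only: unlike the recursive splitting used by `ChunkingPipeline`, it cuts at fixed character offsets and ignores natural boundaries such as paragraphs. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "x" * 2500
pieces = chunk_text(sample)
print([len(p) for p in pieces])  # [1000, 1000, 900]
```

With a size of 1000 and an overlap of 200, each window advances by 800 characters, so a 2500-character text yields three chunks.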

In [12]:
# Initialize the chunking pipeline
print("✂️ Initializing text chunking pipeline...")

chunk_pipeline = ChunkingPipeline(
    chunk_size=1000,      # Maximum characters per chunk
    chunk_overlap=200     # Overlap between chunks for context
)

print("✅ Chunking pipeline ready!")
print(f"📏 Chunk size: {chunk_pipeline.chunk_size} characters")
print(f"🔄 Overlap: {chunk_pipeline.chunk_overlap} characters")

✂️ Initializing text chunking pipeline...
[INFO] Chunking config - Size: 1000, Overlap: 200
✅ Chunking pipeline ready!
📏 Chunk size: 1000 characters
🔄 Overlap: 200 characters


In [13]:
# Split documents into chunks
print("🔄 Splitting documents into chunks...")

chunks = chunk_pipeline.chunk_documents(docs)

print(f"✅ Created {len(chunks)} chunks from {len(docs)} documents")

# Analyze chunk statistics
if chunks:
    chunk_lengths = [len(chunk.page_content) for chunk in chunks]
    avg_length = sum(chunk_lengths) / len(chunk_lengths)

    print(f"📊 Average chunk length: {avg_length:.1f} characters")
    print(f"📈 Min/Max length: {min(chunk_lengths)} / {max(chunk_lengths)} characters")

    # Show a sample chunk
    print(f"\n📄 Sample chunk (first 150 chars):")
    print(f"'{chunks[0].page_content[:150]}...'")
else:
    print("⚠️ No chunks created. Please ensure documents were loaded successfully.")

🔄 Splitting documents into chunks...
[INFO] Split 3 documents into 22 chunks.
✅ Created 22 chunks from 3 documents
📊 Average chunk length: 909.6 characters
📈 Min/Max length: 301 / 996 characters

📄 Sample chunk (first 150 chars):
'Neeraj Tiwari 
             Jubilee Green, Papworth Everard, Cambridge-CB233RZ| neeraztiwari@gmail.com | +44 7881387635 
 
Accomplished AI and ML Spec...'


## Step 4: Embedding Generation

Convert text chunks into numerical vectors that capture semantic meaning.

**How embeddings work:**
- Use SentenceTransformer model (all-MiniLM-L6-v2)
- 384-dimensional vectors
- Capture semantic similarity
- Enable mathematical comparison of text

**Batch processing:**
- Process multiple chunks efficiently
- GPU acceleration when available
- Progress tracking for large datasets
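
"Mathematical comparison of text" usually means comparing vectors with a measure such as cosine similarity. The sketch below uses hand-made 3-dimensional vectors purely for illustration; they are not outputs of all-MiniLM-L6-v2, which produces 384-dimensional vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors invented for this example (assumption, not real embeddings).
cat = [1.0, 0.0, 1.0]
kitten = [0.9, 0.1, 1.0]
invoice = [0.0, 1.0, 0.0]

print(f"cat vs kitten:  {cosine_similarity(cat, kitten):.3f}")
print(f"cat vs invoice: {cosine_similarity(cat, invoice):.3f}")
```

Vectors that point in nearly the same direction score close to 1, while unrelated (orthogonal) vectors score 0.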

In [14]:
# Initialize the embedding pipeline
print("🧠 Initializing embedding pipeline...")

embed_pipeline = EmbeddingPipeline(
    model_name="all-MiniLM-L6-v2"  # Fast, efficient model
)

print("✅ Embedding model loaded!")
print(f"🤖 Model: {embed_pipeline.model.get_sentence_embedding_dimension()}D vectors")
print(f"⚡ Device: {embed_pipeline.model.device}")

🧠 Initializing embedding pipeline...
[INFO] Auto-detected device: cpu
[INFO] Loaded embedding model: all-MiniLM-L6-v2
[INFO] Chunking config - Size: 1000, Overlap: 200
✅ Embedding model loaded!
🤖 Model: 384D vectors
⚡ Device: cpu


In [15]:
# Generate embeddings for all chunks
print("🔢 Generating embeddings for chunks...")
print("(This may take a moment for large document collections)")

embeddings = embed_pipeline.embed_chunks(chunks)

print("✅ Embeddings generated!")
print(f"📊 Embedding matrix shape: {embeddings.shape}")
print(f"🔢 Total vectors: {embeddings.shape[0]}")
print(f"📏 Vector dimensions: {embeddings.shape[1]}")

# Show a sample embedding
if len(embeddings) > 0:
    print(f"\n🧮 Sample embedding (first 5 dimensions): {embeddings[0][:5]}")

🔢 Generating embeddings for chunks...
(This may take a moment for large document collections)
[INFO] Generating embeddings for 22 chunks...


Batches: 100%|██████████| 1/1 [00:03<00:00,  3.50s/it]

[INFO] Embeddings shape: (22, 384)
✅ Embeddings generated!
📊 Embedding matrix shape: (22, 384)
🔢 Total vectors: 22
📏 Vector dimensions: 384

🧮 Sample embedding (first 5 dimensions): [-0.05511471 -0.08162459 -0.01212582 -0.05076062  0.03955764]





## Step 5: Vector Storage

Store embeddings in a vector database for fast similarity search.

**FAISS Vector Store:**
- Facebook AI Similarity Search library
- Supports L2 (Euclidean) distance and inner-product similarity
- Fast retrieval of similar vectors
- Persistent storage with metadata

**Why FAISS?**
- Runs on CPU (no GPU required; GPU support is optional)
- Scales to millions of vectors
- Widely used for vector search
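
The idea behind an exact L2 search can be sketched without FAISS: compute the distance from the query to every stored vector and keep the closest ones. This brute-force version is for illustration only. Which index type `FaissVectorStore` builds is not shown in this notebook, so treat the exact-search behaviour as an assumption. The function names are hypothetical.

```python
def l2_squared(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_search(query, vectors, top_k=3):
    """Return (index, squared_distance) pairs for the top_k nearest vectors."""
    distances = [(l2_squared(query, v), i) for i, v in enumerate(vectors)]
    distances.sort()  # smallest distance first
    return [(i, d) for d, i in distances[:top_k]]


stored = [[0.0, 0.0], [1.0, 1.0], [0.2, 0.1], [5.0, 5.0]]
hits = brute_force_search([0.0, 0.1], stored, top_k=2)
print(hits)
```

With L2 distance, a smaller value means a closer match, which is the opposite of a similarity score.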

In [16]:
# Initialize vector store
print("💾 Initializing FAISS vector store...")

vector_store = FaissVectorStore(
    persist_dir="faiss_store",
    embedding_model="all-MiniLM-L6-v2"
)

print("✅ Vector store ready!")
print(f"📁 Storage directory: {vector_store.persist_dir}")

💾 Initializing FAISS vector store...
[INFO] Auto-detected device: cpu
[INFO] Loaded embedding model: all-MiniLM-L6-v2
[INFO] Vector store directory: faiss_store
✅ Vector store ready!
📁 Storage directory: faiss_store


In [17]:
# Add embeddings to the vector store
print("💾 Adding embeddings to vector store...")

# Prepare metadata for each chunk
metadata = [{"text": chunk.page_content, "source": chunk.metadata.get("source", "unknown")} 
           for chunk in chunks]

vector_store.add_embeddings(embeddings, metadata)
vector_store.save()

print("✅ Vectors stored and saved!")
print(f"🗄️ Total vectors in store: {vector_store.index.ntotal}")
print(f"💾 Index saved to: {vector_store.persist_dir}/")

💾 Adding embeddings to vector store...
[INFO] Added 22 vectors to Faiss index.
[INFO] Saved Faiss index and metadata to faiss_store
✅ Vectors stored and saved!
🗄️ Total vectors in store: 22
💾 Index saved to: faiss_store/


## Step 6: Query Processing

Now let's see how the system processes a user query end-to-end.

**Query Processing Steps:**
1. **Embed the query** - Convert question to vector
2. **Similarity search** - Find most relevant chunks
3. **Context assembly** - Combine retrieved chunks
4. **Response generation** - Use LLM with context
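
Steps 3 and 4 hinge on how the retrieved chunks are packed into a prompt. The sketch below is one possible approach, not the prompt used inside `RAGSearch`: the function `build_prompt`, the `max_context_chars` budget, and the instruction wording are all assumptions made for illustration.

```python
def build_prompt(query, retrieved_texts, max_context_chars=3000):
    """Join unique chunks in rank order until the character budget is reached."""
    selected, seen, used = [], set(), 0
    for text in retrieved_texts:
        if text in seen:
            continue  # skip exact duplicates
        cost = len(text) + (2 if selected else 0)  # 2 chars for the "\n\n" separator
        if used + cost > max_context_chars:
            break
        selected.append(text)
        seen.add(text)
        used += cost
    context = "\n\n".join(selected)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


demo_prompt = build_prompt(
    "Who wrote it?", ["Chunk A", "Chunk A", "Chunk B"], max_context_chars=14
)
print(demo_prompt)
```

Capping the context keeps the prompt within the model's input limit, and keeping chunks in rank order means the least relevant ones are dropped first.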

In [22]:
# Example query
query = "How many years of experience does the candidate have?"

print("❓ Processing query:")
print(f"'{query}'")
print()

# Step 6a: Embed the query
print("🔍 Step 1: Embedding the query")
query_embedding = embed_pipeline.model.encode([query])
print(f"✅ Query embedded (shape: {query_embedding.shape})")

❓ Processing query:
'How many years of experience does the candidate have?'

🔍 Step 1: Embedding the query
✅ Query embedded (shape: (1, 384))


In [23]:
# Step 6b: Similarity search
print("🔍 Step 2: Finding similar documents")

top_k = 3  # Number of results to retrieve
results = vector_store.search(query_embedding, top_k=top_k)

print(f"✅ Found {len(results)} most relevant chunks")

# Display results
for i, result in enumerate(results, 1):
    # NOTE: in this run the results carry no 'score' key, so the fallback 'N/A' is printed
    score = result.get('score', 'N/A')
    text_preview = result['metadata']['text'][:100] + "..."
    print(f"\n📄 Result {i} (relevance: {score}):")
    print(f"'{text_preview}'")

🔍 Step 2: Finding similar documents
✅ Found 3 most relevant chunks

📄 Result 1 (relevance: N/A):
'HPC Server, Camera, Lidar, Thermal Camera, Accelerometer, Gyroscope, FTP Camera, FTP Server. 
• Spec...'

📄 Result 2 (relevance: N/A):
'• Developed state-of-the-art DL algorithms in python, focusing on enhancing visual accuracy and deta...'

📄 Result 3 (relevance: N/A):
'Neeraj Tiwari 
             Jubilee Green, Papworth Everard, Cambridge-CB233RZ| neeraztiwari@gmail.c...'


In [24]:
# Step 6c: Context assembly
print("🔍 Step 3: Assembling context")

retrieved_texts = [r["metadata"]["text"] for r in results]
context = "\n\n".join(retrieved_texts)

print("✅ Context assembled!")
print(f"📊 Context length: {len(context)} characters")
print(f"📄 Number of chunks combined: {len(retrieved_texts)}")

🔍 Step 3: Assembling context
✅ Context assembled!
📊 Context length: 2681 characters
📄 Number of chunks combined: 3


In [25]:
# Step 6d: Response generation
print("🤖 Step 4: Generating response with LLM")

# Initialize RAG search system
rag_search = RAGSearch()

# Generate the final answer
response = rag_search.search_and_summarize(query, top_k=top_k)

print("✅ Response generated!")
print("\n" + "="*50)
print("🎯 FINAL ANSWER:")
print("="*50)
print(response)
print("="*50)

🤖 Step 4: Generating response with LLM
[INFO] Auto-detected device: cpu
[INFO] Auto-detected device: cpu
[INFO] Loaded embedding model: all-MiniLM-L6-v2
[INFO] Vector store directory: faiss_store
[INFO] Loaded Faiss index and metadata from faiss_store
[INFO] Gemini LLM initialized: gemini-2.5-flash
[INFO] LLM config - Temperature: 0.3, Max tokens: 1000
[INFO] Querying vector store for: 'How many years of experience does the candidate have?' (top_k=3)
✅ Response generated!

🎯 FINAL ANSWER:
The candidate has 6+ years of experience.


## Step 7: Complete RAG Pipeline Demo

Let's put it all together in a single function that demonstrates the entire pipeline.

In [27]:
def complete_rag_demo(query="What is this document about?"):
    """
    Complete RAG pipeline demonstration in one function.
    """
    print("🚀 Starting Complete RAG Pipeline Demo")
    print("="*50)
    
    # 1. Load documents
    print("1️⃣ Loading documents...")
    docs = load_all_documents("../data")  # Use relative path from notebooks folder
    print(f"   ✅ Loaded {len(docs)} documents")
    
    # 2. Chunk documents
    print("2️⃣ Chunking documents...")
    chunker = ChunkingPipeline()
    chunks = chunker.chunk_documents(docs)
    print(f"   ✅ Created {len(chunks)} chunks")
    
    # 3. Generate embeddings
    print("3️⃣ Generating embeddings...")
    embedder = EmbeddingPipeline()
    embeddings = embedder.embed_chunks(chunks)
    print(f"   ✅ Generated {embeddings.shape[0]} embeddings")
    
    # 4. Store in vector database
    print("4️⃣ Storing in vector database...")
    store = FaissVectorStore("faiss_store")
    metadata = [{"text": chunk.page_content} for chunk in chunks]
    store.add_embeddings(embeddings, metadata)
    store.save()
    print(f"   ✅ Stored {store.index.ntotal} vectors")
    
    # 5. Process query
    print("5️⃣ Processing query...")
    print(f"   Query: '{query}'")
    
    # Embed query
    query_vec = embedder.model.encode([query])
    
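    # Search (shown for illustration only: RAGSearch below reloads the saved
    # index and performs its own retrieval, so `results` is not used further)
    results = store.search(query_vec, top_k=3)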
    
    # Generate response
    rag = RAGSearch()
    answer = rag.search_and_summarize(query, top_k=3)
    
    print("   ✅ Answer generated!")
    print("\n" + "="*50)
    print("🎯 ANSWER:")
    print("="*50)
    print(answer)
    print("="*50)
    
    return answer

# Run the demo
complete_rag_demo("Summarize the key points from the documents")

🚀 Starting Complete RAG Pipeline Demo
1️⃣ Loading documents...
[DEBUG] Data path: D:\GITHUB\All-About-RAG\data
[DEBUG] Supported formats: ['.pdf', '.txt', '.csv', '.xlsx', '.docx', '.json']
[DEBUG] Found 2 PDF files: ['D:\\GITHUB\\All-About-RAG\\data\\Neeraj_Tiwari_CV_Oct25 (1).pdf', 'D:\\GITHUB\\All-About-RAG\\data\\The Ultimate Python Cheat Sheet.pdf']
[DEBUG] Loading PDF: D:\GITHUB\All-About-RAG\data\Neeraj_Tiwari_CV_Oct25 (1).pdf
[DEBUG] Loaded 2 PDF docs from D:\GITHUB\All-About-RAG\data\Neeraj_Tiwari_CV_Oct25 (1).pdf
[DEBUG] Loading PDF: D:\GITHUB\All-About-RAG\data\The Ultimate Python Cheat Sheet.pdf
[DEBUG] Loaded 1 PDF docs from D:\GITHUB\All-About-RAG\data\The Ultimate Python Cheat Sheet.pdf
[DEBUG] Found 0 TXT files: []
[DEBUG] Found 0 CSV files: []
[DEBUG] Found 0 Excel files: []
[DEBUG] Found 0 Word files: []
[DEBUG] Found 0 JSON files: []
[DEBUG] Total loaded documents: 3
   ✅ Loaded 3 documents
2️⃣ Chunking documents...
[INFO] Chunking config - Size: 1000, Overlap: 200
[

Batches: 100%|██████████| 1/1 [00:03<00:00,  3.56s/it]


[INFO] Embeddings shape: (22, 384)
   ✅ Generated 22 embeddings
4️⃣ Storing in vector database...
[INFO] Auto-detected device: cpu
[INFO] Loaded embedding model: all-MiniLM-L6-v2
[INFO] Vector store directory: faiss_store
[INFO] Added 22 vectors to Faiss index.
[INFO] Saved Faiss index and metadata to faiss_store
   ✅ Stored 22 vectors
5️⃣ Processing query...
   Query: 'Summarize the key points from the documents'
[INFO] Auto-detected device: cpu
[INFO] Auto-detected device: cpu
[INFO] Loaded embedding model: all-MiniLM-L6-v2
[INFO] Vector store directory: faiss_store
[INFO] Loaded Faiss index and metadata from faiss_store
[INFO] Gemini LLM initialized: gemini-2.5-flash
[INFO] LLM config - Temperature: 0.3, Max tokens: 1000
[INFO] Querying vector store for: 'Summarize the key points from the documents' (top_k=3)
   ✅ Answer generated!

🎯 ANSWER:
The documents provide information on two main areas:

1.  **Python Programming Concepts:**
    *   **String Operations:** Covers indexing, sli

'The documents provide information on two main areas:\n\n1.  **Python Programming Concepts:**\n    *   **String Operations:** Covers indexing, slicing, and various string methods like `strip()`, `lower()`, `upper()`, `startswith()`, `endswith()`, `find()`, `replace()`, `join()`, `len()`, and the `in` operator for membership.\n    *   **Dictionary Operations:** Explains how to define, read, write, and iterate through dictionaries. It also details accessing keys (`keys()`) and values (`values()`) and using the `in` operator for membership checks.\n    *   **List & Set Comprehension:** Describes these as concise Python ways to create lists and sets.\n    *   **Membership Operator:** Highlights the `in` keyword for checking element presence in sets, lists, or dictionaries, noting that set membership is faster.\n\n2.  **Individual\'s Professional Profile/Expertise:**\n    *   **Hardware/Tools:** HPC Server, Camera, Lidar, Thermal Camera, Accelerometer, Gyroscope, FTP Camera, FTP Server.\n  

## Key Takeaways

🎯 **RAG combines retrieval and generation:**
- Retrieval finds relevant information
- Generation creates coherent responses
- Result: answers grounded in the indexed documents rather than only in the model's training data

🛠️ **Our implementation uses:**
- **LangChain** for document loading
- **SentenceTransformers** for embeddings
- **FAISS** for vector search
- **Google Gemini** for generation

📊 **Performance characteristics:**
- Fast similarity search (FAISS)
- Semantic understanding (embeddings)
- Contextual responses (LLM with retrieved docs)

🔄 **The pipeline is modular:**
- Each step can be optimized independently
- Easy to experiment with different models
- Scalable to large document collections
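
One way to keep the steps swappable is to code against a small interface rather than a concrete model. The sketch below illustrates the idea with `typing.Protocol`. The `Embedder` interface and both toy embedders are hypothetical and are not part of `src.rag`.

```python
from typing import List, Protocol, Tuple


class Embedder(Protocol):
    """Anything with an embed() method can be plugged into the pipeline."""

    def embed(self, texts: List[str]) -> List[List[float]]: ...


class LengthEmbedder:
    """Toy embedder: a single feature, the text length."""

    def embed(self, texts: List[str]) -> List[List[float]]:
        return [[float(len(t))] for t in texts]


class VowelEmbedder:
    """Toy embedder: a single feature, the number of vowels."""

    def embed(self, texts: List[str]) -> List[List[float]]:
        return [[float(sum(ch in "aeiou" for ch in t.lower()))] for t in texts]


def index_texts(texts: List[str], embedder: Embedder) -> List[Tuple[str, List[float]]]:
    """Pair each text with its vector; works with any Embedder."""
    return list(zip(texts, embedder.embed(texts)))


print(index_texts(["abc", "hello"], LengthEmbedder()))
print(index_texts(["abc", "hello"], VowelEmbedder()))
```

Because `index_texts` depends only on the `embed` method, trying a different embedding model means passing a different object, with no change to the indexing code.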

## Next Steps

- Try different embedding models
- Experiment with chunk sizes and overlap
- Add more document formats
- Implement advanced retrieval techniques
- Fine-tune the LLM prompts

Happy RAG experimenting! 🤖📚