# Building a RAG System from Scratch - Step by Step

**Retrieval-Augmented Generation (RAG)** is a technique that enhances Large Language Models by providing them with relevant context from a knowledge base before generating answers.

**Why RAG?**
- ✅ Reduces hallucinations by grounding answers in real data
- ✅ Enables LLMs to access up-to-date information
- ✅ Allows working with private/proprietary documents
- ✅ Can cite sources for answers

**What we'll build:**
A complete RAG system that answers questions about a single document: the 2024 State of the Union address.

## Step 1: Install Required Dependencies

In [None]:
import subprocess
import sys

packages = [
    "langchain",              # Core LangChain framework
    "langchain-chroma",       # Chroma vector store integration
    "langchain-openai",       # OpenAI models integration
    "langchain-core",         # Core LangChain utilities
    "python-dotenv",          # Environment variable management
    "chromadb"                # Vector database
]

print("Installing RAG dependencies...\n")
for package in packages:
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", package])
    print(f"✓ {package}")

print("\n✅ All packages installed successfully!")

**📝 Explanation:**
We install 6 essential packages:
- **langchain**: Main framework for building LLM applications
- **langchain-chroma**: Allows us to use ChromaDB as our vector database
- **langchain-openai**: Provides OpenAI's GPT models and embeddings
- **langchain-core**: Core utilities for chains and prompts
- **python-dotenv**: Loads API keys from a `.env` file into environment variables, so they stay out of the notebook itself
- **chromadb**: Lightweight vector database for storing document embeddings
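After installing, it can help to confirm what actually landed in the environment. The sketch below uses only the standard library; the helper name `installed_versions` is made up for this example and is not part of LangChain.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in package_names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Same package names as the install cell above
report = installed_versions([
    "langchain",
    "langchain-chroma",
    "langchain-openai",
    "langchain-core",
    "python-dotenv",
    "chromadb",
])

for name, ver in report.items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` should be installed before moving on to the imports in the next step.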

## Step 2: Import Libraries

In [None]:
from dotenv import load_dotenv
from langchain_chroma import Chroma
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_text_splitters import CharacterTextSplitter

# Load API keys from .env file
load_dotenv()

print("✅ All imports successful and environment loaded!")

**📝 Explanation:**
Each import serves a specific purpose in our RAG pipeline:
- **Chroma**: Vector database for storing embeddings
- **PromptTemplate**: Structures prompts with variables
- **RunnablePassthrough**: Passes data through pipeline unchanged
- **StrOutputParser**: Extracts text from LLM response
- **OpenAIEmbeddings**: Converts text to vector embeddings
- **ChatOpenAI**: OpenAI's chat model (GPT)
- **CharacterTextSplitter**: Splits large documents into chunks
- **load_dotenv()**: Loads your OPENAI_API_KEY from .env file
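A missing key usually only surfaces later, when the first API call fails. As a minimal sketch, assuming the key is expected under the name `OPENAI_API_KEY`, a small check right after loading the environment can fail early with a clearer message. The helper `require_env` is hypothetical, not something provided by python-dotenv.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in your shell."
        )
    return value


# Report the problem instead of crashing the notebook cell
try:
    require_env("OPENAI_API_KEY")
    print("OPENAI_API_KEY found")
except RuntimeError as err:
    print(f"Configuration problem: {err}")
```

The check only confirms that a value is present; it does not verify that the key is valid.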

## Step 3: Initialize Embeddings Model

In [None]:
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

print("✅ Embeddings model initialized")
print("   Model: text-embedding-3-large")
print("   Dimensions: 3072 (vector size)")

**📝 Explanation:**
Embeddings convert text into numerical vectors that capture semantic meaning. Similar concepts have similar vectors.

**Why text-embedding-3-large?**
- High quality: 3072-dimensional vectors
- Captures nuanced meaning
- Good for semantic similarity search

**Example:**
- "dog" and "puppy" → similar vectors
- "dog" and "car" → different vectors

These embeddings allow us to find relevant documents even when they don't contain exact keyword matches.
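"Similar vectors" is usually measured with cosine similarity. The sketch below shows the idea with hand-made 3-dimensional vectors. These numbers are invented for illustration and are not real output from `text-embedding-3-large`, which produces 3072 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy stand-ins for embeddings: invented values, NOT real model output
toy_vectors = {
    "dog":   [0.90, 0.80, 0.10],
    "puppy": [0.85, 0.90, 0.15],
    "car":   [0.10, 0.20, 0.95],
}

dog_puppy = cosine_similarity(toy_vectors["dog"], toy_vectors["puppy"])
dog_car = cosine_similarity(toy_vectors["dog"], toy_vectors["car"])

print(f"dog vs puppy: {dog_puppy:.3f}")
print(f"dog vs car:   {dog_car:.3f}")
```

With these toy values, the "dog"/"puppy" score comes out much higher than the "dog"/"car" score, which is the property a similarity search relies on.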

## Step 4: Create Vector Store (ChromaDB)

In [None]:
vector_store = Chroma(
    collection_name="state_of_union_rag",
    embedding_function=embeddings
)

print("✅ Vector store created")
print("   Database: ChromaDB")
print("   Collection: state_of_union_rag")
print("   Ready to store document embeddings")

**📝 Explanation:**
ChromaDB is a vector database that stores and retrieves embeddings efficiently.

**What it does:**
- Stores document embeddings (vectors)
- Performs fast similarity searches
- Returns the most relevant documents for a query

**How it works:**
1. Documents → Embeddings → Stored in ChromaDB
2. Query → Embedding → Search similar vectors
3. ChromaDB returns most similar documents
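The three steps above can be sketched with a toy in-memory store. This is not how ChromaDB is implemented: the class `ToyVectorStore` and the word-count "embedding" are stand-ins invented for this example, and the sentences are made up. The method names simply echo the `add_texts` and `similarity_search` methods of LangChain vector stores.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in for a real embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Keeps (text, vector) pairs in a list and ranks them by similarity."""

    def __init__(self):
        self._items = []

    def add_texts(self, texts):
        # Step 1: documents -> embeddings -> stored
        for text in texts:
            self._items.append((text, toy_embed(text)))

    def similarity_search(self, query, k=2):
        # Step 2: query -> embedding -> compare against stored vectors
        query_vector = toy_embed(query)
        ranked = sorted(
            self._items,
            key=lambda item: cosine(query_vector, item[1]),
            reverse=True,
        )
        # Step 3: return the k most similar documents
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore()
store.add_texts([
    "the bakery opens at seven in the morning",
    "trains to the airport leave every ten minutes",
    "the library closes early on sunday",
])

top = store.similarity_search("when does the bakery open", k=1)
print(top)
```

Word counts only match exact words (here "bakery" and "the"), whereas real embeddings also match on meaning. A real vector database also uses an index rather than scanning and sorting every stored vector.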

**Why ChromaDB?**
- Lightweight and easy to use
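- No separate server needed: it runs inside the notebook process (as configured above, without a persist directory, the collection lives in memory only)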
- Perfect for development and small-to-medium projects

## Step 5: Load the Document

In [None]:
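# Explicit encoding avoids platform-dependent decoding (assumes the file is UTF-8)
with open("2024_state_of_the_union.txt", "r", encoding="utf-8") as f:
    document = f.read()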

print("✅ Document loaded successfully")
print("   File: 2024_state_of_the_union.txt")
print(f"   Total characters: {len(document):,}")
print(f"   Total words: ~{len(document.split()):,}")
print(f"\n   Preview (first 200 chars):")
print(f"   {document[:200]}...")

**📝 Explanation:**
We load the document that will serve as our knowledge base.

**Why this step?**
- RAG needs a source of information to retrieve from
- This document contains facts the LLM can reference
- In production, this could be PDFs, databases, APIs, etc.

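**Note:** Whether a full document fits in a single LLM prompt depends on the model's context window. Either way, sending the whole text with every question is wasteful and can dilute the answer, which is why we split the document into chunks in the next step and retrieve only the relevant ones.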
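To get a feel for document size relative to a context window, a rough estimate is often enough. The sketch below assumes about 4 characters per token, a common rule of thumb for English text; real counts depend on the model's tokenizer. The function names, the reserved answer budget, and the context sizes used here are illustrative assumptions, not values taken from any specific model.

```python
import math


def estimate_tokens(text, chars_per_token=4.0):
    """Very rough token estimate; the real count depends on the tokenizer."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(text, context_window_tokens, reserved_for_answer=1000):
    """Check whether the text plus a reserved answer budget fits the window."""
    return estimate_tokens(text) + reserved_for_answer <= context_window_tokens


# Synthetic stand-in text (50,000 characters), not the actual document
sample = "word " * 10_000

print(f"Estimated tokens: {estimate_tokens(sample):,}")
print(f"Fits in an 8,000-token window:   {fits_in_context(sample, 8_000)}")
print(f"Fits in a 128,000-token window:  {fits_in_context(sample, 128_000)}")
```

In the notebook, the same functions could be called on the `document` variable loaded above.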

## Step 6: Split Document into Chunks