# RAG Implementation - Step by Step

This notebook teaches RAG (Retrieval-Augmented Generation) fundamentals through practical implementation.

**What is RAG?**
RAG first retrieves documents relevant to a query, then passes them to a language model as context, so the generated answer is grounded in your data rather than only in what the model learned during training.
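
To make the retrieve-then-generate idea concrete before any libraries come in, here is a toy sketch in plain Python. It is only an illustration: `DOCS`, `tokenize`, `retrieve`, and `build_prompt` are hypothetical names, and word overlap stands in for the vector search the notebook builds in later steps. No language model is called; the sketch stops at the prompt that would be sent to one.

```python
# Toy illustration of the retrieve-then-generate flow; no external libraries.
# All names here (DOCS, tokenize, retrieve, build_prompt) are hypothetical.

DOCS = [
    "ChromaDB is a lightweight vector database.",
    "Embeddings convert text into numerical vectors.",
    "Paris is the capital of France.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split on whitespace, stripping basic punctuation."""
    return {word.strip(".,?!").lower() for word in text.split()}


def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by how many words they share with the question.

    This is a crude stand-in for semantic (vector) search.
    """
    q = tokenize(question)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Put the retrieved text into the prompt so the model can answer from it."""
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


question = "What is the capital of France?"
context = retrieve(question, DOCS)
prompt = build_prompt(question, context)
print(prompt)
```

In a real pipeline the prompt would then go to a chat model, and retrieval would compare embeddings rather than count shared words.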

**Steps we'll follow:**
1. Install and import dependencies
2. Set up the vector database
3. Load and chunk documents
4. Create retriever
5. Build RAG chain
6. Test with queries

## Step 1: Install Required Packages

In [None]:
import subprocess
import sys

# Packages needed for this notebook
packages = [
    "langchain",
    "langchain-chroma",
    "langchain-openai",
    "langchain-core",
    "langchain-text-splitters",  # provides CharacterTextSplitter, imported in Step 2
    "python-dotenv",
    "chromadb",
]

print("Installing packages...\n")

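failed = []
for package in packages:
    try:
        # Use the notebook's own interpreter so packages land in the active environment
        subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", package])
        print(f"✓ {package}")
    except subprocess.CalledProcessError:
        # pip exits non-zero when the install fails, not when the package is already present
        failed.append(package)
        print(f"✗ {package} (install failed)")

if failed:
    print(f"\n✗ Setup incomplete. Failed to install: {', '.join(failed)}")
else:
    print("\n✓ Setup complete!")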

## Step 2: Import Libraries

In [None]:
from dotenv import load_dotenv
from langchain_chroma import Chroma
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_text_splitters import CharacterTextSplitter

# Load environment variables (such as OPENAI_API_KEY) from a .env file
load_dotenv()

print("✓ All imports successful")

## Step 3: Initialize Embeddings Model

Embeddings convert text into numerical vectors that capture semantic meaning.
We'll use OpenAI's text-embedding-3-large model.
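
To see what "capturing semantic meaning" buys us, the sketch below compares vectors with cosine similarity, a common way to measure how close two embeddings are. The three-dimensional vectors are made up for illustration and do not come from any model; real embeddings have far more dimensions. `cosine_similarity` is a hypothetical helper, not part of LangChain.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for embeddings of three pieces of text
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

sim_related = cosine_similarity(cat, kitten)
sim_unrelated = cosine_similarity(cat, invoice)

# Related texts should score higher than unrelated ones
print(f"cat vs kitten:  {sim_related:.3f}")
print(f"cat vs invoice: {sim_unrelated:.3f}")
```

Texts with similar meaning end up with vectors pointing in similar directions, which is what lets a vector store find relevant documents without matching exact words.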

In [None]:
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

print("✓ Embeddings model initialized")
print(f"  Model: {embeddings.model}")
print(f"  Type: {type(embeddings).__name__}")

## Step 4: Create Vector Store (ChromaDB)

ChromaDB is a lightweight vector database. It will store our document embeddings
and allow us to search for similar documents semantically.
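
As a rough mental model of what a vector store does, here is a tiny in-memory version in plain Python. `ToyVectorStore` and its methods are hypothetical and are not the ChromaDB API; the class scans every stored vector by brute force, whereas a real vector database typically relies on an index to stay fast at scale. The two-dimensional vectors are made up for illustration.

```python
import math


class ToyVectorStore:
    """Hypothetical in-memory store: keeps (vector, text) pairs, ranks by cosine similarity."""

    def __init__(self) -> None:
        self._items: list[tuple[list[float], str]] = []

    def add(self, vector: list[float], text: str) -> None:
        """Store a document's text alongside its (non-zero) embedding vector."""
        self._items.append((vector, text))

    def search(self, query: list[float], k: int = 2) -> list[str]:
        """Return the texts of the k stored vectors most similar to the query."""

        def score(vec: list[float]) -> float:
            dot = sum(q * v for q, v in zip(query, vec))
            norm_q = math.sqrt(sum(q * q for q in query))
            norm_v = math.sqrt(sum(v * v for v in vec))
            return dot / (norm_q * norm_v)

        ranked = sorted(self._items, key=lambda item: score(item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


store = ToyVectorStore()
store.add([1.0, 0.0], "doc about cats")
store.add([0.9, 0.1], "doc about kittens")
store.add([0.0, 1.0], "doc about invoices")

# A query vector close to the "cats" direction
results = store.search([1.0, 0.0], k=2)
print(results)
```

The store returns the two documents whose vectors point closest to the query, which is the same add-then-search pattern the notebook uses with ChromaDB and real embeddings.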