# Part 1: Simple RAG

In this notebook, we'll build a basic RAG system and see why it doesn't work well.

**What is RAG?**
- **R**etrieval: find the documents relevant to a question
- **A**ugmented: add them to the model's prompt as context
- **G**eneration: have the LLM generate an answer from that context

By the end, you'll understand why simple RAG reaches only ~30% accuracy in the benchmark results at the bottom of this notebook.

## Step 0: Setup

Run these two cells first to clone the workshop repo and install the dependencies.

```python
!git clone https://github.com/i33ym/rag-workshop.git 2>/dev/null || echo "Already cloned"
%cd rag-workshop
```

```python
!pip install -q openai langchain langchain-openai langchain-community langchain-text-splitters chromadb
```

## Step 1: Set Your API Key

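Get your OpenAI API key from [platform.openai.com](https://platform.openai.com/api-keys). The cell below reads it with `getpass`, so the key is not echoed on screen or saved in the notebook.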

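```python
import os
from getpass import getpass

# Prompt for the key only if it isn't already set in the environment
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")
```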

## Step 2: Load the Documents

We'll load every markdown (`.md`) file under the `docs/` folder, including subfolders. `TextLoader` turns each file into one LangChain `Document`.
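Conceptually, the loader does very little: it finds matching files and pairs each file's text with metadata. `TextLoader` puts the file text in `page_content` and the file path in `metadata["source"]`. The sketch below mimics that with the standard library only; `load_markdown` is a hypothetical helper, the plain dicts stand in for LangChain `Document` objects, and the demo files are made up.

```python
import tempfile
from pathlib import Path


def load_markdown(folder: str) -> list[dict]:
    """Read every .md file under `folder` (recursively) into a text-plus-metadata dict."""
    docs = []
    for path in sorted(Path(folder).rglob("*.md")):
        docs.append({
            "page_content": path.read_text(encoding="utf-8"),
            "metadata": {"source": str(path)},
        })
    return docs


# Demo on a throwaway folder (file names and contents are made up for illustration)
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "auth.md").write_text("# Auth\nSend the token in a header.", encoding="utf-8")
    (Path(tmp) / "guides").mkdir()
    (Path(tmp) / "guides" / "webhooks.md").write_text("# Webhooks\nRegister a URL.", encoding="utf-8")
    (Path(tmp) / "notes.txt").write_text("not markdown", encoding="utf-8")  # skipped: not .md
    demo_docs = load_markdown(tmp)
    demo_names = [Path(d["metadata"]["source"]).name for d in demo_docs]

print(demo_names)
```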

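```python
from langchain_community.document_loaders import DirectoryLoader, TextLoader

# Load every .md file under docs/ (including subfolders), one Document per file
loader = DirectoryLoader(
    "docs/",
    glob="**/*.md",
    loader_cls=TextLoader,
    loader_kwargs={"encoding": "utf-8"},  # avoid platform-dependent default encodings
)

documents = loader.load()
print(f"Loaded {len(documents)} documents")
```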

## Step 3: Split Into Chunks

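Whole documents are too long to embed and retrieve as single units, so we split them into smaller, overlapping chunks. Both `chunk_size` and `chunk_overlap` below are measured in characters, not tokens.

**Why chunking matters:**
- Embedding models and LLMs have input (context) limits.
- Smaller chunks make retrieval more precise, because each vector represents one focused passage.
- Chunks that are too small lose the surrounding context needed to answer a question.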
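To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window character splitter. It is not the algorithm `RecursiveCharacterTextSplitter` uses: that splitter first tries to break on paragraph breaks, then newlines, then spaces, and only cuts mid-word as a last resort. What the sketch shares with it is the meaning of the two parameters: chunks are capped at `chunk_size` characters, and consecutive chunks share up to `chunk_overlap` characters. `chunk_text` is a hypothetical helper.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reached the end
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij']

# With this sketch and the notebook's settings, a 2,500-character text gives 3 chunks
print([len(c) for c in chunk_text("x" * 2500, chunk_size=1000, chunk_overlap=200)])
# → [1000, 1000, 900]
```

Each window starts `chunk_size - chunk_overlap` characters after the previous one, so a larger overlap means more chunks (and more embedding calls) for the same text.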

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

# chunk_size and chunk_overlap are in characters (the default length function is len)
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)

chunks = splitter.split_documents(documents)
print(f"Split into {len(chunks)} chunks")
```

```python
# Let's look at one chunk
print("=== Sample Chunk ===")
print(chunks[0].page_content[:500])
print("\n=== Metadata ===")
print(chunks[0].metadata)
```

## Step 4: Create Embeddings

**What are embeddings?**

Embeddings convert text into lists of numbers (vectors) that capture its meaning.

Texts with similar meanings get vectors that point in similar directions, so we can find relevant chunks by comparing the question's vector with each chunk's vector.
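"Similar vectors" is usually measured with cosine similarity: the dot product divided by the product of the two vector lengths. It is 1 for vectors pointing the same way and smaller the more their directions differ. The sketch below uses made-up 3-dimensional vectors; real `text-embedding-3-small` vectors have 1,536 dimensions, but the arithmetic is the same.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" with made-up values, for illustration only
auth_query = [0.9, 0.1, 0.0]   # "How do I authenticate?"
auth_doc = [0.8, 0.2, 0.1]     # a chunk about authentication
webhook_doc = [0.1, 0.1, 0.9]  # a chunk about webhooks

print(f"query vs auth doc:    {cosine_similarity(auth_query, auth_doc):.2f}")     # → 0.98
print(f"query vs webhook doc: {cosine_similarity(auth_query, webhook_doc):.2f}")  # → 0.12
```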

```python
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Test it
test_embedding = embeddings.embed_query("How do I authenticate?")
print(f"Embedding dimension: {len(test_embedding)}")
print(f"First 5 values: {test_embedding[:5]}")
```

## Step 5: Create Vector Store

A vector store holds the embedding of every chunk and lets us search them by similarity to a query.

We'll use ChromaDB. Without a `persist_directory` it runs in memory with no setup, so the store disappears when the kernel restarts.
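Under the hood, a vector store answers one question: which stored vectors are closest to the query vector? The sketch below does this by brute force, scanning every stored vector and ranking by squared Euclidean (L2) distance, which is Chroma's default metric. Chroma avoids the full scan with an approximate nearest-neighbor index (HNSW), but it aims for the same result. `TinyVectorStore` and its methods are hypothetical names, not part of Chroma or LangChain, and the vectors and texts are made up.

```python
class TinyVectorStore:
    """Brute-force vector store: keeps every (vector, text) pair and scans them all on search."""

    def __init__(self) -> None:
        self.vectors: list[list[float]] = []
        self.texts: list[str] = []

    def add(self, vector: list[float], text: str) -> None:
        self.vectors.append(vector)
        self.texts.append(text)

    def search(self, query: list[float], k: int = 3) -> list[tuple[str, float]]:
        """Return the k closest texts with their squared L2 distance (lower = closer)."""
        scored = [
            (text, sum((q - v) ** 2 for q, v in zip(query, vector)))
            for vector, text in zip(self.vectors, self.texts)
        ]
        scored.sort(key=lambda pair: pair[1])
        return scored[:k]


store = TinyVectorStore()
store.add([0.9, 0.1, 0.0], "Authentication chunk")
store.add([0.1, 0.9, 0.0], "Payments chunk")
store.add([0.0, 0.1, 0.9], "Webhooks chunk")

for text, distance in store.search([0.8, 0.2, 0.0], k=2):
    print(f"{distance:.2f}  {text}")
```

Brute force costs one distance computation per stored chunk per query, which is fine for a workshop-sized corpus; indexes like HNSW trade a little accuracy for much faster search on large ones.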

```python
from langchain_community.vectorstores import Chroma

vector_store = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings
)

print(f"Vector store created with {len(chunks)} chunks")
```

## Step 6: Test Retrieval

Before adding an LLM, let's check which chunks come back for a sample question.

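```python
query = "How do I get an authorization token?"

# Return the 3 chunks whose embeddings are closest to the query's embedding
results = vector_store.similarity_search(query, k=3)

print(f"Query: {query}\n")
for i, doc in enumerate(results, start=1):
    print(f"=== Result {i} ({doc.metadata.get('source', 'unknown source')}) ===")
    print(doc.page_content[:300])
    print()
```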

## Step 7: Build Simple RAG

Now let's combine retrieval with generation.
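Before the LLM sees anything, the retrieval half is plain string work: join the retrieved chunk texts with blank lines and drop them into the prompt template. The sketch below reproduces that step without LangChain so you can see exactly what the model receives. `build_prompt` is a hypothetical helper, the template text is copied from the cell below, and the chunk texts are placeholders.

```python
TEMPLATE = """
Answer the question based only on the following context.
If you can't find the answer, say "I don't know."

Context:
{context}

Question: {question}

Answer:
"""


def build_prompt(chunk_texts: list[str], question: str) -> str:
    """Join retrieved chunk texts with blank lines and fill in the template."""
    context = "\n\n".join(chunk_texts)
    return TEMPLATE.format(context=context, question=question)


prompt_text = build_prompt(
    ["<chunk 1 text>", "<chunk 2 text>"],  # placeholders for retrieved chunks
    "How do I get an authorization token?",
)
print(prompt_text)
```

Everything the model knows about your docs is whatever fits into `{context}`, which is why retrieval quality caps answer quality.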

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# temperature=0 minimizes sampling randomness
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

prompt = ChatPromptTemplate.from_template("""
Answer the question based only on the following context.
If you can't find the answer, say "I don't know."

Context:
{context}

Question: {question}

Answer:
""")

# Build the prompt -> LLM -> string pipeline once and reuse it
chain = prompt | llm | StrOutputParser()


def simple_rag(question, k=3):
    """Retrieve the top-k chunks for `question` and answer from them."""
    # Retrieve
    docs = vector_store.similarity_search(question, k=k)
    context = "\n\n".join(doc.page_content for doc in docs)

    # Generate
    answer = chain.invoke({"context": context, "question": question})

    return answer, docs
```

```python
# Test it!
question = "How do I get an authorization token?"

answer, docs = simple_rag(question)

print(f"Question: {question}\n")
print(f"Answer: {answer}")
```

## Step 8: Test More Questions

Let's see how well it performs on different types of questions.
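Eyeballing answers only goes so far. One crude way to put a number on them is to list, for each question, a few keywords a correct answer should mention, and count how many answers contain all of them. This is a hypothetical grading scheme for illustration, not taken from any benchmark, and the keywords and answers below are made up.

```python
def keyword_accuracy(answers: dict[str, str], expected: dict[str, list[str]]) -> float:
    """Fraction of questions whose answer mentions every expected keyword (case-insensitive)."""
    hits = 0
    for question, keywords in expected.items():
        answer = answers.get(question, "").lower()
        if all(keyword.lower() in answer for keyword in keywords):
            hits += 1
    return hits / len(expected) if expected else 0.0


# Made-up keywords and answers, for illustration only
expected = {
    "How do I create a payment?": ["POST", "amount"],
    "How do I set up webhooks?": ["webhook", "URL"],
}
answers = {
    "How do I create a payment?": "Send a POST request with the amount and currency.",
    "How do I set up webhooks?": "I don't know.",
}
print(keyword_accuracy(answers, expected))
# → 0.5
```

In this notebook you could fill `answers` with `simple_rag(q)[0]` for each `q` in `test_questions`. Keyword matching misses correct paraphrases, so treat the number as a rough signal only.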

```python
test_questions = [
    "How do I create a payment?",
    "What error codes can the API return?",
    "How do I set up webhooks?",
    "What is the endpoint for checking payment status?",
    "How do I authenticate API requests?",
]

for q in test_questions:
    answer, _ = simple_rag(q)
    # Truncate long answers, adding "..." only when something was cut
    preview = answer if len(answer) <= 200 else answer[:200] + "..."
    print(f"Q: {q}")
    print(f"A: {preview}\n")
```

## Problems with Simple RAG

You probably noticed some issues:

### 1. Retrieval misses exact terms
Vector search is semantic: it finds similar *meanings*, not exact *words*.

If you search for an exact endpoint such as `POST /api/payment/create`, you might get chunks about "creating payments" in general instead of the chunk that documents that endpoint.
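For contrast, a purely lexical scorer does the opposite: it rewards chunks that contain the query's exact tokens and knows nothing about meaning. The sketch below counts how many query tokens appear verbatim in each chunk. It is a deliberately naive stand-in for keyword ranking functions such as BM25; `keyword_overlap` is a hypothetical helper and the chunk texts are made up.

```python
def keyword_overlap(query: str, text: str) -> int:
    """Count distinct query tokens (whitespace-split, lowercased) that appear as tokens in text."""
    query_tokens = set(query.lower().split())
    text_tokens = set(text.lower().split())
    return len(query_tokens & text_tokens)


# Made-up chunks, for illustration only
demo_chunks = [
    "Creating a payment: send the amount and currency to the payments API.",
    "Endpoint: POST /api/payment/create accepts a JSON body.",
    "Webhooks notify you when a payment changes status.",
]
demo_query = "POST /api/payment/create"

ranked = sorted(demo_chunks, key=lambda text: keyword_overlap(demo_query, text), reverse=True)
print(ranked[0])
# → Endpoint: POST /api/payment/create accepts a JSON body.
```

Exact-token matching nails identifiers like endpoint paths and error codes but fails on paraphrases, which is the mirror image of vector search's weakness.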

```python
# Try an exact endpoint search
results = vector_store.similarity_search("POST /api/payment/create", k=3)

print("Searching for exact endpoint 'POST /api/payment/create':\n")
for i, doc in enumerate(results):
    print(f"Result {i+1}: {doc.page_content[:150]}...\n")
```

### 2. No relevance verification
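Even when the retrieved chunks aren't relevant, `simple_rag` still passes them to the LLM and asks for an answer. The prompt tells the model to say "I don't know", but nothing checks whether the retrieved context actually covers the question.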
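One simple guard is to check retrieval scores before generating: if even the best chunk is too far from the question, skip the LLM and answer "I don't know" directly. The sketch below assumes `(text, distance)` pairs where lower means closer, as Chroma returns by default, and uses a made-up cutoff of 1.0; a real threshold would have to be tuned on your own questions. `answer_if_relevant` and `fake_generate` are hypothetical helpers, the latter standing in for the LLM call so the example runs offline.

```python
from typing import Callable

NO_ANSWER = "I don't know."


def answer_if_relevant(
    results: list[tuple[str, float]],
    generate: Callable[[list[str]], str],
    max_distance: float = 1.0,  # hypothetical cutoff; tune on real data
) -> str:
    """Call `generate` only with chunks whose distance is within max_distance."""
    relevant = [text for text, distance in results if distance <= max_distance]
    if not relevant:
        return NO_ANSWER  # nothing close enough: refuse instead of guessing
    return generate(relevant)


def fake_generate(texts: list[str]) -> str:
    # Stand-in for the LLM call, so the example runs offline
    return f"Answer based on {len(texts)} chunk(s)"


print(answer_if_relevant([("auth chunk", 0.4), ("payments chunk", 1.3)], fake_generate))
# → Answer based on 1 chunk(s)
print(answer_if_relevant([("payments chunk", 1.3), ("webhooks chunk", 1.5)], fake_generate))
# → I don't know.
```

In this notebook, `results` could come from `vector_store.similarity_search_with_score(question, k=3)`, mapped to `(doc.page_content, score)` pairs.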

```python
# Ask about something NOT in the docs
answer, docs = simple_rag("How do I integrate with Stripe?")

print("Question about something NOT in the docs:\n")
print(f"Answer: {answer}")
print("\n(Notice: it might hallucinate or give wrong info instead of saying \"I don't know\")")
```

### 3. All retrieved docs are used equally
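`simple_rag` pastes all k retrieved chunks into the prompt the same way, whether a chunk is a close match or barely related. The vector store can return a score for each result (via `similarity_search_with_score`), but `simple_rag` never looks at it.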

```python
# Look at the distance scores Chroma returns
results_with_scores = vector_store.similarity_search_with_score("How do I authenticate?", k=5)

print("Distance scores (lower = more similar):\n")
for doc, score in results_with_scores:
    print(f"Score: {score:.3f} | {doc.page_content[:60]}...")
```
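The printed scores are distances, not similarities. Assuming Chroma's default squared-L2 distance and unit-length vectors (OpenAI states that its embeddings are normalized to length 1), a distance converts directly to cosine similarity, because for unit vectors `||a - b||² = 2 - 2·cos(a, b)`. The check below verifies that identity on a pair of unit vectors; the helper names are hypothetical.

```python
import math


def squared_l2(a: list[float], b: list[float]) -> float:
    """Sum of squared coordinate differences (Chroma's default 'l2' distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def cosine_from_squared_l2(distance: float) -> float:
    """Only valid when both vectors have length 1: ||a - b||^2 = 2 - 2 * cos(a, b)."""
    return 1.0 - distance / 2.0


a = [1.0, 0.0]
b = [0.6, 0.8]  # length 1, since 0.36 + 0.64 = 1
d = squared_l2(a, b)
print(f"squared L2 = {d:.2f}, cosine = {cosine(a, b):.2f}, recovered = {cosine_from_squared_l2(d):.2f}")
# → squared L2 = 0.80, cosine = 0.60, recovered = 0.60
```

Under those assumptions, a printed score of 0.6 corresponds to a cosine similarity of about 0.7. LangChain's `similarity_search_with_relevance_scores` returns normalized relevance scores (higher = more relevant) if you prefer not to convert by hand.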

## Benchmark Results

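A benchmark comparing 18 RAG techniques reported these accuracies (fraction of questions answered correctly). The numbers come from that benchmark, not from running this notebook: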

| Technique | Accuracy |
|-----------|----------|
| **Simple RAG** | **0.30** |
| Semantic Chunking | 0.20 |
| HyDE | 0.50 |
| Reranker | 0.70 |
| Hybrid Search | 0.83 |
| Adaptive RAG | 0.86 |

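On that benchmark, simple RAG answers only 30% of questions correctly, and semantic chunking on its own does even worse (0.20). Reranking, hybrid search, and adaptive RAG all more than double that score, so there is a lot of room to improve.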

## Summary

**What we built:**
- Loaded documents
- Split into chunks
- Created embeddings
- Built a vector store
- Combined retrieval + generation

**Why it's not enough:**
- Vector search misses exact matches
- No relevance verification
- No reranking of results
- No hallucination prevention

**Next notebook:** We'll fix all of these problems and build a production-ready system.

```python
print("✅ Part 1 complete!")
print("Next: Open 02_production_rag.ipynb")
```