# Session 3: Building RAG Systems
## Retrieval-Augmented Generation

**Duration**: 90 minutes

In this session, you'll learn how to build a complete RAG (Retrieval-Augmented Generation) system that allows LLMs to answer questions based on your own documents.

### What You'll Build:
- Document loading and processing pipeline
- Intelligent text chunking system
- Embeddings generation with Sentence Transformers
- Vector database storage with ChromaDB
- Semantic search functionality
- Complete end-to-end RAG application

---

## Setup: Install Dependencies

In [None]:
# Install required libraries
!pip install -q openai
!pip install -q langchain langchain-community
!pip install -q sentence-transformers
!pip install -q chromadb
!pip install -q scikit-learn
!pip install -q numpy

print("✅ All dependencies installed!")

In [None]:
# Configure API keys (from Colab Secrets)
import os

try:
    from google.colab import userdata
    os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')
    print("✅ API keys loaded")
except Exception:
    # Not running in Colab (or the secret is unavailable): prompt for the key instead
    from getpass import getpass
    os.environ['OPENAI_API_KEY'] = getpass('Enter OpenAI API key: ')

## Part 1: Create Sample Documents

We'll create a small knowledge base of company policies.

In [None]:
import os

# Sample documents (company knowledge base)
documents = [
    {
        "content": """
Product Return Policy

Our company offers a 30-day return policy for all products.
To be eligible for a return, items must be unused and in their
original packaging. Customers can initiate a return by contacting
customer service at support@example.com or calling 1-800-RETURNS.

Refunds are processed within 5-7 business days after we receive
the returned item. The refund will be issued to the original
payment method. Shipping costs are non-refundable unless the
return is due to our error.
        """,
        "metadata": {"source": "return_policy.txt", "department": "customer_service"}
    },
    {
        "content": """
Shipping Information

We offer three shipping options:
- Standard Shipping: 5-7 business days ($5.99)
- Express Shipping: 2-3 business days ($12.99)
- Overnight Shipping: 1 business day ($24.99)

All orders over $50 qualify for free standard shipping.
International shipping is available to select countries.
Tracking information is provided via email once the order ships.
        """,
        "metadata": {"source": "shipping_info.txt", "department": "logistics"}
    },
    {
        "content": """
Customer Support Hours

Our customer support team is available:
- Monday to Friday: 9 AM - 9 PM EST
- Saturday: 10 AM - 6 PM EST
- Sunday: 12 PM - 5 PM EST

Contact methods:
- Phone: 1-800-SUPPORT
- Email: support@example.com
- Live Chat: Available on our website during business hours

Average response time for emails is 24 hours on business days.
        """,
        "metadata": {"source": "support_hours.txt", "department": "customer_service"}
    },
    {
        "content": """
Product Warranty

All products come with a standard 1-year manufacturer's warranty
covering defects in materials and workmanship. Extended warranty
plans are available for purchase at checkout.

Warranty claims can be submitted through our website or by
contacting customer service. Proof of purchase is required for
all warranty claims. The warranty does not cover damage from
misuse, accidents, or normal wear and tear.
        """,
        "metadata": {"source": "warranty_info.txt", "department": "product"}
    }
]

print(f"✅ Created {len(documents)} sample documents")
print("\nDocument sources:")
for doc in documents:
    print(f"  - {doc['metadata']['source']}")

## Part 2: Text Chunking

We'll split each document into smaller, overlapping chunks so that retrieval can return just the passage relevant to a question instead of a whole document.
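
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-size sliding-window chunker. It is a simplified illustration only: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and sentence boundaries, which this sketch does not attempt. The helper name `sliding_window_chunks` and the sample string are made up for the example.

```python
def sliding_window_chunks(text, chunk_size=400, chunk_overlap=100):
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration; not part of LangChain.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
for chunk in demo_chunks:
    print(chunk)
```

Each chunk repeats the last 4 characters of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk.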

In [None]:
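# Newer LangChain releases ship the splitters in a separate package;
# fall back to the legacy import path on older installs.
try:
    from langchain_text_splitters import RecursiveCharacterTextSplitter
except ImportError:
    from langchain.text_splitter import RecursiveCharacterTextSplitter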

def chunk_documents(documents, chunk_size=400, chunk_overlap=100):
    """
    Split each document into overlapping chunks and copy the document's
    metadata (plus chunk position) onto every chunk.
    """
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        length_function=len,
        separators=["\n\n", "\n", ". ", " ", ""]  # Try paragraphs first, then lines, sentences, words, characters
    )
    
    all_chunks = []
    
    for doc in documents:
        chunks = text_splitter.split_text(doc['content'])
        
        for i, chunk in enumerate(chunks):
            all_chunks.append({
                'content': chunk,
                'metadata': {
                    **doc['metadata'],
                    'chunk_index': i,
                    'total_chunks': len(chunks)
                }
            })
    
    return all_chunks

# Chunk all documents
chunked_docs = chunk_documents(documents, chunk_size=400, chunk_overlap=100)

print(f"✅ Created {len(chunked_docs)} chunks from {len(documents)} documents")
print("\nChunk distribution:")

for doc in documents:
    source = doc['metadata']['source']
    count = sum(1 for c in chunked_docs if c['metadata']['source'] == source)
    print(f"  - {source}: {count} chunks")

In [None]:
# View a sample chunk
print("Example Chunk:")
print("=" * 60)
print(f"Source: {chunked_docs[0]['metadata']['source']}")
print(f"Chunk {chunked_docs[0]['metadata']['chunk_index'] + 1} of {chunked_docs[0]['metadata']['total_chunks']}")
print("-" * 60)
print(chunked_docs[0]['content'])
print("=" * 60)

## Part 3: Generate Embeddings

Convert text chunks into numerical vectors for semantic search.
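
Semantic search compares these vectors by the angle between them. As a minimal sketch of that comparison, the block below computes cosine similarity in plain Python on three made-up 3-dimensional vectors. Real embeddings from `all-MiniLM-L6-v2` are much longer, and the numbers here are invented purely for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" (hypothetical values, not produced by a real model)
return_question = [0.9, 0.1, 0.0]
refund_question = [0.8, 0.2, 0.1]
shipping_question = [0.1, 0.9, 0.2]

sim_related = cosine(return_question, refund_question)
sim_unrelated = cosine(return_question, shipping_question)

print(f"related:   {sim_related:.4f}")
print(f"unrelated: {sim_unrelated:.4f}")
```

Vectors pointing in nearly the same direction score close to 1, while vectors pointing in different directions score lower.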

In [None]:
from sentence_transformers import SentenceTransformer

# Load embedding model
print("Loading embedding model...")
model_name = 'all-MiniLM-L6-v2'
embedding_model = SentenceTransformer(model_name)

print(f"✅ Loaded: {model_name}")
print(f"   Embedding dimension: {embedding_model.get_sentence_embedding_dimension()}")

In [None]:
# Generate embeddings for all chunks
chunk_texts = [chunk['content'] for chunk in chunked_docs]

print("Generating embeddings...")
embeddings = embedding_model.encode(
    chunk_texts,
    show_progress_bar=True,
    batch_size=32
)

print(f"\n✅ Generated {len(embeddings)} embeddings")
print(f"   Shape: {embeddings.shape}")

In [None]:
# Test semantic similarity
from sklearn.metrics.pairwise import cosine_similarity

def calculate_similarity(text1, text2):
    """Calculate cosine similarity between two texts"""
    emb1 = embedding_model.encode([text1])
    emb2 = embedding_model.encode([text2])
    similarity = cosine_similarity(emb1, emb2)[0][0]
    return similarity

# Test examples
examples = [
    ("What is the return policy?", "How do I return a product?"),
    ("What is the return policy?", "What are shipping options?"),
]

print("Semantic Similarity Examples:")
print("-" * 60)
for text1, text2 in examples:
    sim = calculate_similarity(text1, text2)
    print(f"Text 1: {text1}")
    print(f"Text 2: {text2}")
    print(f"Similarity: {sim:.4f}\n")

## Part 4: Store in Vector Database
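
Conceptually, a vector database stores each chunk's ID, embedding, text, and metadata, and returns the stored vectors closest to a query vector. The sketch below shows that idea with an exact, brute-force search in plain Python. `TinyVectorStore` is a hypothetical class written for this example, and the 2-dimensional vectors are made up. ChromaDB itself uses an approximate nearest-neighbour index rather than a full scan, so treat this as a mental model rather than a description of its internals.

```python
class TinyVectorStore:
    """Hypothetical in-memory store with exact (brute-force) search."""

    def __init__(self):
        self._rows = []  # each row: (id, embedding, document, metadata)

    def add(self, ids, embeddings, documents, metadatas):
        for row in zip(ids, embeddings, documents, metadatas):
            self._rows.append(row)

    def query(self, query_embedding, n_results=3):
        def squared_l2(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))

        # Score every stored vector, then keep the closest n_results
        scored = [
            (squared_l2(query_embedding, emb), chunk_id, doc, meta)
            for chunk_id, emb, doc, meta in self._rows
        ]
        scored.sort(key=lambda item: item[0])
        return scored[:n_results]


store = TinyVectorStore()
store.add(
    ids=["chunk_0", "chunk_1", "chunk_2"],
    embeddings=[[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]],  # made-up vectors
    documents=["returns text", "shipping text", "warranty text"],
    metadatas=[
        {"source": "return_policy.txt"},
        {"source": "shipping_info.txt"},
        {"source": "warranty_info.txt"},
    ],
)

hits = store.query([0.9, 0.1], n_results=2)
for distance, chunk_id, doc, meta in hits:
    print(f"{chunk_id} ({meta['source']}): distance={distance:.3f}")
```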

In [None]:
import chromadb
from chromadb.config import Settings

# Initialize ChromaDB
client = chromadb.Client(Settings(anonymized_telemetry=False))

# Create collection
collection_name = "company_knowledge_base"

# Delete if exists (for fresh start)
try:
    client.delete_collection(collection_name)
except Exception:
    # The collection did not exist yet, so there is nothing to delete
    pass

collection = client.create_collection(
    name=collection_name,
    metadata={"description": "Company documentation for RAG"}
)

print(f"✅ Created collection: {collection_name}")

In [None]:
# Add chunks to collection
# Use a separate name for the chunk texts so the original `documents`
# list from Part 1 is not overwritten (re-running earlier cells would break).
ids = [f"chunk_{i}" for i in range(len(chunked_docs))]
chunk_texts = [chunk['content'] for chunk in chunked_docs]
metadatas = [chunk['metadata'] for chunk in chunked_docs]
embeddings_list = [emb.tolist() for emb in embeddings]

collection.add(
    ids=ids,
    documents=chunk_texts,
    embeddings=embeddings_list,
    metadatas=metadatas
)

print(f"✅ Added {collection.count()} chunks to ChromaDB")

## Part 5: Semantic Search
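
The search in this part reports a `distance` for each hit, where smaller means more similar. The sketch below shows how such a distance relates to cosine similarity under two assumptions that are worth verifying for your setup: that the collection uses a squared-L2 metric (the notebook does not set a metric explicitly, so it relies on ChromaDB's default), and that the embeddings have unit length. Under those assumptions, squared L2 distance equals `2 - 2 * cosine`. The helper names are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_from_squared_l2(distance):
    """Valid only when both vectors have unit length (assumption)."""
    return 1.0 - distance / 2.0


a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

distance = squared_l2(a, b)
recovered = cosine_from_squared_l2(distance)

print(f"squared L2 distance:   {distance:.4f}")
print(f"recovered cosine:      {recovered:.4f}")
print(f"cosine computed direct: {dot(a, b):.4f}")
```

If either assumption does not hold, the conversion is not valid and the raw distances should only be used for ranking.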

In [None]:
def search_knowledge_base(query, n_results=3):
    """Search the knowledge base for relevant chunks"""
    
    # Generate embedding for query
    query_embedding = embedding_model.encode([query])[0]
    
    # Search vector database
    results = collection.query(
        query_embeddings=[query_embedding.tolist()],
        n_results=n_results,
        include=["documents", "metadatas", "distances"]
    )
    
    return results

# Test query
query = "What is the return policy?"
results = search_knowledge_base(query, n_results=3)

print(f"Query: '{query}'")
print("\nTop 3 Results:")
print("=" * 70)

for i in range(len(results['documents'][0])):
    print(f"\n[Result {i+1}]")
    print(f"Source: {results['metadatas'][0][i]['source']}")
    print(f"Distance: {results['distances'][0][i]:.4f}")
    print(f"Content: {results['documents'][0][i][:150]}...")
    print("-" * 70)

In [None]:
# Try multiple queries
test_queries = [
    "How long does shipping take?",
    "What are customer support hours?",
    "How do warranties work?",
    "Can I get free shipping?"
]

for query in test_queries:
    print(f"\n🔍 Query: {query}")
    results = search_knowledge_base(query, n_results=2)
    
    for i in range(len(results['documents'][0])):
        print(f"  ✓ {results['metadatas'][0][i]['source']}")
        print(f"    {results['documents'][0][i][:80]}...")

## Part 6: Build Complete RAG System
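
ChromaDB returns query results nested one level per query, which is why the pipeline in this part indexes `['documents'][0]` and `['metadatas'][0]`: a single query was sent, so its hits sit at position 0. The sketch below isolates the context-formatting step so it can be checked without calling the OpenAI API. `build_context` is a hypothetical helper, and `fake_results` is a hand-written stand-in shaped like a query result, not real ChromaDB output.

```python
def build_context(search_results):
    """Join retrieved chunks into one context string, labelled by source."""
    docs = search_results["documents"][0]   # hits for the first (only) query
    metas = search_results["metadatas"][0]
    parts = [
        f"[Source: {meta['source']}]\n{doc}"
        for doc, meta in zip(docs, metas)
    ]
    return "\n\n".join(parts)


# Hand-written stand-in for a query result (illustrative only)
fake_results = {
    "documents": [[
        "Our company offers a 30-day return policy for all products.",
        "Refunds are processed within 5-7 business days.",
    ]],
    "metadatas": [[
        {"source": "return_policy.txt"},
        {"source": "return_policy.txt"},
    ]],
}

context = build_context(fake_results)
print(context)
```

Labelling each chunk with its source lets the model's answer be traced back to a document, and it keeps the formatting logic testable on its own.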

In [None]:
from openai import OpenAI

# Initialize OpenAI client
openai_client = OpenAI(api_key=os.environ.get('OPENAI_API_KEY'))

def rag_query(question, n_results=3):
    """
    Complete RAG pipeline:
    1. Retrieve relevant documents
    2. Format context
    3. Generate answer with LLM
    """
    
    # Step 1: Retrieve
    search_results = search_knowledge_base(question, n_results=n_results)
    
    # Step 2: Format context
    context_parts = []
    for i, doc in enumerate(search_results['documents'][0]):
        source = search_results['metadatas'][0][i]['source']
        context_parts.append(f"[Source: {source}]\n{doc}")
    
    context = "\n\n".join(context_parts)
    
    # Step 3: Create prompt
    prompt = f"""Answer the question based on the provided context.
If the answer is not in the context, say "I don't have enough information to answer that."

Context:
{context}

Question: {question}

Answer:"""
    
    # Step 4: Generate response
    response = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful customer service assistant. Answer questions based only on the provided context."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.3,
        max_tokens=200
    )
    
    answer = response.choices[0].message.content
    
    return {
        'question': question,
        'answer': answer,
        'sources': [meta['source'] for meta in search_results['metadatas'][0]],
        'context_chunks': search_results['documents'][0]
    }

print("✅ RAG system ready!")

In [None]:
# Test the RAG system
question = "What is your return policy and how long do I have?"
result = rag_query(question)

print("=" * 70)
print(f"Question: {result['question']}")
print("=" * 70)
print(f"\n🤖 Answer:\n{result['answer']}")
print(f"\n📚 Sources: {', '.join(result['sources'])}")
print("=" * 70)

In [None]:
# Try more questions
questions = [
    "How much does overnight shipping cost?",
    "What days is customer support available?",
    "How long is the warranty period?",
    "What happens if I want to return something after 30 days?"  # Edge case
]

for q in questions:
    result = rag_query(q)
    print(f"\n❓ {q}")
    print(f"💡 {result['answer']}")
    print(f"📄 Sources: {', '.join(result['sources'])}")
    print("-" * 70)

## Part 7: Evaluation
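
The source check in this part asks whether the expected source appears anywhere in the retrieved list, which is a hit-rate style metric. A rank-aware complement is mean reciprocal rank (MRR), which also rewards placing the right source near the top. The sketch below computes both in plain Python. The retrieved lists are made up for illustration and are not output from the system built above.

```python
def hit_rate(retrieved, expected):
    """Fraction of questions whose expected source was retrieved at all."""
    hits = sum(1 for sources, target in zip(retrieved, expected) if target in sources)
    return hits / len(expected)


def mean_reciprocal_rank(retrieved, expected):
    """Average of 1/rank of the expected source (0 when it is missing)."""
    total = 0.0
    for sources, target in zip(retrieved, expected):
        if target in sources:
            total += 1.0 / (sources.index(target) + 1)
    return total / len(expected)


# Hypothetical ranked retrieval results, one list per test question
retrieved_sources = [
    ["return_policy.txt", "warranty_info.txt", "shipping_info.txt"],
    ["shipping_info.txt", "support_hours.txt", "return_policy.txt"],
    ["warranty_info.txt", "return_policy.txt", "support_hours.txt"],
]
expected_sources = ["return_policy.txt", "support_hours.txt", "shipping_info.txt"]

hr = hit_rate(retrieved_sources, expected_sources)
mrr = mean_reciprocal_rank(retrieved_sources, expected_sources)

print(f"Hit rate: {hr:.1%}")
print(f"MRR:      {mrr:.3f}")
```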

In [None]:
# Create test cases
test_cases = [
    {
        "question": "How long do I have to return a product?",
        "expected_answer_contains": ["30-day", "30 day"],
        "expected_source": "return_policy.txt"
    },
    {
        "question": "What are the customer service hours on Saturday?",
        "expected_answer_contains": ["10 AM", "6 PM"],
        "expected_source": "support_hours.txt"
    },
    {
        "question": "How much is overnight shipping?",
        "expected_answer_contains": ["24.99", "$24.99"],
        "expected_source": "shipping_info.txt"
    }
]

# Evaluate
results = []
for test in test_cases:
    result = rag_query(test['question'])
    
    # Check answer correctness
    answer_correct = any(
        exp.lower() in result['answer'].lower()
        for exp in test['expected_answer_contains']
    )
    
    # Check source correctness
    source_correct = test['expected_source'] in result['sources']
    
    results.append({
        'question': test['question'],
        'answer_correct': answer_correct,
        'source_correct': source_correct
    })

# Calculate metrics
answer_accuracy = sum(r['answer_correct'] for r in results) / len(results)
source_accuracy = sum(r['source_correct'] for r in results) / len(results)

print("EVALUATION RESULTS")
print("=" * 60)
print(f"Answer Accuracy: {answer_accuracy:.1%}")
print(f"Source Accuracy: {source_accuracy:.1%}\n")

for i, result in enumerate(results):
    print(f"Test {i+1}: {result['question']}")
    print(f"  Answer: {'✅' if result['answer_correct'] else '❌'}")
    print(f"  Source: {'✅' if result['source_correct'] else '❌'}")
    print()

## Session Complete! ✅

You've successfully built a complete RAG system!

### What You Learned:
- ✅ Document processing and chunking
- ✅ Generating embeddings with Sentence Transformers
- ✅ Storing vectors in ChromaDB
- ✅ Semantic search and retrieval
- ✅ Building end-to-end RAG pipeline
- ✅ Evaluating RAG performance

### Next Steps:
1. Try with your own documents
2. Experiment with different chunk sizes
3. Test different embedding models
4. Implement reranking
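
As a starting point for the reranking step, here is a minimal sketch of the idea: retrieve a handful of candidates, rescore them with a second signal, and reorder. The scorer below uses simple word overlap as a stand-in so the example runs without extra models. Practical rerankers typically use a stronger model-based scorer, and `rerank_by_overlap` is a hypothetical helper, not a library function.

```python
import re


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rerank_by_overlap(query, candidates):
    """Reorder candidate chunks by the fraction of query words they contain."""
    query_words = tokenize(query)

    def score(text):
        if not query_words:
            return 0.0
        return len(query_words & tokenize(text)) / len(query_words)

    # sorted() is stable, so ties keep their original retrieval order
    return sorted(candidates, key=score, reverse=True)


candidates = [
    "Refunds are processed within 5-7 business days",
    "Overnight Shipping: 1 business day ($24.99)",
    "All orders over $50 qualify for free standard shipping.",
]
reranked = rerank_by_overlap("overnight shipping cost", candidates)
for text in reranked:
    print(text)
```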

**Next Session**: Function Calling & Tool Use