# Week 4: RAG (Retrieval-Augmented Generation) - Hands-On Exercise

## Learning Objectives
1. Understand how embeddings represent text semantically
2. Learn vector similarity search
3. Build a simple RAG system using ChromaDB
4. See RAG in action with real queries

## What We'll Build
A small knowledge base of AI/ML concepts, queried through a RAG pipeline that retrieves relevant documents and passes them to an LLM to answer questions.

## Part 1: Setup and Installation

In [1]:
# Install required packages
!pip install chromadb sentence-transformers groq python-dotenv scikit-learn -q

In [2]:
import chromadb
from chromadb.utils import embedding_functions
from sentence_transformers import SentenceTransformer
import numpy as np
from typing import List, Dict
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

print("✅ All imports successful!")

✅ All imports successful!


## Part 2: Understanding Embeddings

Embeddings convert text into numerical vectors that capture semantic meaning.

In [3]:
# Initialize embedding model
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')

# Example sentences
sentences = [
    "The cat sits on the mat",
    "A feline rests on the carpet",
    "The dog runs in the park",
    "Python is a programming language"
]

# Generate embeddings
embeddings = embedding_model.encode(sentences)

print(f"Number of sentences: {len(sentences)}")
print(f"Embedding dimension: {embeddings.shape[1]}")
print(f"\nFirst embedding (truncated): {embeddings[0][:10]}...")

Number of sentences: 4
Embedding dimension: 384

First embedding (truncated): [ 0.13489066 -0.03206333 -0.02033523  0.03590099 -0.0283331   0.04150213
  0.03315875  0.03660566  0.00861661  0.03763952]...


### Exercise 2.1: Compute Similarity

Calculate cosine similarity between embeddings to see which sentences are semantically similar.
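
For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths, so it depends on direction rather than magnitude. The following is a minimal NumPy sketch of that formula; the helper name `cosine_sim` and the toy vectors are illustrative assumptions, not part of the exercise code.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional vectors (illustrative values, not real embeddings)
v_same_direction = cosine_sim(np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 6.0]))
v_orthogonal = cosine_sim(np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]))

print(f"Same direction: {v_same_direction:.4f}")
print(f"Orthogonal:     {v_orthogonal:.4f}")
```

Vectors pointing the same way score close to 1 regardless of length, and perpendicular vectors score 0.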

In [4]:
from sklearn.metrics.pairwise import cosine_similarity

# Compute similarity matrix
similarity_matrix = cosine_similarity(embeddings)

print("Similarity Matrix:")
print("=" * 60)
for i, sent1 in enumerate(sentences):
    print(f"\n{sent1}")
    for j, sent2 in enumerate(sentences):
        if i != j:
            print(f"  vs '{sent2}': {similarity_matrix[i][j]:.4f}")

print("\n💡 Notice: Sentences 0 and 1 have high similarity despite different words!")

Similarity Matrix:

The cat sits on the mat
  vs 'A feline rests on the carpet': 0.5612
  vs 'The dog runs in the park': 0.0949
  vs 'Python is a programming language': 0.0317

A feline rests on the carpet
  vs 'The cat sits on the mat': 0.5612
  vs 'The dog runs in the park': 0.1561
  vs 'Python is a programming language': 0.1009

The dog runs in the park
  vs 'The cat sits on the mat': 0.0949
  vs 'A feline rests on the carpet': 0.1561
  vs 'Python is a programming language': 0.0462

Python is a programming language
  vs 'The cat sits on the mat': 0.0317
  vs 'A feline rests on the carpet': 0.1009
  vs 'The dog runs in the park': 0.0462

💡 Notice: Sentences 0 and 1 have high similarity despite different words!


## Part 3: Setting Up ChromaDB Vector Database

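ChromaDB is an open-source vector database that can run in memory or persist to disk, which makes it convenient for small RAG applications. This exercise uses the in-memory client, so the collection is lost when the kernel restarts.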

In [5]:
# Initialize ChromaDB client (in-memory)
chroma_client = chromadb.Client()

# Create embedding function
sentence_transformer_ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)

# Create the collection, or reuse it if this cell has already been run
collection = chroma_client.get_or_create_collection(
    name="ai_knowledge_base",
    embedding_function=sentence_transformer_ef,
    metadata={"description": "AI/ML concepts knowledge base"}
)

print(f"✅ Created collection: {collection.name}")
print(f"   Collection count: {collection.count()}")

✅ Created collection: ai_knowledge_base
   Collection count: 0


## Part 4: Populating the Knowledge Base

Let's add documents about AI/ML concepts to our vector database.

In [6]:
# Knowledge base documents
documents = [
    {
        "id": "doc1",
        "text": "Machine Learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed. It focuses on developing algorithms that can access data and use it to learn for themselves.",
        "metadata": {"category": "ML Basics", "topic": "Introduction"}
    },
    {
        "id": "doc2",
        "text": "Deep Learning is a subset of machine learning that uses neural networks with multiple layers. These deep neural networks can learn hierarchical representations of data, making them particularly effective for tasks like image recognition and natural language processing.",
        "metadata": {"category": "Deep Learning", "topic": "Neural Networks"}
    },
    {
        "id": "doc3",
        "text": "Natural Language Processing (NLP) is a branch of AI that helps computers understand, interpret, and manipulate human language. NLP combines computational linguistics with machine learning and deep learning models.",
        "metadata": {"category": "NLP", "topic": "Language Understanding"}
    },
    {
        "id": "doc4",
        "text": "Transformers are a type of neural network architecture that has revolutionized NLP. They use self-attention mechanisms to process sequential data in parallel, making them more efficient than previous architectures like RNNs. GPT and BERT are examples of transformer models.",
        "metadata": {"category": "Deep Learning", "topic": "Transformers"}
    },
    {
        "id": "doc5",
        "text": "Embeddings are dense vector representations of data that capture semantic meaning. In NLP, word embeddings represent words as vectors in a continuous vector space where semantically similar words are closer together.",
        "metadata": {"category": "NLP", "topic": "Embeddings"}
    },
    {
        "id": "doc6",
        "text": "RAG (Retrieval-Augmented Generation) combines information retrieval with text generation. It retrieves relevant documents from a knowledge base and uses them as context for generating accurate, grounded responses. This approach reduces hallucinations in LLMs.",
        "metadata": {"category": "Advanced AI", "topic": "RAG"}
    },
    {
        "id": "doc7",
        "text": "Vector databases store and index high-dimensional vectors for efficient similarity search. They use algorithms like HNSW (Hierarchical Navigable Small World) or IVF (Inverted File Index) to quickly find similar vectors, which is essential for RAG systems.",
        "metadata": {"category": "Advanced AI", "topic": "Vector Databases"}
    },
    {
        "id": "doc8",
        "text": "Fine-tuning is the process of taking a pre-trained model and training it further on a specific task or domain. This allows the model to adapt to specialized use cases while leveraging the knowledge learned during pre-training.",
        "metadata": {"category": "ML Basics", "topic": "Model Training"}
    },
    {
        "id": "doc9",
        "text": "Prompt Engineering is the practice of designing effective prompts to get desired outputs from language models. It involves techniques like few-shot learning, chain-of-thought prompting, and role-based prompting.",
        "metadata": {"category": "LLM Usage", "topic": "Prompting"}
    },
    {
        "id": "doc10",
        "text": "Agentic AI refers to AI systems that can autonomously plan, execute tasks, and make decisions to achieve goals. These agents can use tools, interact with environments, and adapt their strategies based on feedback.",
        "metadata": {"category": "Advanced AI", "topic": "AI Agents"}
    }
]

# Add documents to collection
collection.add(
    documents=[doc["text"] for doc in documents],
    metadatas=[doc["metadata"] for doc in documents],
    ids=[doc["id"] for doc in documents]
)

print(f"✅ Added {len(documents)} documents to the knowledge base")
print(f"   Total documents in collection: {collection.count()}")

✅ Added 10 documents to the knowledge base
   Total documents in collection: 10


## Part 5: Semantic Search with Vector Database

Now let's query our knowledge base using semantic search.

In [8]:
def search_knowledge_base(query: str, n_results: int = 3):
    """
    Search the knowledge base for relevant documents.
    
    Args:
        query: The search query
        n_results: Number of results to return
    
    Returns:
        Dictionary containing results
    """
    results = collection.query(
        query_texts=[query],
        n_results=n_results
    )
    
    return results

# Test queries
test_queries = [
    "What are neural networks?",
    "How do I represent words as numbers?",
    "What is the best way to reduce AI hallucinations?"
]

for query in test_queries:
    print(f"\n{'='*70}")
    print(f"Query: {query}")
    print(f"{'='*70}")
    
    results = search_knowledge_base(query, n_results=3)
    
    for i, (doc, metadata, distance) in enumerate(zip(
        results['documents'][0],
        results['metadatas'][0],
        results['distances'][0]
    )):
        print(f"\nResult {i+1} (Distance: {distance:.4f})")
        print(f"Category: {metadata['category']} | Topic: {metadata['topic']}")
        print(f"Text: {doc[:150]}...")


Query: What are neural networks?

Result 1 (Distance: 0.3759)
Category: Deep Learning | Topic: Neural Networks
Text: Deep Learning is a subset of machine learning that uses neural networks with multiple layers. These deep neural networks can learn hierarchical repres...

Result 2 (Distance: 0.5338)
Category: Deep Learning | Topic: Transformers
Text: Transformers are a type of neural network architecture that has revolutionized NLP. They use self-attention mechanisms to process sequential data in p...

Result 3 (Distance: 0.5632)
Category: ML Basics | Topic: Introduction
Text: Machine Learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed....

Query: How do I represent words as numbers?

Result 1 (Distance: 0.6329)
Category: NLP | Topic: Embeddings
Text: Embeddings are dense vector representations of data that capture semantic meaning. In NLP, word embeddings represent words as vectors in a continuou

### Exercise 5.1: Try Your Own Queries

Experiment with different queries to see how semantic search works.

In [None]:
# TODO: Try your own queries here
my_query = "YOUR QUERY HERE"

# Uncomment to run:
# results = search_knowledge_base(my_query, n_results=3)
# for i, doc in enumerate(results['documents'][0]):
#     print(f"\nResult {i+1}: {doc}")
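
The `Distance` values printed by the search loop are easier to read once converted to cosine similarity. The sketch below rests on two assumptions: that the collection uses Chroma's default `l2` space (squared Euclidean distance), and that `all-MiniLM-L6-v2` returns unit-length vectors. Under those assumptions, distance = 2 - 2 × cosine similarity. The helper name `l2sq_to_cosine` is hypothetical.

```python
import numpy as np

def l2sq_to_cosine(distance: float) -> float:
    """Convert a squared L2 distance between unit vectors to cosine similarity."""
    return 1.0 - distance / 2.0

# Check the identity ||a - b||^2 = 2 - 2*cos(a, b) on random unit vectors
rng = np.random.default_rng(0)
a = rng.normal(size=384)
b = rng.normal(size=384)
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

squared_l2 = float(np.sum((a - b) ** 2))
cosine = float(np.dot(a, b))
assert abs(l2sq_to_cosine(squared_l2) - cosine) < 1e-9

# Applied to the top result for "What are neural networks?" (distance 0.3759)
print(f"Approximate cosine similarity: {l2sq_to_cosine(0.3759):.3f}")
```

If the collection were configured with a different distance metric, this conversion would not apply, so treat it as a reading aid rather than a general rule.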

## Part 6: Building a Simple RAG System

Now let's combine retrieval with generation using an LLM.
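
The step that joins retrieval to generation is plain string assembly, so it can be tried offline before any API key is configured. The sketch below is one possible way to build the context and prompt; `build_context`, `build_prompt`, and the hard-coded sample data are hypothetical stand-ins for real retrieval output.

```python
from typing import Dict, List

def build_context(docs: List[str], metas: List[Dict[str, str]]) -> str:
    """Join retrieved documents into one numbered, labeled context string."""
    return "\n\n".join(
        f"Document {i + 1} ({meta['category']} - {meta['topic']}):\n{doc}"
        for i, (doc, meta) in enumerate(zip(docs, metas))
    )

def build_prompt(question: str, context: str) -> str:
    """Wrap the context and question in a grounded-answer instruction."""
    return (
        "Answer the question based on the provided context.\n"
        "If the context doesn't contain enough information, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n\nAnswer:"
    )

# Stand-in for retrieval output (hypothetical, hard-coded for illustration)
sample_docs = ["RAG combines information retrieval with text generation."]
sample_metas = [{"category": "Advanced AI", "topic": "RAG"}]

prompt = build_prompt("What is RAG?", build_context(sample_docs, sample_metas))
print(prompt)
```

Printing the prompt before sending it to a model is a cheap way to confirm that the retrieved text actually reaches the LLM in the intended form.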

In [11]:
from groq import Groq
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Initialize Groq client
client = Groq(api_key=os.getenv("GROQ_API_KEY"))

def rag_query(question: str, n_results: int = 3, model: str = "llama-3.3-70b-versatile"):
    """
    Perform RAG: Retrieve relevant documents and generate an answer using Groq.
    
    Args:
        question: User's question
        n_results: Number of documents to retrieve
        model: Groq model to use (options: "llama-3.3-70b-versatile", "llama-3.1-8b-instant",
               "mixtral-8x7b-32768", "gemma2-9b-it"; availability changes over time,
               so check Groq's current model list)
    
    Returns:
        Dictionary with answer and sources
    """
    # Step 1: Retrieve relevant documents
    print("🔍 Retrieving relevant documents...")
    results = search_knowledge_base(question, n_results=n_results)
    
    # Extract documents and metadata
    retrieved_docs = results['documents'][0]
    retrieved_metadata = results['metadatas'][0]
    
    # Step 2: Build context from retrieved documents
    context = "\n\n".join([
        f"Document {i+1} ({meta['category']} - {meta['topic']}):\n{doc}"
        for i, (doc, meta) in enumerate(zip(retrieved_docs, retrieved_metadata))
    ])
    
    print(f"📚 Retrieved {len(retrieved_docs)} documents")
    
    # Step 3: Create prompt with context
    prompt = f"""You are a helpful AI assistant. Answer the question based on the provided context.
If the context doesn't contain enough information, say so.

Context:
{context}

Question: {question}

Answer:"""
    
    # Step 4: Generate answer using Groq LLM
    print(f"🤖 Generating answer with Groq ({model})...")
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful AI assistant that answers questions based on provided context."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.3,
        max_tokens=500
    )
    
    answer = response.choices[0].message.content
    
    return {
        "answer": answer,
        "sources": [
            {"text": doc, "metadata": meta}
            for doc, meta in zip(retrieved_docs, retrieved_metadata)
        ]
    }

print("✅ RAG system ready with Groq!")

# Groq models referenced in this exercise (availability changes over time;
# check Groq's current model list before relying on any of them):
# - llama-3.3-70b-versatile (default above; recommended for best quality)
# - llama-3.1-8b-instant (fastest)
# - mixtral-8x7b-32768 (good balance)
# - gemma2-9b-it (Google's model)


✅ RAG system ready with Groq!


### Test the RAG System

In [12]:
# Test RAG with sample questions
questions = [
    "What is the difference between machine learning and deep learning?",
    "How can I reduce hallucinations in language models?",
    "What are transformers and why are they important?"
]

for question in questions:
    print(f"\n{'='*70}")
    print(f"❓ Question: {question}")
    print(f"{'='*70}\n")
    
    result = rag_query(question)
    
    print(f"\n💡 Answer:\n{result['answer']}")
    
    print(f"\n📖 Sources Used:")
    for i, source in enumerate(result['sources']):
        print(f"  {i+1}. {source['metadata']['category']} - {source['metadata']['topic']}")


❓ Question: What is the difference between machine learning and deep learning?

🔍 Retrieving relevant documents...
📚 Retrieved 3 documents
🤖 Generating answer with Groq (llama-3.3-70b-versatile)...

💡 Answer:
Based on the provided context, the difference between machine learning and deep learning is that machine learning is a broader subset of artificial intelligence that enables systems to learn and improve from experience, whereas deep learning is a subset of machine learning that specifically uses neural networks with multiple layers to learn hierarchical representations of data.

In other words, all deep learning is machine learning, but not all machine learning is deep learning. Deep learning is a more specialized approach within the machine learning field, particularly effective for tasks like image recognition and natural language processing.

📖 Sources Used:
  1. Deep Learning - Neural Networks
  2. ML Basics - Introduction
  3. NLP - Language Understanding

## Part 7: Key Takeaways

### What We Learned:
1. **Embeddings** convert text to vectors that capture semantic meaning
2. **Vector databases** enable fast similarity search
3. **RAG** combines retrieval + generation for accurate, grounded responses
4. **Metadata filtering** allows precise document retrieval
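
Metadata filtering means restricting the candidate set by a metadata field before ranking by similarity. In ChromaDB this is usually expressed with the `where` argument, for example `collection.query(query_texts=[...], n_results=2, where={"category": "NLP"})`. The sketch below shows the same idea in plain NumPy so it runs without a database; the function `filtered_search`, the two-dimensional vectors, and the record layout are illustrative assumptions.

```python
import numpy as np

# Hypothetical mini-corpus: hand-made 2-D vectors stand in for real embeddings
records = [
    {"id": "doc3", "category": "NLP", "vector": np.array([0.9, 0.1])},
    {"id": "doc5", "category": "NLP", "vector": np.array([0.6, 0.8])},
    {"id": "doc7", "category": "Advanced AI", "vector": np.array([0.7, 0.7])},
]

def filtered_search(query_vec, records, category, n_results=2):
    """Keep only records in `category`, then rank them by cosine similarity."""
    candidates = [r for r in records if r["category"] == category]

    def score(r):
        v = r["vector"]
        return float(np.dot(query_vec, v) / (np.linalg.norm(query_vec) * np.linalg.norm(v)))

    ranked = sorted(candidates, key=score, reverse=True)
    return [r["id"] for r in ranked[:n_results]]

query_vec = np.array([0.5, 0.85])
top_ids = filtered_search(query_vec, records, category="NLP")
print(top_ids)
```

Here `doc7` is closer to the query than `doc3`, but the category filter removes it before ranking, which is the behavior a `where` clause is meant to provide.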

### Next Steps:
- Experiment with different embedding models
- Try chunking strategies for long documents
- Explore advanced RAG patterns (GraphRAG, Agentic RAG)
- Build domain-specific knowledge bases
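
As a starting point for the chunking item above, here is a minimal sketch of fixed-size chunking with overlap, where consecutive chunks share a few words so that sentences cut at a boundary still appear whole in one chunk. The function name `chunk_words` and the default sizes are assumptions chosen for illustration; real pipelines often chunk by tokens, sentences, or document structure instead.

```python
from typing import List

def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> List[str]:
    """Split text into chunks of `chunk_size` words that share `overlap` words with the previous chunk."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks

# Twelve placeholder words, chunked five at a time with two words of overlap
sample = " ".join(f"w{i}" for i in range(12))
chunks = chunk_words(sample, chunk_size=5, overlap=2)
for c in chunks:
    print(c)
```

Each chunk could then be added to the collection as its own document, with metadata pointing back to the source it came from.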