# Week 12: Retrieval Augmented Generation (RAG) - Homework

**ML2: Advanced Machine Learning**

**Estimated Time**: 1 hour

---

This homework combines programming exercises and knowledge-based questions to reinforce this week's concepts.

## Setup

Run this cell to import necessary libraries:

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn

# Set random seed for reproducibility
np.random.seed(42)
torch.manual_seed(42)

print('✓ Libraries imported successfully')

---
## Part 1: Programming Exercises (60%)

Complete the following programming tasks. Read each description carefully and implement the requested functionality.

### Exercise 1: RAG vs. Non-RAG Experiment

**Time**: 10 min

Compare LLM responses with and without retrieved context.

In [None]:
# Scenario: Company-specific Q&A

# Document database (simplified)
docs = [
    "Acme Corp vacation policy: 15 days per year for new employees.",
    "Acme Corp allows remote work 3 days per week.",
    "Acme Corp health insurance covers dental."
]

question = "How many vacation days do new employees get?"

# WITHOUT RAG:
prompt_no_rag = f"Question: {question}"
# Without context, the LLM may hallucinate or fall back on a generic answer

# WITH RAG:
# 1. Retrieve relevant docs
relevant_doc = docs[0]  # (in reality, use embedding similarity)

# 2. Augment prompt
prompt_with_rag = f"""Context: {relevant_doc}

Question: {question}

Answer based on the context:"""
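# With the context supplied, the LLM can ground its answer in the retrieved policy

# TODO: Send both prompts to an LLM and compare the outputs.
# The RAG prompt should yield the specific figure stated in the context.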
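
The hard-coded `docs[0]` above can be replaced by an actual similarity search. The following is a minimal sketch, not the course's reference solution: it uses word-count vectors as a crude stand-in for learned embeddings, and the helper names `bow` and `cosine` are hypothetical.

```python
import math
import re
from collections import Counter

docs = [
    "Acme Corp vacation policy: 15 days per year for new employees.",
    "Acme Corp allows remote work 3 days per week.",
    "Acme Corp health insurance covers dental.",
]
question = "How many vacation days do new employees get?"

def bow(text):
    """Lowercased word counts (assumption: a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Score every document against the question and keep the best match
query_vec = bow(question)
scores = [cosine(query_vec, bow(d)) for d in docs]
best_index = max(range(len(docs)), key=scores.__getitem__)
relevant_doc = docs[best_index]

prompt_with_rag = f"Context: {relevant_doc}\n\nQuestion: {question}\n\nAnswer based on the context:"
print(relevant_doc)
```

Word counts only match on shared vocabulary, so a real system would swap `bow` for an embedding model while keeping the same score-and-select structure.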

---
## Part 2: Knowledge Questions (40%)

Answer the following questions to test your conceptual understanding.

### Question 1 (Short Answer)

**Question 1 - Why RAG?**

LLMs have knowledge cutoff dates and can't access private/proprietary data.

Explain:
1. How does RAG solve these problems?
2. How does the main alternative to RAG, fine-tuning, work?
3. When would you choose RAG over fine-tuning?

**Hint**: RAG = retrieve + inject context. Fine-tuning = retrain model. RAG is more flexible.

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 2 (Short Answer)

**Question 2 - RAG Pipeline**

RAG pipeline:
1. Embed documents into vector database
2. Embed user query
3. Retrieve top-k most similar documents
4. Inject into LLM prompt
5. Generate answer

Explain: Why is embedding similarity better than keyword matching for retrieval?

**Hint**: Embeddings capture semantic meaning. "car" and "automobile" are similar in embedding space.
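
To make the hint concrete, here is a small sketch of where keyword matching breaks down. The function `keyword_overlap` and the example strings are illustrative assumptions, and no embedding model is called here.

```python
def keyword_overlap(query, doc):
    """Jaccard overlap of lowercase word sets: a simple keyword-matching score."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    union = q | d
    return len(q & d) / len(union) if union else 0.0

# Same vocabulary: keyword matching works well
same_words = keyword_overlap("car repair cost", "car repair cost estimate")

# Same meaning, different vocabulary: keyword matching sees no relation at all
synonyms = keyword_overlap("car repair cost", "automobile maintenance pricing")

print(same_words, synonyms)
```

The second pair shares no tokens, so any purely lexical score is zero even though the two phrases mean nearly the same thing. An embedding model is trained to place such phrases close together, which is the gap this question asks you to explain.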

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 3 (Multiple Choice)

**Question 3 - Chunk Size Tradeoff**

You're splitting documents into chunks for RAG. Should chunks be:

- A) Very small (1 sentence) - precise but lacks context
- B) Very large (entire documents) - contextual but noisy
- C) Medium (paragraphs) - balances precision and context
- D) Size doesn't matter

**Hint**: Too small = missing context. Too large = irrelevant information. Balance is key.
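
One way to experiment with this tradeoff is a paragraph-based chunker with a size cap. This is a minimal sketch under stated assumptions: `chunk_paragraphs` and `max_chars` are hypothetical names, paragraphs are assumed to be separated by blank lines, and size is measured in characters rather than tokens.

```python
def chunk_paragraphs(text, max_chars=500):
    """Group consecutive paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole rather than cut mid-sentence.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would exceed the cap: close the current chunk
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

sample = "Vacation policy.\n\nRemote work policy.\n\nHealth insurance policy."
small = chunk_paragraphs(sample, max_chars=20)    # one paragraph per chunk
large = chunk_paragraphs(sample, max_chars=1000)  # whole document in one chunk
print(len(small), len(large))
```

Varying `max_chars` moves you between the extremes in options A and B, which makes it easy to observe how retrieval quality changes with chunk size.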

**Your Answer**: [Write your answer here - e.g., 'B']

**Explanation**: [Explain why this is correct]

### Question 4 (Short Answer)

**Question 4 - Vector Databases**

RAG systems often store embeddings in a vector database (e.g., Pinecone, Weaviate, Chroma).

Explain:
1. What makes them different from traditional databases?
2. What operation do they optimize for?
3. Why is a stock PostgreSQL installation (without a vector extension such as pgvector) a poor fit?

**Hint**: Vector DBs optimize for similarity search (nearest neighbors), not exact matches.
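
The operation in question can be sketched as an exact, brute-force nearest-neighbor search in NumPy. The function name `top_k_cosine` and the random vectors are assumptions for illustration only. Vector databases typically avoid this full scan by using approximate nearest-neighbor indexes.

```python
import numpy as np

def top_k_cosine(query, vectors, k):
    """Return (indices, similarities) of the k rows most similar to `query`.

    This is an exact search that compares the query against every stored vector.
    """
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = v @ q                      # cosine similarity with every row
    order = np.argsort(-sims)[:k]     # indices of the k largest similarities
    return order, sims[order]

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 64))                # 1000 made-up embeddings
query = vectors[123] + 0.01 * rng.normal(size=64)    # a slightly perturbed copy of row 123
idx, sims = top_k_cosine(query, vectors, k=3)
print(idx, sims)
```

The cost of this scan grows linearly with the number of stored vectors, which is the scaling problem a dedicated index is meant to address.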

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 5 (Short Answer)

**Question 5 - Top-k Retrieval**

You retrieve the top-k most similar documents, where k is the number of documents passed to the LLM.

Explain:
1. What happens if k is too small (e.g., k=1)?
2. What happens if k is too large (e.g., k=100)?
3. How do you choose k?

**Hint**: Too small = miss relevant info. Too large = noise + context window limit.
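
The context window limit in the hint can be treated as a budget that caps k in practice. Below is a minimal sketch, assuming chunks arrive already ranked and that whitespace-separated words approximate tokens. `select_within_budget` and its parameters are hypothetical.

```python
def select_within_budget(ranked_chunks, max_k, token_budget):
    """Take chunks in rank order until max_k chunks or the token budget is reached.

    Token counts are approximated by word counts (an assumption; a real system
    would use the model's own tokenizer).
    """
    selected, used = [], 0
    for chunk in ranked_chunks[:max_k]:
        cost = len(chunk.split())
        if used + cost > token_budget:
            break  # the next chunk would overflow the context budget
        selected.append(chunk)
        used += cost
    return selected

ranked = ["a b c", "d e f g", "h i", "j k l m n"]  # best match first
within_budget = select_within_budget(ranked, max_k=3, token_budget=8)
single = select_within_budget(ranked, max_k=1, token_budget=8)
print(within_budget, single)
```

In this example `max_k=3` is requested but only two chunks fit, so the budget, not k, is the binding limit.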

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 6 (Multiple Choice)

**Question 6 - Hallucination Reduction**

RAG reduces hallucinations because:

- A) It makes the model larger
- B) It grounds responses in retrieved factual documents
- C) It uses higher temperature
- D) It eliminates all errors

**Hint**: RAG provides evidence. LLM is instructed to answer from evidence, not make things up.
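
The instruction mentioned in the hint usually lives in the prompt template. Here is one possible sketch. The wording of the template and the name `build_grounded_prompt` are assumptions, not a prescribed format.

```python
def build_grounded_prompt(question, passages):
    """Build a prompt that asks the model to answer only from numbered sources."""
    sources = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the sources below. "
        "If the sources do not contain the answer, say that you do not know.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\n"
        "Answer (cite source numbers):"
    )

prompt = build_grounded_prompt(
    "How many vacation days do new employees get?",
    ["Acme Corp vacation policy: 15 days per year for new employees."],
)
print(prompt)
```

Numbering the sources makes it possible to check each claim in the answer against the passage it cites. The explicit fallback instruction gives the model an alternative to guessing when retrieval returns nothing useful.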

**Your Answer**: [Write your answer here - e.g., 'B']

**Explanation**: [Explain why this is correct]

### Question 7 (Short Answer)

**Question 7 - Retrieval Metrics**

How do you measure retrieval quality?

- Precision@k: Fraction of the top-k retrieved docs that are relevant
- Recall@k: Fraction of all relevant docs that appear in the top-k
- MRR: Mean, over queries, of the reciprocal rank of the first relevant doc

Explain: Why might you want high recall even if precision is lower?

**Hint**: Better to retrieve extra documents than miss THE crucial one.
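
The three metrics can be computed in a few lines. This sketch assumes `retrieved` is a ranked list of document IDs and `relevant` is a set of IDs judged relevant. The function names and the example IDs are made up for illustration.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved docs that are relevant."""
    return sum(doc in relevant for doc in retrieved[:k]) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant docs that appear in the top-k."""
    if not relevant:
        return 0.0
    return sum(doc in relevant for doc in retrieved[:k]) / len(relevant)

def mean_reciprocal_rank(all_retrieved, all_relevant):
    """Average over queries of 1 / rank of the first relevant doc (0 if none is found)."""
    total = 0.0
    for retrieved, relevant in zip(all_retrieved, all_relevant):
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(all_retrieved)

retrieved = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2", "d9"}

p3 = precision_at_k(retrieved, relevant, k=3)
r3 = recall_at_k(retrieved, relevant, k=3)
r4 = recall_at_k(retrieved, relevant, k=4)
mrr = mean_reciprocal_rank([retrieved, ["d5", "d6"]], [relevant, {"d5"}])
print(p3, r3, r4, mrr)
```

Going from k=3 to k=4 raises recall here because one more relevant document enters the list, which is the effect the question asks you to reason about.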

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 8 (Short Answer)

**Question 8 - Hybrid Search**

Hybrid search = semantic search (embeddings) + keyword search (BM25)

Explain:
1. Why combine both?
2. When does keyword search outperform semantic search?
3. How do you merge the results?

**Hint**: Semantic = meaning. Keyword = exact terms. Combine for robustness. Merge with weighted scores.
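
The weighted merge from the hint can be sketched as follows. Cosine similarities and BM25 scores live on different scales, so this version rescales each to [0, 1] before mixing. The names `min_max`, `hybrid_merge`, the weight `alpha`, and all scores below are made-up assumptions.

```python
def min_max(scores):
    """Rescale a {doc_id: score} dict to [0, 1] so different scorers are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def hybrid_merge(semantic, keyword, alpha=0.5):
    """Rank docs by alpha * semantic + (1 - alpha) * keyword, after normalization.

    A doc missing from one result list contributes 0 for that scorer.
    """
    sem, kw = min_max(semantic), min_max(keyword)
    fused = {
        doc: alpha * sem.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in sem.keys() | kw.keys()
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# Illustrative scores only: cosine similarities and BM25-style keyword scores
semantic_scores = {"faq_12": 0.82, "faq_07": 0.79, "faq_03": 0.40}
keyword_scores = {"faq_07": 11.2, "faq_44": 9.5, "faq_03": 2.1}

ranked = hybrid_merge(semantic_scores, keyword_scores, alpha=0.5)
print([doc for doc, _ in ranked])
```

In this toy data `faq_07` ranks first because it scores well under both retrievers. Changing `alpha` shifts the balance toward one retriever or the other.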

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 9 (Short Answer)

**Question 9 - Metadata Filtering**

Before semantic search, filter by metadata:
- Date range
- Author
- Document type

Explain: Why filter first instead of just retrieving everything?

**Hint**: Filtering reduces search space, improves relevance, respects permissions.
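
A filter-then-search step might look like the sketch below. The record layout (`id`, `doc_type`, `date`, `vector`), the function `filtered_search`, and the two-dimensional vectors are all assumptions chosen to keep the example small.

```python
from datetime import date

import numpy as np

def filtered_search(query_vec, records, k, doc_type=None, since=None):
    """Apply metadata filters first, then rank the surviving records by cosine similarity."""
    candidates = [
        r for r in records
        if (doc_type is None or r["doc_type"] == doc_type)
        and (since is None or r["date"] >= since)
    ]
    if not candidates:
        return []
    matrix = np.array([r["vector"] for r in candidates], dtype=float)
    sims = matrix @ query_vec / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(query_vec))
    order = np.argsort(-sims)[:k]
    return [candidates[i]["id"] for i in order]

records = [
    {"id": "policy_2021", "doc_type": "policy", "date": date(2021, 1, 5), "vector": [1.0, 0.0]},
    {"id": "policy_2024", "doc_type": "policy", "date": date(2024, 3, 1), "vector": [0.9, 0.1]},
    {"id": "blog_2024", "doc_type": "blog", "date": date(2024, 6, 1), "vector": [1.0, 0.0]},
]
query_vec = np.array([1.0, 0.0])

filtered = filtered_search(query_vec, records, k=2, doc_type="policy", since=date(2023, 1, 1))
unfiltered = filtered_search(query_vec, records, k=2)
print(filtered, unfiltered)
```

Without the filter, the outdated policy and the blog post are the closest matches and crowd out the current policy. With the filter, only the current policy is eligible.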

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 10 (Short Answer)

**Question 10 - Real-World Application**

Customer support chatbot with RAG:
- Vector DB: Company knowledge base, FAQs, docs
- Query: Customer question
- Retrieve + Generate: Grounded answer

Explain:
1. What advantage does this have over traditional keyword search?
2. What happens when documents are updated?
3. How do you handle multi-step questions?

**Hint**: Semantic search understands intent. Update embeddings when docs change. Multi-step = multiple retrievals.
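
For the document-update part of the hint, one common pattern is to re-embed a document only when its content changes. This is a toy in-memory sketch: the class `TinyIndex`, its methods, and `fake_embed` are hypothetical and do not correspond to any vector database's API.

```python
import hashlib

class TinyIndex:
    """Toy index that stores one vector per doc and re-embeds only on content change."""

    def __init__(self, embed):
        self.embed = embed   # function: text -> vector
        self.entries = {}    # doc_id -> (content_hash, vector)

    def upsert(self, doc_id, text):
        """Insert or update a doc. Returns True if an embedding was (re)computed."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        cached = self.entries.get(doc_id)
        if cached is not None and cached[0] == digest:
            return False  # unchanged content: keep the existing vector
        self.entries[doc_id] = (digest, self.embed(text))
        return True

    def delete(self, doc_id):
        """Remove a doc so stale content can no longer be retrieved."""
        self.entries.pop(doc_id, None)

calls = []

def fake_embed(text):
    """Placeholder embedding that records each call (assumption for the demo)."""
    calls.append(text)
    return [float(len(text))]

index = TinyIndex(fake_embed)
first = index.upsert("returns_faq", "Returns accepted within 30 days.")
repeat = index.upsert("returns_faq", "Returns accepted within 30 days.")
changed = index.upsert("returns_faq", "Returns accepted within 60 days.")
print(first, repeat, changed, len(calls))
```

Hashing the content avoids paying for an embedding call when a sync job resubmits unchanged documents, while edits and deletions still propagate to the index.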

**Your Answer**:

[Write your answer here in 2-4 sentences]

---
## Submission

Before submitting:
1. Run all cells to ensure code executes without errors
2. Check that all questions are answered
3. Review your explanations for clarity

**To Submit**:
- File → Download → Download .ipynb
- Submit the notebook file to your course LMS

**Note**: Make sure your name is in the filename (e.g., homework_01_yourname.ipynb)