![Redis](https://redis.io/wp-content/uploads/2024/04/Logotype.svg?auto=webp&quality=85,75&width=120)

# RAG: Retrieved Context in Practice

## From Context Engineering to Retrieval-Augmented Generation

In Section 1, you learned about the four core context types:
1. **System Context** - The AI's role and domain knowledge
2. **User Context** - Personal profiles and preferences  
3. **Conversation Context** - Dialogue history and flow
4. **Retrieved Context** - Dynamic information from external sources

This notebook focuses on **Retrieved Context** - the most powerful and complex context type. You'll learn how to build a production-ready RAG (Retrieval-Augmented Generation) system that dynamically fetches relevant information to enhance AI responses.

## What You'll Learn

**RAG Fundamentals:**
- What RAG is and why it's essential for context engineering
- How vector embeddings enable semantic search
- Building a complete RAG pipeline with LangChain and Redis

**Practical Implementation:**
- Generate and ingest course data using existing utilities
- Set up Redis vector store for semantic search
- Implement retrieval and generation workflows
- Combine retrieved context with user and system context

**Foundation for Advanced Topics:**
- This RAG system becomes the base for Section 3 (Memory Architecture)
- You'll add LangGraph state management and tools in later sections
- Focus here is purely on retrieval → context assembly → generation

**Time to complete:** 30-35 minutes

---

## Why RAG Matters for Context Engineering

### The Challenge: Static vs. Dynamic Knowledge

In Section 1, we used **hardcoded** course information in the system context:

```python
system_context = """You are a Redis University course advisor.

Available Courses:
- RU101: Introduction to Redis (Beginner, 4-6 hours)
- RU201: Redis for Python (Intermediate, 6-8 hours)
...
"""
```

**Problems with this approach:**
- ❌ **Doesn't scale** - Can't hardcode thousands of courses
- ❌ **Wastes tokens** - Includes irrelevant courses in every request
- ❌ **Hard to update** - Requires code changes to add/modify courses
- ❌ **No personalization** - Same courses shown to everyone

### The Solution: Retrieval-Augmented Generation (RAG)

RAG solves these problems by **dynamically retrieving** only the most relevant information:

```
User Query: "I want to learn about vector search"
     ↓
Semantic Search: Find courses matching "vector search"
     ↓
Retrieved Context: RU301 - Vector Similarity Search with Redis
     ↓
LLM Generation: Personalized recommendation using retrieved context
```

**Benefits:**
- ✅ **Scales with your data** - Store millions of documents without growing the prompt
- ✅ **Token efficient** - Only retrieve what's relevant
- ✅ **Easy to update** - Add/modify data without code changes
- ✅ **Personalized** - Different results for different queries

### RAG as "Retrieved Context" from Section 1

Remember the four context types? RAG is how we implement **Retrieved Context** in production:

| Context Type | Storage | Retrieval Method | Example |
|--------------|---------|------------------|---------|
| System Context | Hardcoded | Always included | AI role, instructions |
| User Context | Database | User ID lookup | Student profile |
| Conversation Context | Session store | Session ID lookup | Chat history |
| **Retrieved Context** | **Vector DB** | **Semantic search** | **Relevant courses** |

---

## Setup and Environment

Let's prepare our environment with the necessary dependencies.

In [1]:
import os
import sys
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Verify required environment variables
required_vars = ["OPENAI_API_KEY"]
missing_vars = [var for var in required_vars if not os.getenv(var)]

if missing_vars:
    print(f"""
⚠️  Missing required environment variables: {', '.join(missing_vars)}

Please create a .env file with:
OPENAI_API_KEY=your_openai_api_key
REDIS_URL=redis://localhost:6379

For Redis setup:
- Local: docker run -d -p 6379:6379 redis/redis-stack-server:latest
- Cloud: https://redis.com/try-free/
""")
    sys.exit(1)
REDIS_URL = os.getenv("REDIS_URL", "redis://localhost:6379")  # Fall back to local Redis if unset
print("✅ Environment variables loaded")
print(f"   REDIS_URL: {REDIS_URL}")
print(f"   OPENAI_API_KEY: {'✓ Set' if os.getenv('OPENAI_API_KEY') else '✗ Not set'}")

✅ Environment variables loaded
   REDIS_URL: redis://localhost:6379
   OPENAI_API_KEY: ✓ Set


### Install Dependencies

We'll use LangChain for RAG orchestration and Redis for vector storage.

In [2]:
# Install required packages (uncomment if needed)
# %pip install -q langchain langchain-openai langchain-redis redisvl redis python-dotenv

print("✅ Dependencies ready")

✅ Dependencies ready


---

## 📊 Step 1: Understanding Vector Embeddings

Before building our RAG system, let's understand the core concept: **vector embeddings**.

### What Are Embeddings?

Embeddings convert text into numerical vectors that capture semantic meaning:

```
Text: "Introduction to Redis"
  ↓ (embedding model)
Vector: [0.23, -0.45, 0.67, ..., 0.12]  # 1536 dimensions for OpenAI's text-embedding-3-small
```

**Key insight:** Similar texts have similar vectors (measured by cosine similarity).

### Why Embeddings Enable Semantic Search

Traditional keyword search:
- Query: "machine learning courses" 
- Matches: Only documents containing exact words "machine learning"
- Misses: "AI courses", "neural network classes", "deep learning programs"

Semantic search with embeddings:
- Query: "machine learning courses"
- Matches: All semantically similar content (AI, neural networks, deep learning, etc.)
- Works across synonyms, related concepts, and different phrasings
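
To make the keyword limitation concrete, here is a minimal sketch of exact-word matching. The `keyword_search` helper and the three-item `catalog` are hypothetical and exist only for illustration; they are not part of the reference agent.

```python
def keyword_search(query: str, documents: list[str]) -> list[str]:
    """Return documents that contain every word of the query (case-insensitive)."""
    query_words = query.lower().split()
    return [
        doc for doc in documents
        if all(word in doc.lower().split() for word in query_words)
    ]

# Hypothetical mini catalog, for illustration only
catalog = [
    "Machine learning foundations",
    "AI and neural network classes",
    "Deep learning programs",
]

matches = keyword_search("machine learning", catalog)
print(matches)  # Only the first title contains both query words
```

The two AI-related titles are skipped because they never mention the literal words "machine" and "learning". Closing that gap is what embeddings are for.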

Let's see this in action:

In [3]:
from langchain_openai import OpenAIEmbeddings

# Initialize embedding model
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Generate embeddings for similar and different texts
texts = [
    "Introduction to machine learning and neural networks",
    "Learn about AI and deep learning fundamentals", 
    "Database administration and SQL queries",
]

# Get embeddings (this calls OpenAI API)
vectors = embeddings.embed_documents(texts)

print(f"✅ Generated embeddings for {len(texts)} texts")
print(f"   Vector dimensions: {len(vectors[0])}")
print(f"   First vector preview: [{vectors[0][0]:.3f}, {vectors[0][1]:.3f}, {vectors[0][2]:.3f}, ...]")

✅ Generated embeddings for 3 texts
   Vector dimensions: 1536
   First vector preview: [-0.030, -0.013, 0.001, ...]


### Measuring Semantic Similarity

Let's calculate cosine similarity to see which texts are semantically related:

In [4]:
import numpy as np

def cosine_similarity(vec1, vec2):
    """Calculate cosine similarity between two vectors."""
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

# Compare similarities
sim_1_2 = cosine_similarity(vectors[0], vectors[1])  # ML vs AI (related)
sim_1_3 = cosine_similarity(vectors[0], vectors[2])  # ML vs Database (unrelated)
sim_2_3 = cosine_similarity(vectors[1], vectors[2])  # AI vs Database (unrelated)

print("Semantic Similarity Scores (0=unrelated, 1=identical):")
print(f"   ML vs AI:       {sim_1_2:.3f} ← High similarity (related topics)")
print(f"   ML vs Database: {sim_1_3:.3f} ← Low similarity (different topics)")
print(f"   AI vs Database: {sim_2_3:.3f} ← Low similarity (different topics)")

Semantic Similarity Scores (0=unrelated, 1=identical):
   ML vs AI:       0.623 ← High similarity (related topics)
   ML vs Database: 0.171 ← Low similarity (different topics)
   AI vs Database: 0.177 ← Low similarity (different topics)


**💡 Key Takeaway:** Embeddings capture semantic meaning, allowing us to find relevant information even when exact keywords don't match.

---

## 📚 Step 2: Generate Course Data

Now let's create realistic course data for our RAG system. We'll use the existing utilities from the reference agent.

### Understanding the Course Generation Script

The `generate_courses.py` script creates realistic course data with:
- Multiple majors (CS, Data Science, Math, Business, Psychology)
- Course templates with descriptions, prerequisites, schedules
- Realistic metadata (instructors, enrollment, difficulty levels)

Let's generate our course catalog:

In [5]:
# Optional: only needed if the reference-agent package is NOT installed with pip.
# In that case, uncomment the next line to add it to the Python path:
# sys.path.insert(0, os.path.join(os.getcwd(), 'python-recipes/context-engineering/reference-agent'))

from redis_context_course.scripts.generate_courses import CourseGenerator

# Initialize generator with a seed for reproducibility
import random
random.seed(42)

# Create generator
generator = CourseGenerator()

print("📚 Generating course catalog...")
print()

# Generate majors
majors = generator.generate_majors()
print(f"✅ Generated {len(majors)} majors:")
for major in majors:
    print(f"   - {major.name} ({major.code})")

print()

# Generate courses (10 per major)
courses = generator.generate_courses(courses_per_major=10)
print(f"✅ Generated {len(courses)} courses")

# Show a sample course
sample_course = courses[0]
print(f"""
Sample Course:
   Code: {sample_course.course_code}
   Title: {sample_course.title}
   Department: {sample_course.department}
   Difficulty: {sample_course.difficulty_level.value}
   Credits: {sample_course.credits}
   Description: {sample_course.description[:100]}...
""")

📚 Generating course catalog...

✅ Generated 5 majors:
   - Computer Science (CS)
   - Data Science (DS)
   - Mathematics (MATH)
   - Business Administration (BUS)
   - Psychology (PSY)

✅ Generated 50 courses

Sample Course:
   Code: CS001
   Title: Introduction to Programming
   Department: Computer Science
   Difficulty: beginner
   Credits: 3
   Description: Fundamental programming concepts using Python. Variables, control structures, functions, and basic d...



### Save Course Catalog to JSON

Let's save this data so we can ingest it into Redis:

In [6]:
catalog_file = "course_catalog_section2.json"
generator.save_to_json(catalog_file)

print(f"✅ Course catalog saved to {catalog_file}")
print(f"   Ready for ingestion into Redis vector store")

Generated 5 majors and 50 courses
Data saved to course_catalog_section2.json
✅ Course catalog saved to course_catalog_section2.json
   Ready for ingestion into Redis vector store


---

## 🔧 Step 3: Set Up Redis Vector Store

Now we'll configure Redis to store our course embeddings and enable semantic search.

### Understanding Redis Vector Search

Redis Stack provides vector similarity search capabilities:
- **Storage:** Courses stored as Redis hashes with vector fields
- **Indexing:** Vector index for fast similarity search (HNSW algorithm)
- **Search:** Find top-k most similar courses to a query vector using cosine similarity
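
Conceptually, a top-k similarity search scores every stored vector against the query vector and keeps the best k. The sketch below does this by brute force in plain Python. The 3-dimensional vectors and the `top_k` helper are made up for illustration (real embeddings here have 1536 dimensions), and this is not how Redis executes the query: an HNSW index exists precisely to avoid scanning every vector, at the cost of returning approximate nearest neighbors.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, vectors, k=2):
    """Score every stored vector against the query and keep the k best."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Toy 3-dimensional "embeddings" (hypothetical values, for illustration only)
toy_vectors = {
    "CS007 Machine Learning": [0.9, 0.1, 0.0],
    "DS012 Statistics": [0.6, 0.4, 0.1],
    "PSY043 Psychology": [0.0, 0.2, 0.9],
}

query_vec = [1.0, 0.0, 0.0]
nearest = top_k(query_vec, toy_vectors, k=2)
for name, score in nearest:
    print(f"{name}: {score:.3f}")
```

With these toy values the machine learning course ranks first and the statistics course second, while the psychology course is dropped from the top 2.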

### Using the Reference Agent Utilities

Instead of configuring Redis from scratch, we'll use the **production-ready utilities** from the reference agent. These utilities are already configured and tested, allowing you to focus on context engineering concepts rather than Redis configuration details.

### Import Redis Configuration

Let's import the pre-configured Redis setup:

**What we're importing:**
- `redis_config`: A global singleton that manages all Redis connections

**What it provides (lazy-initialized properties):**
- `redis_config.redis_client`: Redis connection for data storage
- `redis_config.embeddings`: OpenAI embeddings (text-embedding-3-small)
- `redis_config.vector_index`: RedisVL SearchIndex with pre-configured schema
- `redis_config.checkpointer`: RedisSaver for LangGraph (used in Section 3)

**Why use this:**
- Production-ready configuration (same as reference agent)
- Proper schema with all course metadata fields
- Vector field: 1536 dims, cosine distance, HNSW algorithm
- No boilerplate - just import and use

In [7]:
from redis_context_course.redis_config import redis_config

print("✅ Redis configuration imported")
print(f"   Redis URL: {redis_config.redis_url}")
print(f"   Vector index name: {redis_config.vector_index_name}")

✅ Redis configuration imported
   Redis URL: redis://localhost:6379
   Vector index name: course_catalog


### Test Redis Connection

Let's verify Redis is running and accessible:

In [8]:
# Test connection using built-in health check
if redis_config.health_check():
    print("✅ Connected to Redis")
    print(f"   Redis is healthy and ready")
else:
    print("❌ Redis connection failed")
    print("   Make sure Redis is running:")
    print("   - Local: docker run -d -p 6379:6379 redis/redis-stack-server:latest")
    print("   - Cloud: https://redis.com/try-free/")
    sys.exit(1)

✅ Connected to Redis
   Redis is healthy and ready


### Initialize Course Manager

Now let's import the `CourseManager` - this handles all course operations, such as storage, retrieval, and search:

**What it provides:**
- `store_course()`: Store a course with vector embedding
- `search_courses()`: Semantic search with filters
- `get_course()`: Retrieve course by ID
- `get_course_by_code()`: Retrieve course by course code
- `recommend_courses()`: Generate personalized recommendations

**How it works:**
- Uses `redis_config` for connections (`redis_client`, `vector_index`, `embeddings`)
- Automatically generates embeddings from course content
- Uses RedisVL's `VectorQuery` for semantic search
- Supports metadata filters (department, difficulty, format, etc.)

**Why use this:**
- Encapsulates all Redis/RedisVL complexity
- Same code used in reference agent (Sections 3 & 4)
- Focus on RAG concepts, not Redis implementation details

In [9]:
from redis_context_course.course_manager import CourseManager

# Initialize course manager
course_manager = CourseManager()

print("✅ Course manager initialized")
print(f"   Ready for course storage and search")
print(f"   Using RedisVL for vector operations")

✅ Course manager initialized
   Ready for course storage and search
   Using RedisVL for vector operations


---

## 📥 Step 4: Ingest Courses into Redis

Now we'll load our course catalog into Redis with vector embeddings for semantic search.

### Understanding the Ingestion Process

The ingestion pipeline:
1. **Load** course data from JSON
2. **Generate embeddings** for each course (title + description + tags)
3. **Store** in Redis with metadata for filtering
4. **Index** vectors for fast similarity search

Let's use the existing ingestion utilities:

In [10]:
from redis_context_course.scripts.ingest_courses import CourseIngestionPipeline
import asyncio

# What we're importing:
# - CourseIngestionPipeline: Handles bulk ingestion of course data
#
# What it does:
# - Loads course catalog from JSON file
# - For each course: generates embedding + stores in Redis
# - Uses CourseManager internally for storage
# - Provides progress tracking and verification
#
# Why use this:
# - Handles batch ingestion efficiently
# - Same utility used to populate reference agent
# - Includes error handling and progress reporting

# Initialize ingestion pipeline
pipeline = CourseIngestionPipeline()

print("🚀 Starting course ingestion...")
print()

# Run ingestion (clear existing data first)
success = await pipeline.run_ingestion(
    catalog_file=catalog_file,
    clear_existing=True
)

if success:
    print()
    print("✅ Course ingestion complete!")

    # Verify what was ingested
    verification = pipeline.verify_ingestion()
    print(f"   Courses in Redis: {verification['courses']}")
    print(f"   Majors in Redis: {verification['majors']}")
else:
    print("❌ Ingestion failed")

🚀 Starting course ingestion...




00:33:51 httpx INFO   HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
00:33:52 httpx INFO   HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
00:33:52 httpx INFO   HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
00:33:53 httpx INFO   HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
00:33:54 httpx INFO   HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
00:33:54 httpx INFO   HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
00:33:54 httpx INFO   HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
00:33:55 httpx INFO   HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
00:33:55 httpx INFO   HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
00:33:55 httpx INFO   HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


✅ Course ingestion complete!
   Courses in Redis: 50
   Majors in Redis: 5


### What Just Happened?

For each course, the ingestion pipeline:

1. **Created searchable content:**
   ```python
   content = f"{course.title} {course.description} {course.department} {' '.join(course.tags)}"
   ```

2. **Generated embedding vector:**
   ```python
   embedding = await embeddings.aembed_query(content)  # 1536-dim vector
   ```

3. **Stored in Redis:**
   ```python
   redis_client.hset(f"course_idx:{course.id}", mapping={
       "course_code": "CS001",
       "title": "Introduction to Programming",
       "description": "...",
       "content_vector": np.array(embedding, dtype=np.float32).tobytes()  # Binary vector (float32 assumed; must match the index schema)
   })
   ```

4. **Indexed for search:**
   - Redis automatically indexes the vector field
   - Enables fast k-NN (k-nearest neighbors) search
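
Step 3 stores the vector as raw bytes rather than as text. The sketch below shows that round trip using only the standard library. It assumes the vector field holds 32-bit floats, which the schema summary above does not state, and it uses a toy 3-dimensional vector with values that 32-bit floats represent exactly.

```python
from array import array

# Toy 3-dim vector (hypothetical); the real embeddings here have 1536 dimensions
embedding = [0.5, -0.25, 0.125]

# Pack the floats into a compact binary blob ("f" = 32-bit float)
packed = array("f", embedding).tobytes()

# Unpack the blob back into floats
restored = array("f")
restored.frombytes(packed)

print(f"{len(packed)} bytes for {len(embedding)} values")
print(list(restored))
```

The byte length is the number of dimensions times the size of one float, so a vector's storage cost is fixed by the embedding model, not by the length of the course text. Arbitrary values may differ slightly after the round trip because 32-bit floats carry less precision than Python floats.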

---

## 🔍 Step 5: Semantic Search - Finding Relevant Courses

Now comes the magic: semantic search. Let's query our vector store to find relevant courses.

### Basic Semantic Search

Let's search for courses related to "machine learning".

When this is called:
```python
await course_manager.search_courses(
    query=query,
    limit=3  # top_k parameter
)
```
This call performs a semantic search in three steps:
1. Generates an embedding for the query using OpenAI
2. Performs a vector similarity search in Redis (cosine distance)
3. Returns the top-k most similar courses

The search itself is issued through RedisVL's `VectorQuery`.

In [14]:
# We already initialized course_manager in Step 3
# It's ready to use for semantic search

# Search for machine learning courses
query = "machine learning and artificial intelligence"
print(f"🔍 Searching for: '{query}'\n")

# Perform semantic search (returns top 3 most similar courses)
results = await course_manager.search_courses(
    query=query,
    limit=3  # top_k parameter
)

print(f"✅ Found {len(results)} relevant courses:\n")

for i, course in enumerate(results, 1):
    print(f"{i}. {course.course_code}: {course.title}")
    print(f"   Department: {course.department}")
    print(f"   Difficulty: {course.difficulty_level.value}")
    print(f"   Description: {course.description[:100]}...")
    print()

🔍 Searching for: 'machine learning and artificial intelligence'

00:35:39 httpx INFO   HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
✅ Found 3 relevant courses:

1. CS007: Machine Learning
   Department: Computer Science
   Difficulty: advanced
   Description: Introduction to machine learning algorithms and applications. Supervised and unsupervised learning, ...

2. DS012: Statistics for Data Science
   Department: Data Science
   Difficulty: intermediate
   Description: Statistical methods and probability theory for data analysis. Hypothesis testing, regression, and st...

3. DS015: Statistics for Data Science
   Department: Data Science
   Difficulty: intermediate
   Description: Statistical methods and probability theory for data analysis. Hypothesis testing, regression, and st...



### Search with Filters

We can combine semantic search with metadata filters for more precise results:

How filters work:

```python
results = await course_manager.search_courses(
    query=query,
    limit=3,
    filters=filters
)
```
- `CourseManager._build_filters()` converts the dict to RedisVL filter expressions
- Uses Tag filters for categorical fields (`difficulty_level`, `format`, `department`)
- Uses Num filters for numeric fields (`credits`, `year`)
- Combines filters with AND logic
- Applied to vector search results
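
The AND logic can be sketched in plain Python. Everything below is hypothetical: the course dicts, the precomputed `score` values standing in for vector similarity, and both helper functions. It does not use RedisVL, and it does not model whether Redis applies the filter before or during the vector search. It only shows the combined effect on the final result set.

```python
def matches_filters(course: dict, filters: dict) -> bool:
    """True when the course satisfies every filter (AND logic)."""
    return all(course.get(field) == value for field, value in filters.items())

def filtered_search(courses, filters, limit=3):
    """Keep courses that pass all filters, then return the highest-scoring ones."""
    candidates = [c for c in courses if matches_filters(c, filters)]
    candidates.sort(key=lambda c: c["score"], reverse=True)
    return candidates[:limit]

# Hypothetical courses; "score" stands in for similarity to the query
toy_courses = [
    {"code": "A", "difficulty_level": "beginner", "format": "online", "score": 0.31},
    {"code": "B", "difficulty_level": "advanced", "format": "online", "score": 0.92},
    {"code": "C", "difficulty_level": "beginner", "format": "in_person", "score": 0.55},
    {"code": "D", "difficulty_level": "beginner", "format": "online", "score": 0.40},
]

hits = filtered_search(toy_courses, {"difficulty_level": "beginner", "format": "online"})
print([c["code"] for c in hits])
```

Course B has the highest similarity score but is excluded because it fails the difficulty filter. The results are the best matches among the courses that pass every filter, which may not be strong matches overall.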


In [15]:
# Search for beginner-level machine learning courses
query = "machine learning"
filters = {
    "difficulty_level": "beginner",
    "format": "online"
}

print(f"🔍 Searching for: '{query}'\n   Filters: {filters}\n")
results = await course_manager.search_courses(
    query=query,
    limit=3,
    filters=filters
)

print(f"✅ Found {len(results)} matching courses:")
for i, course in enumerate(results, 1):
    print(f"{i}. {course.course_code}: {course.title}")
    print(f"   Format: {course.format.value}, Difficulty: {course.difficulty_level.value}")
    print()

🔍 Searching for: 'machine learning'
   Filters: {'difficulty_level': 'beginner', 'format': 'online'}

00:39:02 httpx INFO   HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
✅ Found 3 matching courses:
1. DS020: Data Visualization
   Format: online, Difficulty: beginner

2. PSY043: Introduction to Psychology
   Format: online, Difficulty: beginner

3. PSY049: Introduction to Psychology
   Format: online, Difficulty: beginner



**💡 Key Insight:** We can combine:
- **Semantic search** (find courses about "machine learning")
- **Metadata filters** (only beginner, online courses)

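Filters narrow the candidate set, and semantic search ranks whatever remains. Note that the top-k results are only the closest matches *among the filtered courses*: in the output above, the catalog apparently has no beginner-level online course on machine learning, so the returned courses are only loosely related to the query. Used with that caveat in mind, this combination is a useful tool for building context in our RAG pipeline.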

---

## 🔗 Step 6: Building the RAG Pipeline

Now let's combine everything into a complete RAG pipeline: Retrieval → Context Assembly → Generation.

### The RAG Flow

```
User Query
    ↓
1. Semantic Search (retrieve relevant courses)
    ↓
2. Context Assembly (combine system + user + retrieved context)
    ↓
3. LLM Generation (create personalized response)
```

Let's implement each step:

In [16]:
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage

# Initialize LLM
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)

print("✅ LLM initialized (gpt-4o-mini)")

✅ LLM initialized (gpt-4o-mini)


### Step 6.1: Retrieval Function

First, let's create a function to retrieve relevant courses:

In [17]:
async def retrieve_courses(query: str, top_k: int = 3, filters: dict = None):
    """
    Retrieve relevant courses using semantic search.

    Args:
        query: User's search query
        top_k: Number of courses to retrieve
        filters: Optional metadata filters

    Returns:
        List of relevant courses
    """
    # Note: CourseManager.search_courses() uses 'limit' parameter, not 'top_k'
    results = await course_manager.search_courses(
        query=query,
        limit=top_k,
        filters=filters
    )
    return results

# Test retrieval
test_query = "I want to learn about data structures"
retrieved_courses = await retrieve_courses(test_query, top_k=3)

print(f"🔍 Retrieved {len(retrieved_courses)} courses for: '{test_query}'")
for course in retrieved_courses:
    print(f"   - {course.course_code}: {course.title}")

00:40:03 httpx INFO   HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
🔍 Retrieved 3 courses for: 'I want to learn about data structures'
   - CS009: Data Structures and Algorithms
   - CS001: Introduction to Programming
   - CS005: Introduction to Programming


### Step 6.2: Context Assembly Function

Now let's assemble context from multiple sources (system + user + retrieved):

In [18]:
def assemble_context(
    user_query: str,
    retrieved_courses: list,
    user_profile: dict = None
):
    """
    Assemble context from multiple sources for the LLM.

    This implements the context engineering principles from Section 1:
    - System Context: AI role and instructions
    - User Context: Student profile and preferences
    - Retrieved Context: Relevant courses from vector search
    """

    # System Context: Define the AI's role
    system_context = """You are a Redis University course advisor.

Your role:
- Help students find courses that match their interests and goals
- Provide personalized recommendations based on student profiles
- Explain course prerequisites and learning paths
- Be encouraging and supportive

Guidelines:
- Only recommend courses from the provided course list
- Consider student's difficulty level preferences
- Explain your reasoning for recommendations
- Be concise but informative
"""

    # User Context: Student profile (if provided)
    user_context = ""
    if user_profile:
        user_context = f"""
Student Profile:
- Name: {user_profile.get('name', 'Student')}
- Major: {user_profile.get('major', 'Undeclared')}
- Year: {user_profile.get('year', 'N/A')}
- Interests: {', '.join(user_profile.get('interests', []))}
- Preferred Difficulty: {user_profile.get('preferred_difficulty', 'any')}
- Preferred Format: {user_profile.get('preferred_format', 'any')}
"""

    # Retrieved Context: Relevant courses from semantic search
    retrieved_context = "\nRelevant Courses:\n"
    for i, course in enumerate(retrieved_courses, 1):
        retrieved_context += f"""
{i}. {course.course_code}: {course.title}
   Department: {course.department}
   Difficulty: {course.difficulty_level.value}
   Format: {course.format.value}
   Credits: {course.credits}
   Description: {course.description}
   Prerequisites: {len(course.prerequisites)} required
"""

    # Combine all context
    full_context = system_context
    if user_context:
        full_context += user_context
    full_context += retrieved_context

    return full_context

# Test context assembly
test_profile = {
    "name": "Sarah Chen",
    "major": "Computer Science",
    "year": "Junior",
    "interests": ["machine learning", "data science"],
    "preferred_difficulty": "intermediate",
    "preferred_format": "online"
}

assembled_context = assemble_context(
    user_query=test_query,
    retrieved_courses=retrieved_courses,
    user_profile=test_profile
)

print("✅ Context assembled")
print(f"   Total length: {len(assembled_context)} characters")
print(f"   Includes: System + User + Retrieved context")

✅ Context assembled
   Total length: 1537 characters
   Includes: System + User + Retrieved context


In [20]:
print(f"Observe the assembled context: \n\n{assembled_context}")

Observe the assembled context: 

You are a Redis University course advisor.

Your role:
- Help students find courses that match their interests and goals
- Provide personalized recommendations based on student profiles
- Explain course prerequisites and learning paths
- Be encouraging and supportive

Guidelines:
- Only recommend courses from the provided course list
- Consider student's difficulty level preferences
- Explain your reasoning for recommendations
- Be concise but informative

Student Profile:
- Name: Sarah Chen
- Major: Computer Science
- Year: Junior
- Interests: machine learning, data science
- Preferred Difficulty: intermediate
- Preferred Format: online

Relevant Courses:

1. CS009: Data Structures and Algorithms
   Department: Computer Science
   Difficulty: intermediate
   Format: in_person
   Credits: 4
   Description: Study of fundamental data structures and algorithms. Arrays, linked lists, trees, graphs, sorting, and searching.
   Prerequisites: 2 required

2. CS

**🎁 Bonus:** Looking at the assembled context above, can you identify the context types we learned about in Section 1?

**✅ Answer:** Yes! We can identify three of the four context types from Section 1. Conversation Context is not used yet; it arrives with memory in Section 3:

1. **System Context** (Static)
   - The first section: "You are a Redis University course advisor..."
   - Defines the AI's role, responsibilities, and guidelines
   - Remains the same for all queries
   - Sets behavioral instructions and constraints

2. **User Context** (Dynamic, User-Specific)
   - The "Student Profile" section
   - Contains Sarah Chen's personal information: major, year, interests, preferences
   - Changes based on who is asking the question
   - Enables personalized recommendations

3. **Retrieved Context** (Dynamic, Query-Specific)
   - The "Relevant Courses" section
   - Lists the 3 courses found via semantic search for "data structures"
   - Changes based on the specific query
   - Provides the factual information the LLM needs to answer

Notice how all three work together: System Context tells the AI **how to behave**, User Context tells it **who it's helping**, and Retrieved Context provides **what information is relevant**. This is RAG in action!

### Step 6.3: Generation Function

Finally, let's generate a response using the assembled context:

In [None]:
async def generate_response(user_query: str, context: str):
    """
    Generate LLM response using assembled context.

    Args:
        user_query: User's question
        context: Assembled context (system + user + retrieved)

    Returns:
        LLM response string
    """
    messages = [
        SystemMessage(content=context),
        HumanMessage(content=user_query)
    ]

    response = await llm.ainvoke(messages)
    return response.content

# Test generation
response = await generate_response(test_query, assembled_context)

print("\n🤖 Generated Response:\n")
print(response)

### 🎯 Understanding the Generated Response

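Run the cell above to see the response. Because the LLM uses `temperature=0.7`, the exact wording varies between runs, but a typical response demonstrates effective context engineering: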

**👤 Personalization from User Context:**
- Addresses Sarah by name
- References her intermediate difficulty preference
- Acknowledges her online format preference (even though the course is in-person)
- Connects to her interests (machine learning and data science)

**📚 Accuracy from Retrieved Context:**
- Recommends CS009 (which was in the retrieved courses)
- Provides correct course details (difficulty, format, credits, description)
- Mentions prerequisites accurately (2 required)

**🤖 Guidance from System Context:**
- Acts as a supportive advisor ("I'm here to help you succeed!")
- Explains reasoning for the recommendation
- Acknowledges the format mismatch honestly
- Stays within the provided course list

This is the power of RAG: the LLM generates a response that is **personalized** (User Context), **accurate** (Retrieved Context), and **helpful** (System Context). Without RAG, the LLM would either hallucinate course details or provide generic advice.

---

## ✨ Step 7: Complete RAG Function

Let's combine all three steps into a single, reusable RAG function:

In [None]:
async def rag_query(
    user_query: str,
    user_profile: dict = None,
    top_k: int = 3,
    filters: dict = None
):
    """
    Complete RAG pipeline: Retrieve → Assemble → Generate

    Args:
        user_query: User's question
        user_profile: Optional student profile
        top_k: Number of courses to retrieve
        filters: Optional metadata filters

    Returns:
        Tuple of (LLM response string, list of retrieved courses)
    """
    # Step 1: Retrieve relevant courses
    retrieved_courses = await retrieve_courses(user_query, top_k, filters)

    # Step 2: Assemble context
    context = assemble_context(user_query, retrieved_courses, user_profile)

    # Step 3: Generate response
    response = await generate_response(user_query, context)

    return response, retrieved_courses

# Test the complete RAG pipeline
print("=" * 60)
print("COMPLETE RAG PIPELINE TEST")
print("=" * 60)
print()

query = "I'm interested in learning about databases and data management"
profile = {
    "name": "Alex Johnson",
    "major": "Data Science",
    "year": "Sophomore",
    "interests": ["databases", "data analysis", "SQL"],
    "preferred_difficulty": "intermediate",
    "preferred_format": "hybrid"
}

print(f"Query: {query}")
print()
print(f"Student: {profile['name']} ({profile['major']}, {profile['year']})")
print()

response, courses = await rag_query(query, profile, top_k=3)

print("Retrieved Courses:")
for i, course in enumerate(courses, 1):
    print(f"   {i}. {course.course_code}: {course.title}")
print()

print("AI Response:")
print(response)

### 🎯 Why This Complete RAG Function Matters

The `rag_query()` function encapsulates the entire RAG pipeline in a single, reusable interface. This is important because:

**1. Simplicity:** One function call handles retrieval → assembly → generation
- No need to manually orchestrate the three steps
- Clean API for building applications

**2. Consistency:** Every query follows the same pattern
- Ensures system and retrieved context are always included, plus user context whenever a profile is passed
- Reduces errors from missing context

**3. Flexibility:** Easy to customize behavior
- Adjust `top_k` for more/fewer retrieved courses
- Add/remove user profile information
- Modify filters for specific use cases

**4. Production-Ready:** This pattern scales to real applications
- In Section 3, we'll add memory (conversation history)
- In Section 4, we'll add tools (course enrollment, prerequisite checking)
- The core RAG pattern remains the same

This is the foundation you'll build on throughout the rest of the course.

---

## 🧪 Step 8: Try Different Queries

Let's test our RAG system with various queries to see how it handles different scenarios:

In [25]:
# Test 1: Beginner looking for programming courses
print("=" * 60)
print("TEST 1: Beginner Programming")
print("=" * 60)
print()

query1 = "I'm new to programming and want to start learning"
profile1 = {
    "name": "Maria Garcia",
    "major": "Undeclared",
    "year": "Freshman",
    "interests": ["programming", "technology"],
    "preferred_difficulty": "beginner",
    "preferred_format": "online"
}

response1, courses1 = await rag_query(query1, profile1, top_k=2)
print(f"\nQuery: {query1}\n")
print("\nAI Response:\n")
print(response1)

In [26]:
# Test 2: Advanced student looking for specialized courses
print("=" * 60)
print("TEST 2: Advanced Machine Learning")
print("=" * 60)
print()

query2 = "I want advanced courses in machine learning and AI"
profile2 = {
    "name": "David Kim",
    "major": "Computer Science",
    "year": "Senior",
    "interests": ["machine learning", "AI", "research"],
    "preferred_difficulty": "advanced",
    "preferred_format": "in-person"
}

response2, courses2 = await rag_query(query2, profile2, top_k=2)
print(f"\nQuery: {query2}\n")
print("\nAI Response:\n")
print(response2)

TEST 2: Advanced Machine Learning

00:46:06 httpx INFO   HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
00:46:13 httpx INFO   HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

Query: I want advanced courses in machine learning and AI


AI Response:

Hi David! Based on your major in Computer Science and your interests in machine learning and AI, I recommend the following course:

**CS007: Machine Learning**
- **Difficulty:** Advanced
- **Format:** Hybrid (though not in-person, it involves some in-person elements)
- **Credits:** 4
- **Description:** This course covers machine learning algorithms and applications, including supervised and unsupervised learning as well as neural networks.

While it would be ideal to have an exclusively in-person format, CS007 is the only advanced course listed that aligns with your interests and goals in machine learning. The hybrid format may still offer valuable in-person interaction.

Unfortunat

In [None]:
# Test 3: Business student looking for relevant courses
print("=" * 60)
print("TEST 3: Business Analytics")
print("=" * 60)
print()

query3 = "What courses can help me with business analytics and decision making?"
profile3 = {
    "name": "Jennifer Lee",
    "major": "Business Administration",
    "year": "Junior",
    "interests": ["analytics", "management", "strategy"],
    "preferred_difficulty": "intermediate",
    "preferred_format": "hybrid"
}

response3, courses3 = await rag_query(query3, profile3, top_k=2)
print(f"\nQuery: {query3}\n")
print()
print("\nAI Response:\n")
print(response3)

TEST 3: Business Analytics

00:46:14 httpx INFO   HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
00:46:17 httpx INFO   HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

Query: What courses can help me with business analytics and decision making?



AI Response:

Hi Jennifer! Given your interests in analytics and strategy, I recommend looking into the following course:

**BUS033: Marketing Strategy**
- **Department:** Business
- **Difficulty:** Intermediate
- **Format:** Hybrid
- **Credits:** 3
- **Description:** This course covers strategic marketing planning, market analysis, consumer behavior, and digital marketing techniques.

This course aligns well with your major in Business Administration and your interest in analytics and strategy. It will provide you with valuable insights into decision-making processes in marketing, which is crucial for any business professional.

Since you prefer a hybrid format, BUS033 is a great fi

---

## 🎓 Key Takeaways

### What You've Learned

**1. RAG Fundamentals**
- RAG dynamically retrieves relevant information instead of hardcoding knowledge
- Vector embeddings enable semantic search (meaning-based, not keyword-based)
- RAG solves the scalability and token efficiency problems of static context

**2. The RAG Pipeline**
```
User Query → Semantic Search → Context Assembly → LLM Generation
```
- **Retrieval:** Find relevant documents using vector similarity
- **Assembly:** Combine system + user + retrieved context
- **Generation:** LLM creates personalized response with full context

**3. Context Engineering in Practice**
- **System Context:** AI role and instructions (static)
- **User Context:** Student profile and preferences (dynamic, user-specific)
- **Retrieved Context:** Relevant courses from vector search (dynamic, query-specific)
- **Integration:** All three context types work together

**4. Technical Implementation with Reference Agent Utilities**
- **redis_config**: Production-ready Redis configuration (RedisVL + LangChain)
  - Manages connections, embeddings, vector index, checkpointer
  - Same configuration used in reference agent
- **CourseManager**: Handles all course operations
  - Uses RedisVL's VectorQuery for semantic search
  - Supports metadata filters with Tag and Num classes
  - Automatically generates embeddings and stores courses
- **CourseIngestionPipeline**: Bulk data ingestion
  - Loads JSON, generates embeddings, stores in Redis
  - Progress tracking and verification
- **Benefits**: Focus on RAG concepts, not Redis implementation details

### Best Practices

**Retrieval:**
- Retrieve only what's needed (top-k results)
- Use metadata filters to narrow results
- Balance too few results (missing information) against too many (wasted tokens)
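
To make the retrieval practices above concrete, here is a minimal in-memory sketch of "filter by metadata, then keep the top-k most similar results". In this notebook Redis performs that work; the sketch only illustrates the idea. The names `cosine_similarity`, `filtered_top_k`, and `toy_docs`, the document shape, and the 2-dimensional vectors are all assumptions made for illustration, not part of the reference agent.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def filtered_top_k(query_vec, docs, top_k=3, filters=None):
    """Apply exact-match metadata filters, then rank survivors by similarity."""
    filters = filters or {}
    candidates = [
        d for d in docs
        if all(d["metadata"].get(key) == value for key, value in filters.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda d: cosine_similarity(query_vec, d["vector"]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy 2-dimensional "embeddings" (hypothetical data, for illustration only)
toy_docs = [
    {"code": "A", "vector": [1.0, 0.0], "metadata": {"difficulty": "beginner"}},
    {"code": "B", "vector": [0.9, 0.1], "metadata": {"difficulty": "advanced"}},
    {"code": "C", "vector": [0.0, 1.0], "metadata": {"difficulty": "advanced"}},
]

top = filtered_top_k([1.0, 0.0], toy_docs, top_k=1, filters={"difficulty": "advanced"})
print([d["code"] for d in top])  # ['B']
```

Filtering first shrinks the candidate set, so `top_k` is spent only on documents that already satisfy the hard constraints.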

**Context Assembly:**
- Structure context clearly (system → user → retrieved)
- Include only relevant metadata
- Keep descriptions concise but informative
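
The assembly guidelines above can be sketched as a small function that always emits sections in the same order. This is not the notebook's `assemble_context`; `build_context` is a hypothetical stand-in, and it assumes courses are plain dicts with `course_code`, `title`, `difficulty`, and `format` keys so the example runs on its own.

```python
def build_context(system_prompt, retrieved_courses, user_profile=None):
    """Join context sections in a fixed order: system, user, retrieved."""
    sections = [system_prompt.strip()]

    # User context is optional; skip the section entirely when no profile is given
    if user_profile:
        profile_lines = [f"- {key}: {value}" for key, value in user_profile.items()]
        sections.append("Student Profile:\n" + "\n".join(profile_lines))

    # Keep only the metadata the model needs to make a recommendation
    course_lines = [
        f"- {c['course_code']}: {c['title']} ({c['difficulty']}, {c['format']})"
        for c in retrieved_courses
    ]
    courses_text = "\n".join(course_lines) or "- (none found)"
    sections.append("Relevant Courses:\n" + courses_text)

    return "\n\n".join(sections)


example_context = build_context(
    "You are a course advisor.",
    [{"course_code": "CS007", "title": "Machine Learning",
      "difficulty": "Advanced", "format": "Hybrid"}],
    {"name": "David Kim", "preferred_format": "in-person"},
)
print(example_context)
```

A fixed section order makes prompts easier to compare across runs, and the explicit "(none found)" line gives the model something to acknowledge when retrieval comes back empty.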

**Generation:**
- Match temperature to the task (for example, around 0.7 for more varied, conversational output; 0.0 for the most consistent, fact-focused output)
- Provide clear instructions in system context
- Let the LLM explain its reasoning

---

## 🚀 What's Next?

### 🧠 Section 3: Memory Architecture

In this section, you built a RAG system that retrieves relevant information for each query. But there's a problem: **it doesn't remember previous conversations**.

In Section 3, you'll add memory to your RAG system:
- **Working Memory:** Track conversation history within a session
- **Long-term Memory:** Remember user preferences across sessions
- **LangGraph Integration:** Manage stateful workflows with checkpointing
- **Redis Agent Memory Server:** Automatic memory extraction and retrieval

### Section 4: Tool Use and Agents

After adding memory, you'll transform your RAG system into a full agent:
- **Tool Calling:** Let the AI use functions (search, enroll, check prerequisites)
- **LangGraph State Management:** Orchestrate complex multi-step workflows
- **Agent Reasoning:** Plan and execute multi-step tasks
- **Production Patterns:** Error handling, retries, and monitoring

### The Journey

```
Section 1: Context Engineering Fundamentals
    ↓
Section 2: RAG (Retrieved Context) ← You are here
    ↓
Section 3: Memory Architecture (Conversation Context)
    ↓
Section 4: Tool Use and Agents (Complete System)
```

---

## 💪 Practice Exercises

Try these exercises to deepen your understanding:

**Exercise 1: Custom Filters**
- Modify the RAG query to filter by specific departments
- Try combining multiple filters (difficulty + format + department)

**Exercise 2: Adjust Retrieval**
- Experiment with different `top_k` values (1, 3, 5, 10)
- Observe how response quality changes with more/fewer retrieved courses
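
One possible starting point for this exercise is a small helper that runs the same query at several `top_k` values and records what comes back. It is a sketch that assumes the RAG function returns a `(response, courses)` pair, as `rag_query` does in Step 7. `compare_top_k` and `fake_rag` are hypothetical names; `fake_rag` is a stand-in so the helper can be tried without Redis or an LLM.

```python
async def compare_top_k(rag_fn, query, profile, k_values=(1, 3, 5, 10)):
    """Run the same query at several top_k settings and collect the results."""
    results = {}
    for k in k_values:
        response, courses = await rag_fn(query, profile, top_k=k)
        results[k] = {
            "num_courses": len(courses),
            "response_chars": len(response),
            "response": response,
        }
    return results


async def fake_rag(user_query, user_profile=None, top_k=3, filters=None):
    """Offline stand-in with the same call shape as rag_query (hypothetical)."""
    courses = [f"COURSE{i}" for i in range(top_k)]
    return f"Stub answer citing {len(courses)} courses", courses


# In a notebook cell (top-level await is supported):
#     results = await compare_top_k(rag_query, query, profile)
# In a plain script, with the stand-in above:
#     import asyncio
#     results = asyncio.run(compare_top_k(fake_rag, "databases", None))
```

Reading the stored responses side by side usually says more about quality than the character counts alone.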

**Exercise 3: Context Optimization**
- Modify the `assemble_context` function to include more/less detail
- Measure token usage and response quality trade-offs
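
For a first pass at the token measurement, a character-based estimate is often enough to rank context variants by size. The sketch below assumes the common rule of thumb of roughly four characters per token for English text. That is only an approximation, so use a real tokenizer (for example `tiktoken`) when you need exact counts. `estimate_tokens` and `compare_context_variants` are hypothetical helper names.

```python
def estimate_tokens(text, chars_per_token=4.0):
    """Very rough token estimate; not a substitute for a real tokenizer."""
    if not text:
        return 0
    return max(1, round(len(text) / chars_per_token))


def compare_context_variants(variants):
    """Return (name, estimated_tokens) pairs sorted from smallest to largest."""
    sizes = [(name, estimate_tokens(text)) for name, text in variants.items()]
    return sorted(sizes, key=lambda pair: pair[1])


# Hypothetical context variants of different verbosity
variants = {
    "detailed": "CS007: Machine Learning. Advanced. Hybrid. 4 credits. " * 4,
    "brief": "CS007: Machine Learning (Advanced)",
}
for name, tokens in compare_context_variants(variants):
    print(f"{name}: ~{tokens} tokens")
```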

**Exercise 4: Different Domains**
- Generate courses for a different domain (e.g., healthcare, finance)
- Ingest and test RAG with your custom data

**Exercise 5: Evaluation**
- Create test queries with expected results
- Measure retrieval accuracy (are the right courses retrieved?)
- Measure generation quality (are responses helpful and accurate?)
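
For the retrieval half of this exercise, one simple metric is recall@k: the fraction of expected courses that appear among the first k retrieved results. The sketch below works on plain lists of course codes. `recall_at_k`, `evaluate_retrieval`, and the test-case shape are assumptions for illustration, and `CS001` is a made-up code alongside the `CS007` and `BUS033` codes seen in the test outputs.

```python
def recall_at_k(retrieved_codes, expected_codes, k):
    """Fraction of expected course codes found in the first k retrieved results."""
    expected = set(expected_codes)
    if not expected:
        return 0.0
    found = set(retrieved_codes[:k]) & expected
    return len(found) / len(expected)


def evaluate_retrieval(test_cases, k=3):
    """Average recall@k over cases shaped like {"retrieved": [...], "expected": [...]}."""
    if not test_cases:
        return 0.0
    scores = [recall_at_k(c["retrieved"], c["expected"], k) for c in test_cases]
    return sum(scores) / len(scores)


# Hypothetical results: in practice, "retrieved" would come from your retriever
cases = [
    {"retrieved": ["CS007", "CS001", "BUS033"], "expected": ["CS007"]},
    {"retrieved": ["BUS033", "CS001"], "expected": ["CS007", "CS001"]},
]
print(evaluate_retrieval(cases, k=3))  # 0.75
```

Generation quality is harder to score automatically. A reasonable first step is to check that every course code mentioned in a response also appears in the retrieved list.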

---

## 📝 Summary

You've built a complete RAG system that:
- ✅ Generates and ingests course data with vector embeddings
- ✅ Performs semantic search to find relevant courses
- ✅ Assembles context from multiple sources (system + user + retrieved)
- ✅ Generates personalized responses using LLMs
- ✅ Handles different query types and user profiles

This RAG system is the foundation for the advanced topics in Sections 3 and 4. You'll build on this exact code to add memory, tools, and full agent capabilities.

**Great work!** You've mastered Retrieved Context and built a production-ready RAG pipeline. 🎉
