![Redis](https://redis.io/wp-content/uploads/2024/04/Logotype.svg?auto=webp&quality=85,75&width=120)

# Engineering Context for Production

## From RAG Basics to Production-Ready Context Engineering

In the previous notebook, you built a working RAG system and saw why context quality matters. Now you'll learn to engineer context with production-level rigor.

**What makes context "good"?**

This notebook teaches you that **context engineering is real engineering** - it requires the same rigor, analysis, and deliberate decision-making as any other engineering discipline. Context isn't just "data you feed to an LLM" - it requires thoughtful preparation, quality assessment, and optimization.

## What You'll Learn

**The Engineering Mindset:**
- Why context quality matters (concrete impact on accuracy, relevance, cost)
- The transformation workflow: Raw Data → Engineered Context → Quality Responses
- Contrasts between naive and engineered approaches

**Data Engineering for Context:**
- Systematic transformation: Extract → Clean → Transform → Optimize → Store
- Engineering decisions based on YOUR domain requirements
- When to use different approaches (RAG, Structured Views, Hybrid)

**Introduction to Chunking:**
- When does your data need chunking? (Critical first question)
- Different chunking strategies and their trade-offs
- How to choose based on YOUR data characteristics

**Production Pipelines:**
- Three pipeline architectures (Request-Time, Batch, Event-Driven)
- How to choose based on YOUR constraints
- Building production-ready context preparation workflows

**Time to complete:** 90-105 minutes

---

## Prerequisites

- Completed Section 2, Notebook 1 (RAG Fundamentals and Implementation)
- Redis 8 running locally
- OpenAI API key set
- Understanding of RAG basics and vector embeddings

---

## Part 1: Context is Data - and Data Requires Engineering

### The Naive Approach (What NOT to Do)

Let's start by seeing what happens when you treat context as "just data" without engineering discipline.

**Scenario:** A student asks "What machine learning courses are available?"

We'll set up the environment first, then run the naive version:

### Setup

In [1]:
import os
import sys

from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Verify required environment variables
required_vars = ["OPENAI_API_KEY"]
missing_vars = [var for var in required_vars if not os.getenv(var)]

if missing_vars:
    print(
        f"""⚠️  Missing required environment variables: {', '.join(missing_vars)}

Please create a .env file with:
OPENAI_API_KEY=your_openai_api_key
REDIS_URL=redis://localhost:6379
"""
    )
    sys.exit(1)

REDIS_URL = os.getenv("REDIS_URL", "redis://localhost:6379")
print("✅ Environment variables loaded")

✅ Environment variables loaded


In [2]:
import asyncio

# Import dependencies
import json
from typing import Any, Dict, List

import redis
import tiktoken
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI
from redis_context_course import CourseManager, redis_config

# Initialize
course_manager = CourseManager()
redis_client = redis.from_url(REDIS_URL, decode_responses=True)
llm = ChatOpenAI(model="gpt-4o", temperature=0)

# Token counter
encoding = tiktoken.encoding_for_model("gpt-4o")


def count_tokens(text: str) -> int:
    return len(encoding.encode(text))


print("✅ Dependencies loaded")

21:56:33 redisvl.index.index INFO   Index already exists, not overwriting.


✅ Dependencies loaded




### Naive Approach: Dump Everything

The simplest approach is to include all course data in every request:

In [3]:
# Naive Approach: Get all courses and dump as JSON
all_courses = await course_manager.get_all_courses()

# Convert to raw JSON (what many developers do first)
raw_context = json.dumps(
    [
        {
            "id": c.id,
            "course_code": c.course_code,
            "title": c.title,
            "description": c.description,
            "department": c.department,
            "credits": c.credits,
            "difficulty_level": c.difficulty_level.value,
            "format": c.format.value,
            "instructor": c.instructor,
            "prerequisites": (
                [p.course_code for p in c.prerequisites] if c.prerequisites else []
            ),
            "created_at": str(c.created_at) if hasattr(c, "created_at") else None,
            "updated_at": str(c.updated_at) if hasattr(c, "updated_at") else None,
        }
        for c in all_courses[:10]  # Just first 10 for demo
    ],
    indent=2,
)

token_count = count_tokens(raw_context)

print(
    f"""📊 Naive Approach Results:
   Courses included: {len(all_courses[:10])}
   Token count: {token_count:,}
   Estimated cost per request: ${(token_count / 1_000_000) * 2.50:.4f}

   For 100 courses, this would be ~{token_count * 10:,} tokens!
"""
)

# Show a sample
print("\n📄 Sample of raw JSON context:")
print(raw_context[:500] + "...")

21:56:34 httpx INFO   HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


📊 Naive Approach Results:
   Courses included: 10
   Token count: 1,681
   Estimated cost per request: $0.0042

   For 100 courses, this would be ~16,810 tokens!


📄 Sample of raw JSON context:
[
  {
    "id": "course_catalog:01K98Z0MEF04N2YSG94PQYJDY0",
    "course_code": "CS004",
    "title": "Database Systems",
    "description": "Design and implementation of database systems. SQL, normalization, transactions, and database administration.",
    "department": "Computer Science",
    "credits": 3,
    "difficulty_level": "intermediate",
    "format": "online",
    "instructor": "Nicholas Nelson",
    "prerequisites": [],
    "created_at": "2025-11-04 21:56:34.452731",
    "updated_at"...


In [4]:
[course.title for course in all_courses]

['Database Systems',
 'Web Development',
 'Web Development',
 'Web Development',
 'Web Development',
 'Linear Algebra',
 'Linear Algebra',
 'Linear Algebra',
 'Linear Algebra',
 'Linear Algebra',
 'Calculus I',
 'Calculus I',
 'Calculus I',
 'Calculus I',
 'Calculus I',
 'Marketing Strategy',
 'Marketing Strategy',
 'Marketing Strategy',
 'Marketing Strategy',
 'Marketing Strategy',
 'Marketing Strategy',
 'Marketing Strategy',
 'Cognitive Psychology',
 'Cognitive Psychology',
 'Cognitive Psychology',
 'Cognitive Psychology',
 'Cognitive Psychology',
 'Cognitive Psychology',
 'Cognitive Psychology',
 'Data Structures and Algorithms',
 'Principles of Management',
 'Principles of Management',
 'Principles of Management',
 'Introduction to Psychology',
 'Introduction to Psychology',
 'Introduction to Psychology',
 'Data Visualization',
 'Data Visualization',
 'Data Visualization',
 'Data Visualization',
 'Machine Learning',
 'Introduction to Programming',
 'Introduction to Programming',
 

### Test the Naive Approach

In [5]:
# Test with a real query
query = "What machine learning courses are available?"

messages = [
    SystemMessage(
        content=f"""You are a Redis University course advisor.

Available Courses:
{raw_context}

Help students find relevant courses."""
    ),
    HumanMessage(content=query),
]

response = llm.invoke(messages)

print(
    f"""🤖 Query: "{query}"

Response:
{response.content}
"""
)

21:56:35 httpx INFO   HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


🤖 Query: "What machine learning courses are available?"

Response:
Currently, there are no machine learning courses listed in the available course catalog. If you are interested in machine learning, you might consider taking foundational courses such as "Linear Algebra" or "Database Systems," as they provide essential knowledge that can be beneficial for understanding machine learning concepts.



### Problems with the Naive Approach

As discussed in previous notebooks, this approach has several problems:

1. **Excessive Token Usage**
   - 10 courses = ~1,681 tokens
   - 100 courses would be ~16,810 tokens


2. **Raw JSON is Inefficient**
   - Includes internal fields (IDs, timestamps, created_at, updated_at)
   - Verbose formatting (indentation, field names repeated)


3. **No Filtering**
   - Student asked about ML, but got all courses, even irrelevant ones
   - **Dilutes relevant information with noise**


4. **Poor Response Quality**
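   - Generic or incomplete responses (in the test above, the model reported no machine learning courses because the 10-course sample did not include the Machine Learning course)
   - May miss the most relevant courses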
   - Can't provide personalized recommendations


5. **Not Scalable**
   - What if you have 1,000 courses? 10,000?
   - What if courses change daily?
   - Requires code changes to update

**The goal is not to give the LLM "all the data"; it's to give it the *useful* data.**

---

## The Engineering Mindset

Context is data that flows through a pipeline. Like any data engineering problem, it requires:

### 1. Requirements Analysis
- What is the intended use case?
- What queries will users ask?
- What information do they need?
- What constraints exist (token budget, latency, cost)?

### 2. Data Transformation
- Raw data → Cleaned data → Structured data → LLM-optimized context

### 3. Quality Metrics
- How do we measure if context is "good"?
- Relevance, completeness, efficiency, accuracy

### 4. Testing and Iteration
- Test with real queries
- Measure quality metrics
- Iterate based on results

**The Engineering Question:** "How do we transform raw course data into high-quality context that produces accurate, relevant, efficient responses?"
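
One way to make the quality metrics above concrete is to score the items placed in context against a small hand-labeled set of items a good answer should draw on. The following is a minimal sketch under that assumption; `context_precision_recall` and the labeled course codes are hypothetical and not part of the course package.

```python
from typing import Iterable, Tuple


def context_precision_recall(
    retrieved: Iterable[str], relevant: Iterable[str]
) -> Tuple[float, float]:
    """Return (precision, recall) of retrieved item IDs against a labeled set."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Precision: how much of the context is relevant (a proxy for efficiency).
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    # Recall: how much of the relevant material made it in (completeness).
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical labels: course codes a good answer to an ML query should use.
precision, recall = context_precision_recall(
    retrieved=["CS004", "CS010", "CS007"],
    relevant=["CS007"],
)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Low precision suggests the context carries noise; low recall suggests relevant items were left out. Both can be tracked across iterations alongside token counts.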

### Three Engineering Approaches

Let's compare the naive baseline with three engineered approaches:

| Approach | Description | Token Usage | Response Quality | Maintenance | Verdict |
|----------|-------------|-------------|------------------|-------------|---------|
| **Naive** | Include all raw data | 50K tokens | Poor (generic) | Easy | ❌ Not production-ready |
| **RAG** | Semantic search for relevant courses | 3K tokens | Good (relevant) | Moderate | ✅ Good for most cases |
| **Structured Views** | Pre-compute LLM-optimized summaries | 2K tokens | Excellent (overview + details) | Higher | ✅ Best for production |
| **Hybrid** | Structured view + RAG | 5K tokens | Excellent (best of both) | Higher | ✅ Best for production |

Let's implement each approach and compare them.

---

## Part 2: Data Engineering Workflow - From Raw to Optimized

### The Data Engineering Pipeline

Context preparation follows a systematic workflow:

```
Raw Data (Database/API)
    ↓
[Step 1: Extract] - Get the data
    ↓
[Step 2: Clean] - Remove noise, fix inconsistencies
    ↓
[Step 3: Transform] - Structure for LLM consumption
    ↓
[Step 4: Optimize] - Reduce tokens, improve clarity
    ↓
[Step 5: Store] - Vector DB, cache, or pre-compute
    ↓
Engineered Context (Ready for LLM)
```

Let's walk through this pipeline with a real example.

### Step 1: Extract (Raw Data)

First, let's look at what raw course data looks like:

In [6]:
# Get a sample course
sample_course = all_courses[0]

# Show raw database record
raw_record = {
    "id": sample_course.id,
    "course_code": sample_course.course_code,
    "title": sample_course.title,
    "description": sample_course.description,
    "department": sample_course.department,
    "credits": sample_course.credits,
    "difficulty_level": sample_course.difficulty_level.value,
    "format": sample_course.format.value,
    "instructor": sample_course.instructor,
    "prerequisites": (
        [p.course_code for p in sample_course.prerequisites]
        if sample_course.prerequisites
        else []
    ),
    "created_at": (
        str(sample_course.created_at)
        if hasattr(sample_course, "created_at")
        else "2024-01-15T08:30:00Z"
    ),
    "updated_at": (
        str(sample_course.updated_at)
        if hasattr(sample_course, "updated_at")
        else "2024-09-01T14:22:00Z"
    ),
}

raw_json = json.dumps(raw_record, indent=2)
raw_tokens = count_tokens(raw_json)

print("📄 Step 1: Raw Database Record")
print("=" * 80)
print(raw_json)
print("=" * 80)
print(f"\n📊 Token count: {raw_tokens}")

📄 Step 1: Raw Database Record
{
  "id": "course_catalog:01K98Z0MEF04N2YSG94PQYJDY0",
  "course_code": "CS004",
  "title": "Database Systems",
  "description": "Design and implementation of database systems. SQL, normalization, transactions, and database administration.",
  "department": "Computer Science",
  "credits": 3,
  "difficulty_level": "intermediate",
  "format": "online",
  "instructor": "Nicholas Nelson",
  "prerequisites": [],
  "created_at": "2025-11-04 21:56:34.452731",
  "updated_at": "2025-11-04 21:56:34.452739"
}

📊 Token count: 160


Issues with the raw record:
 - Internal fields (IDs, timestamps) waste tokens
 - Verbose JSON formatting
 - Prerequisites are codes, not human-readable
 - No structure for LLM consumption

### Step 2: Clean (Remove Noise)

Remove fields that don't help the LLM answer user queries:

In [7]:
# Step 2: Clean - Remove internal fields
cleaned_record = {
    "course_code": sample_course.course_code,
    "title": sample_course.title,
    "description": sample_course.description,
    "department": sample_course.department,
    "credits": sample_course.credits,
    "difficulty_level": sample_course.difficulty_level.value,
    "format": sample_course.format.value,
    "instructor": sample_course.instructor,
    "prerequisites": (
        [p.course_code for p in sample_course.prerequisites]
        if sample_course.prerequisites
        else []
    ),
}

cleaned_json = json.dumps(cleaned_record, indent=2)
cleaned_tokens = count_tokens(cleaned_json)

print("📄 Step 2: Cleaned Record")
print("=" * 80)
print(cleaned_json)
print("=" * 80)
print(
    f"\n📊 Token count: {cleaned_tokens} (saved {raw_tokens - cleaned_tokens} tokens, {((raw_tokens - cleaned_tokens) / raw_tokens * 100):.1f}% reduction)"
)

📄 Step 2: Cleaned Record
{
  "course_code": "CS004",
  "title": "Database Systems",
  "description": "Design and implementation of database systems. SQL, normalization, transactions, and database administration.",
  "department": "Computer Science",
  "credits": 3,
  "difficulty_level": "intermediate",
  "format": "online",
  "instructor": "Nicholas Nelson",
  "prerequisites": []
}

📊 Token count: 89 (saved 71 tokens, 44.4% reduction)



Improvements:
 - Removed id, created_at, updated_at
 - Still has all information needed to answer queries

Remaining minor problems:
 - JSON formatting is verbose (a *minor* issue, since LLMs handle JSON well, but it still costs tokens)
 - Prerequisites are still codes
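
To address the last point, prerequisite codes can be resolved to titles before they reach the LLM. This is a minimal sketch, not the notebook's implementation: `describe_prerequisites` and the lookup table are hypothetical, and the title shown for `CS001` is an illustrative assumption.

```python
from typing import Dict, List


def describe_prerequisites(codes: List[str], titles_by_code: Dict[str, str]) -> str:
    """Render prerequisite codes as 'CODE (Title)', keeping unknown codes as-is."""
    if not codes:
        return "None"
    parts = []
    for code in codes:
        title = titles_by_code.get(code)
        parts.append(f"{code} ({title})" if title else code)
    return ", ".join(parts)


# Hypothetical lookup table; in the notebook it could be built from all_courses,
# e.g. {c.course_code: c.title for c in all_courses}.
titles_by_code = {"CS001": "Introduction to Programming", "MATH026": "Linear Algebra"}
print(describe_prerequisites(["CS001", "MATH026", "CS999"], titles_by_code))
```

Unknown codes are passed through unchanged rather than dropped, so a stale lookup table never silently hides a prerequisite.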


### Step 3: Transform (Structure for LLM)

Convert to a format optimized for LLM consumption:

In [8]:
# Step 3: Transform - Convert to LLM-friendly format


def transform_course_to_text(course) -> str:
    """Transform course object to LLM-optimized text format."""

    # Build prerequisites text
    prereq_text = ""
    if course.prerequisites:
        prereq_codes = [p.course_code for p in course.prerequisites]
        prereq_text = f"\nPrerequisites: {', '.join(prereq_codes)}"

    # Build course text
    course_text = f"""{course.course_code}: {course.title}
Department: {course.department}\nCredits: {course.credits}\nLevel: {course.difficulty_level.value}\nFormat: {course.format.value}
Instructor: {course.instructor}{prereq_text}
Description: {course.description}
    """

    return course_text


transformed_text = transform_course_to_text(sample_course)
transformed_tokens = count_tokens(transformed_text)

print("📄 Step 3: Transformed to LLM-Friendly Format")
print("=" * 80)
print(transformed_text)
print("=" * 80)
print(
    f"\n📊 Token count: {transformed_tokens} (saved {cleaned_tokens - transformed_tokens} tokens, {((cleaned_tokens - transformed_tokens) / cleaned_tokens * 100):.1f}% reduction)"
)

📄 Step 3: Transformed to LLM-Friendly Format
CS004: Database Systems
Department: Computer Science
Credits: 3
Level: intermediate
Format: online
Instructor: Nicholas Nelson
Description: Design and implementation of database systems. SQL, normalization, transactions, and database administration.
    

📊 Token count: 49 (saved 40 tokens, 44.9% reduction)



✅ Improvements:
 - Natural text format with the correct metadata
 - Clear structure with labels
 - No JSON overhead (brackets, quotes, commas)

**Note:** In case the description is too long, we can apply compression techniques such as summarization to keep the description within a desired token limit. Section 3 will cover compression in more detail.
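
Before reaching for summarization, a simpler option is to keep only as many whole sentences as fit a token budget. The sketch below illustrates that idea under stated assumptions: `fit_to_budget` and `approx_tokens` are hypothetical helpers, and the word-based counter is a rough stand-in for a real tokenizer (in this notebook, `count_tokens` could be passed as `count` instead). This is truncation, not summarization.

```python
import re
from typing import Callable


def approx_tokens(text: str) -> int:
    """Rough stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def fit_to_budget(
    text: str, max_tokens: int, count: Callable[[str], int] = approx_tokens
) -> str:
    """Keep whole leading sentences while the result stays within max_tokens."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = []
    for sentence in sentences:
        candidate = " ".join(kept + [sentence])
        if count(candidate) > max_tokens:
            break
        kept.append(sentence)
    return " ".join(kept)


description = (
    "Design and implementation of database systems. "
    "SQL, normalization, transactions, and database administration."
)
print(fit_to_budget(description, max_tokens=6))
```

If even the first sentence exceeds the budget, this sketch returns an empty string; a real pipeline would need a fallback such as character truncation or summarization.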


### Step 4: Optimize (Further Reduce Tokens)

For even more efficiency, we can create a summarized version:

In [9]:
# Step 4: Optimize - Create ultra-compact version
# TODO: Maybe use summarization here? Maybe for that we need a longer description or some other metadata?

def optimize_course_text(course) -> str:
    """Create ultra-compact course description."""
    prereqs = (
        f" (Prereq: {', '.join([p.course_code for p in course.prerequisites])})"
        if course.prerequisites
        else ""
    )
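    # Only append an ellipsis when the description is actually truncated
    description = course.description
    if len(description) > 100:
        description = description[:100] + "..."
    return f"{course.course_code}: {course.title} - {description}{prereqs}"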


optimized_text = optimize_course_text(sample_course)
optimized_tokens = count_tokens(optimized_text)

print("📄 Step 4: Optimized (Ultra-Compact)")
print("=" * 80)
print(optimized_text)
print("=" * 80)
print(
    f"\n📊 Token count: {optimized_tokens} (saved {transformed_tokens - optimized_tokens} tokens, {((transformed_tokens - optimized_tokens) / transformed_tokens * 100):.1f}% reduction)"
)

📄 Step 4: Optimized (Ultra-Compact)
CS004: Database Systems - Design and implementation of database systems. SQL, normalization, transactions, and database admini...

📊 Token count: 24 (saved 25 tokens, 51.0% reduction)


Improvements:
   - Truncated description to 100 chars
   - Removed metadata (instructor, format, credits)

Trade-off:
   - Lost some detail (may need for specific queries)
   - Best for overview/catalog views

**Note:** This is just one example of how to be more efficient. This is where you have to be creative and engineer based on your use case and requirements.

### Step 5: Store (Choose Storage Strategy)

Now we need to decide HOW to store this engineered context:

**Option 1: Vector Database (RAG)**
- Store transformed text with embeddings
- Retrieve relevant courses at query time
- Good for: Large datasets, specific queries

**Option 2: Pre-Computed Views**
- Create structured summaries ahead of time
- Store in Redis as cached views
- Good for: Common queries, overview information

**Option 3: Hybrid**
- Combine both approaches
- Pre-compute catalog view + RAG for details
- Good for: Production systems

Let's implement all three and compare.

### Summary: The Transformation Pipeline

Let's see the complete transformation:

In [10]:
print("=" * 80)
print("EXAMPLE PIPELINE SUMMARY")
print("=" * 80)

print(
    f"""
Step 1: Raw Database Record
   Token count: {raw_tokens}
   Format: JSON with all fields

Step 2: Cleaned Record
   Token count: {cleaned_tokens} ({((raw_tokens - cleaned_tokens) / raw_tokens * 100):.1f}% reduction)
   Removed: Internal fields (IDs, timestamps)

Step 3: Transformed to LLM Format
   Token count: {transformed_tokens} ({((cleaned_tokens - transformed_tokens) / cleaned_tokens * 100):.1f}% reduction from Step 2)
   Format: Natural text, structured

Step 4: Optimized (Ultra-Compact)
   Token count: {optimized_tokens} ({((transformed_tokens - optimized_tokens) / transformed_tokens * 100):.1f}% reduction from Step 3)
   Format: Single line, truncated

TOTAL REDUCTION: {raw_tokens} → {optimized_tokens} tokens ({((raw_tokens - optimized_tokens) / raw_tokens * 100):.1f}% reduction)
"""
)

print("=" * 80)
print("\n🎯 Key Insight:")
print(
    f"   Through systematic engineering, we reduced token usage by ~{(raw_tokens - optimized_tokens) / raw_tokens * 100:.0f}%"
)
print("   while IMPROVING readability for the LLM!")
print("=" * 80)

EXAMPLE PIPELINE SUMMARY

Step 1: Raw Database Record
   Token count: 160
   Format: JSON with all fields

Step 2: Cleaned Record
   Token count: 89 (44.4% reduction)
   Removed: Internal fields (IDs, timestamps)

Step 3: Transformed to LLM Format
   Token count: 49 (44.9% reduction from Step 2)
   Format: Natural text, structured

Step 4: Optimized (Ultra-Compact)
   Token count: 24 (51.0% reduction from Step 3)
   Format: Single line, truncated

TOTAL REDUCTION: 160 → 24 tokens (85.0% reduction)


🎯 Key Insight:
   Through systematic engineering, we reduced token usage by ~85%
   while IMPROVING readability for the LLM!


The key insight reports a token reduction, but reduction itself is not the goal. The goal is to optimize the content so that the LLM receives the most relevant information.

---

## Part 3: Engineering Decision - When to Use Each Approach

Now let's implement the three approaches and compare them with real queries.

### Approach 1: RAG (Semantic Search)

Retrieve only relevant courses using vector search:

In [11]:
from redisvl.index import SearchIndex
from redisvl.query import VectorQuery
from redisvl.query.filter import Tag

# Initialize vector search
index_name = "course_index"

# Check if index exists, create if not
try:
    index = SearchIndex.from_existing(index_name, redis_url=REDIS_URL)
    print(f"✅ Using existing index: {index_name}")
except Exception:  # a bare `except:` would also swallow KeyboardInterrupt
    print(
        f"⚠️  Index '{index_name}' not found. Please run Section 2 notebooks to create it."
    )
    print("   For this demo, we'll simulate RAG results.")
    index = None

⚠️  Index 'course_index' not found. Please run Section 2 notebooks to create it.
   For this demo, we'll simulate RAG results.


In [12]:
# Simulate RAG retrieval (in production, this would use vector search)


async def rag_approach(query: str, limit: int = 5) -> str:
    """Retrieve relevant courses using semantic search."""

    # In production: use vector search over embeddings.
    # For demo: filter courses with a fixed keyword list. The `query` argument
    # is not used, so every query returns the same courses.
    # NOTE: `in` does substring matching, so a short keyword such as "ml" also
    # matches inside longer words such as "html".
    relevant_courses = []
    for course in all_courses:
        if any(
            keyword in course.title.lower() or keyword in course.description.lower()
            for keyword in ["machine learning", "ml", "ai", "data science", "neural"]
        ):
            relevant_courses.append(course)
            if len(relevant_courses) >= limit:
                break

    # Transform to LLM-friendly format
    context = "\n\n".join([transform_course_to_text(c) for c in relevant_courses])
    return context


# Test RAG approach
query = "What machine learning courses are available?"
rag_context = await rag_approach(query, limit=5)
rag_tokens = count_tokens(rag_context)

print(
    f"""📊 RAG Approach Results:
   Query: "{query}"
   Courses retrieved: 5
   Token count: {rag_tokens:,}

📄 Context Preview:
{rag_context[:500]}...
"""
)

📊 RAG Approach Results:
   Query: "What machine learning courses are available?"
   Courses retrieved: 5
   Token count: 268

📄 Context Preview:
CS010: Web Development
Department: Computer Science
Credits: 3
Level: intermediate
Format: in_person
Instructor: Kathy Blair
Description: Full-stack web development using modern frameworks. HTML, CSS, JavaScript, React, and backend APIs.
    

CS002: Web Development
Department: Computer Science
Credits: 3
Level: intermediate
Format: in_person
Instructor: Tamara Murray
Description: Full-stack web development using modern frameworks. HTML, CSS, JavaScript, React, and backend APIs.
    

CS003: Web...
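
The Web Development results above are consistent with the simulated retrieval matching the keyword "ml" inside "HTML". Matching on word boundaries is one way to avoid that in a keyword-based stand-in. The sketch below is illustrative only: `matches_keywords` is a hypothetical helper, and it remains a simulation rather than semantic search.

```python
import re
from typing import List


def matches_keywords(text: str, keywords: List[str]) -> bool:
    """True if any keyword appears in text as a whole word or phrase."""
    return any(
        re.search(rf"\b{re.escape(keyword)}\b", text, flags=re.IGNORECASE)
        for keyword in keywords
    )


keywords = ["machine learning", "ml", "ai", "neural"]
web_dev = (
    "Full-stack web development using modern frameworks. "
    "HTML, CSS, JavaScript, React, and backend APIs."
)

# Substring matching: "ml" is found inside "html".
print("ml" in web_dev.lower())
# Whole-word matching: no keyword appears as its own word.
print(matches_keywords(web_dev, keywords))
```

Word boundaries fix the false positive but still miss paraphrases ("statistical learning", "deep networks"), which is the gap that embedding-based search is meant to close.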



### Approach 2: Structured Views (Pre-Computed Summaries)

Create a pre-computed catalog view that's optimized for LLM consumption:

In [13]:
# Approach 2: Structured Views
# Pre-compute a catalog summary organized by department


async def create_catalog_view() -> str:
    """Create a pre-computed catalog view organized by department."""

    # Group courses by department
    by_department = {}
    for course in all_courses:
        dept = course.department
        if dept not in by_department:
            by_department[dept] = []
        by_department[dept].append(course)

    # Build catalog view
    catalog_sections = []

    for dept_name in sorted(by_department.keys()):
        courses = by_department[dept_name]

        # Create department section
        dept_section = f"\n## {dept_name} ({len(courses)} courses)\n"

        # Add course summaries (optimized format)
        course_summaries = []
        for course in courses[:10]:  # Limit for demo
            summary = f"- {course.course_code}: {course.title} ({course.difficulty_level.value})"
            course_summaries.append(summary)

        dept_section += "\n".join(course_summaries)
        catalog_sections.append(dept_section)

    catalog_view = "# Redis University Course Catalog\n" + "\n".join(catalog_sections)
    return catalog_view


# Create and cache the view
catalog_view = await create_catalog_view()
catalog_tokens = count_tokens(catalog_view)

# Store in Redis for reuse
redis_client.set("course_catalog_view", catalog_view)

print(
    f"""📊 Structured View Approach Results:
   Total courses: {len(all_courses)}
   Token count: {catalog_tokens:,}
   Cached in Redis: ✅

📄 Catalog Preview:
{catalog_view[:600]}...
"""
)

📊 Structured View Approach Results:
   Total courses: 50
   Token count: 585
   Cached in Redis: ✅

📄 Catalog Preview:
# Redis University Course Catalog

## Business (10 courses)
- BUS033: Marketing Strategy (intermediate)
- BUS032: Marketing Strategy (intermediate)
- BUS034: Marketing Strategy (intermediate)
- BUS035: Marketing Strategy (intermediate)
- BUS037: Marketing Strategy (intermediate)
- BUS039: Marketing Strategy (intermediate)
- BUS040: Marketing Strategy (intermediate)
- BUS031: Principles of Management (beginner)
- BUS036: Principles of Management (beginner)
- BUS038: Principles of Management (beginner)

## Computer Science (10 courses)
- CS004: Database Systems (intermediate)
- CS010: Web Develo...



### Approach 3: Hybrid (Best of Both Worlds)

Combine structured view (overview) + RAG (specific details):

In [14]:
# Approach 3: Hybrid


async def hybrid_approach(query: str) -> str:
    """Combine catalog overview with RAG for specific details."""

    # Part 1: Get catalog overview (from cache)
    catalog_overview = redis_client.get("course_catalog_view")

    # Part 2: Get specific course details via RAG
    specific_courses = await rag_approach(query, limit=3)

    # Combine
    hybrid_context = f"""# Course Catalog Overview
{catalog_overview}

---

# Detailed Information for Your Query
{specific_courses}
"""

    return hybrid_context


# Test hybrid approach
hybrid_context = await hybrid_approach(query)
hybrid_tokens = count_tokens(hybrid_context)

print(
    f"""📊 Hybrid Approach Results:
   Query: "{query}"
   Token count: {hybrid_tokens:,}

   Components:
   - Catalog overview: {catalog_tokens:,} tokens
   - RAG details from the earlier limit=5 run (hybrid uses limit=3): {rag_tokens:,} tokens

📄 Context Structure:
   1. Full catalog overview (all departments)
   2. Detailed info for 3 most relevant courses
"""
)

📊 Hybrid Approach Results:
   Query: "What machine learning courses are available?"
   Token count: 761

   Components:
   - Catalog overview: 585 tokens
   - RAG details from the earlier limit=5 run (hybrid uses limit=3): 268 tokens

📄 Context Structure:
   1. Full catalog overview (all departments)
   2. Detailed info for 3 most relevant courses
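
One thing the hybrid function does not handle is a cache miss: if the `course_catalog_view` key is absent, `redis_client.get` returns `None` and the literal text "None" ends up in the context. A cache-aside helper is one way to guard against that. The sketch below is a hedged illustration: `get_or_build_view`, `KeyValueStore`, and `InMemoryStore` are hypothetical names, and the dict-backed store only stands in for a Redis client so the example runs without a server.

```python
from typing import Callable, Optional, Protocol


class KeyValueStore(Protocol):
    """Minimal interface this sketch needs; redis-py clients provide get/set."""

    def get(self, key: str) -> Optional[str]: ...
    def set(self, key: str, value: str) -> object: ...


def get_or_build_view(
    store: KeyValueStore, key: str, build: Callable[[], str]
) -> str:
    """Return the cached view, rebuilding and caching it on a miss."""
    cached = store.get(key)
    if cached is not None:
        return cached
    view = build()
    store.set(key, view)
    return view


class InMemoryStore:
    """Dict-backed stand-in so the example runs without a Redis server."""

    def __init__(self) -> None:
        self._data: dict = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> bool:
        self._data[key] = value
        return True


store = InMemoryStore()
view = get_or_build_view(store, "course_catalog_view", lambda: "# Catalog\n- CS007")
print(view)
```

In the notebook, `build` would wrap the catalog-building logic, and an expiry could be added on `set` so that weekly catalog changes eventually refresh the view.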



### Compare All Three Approaches

Let's test all three with the same query and compare results:

In [15]:
# Test all three approaches
query = "What machine learning courses are available?"

print("=" * 80)
print("COMPARING THREE APPROACHES")
print("=" * 80)

# Approach 1: RAG
messages_rag = [
    SystemMessage(
        content=f"""You are a Redis University course advisor.

Available Courses:
{rag_context}

Help students find relevant courses."""
    ),
    HumanMessage(content=query),
]
response_rag = llm.invoke(messages_rag)

# Approach 2: Structured View
messages_view = [
    SystemMessage(
        content=f"""You are a Redis University course advisor.

{catalog_view}

Help students find relevant courses."""
    ),
    HumanMessage(content=query),
]
response_view = llm.invoke(messages_view)

# Approach 3: Hybrid
messages_hybrid = [
    SystemMessage(
        content=f"""You are a Redis University course advisor.

{hybrid_context}

Help students find relevant courses."""
    ),
    HumanMessage(content=query),
]
response_hybrid = llm.invoke(messages_hybrid)

# Display comparison
print(
    f"""
Query: "{query}"

{'=' * 80}
APPROACH 1: RAG (Semantic Search)
{'=' * 80}
Token count: {rag_tokens:,}
Response:
{response_rag.content}

{'=' * 80}
APPROACH 2: Structured View (Pre-Computed)
{'=' * 80}
Token count: {catalog_tokens:,}
Response:
{response_view.content}

{'=' * 80}
APPROACH 3: Hybrid (View + RAG)
{'=' * 80}
Token count: {hybrid_tokens:,}
Response:
{response_hybrid.content}

{'=' * 80}
"""
)

COMPARING THREE APPROACHES


21:56:37 httpx INFO   HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


21:56:38 httpx INFO   HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


21:56:38 httpx INFO   HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"



Query: "What machine learning courses are available?"

APPROACH 1: RAG (Semantic Search)
Token count: 268
Response:
Currently, there are no machine learning courses listed in the available courses at Redis University. If you're interested in related topics, you might consider taking "MATH026: Linear Algebra," as it covers essential mathematical concepts like vector spaces and matrices that are foundational for machine learning.

APPROACH 2: Structured View (Pre-Computed)
Token count: 585
Response:
The available machine learning course is:

- CS007: Machine Learning (advanced)

APPROACH 3: Hybrid (View + RAG)
Token count: 761
Response:
The available machine learning course is:

- CS007: Machine Learning (advanced)




### Decision Framework: Which Approach to Use?

Here's how to choose based on YOUR requirements:

| Factor | RAG | Structured Views | Hybrid |
|--------|-----|------------------|--------|
| **Token Efficiency** | ✅ Good (3K) | ✅✅ Excellent (2K) | ⚠️ Moderate (5K) |
| **Response Quality** | ✅ Good (relevant) | ✅ Good (overview) | ✅✅ Excellent (both) |
| **Latency** | ⚠️ Moderate (search) | ✅✅ Fast (cached) | ⚠️ Moderate (search) |
| **Maintenance** | ✅ Low (auto-updates) | ⚠️ Higher (rebuild views) | ⚠️ Higher (both) |
| **Best For** | Specific queries | Overview queries | Production systems |

**Decision Process:**

1. **Analyze YOUR data characteristics:**
   - How many items? (10s, 100s, 1000s, millions?)
   - How often does it change? (Real-time, daily, weekly?)
   - What's the average item size? (100 words, 1000 words, 10K words?)

2. **Analyze YOUR query patterns:**
   - Specific queries ("Show me RU101") → RAG
   - Overview queries ("What courses exist?") → Structured Views
   - Mixed queries → Hybrid

3. **Analyze YOUR constraints:**
   - Tight token budget → Structured Views
   - Real-time updates required → RAG
   - Best quality needed → Hybrid

**Example Decision:**

For Redis University:
- ✅ **Data:** 100-500 courses, updated weekly, 200-500 words each
- ✅ **Queries:** Mix of overview ("What's available?") and specific ("ML courses?")
- ✅ **Constraints:** Moderate token budget, weekly updates acceptable
- ✅ **Decision:** **Hybrid approach** (pre-compute catalog + RAG for details)
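
The decision process above can be written down as a small function, which makes the rules explicit and testable. This is only a sketch: `choose_approach` is a hypothetical helper, and letting constraints override query patterns is an assumption of the example, not a rule from the framework.

```python
def choose_approach(
    query_pattern: str,
    tight_token_budget: bool = False,
    needs_realtime_updates: bool = False,
) -> str:
    """Map the decision process onto a single recommendation.

    query_pattern is one of "specific", "overview", or "mixed".
    Assumption: hard constraints take priority over query patterns.
    """
    if needs_realtime_updates:
        return "RAG"
    if tight_token_budget:
        return "Structured Views"
    by_pattern = {
        "specific": "RAG",
        "overview": "Structured Views",
        "mixed": "Hybrid",
    }
    if query_pattern not in by_pattern:
        raise ValueError(f"unknown query pattern: {query_pattern!r}")
    return by_pattern[query_pattern]


# Redis University example: mixed queries, moderate budget, weekly updates.
print(choose_approach("mixed"))
```

Real decisions involve more trade-offs than three inputs can capture, so treat the function as a checklist rather than an oracle.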

---

## Part 4: Introduction to Chunking - When and Why

So far, we've worked with course data where each course is a complete, self-contained unit (200-500 words). But what happens when you have **long documents** that exceed token limits or contain multiple distinct topics?

This is where **chunking** becomes necessary.

### The Critical First Question: Does My Data Need Chunking?

**Chunking is NOT a default step** - it's an engineering decision based on your data characteristics.

Let's understand when chunking is necessary and when it's not.

### When You DON'T Need Chunking

If your data already consists of small, complete semantic units, chunking can actually hurt quality:

**Examples of data that DON'T need chunking:**
- ✅ Course descriptions (200-500 words, complete)
- ✅ Product listings (100-300 words, self-contained)
- ✅ FAQ entries (50-200 words, question + answer)
- ✅ Social media posts (50-280 characters, atomic)
- ✅ Customer support tickets (100-500 words, single issue)

**Why not chunk?**
- Already at optimal size for retrieval
- Each unit is semantically complete
- Chunking would break coherent information
- Adds unnecessary complexity

In [16]:
# Example: Course data (NO chunking needed)
sample_course_text = transform_course_to_text(all_courses[0])
sample_tokens = count_tokens(sample_course_text)

print(
    f"""📊 Example: Course Description
{'=' * 80}
{sample_course_text}
{'=' * 80}

Token count: {sample_tokens}
Semantic completeness: ✅ Complete (has all info about this course)
Chunking needed? ❌ NO

Why not?
- Under 500 tokens (well within limits)
- Self-contained (doesn't reference other sections)
- Semantically complete (has all course details)
- Breaking it up would lose context
"""
)

📊 Example: Course Description
CS004: Database Systems
Department: Computer Science
Credits: 3
Level: intermediate
Format: online
Instructor: Nicholas Nelson
Description: Design and implementation of database systems. SQL, normalization, transactions, and database administration.
    

Token count: 49
Semantic completeness: ✅ Complete (has all info about this course)
Chunking needed? ❌ NO

Why not?
- Under 500 tokens (well within limits)
- Self-contained (doesn't reference other sections)
- Semantically complete (has all course details)
- Breaking it up would lose context



### When You DO Need Chunking

Chunking becomes necessary when documents are too long or contain multiple distinct topics:

**Examples of data that NEED chunking:**
- ✅ Research papers (multiple sections)
- ✅ Technical documentation (many topics)
- ✅ Books/chapters (many concepts)
- ✅ Legal contracts (multiple clauses)
- ✅ Medical records (multiple visits/conditions)

**Why chunk?**
- **Exceeds embedding model limits** - Most embedding models have context windows of 512-8192 tokens
- **Contains multiple distinct topics** - Should be retrieved separately for precision
- **Too large for LLM to process effectively** - Even if it fits, quality degrades
- **Improves retrieval precision** - Find specific sections, not whole document
- **Prevents context quality problems**:
    - **"Needle in the Haystack" Problem**
       - LLMs struggle to find relevant information buried in long context
       - Performance degrades significantly in middle of long documents
       - Even GPT-4 shows 10-30% accuracy drop with irrelevant context

    - **Context Poisoning**
       - Irrelevant information actively degrades response quality
       - LLM may focus on wrong parts of context
       - Contradictory information causes confusion

    - **Context Rot (Lost in the Middle)**
       - Information in the middle of a long context is often ignored
       - LLMs show primacy and recency bias (they attend most to the start and end)
       - Critical details get "lost" even if technically present

**💡 Solution:** Chunk documents so each chunk is focused, relevant, and within an optimal context size (typically 200-800 tokens per chunk).
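
As a rough first check, the "does my data need chunking?" question can be turned into a heuristic. The sketch below is illustrative only: `estimate_tokens` uses an assumed ratio of about 4 characters per token instead of a real tokenizer, and counting `## ` headers is just a crude proxy for "multiple distinct topics".

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate (assumption: ~4 characters per token for English)."""
    return max(1, len(text) // 4)


def needs_chunking(text: str, max_unit_tokens: int = 500, max_sections: int = 1) -> bool:
    """Heuristic: chunk if the text is long OR spans several headed sections."""
    # Markdown-style section headers stand in for "distinct topics"
    section_count = sum(1 for line in text.splitlines() if line.startswith("## "))
    too_long = estimate_tokens(text) > max_unit_tokens
    multi_topic = section_count > max_sections
    return too_long or multi_topic


course = "CS004: Database Systems\nCredits: 3\nDescription: SQL, normalization, transactions."
paper = "# Title\n\n## Abstract\n" + "word " * 400 + "\n\n## 1. Introduction\n" + "word " * 400

print(needs_chunking(course))  # False: short and single-topic
print(needs_chunking(paper))   # True: long and multi-section
```

The 500-token default mirrors the "under 500 tokens" rule of thumb used for course descriptions; tune both thresholds to your own embedding model and data.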


In [17]:
# Example: Research paper (NEEDS chunking)
# Let's simulate a long research paper about Redis

research_paper = """
# Optimizing Vector Search Performance in Redis

## Abstract
This paper presents a comprehensive analysis of vector search optimization techniques in Redis,
examining the trade-offs between search quality, latency, and memory usage. We evaluate multiple
indexing strategies including HNSW and FLAT indexes across datasets ranging from 10K to 10M vectors.
Our results demonstrate that careful index configuration can improve search latency by up to 10x
while maintaining 95%+ recall. We also introduce novel compression techniques that reduce memory
usage by 75% with minimal impact on search quality.

## 1. Introduction
Vector databases have become essential infrastructure for modern AI applications, enabling semantic
search, recommendation systems, and retrieval-augmented generation (RAG). Redis, traditionally known
as an in-memory data structure store, has evolved to support high-performance vector search through
the RediSearch module. However, optimizing vector search performance requires understanding complex
trade-offs between multiple dimensions...

[... 5,000 more words covering methodology, experiments, results, discussion ...]

## 2. Background and Related Work
Previous work on vector search optimization has focused primarily on algorithmic improvements to
approximate nearest neighbor (ANN) search. Malkov and Yashunin (2018) introduced HNSW, which has
become the de facto standard for high-dimensional vector search. Johnson et al. (2019) developed
FAISS, demonstrating that product quantization can significantly reduce memory usage...

[... 2,000 more words ...]

## 3. Performance Analysis and Results

### 3.1 HNSW Configuration Trade-offs

Table 1 shows the performance comparison across different HNSW configurations. As M increases from 16 to 64,
we observe significant improvements in recall (0.89 to 0.97) but at the cost of increased latency (2.1ms to 8.7ms)
and memory usage (1.2GB to 3.8GB). The sweet spot for most production workloads is M=32 with ef_construction=200,
which achieves 0.94 recall with 4.3ms latency.

Table 1: HNSW Performance Comparison
| M  | ef_construction | Recall@10 | Latency (ms) | Memory (GB) | Build Time (min) |
|----|-----------------|-----------|--------------|-------------|------------------|
| 16 | 100            | 0.89      | 2.1          | 1.2         | 8                |
| 32 | 200            | 0.94      | 4.3          | 2.1         | 15               |
| 64 | 400            | 0.97      | 8.7          | 3.8         | 32               |

The data clearly demonstrates the fundamental trade-off between search quality and resource consumption.
For applications requiring high recall (>0.95), the increased latency and memory costs are unavoidable.

### 3.2 Mathematical Model

The recall-latency trade-off can be modeled as a quadratic function of the HNSW parameters:

Latency(M, ef) = α·M² + β·ef + γ

Where:
- M = number of connections per layer (controls graph connectivity)
- ef = size of dynamic candidate list (controls search breadth)
- α, β, γ = dataset-specific constants (fitted from experimental data)

For our e-commerce dataset, we fitted: α=0.002, β=0.015, γ=1.2 (R²=0.94)

This model allows us to predict latency for untested configurations and optimize for specific
recall targets. The quadratic dependency on M explains why doubling M more than doubles latency.

## 4. Implementation Recommendations

Based on our findings, we recommend the following configuration for production deployments:

```python
# Optimal HNSW configuration for balanced performance
index_params = {
    "M": 32,                  # Balance recall and latency
    "ef_construction": 200,   # Higher quality index
    "ef_runtime": 100         # Fast search with good recall
}
```

This configuration achieves 0.94 recall with 4.3ms p95 latency, suitable for most real-time applications.
For applications with stricter latency requirements (<2ms), consider M=16 with ef_construction=100,
accepting the lower recall of 0.89. For applications requiring maximum recall (>0.95), use M=64
with ef_construction=400, but ensure adequate memory and accept higher latency.

[... 1,500 more words with additional analysis ...]

## 5. Discussion and Conclusion
Our findings demonstrate that vector search optimization is fundamentally about understanding
YOUR specific requirements and constraints. There is no one-size-fits-all configuration. The choice
between HNSW parameters depends on your specific recall requirements, latency budget, and memory constraints.
We provide a mathematical model and practical guidelines to help practitioners make informed decisions...
"""

paper_tokens = count_tokens(research_paper)
print(f"Token count: {paper_tokens:,} | Words: ~{len(research_paper.split())}")

Token count: 1,035 | Words: ~634


**📊 Analysis: Research Paper Example**

**Document:** "Optimizing Vector Search Performance in Redis"

**Structure:** Abstract, Introduction, Background, Methodology, Results, Discussion

**Chunking needed?** ✅ **YES**

**Why This Document May Benefit from Chunking (Even with Large Context Windows):**

> **Note:** Modern LLMs can handle 128K+ tokens, so "fitting in context" isn't the issue. The real value of chunking is **better data modeling and retrieval precision**.

**1. Retrieval Precision vs. Recall Trade-off**

Without chunking (embed the entire paper, roughly 15,000 tokens once the elided sections are included, versus 1,035 for the abridged sample above):
- Query: "What compression techniques were used?"
- Retrieved: Entire 15,000-token paper (includes Abstract, Background, Results, Discussion)
- Problem: The vast majority of retrieved content is irrelevant to the query
- LLM must process 15,000 tokens to find 200 tokens of relevant information

With chunking (embed by section):
- Query: "What compression techniques were used?"
- Retrieved: Methodology section (800 tokens)
- Result: 90%+ of retrieved content is directly relevant
- LLM processes 800 focused tokens with high signal-to-noise ratio

**2. Structured Content Requires Specialized Chunking**

Research papers contain heterogeneous content types that need different handling. Without specialized chunking, you risk mixing incompatible content types or splitting in the middle of a table, formula, or code block.

**Tables and Charts:**
```
Table 1: HNSW Performance Comparison
| M  | ef_construction | Recall@10 | Latency (ms) | Memory (GB) |
|----|-----------------|-----------|--------------|-------------|
| 16 | 100            | 0.89      | 2.1          | 1.2         |
| 32 | 200            | 0.94      | 4.3          | 2.1         |
| 64 | 400            | 0.97      | 8.7          | 3.8         |
```

**Best practice:** Chunk table WITH its caption and explanation:
- ✅ "Table 1 shows HNSW performance trade-offs. As M increases from 16 to 64, recall improves from 0.89 to 0.97, but latency increases from 2.1ms to 8.7ms..."
- ❌ Don't chunk table separately from context - it becomes meaningless
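
One way to apply this practice is to split on blank lines but glue every table block onto the paragraph that introduces it. The helper names below (`is_table_block`, `split_keeping_tables`) are hypothetical, and the sketch assumes markdown-style tables whose rows start with `|` and whose explanation comes immediately before the table.

```python
from typing import List


def is_table_block(block: str) -> bool:
    """A block counts as a table if any of its lines starts with a pipe."""
    return any(line.lstrip().startswith("|") for line in block.splitlines())


def split_keeping_tables(text: str) -> List[str]:
    """Split on blank lines, but attach each table to the paragraph before it."""
    blocks = [b.strip() for b in text.split("\n\n") if b.strip()]
    chunks: List[str] = []
    for block in blocks:
        if is_table_block(block) and chunks:
            # Merge the table (and its caption line) into the preceding explanation
            chunks[-1] = chunks[-1] + "\n\n" + block
        else:
            chunks.append(block)
    return chunks


sample = """Table 1 shows the trade-off: recall improves as M increases.

Table 1: HNSW Performance Comparison
| M  | Recall@10 | Latency (ms) |
|----|-----------|--------------|
| 16 | 0.89      | 2.1          |
| 64 | 0.97      | 8.7          |

The next paragraph starts a new chunk."""

chunks = split_keeping_tables(sample)
print(len(chunks))  # 2: explanation + table, then the following paragraph
```

If your documents put the interpretation after the table instead, the same idea applies in the other direction: merge the table with the block that follows it.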

**Mathematical Formulas:**
```
The recall-latency trade-off can be modeled as:
Latency(M, ef) = α·M² + β·ef + γ

Where:
- M = number of connections per layer
- ef = size of dynamic candidate list
- α, β, γ = dataset-specific constants
```

**Best practice:** Chunk formula WITH its explanation and variable definitions
- ✅ Keep formula + explanation + interpretation together
- ❌ Don't separate formula from its meaning

**Code Snippets:**
```python
# Optimal HNSW configuration for our use case
index_params = {
    "M": 32,              # Balance recall and latency
    "ef_construction": 200,  # Higher quality index
    "ef_runtime": 100     # Fast search
}
```

**Best practice:** Chunk code WITH its context and rationale
- ✅ "For production deployment, we recommend M=32 and ef_construction=200 because..."
- ❌ Don't chunk code without explaining WHY these values

**3. Query-Specific Retrieval Patterns**

Different queries need different chunks:

| Query | Needs | Without Chunking | With Chunking |
|-------|-------|------------------|---------------|
| "What compression techniques?" | Methodology section | Entire paper (15K tokens) | Methodology (800 tokens) |
| "What were recall results?" | Results + Table 1 | Entire paper (15K tokens) | Results section (600 tokens) |
| "How does HNSW work?" | Background + Formula | Entire paper (15K tokens) | Background (500 tokens) |
| "What's the recommended config?" | Discussion + Code | Entire paper (15K tokens) | Discussion (400 tokens) |

**Impact:** 10-20x reduction in irrelevant context, leading to faster responses and better quality.

**4. Embedding Quality: Focused vs. Averaged**

**Without chunking:**
- Embedding represents "a paper about vector search, HNSW, compression, benchmarks, Redis..."
- Generic, averaged representation
- Matches weakly with specific queries

**With chunking:**
- Methodology chunk: "compression techniques, quantization, memory reduction, implementation details..."
- Results chunk: "recall metrics, latency measurements, performance comparisons, benchmark data..."
- Each embedding is focused and matches strongly with relevant queries

**💡 Key Insight:** Chunking isn't about fitting in context windows - it's about **data modeling for retrieval**. Just like you wouldn't store all customer data in one database row, you shouldn't embed all document content in one vector.
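
The "focused vs. averaged" effect can be reproduced with a toy model. The sketch below uses bag-of-words counts as a stand-in for a real embedding model (an assumption made so the example runs without any downloads); the section texts are invented keywords, not content from the paper.

```python
import math
from collections import Counter


def bow(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


methodology = "compression techniques quantization memory reduction"
results = "recall metrics latency measurements benchmark data"
background = "hnsw graph nearest neighbor search algorithms"
whole_paper = " ".join([methodology, results, background])

query = bow("compression techniques")
focused_score = cosine(query, bow(methodology))
averaged_score = cosine(query, bow(whole_paper))
print(f"focused chunk: {focused_score:.2f} | whole paper: {averaged_score:.2f}")
```

The whole-paper vector contains the same matching words, but they are diluted by everything else in the document, so the similarity score drops. Dense embeddings behave differently in detail, yet the dilution effect is the same in spirit.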

### Chunking Strategies: Engineering Trade-Offs

Once you've determined that your data needs chunking, the next question is: **How should you chunk it?**

There's no single "best" chunking strategy - the optimal approach depends on YOUR data characteristics and query patterns. Let's explore different strategies and their trade-offs.

**🔧 Using LangChain for Production-Ready Chunking**

In this section, we'll use **LangChain's text splitting utilities** for Strategies 2 and 3. LangChain provides battle-tested, production-ready implementations that handle edge cases and optimize for LLM consumption.

**Why LangChain?**
- **Industry-standard**: Used by thousands of production applications
- **Smart boundary detection**: Respects natural text boundaries (paragraphs, sentences, words)
- **Local embeddings**: Free semantic chunking with HuggingFace models (no API costs)
- **Well-tested**: Handles edge cases (empty chunks, unicode, special characters)

We'll use:
- `RecursiveCharacterTextSplitter` (Strategy 2): Smart fixed-size chunking with boundary awareness
- `SemanticChunker` + `HuggingFaceEmbeddings` (Strategy 3): Meaning-based chunking with local models

### Strategy 1: Document-Based Chunking (Structure-Aware)

**Concept:** Split documents based on their inherent structure (sections, paragraphs, headings, and as mentioned earlier, tables, code, and formulas).

**Best for:** Structured documents with clear logical divisions (research papers, technical docs, books, etc.).

In [18]:
# Strategy 1: Document-Based Chunking
# Split research paper by sections (using markdown headers)


def chunk_by_structure(text: str, separator: str = "\n## ") -> List[str]:
    """Split text by structural markers (e.g., markdown headers)."""

    # Split by headers
    sections = text.split(separator)

    # Clean and format chunks
    chunks = []
    for i, section in enumerate(sections):
        if section.strip():
            # Add header back (except for first chunk which is title)
            if i > 0:
                chunk = "## " + section
            else:
                chunk = section
            chunks.append(chunk.strip())

    return chunks


# Apply to research paper
structure_chunks = chunk_by_structure(research_paper)

print(
    f"""📊 Strategy 1: Document-Based (Structure-Aware) Chunking
{'=' * 80}
Original document: {paper_tokens:,} tokens
Number of chunks: {len(structure_chunks)}

Chunk breakdown:
"""
)

for i, chunk in enumerate(structure_chunks):
    chunk_tokens = count_tokens(chunk)
    # Show first 300 chars of each chunk
    preview = chunk[:300].replace("\n", " ")
    print(f"   Chunk {i+1}: {chunk_tokens:,} tokens - {preview}...\n")


📊 Strategy 1: Document-Based (Structure-Aware) Chunking
Original document: 1,035 tokens
Number of chunks: 7

Chunk breakdown:

   Chunk 1: 8 tokens - # Optimizing Vector Search Performance in Redis...

   Chunk 2: 108 tokens - ## Abstract This paper presents a comprehensive analysis of vector search optimization techniques in Redis, examining the trade-offs between search quality, latency, and memory usage. We evaluate multiple indexing strategies including HNSW and FLAT indexes across datasets ranging from 10K to 10M vec...

   Chunk 3: 98 tokens - ## 1. Introduction Vector databases have become essential infrastructure for modern AI applications, enabling semantic search, recommendation systems, and retrieval-augmented generation (RAG). Redis, traditionally known as an in-memory data structure store, has evolved to support high-performance ve...

   Chunk 4: 98 tokens - ## 2. Background and Related Work Previous work on vector search optimization has focused primarily on algorithm

**Strategy 1 Analysis:**

✅ **Advantages:**
- Respects document structure (sections stay together)
- Semantically coherent (each chunk is a complete section)
- Easy to implement for structured documents
- Preserves author's logical organization
- **Keeps tables, formulas, and code WITH their context** (e.g., the "## 3. Performance Analysis and Results" chunk includes Table 1 WITH its explanation, and its "### 3.2 Mathematical Model" subsection keeps the formula WITH its variable definitions)

⚠️ **Trade-offs:**
- Variable chunk sizes (some sections longer than others)
- Requires documents to have clear structure
- May create chunks that are still too large
- Doesn't work for unstructured text

🎯 **Best for:**
- Research papers with clear sections
- Technical documentation with headers
- Books with chapters/sections
- Any markdown/HTML content with structural markers

💡 **Key Insight:**
Notice how Chunk 5 ("## 3. Performance Analysis and Results") contains Table 1 along with its explanation and interpretation. This is the correct approach: the table is meaningless without context. Similarly, the mathematical formula in section 3.2 stays with its variable definitions and interpretation. This is why structure-aware chunking is usually a better fit than fixed-size chunking for technical documents.

### Strategy 2: Fixed-Size Chunking (Token-Based)

**Concept:** Split text into chunks of a predetermined size (e.g., 512 tokens) with overlap.

**Best for:** Unstructured text, quick prototyping, when you need consistent chunk sizes.

Trade-offs:
- Ignores document structure (may split mid-sentence or mid-paragraph or mid-table)
- Can break semantic coherence
- May split important information across chunks
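
For contrast, here is what a truly naive fixed-size chunker looks like: a sliding window with overlap and no awareness of sentence or paragraph boundaries. It is a minimal sketch that counts whitespace-separated words as a proxy for tokens (an assumption; a real pipeline would count tokens with the model's tokenizer), and the function name is hypothetical.

```python
from typing import List


def chunk_fixed_size(text: str, chunk_size: int = 512, overlap: int = 50) -> List[str]:
    """Naive fixed-size chunking over whitespace-separated words with overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


words = " ".join(f"w{i}" for i in range(10))
print(chunk_fixed_size(words, chunk_size=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

Each chunk repeats the last word of the previous one, which is the overlap at work. Nothing stops a window from ending mid-sentence or mid-table, which is exactly the weakness listed above.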

In [19]:
# Strategy 2: Fixed-Size Chunking (Using LangChain)
# Industry-standard approach with smart boundary detection

from langchain_text_splitters import RecursiveCharacterTextSplitter

# Create text splitter with smart boundary detection
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,  # Target chunk size in characters
    chunk_overlap=100,  # Overlap to preserve context
    length_function=len,
    separators=["\n\n", "\n", ". ", " ", ""],  # Try these in order
    is_separator_regex=False,
)

print("🔄 Running fixed-size chunking with LangChain...")
print("   Trying to split on: paragraphs → sentences → words → characters\n")

# Apply to research paper
fixed_chunks_docs = text_splitter.create_documents([research_paper])
fixed_chunks = [doc.page_content for doc in fixed_chunks_docs]

print(
    f"""📊 Strategy 2: Fixed-Size (LangChain) Chunking
{'=' * 80}
Original document: {paper_tokens:,} tokens
Target chunk size: 800 characters (~200 words)
Overlap: 100 characters
Number of chunks: {len(fixed_chunks)}

Chunk breakdown:
"""
)

for i, chunk in enumerate(fixed_chunks[:5]):  # Show first 5
    chunk_tokens = count_tokens(chunk)
    preview = chunk[:100].replace("\n", " ")
    print(f"   Chunk {i+1}: {chunk_tokens:,} tokens - {preview}...")

print(f"... ({max(len(fixed_chunks) - 5, 0)} more chunks)")

🔄 Running fixed-size chunking with LangChain...
   Trying to split on: paragraphs → sentences → words → characters

📊 Strategy 2: Fixed-Size (LangChain) Chunking
Original document: 1,035 tokens
Target chunk size: 800 characters (~200 words)
Overlap: 100 characters
Number of chunks: 8

Chunk breakdown:

   Chunk 1: 117 tokens - # Optimizing Vector Search Performance in Redis  ## Abstract This paper presents a comprehensive ana...
   Chunk 2: 98 tokens - ## 1. Introduction Vector databases have become essential infrastructure for modern AI applications,...
   Chunk 3: 134 tokens - [... 5,000 more words covering methodology, experiments, results, discussion ...]  ## 2. Background ...
   Chunk 4: 128 tokens - ## 3. Performance Analysis and Results  ### 3.1 HNSW Configuration Trade-offs  Table 1 shows the per...
   Chunk 5: 206 tokens - Table 1: HNSW Performance Comparison | M  | ef_construction | Recall@10 | Latency (ms) | Memory (GB)...
... (3 more chunks)


**Strategy 2 Analysis:**

✅ **Advantages:**
- **Respects natural boundaries**: Tries paragraphs → lines → sentences → words → characters
- Consistent chunk sizes (predictable token usage)
- Works on any text (structured or unstructured)
- Fast processing
- **Doesn't split mid-sentence** (unless absolutely necessary)

⚠️ **Trade-offs:**
- Ignores document structure (doesn't understand sections)
- Can break semantic coherence (may split related content)
- Overlap creates redundancy (increases storage/cost)
- May split important information across chunks

🎯 **Best for:**
- Unstructured text (no clear sections)
- Quick prototyping and baselines
- When consistent chunk sizes are required
- Simple documents where structure doesn't matter

💡 **How RecursiveCharacterTextSplitter Works:**

Unlike naive fixed-size splitting, this algorithm:

1. **Tries separators in order**: `["\n\n", "\n", ". ", " ", ""]`
2. **Splits on first successful separator** that keeps chunks under target size
3. **Falls back to next separator** if chunks are still too large
4. **Preserves natural boundaries** (paragraphs > sentences > words > characters)

**Example:**
- Target: 800 characters
- First try: Split on `\n\n` (paragraphs)
- If paragraph > 800 chars: Split on `\n` (lines)
- If line > 800 chars: Split on `. ` (sentences)
- And so on...

**Why this is better than naive splitting:**
- ✅ Respects natural text boundaries
- ✅ Doesn't split mid-sentence (unless necessary)
- ✅ Maintains readability
- ✅ Better for LLM comprehension
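
To make the fallback logic concrete, here is a simplified re-implementation of the idea. It is not LangChain's code: it has no overlap, it drops the separator at chunk boundaries, and the function name `recursive_split` is made up for this sketch.

```python
from typing import List


def recursive_split(text: str, chunk_size: int, separators: List[str]) -> List[str]:
    """Simplified recursive splitting: coarse separators first, finer ones as fallback."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: hard-cut by characters as a last resort
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    if sep not in text:
        return recursive_split(text, chunk_size, finer)

    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces into one chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too big: recurse with the finer separators
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


text = "Short intro.\n\nA much longer paragraph. It has two sentences.\n\nEnd."
result = recursive_split(text, chunk_size=30, separators=["\n\n", ". ", " "])
for chunk in result:
    print(repr(chunk))
```

The first and last paragraphs fit the budget and stay whole; only the oversized middle paragraph is split further, and it breaks at the sentence boundary rather than mid-word.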

### Strategy 3: Semantic Chunking (Meaning-Based)

**Concept:** Split text based on semantic similarity using embeddings - create new chunks when topic changes significantly.

**How it works:**
1. Split text into sentences or paragraphs
2. Generate embeddings for each segment
3. Calculate similarity between consecutive segments
4. Create chunk boundaries where similarity drops (topic shift detected)

**Best for:** Dense academic text, legal documents, narratives where semantic boundaries don't align with structure.
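
The four steps can be sketched without any model at all. In the example below, `embed` is a toy word-count vector standing in for a real embedding model, and the fixed `threshold=0.2` is an arbitrary value chosen for the demo; both are assumptions for illustration.

```python
import math
import re
from collections import Counter
from typing import List


def embed(sentence: str) -> Counter:
    """Toy embedding: word counts. A real pipeline would call an embedding model here."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_chunks(text: str, threshold: float = 0.2) -> List[str]:
    # 1. Split text into sentences
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    # 2. Embed each sentence
    vectors = [embed(s) for s in sentences]
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        # 3. Similarity between consecutive sentences
        similarity = cosine(vectors[i - 1], vectors[i])
        # 4. A drop below the threshold marks a topic shift
        if similarity < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks


text = (
    "Redis stores vectors in memory. Redis indexes vectors with HNSW. "
    "Enrollment opens in March. Enrollment closes in April."
)
demo_chunks = semantic_chunks(text)
for chunk in demo_chunks:
    print(chunk)
```

The two Redis sentences share vocabulary and stay together; the similarity collapses at the switch to enrollment dates, so a new chunk starts there.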

In [20]:
# Strategy 3: Semantic Chunking (Using LangChain)
# Industry-standard approach with local embeddings (no API costs!)

from langchain_experimental.text_splitter import SemanticChunker
from langchain_huggingface import HuggingFaceEmbeddings
import os

# Suppress tokenizer warnings
os.environ["TOKENIZERS_PARALLELISM"] = "false"

# Initialize local embeddings (no API costs!)
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": "cpu"},
    encode_kwargs={"normalize_embeddings": True},
)

# Create semantic chunker with percentile-based breakpoint detection
semantic_chunker = SemanticChunker(
    embeddings=embeddings,
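    breakpoint_threshold_type="percentile",  # Threshold is a percentile of sentence-to-sentence distances
    breakpoint_threshold_amount=25,  # Break where distance exceeds the 25th percentile (library default: 95)
    buffer_size=1,  # Embed each sentence together with 1 neighbor on each side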
)

print("🔄 Running semantic chunking with LangChain...")
print("   Using local embeddings (sentence-transformers/all-MiniLM-L6-v2)")
print("   Breakpoint detection: 25th percentile of similarity scores\n")

# Apply to research paper
semantic_chunks_docs = semantic_chunker.create_documents([research_paper])

# Extract text from Document objects
semantic_chunks = [doc.page_content for doc in semantic_chunks_docs]

print(
    f"""📊 Strategy 3: Semantic (LangChain) Chunking
{'=' * 80}
Original document: {paper_tokens:,} tokens
Breakpoint method: Percentile (25th percentile)
Number of chunks: {len(semantic_chunks)}

Chunk breakdown:
"""
)

for i, chunk in enumerate(semantic_chunks):
    chunk_tokens = count_tokens(chunk)
    preview = chunk[:200].replace("\n", " ")
    print(f"   Chunk {i+1}: {chunk_tokens:,} tokens - {preview}...\n")

21:56:40 sentence_transformers.SentenceTransformer INFO   Load pretrained SentenceTransformer: sentence-transformers/all-MiniLM-L6-v2


🔄 Running semantic chunking with LangChain...
   Using local embeddings (sentence-transformers/all-MiniLM-L6-v2)
   Breakpoint detection: 25th percentile of similarity scores

📊 Strategy 3: Semantic (LangChain) Chunking
Original document: 1,035 tokens
Breakpoint method: Percentile (25th percentile)
Number of chunks: 25

Chunk breakdown:

   Chunk 1: 70 tokens -  # Optimizing Vector Search Performance in Redis  ## Abstract This paper presents a comprehensive analysis of vector search optimization techniques in Redis, examining the trade-offs between search qu...

   Chunk 2: 26 tokens - Our results demonstrate that careful index configuration can improve search latency by up to 10x while maintaining 95%+ recall....

   Chunk 3: 22 tokens - We also introduce novel compression techniques that reduce memory usage by 75% with minimal impact on search quality....

   Chunk 4: 4 tokens - ## 1....

   Chunk 5: 60 tokens - Introduction Vector databases have become essential infrastructure

**Strategy 3 Analysis:**

✅ **Advantages:**
- **Detects actual topic changes** using semantic similarity (not just structural markers)
- Preserves semantic coherence (topics stay together even without headers)
- Better retrieval quality (chunks are topically focused)
- Adapts to content (works on unstructured text)
- Reduces context loss at boundaries (doesn't split mid-topic)
- **Free and local**: Uses sentence-transformers (no API costs)

⚠️ **Trade-offs:**
- Slower processing (must compute embeddings for each sentence)
- Variable chunk sizes (depends on topic boundaries)
- Higher computational cost (embedding computation + similarity calculations)
- Requires initial model download (~90MB for all-MiniLM-L6-v2)

🎯 **Best for:**
- Dense academic papers with complex topic transitions
- Legal documents where semantic sections don't have headers
- Narratives where topics don't align with structure
- Unstructured text (emails, transcripts, conversations)
- When retrieval quality is more important than processing speed

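💡 **How Percentile-Based Breakpoint Detection Works:**

Instead of using a fixed similarity threshold (e.g., 0.75), the percentile method:

1. **Computes all similarities** between consecutive sentences
2. **Calculates a percentile** of that distribution to use as the threshold
3. **Creates breakpoints** at the gaps that fall on the low-similarity side of the threshold

**Example (similarity view):**
- Similarities: [0.92, 0.88, 0.45, 0.91, 0.35, 0.89]
- 25th percentile: ≈0.56 (with linear interpolation)
- Breakpoints created at: positions 2 (0.45) and 4 (0.35), counting from 0, the only values below the threshold

**Note on LangChain's implementation:** `SemanticChunker` works with cosine *distances* (1 - similarity) and breaks wherever the distance exceeds the chosen percentile. With `breakpoint_threshold_amount=25`, roughly 75% of sentence gaps therefore become breakpoints, which is why the run above produced 25 very small chunks. Values closer to the library default of 95 yield fewer, larger chunks.

**Why this is better than fixed threshold:**
- ✅ Adapts to document's similarity distribution
- ✅ Works across different document types
- ✅ Less manual tuning than a hand-picked absolute threshold
- ✅ More robust to outliers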

**Alternative Breakpoint Methods:**
- `"gradient"`: Detects sudden drops in similarity (topic shifts)
- `"standard_deviation"`: Uses statistical deviation from mean
- `"interquartile"`: Uses IQR-based outlier detection

This is fundamentally different from structure-based chunking - it detects semantic boundaries regardless of headers or formatting.
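
The percentile arithmetic is easy to check by hand. The sketch below implements percentile with linear interpolation (the convention NumPy uses by default) in plain Python and applies it to a small list of similarity scores. Function names are hypothetical. Keep in mind that LangChain's `SemanticChunker` is understood to apply its percentile to distances rather than similarities, so treat this as the conceptual version, not a copy of the library's logic.

```python
from typing import List


def percentile(values: List[float], pct: float) -> float:
    """Percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100
    low = int(rank)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


def find_breakpoints(similarities: List[float], pct: float = 25) -> List[int]:
    """Indices (0-based) of gaps whose similarity falls below the pct-th percentile."""
    threshold = percentile(similarities, pct)
    return [i for i, s in enumerate(similarities) if s < threshold]


similarities = [0.92, 0.88, 0.45, 0.91, 0.35, 0.89]
print(round(percentile(similarities, 25), 4))  # 0.5575
print(find_breakpoints(similarities))          # [2, 4]
```

Raising `pct` moves the threshold up and creates more breakpoints in this similarity view; in a distance-based implementation the direction is reversed, so always check which quantity your library thresholds.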

### Strategy 4: Hierarchical Chunking (Multi-Level)

**Concept:** Create multiple levels of chunks - large chunks for overview, small chunks for details.

**Best for:** Very large documents where users need both high-level summaries and specific details.

In [21]:
# Strategy 4: Hierarchical Chunking


def chunk_hierarchically(text: str) -> Dict[str, List[str]]:
    """
    Create multiple levels of chunks.
    Level 1: Large sections (by ## headers)
    Level 2: Subsections (by paragraphs within sections)
    """

    # Level 1: Split by major sections
    level1_chunks = chunk_by_structure(text, separator="\n## ")

    # Level 2: Further split large sections into paragraphs
    level2_chunks = []
    for section in level1_chunks:
        # If section is large, split into paragraphs (fragments of 50 characters or fewer, e.g. bare headings, are dropped)
        if count_tokens(section) > 400:
            paragraphs = [
                p.strip() for p in section.split("\n\n") if p.strip() and len(p) > 50
            ]
            level2_chunks.extend(paragraphs)
        else:
            level2_chunks.append(section)

    return {
        "level1": level1_chunks,  # Large sections
        "level2": level2_chunks,  # Smaller subsections
    }


# Apply to research paper
hierarchical_chunks = chunk_hierarchically(research_paper)

print(
    f"""📊 Strategy 4: Hierarchical (Multi-Level) Chunking
{'=' * 80}
Original document: {paper_tokens:,} tokens

Level 1 (Sections): {len(hierarchical_chunks['level1'])} chunks
"""
)

for i, chunk in enumerate(hierarchical_chunks["level1"]):
    chunk_tokens = count_tokens(chunk)
    preview = chunk[:80].replace("\n", " ")
    print(f"   L1-{i+1}: {chunk_tokens:,} tokens - {preview}...")

print(
    f"""
Level 2 (Subsections): {len(hierarchical_chunks['level2'])} chunks
"""
)

for i, chunk in enumerate(hierarchical_chunks["level2"][:5]):  # Show first 5
    chunk_tokens = count_tokens(chunk)
    preview = chunk[:80].replace("\n", " ")
    print(f"   L2-{i+1}: {chunk_tokens:,} tokens - {preview}...")

print(f"... ({max(len(hierarchical_chunks['level2']) - 5, 0)} more L2 chunks)")

📊 Strategy 4: Hierarchical (Multi-Level) Chunking
Original document: 1,035 tokens

Level 1 (Sections): 7 chunks

   L1-1: 8 tokens - # Optimizing Vector Search Performance in Redis...
   L1-2: 108 tokens - ## Abstract This paper presents a comprehensive analysis of vector search optimi...
   L1-3: 98 tokens - ## 1. Introduction Vector databases have become essential infrastructure for mod...
   L1-4: 98 tokens - ## 2. Background and Related Work Previous work on vector search optimization ha...
   L1-5: 464 tokens - ## 3. Performance Analysis and Results  ### 3.1 HNSW Configuration Trade-offs  T...
   L1-6: 187 tokens - ## 4. Implementation Recommendations  Based on our findings, we recommend the fo...
   L1-7: 73 tokens - ## 5. Discussion and Conclusion Our findings demonstrate that vector search opti...

Level 2 (Subsections): 13 chunks

   L2-1: 8 tokens - # Optimizing Vector Search Performance in Redis...
   L2-2: 108 tokens - ## Abstract This paper presents a comprehensive anal

**Strategy 4 Analysis:**

✅ **Advantages:**
- Supports both overview and detailed queries
- Flexible retrieval (can search at different levels)
- Preserves document hierarchy
- Better for complex documents

⚠️ **Trade-offs:**
- More complex to implement and maintain
- Requires more storage (multiple levels)
- Need strategy to choose which level to search
- Higher indexing cost

🎯 **Best for:**
- Very large documents (textbooks, manuals)
- When users need both summaries and details
- Technical documentation with nested structure
- Legal contracts with sections and subsections

💡 **Retrieval Strategy:**
- Start with Level 1 for overview
- If user needs more detail, retrieve Level 2 chunks
- Can combine: "Show section summary + relevant details"
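
The "combine" option is often implemented as small-to-big retrieval: search the small Level 2 chunks for precision, then return the parent Level 1 section for context. The sketch below is one possible shape for that idea. All function names are hypothetical, and `keyword_score` is a toy word-overlap score standing in for vector search.

```python
from typing import Dict, List, Tuple


def build_hierarchy(sections: List[str]) -> Tuple[List[str], List[int]]:
    """Return Level 2 paragraphs plus, for each one, the index of its Level 1 section."""
    children, parent_of = [], []
    for section_idx, section in enumerate(sections):
        for paragraph in section.split("\n\n"):
            if paragraph.strip():
                children.append(paragraph.strip())
                parent_of.append(section_idx)
    return children, parent_of


def keyword_score(query: str, text: str) -> int:
    """Toy relevance: number of shared lowercase words (stand-in for vector search)."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(query: str, sections: List[str]) -> Dict[str, str]:
    children, parent_of = build_hierarchy(sections)
    if not children:
        return {}
    # Search the small chunks for precision...
    best = max(range(len(children)), key=lambda i: keyword_score(query, children[i]))
    # ...then hand back the parent section for surrounding context
    return {"detail": children[best], "section": sections[parent_of[best]]}


sections = [
    "## Background\n\nHNSW builds a layered graph.\n\nFLAT scans every vector.",
    "## Results\n\nRecall improves with larger M.\n\nLatency grows with larger M.",
]
hit = retrieve("how does latency change", sections)
print(hit["detail"])                    # Latency grows with larger M.
print(hit["section"].splitlines()[0])   # ## Results
```

Storing the parent index next to each Level 2 chunk is the key design choice: it costs one extra field per chunk and lets you decide at query time how much surrounding context to send to the LLM.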

### Comparing Chunking Strategies: Decision Framework

Now let's compare all four strategies side-by-side:

In [22]:
print(
    f"""
{'=' * 80}
CHUNKING STRATEGY COMPARISON
{'=' * 80}

Document: Research Paper ({paper_tokens:,} tokens)

Strategy              | Chunks | Avg Size | Complexity | Best For
--------------------- | ------ | -------- | ---------- | --------
Document-Based        | {len(structure_chunks):>6} | {sum(count_tokens(c) for c in structure_chunks) // len(structure_chunks):>8} | Low        | Structured docs
Fixed-Size            | {len(fixed_chunks):>6} | {sum(count_tokens(c) for c in fixed_chunks) // len(fixed_chunks):>8} | Low        | Unstructured text
Semantic              | {len(semantic_chunks):>6} | {sum(count_tokens(c) for c in semantic_chunks) // len(semantic_chunks):>8} | High       | Dense academic text
Hierarchical (L1)     | {len(hierarchical_chunks['level1']):>6} | {sum(count_tokens(c) for c in hierarchical_chunks['level1']) // len(hierarchical_chunks['level1']):>8} | Medium     | Large complex docs
Hierarchical (L2)     | {len(hierarchical_chunks['level2']):>6} | {sum(count_tokens(c) for c in hierarchical_chunks['level2']) // len(hierarchical_chunks['level2']):>8} | Medium     | Large complex docs

{'=' * 80}
"""
)


CHUNKING STRATEGY COMPARISON

Document: Research Paper (1,035 tokens)

Strategy              | Chunks | Avg Size | Complexity | Best For
--------------------- | ------ | -------- | ---------- | --------
Document-Based        |      7 |      148 | Low        | Structured docs
Fixed-Size            |      8 |      140 | Low        | Unstructured text
Semantic              |     25 |       41 | High       | Dense academic text
Hierarchical (L1)     |      7 |      148 | Medium     | Large complex docs
Hierarchical (L2)     |     13 |       76 | Medium     | Large complex docs




### YOUR Chunking Decision Framework

Here's how to choose the right chunking strategy for YOUR domain:

**Step 1: Analyze YOUR Data Characteristics**

Ask these questions about your documents:

1. **Structure:** Do documents have clear structural markers (headers, sections)?
   - ✅ Yes → Consider Document-Based or Hierarchical
   - ❌ No → Consider Fixed-Size or Semantic

2. **Length:** How long are documents?
   - < 500 tokens → Don't chunk!
   - 500-2000 tokens → Document-Based (if structured) or Fixed-Size
   - 2000-10000 tokens → Semantic or Hierarchical
   - > 10000 tokens → Hierarchical

3. **Homogeneity:** Are all documents similar in structure?
   - ✅ Yes → Use single strategy
   - ❌ No → Consider Adaptive (different strategies for different doc types)

4. **Topic Density:** How many topics per document?
   - Single topic → Don't chunk or use large chunks
   - Multiple related topics → Document-Based
   - Many distinct topics → Semantic or Fixed-Size
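
As a rough sketch, the structure and length questions above can be folded into a first-pass helper. The function name `suggest_chunking`, the handling of boundary values (2,000 and 10,000 are treated as inclusive upper bounds), and the choice to break the "Semantic or Hierarchical" tie by structure are assumptions for illustration. Homogeneity and topic density still need a human look at the data.

```python
def suggest_chunking(token_count: int, has_structure: bool) -> str:
    """First-pass chunking suggestion from document length and structure only."""
    if token_count < 500:
        return "none"  # small, complete documents: don't chunk
    if token_count <= 2_000:
        return "document-based" if has_structure else "fixed-size"
    if token_count <= 10_000:
        # Assumed tie-break: structured docs go hierarchical, others semantic.
        return "hierarchical" if has_structure else "semantic"
    return "hierarchical"


# The 1,035-token structured research paper used in this notebook:
paper_suggestion = suggest_chunking(1_035, has_structure=True)
```

Treat the result as a starting hypothesis to test against real queries, not as a final answer.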

**Step 2: Analyze YOUR Query Patterns**

1. **Query Specificity:**
   - Specific ("What is HNSW?") → Smaller chunks (Fixed-Size, Semantic)
   - Overview ("Summarize the paper") → Larger chunks (Document-Based, Hierarchical L1)
   - Mixed → Hierarchical

2. **Query Scope:**
   - Single-section queries → Document-Based
   - Cross-section queries → Semantic or Fixed-Size

**Step 3: Analyze YOUR Constraints**

1. **Token Budget:** How many tokens can you afford per query?
   - Tight budget → Smaller chunks, fewer retrieved
   - Generous budget → Larger chunks or Hierarchical

2. **Latency Requirements:**
   - Real-time → Fixed-Size (fast, simple)
   - Batch processing → Semantic (slower but better quality)

3. **Quality Requirements:**
   - Highest quality → Semantic or Hierarchical
   - Good enough → Document-Based or Fixed-Size

**Example Decisions:**

| Domain | Data Characteristics | Decision | Why |
|--------|---------------------|----------|-----|
| **Research Papers** | 5-10K tokens, clear sections, dense topics | Document-Based | Sections are natural semantic units |
| **Customer Support** | 100-500 tokens, unstructured | Don't chunk! | Already optimal size |
| **Legal Contracts** | 10-50K tokens, nested structure | Hierarchical | Need both overview and clause-level detail |
| **Product Docs** | 1-5K tokens, mixed structure | Fixed-Size (512 tokens, 50 overlap) | Simple, works for varied content |
| **Medical Records** | 1-3K tokens, chronological | Semantic | Topic changes (visits, conditions) don't align with structure |
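
To make the "Fixed-Size (512 tokens, 50 overlap)" row concrete, the sketch below estimates how many chunks a document produces under that setting. The helper name `fixed_size_chunk_count` is hypothetical, and it assumes a plain sliding window with a stride of `chunk_size - overlap` tokens. Splitters that respect sentence or paragraph boundaries will produce slightly different counts.

```python
import math


def fixed_size_chunk_count(total_tokens: int, chunk_size: int = 512, overlap: int = 50) -> int:
    """Chunks produced by a sliding window of chunk_size tokens with the given overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if total_tokens <= 0:
        return 0
    if total_tokens <= chunk_size:
        return 1
    stride = chunk_size - overlap  # each new chunk advances by this many tokens
    return math.ceil((total_tokens - overlap) / stride)


# A 5,000-token product doc with the table's settings (stride = 462 tokens):
chunks_for_5k = fixed_size_chunk_count(5_000)
```

For 5,000 tokens this gives 11 chunks, which feeds directly into storage and embedding cost estimates.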

---

## Part 5: Building Production-Ready Context Pipelines

Now that you understand data transformation and chunking, let's discuss how to build production-ready pipelines.

### Three Pipeline Architectures

There are three main approaches to context preparation in production:

### Architecture 1: Request-Time Processing

**Concept:** Transform data on-the-fly when a query arrives.

```
User Query → Retrieve Raw Data → Transform → Chunk (if needed) → Embed → Search → Return Context
```

**Pros:**
- ✅ Always up-to-date (no stale data)
- ✅ No pre-processing required
- ✅ Simple to implement

**Cons:**
- ❌ Higher latency (processing happens during request)
- ❌ Repeated work (same transformations for every query)
- ❌ Not suitable for large datasets

**Best for:**
- Small datasets (< 1,000 documents)
- Frequently changing data
- Simple transformations

### Architecture 2: Batch Processing

**Concept:** Pre-process all data in batches (nightly, weekly) and store results.

```
[Scheduled Job]
Raw Data → Extract → Clean → Transform → Chunk → Embed → Store in Vector DB

[Query Time]
User Query → Search Vector DB → Return Pre-Processed Context
```

**Pros:**
- ✅ Fast query time (all processing done ahead)
- ✅ Efficient (process once, use many times)
- ✅ Can use expensive transformations (LLM-based chunking, semantic analysis)

**Cons:**
- ❌ Data can be stale (until next batch run)
- ❌ Requires scheduling infrastructure
- ❌ Higher storage costs (store processed data)

**Best for:**
- Large datasets (> 10,000 documents)
- Infrequently changing data (daily/weekly updates)
- Complex transformations (semantic chunking, LLM summaries)

### Architecture 3: Event-Driven Processing

**Concept:** Process data as it changes (real-time updates).

```
Data Change Event → Trigger Pipeline → Extract → Clean → Transform → Chunk → Embed → Update Vector DB

[Query Time]
User Query → Search Vector DB → Return Context
```

**Pros:**
- ✅ Always up-to-date (real-time)
- ✅ Fast query time (pre-processed)
- ✅ Efficient (only process changed data)

**Cons:**
- ❌ Complex infrastructure (event streams, queues)
- ❌ Requires change detection
- ❌ Higher operational complexity

**Best for:**
- Real-time data (news, social media, live updates)
- Large datasets that change frequently
- When both freshness and speed are critical
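
The change-detection requirement is the part of event-driven processing that is easiest to get wrong, so here is a minimal sketch of one common approach: hash the raw content and reprocess only when the hash changes. The `EventDrivenIndexer` class is hypothetical, and plain dictionaries stand in for the event stream and the vector DB. A production version would receive events from a queue and write embeddings to a real index.

```python
import hashlib
from typing import Callable


class EventDrivenIndexer:
    """Reprocess a document only when its raw content actually changed."""

    def __init__(self, process: Callable[[str], str]):
        self._process = process              # the clean/transform/chunk/embed step
        self._hashes: dict[str, str] = {}    # doc_id -> content hash of last processed version
        self.store: dict[str, str] = {}      # doc_id -> processed context (stand-in for a vector DB)

    def on_change(self, doc_id: str, raw_text: str) -> bool:
        """Handle one change event; return True if the document was reprocessed."""
        digest = hashlib.sha256(raw_text.encode("utf-8")).hexdigest()
        if self._hashes.get(doc_id) == digest:
            return False  # duplicate event, skip the expensive pipeline
        self.store[doc_id] = self._process(raw_text)
        self._hashes[doc_id] = digest
        return True


# Toy processing step: collapse runs of whitespace.
indexer = EventDrivenIndexer(lambda text: " ".join(text.split()))
first = indexer.on_change("CS004", "Database   Systems")
repeat = indexer.on_change("CS004", "Database   Systems")
edited = indexer.on_change("CS004", "Database Systems II")
```

Hashing keeps the pipeline idempotent: replayed or duplicated events cost one hash computation instead of a full re-embedding.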

### Choosing YOUR Pipeline Architecture

Use this decision tree:

**Question 1: How often does your data change?**
- Real-time (seconds/minutes) → Event-Driven
- Frequently (hourly/daily) → Batch or Event-Driven
- Infrequently (weekly/monthly) → Batch
- Rarely (manual updates) → Request-Time or Batch

**Question 2: How large is your dataset?**
- Small (< 1,000 docs) → Request-Time
- Medium (1,000-100,000 docs) → Batch
- Large (> 100,000 docs) → Batch or Event-Driven

**Question 3: What are your latency requirements?**
- Real-time (< 100ms) → Batch or Event-Driven (pre-processed)
- Interactive (< 1s) → Any approach
- Batch queries → Request-Time acceptable

**Question 4: How complex are your transformations?**
- Simple (cleaning, formatting) → Any approach
- Moderate (chunking, basic NLP) → Batch or Event-Driven
- Complex (LLM-based, semantic analysis) → Batch (pre-compute)
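
Because the four questions can point to different architectures, one way to combine them is a simple tally: each answer votes for the architectures it allows, and the totals show which option fits most constraints. The sketch below is an assumption-laden illustration. The category labels, the `tally_architectures` helper, and equal weighting of all four questions are choices made for the example, and "Request-Time acceptable" is read as "any approach".

```python
from collections import Counter

ALL = {"request-time", "batch", "event-driven"}

# Candidate architectures per answer, transcribed from the four questions.
CHANGE = {
    "real-time": {"event-driven"},
    "frequent": {"batch", "event-driven"},
    "infrequent": {"batch"},
    "rare": {"request-time", "batch"},
}
LATENCY = {"real-time": {"batch", "event-driven"}, "interactive": ALL, "batch": ALL}
TRANSFORMS = {"simple": ALL, "moderate": {"batch", "event-driven"}, "complex": {"batch"}}


def size_candidates(n_docs: int) -> set[str]:
    """Map dataset size to candidate architectures (Question 2)."""
    if n_docs < 1_000:
        return {"request-time"}
    if n_docs <= 100_000:
        return {"batch"}
    return {"batch", "event-driven"}


def tally_architectures(change: str, n_docs: int, latency: str, transforms: str) -> Counter:
    """Count how many of the four questions each architecture satisfies."""
    votes = Counter()
    for candidates in (CHANGE[change], size_candidates(n_docs), LATENCY[latency], TRANSFORMS[transforms]):
        votes.update(candidates)
    return votes


# Weekly updates, a few hundred documents, interactive latency, moderate transformations:
votes = tally_architectures("infrequent", 300, "interactive", "moderate")
best = max(votes, key=votes.get)
```

A tally only surfaces the trade-off. If one constraint is a hard requirement (for example, sub-100ms latency), filter on it first and tally the rest.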

**Example Decision:**

For Redis University:
- ✅ **Data changes:** Weekly (new courses added)
- ✅ **Dataset size:** 100-500 courses (small)
- ✅ **Latency:** Interactive (< 1s acceptable)
- ✅ **Transformations:** Moderate (structured views + embeddings)
- ✅ **Decision:** **Batch Processing** (weekly job to rebuild catalog + embeddings). Dataset size alone would allow Request-Time, but the weekly update cadence and the embedding step favor pre-computing.

### Example: Batch Processing Pipeline for Redis University

In [23]:
# Example: Batch Processing Pipeline
# This would run as a scheduled job (e.g., weekly)


async def batch_process_courses():
    """
    Batch processing pipeline for Redis University courses.
    Runs weekly to update catalog and embeddings.
    """

    print("=" * 80)
    print("BATCH PROCESSING PIPELINE - Redis University Courses")
    print("=" * 80)

    # Step 1: Extract
    print("\n[Step 1/5] Extracting course data...")
    all_courses = await course_manager.get_all_courses()
    print(f"   ✅ Extracted {len(all_courses)} courses")

    # Show sample raw data
    if all_courses:
        sample = all_courses[0]
        print("\n   📄 Sample raw course:")
        print(f"      {sample.course_code}: {sample.title}")
        print(f"      Department: {sample.department}, Credits: {sample.credits}, Level: {sample.difficulty_level.value}")

    # Step 2: Clean
    print("\n[Step 2/5] Cleaning data...")
    # Keep only courses whose code starts with an in-scope prefix; in a real
    # pipeline this is also where you would drop test courses and validate fields.
    valid_prefixes = ("RU", "CS", "MATH")
    cleaned_courses = [
        c for c in all_courses if c.course_code.startswith(valid_prefixes)
    ]
    print(
        f"   ✅ Cleaned: {len(cleaned_courses)} courses (removed {len(all_courses) - len(cleaned_courses)} test courses)"
    )

    # Show what was filtered out
    removed_courses = [c for c in all_courses if not c.course_code.startswith(valid_prefixes)]
    if removed_courses:
        print("\n   📄 Example removed course:")
        print(f"      🗑️  {removed_courses[0].course_code}: {removed_courses[0].title} (filtered out)")

    # Step 3: Transform
    print("\n[Step 3/5] Transforming to LLM-friendly format...")
    transformed_courses = [transform_course_to_text(c) for c in cleaned_courses]
    total_tokens = sum(count_tokens(t) for t in transformed_courses)
    print(
        f"   ✅ Transformed: {len(transformed_courses)} courses ({total_tokens:,} total tokens)"
    )

    # Show before/after transformation
    if cleaned_courses and transformed_courses:
        print("\n   📄 Transformation example:")
        print(f"      Before: {cleaned_courses[0].course_code} (Course object)")
        print("      After (LLM-friendly text):")
        preview = transformed_courses[0].replace('\n', '\n      ')
        print(f"      {preview[:250]}...")

    # Step 4: Create Structured Views
    print("\n[Step 4/5] Creating structured catalog view...")
    catalog_view = await create_catalog_view()
    catalog_tokens = count_tokens(catalog_view)
    redis_client.set("course_catalog_view", catalog_view)
    redis_client.set("course_catalog_view:updated", "2024-09-30")
    print(f"   ✅ Created catalog view ({catalog_tokens:,} tokens)")
    print("   ✅ Cached in Redis")

    # Show catalog structure
    print("\n   📄 Catalog view structure:")
    catalog_preview = catalog_view[:300].replace('\n', '\n      ')
    print(f"      {catalog_preview}...")

    # Step 5: Store (in production, would also create embeddings and store in vector DB)
    print("\n[Step 5/5] Storing processed data...")
    for course, text in zip(cleaned_courses, transformed_courses):
        key = f"course:processed:{course.course_code}"
        redis_client.set(key, text)
    print(f"   ✅ Stored {len(cleaned_courses)} processed courses in Redis")

    # Show storage example
    if cleaned_courses:
        print("\n   📄 Storage example:")
        print(f"      Key: course:processed:{cleaned_courses[0].course_code}")
        print(f"      Value: {transformed_courses[0][:100]}...")

    print("\n" + "=" * 80)
    print("BATCH PROCESSING COMPLETE")
    print("=" * 80)
    print(
        f"""
Summary:
- Courses processed: {len(cleaned_courses)}
- Total tokens: {total_tokens:,}
- Catalog view tokens: {catalog_tokens:,}
- Storage: Redis
- Next run: 2024-10-07 (weekly)
"""
    )


# Run the batch pipeline
await batch_process_courses()

BATCH PROCESSING PIPELINE - Redis University Courses

[Step 1/5] Extracting course data...


21:56:41 httpx INFO   HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


   ✅ Extracted 50 courses

   📄 Sample raw course:
      CS004: Database Systems
      Department: Computer Science, Credits: 3, Level: intermediate

[Step 2/5] Cleaning data...
   ✅ Cleaned: 20 courses (removed 30 test courses)

   📄 Example removed course:
      🗑️  BUS033: Marketing Strategy (filtered out)

[Step 3/5] Transforming to LLM-friendly format...
   ✅ Transformed: 20 courses (1,079 total tokens)

   📄 Transformation example:
      Before: CS004 (Course object)
      After (LLM-friendly text):
      CS004: Database Systems
      Department: Computer Science
      Credits: 3
      Level: intermediate
      Format: online
      Instructor: Nicholas Nelson
      Description: Design and implementation of database systems. SQL, normalization, transac...

[Step 4/5] Creating structured catalog view...
   ✅ Created catalog view (585 tokens)
   ✅ Cached in Redis

   📄 Catalog view structure:
      # Redis University Course Catalog
      
      ## Business

---

## Summary and Key Takeaways

### What You Learned

**1. Context is Data - and Data Requires Engineering**
- Context isn't just "data you feed to an LLM"
- It requires systematic transformation: Extract → Clean → Transform → Optimize → Store
- Engineering discipline: requirements analysis, design decisions, quality metrics, testing

**2. The Data Engineering Pipeline**
- Extract: Get raw data from sources
- Clean: Remove noise, fix inconsistencies
- Transform: Structure for LLM consumption
- Optimize: Reduce tokens, improve clarity
- Store: Choose storage strategy (RAG, Views, Hybrid)

**3. Three Engineering Approaches**
- **RAG:** Semantic search for relevant data (good for specific queries)
- **Structured Views:** Pre-computed summaries (excellent for overviews)
- **Hybrid:** Combine both (best for production)

**4. Chunking is an Engineering Decision**
- **Don't chunk** if data is already small and complete (< 500 tokens)
- **Do chunk** if documents are long (> 1000 tokens) or multi-topic
- Four strategies: Document-Based, Fixed-Size, Semantic, Hierarchical
- Choose based on YOUR data characteristics, query patterns, and constraints

**5. Production Pipeline Architectures**
- **Request-Time:** Process on-the-fly (simple, always fresh, higher latency)
- **Batch:** Pre-process in batches (fast queries, can be stale)
- **Event-Driven:** Process on changes (real-time, complex infrastructure)

### The Engineering Mindset

Every decision should be based on **YOUR specific requirements:**

1. **Analyze YOUR data:** Size, structure, update frequency, topic density
2. **Analyze YOUR queries:** Specific vs. overview, single vs. cross-section
3. **Analyze YOUR constraints:** Token budget, latency, quality requirements
4. **Make informed decisions:** Choose approaches that match YOUR needs
5. **Measure and iterate:** Test with real queries, measure quality, optimize

**Remember:** There is no "best practice" that works for everyone. Context engineering is about making deliberate, informed choices based on YOUR domain, application, and constraints.

---

## Part 6: Quality Optimization - Measuring and Improving Context

### The Systematic Optimization Process

Now that you understand data engineering and production pipelines, let's learn how to systematically optimize context quality.

**The Process:**
```
1. Define Quality Metrics (domain-specific)
   ↓
2. Establish Baseline (measure current performance)
   ↓
3. Experiment (try different approaches)
   ↓
4. Measure (compare against metrics)
   ↓
5. Iterate (refine based on results)
```

---

### Step 1: Define Quality Metrics for YOUR Domain

**The Problem with Generic Metrics:**

Don't aim for "95% accuracy on benchmark X" - that benchmark wasn't designed for YOUR domain.

**DO this instead:** Define what "quality" means for YOUR domain, then measure it.

### The Four Quality Dimensions

Every context engineering solution should be evaluated across four dimensions:

1. **Relevance** - Does context include information needed to answer the query?
2. **Completeness** - Does context include ALL necessary information?
3. **Efficiency** - Is context optimized for token usage?
4. **Accuracy** - Is context factually correct and up-to-date?

Different domains prioritize these differently.

---

### Example: Quality Metrics for Redis University Course Advisor

Let's define specific, measurable quality metrics for our course advisor domain.

In [24]:
# Define domain-specific quality metrics

quality_metrics = {
    "Relevance": {
        "definition": "Does context include courses relevant to the user's query?",
        "metric": "% of queries where retrieved courses match query intent",
        "measurement": "Manual review of 50 sample queries",
        "target": ">90%",
        "why_important": "Irrelevant courses waste tokens and confuse users",
    },
    "Completeness": {
        "definition": "Does context include all information needed to answer?",
        "metric": "% of responses that mention all prerequisites when asked",
        "measurement": "Automated check: parse response for prerequisite mentions",
        "target": "100%",
        "why_important": "Missing prerequisites leads to hallucinations",
    },
    "Efficiency": {
        "definition": "Is context optimized for token usage?",
        "metric": "Average tokens per query",
        "measurement": "Token counter on all context strings",
        "target": "<5,000 tokens",
        "why_important": "Exceeding budget increases cost and latency",
    },
    "Accuracy": {
        "definition": "Is context factually correct and up-to-date?",
        "metric": "% of responses with correct course information",
        "measurement": "Manual review against course database",
        "target": ">95%",
        "why_important": "Incorrect information damages trust",
    },
}

print("QUALITY METRICS FOR REDIS UNIVERSITY COURSE ADVISOR")
print("━" * 80)
for dimension, details in quality_metrics.items():
    print(
        f"""
{dimension}:
  Definition: {details['definition']}
  Metric: {details['metric']}
  How to measure: {details['measurement']}
  Target: {details['target']}
  Why important: {details['why_important']}"""
    )
print("\n" + "━" * 80)

QUALITY METRICS FOR REDIS UNIVERSITY COURSE ADVISOR
‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ

Relevance:
  Definition: Does context include courses relevant to the user's query?
  Metric: % of queries where retrieved courses match query intent
  How to measure: Manual review of 50 sample queries
  Target: >90%
  Why important: Irrelevant courses waste tokens and confuse users

Completeness:
  Definition: Does context include all information needed to answer?
  Metric: % of responses that mention all prerequisites when asked
  How to measure: Automated check: parse response for prerequisite mentions
  Target: 100%
  Why important: Missing prerequisites leads to hallucinations

Efficiency:
  Definition: Is context optimized for token usage?
  Metric: Average tokens per query
  How to measure: Token coun

### Key Insight: Metrics Must Be Domain-Specific

Notice how these metrics are specific to the course advisor domain:

**Relevance metric:**
- ❌ Generic: "Cosine similarity > 0.8"
- ✅ Domain-specific: "Retrieved courses match query intent"

**Completeness metric:**
- ❌ Generic: "Context includes top-5 search results"
- ✅ Domain-specific: "All prerequisites mentioned when asked"

**Efficiency metric:**
- ❌ Generic: "Minimize tokens"
- ✅ Domain-specific: "<5,000 tokens (fits our budget)"

**Accuracy metric:**
- ❌ Generic: "95% on MMLU benchmark"
- ✅ Domain-specific: "Correct course information vs. database"

**Your metrics should reflect YOUR domain's requirements, not generic benchmarks.**
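
As an illustration of how a domain-specific metric becomes an automated check, the sketch below scores the Completeness metric ("all prerequisites mentioned when asked"). The helper names, the course-code pattern (two to four capital letters followed by three digits, matching codes such as `CS004` and `MATH026`), and the sample responses are assumptions for the example rather than part of the course advisor's codebase.

```python
import re

# Assumed course-code format: 2-4 capital letters followed by 3 digits.
COURSE_CODE = re.compile(r"\b[A-Z]{2,4}\d{3}\b")


def prerequisite_coverage(response: str, required: list[str]) -> float:
    """Fraction of required prerequisite codes that the response mentions."""
    if not required:
        return 1.0  # nothing to mention, trivially complete
    mentioned = set(COURSE_CODE.findall(response))
    return sum(code in mentioned for code in required) / len(required)


def completeness_rate(cases: list[tuple[str, list[str]]]) -> float:
    """Share of (response, required_prerequisites) cases that mention every prerequisite."""
    complete = sum(prerequisite_coverage(resp, req) == 1.0 for resp, req in cases)
    return complete / len(cases)


# Hypothetical evaluation cases for illustration only.
cases = [
    ("Take CS004 after completing CS001 and MATH026.", ["CS001", "MATH026"]),
    ("Take CS004 after completing CS001.", ["CS001", "MATH026"]),
]
rate = completeness_rate(cases)
```

A string match like this only confirms that a code appears in the response. Whether it is described correctly as a prerequisite still falls under the Accuracy metric.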

---

### Steps 2-5: Baseline → Experiment → Measure → Iterate

Let's demonstrate the optimization process with a concrete example.

**Scenario:** We want to optimize our hybrid approach (catalog overview + RAG) to meet all quality targets.

In [25]:
# Step 2: Establish Baseline (Hybrid Approach from Part 3)

# Sample query
test_query = "What machine learning courses are available for beginners?"

# Hybrid approach: Catalog overview + RAG
catalog_overview = """Redis University Course Catalog Overview:

Computer Science Department:
- RU101: Introduction to Redis Data Structures (Beginner, 4-6 hours)
- RU201: Redis for Python Developers (Intermediate, 6-8 hours)
- RU301: Vector Similarity Search with Redis (Advanced, 8-10 hours)

Data Science Department:
- RU401: Machine Learning with Redis (Intermediate, 10-12 hours)
- RU402: Real-Time Analytics with Redis (Advanced, 8-10 hours)
"""

# RAG: Get specific courses
rag_results = await course_manager.search_courses(test_query, limit=2)
rag_context = "\n\n".join(
    [
        f"""{course.course_code}: {course.title} ({course.difficulty_level.value})
Description: {course.description}
Prerequisites: {', '.join([p.course_code for p in course.prerequisites]) if course.prerequisites else 'None'}"""
        for course in rag_results
    ]
)

# Combined context
baseline_context = f"""{catalog_overview}

Detailed Course Information:
{rag_context}"""

baseline_tokens = count_tokens(baseline_context)

print(
    f"""BASELINE (Hybrid Approach):
{'━' * 80}
Tokens: {baseline_tokens:,}

Context:
{baseline_context}
"""
)

21:56:41 httpx INFO   HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


BASELINE (Hybrid Approach):
‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ
Tokens: 177

Context:
Redis University Course Catalog Overview:

Computer Science Department:
- RU101: Introduction to Redis Data Structures (Beginner, 4-6 hours)
- RU201: Redis for Python Developers (Intermediate, 6-8 hours)
- RU301: Vector Similarity Search with Redis (Advanced, 8-10 hours)

Data Science Department:
- RU401: Machine Learning with Redis (Intermediate, 10-12 hours)
- RU402: Real-Time Analytics with Redis (Advanced, 8-10 hours)


Detailed Course Information:
CS007: Machine Learning (advanced)
Description: Introduction to machine learning algorithms and applications. Supervised and unsupervised learning, neural networks.
Prerequisites: None

MATH026: Linear Algebra (intermediate)
Description: Vector spaces, matrices, e

In [26]:
# Step 3: Experiment - Try optimized version

# Optimization: Reduce catalog overview to just relevant departments
optimized_catalog = """Redis University - Relevant Departments:

Data Science:
- RU401: Machine Learning with Redis (Intermediate)
- RU402: Real-Time Analytics (Advanced)

Computer Science:
- RU301: Vector Search (Advanced)
"""

optimized_context = f"""{optimized_catalog}

{rag_context}"""

optimized_tokens = count_tokens(optimized_context)

print(
    f"""EXPERIMENT (Optimized Hybrid):
{'━' * 80}
Tokens: {optimized_tokens:,}

Context:
{optimized_context}

{'━' * 80}
Token Reduction: {baseline_tokens - optimized_tokens:,} tokens ({((baseline_tokens - optimized_tokens) / baseline_tokens * 100):.1f}% reduction)
"""
)

EXPERIMENT (Optimized Hybrid):
‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ
Tokens: 111

Context:
Redis University - Relevant Departments:

Data Science:
- RU401: Machine Learning with Redis (Intermediate)
- RU402: Real-Time Analytics (Advanced)

Computer Science:
- RU301: Vector Search (Advanced)


CS007: Machine Learning (advanced)
Description: Introduction to machine learning algorithms and applications. Supervised and unsupervised learning, neural networks.
Prerequisites: None

MATH026: Linear Algebra (intermediate)
Description: Vector spaces, matrices, eigenvalues, and linear transformations. Essential for data science and engineering.
Prerequisites: None

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

In [27]:
# Step 4: Measure - Compare responses

# Baseline response
messages_baseline = [
    SystemMessage(content=f"You are a Redis University course advisor.\n\n{baseline_context}"),
    HumanMessage(content=test_query),
]
response_baseline = llm.invoke(messages_baseline)

# Optimized response
messages_optimized = [
    SystemMessage(
        content=f"You are a Redis University course advisor.\n\n{optimized_context}"
    ),
    HumanMessage(content=test_query),
]
response_optimized = llm.invoke(messages_optimized)

print(
    f"""BASELINE RESPONSE:
{'━' * 80}
{response_baseline.content}

OPTIMIZED RESPONSE:
{'━' * 80}
{response_optimized.content}
"""
)

21:56:46 httpx INFO   HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


21:56:48 httpx INFO   HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


BASELINE RESPONSE:
‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ
In the Redis University course catalog, there isn't a specific machine learning course labeled as "beginner." However, if you're looking to start with foundational knowledge that could be beneficial for machine learning, you might consider taking courses that cover essential prerequisites or related topics. For example, MATH026: Linear Algebra (intermediate) could be a good starting point, as linear algebra is fundamental to understanding many machine learning algorithms. 

For a direct introduction to machine learning concepts, you might consider CS007: Machine Learning (advanced), although it is labeled as advanced, it covers introductory topics in machine learning algorithms and applications. If you're comfortable with the challenge, it co

### Step 5: Iterate - Refine Based on Results

Based on the measurements:

**Quality Assessment:**
- ✅ **Relevance:** Both approaches retrieve relevant ML courses
- ✅ **Completeness:** Both mention prerequisites and difficulty levels
- ✅ **Efficiency:** Optimized version uses fewer tokens (111 vs. 177)
- ✅ **Accuracy:** Both provide correct course information

**Decision:** On this test query, the optimized hybrid approach meets the quality targets while reducing token usage.

**Next Iteration:** Test with more queries to ensure consistency across different query types.
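
The efficiency comparison is simple arithmetic, shown here as a small worked example. The `token_reduction` helper is a hypothetical name. The inputs are the token counts printed by the baseline and experiment cells above (177 and 111).

```python
def token_reduction(baseline_tokens: int, optimized_tokens: int) -> tuple[int, float]:
    """Return (tokens saved, percentage reduction relative to the baseline)."""
    saved = baseline_tokens - optimized_tokens
    return saved, saved / baseline_tokens * 100


saved, pct = token_reduction(177, 111)
print(f"Token reduction: {saved} tokens ({pct:.1f}% reduction)")
# prints: Token reduction: 66 tokens (37.3% reduction)
```

Both contexts sit far below the 5,000-token target, so for this query the saving matters mainly as a habit: the same percentage applied to a larger catalog is where the cost difference shows up.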

---

### Key Takeaways: Quality Optimization

1. **Define Domain-Specific Metrics** - Don't rely on generic benchmarks
2. **Measure Systematically** - Baseline → Experiment → Measure → Iterate
3. **Balance Trade-offs** - Relevance vs. Efficiency, Completeness vs. Token Budget
4. **Test Before Production** - Validate with real queries from your domain
5. **Iterate Continuously** - Quality optimization is ongoing, not one-time

**The Engineering Mindset:**
- Context quality is measurable
- Optimization is systematic, not guesswork
- Domain-specific metrics matter more than generic benchmarks
- Testing and iteration are essential

---

## 📝 Summary

You've mastered production-ready context engineering:

**Part 1: The Engineering Mindset**
- ✅ Context is data requiring engineering discipline
- ✅ Naive approaches fail in production
- ✅ Engineering mindset: Requirements → Transformation → Quality → Testing

**Part 2: Data Engineering Pipeline**
- ✅ Extract → Clean → Transform → Optimize → Store
- ✅ Concrete examples with course data
- ✅ Token optimization techniques

**Part 3: Engineering Approaches**
- ✅ RAG (Semantic Search)
- ✅ Structured Views (Pre-Computed Summaries)
- ✅ Hybrid (Best of Both Worlds)
- ✅ Decision framework for choosing approaches

**Part 4: Chunking Strategies**
- ✅ When to chunk (critical first question)
- ✅ Four strategies with LangChain integration
- ✅ Trade-offs and decision criteria

**Part 5: Production Pipeline Architectures**
- ✅ Request-Time, Batch, Event-Driven
- ✅ Batch processing example with data
- ✅ Decision framework for architecture selection

**Part 6: Quality Optimization**
- ✅ Domain-specific quality metrics
- ✅ Systematic optimization process
- ✅ Baseline → Experiment → Measure → Iterate

**You're now ready to engineer production-ready context for any domain!** 🎉

---

## 🚀 What's Next?

### Section 3: Memory Systems for Context Engineering

Now that you can engineer high-quality retrieved context, you'll learn to manage conversation context:
- **Working Memory:** Track conversation history within a session
- **Long-term Memory:** Remember user preferences across sessions
- **LangGraph Integration:** Manage stateful workflows with checkpointing
- **Redis Agent Memory Server:** Automatic memory extraction and retrieval

### Section 4: Tool Use and Agents

After adding memory, you'll build complete autonomous agents:
- **Tool Calling:** Let the AI use functions (search, enroll, check prerequisites)
- **LangGraph State Management:** Orchestrate complex multi-step workflows
- **Agent Reasoning:** Plan and execute multi-step tasks
- **Production Patterns:** Error handling, retries, and monitoring

```
Section 1: Context Engineering Fundamentals
    ↓
Section 2, NB1: RAG Fundamentals
    ↓
Section 2, NB2: Engineering Context for Production ← You are here
    ↓
Section 3: Memory Systems for Context Engineering ← Next
    ↓
Section 4: Tool Use and Agents (Complete System)
```

---

## Additional Resources

**Chunking Strategies:**
- [LangChain Text Splitters](https://python.langchain.com/docs/modules/data_connection/document_transformers/)
- [LlamaIndex Node Parsers](https://docs.llamaindex.ai/en/stable/module_guides/loading/node_parsers/)

**Data Engineering for LLMs:**
- [OpenAI Best Practices](https://platform.openai.com/docs/guides/prompt-engineering)
- [Anthropic Prompt Engineering](https://docs.anthropic.com/claude/docs/prompt-engineering)

**Vector Databases:**
- [Redis Vector Search Documentation](https://redis.io/docs/stack/search/reference/vectors/)
- [RedisVL Python Library](https://github.com/RedisVentures/redisvl)

