# Crafting Data for LLMs: Creating Structured Views

## Introduction

In this advanced notebook, you'll learn how to create structured "views" or "dashboards" of data optimized for LLM consumption. This goes beyond simple chunking and retrieval: you'll pre-compute summaries and organize data so your agent gets a high-level understanding of the domain while token usage stays low.

### What You'll Learn

- Why pre-computed views matter
- How to create course catalog summary views
- How to build user profile views
- Techniques for retrieve → summarize → stitch → save
- When to use structured views vs. RAG

### Prerequisites

- Completed all Section 3 notebooks
- Completed Section 4 notebooks 01-03
- Redis 8 running locally
- Agent Memory Server running
- OpenAI API key set

## Concepts: Structured Data Views

### Beyond Chunking and RAG

Traditional approaches:
- **Chunking**: Split documents into pieces, retrieve relevant chunks
- **RAG**: Search for relevant documents/records on each query

These work well but have limitations:
- ❌ No high-level overview
- ❌ May miss important context
- ❌ Requires search on every request
- ❌ Can't see relationships across data

### Structured Views Approach

**Pre-compute summaries** that give the LLM:
- ✅ High-level overview of entire dataset
- ✅ Organized, structured information
- ✅ Key metadata for finding details
- ✅ Relationships between entities

### Two Key Patterns

#### 1. Course Catalog Summary View

Instead of searching courses every time, give the agent:
```
Course Catalog Overview:

Computer Science (50 courses):
- CS101: Intro to Programming (3 credits, beginner)
- CS201: Data Structures (3 credits, intermediate)
- CS401: Machine Learning (4 credits, advanced)
...

Mathematics (30 courses):
- MATH101: Calculus I (4 credits, beginner)
...
```

**Benefits:**
- Agent knows what's available
- Can reference specific courses
- Can suggest alternatives
- Compact: one line per course costs far fewer tokens than the full course records

#### 2. User Profile View

Instead of searching memories every time, give the agent:
```
Student Profile: student_123

Academic Info:
- Major: Computer Science
- Year: Junior
- GPA: 3.7
- Expected Graduation: Spring 2026

Completed Courses (12):
- CS101 (A), CS201 (A-), CS301 (B+)
- MATH101 (A), MATH201 (B)
...

Preferences:
- Prefers online courses
- Morning classes only
- No classes on Fridays
- Interested in AI/ML

Goals:
- Graduate in 2026
- Focus on machine learning
- Maintain 3.5+ GPA
```

**Benefits:**
- Agent has complete user context
- No need to search memories
- Personalized from turn 1
- Compact (500-1K tokens)

### The Pattern: Retrieve → Summarize → Stitch → Save

1. **Retrieve**: Get all relevant data from storage
2. **Summarize**: Use LLM to create concise summaries
3. **Stitch**: Combine summaries into structured view
4. **Save**: Store as string or JSON blob
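The four steps can be wired together as one higher-order function. This is a minimal sketch, assuming each step is passed in as a plain callable; `build_view` and the in-memory `store` dict are hypothetical stand-ins for your data source, the LLM summarization call, and Redis.

```python
from typing import Callable, Dict, Iterable, List


def build_view(
    retrieve: Callable[[], Iterable[dict]],
    summarize: Callable[[List[dict]], str],
    stitch: Callable[[Dict[str, str]], str],
    save: Callable[[str], None],
    group_key: str,
) -> str:
    """Retrieve -> Summarize -> Stitch -> Save, with each step injected."""
    # Retrieve: pull every record and group it by the chosen key
    groups: Dict[str, List[dict]] = {}
    for record in retrieve():
        groups.setdefault(record[group_key], []).append(record)

    # Summarize: one summary per group (an LLM call in practice)
    summaries = {name: summarize(items) for name, items in groups.items()}

    # Stitch: combine the per-group summaries into a single view
    view = stitch(summaries)

    # Save: persist the finished view (redis_client.set in this notebook)
    save(view)
    return view


# Hypothetical usage with a plain dict instead of Redis
store: Dict[str, str] = {}
records = [
    {"dept": "CS", "code": "CS101"},
    {"dept": "CS", "code": "CS201"},
    {"dept": "MATH", "code": "MATH101"},
]
view = build_view(
    retrieve=lambda: records,
    summarize=lambda items: ", ".join(r["code"] for r in items),
    stitch=lambda s: "\n".join(f"{k}: {v}" for k, v in sorted(s.items())),
    save=lambda v: store.update(catalog_view=v),
    group_key="dept",
)
print(view)
# CS: CS101, CS201
# MATH: MATH101
```

Injecting the steps keeps the pipeline testable without Redis or an API key: swap in the real LLM call and `redis_client.set` once the shape of the view is right.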

### When to Use Structured Views

**Use structured views when:**
- ✅ Data changes infrequently
- ✅ Agent needs overview + details
- ✅ Same data used across many requests
- ✅ Relationships matter

**Use RAG when:**
- ✅ Data changes frequently
- ✅ Dataset is huge (can't summarize all)
- ✅ Only need specific details
- ✅ Query-specific retrieval needed

**Best: Combine both!**
- Structured view for overview
- RAG for specific details

## Setup

In [None]:
import os
import json
import asyncio
from typing import List, Dict, Any
import tiktoken
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage
from redis_context_course import CourseManager, MemoryClient, redis_config
from agent_memory_client import MemoryClientConfig  # config object for the Agent Memory Server client

# Initialize
course_manager = CourseManager()
# Initialize memory client with proper config
config = MemoryClientConfig(
    base_url=os.getenv("AGENT_MEMORY_URL", "http://localhost:8000"),
    default_namespace="redis_university"
)
memory_client = MemoryClient(config=config)
llm = ChatOpenAI(model="gpt-4o", temperature=0)
tokenizer = tiktoken.encoding_for_model("gpt-4o")

def count_tokens(text: str) -> int:
    return len(tokenizer.encode(text))

print("✅ Setup complete")

## Example 1: Course Catalog Summary View

Let's create a high-level summary of the entire course catalog.

### Step 1: Retrieve All Courses

In [None]:
print("=" * 80)
print("CREATING COURSE CATALOG SUMMARY VIEW")
print("=" * 80)

# Step 1: Retrieve all courses
print("\n1. Retrieving all courses...")
all_courses = await course_manager.get_all_courses()
print(f"   Retrieved {len(all_courses)} courses")

### Step 2: Organize by Department

In [None]:
# Step 2: Organize by department
print("\n2. Organizing by department...")
by_department = {}
for course in all_courses:
    dept = course.department
    if dept not in by_department:
        by_department[dept] = []
    by_department[dept].append(course)

print(f"   Found {len(by_department)} departments")
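
The manual `if dept not in by_department` check can be replaced with `collections.defaultdict`. The sketch below is self-contained; the `Course` dataclass is a hypothetical stand-in holding only the fields this notebook uses, not the actual `redis_context_course` model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Course:
    # Hypothetical stand-in for the course model used in the notebook
    course_code: str
    title: str
    department: str
    credits: int


def group_by_department(courses: List[Course]) -> Dict[str, List[Course]]:
    """Group courses by department, sorted by course code within each group."""
    groups: Dict[str, List[Course]] = defaultdict(list)
    for course in courses:
        groups[course.department].append(course)
    # Sort departments alphabetically and courses by code for a stable view
    return {
        dept: sorted(items, key=lambda c: c.course_code)
        for dept, items in sorted(groups.items())
    }


courses = [
    Course("CS201", "Data Structures", "Computer Science", 3),
    Course("MATH101", "Calculus I", "Mathematics", 4),
    Course("CS101", "Intro to Programming", "Computer Science", 3),
]
by_department = group_by_department(courses)
for dept, items in by_department.items():
    print(dept, [c.course_code for c in items])
```

Sorting is optional, but a deterministic order means an unchanged catalog produces a byte-identical view on every rebuild, which makes diffing and caching easier.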

### Step 3: Summarize Each Department

In [None]:
# Step 3: Summarize each department
print("\n3. Creating summaries for each department...")

async def summarize_department(dept_name: str, courses: List) -> str:
    """Create a concise summary of courses in a department."""
    
    # Build course list
    course_list = "\n".join([
        f"- {c.course_code}: {c.title} ({c.credits} credits, {c.difficulty_level.value})"
        for c in courses[:10]  # Limit for demo
    ])
    
    # Ask LLM to create one-sentence descriptions
    prompt = f"""Create a one-sentence description for each {dept_name} course. Be concise.

Courses:
{course_list}

Format: COURSE_CODE: One sentence description
"""
    
    messages = [
        SystemMessage(content="You are a helpful assistant that creates concise course descriptions."),
        HumanMessage(content=prompt)
    ]
    
    response = await llm.ainvoke(messages)  # async call, so the event loop is not blocked
    return response.content

# Summarize first 3 departments (for demo)
dept_summaries = {}
for dept_name in list(by_department.keys())[:3]:
    print(f"   Summarizing {dept_name}...")
    summary = await summarize_department(dept_name, by_department[dept_name])
    dept_summaries[dept_name] = summary
    await asyncio.sleep(0.5)  # Rate limiting

print(f"   Created {len(dept_summaries)} department summaries")
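
The loop above summarizes departments one at a time with a fixed pause. A possible alternative is to run the summaries concurrently while capping the number of in-flight calls with `asyncio.Semaphore`. In this sketch `fake_summarize` is a hypothetical stand-in for an async LLM call such as `await llm.ainvoke(...)`, and `max_concurrency=3` is an arbitrary example value.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple


async def summarize_all(
    departments: Dict[str, List[str]],
    summarize: Callable[[str, List[str]], Awaitable[str]],
    max_concurrency: int = 3,
) -> Dict[str, str]:
    """Run one summary per department, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(name: str, items: List[str]) -> Tuple[str, str]:
        async with semaphore:  # caps concurrent LLM calls
            return name, await summarize(name, items)

    results = await asyncio.gather(
        *(run_one(name, items) for name, items in departments.items())
    )
    # gather preserves input order, so the dict keeps department order
    return dict(results)


async def fake_summarize(name: str, items: List[str]) -> str:
    # Hypothetical stand-in for an LLM call
    await asyncio.sleep(0.01)
    return f"Summary of {', '.join(items)}"


departments = {"Computer Science": ["CS101", "CS201"], "Mathematics": ["MATH101"]}
summaries = asyncio.run(summarize_all(departments, fake_summarize))
print(summaries)
```

Inside a notebook, where an event loop is already running, call `await summarize_all(...)` directly instead of `asyncio.run(...)`.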

### Step 4: Stitch Into Complete View

In [None]:
# Step 4: Stitch into complete view
print("\n4. Stitching into complete catalog view...")

catalog_view_parts = ["Redis University Course Catalog\n" + "=" * 40 + "\n"]

for dept_name, summary in dept_summaries.items():
    course_count = len(by_department[dept_name])
    catalog_view_parts.append(f"\n{dept_name} ({course_count} courses):")
    catalog_view_parts.append(summary)

catalog_view = "\n".join(catalog_view_parts)

print(f"   View created!")
print(f"   Total tokens: {count_tokens(catalog_view):,}")
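
The stitch step above reports the token count but does not enforce a limit. One way to keep a view within a budget is to append sections in priority order and skip any that would push the total over. This is a sketch under stated assumptions: `stitch_within_budget` is a hypothetical helper, and `approx_tokens` is a rough whitespace-based counter used only so the example runs without `tiktoken`; in the notebook you would pass `count_tokens` instead.

```python
from typing import Callable, List, Tuple


def stitch_within_budget(
    title: str,
    sections: List[Tuple[str, str]],
    max_tokens: int,
    count: Callable[[str], int],
) -> Tuple[str, List[str]]:
    """Append (heading, body) sections in order; skip any that exceed the budget."""
    parts = [title]
    skipped: List[str] = []
    for heading, body in sections:
        candidate = "\n".join(parts + [heading, body])
        if count(candidate) > max_tokens:
            skipped.append(heading)  # record what the view leaves out
            continue
        parts.extend([heading, body])
    return "\n".join(parts), skipped


def approx_tokens(text: str) -> int:
    # Rough approximation only; use a real tokenizer for production budgets
    return len(text.split())


sections = [
    ("Computer Science (2 courses):", "CS101: Intro to programming. CS201: Data structures."),
    ("Mathematics (1 course):", "MATH101: Limits, derivatives, and integrals."),
]
view, skipped = stitch_within_budget(
    "Course Catalog", sections, max_tokens=14, count=approx_tokens
)
print(view)
print("Skipped:", skipped)
```

Skipping (rather than stopping at the first oversized section) lets a smaller, lower-priority section still fit; returning the skipped headings lets you log what the agent will not see.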

### Step 5: Save to Redis

In [None]:
# Step 5: Save to Redis
print("\n5. Saving to Redis...")

redis_client = redis_config.get_redis_client()
redis_client.set("course_catalog_view", catalog_view)

print("   ✅ Saved to Redis as 'course_catalog_view'")

# Display the view
print("\n" + "=" * 80)
print("COURSE CATALOG VIEW")
print("=" * 80)
print(catalog_view)
print("\n" + "=" * 80)

### Using the Catalog View

In [None]:
# Load and use the view
print("\nUsing the catalog view in an agent...\n")

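raw_view = redis_client.get("course_catalog_view")
# Clients created with decode_responses=True return str; otherwise GET returns bytes
catalog_view = raw_view.decode("utf-8") if isinstance(raw_view, bytes) else raw_view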

system_prompt = f"""You are a class scheduling agent for Redis University.

{catalog_view}

Use this overview to help students understand what's available.
For specific course details, you can search the full catalog.
"""

user_query = "What departments offer courses? I'm interested in computer science."

messages = [
    SystemMessage(content=system_prompt),
    HumanMessage(content=user_query)
]

response = llm.invoke(messages)

print(f"User: {user_query}")
print(f"\nAgent: {response.content}")
print("\n✅ Agent has high-level overview of entire catalog!")

## Example 2: User Profile View

Let's create a comprehensive user profile from various data sources.

### Step 1: Retrieve User Data

In [None]:
print("\n" + "=" * 80)
print("CREATING USER PROFILE VIEW")
print("=" * 80)

# Step 1: Retrieve user data from various sources
print("\n1. Retrieving user data...")

# Simulate user data (in production, this comes from your database)
user_data = {
    "student_id": "student_123",
    "name": "Alex Johnson",
    "major": "Computer Science",
    "year": "Junior",
    "gpa": 3.7,
    "expected_graduation": "Spring 2026",
    "completed_courses": [
        {"code": "CS101", "title": "Intro to Programming", "grade": "A"},
        {"code": "CS201", "title": "Data Structures", "grade": "A-"},
        {"code": "CS301", "title": "Algorithms", "grade": "B+"},
        {"code": "MATH101", "title": "Calculus I", "grade": "A"},
        {"code": "MATH201", "title": "Calculus II", "grade": "B"},
    ],
    "current_courses": [
        "CS401", "CS402", "MATH301"
    ]
}

# Get memories
memories = await memory_client.search_memories(
    query="",  # Get all
    limit=20
)

print(f"   Retrieved user data and {len(memories)} memories")

### Step 2: Summarize Each Section

In [None]:
# Step 2: Create summaries for each section
print("\n2. Creating section summaries...")

# Academic info (structured, no LLM needed)
academic_info = f"""Academic Info:
- Major: {user_data['major']}
- Year: {user_data['year']}
- GPA: {user_data['gpa']}
- Expected Graduation: {user_data['expected_graduation']}
"""

# Completed courses (structured)
completed_courses = "Completed Courses (" + str(len(user_data['completed_courses'])) + "):\n"
completed_courses += "\n".join([
    f"- {c['code']}: {c['title']} (Grade: {c['grade']})"
    for c in user_data['completed_courses']
])

# Current courses
current_courses = "Current Courses:\n- " + ", ".join(user_data['current_courses'])

# Summarize memories with LLM
if memories:
    memory_text = "\n".join([f"- {m.text}" for m in memories[:10]])
    
    prompt = f"""Summarize these student memories into two sections:
1. Preferences (course format, schedule, etc.)
2. Goals (academic, career, etc.)

Be concise. Use bullet points.

Memories:
{memory_text}
"""
    
    messages = [
        SystemMessage(content="You are a helpful assistant that summarizes student information."),
        HumanMessage(content=prompt)
    ]
    
    response = llm.invoke(messages)
    preferences_and_goals = response.content
else:
    preferences_and_goals = "Preferences:\n- None recorded\n\nGoals:\n- None recorded"

print("   Created all section summaries")
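
The LLM is asked for two sections, but its output is not guaranteed to contain both. A small guard can append a placeholder for any missing heading so the stitched profile always has the same shape. This is a minimal sketch: `ensure_sections` is a hypothetical helper, and its case-insensitive substring check is deliberately crude (any occurrence of the word counts as present).

```python
from typing import Iterable


def ensure_sections(text: str, required: Iterable[str] = ("Preferences", "Goals")) -> str:
    """Append a placeholder for each required heading the LLM output omitted."""
    lowered = text.lower()
    missing = [name for name in required if name.lower() not in lowered]
    for name in missing:
        text = text.rstrip() + f"\n\n{name}:\n- None recorded"
    return text


llm_output = "Preferences:\n- Prefers online courses\n- Morning classes only"
checked = ensure_sections(llm_output)
print(checked)
```

In the cell above, this would be applied as `preferences_and_goals = ensure_sections(response.content)`; the placeholder text matches the notebook's own fallback for the no-memories case.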

### Step 3: Stitch Into Profile View

In [None]:
# Step 3: Stitch into complete profile
print("\n3. Stitching into complete profile view...")

profile_view = f"""Student Profile: {user_data['student_id']}
{'=' * 50}

{academic_info}

{completed_courses}

{current_courses}

{preferences_and_goals}
"""

print(f"   Profile created!")
print(f"   Total tokens: {count_tokens(profile_view):,}")

### Step 4: Save as JSON

In [None]:
# Step 4: Save to Redis (as JSON for structured access)
print("\n4. Saving to Redis...")

from datetime import datetime, timezone

profile_data = {
    "student_id": user_data['student_id'],
    "profile_text": profile_view,
    "last_updated": datetime.now(timezone.utc).isoformat(),  # when this view was built
    "token_count": count_tokens(profile_view)
}

redis_client.set(
    f"user_profile:{user_data['student_id']}",
    json.dumps(profile_data)
)

print(f"   ✅ Saved to Redis as 'user_profile:{user_data['student_id']}'")

# Display the profile
print("\n" + "=" * 80)
print("USER PROFILE VIEW")
print("=" * 80)
print(profile_view)
print("=" * 80)

### Using the Profile View

In [None]:
# Load and use the profile
print("\nUsing the profile view in an agent...\n")

# json.loads accepts both str and bytes, so no manual decode is needed
profile_json = json.loads(redis_client.get(f"user_profile:{user_data['student_id']}"))
profile_text = profile_json['profile_text']

system_prompt = f"""You are a class scheduling agent for Redis University.

{profile_text}

Use this profile to provide personalized recommendations.
"""

user_query = "What courses should I take next semester?"

messages = [
    SystemMessage(content=system_prompt),
    HumanMessage(content=user_query)
]

response = llm.invoke(messages)

print(f"User: {user_query}")
print(f"\nAgent: {response.content}")
print("\n✅ Agent has complete user context from turn 1!")
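
The cell above assumes the profile key already exists; if it does not, `redis_client.get` returns `None` and `json.loads` fails. A read-through helper can build and save the view on a cache miss. This is a sketch under stated assumptions: `get_or_build_profile` and `build` are hypothetical, and a plain dict stands in for Redis (its `get` and `__setitem__` have the same call shape as `redis_client.get(key)` and `redis_client.set(key, value)`).

```python
import json
from typing import Callable, Dict, List, Optional, Union


def get_or_build_profile(
    get: Callable[[str], Optional[Union[str, bytes]]],
    set_: Callable[[str, str], object],
    student_id: str,
    build: Callable[[str], str],
) -> str:
    """Return the cached profile text, building and saving it on a cache miss."""
    key = f"user_profile:{student_id}"
    raw = get(key)
    if raw is not None:
        return json.loads(raw)["profile_text"]  # works for str or bytes
    profile_text = build(student_id)  # rerun retrieve -> summarize -> stitch
    set_(key, json.dumps({"student_id": student_id, "profile_text": profile_text}))
    return profile_text


# Hypothetical in-memory store and builder
store: Dict[str, str] = {}
builds: List[str] = []


def build(student_id: str) -> str:
    builds.append(student_id)
    return f"Student Profile: {student_id}"


first = get_or_build_profile(store.get, store.__setitem__, "student_123", build)
second = get_or_build_profile(store.get, store.__setitem__, "student_123", build)
print(first, len(builds))
```

The second call is served from the store, so the expensive build (including the LLM summary) runs only once per student until the view is refreshed.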

## Key Takeaways

### The Pattern: Retrieve → Summarize → Stitch → Save

1. **Retrieve**: Get all relevant data
   - From databases, APIs, memories
   - Organize by category/section

2. **Summarize**: Create concise summaries
   - Use LLM for complex data
   - Use templates for structured data
   - Keep it compact (one-sentence descriptions)

3. **Stitch**: Combine into complete view
   - Organize logically
   - Add headers and structure
   - Format for LLM consumption

4. **Save**: Store for reuse
   - Redis for fast access
   - String or JSON format
   - Include metadata (timestamp, token count)

### When to Refresh Views

**Course Catalog View:**
- When courses are added/removed
- When descriptions change
- Typically: Daily or weekly

**User Profile View:**
- When user completes a course
- When preferences change
- When new memories are added
- Typically: After each session or daily
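
Both triggers (the source data changed, or the view is simply old) can be checked by storing a fingerprint of the source data next to the `last_updated` timestamp. This is one possible approach, sketched with hypothetical helpers `source_fingerprint` and `needs_refresh`; the seven-day `max_age` is an example value, not a recommendation from this course.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def source_fingerprint(source: Any) -> str:
    """Stable SHA-256 hash of the data a view was built from."""
    payload = json.dumps(source, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_refresh(
    meta: Optional[Dict[str, str]],
    source: Any,
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> bool:
    """Refresh if the view is missing, its source changed, or it is too old."""
    if meta is None:
        return True
    if meta["fingerprint"] != source_fingerprint(source):
        return True
    now = now or datetime.now(timezone.utc)
    built_at = datetime.fromisoformat(meta["last_updated"])
    return now - built_at > max_age


courses = [{"code": "CS101", "credits": 3}]
built = datetime(2024, 9, 30, tzinfo=timezone.utc)
meta = {"fingerprint": source_fingerprint(courses), "last_updated": built.isoformat()}
week = timedelta(days=7)

fresh = needs_refresh(meta, courses, week, now=built + timedelta(days=1))
changed = needs_refresh(
    meta, courses + [{"code": "CS201", "credits": 3}], week, now=built + timedelta(days=1)
)
expired = needs_refresh(meta, courses, week, now=built + timedelta(days=8))
print(fresh, changed, expired)  # False True True
```

Comparing fingerprints avoids rebuilding (and paying for LLM summaries) when a scheduled job runs but nothing has actually changed.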

### Scheduling Considerations

In production, you'd use:
- **Cron jobs** for periodic updates
- **Event triggers** for immediate updates
- **Background workers** for async processing

For this course, we focus on the **function-level logic**, not the scheduling infrastructure.

### Benefits of Structured Views

✅ **Performance:**
- No search needed on every request
- Pre-computed, ready to use
- Fast retrieval from Redis

✅ **Quality:**
- Agent has complete overview
- Better context understanding
- More personalized responses

✅ **Efficiency:**
- Compact token usage
- Organized information
- Easy to maintain

### Combining with RAG

**Best practice: Use both!**

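```python
import json

# Load the structured views saved earlier in this notebook
def load_catalog_view() -> str:
    raw = redis_client.get("course_catalog_view")
    return raw.decode("utf-8") if isinstance(raw, bytes) else raw

def load_profile_view(user_id: str) -> str:
    # json.loads accepts both str and bytes
    return json.loads(redis_client.get(f"user_profile:{user_id}"))["profile_text"]

catalog_view = load_catalog_view()
profile_view = load_profile_view(user_id)

# Add targeted RAG (search_courses is a placeholder for your retrieval call)
relevant_courses = search_courses(query, limit=3)

# Combine
context = f"""
{catalog_view}

{profile_view}

Relevant courses for this query:
{relevant_courses}
"""
```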
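
A self-contained version of the combining step might look like the sketch below. `build_context` is a hypothetical helper and the hits are example strings; in practice they would come from your vector search. It caps the number of RAG hits and omits the RAG section entirely when there are none, so the prompt never contains an empty header.

```python
from typing import List


def build_context(
    catalog_view: str, profile_view: str, rag_hits: List[str], max_hits: int = 3
) -> str:
    """Views first (stable overview), then a capped list of query-specific hits."""
    sections = [catalog_view.strip(), profile_view.strip()]
    hits = [h for h in rag_hits[:max_hits] if h.strip()]
    if hits:
        sections.append(
            "Relevant courses for this query:\n" + "\n".join(f"- {h}" for h in hits)
        )
    # Drop empty sections so the prompt has no blank blocks
    return "\n\n".join(s for s in sections if s)


context = build_context(
    "Catalog overview",
    "Profile: student_123",
    ["CS401: Machine Learning", "CS201: Data Structures", "MATH101: Calculus I"],
    max_hits=2,
)
print(context)
```

Placing the views before the per-query hits keeps the start of the prompt identical across requests, which may also let provider-side prompt caching reuse that prefix.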

This gives you:
- Overview (from views)
- Personalization (from profile)
- Specific details (from RAG)

## Exercises

1. **Create a department view**: Build a detailed view for a single department with all its courses.

2. **Build a schedule view**: Create a view of a student's current schedule with times, locations, and conflicts.

3. **Optimize token usage**: Experiment with different summary lengths. What's the sweet spot?

4. **Implement refresh logic**: Write a function that determines when a view needs to be refreshed.

## Summary

In this notebook, you learned:

- ✅ Structured views provide high-level overviews for LLMs
- ✅ The pattern: Retrieve → Summarize → Stitch → Save
- ✅ Course catalog views give agents complete course knowledge
- ✅ User profile views enable personalization from turn 1
- ✅ Combine views with RAG for best results

**Key insight:** Pre-computing structured views is an advanced technique that goes beyond simple RAG. It gives your agent a "mental model" of the domain, enabling better understanding and more intelligent responses.