![Redis](https://redis.io/wp-content/uploads/2024/04/Logotype.svg?auto=webp&quality=85,75&width=120)

<a href="https://colab.research.google.com/github/redislabs-training/ce-redis-langchain/blob/main/section-1-context-engineering-foundations/01_Introduction_to_context_engineering.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



---

# What Is Context Engineering?

## The Problem

Imagine you walk into a doctor's office. The doctor has never met you before, doesn't have access to your medical records, and can't remember anything you said five minutes ago. Every time you visit, it's like the first time all over again.

Sound frustrating? That's what AI agents are like without context engineering.


**Context engineering** is the discipline of designing, shaping, and delivering the right information to an AI model, via prompts, retrieval, memory, schemas, system instructions, grounding data, and constraints, so the model behaves predictably, accurately, and safely.


**The quality of an AI system's output is directly determined by the quality, completeness, and structure of the information (context) you give it**.

As models become more capable, the competitive differentiator shifts from the model itself to how well you can control and feed it the right information at the right time.

**Context engineering matters because it:**

1. Gives AI the information it needs to be accurate
2. Specializes general models for your domain
3. Reduces hallucinations
4. Ensures consistency and reliability
5. Enables complex AI applications
6. Reduces cost and increases efficiency
7. Enforces safety and governance

---
Visually, context can contain something like this:

![Context engineering overview](../public/ce-overview.png)


## Context Engineering in Action

Let's explore this through a common scenario we want the agent to handle: a student seeking advice on which classes to take to build a career in AI.

We'll use a sample student profile as our base in the LLM's context: 

- **Name**: Olivia Jansen
- **Major**: Computer Science
- **Year**: Sophomore
- **Completed**: Intro to Programming (CS101), Data Structures (CS201), Calculus I
- **Interests**: Machine learning, Data science
- **Preferences**: Prefers online courses, learns best with hands-on projects
- **Goal**: Build a career in AI

Olivia interacts with the agent and gets the following results:

<details>
  <summary>🔶 Click the dropdown to review the conversation with the agent</summary>
  
**Turn 1:**

*Student:* "I'm interested in machine learning courses"

*Agent:* "Here are all machine learning courses available:
- CS301: Introduction to Machine Learning
- CS401: Advanced Machine Learning
- CS402: Deep Learning
- CS403: Natural Language Processing
- CS404: Computer Vision

All courses meet twice per week on campus."

---

**Turn 2:**

*Student:* "I prefer online courses."

*Agent:* "Here are all machine learning courses available:
- CS301: Introduction to Machine Learning
- CS401: Advanced Machine Learning
..."

Notice that the result is the same list, ignoring Olivia's preferences.

---

**Turn 3:**

*Student:* "Can I take CS401?"

*Agent:* "CS401 requires CS301 as a prerequisite. Do you meet this requirement?"

The agent cannot answer adequately because it doesn't know which courses the student has completed.

---

**Turn 4:**

*Student:* "What about that intro course you mentioned earlier?"

*Agent:* "I'm not sure which course you're referring to. Could you provide more details?"

The agent does not remember its old replies and has no context on what intro course was mentioned. 

---
</details>

### The Problems Are Clear

**No Memory:**
- Can't remember what it just said
- Can't reference earlier parts of the conversation
- Forces the user to repeat information

**No Personalization:**
- Ignores user preferences and background
- Gives generic responses regardless of the user's situation
- Can't tailor recommendations

**No Awareness:**
- Doesn't know the user's history or capabilities
- Can't check prerequisites or eligibility
- Can't connect to other relevant information

**Result:** A frustrating experience that wastes time and provides little value.

## With Context Engineering

Now, let's examine the agent, but with proper context management: 

<details>
    <summary>🔶 Click the dropdown to review the conversation with the agent with proper context management</summary>

**Turn 1:**

*Student:* "I'm interested in machine learning courses."

*Agent:* "Great! Based on your Computer Science major and the courses you've completed (CS101 and CS201), you're ready for CS301: Introduction to Machine Learning.

Since you prefer online courses and hands-on learning, I should mention that CS301 has:
- A fully online section
- Weekly programming assignments with real datasets
- A final project where you build your own ML system

This course aligns perfectly with your goal of building an AI career and serves as the foundation for more advanced ML courses."

---

**Turn 2:**

*Student:* "That sounds perfect! What comes after that?"

*Agent:* "After completing CS301, you'll have several great options:

**Immediately Available:**
- CS402: Deep Learning (Online, project-based)
  Focus: Neural networks, CNNs, RNNs
  
- CS403: Natural Language Processing (Online, project-based)
  Focus: Text analysis, transformers, LLMs

**Recommended Path for AI Career:**
1. CS301 (Introduction) ← Start here
2. CS402 (Deep Learning) ← Most relevant to modern AI
3. CS403 (NLP) or CS404 (Computer Vision) ← Specialize

Given your interest in data science, I'd especially recommend the NLP path - it combines well with your data structures background."

---

**Turn 3:**

*Student:* "How long will that path take?"

*Agent:* "Let me break it down for you:

**CS301:** 14 weeks (3 credits)
**CS402:** 14 weeks (3 credits)  
**CS403:** 14 weeks (3 credits)

If you take one course per semester:
- Fall 2024: CS301
- Spring 2025: CS402
- Fall 2025: CS403

Since you're currently a Sophomore, this timeline would have you completing this sequence well before graduation, leaving time for an internship or capstone project.

Alternatively, since these are online courses, you could potentially take CS301 this fall and CS402 in a winter intersession to accelerate your path."

---

**Turn 4:**

*Student:* "I like that accelerated option. Can I register now?"

*Agent:* "Yes! Since you've met the prerequisites for CS301, I can guide you through registration. The online section (CS301-W01) has 15 seats remaining.

Would you like me to:
1. Reserve a seat for you in CS301-W01 for Fall 2024?
2. Add you to the notification list for the Winter CS402 section (registration opens in October)?
3. Send you the course syllabus and professor information?"

</details>

**With context engineering, the agent:**

- remembers who the student is (major, completed courses, preferences)
- filters to relevant options (online, project-based ML)
- checks prerequisites automatically
- turns choices into a sequenced plan and timeline
- proposes next actions (register, notify, send syllabi)

**Result:** A fast, personalized, high-confidence experience that actually delivers value.




## Building a complete context: the four pillars

We can break down the context that goes into an AI application into four distinct types:

1. System context
2. User context
3. Conversation context
4. Retrieved context 

Each of these types plays a crucial role in making sure that an AI system has what it needs to respond properly. In this section, we'll explore each context type in more detail and set up a sample context to call an LLM (OpenAI). As you go through, make sure to run each code block provided as we build up the context and send it to the LLM at the end.

### 1. System Context: "What Am I?"

The system context defines the AI's identity, capabilities, and knowledge. It is static, universal, and always present.

It typically contains: 
- A role definition ("You are a course advisor")
- Available tools and actions (`search_courses(query)`)
- Domain knowledge and business rules ("CS401 requires CS301")
- Behavioral guidelines ("Ask one clarifying question at a time")

Below is an example of system context (also commonly called the system prompt) for our Redis University agent. Run the code block to create the context. We'll use it shortly when we assemble the full context.

In [1]:
system_context = """You are a Redis University course advisor.

Guidelines:
1. Always provide specific course recommendations with clear reasoning
2. Consider the student's background, completed courses, and interests
3. Ensure prerequisites are met before recommending advanced courses
4. Be encouraging and supportive in your guidance
5. Do not offer courses that are not part of the curriculum
"""

print("System context created successfully!")
print(system_context)

System context created successfully!
You are a Redis University course advisor.

Guidelines:
1. Always provide specific course recommendations with clear reasoning
2. Consider the student's background, completed courses, and interests
3. Ensure prerequisites are met before recommending advanced courses
4. Be encouraging and supportive in your guidance
5. Do not offer courses that are not part of the curriculum
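
The prompt above is a single hand-written string. As a minimal sketch of how the four typical parts listed earlier (role, tools, rules, guidelines) could be composed programmatically, the block below uses a hypothetical helper, `build_system_context`, which is not part of this notebook or of any library. The section labels it emits are illustrative assumptions.

```python
def build_system_context(role, tools, rules, guidelines):
    """Assemble a system prompt from its typical parts (hypothetical helper)."""
    sections = [role]
    if tools:
        sections.append("Available tools:\n" + "\n".join(f"- {t}" for t in tools))
    if rules:
        sections.append("Business rules:\n" + "\n".join(f"- {r}" for r in rules))
    if guidelines:
        numbered = "\n".join(f"{i}. {g}" for i, g in enumerate(guidelines, 1))
        sections.append("Guidelines:\n" + numbered)
    # Blank lines between sections keep the prompt easy to scan.
    return "\n\n".join(sections)


example_system_context = build_system_context(
    role="You are a course advisor.",
    tools=["search_courses(query)"],
    rules=["CS401 requires CS301"],
    guidelines=["Ask one clarifying question at a time"],
)
print(example_system_context)
```

Keeping the parts separate makes it easier to change one rule or tool without rewriting the whole prompt, while the result stays static for every request.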



### 2. User Context: "Who Are You?"

The user context contains personal information about the specific user. It is typically dynamic, personalized, and retrieved from storage.

**Retrieval Methods:**
- **Direct lookup** - Fetch user profile by user ID (structured data like name, major, year)
- **Search (RAG)** - Find relevant user facts using semantic search over long-term memories (Module 4)

It typically contains:
- Profile information (major, year, background)
- Preferences and learning style ("prefers online/asynchronous")
- History and achievements ("completed CS101/CS201/Calc I")
- Goals and constraints ("Tue/Thu availability")
- Long-term memories (facts learned about the user over time)

We have already seen an example of user context when we defined the user profile and preferences for Olivia earlier. Run the code block below to create the sample profile in the notebook.

In [13]:
olivia_profile = {
    "name": "Olivia Jansen",
    "background": "Python developer, 2 years experience",
    "completed_courses": ["RU101"],
    "interests": ["machine learning", "data science", "vector search"],
    "time_availability": "evenings and weekends",
    "goal": "Build a RAG system for my company's documentation",
}

print("User context for Olivia created successfully!")
print(olivia_profile)

User context for Olivia created successfully!
{'name': 'Olivia Jansen', 'background': 'Python developer, 2 years experience', 'completed_courses': ['RU101'], 'interests': ['machine learning', 'data science', 'vector search'], 'time_availability': 'evenings and weekends', 'goal': "Build a RAG system for my company's documentation"}


Note that in production, the above context would likely come from a database, but we have hardcoded it above for demonstration purposes. 
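
As a rough sketch of what that direct lookup might look like, the block below uses an in-memory dictionary as a stand-in for a real database. The names `PROFILE_STORE` and `load_user_profile` and the ID `user-001` are hypothetical and only illustrate the "fetch profile by user ID" pattern.

```python
# Hypothetical in-memory stand-in for a user-profile table keyed by user ID.
PROFILE_STORE = {
    "user-001": {
        "name": "Olivia Jansen",
        "completed_courses": ["RU101"],
        "interests": ["machine learning", "data science", "vector search"],
    },
}


def load_user_profile(user_id, store=PROFILE_STORE):
    """Direct lookup: return a copy of the profile for this ID, or None if unknown."""
    profile = store.get(user_id)
    return dict(profile) if profile is not None else None


print(load_user_profile("user-001"))
print(load_user_profile("user-999"))  # unknown ID
```

Returning `None` for an unknown ID lets the caller decide whether to fall back to a generic, non-personalized context.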

LLM APIs usually require message content to be a string, so we need to convert our dictionary to text. We have two options:

1. Send it as a JSON string: `json.dumps(olivia_profile)`
2. Format it as natural language: Use a formatting function for better readability and control
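
To make the two options concrete, here is a small side-by-side sketch. It uses a shortened, illustrative `sample_profile` (an assumption for this example, not the notebook's `olivia_profile`) so that it runs on its own.

```python
import json

# Shortened, illustrative profile used only for this comparison.
sample_profile = {
    "name": "Olivia Jansen",
    "completed_courses": ["RU101"],
    "interests": ["machine learning", "data science"],
}

# Option 1: serialize the dictionary as-is.
as_json = json.dumps(sample_profile)

# Option 2: choose the labels, order, and list formatting yourself.
as_text = "\n".join(
    [
        "Student Profile:",
        f"- Name: {sample_profile['name']}",
        f"- Completed Courses: {', '.join(sample_profile['completed_courses'])}",
        f"- Interests: {', '.join(sample_profile['interests'])}",
    ]
)

print(as_json)
print(as_text)
```

Both strings carry the same facts. The difference is how much control you have over wording and layout.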

We'll use option 2 because it's more readable and gives us precise control over how the information is presented to the LLM. This type of data transformation, converting structured data into LLM-friendly formats, is a core context engineering technique that we'll explore in further detail throughout this course.

Now, run the code block below to create the function and format Olivia's profile.

In [14]:
def format_user_context(profile):
    """Convert user profile dictionary to formatted context string"""
    return f"""Student Profile:
    - Name: {profile['name']}
    - Background: {profile['background']}
    - Completed Courses: {', '.join(profile['completed_courses'])}
    - Interests: {', '.join(profile['interests'])}
    - Availability: {profile['time_availability']}
    - Goal: {profile['goal']}
    """

user_context = format_user_context(olivia_profile)
print("Formatted Context:", user_context)

Formatted Context: Student Profile:
    - Name: Olivia Jansen
    - Background: Python developer, 2 years experience
    - Completed Courses: RU101
    - Interests: machine learning, data science, vector search
    - Availability: evenings and weekends
    - Goal: Build a RAG system for my company's documentation
    


### 3. Conversation Context: "What Have We Discussed?"

Conversation context is one of the most important parts of context since it represents the history of the current conversation. Typically this is implemented in the form of memory. It is dynamic, session-specific, and will grow over time.

**Retrieval Methods:**
- **Direct lookup** - Fetch all messages by session ID (working memory - Module 4)
- **Search (RAG)** - Find relevant past exchanges using semantic search over conversation history (advanced)

It typically contains:
- Previous user messages
- Previous AI responses
- Decisions and commitments made
- Topics explored

We have also seen an example of conversation context earlier:
```
Turn 1:
User: "I'm interested in machine learning courses."
AI: "I recommend CS301: Introduction to Machine Learning..."

Turn 2:
User: "What comes after that course?"
[Current query - needs context from Turn 1 to understand "that course"]
AI: "After CS301, you can take CS402 or CS403..."

Turn 3:
User: "How long will that path take?"
[Current query - needs context from Turn 2 to understand "that path"]
```

In this notebook, since we don't have a chat interface, we're going to simulate conversation history by manually creating a list of message dictionaries. In a real application, this list would be dynamically built as users interact with the system, but here we'll pre-populate it to demonstrate how conversation context works.

Examine and then run the following code block (it produces no output; we'll inspect the result in the next code block):

In [15]:
# Step 1: Start with an empty conversation history
conversation_history = []

# Step 2: Add the first user message
conversation_history.append(
    {"role": "user", "content": "What Redis course should I take next?"}
)

# Step 3: Add the AI's response (simulated)
conversation_history.append(
    {
        "role": "assistant",
        "content": """Based on your Python background and completion of RU101, 
I recommend RU201: Redis for Python Developers. This course will teach you 
how to build Redis applications using redis-py, which aligns perfectly with 
your goal of building a RAG system.""",
    }
)

# Step 4: Add a follow-up question that references previous context
conversation_history.append(
    {"role": "user", "content": "How long will that take me to complete?"}
)

Notice the user said "that" instead of "RU201". The AI needs the conversation history to understand what "that" refers to. Let's view the complete conversation history:

In [16]:
for i, msg in enumerate(conversation_history, 1):
    print(f"Turn {i} ({msg['role']}):")
    print(f"{msg['content']}\n")

Turn 1 (user):
What Redis course should I take next?

Turn 2 (assistant):
Based on your Python background and completion of RU101, 
I recommend RU201: Redis for Python Developers. This course will teach you 
how to build Redis applications using redis-py, which aligns perfectly with 
your goal of building a RAG system.

Turn 3 (user):
How long will that take me to complete?
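
In a real application the history would be built up per session rather than pre-populated. The block below is a minimal sketch of that idea, assuming a plain in-memory dictionary keyed by session ID. The names `session_store`, `add_turn`, and `get_history` are hypothetical, and a production system would typically use persistent storage instead.

```python
from collections import defaultdict

# Hypothetical in-memory store: session ID -> list of message dictionaries.
session_store = defaultdict(list)


def add_turn(session_id, role, content):
    """Append one message to the session's history."""
    session_store[session_id].append({"role": role, "content": content})


def get_history(session_id, max_messages=None):
    """Return the full history, or only the most recent max_messages entries."""
    history = session_store.get(session_id, [])
    if max_messages is None:
        return list(history)
    return history[-max_messages:] if max_messages > 0 else []


add_turn("session-1", "user", "What Redis course should I take next?")
add_turn("session-1", "assistant", "I recommend RU201: Redis for Python Developers.")
add_turn("session-1", "user", "How long will that take me to complete?")

print(len(get_history("session-1")))
print(get_history("session-1", max_messages=1))
```

The optional `max_messages` argument hints at a point that matters later: history grows with every turn, so at some stage you have to decide how much of it to send.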



### 4. Retrieved Context: "What Information Is Relevant?"

Lastly, there is retrieved context. This is information fetched on-demand based on the current query using search or lookup operations.

**Retrieval Methods:**
- **Direct lookup** - Fetch specific records by ID (e.g., "Get course CS301")
- **Search (RAG)** - Find relevant information using semantic/keyword search (e.g., "Find courses about machine learning")

It typically contains:
- Database records (course details, schedules)
- Search results (relevant documents, FAQs)
- API responses (real-time data, availability)
- Computed information (eligibility checks, recommendations)

For example, if a student asked the Redis University agent about "machine learning courses", the system would need to find relevant context to respond to the query. It would search the course catalog using semantic similarity, find ML-related courses, and retrieve information like course details (description, prerequisites, format), current availability (seats in sections), and instructor information.

**Note:** In Module 2, you'll learn how to implement Retrieved Context using RAG (Retrieval Augmented Generation) - a powerful pattern for semantic and keyword search over large knowledge bases.

Let's simulate what retrieved context will look like by hard-coding course content. Run the code block below (it produces no output):

In [17]:
course_database = {
    "RU101": {
        "title": "Introduction to Redis Data Structures",
        "level": "Beginner",
        "description": "Master Redis fundamentals: strings, hashes, lists, sets, and sorted sets",
        "duration": "4-6 hours",
        "prerequisites": [],
        "topics": ["Data structures", "Basic commands", "Use cases"],
    },
    "RU201": {
        "title": "Redis for Python Developers",
        "level": "Intermediate",
        "description": "Build production Redis applications with Python and redis-py",
        "duration": "6-8 hours",
        "prerequisites": ["RU101", "Python experience"],
        "topics": ["redis-py library", "Connection pooling", "Pipelining", "Pub/Sub"],
    },
    "RU202": {
        "title": "Redis for Java Developers",
        "level": "Intermediate",
        "description": "Build production Redis applications with Java and Jedis",
        "duration": "6-8 hours",
        "prerequisites": ["RU101", "Java experience"],
        "topics": [
            "Jedis library",
            "Connection pooling",
            "Transactions",
            "Redis Streams",
        ],
    },
    "RU301": {
        "title": "Vector Similarity Search with Redis",
        "level": "Advanced",
        "description": "Implement semantic search and RAG systems with Redis vector capabilities",
        "duration": "8-10 hours",
        "prerequisites": ["RU201 or RU202", "ML/AI interest"],
        "topics": [
            "Vector embeddings",
            "Semantic search",
            "RAG architecture",
            "Hybrid search",
        ],
    },
}

Now let's simulate retrieving course information based on a query. Just like before, we'll need to transform the data. Run the code block below to create the function and simulate the retrieval.

In [18]:
# Create a retrieval function

def retrieve_course_info(course_code):
    """Retrieve detailed information about a specific course"""
    course = course_database.get(course_code)
    if not course:
        return None

    return f"""Course Details:
        Code: {course_code}
        Title: {course['title']}
        Level: {course['level']}
        Description: {course['description']}
        Duration: {course['duration']}
        Prerequisites: {', '.join(course['prerequisites']) if course['prerequisites'] else 'None'}
        Topics Covered: {', '.join(course['topics'])}
    """


# Retrieve information about RU201
retrieved_context = retrieve_course_info("RU201")
print(retrieved_context)

Course Details:
        Code: RU201
        Title: Redis for Python Developers
        Level: Intermediate
        Description: Build production Redis applications with Python and redis-py
        Duration: 6-8 hours
        Prerequisites: RU101, Python experience
        Topics Covered: redis-py library, Connection pooling, Pipelining, Pub/Sub
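
Retrieved context can also include computed information, such as the eligibility checks mentioned in the list above. The following is a minimal sketch under a simplifying assumption: prerequisites are plain course codes, held in a hypothetical `PREREQUISITES` map, rather than the free-text entries used in `course_database`. The function name `check_eligibility` is likewise illustrative.

```python
# Simplified, hypothetical prerequisite map: course code -> required course codes.
PREREQUISITES = {
    "RU101": [],
    "RU201": ["RU101"],
    "RU301": ["RU201"],
}


def check_eligibility(course_code, completed_courses):
    """Return (eligible, missing) for a course given the completed course codes."""
    if course_code not in PREREQUISITES:
        raise KeyError(f"Unknown course: {course_code}")
    missing = [c for c in PREREQUISITES[course_code] if c not in completed_courses]
    return len(missing) == 0, missing


print(check_eligibility("RU201", ["RU101"]))  # (True, [])
print(check_eligibility("RU301", ["RU101"]))  # (False, ['RU201'])
```

Computing the answer in code and passing the result to the model avoids asking the LLM to infer eligibility on its own.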
    


### Understanding Retrieval Methods Across Context Types

Now that we've seen all four context types, let's clarify how information is retrieved for each:

| Context Type | What It Contains | How It's Retrieved | When RAG Applies |
|--------------|------------------|-------------------|------------------|
| **System Context** | Role, instructions, rules | Static (hardcoded) | Never - always included |
| **User Context** | Profile + memories | **Lookup** (profile by user ID)<br>**+ RAG** (search memories) | When searching long-term memories (Module 4) |
| **Conversation Context** | Chat history | **Lookup** (messages by session ID)<br>**+ RAG** (search history) | When searching long conversations (advanced) |
| **Retrieved Context** | Domain knowledge | **RAG** (semantic/keyword search) | Always - this is the primary RAG use case (Module 2) |

**Key Insight:**
- **Lookup** = "Get me the record with this ID" (direct, fast, returns one specific item)
- **RAG** = "Find me the most relevant items for this query" (search, returns top-k matches)
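
The difference can be sketched in a few lines. This is only an illustration: a naive word-overlap score stands in for real semantic search (which would normally use embeddings), and `CATALOG`, `lookup`, and `search` are hypothetical names with made-up short descriptions.

```python
# Hypothetical mini catalog used only for this illustration.
CATALOG = {
    "RU101": "Introduction to Redis data structures and basic commands",
    "RU201": "Build Redis applications with Python and redis-py",
    "RU301": "Vector similarity search, semantic search and RAG with Redis",
}


def lookup(course_code):
    """Lookup: fetch exactly one record by its ID (None if it does not exist)."""
    return CATALOG.get(course_code)


def search(query, k=2):
    """Search: rank every record by word overlap with the query, return the top k."""
    query_words = set(query.lower().split())
    scored = []
    for code, description in CATALOG.items():
        words = set(description.lower().replace(",", "").split())
        overlap = len(query_words & words)
        if overlap > 0:
            scored.append((overlap, code))
    # Highest overlap first; ties are broken by course code for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [code for _, code in scored[:k]]


print(lookup("RU201"))
print(search("semantic search with redis"))  # ['RU301', 'RU201']
```

Lookup needs the ID up front and returns at most one item. Search starts from a free-form query and has to rank candidates, which is why its quality depends so heavily on the scoring method.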

**In this module**, we use simple lookup for User Context (hardcoded profile) and Conversation Context (hardcoded history).

**In Module 2**, you'll implement RAG for Retrieved Context (search over courses).

**In Module 4**, you'll see RAG applied to User Context (search over long-term memories).

---


### Bringing It All Together: Complete Context Integration

Now that we understand each context type individually, let's see how they work together to create an intelligent, personalized response.

Here's how all four context types combine in a single LLM call:

```
┌─────────────────────────────────────────────┐
│           COMPLETE LLM REQUEST              │
├─────────────────────────────────────────────┤
│ 1. SYSTEM CONTEXT (Static)                  │
│    - Role: "You are a course advisor"       │
│    - Domain: Available courses              │
│    - Rules: Guidelines and constraints      │
├─────────────────────────────────────────────┤
│ 2. USER CONTEXT (Dynamic - User Specific)   │
│    - Profile: Olivia Jansen, Python dev     │
│    - History: Completed RU101               │
│    - Goal: Build RAG system                 │
├─────────────────────────────────────────────┤
│ 3. CONVERSATION CONTEXT (Dynamic - Session) │
│    - User: "What course should I take?"     │
│    - AI: "I recommend RU201..."             │
│    - User: "How long will that take?"       │
├─────────────────────────────────────────────┤
│ 4. RETRIEVED CONTEXT (Dynamic - Query)      │
│    - RU201 course details                   │
│    - Duration, prerequisites, topics        │
├─────────────────────────────────────────────┤
│ RESULT: Personalized, context-aware answer  │
└─────────────────────────────────────────────┘
```

Let's add them together into a messages list:

In [None]:
# Step 1: Start with system context
messages = [{"role": "system", "content": system_context}]

# Step 2: Add user context
messages.append({"role": "system", "content": user_context})

# Step 3: Add conversation history
messages.extend(conversation_history)

# Step 4: Add retrieved context
messages.append({"role": "system", "content": retrieved_context})

# Check the final messages structure
for i, msg in enumerate(messages, 1):
    print(f"Message {i} ({msg['role']}):")
    print(f"{msg['content']}\n")
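
The same assembly steps can be wrapped in a reusable function. The sketch below assumes a hypothetical helper, `build_messages`, that follows the order used above (system, user, conversation, retrieved) and simply skips the retrieved slot when nothing was fetched. The inputs are shortened placeholder strings so the block runs on its own.

```python
def build_messages(system_context, user_context, conversation_history, retrieved_context=None):
    """Assemble the four context types into one chat-style messages list."""
    messages = [
        {"role": "system", "content": system_context},
        {"role": "system", "content": user_context},
    ]
    messages.extend(conversation_history)
    if retrieved_context:  # skip the slot when nothing was retrieved
        messages.append({"role": "system", "content": retrieved_context})
    return messages


demo_messages = build_messages(
    system_context="You are a Redis University course advisor.",
    user_context="Student Profile:\n- Name: Olivia Jansen",
    conversation_history=[
        {"role": "user", "content": "How long will that take me to complete?"}
    ],
    retrieved_context="Course Details:\nCode: RU201\nDuration: 6-8 hours",
)
print([m["role"] for m in demo_messages])  # ['system', 'system', 'user', 'system']
```

Centralizing assembly in one place makes it easier to experiment later with ordering, trimming, or dropping parts of the context.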

Finally, we can use the context to call the LLM:

> ⚠️ Note: You'll need your own [OpenAI API](https://platform.openai.com/api-keys) key to make the LLM call. Make sure it is available as the `OPENAI_API_KEY` environment variable (for example, in a `.env` file) before running the code block below.

In [None]:
import os

from dotenv import load_dotenv
from openai import OpenAI

# Load environment variables
load_dotenv()

# Initialize OpenAI client
client = OpenAI()  # Uses OPENAI_API_KEY from environment

# Make the LLM call with complete context
response = client.chat.completions.create(
    model="gpt-4o-mini", messages=messages, temperature=0.7
)

answer = response.choices[0].message.content
print("AI Response:")
print(answer)

To recap what just happened, we sent an assembled context to the LLM. It received all four context types and used them to generate a personalized response:

1. **System Context** told it to act as a course advisor and set the guidelines for its recommendations
2. **User Context** gave it Olivia's background, interests, and goals
3. **Conversation Context** showed that "that" refers to RU201
4. **Retrieved Context** provided detailed information about RU201's duration and topics

The result is a response that:
- Understands what course "that" refers to
- Considers Olivia's available time (evenings and weekends)
- Relates the duration to her specific situation
- Stays aligned with her goal of building a RAG system

While this was a simple example, it demonstrated all four layers of context that we will be managing throughout this course. As we build more complex agents, these same principles will apply, but the techniques for managing, optimizing, and orchestrating these contexts will become increasingly sophisticated.

Yet even with perfect context engineering, every AI system hits the same wall: the finite size of the model's context window. Let's explore this fundamental constraint next.

## Measuring Token Usage

Before we discuss context windows, let's build a practical tool to measure how many tokens our context actually uses. Understanding token consumption is essential for staying within limits and optimizing your context strategy.

Run the following code to create token estimation and analysis functions:

In [None]:
def estimate_tokens(text):
    """Rough token estimation (approximately 4 characters per token for English)"""
    return len(text) // 4


def analyze_context_usage(messages, model_limit=128000):
    """Analyze token usage across different context types in our messages"""
    usage = {"system": 0, "user": 0, "assistant": 0}

    for msg in messages:
        role = msg["role"]
        tokens = estimate_tokens(msg["content"])
        usage[role] += tokens

    total = sum(usage.values())
    remaining = model_limit - total
    percent_used = (total / model_limit) * 100

    print("=" * 50)
    print("CONTEXT TOKEN ANALYSIS")
    print("=" * 50)
    print(f"\nToken Usage by Role:")
    print(f"  System context:      {usage['system']:,} tokens")
    print(f"  User messages:       {usage['user']:,} tokens")
    print(f"  Assistant messages:  {usage['assistant']:,} tokens")
    print(f"\n  TOTAL:               {total:,} tokens")
    print(f"  Model limit:         {model_limit:,} tokens")
    print(f"  Remaining:           {remaining:,} tokens")
    print(f"  Usage:               {percent_used:.2f}%")
    print("=" * 50)

    return usage, total, remaining


# Analyze our current context
analyze_context_usage(messages)

This simple analysis shows that our current context uses only a small fraction of the available tokens. However, as conversations grow longer and we retrieve more documents, token usage can quickly balloon. Monitoring this helps you make informed decisions about what context to include or compress.
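One simple way to act on such measurements is to keep only the most recent messages that fit a token budget. The block below is a sketch of that idea, not a recommended production strategy. It reuses the same rough 4-characters-per-token heuristic, and the `trim_history` helper and the sample message sizes are assumptions made for illustration.

```python
def estimate_tokens(text):
    """Same rough heuristic as above: about 4 characters per token."""
    return len(text) // 4


def trim_history(history, max_tokens):
    """Keep the most recent messages whose estimated tokens fit within max_tokens."""
    kept = []
    used = 0
    for msg in reversed(history):  # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    {"role": "user", "content": "a" * 400},       # ~100 tokens
    {"role": "assistant", "content": "b" * 800},  # ~200 tokens
    {"role": "user", "content": "c" * 200},       # ~50 tokens
]
trimmed = trim_history(history, max_tokens=260)
print([m["role"] for m in trimmed])  # ['assistant', 'user']
```

Dropping the oldest turns is the bluntest option. Summarizing them instead preserves more meaning at a similar token cost, at the price of extra processing.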

## A Fundamental Challenge: Context Windows

Even the most powerful AI models can only process a fixed amount of text in a single request. This is called the **context window**, which is essentially the size of the desk the model works at. Here are the context window sizes of some of the most popular models available:

| Model | Context Window |
|-------|----------------|
| GPT-4o | 128,000 tokens (~96,000 words) |
| GPT-4o-mini | 128,000 tokens (~96,000 words) |
| Claude 3.5 Sonnet | 200,000 tokens (~150,000 words) |

This limit might look generous, but when we break it down, many different kinds of context compete for space. Here is an example context budget breakdown for a single turn using a model with a 128K-token window:

```
┌─────────────────────────────────────┐
│   CONTEXT WINDOW - 128K tokens      │
├─────────────────────────────────────┤
│ System Context    │ 2,000 tokens    │  ← AI's role and rules
│ User Context      │ 1,000 tokens    │  ← Your profile
│ Conversation      │ 4,000 tokens    │  ← What we've discussed
│ Retrieved Info    │ 5,000 tokens    │  ← Relevant data
│ Your Query        │ 100 tokens      │  ← Current question
│ Response Space    │ 4,000 tokens    │  ← AI's answer
├─────────────────────────────────────┤
│ TOTAL TOKENS      │ 16,100 tokens   │
│ REMAINING TOKENS  │ 111,900 tokens  │
└─────────────────────────────────────┘
```

That remaining space (111,900) might look comforting but remember this is just one turn. 
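The arithmetic behind that budget is easy to reproduce. The sketch below recomputes the totals from the table and then adds a rough projection. The growth rate of 1,000 tokens of history per turn is purely an assumed figure for illustration, not a measurement.

```python
CONTEXT_WINDOW = 128_000

# Per-turn budget taken from the breakdown above.
budget = {
    "system_context": 2_000,
    "user_context": 1_000,
    "conversation": 4_000,
    "retrieved_info": 5_000,
    "query": 100,
    "response_space": 4_000,
}

total = sum(budget.values())
remaining = CONTEXT_WINDOW - total
print(f"Total: {total:,} tokens")          # Total: 16,100 tokens
print(f"Remaining: {remaining:,} tokens")  # Remaining: 111,900 tokens
print(f"Used: {total / CONTEXT_WINDOW:.1%}")

# Assumption for illustration: conversation history grows by 1,000 tokens per turn.
ASSUMED_GROWTH_PER_TURN = 1_000
turns_until_full = remaining // ASSUMED_GROWTH_PER_TURN
print(f"Turns until the window is full: {turns_until_full}")
```

Under that assumption the window fills after roughly a hundred turns, and sooner if retrieved documents grow as well.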

![Context engineering overview](../public/agents-token-hungry.png)


Our inputs will likely only grow longer (especially conversation history), and as they do, models have a harder time producing effective results.

Additionally, research from [Chroma](https://www.trychroma.com/) describes a critical challenge in working with LLMs: [context rot](https://research.trychroma.com/context-rot), the phenomenon where LLM performance degrades non-uniformly as input context length increases, even on simple tasks.

If we summarized the research findings on context size in an AI system, they would break down as follows:

1. **Longer context ≠ better context**: Reliability drops as inputs grow, and the decline accelerates. This means we should treat the window like working memory; budget tokens when possible and keep the slice small.

2. **Relevance rules**: High semantic match drives accuracy; loosely related text hurts. We should prioritize optimizing retrieval precision (rerank, tighten filters, rewrite queries).

3. **Distractors are toxic**: Similar but wrong snippets do disproportionate damage as length increases. To reduce distractors, we should aggressively filter/threshold, dedupe, and avoid "just in case" context.

4. **Placement & structure matter**: Models attend unevenly across the window; key facts can be missed when buried. We should surface essentials early, chunk clearly, and summarize instead of pasting raw dumps. 
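
As a hedged illustration of points 2 to 4, the sketch below filters retrieved snippets by a relevance threshold, removes duplicates, and puts the strongest matches first. The `select_snippets` helper, the 0.75 threshold, and the sample scores are all assumptions. In practice the scores would come from your retriever or reranker.

```python
def select_snippets(snippets, min_score=0.75, max_snippets=3):
    """Filter, dedupe, and order retrieved snippets before adding them to context.

    Each snippet is a dict with "text" and a relevance "score" in [0, 1].
    """
    seen = set()
    selected = []
    # Highest score first, so the most relevant text is surfaced early.
    for snippet in sorted(snippets, key=lambda s: s["score"], reverse=True):
        normalized = " ".join(snippet["text"].lower().split())
        if snippet["score"] < min_score or normalized in seen:
            continue  # drop weak matches and exact duplicates
        seen.add(normalized)
        selected.append(snippet)
        if len(selected) == max_snippets:
            break
    return selected


retrieved = [
    {"text": "RU201 takes 6-8 hours.", "score": 0.92},
    {"text": "RU202 takes 6-8 hours.", "score": 0.71},   # similar, but another course
    {"text": "ru201 takes  6-8 hours.", "score": 0.90},  # duplicate after normalization
    {"text": "RU201 requires RU101.", "score": 0.80},
]
print([s["text"] for s in select_snippets(retrieved)])
```

Here the RU202 snippet plays the role of a distractor: it looks almost identical to the right answer, so excluding it matters more than its small size suggests.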


![Context engineering overview](../public/chroma_distractors.png)

**Figure**: Performance degradation with context length.  
*Source: Adapted from "Context Rot: How Increasing Input Tokens Impacts LLM Performance" 
(Chroma Research, 2025). [Link to report](https://research.trychroma.com/context-rot)*


Remember, every token spent on one thing is a token NOT available for another. This means you must constantly decide:

- Which context is most relevant?
- What can be omitted without hurting quality?
- When to retrieve more vs. use what you have?
- How to compress long conversations?

This makes context engineering a practice of optimization within constraints. As conversations grow longer, systems accumulate more data, and applications become more sophisticated, context management becomes increasingly critical.

## Other Challenges


![Context failure modes](../public/context-problems.png)

While managing the size of your context window is critical, simply fitting everything in isn't the only hurdle. As D. Breunig highlights in [How Contexts Fail (and How to Fix Them)](https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html), there are several other ways that context can degrade the performance of your agent, even when you are well within the token limits.

He mentions four key failure modes to watch out for:

1.  **Context Poisoning**: This occurs when an error or hallucination makes its way into your context (e.g., conversation history) and is then treated as fact in subsequent turns. The model "learns" the wrong information from its own previous output, leading to a loop of compounding errors.

2.  **Context Distraction**: As Chroma's research also highlights, having *too much* history can be detrimental. If an agent has a massive log of past actions, it may start to over-focus on repeating those past behaviors rather than reasoning about the current, novel situation. It relies on its "contextual habit" instead of its training.

3.  **Context Confusion**: This happens when the context is cluttered with superfluous information, such as too many available tools or irrelevant document snippets. Even if the model *could* ignore them, the mere presence of these "distractors" increases the probability of the model getting confused or trying to use a tool that isn't needed.

4.  **Context Clash**: This is when different parts of your context contradict each other. For example, your system prompt might say one thing, but a retrieved document or an early turn in the conversation history says the opposite. When the model faces this internal conflict, its reasoning capabilities often degrade significantly.

Understanding these failure modes reinforces why context engineering is not just about *compression*; it's about *curation*. We must actively manage, clean, and sometimes even "forget" context to keep our agents sharp.
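
As one concrete form of curation, the sketch below reduces context confusion by exposing only the tools that look relevant to the current query. Everything in it is an assumption for illustration: the `select_tools` helper, the tool names, and the word-overlap scoring, which is a crude stand-in for the embedding-based matching a production system would more likely use.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_tools(query, tools, max_tools=2):
    """Rank tools by word overlap with the query and keep only the top matches."""
    query_words = tokenize(query)
    scored = []
    for tool in tools:
        overlap = len(query_words & tokenize(tool["description"]))
        if overlap > 0:  # tools with no overlap are left out entirely
            scored.append((overlap, tool["name"]))
    # Sort by overlap (descending), then by name for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:max_tools]]


# Hypothetical tool catalog.
tools = [
    {"name": "search_courses", "description": "Search the course catalog by topic"},
    {"name": "check_prerequisites", "description": "Check prerequisites for a course"},
    {"name": "book_room", "description": "Reserve a study room on campus"},
    {"name": "get_weather", "description": "Get the local weather forecast"},
]
print(select_tools("Which course covers vector search?", tools))
```

Plain word overlap will also match on common words such as "the" or "a", so treat this as a demonstration of the idea (fewer, more relevant tools in context) rather than a recommended scoring method.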

In this course, we will not solve every one of these failure modes. Some, like context poisoning, remain complex open research problems in AI. However, we will systematically dismantle the most common ones by building a series of "progressive agents," each designed to overcome specific context limitations.

## Comparing Context Strategies

To see the real impact of context engineering, let's compare how the same query performs with different amounts of context. This demonstrates why adding the right context matters.

We'll test three strategies:
1. **Minimal** - System context only (generic responses)
2. **With User** - System + User context (personalized responses)
3. **With Retrieval** - System + User + Retrieved context (expert responses)

Run the following code to compare:

In [None]:
def test_context_strategies(user_profile, test_query):
    """Compare responses with different context strategies"""

    strategies = {
        "minimal": [
            {"role": "system", "content": system_context},
            {"role": "user", "content": test_query},
        ],
        "with_user": [
            {"role": "system", "content": system_context},
            {"role": "system", "content": format_user_context(user_profile)},
            {"role": "user", "content": test_query},
        ],
        "with_retrieval": [
            {"role": "system", "content": system_context},
            {"role": "system", "content": format_user_context(user_profile)},
            {"role": "system", "content": retrieve_course_info("RU201")},
            {"role": "system", "content": retrieve_course_info("RU301")},
            {"role": "user", "content": test_query},
        ],
    }

    print(f"Query: {test_query}")
    print("=" * 60)

    for strategy_name, messages in strategies.items():
        response = client.chat.completions.create(
            model="gpt-4o-mini", messages=messages, max_tokens=150
        )

        print(f"\n🔹 {strategy_name.upper()} strategy:")
        print(response.choices[0].message.content)
        print()


# Test with a sample query
test_context_strategies(olivia_profile, "What course should I take next?")

### What We Observed

Notice the dramatic difference in response quality:

| Strategy | Behavior | Result |
|----------|----------|--------|
| **Minimal** | Asks clarifying questions | "What's your background? What are your goals?" |
| **With User** | Personalizes immediately | "Based on your Python experience and RU101..." |
| **With Retrieval** | Provides specific details | Includes course duration, topics, prerequisites |

**Key Insight:** More relevant context = fewer back-and-forth interactions = better user experience.

- **Minimal context** requires 3-4 interactions to gather enough information
- **Rich context** provides a complete answer in 1 interaction

The challenge then becomes determining what context is "relevant" while managing token budgets.

## Best Practices for Context Engineering

Based on what we've learned, here are five essential practices to follow:

1. **Start simple, add complexity gradually** 
   - Begin with system context only
   - Add user context when you have it
   - Retrieve additional context only when needed

2. **Measure token usage continuously**
   - Track how much of your context window is being used
   - Set alerts before hitting limits
   - Monitor growth patterns over conversation length

3. **Optimize for relevance, not completeness**
   - Don't dump all available information into context
   - Retrieve only what's relevant to the current query
   - Filter and rank retrieved content by relevance

4. **Use clear, structured formatting**
   - Organize context with headers and sections
   - Use bullet points and key-value pairs
   - Make it easy for the model to find information

5. **Test different context combinations**
   - Experiment with what to include vs. exclude
   - Measure response quality with different strategies
   - Iterate based on real user interactions
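
Practices 2 and 4 can be combined in a few lines: format context into labeled sections, then check how much of the window it consumes before sending it. This is a minimal sketch under stated assumptions: `format_section` and `check_usage` are hypothetical helpers, the 1000-token window and 80% alert ratio are arbitrary, the profile values are illustrative, and the token estimate is a rough character-based heuristic rather than a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (about 4 characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)


def format_section(title, fields):
    """Render one context section as a header plus key-value bullet points."""
    lines = [f"## {title}"]
    lines += [f"- {key}: {value}" for key, value in fields.items()]
    return "\n".join(lines)


def check_usage(context, window_size, alert_ratio=0.8):
    """Return (tokens, ratio, alert) for a context string."""
    tokens = estimate_tokens(context)
    ratio = tokens / window_size
    return tokens, ratio, ratio >= alert_ratio


# Illustrative values only.
context = "\n\n".join([
    format_section("Student Profile", {"name": "Olivia", "completed": "RU101"}),
    format_section("Current Request", {"query": "What course should I take next?"}),
])
tokens, ratio, alert = check_usage(context, window_size=1000)
print(context)
print(f"~{tokens} tokens ({ratio:.1%} of window), alert={alert}")
```

Logging the ratio on every turn makes growth patterns visible over a conversation, and the alert flag gives a hook for trimming or summarizing before the limit is reached.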

## Wrap Up

In this introduction, we started to explore how real AI applications depend on carefully constructed context. Specifically, we covered the following:

1. Context engineering is the practice of deliberately shaping the information surrounding an application, rather than just crafting individual prompts.
2. There are four types of context we will focus on: system context, user context, conversation context, and retrieved context.
3. All LLMs have fixed context windows, creating a fundamental constraint where every token used for one purpose is unavailable for another.
4. Context rot is a phenomenon where LLM performance degrades as input context length increases, even on simple tasks.
5. Context failures such as distraction, confusion, and clash occur when irrelevant or contradictory information is present in the context window.

We are now ready to start building. In **Module 2: Data Engineering for Context**, we will learn how to prepare data for LLM consumption, the critical step before implementing RAG. We'll cover the data pipeline and chunking decisions, and see how context engineering can achieve a 91% token reduction.