![Redis](https://redis.io/wp-content/uploads/2024/04/Logotype.svg?auto=webp&quality=85,75&width=120)


# Introduction

<a href="https://colab.research.google.com/github/redislabs-training/ce-redis-langchain/blob/main/section-1-context-engineering-foundations/01_Introduction_to_context_engineering.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Over the past few years, AI applications have evolved rapidly. Early on, engineers built a polite clerk behind a glass window: users would slide a note ("draft this email," "explain this code"), and the clerk would read it, scribble an answer, and slide it back. The answers were generally good, but not always what we wanted.

Users quickly realized that if they engineered *precisely* what the note said, they could get better results from the clerk. So they wrote more precise instructions ("act as my editor," "use this tone," "return a table"), and the outputs became sharper. This became the practice of prompt engineering.

Prompt engineering took users further, but eventually showed its limits — especially with our clerk behind the glass. We kept hoping the wording would stick, but longer notes got slow and contradictory. Tiny phrasing swings changed behavior in ways that were hard to predict or control, and no amount of wordsmithing seemed to fully fix it.

For a while, if you were building with these systems, the playbook for solving these issues was simple: upgrade the model. Swap in the latest version and your clerk gives sharper responses, grasps more nuance, and becomes capable enough to attempt specialized tasks like research or creative exploration.

This did help! The clerk began to catch more nuance in the notes passed to it. It could juggle longer instructions, synthesize more information, and make fewer obvious mistakes. For many teams, it felt like the main knob to turn was simply: use a bigger, smarter model.

But under the hood, the old cracks were still there. The clerk was still working mostly from whatever happened to be on the note in front of them. They still forgot everything between one visit and the next. When they didn’t know something, they still guessed instead of checking. In other words, we’d upgraded the clerk, but not the glass, not the office, and not the way work flowed around them.

Then came agents.

Instead of a clerk answering one note at a time, we started asking that same clerk to run tasks, such as breaking down a goal, fetching information from the internet, and reading and updating complex code. Our polite assistant became a kind of project manager.

In our metaphor, it was as if we took the clerk from behind the glass and said: "Feel free to walk the halls. Use the phone, dig through the filing cabinets, talk to other departments, and come back with a finished result."

This unlocked striking capabilities — but it also made the old problems impossible to ignore. Small ambiguities in instructions ballooned over ten or twenty steps. Missing background information meant the clerk confidently walked in the wrong direction for minutes at a time. Every task and its associated subtasks came with its own little note — its own tiny prompt — and suddenly the clerk wasn't wrestling with one interaction, but with a swarm of them, all slightly inconsistent.

So, what do these issues look like if we were to crack open one of these agentic applications? Where do "missing background," "forgotten preferences," and that "swarm of tiny prompts" actually live?

In practice, they all appear as distinct components of context surrounding a model call: the state and history you carry forward, the prompts and policies you apply, the memories you load, the documents you retrieve, and the structured outputs you expect to receive back. Visually, it looks something like this:

![Context engineering overview](../public/ce-overview.png)

User input, operational databases, web APIs, and data warehouses all feed into the application’s context layer. Everything inside that dashed red circle is what we, as engineers, design and control before a single token ever reaches the LLM. Instead of only asking, "What’s the right prompt?" or "What’s the right model?", we are now shifting to ask, “What information should be present at every step of this process, and how do we control it?”

That shift — from crafting individual notes to deliberately shaping the information surrounding the application — is what has become known as context engineering.

Let's now ground this idea in an example application for a Redis University course agent and watch where context engineering makes the difference.

## Context Engineering in Action

Imagine that at most universities today, one of the biggest pain points for learners is the time it takes to research and select their classes for the year. In most cases, this is a tedious process: waiting for a registration time slot to open (more senior students often have priority), scrolling through pages of courses and descriptions, checking whether they meet the prerequisites, and so on. There are even dedicated roles whose job is to help students decide which classes are the right fit and plan their schedules.

As engineers on the Redis University Innovation Team, we hope to solve this problem with a seamless interface: an agent that speeds up the research and selection of a student's courses.

Now imagine we have built this agent and are running a pilot with a subset of students. Let's examine a common scenario we want the agent to handle: a student seeking advice on which classes will help build a career in AI.

We'll use a sample student profile as our base: 

- Name: Olivia Jansen
- Major: Computer Science
- Year: Sophomore
- Completed: Intro to Programming (CS101), Data Structures (CS201), Calculus I
- Interests: Machine learning, Data science
- Preferences: Prefers online courses, learns best with hands-on projects
- Goal: Build a career in AI

Olivia interacts with the agent, and the conversation unfolds as follows:

<details>
  <summary>Click the dropdown to review the conversation with the agent</summary>
  
**Turn 1:**

*Student:* "I'm interested in machine learning courses"

*Agent:* "Here are all machine learning courses available:
- CS301: Introduction to Machine Learning
- CS401: Advanced Machine Learning
- CS402: Deep Learning
- CS403: Natural Language Processing
- CS404: Computer Vision

All courses meet twice per week on campus."

---

**Turn 2:**

*Student:* "I prefer online courses."

*Agent:* "Here are all machine learning courses available:
- CS301: Introduction to Machine Learning
- CS401: Advanced Machine Learning
..."

Notice that the result is the same list, ignoring Olivia's preferences.

---

**Turn 3:**

*Student:* "Can I take CS401?"

*Agent:* "CS401 requires CS301 as a prerequisite. Do you meet this requirement?"

The agent cannot respond adequately because it doesn't know which courses the student has completed.

---

**Turn 4:**

*Student:* "What about that intro course you mentioned earlier?"

*Agent:* "I'm not sure which course you're referring to. Could you provide more details?"

The agent does not remember its old replies and has no context on what intro course was mentioned. 

---
</details>

This advisor is intentionally simple, but its failure modes mirror those of real systems: it has no memory (can’t recall prior turns), no personalization (ignores her preferences and background), and no awareness (doesn’t know her history, can’t check prerequisites, and cannot connect relevant information). The result is a frustrating experience that wastes time and delivers little value to the student.

We don't want to leave our students disappointed, so we apply some common context engineering techniques to the agent. Now let's examine the same kind of conversation with proper context management:

<details>
    <summary> Click the dropdown to review the conversation with the agent with proper context management </summary>

**Turn 1:**

*Student:* "I'm interested in machine learning courses."

*Agent:* "Great! Based on your Computer Science major and the courses you've completed (CS101 and CS201), you're ready for CS301: Introduction to Machine Learning.

Since you prefer online courses and hands-on learning, I should mention that CS301 has:
- A fully online section
- Weekly programming assignments with real datasets
- A final project where you build your own ML system

This course aligns perfectly with your goal of building an AI career and serves as the foundation for more advanced ML courses."

---

**Turn 2:**

*Student:* "That sounds perfect! What comes after that?"

*Agent:* "After completing CS301, you'll have several great options:

**Immediately Available:**
- CS402: Deep Learning (Online, project-based)
  Focus: Neural networks, CNNs, RNNs
  
- CS403: Natural Language Processing (Online, project-based)
  Focus: Text analysis, transformers, LLMs

**Recommended Path for AI Career:**
1. CS301 (Introduction) ← Start here
2. CS402 (Deep Learning) ← Most relevant to modern AI
3. CS403 (NLP) or CS404 (Computer Vision) ← Specialize

Given your interest in data science, I'd especially recommend the NLP path - it combines well with your data structures background."

---

**Turn 3:**

*Student:* "How long will that path take?"

*Agent:* "Let me break it down for you:

- **CS301:** 14 weeks (3 credits)
- **CS402:** 14 weeks (3 credits)
- **CS403:** 14 weeks (3 credits)

If you take one course per semester:
- Fall 2024: CS301
- Spring 2025: CS402
- Fall 2025: CS403

Since you're currently a sophomore, this timeline would have you complete the sequence well before graduation, leaving room for an internship or capstone project.

Alternatively, since these are online courses, you could potentially take CS301 this fall and CS402 in a winter intersession to accelerate your path."

---

**Turn 4:**

*Student:* "I like that accelerated option. Can I register now?"

*Agent:* "Yes! Since you've met the prerequisites for CS301, I can guide you through registration. The online section (CS301-W01) has 15 seats remaining.

Would you like me to:
1. Reserve a seat for you in CS301-W01 for Fall 2024?
2. Add you to the notification list for the Winter CS402 section (registration opens in October)?
3. Send you the course syllabus and professor information?"

</details>

The contrast with the first conversation is stark. With proper context engineering, the advisor carries the context instead of the student: it remembers who Olivia is (major, completed courses, preferences), filters to relevant options (online, project-based ML), checks prerequisites automatically, turns choices into a sequenced plan and timeline, and even proposes next actions (register, notify, send syllabi). The result is a fast, personalized, high-confidence experience that actually delivers value.

In our example, we waved a magic wand and the agent suddenly became more effective. So what actually goes into making this improvement happen? To understand the techniques involved, we first need to explore the types of context an AI system manages.

## Building a complete context: the four pillars

We can break down the context that goes into an AI application into four distinct types:

1. System context
2. User context
3. Conversation context
4. Retrieved context 

Each of these types plays a crucial role in making sure an AI system has what it needs to respond properly. In this section, we'll explore each context type in more detail and set up a sample context to call an LLM. As you go through, run each code block as we build up the context; we'll send it to an LLM at the end.

### 1. System Context: "What Am I?"

The system context defines the AI's identity, capabilities, and knowledge. It is static, universal, and always present.

It typically contains: 
- A role definition ("You are a course advisor")
- Available tools and actions (`search_courses(query)`)
- Domain knowledge and business rules ("CS401 requires CS301")
- Behavioral guidelines ("Ask one clarifying question at a time")

An example of system context (commonly called the system prompt) for our Redis University agent might look like this:

In [None]:
system_context = """You are a Redis University course advisor.

Guidelines:
1. Always provide specific course recommendations with clear reasoning
2. Consider the student's background, completed courses, and interests
3. Ensure prerequisites are met before recommending advanced courses
4. Be encouraging and supportive in your guidance
"""

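The prompt above covers only behavioral guidelines, while the component list earlier also mentions tools and business rules. The sketch below shows one way to assemble all of them into a single system prompt. `build_system_context` is a hypothetical helper, and the tool description and rules are illustrative values based on the sample course catalog used later in this notebook, not a real tool definition.

```python
def build_system_context(role, tools=(), rules=(), guidelines=()):
    """Assemble a system prompt from a role plus optional labeled sections."""
    lines = [role]
    if tools:
        lines += ["", "Available tools:"] + [f"- {tool}" for tool in tools]
    if rules:
        lines += ["", "Business rules:"] + [f"- {rule}" for rule in rules]
    if guidelines:
        # Number the guidelines, matching the style of the prompt above
        lines += ["", "Guidelines:"] + [f"{i}. {g}" for i, g in enumerate(guidelines, 1)]
    return "\n".join(lines) + "\n"


# Illustrative values; the tool signature and rules are assumptions for this sketch
structured_system_context = build_system_context(
    role="You are a Redis University course advisor.",
    tools=["search_courses(query): search the course catalog by keyword"],
    rules=["RU201 requires RU101", "RU301 requires RU201 or RU202"],
    guidelines=[
        "Always provide specific course recommendations with clear reasoning",
        "Ask one clarifying question at a time",
    ],
)
print(structured_system_context)
```

The result is stored in a separate variable, so the `system_context` used in the rest of the notebook is unchanged.
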
### 2. User Context: "Who Are You?"

The user context contains personal information about the specific user. It is typically dynamic, personalized, and retrieved from storage. 

It typically contains: 
- Profile information (major, year, background)
- Preferences and learning style ("prefers online/asynchronous")
- History and achievements ("completed CS101/CS201/Calc I")
- Goals and constraints ("Tue/Thu availability")

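We have already seen an example of user context in Olivia's profile. For the code examples that follow, we'll use a variant of her profile built around Redis University's Redis courses (RU101, RU201, and so on):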

In [None]:
# Step 1: Create a user profile as a dictionary
# Note: in production this would come from a database; here we hardcode it for demonstration purposes

olivia_profile = {
    "name": "Olivia Jansen",
    "background": "Python developer, 2 years experience",
    "completed_courses": ["RU101"],
    "interests": ["machine learning", "data science", "vector search"],
    "time_availability": "evenings and weekends",
    "goal": "Build a RAG system for my company's documentation",
}

Note: In production, the profile above would likely come from a database rather than being hardcoded. We would also typically write a function that formats the profile for inclusion in the context, something like this:

In [None]:
def format_user_context(profile):
    """Convert user profile dictionary to formatted context string"""
    return f"""Student Profile:
- Name: {profile['name']}
- Background: {profile['background']}
- Completed Courses: {', '.join(profile['completed_courses'])}
- Interests: {', '.join(profile['interests'])}
- Availability: {profile['time_availability']}
- Goal: {profile['goal']}
"""

user_context = format_user_context(olivia_profile)
print(user_context)
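
`format_user_context` assumes every field exists. A profile loaded from a database may be missing some, and `profile['background']` would then raise a `KeyError`. Below is a minimal sketch of a more forgiving formatter, assuming the same field names as `olivia_profile`. The helper name `format_user_context_safe` is hypothetical.

```python
def format_user_context_safe(profile):
    """Format a profile dict, skipping fields that are missing or empty."""
    labels = {
        "name": "Name",
        "background": "Background",
        "completed_courses": "Completed Courses",
        "interests": "Interests",
        "time_availability": "Availability",
        "goal": "Goal",
    }
    lines = ["Student Profile:"]
    for key, label in labels.items():
        value = profile.get(key)
        if not value:
            continue  # skips None, "" and [] instead of raising or printing blanks
        if isinstance(value, list):
            value = ", ".join(value)
        lines.append(f"- {label}: {value}")
    return "\n".join(lines) + "\n"


partial_profile = {"name": "Olivia Jansen", "completed_courses": [], "interests": ["vector search"]}
print(format_user_context_safe(partial_profile))
```

Omitting empty fields also keeps placeholder lines like "Completed Courses:" with nothing after them out of the context.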

### 3. Conversation Context: "What Have We Discussed?"

Conversation context is one of the most important parts of context because it represents the history of the current conversation. It is typically implemented as short-term memory, and it is dynamic, session-specific, and grows over time.

It typically contains: 
- Previous user messages
- Previous AI responses
- Decisions and commitments made
- Topics explored

We have also seen an example of conversation context earlier:
```
Turn 1:
User: "I'm interested in machine learning courses."
AI: "I recommend CS301: Introduction to Machine Learning..."

Turn 2:
User: "What comes after that course?"
[Current query - needs context from Turn 1 to understand "that course"]
AI: "After CS301, you can take CS402 or CS403..."

Turn 3:
User: "How long will that path take?"
[Current query - needs context from Turn 2 to understand "that path"]
```

Let's examine a simple implementation of conversation context: 

In [None]:
# Step 1: Start with an empty conversation history
conversation_history = []

# Step 2: Add the first user message
conversation_history.append(
    {"role": "user", "content": "What Redis course should I take next?"}
)

# Step 3: Add the AI's response (simulated)
conversation_history.append(
    {
        "role": "assistant",
        "content": """Based on your Python background and completion of RU101, 
I recommend RU201: Redis for Python Developers. This course will teach you 
how to build Redis applications using redis-py, which aligns perfectly with 
your goal of building a RAG system.""",
    }
)

# Step 4: Add a follow-up question that references previous context
conversation_history.append(
    {"role": "user", "content": "How long will that take me to complete?"}
)


Notice the user said "that" instead of "RU201". The AI needs the conversation history to understand what "that" refers to. Let's view the complete conversation history:

In [None]:
for i, msg in enumerate(conversation_history, 1):
    print(f"Message {i} ({msg['role']}):")
    print(f"{msg['content']}\n")
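
Because conversation context grows with every turn, many systems cap how much of it they send. Below is a minimal sliding-window sketch. The `trim_history` helper and its `max_messages` default are hypothetical choices, not part of any library. It also drops leading assistant messages so the window starts on a user turn.

```python
def trim_history(history, max_messages=6):
    """Return the most recent messages, starting on a user turn (sliding window)."""
    if max_messages <= 0:
        return []  # guard needed: history[-0:] would return the whole list
    window = history[-max_messages:]
    # Drop leading non-user messages so the window doesn't open mid-exchange
    while window and window[0]["role"] != "user":
        window = window[1:]
    return window


history = [
    {"role": "user", "content": "What Redis course should I take next?"},
    {"role": "assistant", "content": "I recommend RU201: Redis for Python Developers..."},
    {"role": "user", "content": "How long will that take me to complete?"},
]
print(trim_history(history, max_messages=2))
```

With `max_messages=2`, the window loses the RU201 recommendation, so "that" in the last question becomes ambiguous again. This is the same failure seen in Olivia's first conversation. Trimming is cheap but blunt, and summarizing older turns is a common alternative.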

### 4. Retrieved Context: "What Information Is Relevant?"

Lastly, there is retrieved context: information fetched on demand from other systems, based on the current query. It is dynamic and highly targeted.

It typically contains: 
- Database records (course details, schedules)
- Search results (relevant documents, FAQs)
- API responses (real-time data, availability)
- Computed information (eligibility checks, recommendations)
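
Eligibility checks are a good example of computed information. Code can resolve them before the model is called, rather than relying on the model to reason them out. Below is a minimal sketch. It assumes prerequisites are stored as strings such as "RU101" or "RU201 or RU202", the format of the sample Redis University catalog in this section. Free-text requirements such as "Python experience" are left unchecked. The function name and the course-code pattern are assumptions.

```python
import re

# Assumed course-code shape: 2-4 capital letters followed by 3 digits (RU101, CS301)
COURSE_CODE = re.compile(r"^[A-Z]{2,4}\d{3}$")


def missing_prerequisites(completed_courses, prerequisites):
    """Return course-code prerequisites the student has not satisfied.

    Each prerequisite is either one code ("RU101") or alternatives joined by
    " or " ("RU201 or RU202"). Free-text requirements are skipped, since they
    can't be checked against a list of completed courses.
    """
    completed = set(completed_courses)
    missing = []
    for prereq in prerequisites:
        options = [option.strip() for option in prereq.split(" or ")]
        if not all(COURSE_CODE.match(option) for option in options):
            continue  # e.g. "Python experience": leave it to the LLM or a human advisor
        if completed.isdisjoint(options):
            missing.append(prereq)
    return missing


print(missing_prerequisites(["RU101"], ["RU101", "Python experience"]))  # -> []
print(missing_prerequisites(["RU101"], ["RU201 or RU202", "ML/AI interest"]))  # -> ['RU201 or RU202']
```
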

For example, if a student asked the Redis University agent about "CS301", the system would need to gather the context relevant to that query. It might query a course catalog database for CS301 and retrieve the course details (description, prerequisites, format), current availability (X seats in the online section), and professor ratings and reviews.

Let's simulate what retrieved context will look like:

In [None]:
# Step 1: Simulate a course database
# In production, this data would live in Redis or another data store
course_database = {
    "RU101": {
        "title": "Introduction to Redis Data Structures",
        "level": "Beginner",
        "description": "Master Redis fundamentals: strings, hashes, lists, sets, and sorted sets",
        "duration": "4-6 hours",
        "prerequisites": [],
        "topics": ["Data structures", "Basic commands", "Use cases"],
    },
    "RU201": {
        "title": "Redis for Python Developers",
        "level": "Intermediate",
        "description": "Build production Redis applications with Python and redis-py",
        "duration": "6-8 hours",
        "prerequisites": ["RU101", "Python experience"],
        "topics": ["redis-py library", "Connection pooling", "Pipelining", "Pub/Sub"],
    },
    "RU202": {
        "title": "Redis for Java Developers",
        "level": "Intermediate",
        "description": "Build production Redis applications with Java and Jedis",
        "duration": "6-8 hours",
        "prerequisites": ["RU101", "Java experience"],
        "topics": [
            "Jedis library",
            "Connection pooling",
            "Transactions",
            "Redis Streams",
        ],
    },
    "RU301": {
        "title": "Vector Similarity Search with Redis",
        "level": "Advanced",
        "description": "Implement semantic search and RAG systems with Redis vector capabilities",
        "duration": "8-10 hours",
        "prerequisites": ["RU201 or RU202", "ML/AI interest"],
        "topics": [
            "Vector embeddings",
            "Semantic search",
            "RAG architecture",
            "Hybrid search",
        ],
    },
}

Now let's simulate retrieving course information based on a query.

In [None]:
# Step 2: Create a retrieval function

def retrieve_course_info(course_code):
    """Retrieve detailed information about a specific course"""
    course = course_database.get(course_code)
    if not course:
        return None

    # Keep the template flush-left so no stray indentation leaks into the context
    return f"""Course Details:
Code: {course_code}
Title: {course['title']}
Level: {course['level']}
Description: {course['description']}
Duration: {course['duration']}
Prerequisites: {', '.join(course['prerequisites']) if course['prerequisites'] else 'None'}
Topics Covered: {', '.join(course['topics'])}
"""


# Retrieve information about RU201
retrieved_context = retrieve_course_info("RU201")
print(retrieved_context)
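
`retrieve_course_info` needs an exact course code, but students usually ask in plain language. The `search_courses(query)` tool mentioned in the system context section could be sketched as a naive keyword match over titles, descriptions, and topics. This is a stand-in for real retrieval, such as vector or full-text search in Redis. The stopword list, scoring, and `limit` parameter are illustrative assumptions. The block defines its own small excerpt of the catalog so it runs on its own.

```python
import re

STOPWORDS = {"a", "an", "the", "for", "and", "or", "with", "to", "of", "in", "on",
             "i", "what", "how", "should", "take"}


def tokenize(text):
    """Lowercase a string and split it into word tokens (keeps hyphens, e.g. redis-py)."""
    return set(re.findall(r"[a-z0-9-]+", text.lower()))


def search_courses(query, catalog, limit=3):
    """Rank courses by how many query keywords appear in their title, description, or topics."""
    keywords = tokenize(query) - STOPWORDS
    scored = []
    for code, course in catalog.items():
        course_words = tokenize(" ".join([course["title"], course["description"], *course["topics"]]))
        score = len(keywords & course_words)
        if score > 0:
            scored.append((score, code))
    # Highest score first; ties broken alphabetically by course code
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [code for _, code in scored[:limit]]


# A small excerpt of the sample catalog so this cell runs on its own
mini_catalog = {
    "RU201": {
        "title": "Redis for Python Developers",
        "description": "Build production Redis applications with Python and redis-py",
        "topics": ["redis-py library", "Connection pooling", "Pipelining", "Pub/Sub"],
    },
    "RU301": {
        "title": "Vector Similarity Search with Redis",
        "description": "Implement semantic search and RAG systems with Redis vector capabilities",
        "topics": ["Vector embeddings", "Semantic search", "RAG architecture", "Hybrid search"],
    },
}

print(search_courses("semantic search for RAG", mini_catalog))
```

The matching codes can then be passed to `retrieve_course_info` to build the retrieved context.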

### Bringing It All Together: Complete Context Integration

Now that we understand each context type individually, let's see how they work together to create an intelligent, personalized response.

Here's how all four context types combine in a single LLM call:

```
┌─────────────────────────────────────────────┐
│           COMPLETE LLM REQUEST              │
├─────────────────────────────────────────────┤
│ 1. SYSTEM CONTEXT (Static)                  │
│    - Role: "You are a course advisor"       │
│    - Domain: Available courses              │
│    - Rules: Guidelines and constraints      │
├─────────────────────────────────────────────┤
│ 2. USER CONTEXT (Dynamic - User Specific)   │
│    - Profile: Olivia Jansen, Python dev     │
│    - History: Completed RU101               │
│    - Goal: Build RAG system                 │
├─────────────────────────────────────────────┤
│ 3. CONVERSATION CONTEXT (Dynamic - Session) │
│    - User: "What course should I take?"     │
│    - AI: "I recommend RU201..."             │
│    - User: "How long will that take?"       │
├─────────────────────────────────────────────┤
│ 4. RETRIEVED CONTEXT (Dynamic - Query)      │
│    - RU201 course details                   │
│    - Duration, prerequisites, topics        │
├─────────────────────────────────────────────┤
│ RESULT: Personalized, context-aware answer  │
└─────────────────────────────────────────────┘
```

Let's add them together:

In [None]:
# Step 1: Start with system context
messages = [{"role": "system", "content": system_context}]

# Step 2: Add user context
messages.append({"role": "system", "content": user_context})

# Step 3: Add conversation history
messages.extend(conversation_history)

# Step 4: Add retrieved context
messages.append({"role": "system", "content": retrieved_context})
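
`retrieve_course_info` returns `None` when a course code isn't found. In that case, the cell above would append a system message whose content is `None`. Below is a minimal sketch of a hypothetical `build_messages` helper. It keeps the same order as above but skips empty pieces.

```python
def build_messages(system_context, user_context, conversation_history, retrieved_context):
    """Combine the four context types in the same order as above, skipping empty parts."""
    messages = [{"role": "system", "content": system_context}]
    if user_context:
        messages.append({"role": "system", "content": user_context})
    # Shallow-copy each message so editing `messages` later won't modify the original history dicts
    messages.extend(dict(message) for message in conversation_history)
    if retrieved_context:
        messages.append({"role": "system", "content": retrieved_context})
    return messages


example_messages = build_messages(
    system_context="You are a Redis University course advisor.",
    user_context="Student Profile:\n- Name: Olivia Jansen\n",
    conversation_history=[{"role": "user", "content": "How long does RU999 take?"}],
    retrieved_context=None,  # e.g. the course code wasn't found in the catalog
)
for message in example_messages:
    print(message["role"], "->", message["content"][:40])
```
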

And now, we can use the context to call the LLM:

In [None]:
import os

from dotenv import load_dotenv
from openai import OpenAI

# Load environment variables
load_dotenv()

# Initialize OpenAI client
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Make the LLM call with complete context
response = client.chat.completions.create(
    model="gpt-4o-mini", messages=messages, temperature=0.7
)

answer = response.choices[0].message.content
print("AI Response:")
print(answer)

The LLM received all four context types and used them to generate a personalized response:

1. **System Context** told it to act as a course advisor and follow our advising guidelines
2. **User Context** gave it Olivia's background, interests, and goals
3. **Conversation Context** showed that "that" refers to RU201
4. **Retrieved Context** provided detailed information about RU201's duration and topics

The result should be a response that:
- Understands what course "that" refers to
- Considers Olivia's available time (evenings and weekends)
- Relates the duration to her specific situation
- Stays aligned with her goal of building a RAG system

When building complex AI systems, you'll work with all four of these context types and will need to engineer each of them to make the system more effective.
But no matter which model you use, one bottleneck never moves: what fits in the model’s context window.

## A Fundamental Challenge: Context Windows

Even the most powerful AI models can only process a fixed amount of text in a single request. This limit is called the **context window**, and it is essentially the size of our metaphorical clerk's desk. Here are the context window sizes of some of the most popular models available:

| Model | Context Window |
|-------|----------------|
| GPT-4o | 128,000 tokens (~96,000 words) |
| GPT-4o-mini | 128,000 tokens (~96,000 words) |
| Claude 3.5 Sonnet | 200,000 tokens (~150,000 words) |

This limit might look generous, but once we break it down, many different kinds of context are competing for space. Here is an example budget for a single turn with a model that has a 128K-token context window:

```
┌─────────────────────────────────────┐
│   CONTEXT WINDOW - 128K tokens      │
├─────────────────────────────────────┤
│ System Context    │ 2,000 tokens    │  ← AI's role and rules
│ User Context      │ 1,000 tokens    │  ← Your profile
│ Conversation      │ 4,000 tokens    │  ← What we've discussed
│ Retrieved Info    │ 5,000 tokens    │  ← Relevant data
│ Your Query        │ 100 tokens      │  ← Current question
│ Response Space    │ 4,000 tokens    │  ← AI's answer
├─────────────────────────────────────┤
│ TOTAL TOKENS      │ 16,100 tokens   │
│ REMAINING TOKENS  │ 111,900 tokens  │
└─────────────────────────────────────┘
```
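
The budget arithmetic above is easy to automate. Below is a minimal sketch that reproduces the table's totals and refuses any budget that overflows the window. `estimate_tokens` uses the rough rule of thumb of about 4 characters per English token. This is an approximation only. For exact counts you would use the model's own tokenizer, such as OpenAI's `tiktoken` library. The function names are hypothetical.

```python
import math

CONTEXT_WINDOW = 128_000  # GPT-4o / GPT-4o-mini, per the table above

# Per-turn budget from the breakdown above (tokens)
budget = {
    "system_context": 2_000,
    "user_context": 1_000,
    "conversation": 4_000,
    "retrieved_info": 5_000,
    "query": 100,
    "response": 4_000,
}


def remaining_tokens(budget, window=CONTEXT_WINDOW):
    """Tokens left after reserving every budget line, including the response."""
    used = sum(budget.values())
    if used > window:
        raise ValueError(f"Budget uses {used} tokens, exceeding the {window}-token window")
    return window - used


def estimate_tokens(text):
    """Rough estimate using the ~4 characters per token rule of thumb for English."""
    return math.ceil(len(text) / 4)


print(sum(budget.values()), remaining_tokens(budget))  # 16100 111900
```
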

That remaining space (111,900 tokens) might look comforting, but remember that this is just one turn. Our inputs will likely only grow longer (especially the conversation history), and as they do, models have a harder time producing effective results. In fact, [Chroma](https://www.trychroma.com/) has documented a critical challenge in working with LLMs: [context rot](https://research.trychroma.com/context-rot), the phenomenon where LLM performance degrades non-uniformly as input context length increases, even on simple tasks.

If we summarize the research findings on context size in an AI system, they break down as follows:

1. Longer context ≠ better context: Reliability drops as inputs grow, and the decline can accelerate. This means we should treat the window like working memory: budget tokens where possible and keep the slice small.

2. Relevance rules: High semantic match drives accuracy; loosely related text hurts. We should prioritize optimizing retrieval precision (rerank, tighten filters, rewrite queries).

3. Distractors are toxic: Similar but wrong snippets do disproportionate damage as length increases. To reduce distractors, we should aggressively filter/threshold, dedupe, and avoid “just in case” context.

4. Placement & structure matter: Models attend unevenly across the window; key facts can be missed when buried. We should surface essentials early, chunk clearly, and summarize instead of pasting raw dumps. 
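
Several of these findings translate directly into code that runs before the LLM call. Below is a minimal sketch that applies a relevance threshold, removes duplicates, orders snippets best-first so the essentials appear early, and stops at a token budget. It assumes each retrieved snippet arrives with a relevance score in [0, 1], for example from a reranker. The snippets, scores, threshold, and budget are made up for illustration, and `select_snippets` is a hypothetical helper.

```python
import math


def approx_tokens(text):
    """Rough token estimate (~4 characters per token)."""
    return math.ceil(len(text) / 4)


def select_snippets(snippets, min_score=0.75, token_budget=500):
    """Keep high-relevance, non-duplicate snippets, best first, within a token budget.

    snippets: list of (score, text) pairs; higher scores mean more relevant.
    """
    selected, seen, used = [], set(), 0
    for score, text in sorted(snippets, key=lambda pair: pair[0], reverse=True):
        if score < min_score:
            break  # list is sorted, so every remaining snippet is below the threshold
        key = " ".join(text.lower().split())  # catch duplicates that differ only in case or spacing
        if key in seen:
            continue
        cost = approx_tokens(text)
        if used + cost > token_budget:
            continue  # too big for what's left; a smaller snippet may still fit
        seen.add(key)
        selected.append(text)
        used += cost
    return selected


# Hypothetical retrieval results with made-up relevance scores
snippets = [
    (0.92, "RU201 duration: 6-8 hours"),
    (0.91, "RU201 duration:  6-8 hours"),  # duplicate apart from whitespace
    (0.80, "RU201 prerequisites: RU101, Python experience"),
    (0.40, "RU202 covers Jedis for Java developers"),  # similar but off-target distractor
]
print(select_snippets(snippets))
```
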

Remember, every token spent on one thing is a token NOT available for another. This means you must constantly decide:

- Which context is most relevant?
- What can be omitted without hurting quality?
- When to retrieve more vs. use what you have?
- How to compress long conversations?

This makes context engineering a practice of optimization within constraints. As conversations grow longer, systems accumulate more data, and applications become more sophisticated, context management becomes increasingly critical.

## Other Challenges

While managing the size of your context window is critical, simply fitting everything in isn't the only hurdle. As D. Breunig highlights in [How Contexts Fail (and How to Fix Them)](https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html), there are several other ways that context can degrade the performance of your agent, even when you are well within the token limits.

He mentions four key failure modes to watch out for:

1.  **Context Poisoning**: This occurs when an error or hallucination makes its way into your context (e.g., conversation history) and is then treated as fact in subsequent turns. The model "learns" the wrong information from its own previous output, leading to a loop of compounding errors.

2.  **Context Distraction**: Sometimes, having *too much* history can be detrimental. If an agent has a massive log of past actions, it may start to over-focus on repeating those past behaviors rather than reasoning about the current, novel situation. It relies on its "contextual habit" instead of its training.

3.  **Context Confusion**: This happens when the context is cluttered with superfluous information—such as too many available tools or irrelevant document snippets. Even if the model *could* ignore them, the mere presence of these "distractors" increases the probability of the model getting confused or trying to use a tool that isn't needed.

4.  **Context Clash**: This is when different parts of your context contradict each other. For example, your system prompt might say one thing, but a retrieved document or an early turn in the conversation history says the opposite. When the model faces this internal conflict, its reasoning capabilities often degrade significantly.

Understanding these failure modes reinforces why context engineering is not just about *compression*—it's about *curation*. We must actively manage, clean, and sometimes even "forget" context to keep our agents sharp.

In this course, we will not solve every one of these failure modes. Some, like context poisoning, remain complex open research problems in AI. However, we will systematically dismantle the most common ones by building a series of "progressive agents," each designed to overcome specific context limitations.