# Stage 1: The Baseline RAG Agent

Now that you've established some working knowledge of system context, it's time to start exploring our agent's capabilities and, more importantly, its limitations. More specifically, we'll do the following:

1.  Explore the baseline architecture of the simple RAG agent. Our Stage 1 Agent takes the simplest possible approach: Find relevant courses and give the LLM *everything* about them.
2.  Witness "Information Overload": See firsthand what happens when you retrieve *too much* context.
3.  Analyze the Cost: Measure the token usage of a naive approach.

Let's now dive in by going over how the agent works.

## Agent Overview

First, an overview of the code. If you want to explore on your own, the baseline agent lives in the `progressive_agents/stage1_baseline_rag/` directory. You can reference the code at any time throughout this lesson.

We are using the following technologies in the stack:

- **LangGraph**: The orchestrator. It manages the control flow of our application, defining how data moves between steps (nodes). If you're not familiar with the components of LangGraph, we recommend you check out their overview on ["Thinking in LangGraph"](https://docs.langchain.com/oss/python/langgraph/thinking-in-langgraph).
- **Redis**: Our datastore. It serves as our vector database, allowing us to perform semantic search to find relevant courses.
- **LangChain**: The connector. It provides the standard interfaces for interacting with LLMs and prompts.
- **OpenAI (GPT-4o-mini)**: The reasoning model. It generates the answer from the retrieved context.

> **A note on architecture**: In a real-world scenario, a simple RAG pipeline doesn't necessarily require a full "agent" architecture with state management and graph orchestration. A simple function chain would often suffice. However, for this course, we have implemented it as a basic Agent using LangGraph. This allows us to introduce the core components (state, nodes, workflow) from the start, providing a solid foundation for adding complexity, such as memory, tools, and decision-making, in later stages.

The agent has three important components:

### 1. LangGraph Nodes (agent/nodes.py)

The logic is split into two functions (nodes):
*   `research_node`:
    *   Searches Redis for the top 5 courses matching the user's query.
    *   Note that it retrieves the FULL hierarchical data for all 5 courses. This includes every single week of the syllabus, every homework assignment, and every reading list.
*   `synthesize_node`:
    *   Receives the massive block of text about the 5 courses.
    *   Sends it all to the LLM with a prompt to answer the user's question.

### 2. LangGraph State (agent/state.py)

We use a `TypedDict` to pass data between nodes.
```python
from typing import TypedDict


class AgentState(TypedDict):
    query: str              # The query sent to the LLM
    raw_context: str        # The JSON blob of course data retrieved
    final_answer: str       # The LLM's response
    total_tokens: int       # Tracking token usage
```

### 3. LangGraph Workflow (agent/workflow.py)

We use LangGraph to orchestrate the flow. It's a linear graph:

```mermaid
graph LR
    START([Start]) --> Research[Research Node]
    Research --> Synthesize[Synthesize Node]
    Synthesize --> END([End])
    
    style Research fill:#ff9999,stroke:#333,stroke-width:2px
    style Synthesize fill:#99ccff,stroke:#333,stroke-width:2px
```
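As a rough illustration of how a linear graph like this can be wired, here is a minimal sketch. It is not the code in `agent/workflow.py`: the two node bodies are placeholders (no Redis search, no LLM call), and the node names `"research"` and `"synthesize"` are assumptions. If LangGraph is not installed, the sketch falls back to calling the nodes in order, which is all a linear graph amounts to.

```python
from typing import TypedDict


class AgentState(TypedDict):
    query: str
    raw_context: str
    final_answer: str
    total_tokens: int


def research_node(state: dict) -> dict:
    # Placeholder for the Redis vector search: returns a fixed string
    # instead of the full course records.
    return {"raw_context": f"[full records for courses matching: {state['query']}]"}


def synthesize_node(state: dict) -> dict:
    # Placeholder for the LLM call: no model is invoked, so no tokens are counted.
    answer = f"Answer based on {len(state['raw_context'])} characters of context."
    return {"final_answer": answer, "total_tokens": 0}


def run_linear(state: dict) -> dict:
    """Fallback: call the nodes in order and merge their partial updates."""
    for node in (research_node, synthesize_node):
        state = {**state, **node(state)}
    return state


try:
    from langgraph.graph import StateGraph, START, END

    builder = StateGraph(AgentState)
    builder.add_node("research", research_node)
    builder.add_node("synthesize", synthesize_node)
    builder.add_edge(START, "research")
    builder.add_edge("research", "synthesize")
    builder.add_edge("synthesize", END)
    run = builder.compile().invoke
except ImportError:
    run = run_linear

result = run({"query": "What computer science courses are available?"})
print(result["final_answer"])
```

Each node returns only the keys it changes, and those partial updates are merged into the shared state. That is why `synthesize_node` can read the `raw_context` written by `research_node`.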


Let's now set up the agent and jump into seeing the context we are working with.

## Setup

Just like before, we'll need to import the agent code from the `progressive_agents` directory. Run the code block below.

In [None]:
# Set up the notebook so it can use the provided OpenAI API key and import the agent code

import sys
import os
from pathlib import Path

if "OPENAI_API_BASE" in os.environ:
    os.environ["OPENAI_BASE_URL"] = os.environ["OPENAI_API_BASE"]

project_root = Path("..").resolve()

stage1_path = project_root / "progressive_agents" / "stage1_baseline_rag"
src_path = project_root / "src"

sys.path.insert(0, str(src_path))
sys.path.insert(0, str(stage1_path))

print('OpenAI API key and agent access setup!')

Again, just like before, we will use the `setup_agent` helper. This function does three things:
*   It connects to your Redis instance.
*   It checks if the course data exists.
*   If not, it generates 50 sample courses and loads them into Redis.
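The check-then-load behavior described above can be sketched as follows. This is not the real `setup_agent` implementation: `InMemoryCourseStore`, `generate_sample_courses`, and `ensure_courses_loaded` are hypothetical names, and a plain dictionary stands in for Redis so the example runs without a server.

```python
class InMemoryCourseStore:
    """Hypothetical stand-in for the Redis-backed course store."""

    def __init__(self) -> None:
        self._courses = {}

    def count(self) -> int:
        return len(self._courses)

    def load(self, courses: list) -> None:
        for course in courses:
            self._courses[course["course_code"]] = course


def generate_sample_courses(n: int = 50) -> list:
    # Hypothetical generator: the real sample data is much richer.
    return [
        {"course_code": f"CS{i:03d}", "title": f"Sample Course {i}"}
        for i in range(1, n + 1)
    ]


def ensure_courses_loaded(store: InMemoryCourseStore, auto_load_courses: bool = True) -> int:
    """Load sample data only when the store is empty; return how many courses were added."""
    if store.count() > 0 or not auto_load_courses:
        return 0
    courses = generate_sample_courses()
    store.load(courses)
    return len(courses)


store = InMemoryCourseStore()
first = ensure_courses_loaded(store)   # store is empty, so data is generated
second = ensure_courses_loaded(store)  # data already exists, so nothing is loaded
print(first, second)  # prints: 50 0
```

Checking before loading makes the setup safe to run repeatedly, which would explain why only the first run takes noticeably longer.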

Unlike Stage 0, the agent now prints more verbose output each time it runs, which makes it easier to see what it is doing. Run the code block below to start the agent.

> ⚠️ **Note**: This might take a few seconds the first time you run it.

In [None]:
from agent import setup_agent

print("Initializing Stage 1 Agent...")
# auto_load_courses=True ensures we have data to query
workflow, course_manager = setup_agent(auto_load_courses=True)
print("Agent is ready!")

## Drowning in Data

Imagine a student just wants a quick list of options. They ask:

> *"What computer science courses are available?"*

If this were a human advisor, they would likely reply with something like: *"We have CS001 (Intro to ML) and CS002 (Deep Learning)."*

But will that be what our agent does? Let's observe. Run the code block below to send the query.

In [None]:
# Define the user's query
query = "What computer science courses are available?"

print(f"User asks: '{query}'")
print("Running workflow...")

# Run the graph!
# We use .ainvoke() because our agent is async
result = await workflow.ainvoke({"query": query})

print("Workflow complete!")

In most cases, the agent answers the question well and lists a few computer science courses. But pay close attention to the token usage. Let's examine the metrics more closely.

Run the code below to see the answer and the total tokens used.

In [None]:
# Display the Answer
print("="*60)
print(f"Agent Answer:\n\n{result['final_answer']}")
print("="*60)

# Display the Metrics
courses_found = result.get('courses_found', 0)
total_tokens = result.get('total_tokens', 0)

print(f"\nStatistics:")
print(f"   Courses Retrieved: {courses_found}")
print(f"   Total Tokens Used: {total_tokens:,}")

Take a look at the total number of tokens. It is likely over 6,000. This means that for a simple question like *"What computer science courses are available?"*, we used enough tokens to write a short essay. Why is it so big? Let's take a closer look at the `raw_context` that was sent to the LLM.

Run the code block below to examine the raw context.

In [None]:
# Let's look at the first 2000 characters of the context sent to the LLM
raw_context = result.get('raw_context', '')

print(f"Total Context Size: {len(raw_context):,} characters")
print("-" * 40)
print("PREVIEW OF CONTEXT SENT TO LLM")
print("-" * 40)
remaining = max(len(raw_context) - 2000, 0)
print(raw_context[:2000] + f"\n\n... [TRUNCATED {remaining:,} CHARACTERS] ...")

Notice what is in that context:
*   `"week_number"` entries listing the topics covered in class each week, from the first week to the last
*   `"assignments"` with detailed lists of every homework
*   `"grading_policy"` with breakdowns of percentages
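To see which fields account for most of the size, you can measure the serialized length of each top-level field in a course record. The sketch below uses a made-up record rather than the real data: only `week_number`, `assignments`, and `grading_policy` come from the context above, and every other field name and value is an assumption.

```python
import json

# Hypothetical course record. Field names other than week_number,
# assignments, and grading_policy are assumptions for illustration.
course = {
    "course_code": "CS001",
    "title": "Intro to ML",
    "description": "A first course in machine learning.",
    "syllabus": [
        {"week_number": w, "topics": [f"Topic {w}.{i}" for i in range(1, 4)]}
        for w in range(1, 15)
    ],
    "assignments": [
        {"title": f"Homework {n}", "description": "Problem set " * 10}
        for n in range(1, 9)
    ],
    "grading_policy": {"homework": 40, "midterm": 25, "final": 35},
}

total = len(json.dumps(course))
# Size of each field's value only; keys and separators are excluded,
# so the shares add up to slightly less than 100%.
sizes = {key: len(json.dumps(value)) for key, value in course.items()}

for key, size in sorted(sizes.items(), key=lambda item: item[1], reverse=True):
    print(f"{key:<16}{size:>6,} chars  {size / total:6.1%}")
```

In this made-up record, the syllabus and assignments dwarf the title and description. The same pattern is what you are seeing in the real `raw_context`.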

To answer the question about computer science courses, the LLM didn't need all of this context; it really just needed the course titles and descriptions. But by sending everything, we pay the price in multiple ways:

1. Financial waste: LLMs have an associated per-token cost, and in this case, roughly 90% of those tokens were unnecessary.
2. Latency: Processing 6,000 tokens takes significantly longer than processing 500.
3. Distraction: When you flood the LLM with irrelevant data, it's more likely to get confused or "hallucinate", like trying to find one phone number by reading every book in the library. This is consistent with Chroma's research on context rot, which we covered in the introduction.
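A quick back-of-the-envelope calculation makes the financial point concrete. The token counts are the approximate figures from this lesson. The price and query volume are placeholders chosen for illustration, not real rates, and the sketch treats all tokens as input tokens for simplicity.

```python
tokens_sent = 6_000    # approximate total observed in this lesson
tokens_needed = 500    # rough estimate for titles and descriptions only

wasted_fraction = (tokens_sent - tokens_needed) / tokens_sent

# Hypothetical price and traffic, for illustration only.
price_per_million_tokens = 0.15  # USD, placeholder value
queries_per_day = 10_000


def daily_cost(tokens_per_query: int) -> float:
    """Daily cost in USD, treating every token as an input token."""
    return tokens_per_query * queries_per_day * price_per_million_tokens / 1_000_000


print(f"Unnecessary tokens: {wasted_fraction:.1%}")
print(f"Daily cost, naive:   ${daily_cost(tokens_sent):.2f}")
print(f"Daily cost, trimmed: ${daily_cost(tokens_needed):.2f}")
```

Under these assumptions, the naive approach costs twelve times as much per query as a trimmed one, and the gap scales linearly with traffic.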

## Wrap Up 🏁

You've completed Stage 1 and run the RAG agent. While it works, you've also experienced firsthand the core challenge of context engineering: more context isn't always better.

In this stage, you:

- Gained familiarity with the basic RAG pipeline that retrieves and synthesizes course information
- Observed information overload when sending full documents to the LLM
- Measured the token cost of a naive "retrieve everything" approach

The key insight from this baseline is that retrieving full documents is rarely the right strategy. By sending 6,000+ tokens for a simple query that likely required only 500, you witnessed both financial waste and potential distraction that can lead to hallucinations.

In Stage 2, you'll address this issue through basic data engineering techniques on the context, including trimming unnecessary data, filtering to display only summaries initially, and formatting with clean Markdown instead of raw JSON. You'll see how strategic context curation can reduce token usage by 80% while maintaining, or even improving, answer quality.