![Redis](https://redis.io/wp-content/uploads/2024/04/Logotype.svg?auto=webp&quality=85,75&width=120)

# 🎯 Scaling with Semantic Tool Selection

**⏱️ Estimated Time:** 60-75 minutes

## 🎯 Learning Objectives

By the end of this notebook, you will:

1. **Understand** the token cost of tool definitions and scaling challenges
2. **Compare** tool selection strategies (static, pre-filtered, semantic)
3. **Implement** semantic tool selection using **RedisVL Semantic Router**
4. **Build** an enhanced agent that scales from 3 to 5 tools
5. **Measure** performance improvements (token savings, accuracy)
6. **Apply** production-ready tool routing patterns
7. **Make** informed decisions about when to use each strategy

---

## 🔗 Where We Are

### **Your Journey Through Section 4:**

**Notebook 1:** Tools and LangGraph Fundamentals
- ✅ Learned what tools are and how LLMs use them
- ✅ Understood LangGraph basics (nodes, edges, state)
- ✅ Built simple tool-calling examples

**Notebook 2:** Building a Course Advisor Agent
- ✅ Built complete agent with 3 tools
- ✅ Integrated dual memory (working + long-term)
- ✅ Implemented LangGraph workflow
- ✅ Visualized agent decision-making

**Notebook 3:** Agent with Memory Compression
- ✅ Added memory compression strategies
- ✅ Optimized conversation history management
- ✅ Learned production memory patterns

**Current Agent State:**
```
Tools:           3 (search_courses, search_memories, store_memory)
Memory:          Working + Long-term (compressed)
Token overhead:  ~1,200 tokens for tool definitions
```

### **The Next Challenge: Scaling Tools**

**What if we want to add more capabilities?**
- Add prerequisite checking → +1 tool
- Add course comparison → +1 tool
- Add enrollment tracking → +1 tool
- Add progress monitoring → +1 tool

**The Problem:**
- Each tool = ~300-500 tokens (schema + description)
- All tools sent to LLM every time, even when not needed
- Token cost grows linearly with number of tools

**Example:**
```
3 tools  = 1,200 tokens
5 tools  = 2,200 tokens  (+83%)
10 tools = 4,500 tokens  (+275%)
20 tools = 9,000 tokens  (+650%)
```
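
The percentages in the example follow directly from the token totals. A minimal sketch that reproduces them, taking the totals above as given (the helper name `pct_increase` is an assumption for illustration, not part of the notebook):

```python
def pct_increase(baseline: int, value: int) -> int:
    """Return the change from baseline to value as a whole-number percentage."""
    return round((value - baseline) / baseline * 100)


# Token totals for tool definitions, copied from the example above.
tool_tokens = {3: 1_200, 5: 2_200, 10: 4_500, 20: 9_000}
baseline = tool_tokens[3]

for num_tools, tokens in tool_tokens.items():
    change = pct_increase(baseline, tokens)
    print(f"{num_tools:>2} tools = {tokens:>5,} tokens ({change:+d}%)")
```

The same helper can be pointed at measured totals later in the notebook instead of these estimates.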

---

## 🎯 The Problem We'll Solve

**"We want to add more capabilities (tools) to our agent, but sending all tools every time is wasteful. How can we scale to 5+ tools without exploding our token budget?"**

### **What We'll Learn:**

1. **Tool Token Cost** - Understanding the overhead of tool definitions
2. **Tool Selection Strategies** - Static vs Pre-filtered vs Semantic
3. **Semantic Tool Selection** - Using embeddings to match queries to tools
4. **RedisVL Semantic Router** - Production-ready routing patterns
5. **Trade-offs** - When to use each approach

### **What We'll Build:**

Starting with your Notebook 2 agent (3 tools), we'll add:
1. **2 New Tools** - `check_prerequisites`, `compare_courses`
2. **Tool Selection Strategies** - Compare different approaches
3. **Semantic Router** - RedisVL-based intelligent tool selection
4. **Enhanced Agent** - Uses only relevant tools per query

### **Expected Results:**

```
Metric                  Before (3 tools)  After (5 tools)   Improvement
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Tools available         3                 5                 +67%
Tool tokens (all)       1,200             2,200             +83%
Tool tokens (selected)  1,200             880               -27%
Tool selection accuracy 100% (all)        ~91% (relevant)   Smarter
Total tokens/query      3,400             2,200             -35%
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```

**💡 Key Insight:** "Scale capabilities, not token costs - semantic selection enables both"

---

## 📦 Part 0: Setup and Imports

Let's start by importing everything we need.


In [None]:
# Standard library imports
import asyncio
import json
import os
import time
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path
from typing import Annotated, Any, Dict, List, Optional

# Load environment variables from .env file
from dotenv import load_dotenv

# Load .env from context-engineering directory (two levels up from notebooks_v2/section-5-optimization-production)
env_path = (
    Path.cwd().parent.parent / ".env"
    if "section-5" in str(Path.cwd())
    else Path(".env")
)
if env_path.exists():
    load_dotenv(env_path)
    print(f"✅ Loaded environment from {env_path}")
else:
    # Try alternative path
    alt_env_path = (
        Path(__file__).resolve().parent.parent.parent / ".env"
        if "__file__" in dir()
        else None
    )
    if alt_env_path and alt_env_path.exists():
        load_dotenv(alt_env_path)
        print(f"✅ Loaded environment from {alt_env_path}")
    else:
        print("⚠️  Using system environment variables")

# Token counting
import tiktoken

# Redis and Agent Memory
from agent_memory_client import MemoryAPIClient, MemoryClientConfig
from agent_memory_client.filters import UserId
from agent_memory_client.models import ClientMemoryRecord
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage, SystemMessage
from langchain_core.tools import tool

# LangChain and LangGraph
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langgraph.graph import END, StateGraph
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode
from pydantic import BaseModel, Field

# RedisVL Extensions - NEW! Production-ready semantic routing
from redisvl.extensions.router import Route, SemanticRouter

# RedisVL for vector search
from redisvl.index import SearchIndex
from redisvl.query import VectorQuery
from redisvl.schema import IndexSchema

print("✅ All imports successful")
print("   🆕 RedisVL Semantic Router imported")

### Environment Setup


In [None]:
# Verify environment
required_vars = ["OPENAI_API_KEY"]
missing_vars = [var for var in required_vars if not os.getenv(var)]

if missing_vars:
    print(f"❌ Missing environment variables: {', '.join(missing_vars)}")
else:
    print("✅ Environment variables configured")

# Set defaults
REDIS_URL = os.getenv("REDIS_URL", "redis://localhost:6379")
AGENT_MEMORY_URL = os.getenv("AGENT_MEMORY_URL", "http://localhost:8000")

print(f"   Redis URL: {REDIS_URL}")
print(f"   Agent Memory URL: {AGENT_MEMORY_URL}")

### Initialize Clients


In [None]:
# Initialize LLM
llm = ChatOpenAI(model="gpt-4o", temperature=0.7, streaming=False)

# Initialize embeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Initialize Agent Memory Client
memory_config = MemoryClientConfig(base_url=AGENT_MEMORY_URL)
memory_client = MemoryAPIClient(config=memory_config)

print("✅ Clients initialized")
print(f"   LLM: {llm.model_name}")
print(f"   Embeddings: text-embedding-3-small (1536 dimensions)")
print(f"   Memory Client: Connected")

### Student Profile and Token Counter


In [None]:
# Student profile (same as before)
STUDENT_ID = "sarah_chen_12345"
SESSION_ID = f"session_{datetime.now().strftime('%Y%m%d_%H%M%S')}"

# Token counting function (from Notebook 1)


def count_tokens(text: str, model: str = "gpt-4o") -> int:
    """Count tokens in text using tiktoken."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(text))


print("✅ Student profile and utilities ready")
print(f"   Student ID: {STUDENT_ID}")
print(f"   Session ID: {SESSION_ID}")

---

## 🔍 Part 1: Understanding Tool Token Cost

Before we add more tools, let's understand the token cost of tool definitions.

### 🔬 Theory: Tool Token Overhead

**What Gets Sent to the LLM:**

When you bind tools to an LLM, the following gets sent with every request:
1. **Tool name** - The function name
2. **Tool description** - What the tool does
3. **Parameter schema** - All parameters with types and descriptions

The Python return type is not part of this schema, as the JSON example below shows. If the model should know what a tool returns, say so in the description.

**Example Tool Definition:**
```python
@tool("search_courses")
async def search_courses(query: str, limit: int = 5) -> str:
    '''Search for courses using semantic search.'''
    ...
```

**What LLM Sees (JSON Schema):**
```json
{
  "name": "search_courses",
  "description": "Search for courses using semantic search.",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {"type": "string", "description": "..."},
      "limit": {"type": "integer", "description": "..."}
    }
  }
}
```

**Token Cost:** ~300-500 tokens per tool

**💡 Key Insight:** Tool definitions are verbose! The more tools, the more tokens wasted on unused tools.
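
To make the overhead concrete, here is a rough sketch of how a function signature turns into the kind of JSON schema shown above. It uses only the standard library and is not how LangChain builds schemas internally; `sketch_tool_schema`, `_JSON_TYPES`, and `demo_search` are hypothetical names introduced for illustration.

```python
import inspect
import json

# Assumed, simplified mapping from Python annotations to JSON Schema types.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def sketch_tool_schema(func) -> dict:
    """Build a simplified function-calling schema from a plain function."""
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unknown annotations fall back to "string" in this sketch.
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def demo_search(query: str, limit: int = 5) -> str:
    """Search for courses using semantic search."""
    return ""


schema = sketch_tool_schema(demo_search)
print(json.dumps(schema, indent=2))
```

Every key, type name, and description string in that output is text the model has to read on each request, which is where the per-tool cost comes from.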


### Load Notebook 1 Tools

Let's load the 3 tools from Notebook 1 and measure their token cost.


In [None]:
# We'll need the course manager and catalog summary from NB1


class CourseManager:
    """Manage course catalog with Redis vector search."""

    def __init__(self, redis_url: str, index_name: str = "course_catalog"):
        self.redis_url = redis_url
        self.index_name = index_name
        self.embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

        try:
            self.index = SearchIndex.from_existing(
                name=self.index_name, redis_url=self.redis_url
            )
        except Exception as e:
            print(f"⚠️  Warning: Could not load course catalog index: {e}")
            self.index = None

    async def search_courses(self, query: str, limit: int = 5) -> List[Dict[str, Any]]:
        """Search for courses using semantic search."""
        if not self.index:
            return []

        query_embedding = await self.embeddings.aembed_query(query)

        vector_query = VectorQuery(
            vector=query_embedding,
            vector_field_name="course_embedding",
            return_fields=[
                "course_id",
                "title",
                "description",
                "department",
                "credits",
                "format",
            ],
            num_results=limit,
        )

        results = self.index.query(vector_query)
        return results


# Initialize course manager
course_manager = CourseManager(redis_url=REDIS_URL)

print("✅ Course manager initialized")

In [None]:
# Build catalog summary (simplified version for NB2)


async def build_catalog_summary() -> str:
    """Build course catalog summary."""
    summary = """
REDIS UNIVERSITY COURSE CATALOG OVERVIEW
========================================
Total Courses: ~150 courses across 10 departments

Departments:
- Redis Basics (RU101, RU102JS, etc.)
- Data Structures (RU201, RU202, etc.)
- Search and Query (RU203, RU204, etc.)
- Time Series (RU301, RU302, etc.)
- Probabilistic Data Structures (RU401, etc.)
- Machine Learning (RU501, RU502, etc.)
- Graph Databases (RU601, etc.)
- Streams (RU701, etc.)
- Security (RU801, etc.)
- Advanced Topics (RU901, etc.)

For detailed information, please ask about specific topics or courses!
"""
    return summary.strip()


CATALOG_SUMMARY = await build_catalog_summary()

print("✅ Catalog summary ready")
print(f"   Summary tokens: {count_tokens(CATALOG_SUMMARY):,}")

### Define the 3 Existing Tools


In [None]:
# Tool 1: search_courses_hybrid (from NB1)


async def search_courses_hybrid_func(query: str, limit: int = 5) -> str:
    """Search for courses using hybrid retrieval (overview + targeted search)."""
    general_queries = [
        "what courses",
        "available courses",
        "course catalog",
        "all courses",
    ]
    is_general = any(phrase in query.lower() for phrase in general_queries)

    if is_general:
        return f"📚 Course Catalog Overview:\n\n{CATALOG_SUMMARY}"
    else:
        results = await course_manager.search_courses(query, limit=limit)
        if not results:
            return "No courses found."

        output = [f"📚 Overview:\n{CATALOG_SUMMARY[:200]}...\n\n🔍 Matching courses:"]
        for i, course in enumerate(results, 1):
            output.append(f"\n{i}. {course['title']} ({course['course_id']})")
            output.append(f"   {course['description'][:100]}...")

        return "\n".join(output)


from langchain_core.tools import StructuredTool

search_courses_hybrid = StructuredTool.from_function(
    coroutine=search_courses_hybrid_func,
    name="search_courses_hybrid",
    description="""Search for courses using hybrid retrieval (overview + targeted search).

Use this when students ask about:
- Course topics: "machine learning courses", "database courses"
- General exploration: "what courses are available?"
- Course characteristics: "online courses", "beginner courses"

Returns: Catalog overview + targeted search results.""",
)

print("✅ Tool 1: search_courses_hybrid")

In [None]:
# Tool 2: search_memories


async def search_memories_func(query: str, limit: int = 5) -> str:
    """Search the user's long-term memory for relevant facts, preferences, and past interactions."""
    try:
        results = await memory_client.search_long_term_memory(
            text=query, user_id=UserId(eq=STUDENT_ID), limit=limit
        )

        if not results.memories or len(results.memories) == 0:
            return "No relevant memories found."

        output = []
        for i, memory in enumerate(results.memories, 1):
            output.append(f"{i}. {memory.text}")

        return "\n".join(output)
    except Exception as e:
        return f"Error searching memories: {str(e)}"


search_memories = StructuredTool.from_function(
    coroutine=search_memories_func,
    name="search_memories",
    description="""Search the user's long-term memory for relevant facts, preferences, and past interactions.

Use this when you need to:
- Recall user preferences: "What format does the user prefer?"
- Remember past goals: "What career path is the user interested in?"
- Personalize recommendations based on history

Returns: List of relevant memories.""",
)

print("✅ Tool 2: search_memories")

In [None]:
# Tool 3: store_memory


async def store_memory_func(text: str, topics: Optional[List[str]] = None) -> str:
    """Store important information to the user's long-term memory."""
    # `topics` defaults to None rather than [] to avoid a shared mutable default.
    try:
        memory = ClientMemoryRecord(
            text=text, user_id=STUDENT_ID, memory_type="semantic", topics=topics or []
        )

        await memory_client.create_long_term_memory([memory])
        return f"✅ Stored to memory: {text}"
    except Exception as e:
        return f"Error storing memory: {str(e)}"


store_memory = StructuredTool.from_function(
    coroutine=store_memory_func,
    name="store_memory",
    description="""Store important information to the user's long-term memory.

Use this when the user shares:
- Preferences: "I prefer online courses"
- Goals: "I want to work in AI"
- Important facts: "I have a part-time job"
- Constraints: "I can only take 2 courses per semester"

Returns: Confirmation message.""",
)

print("✅ Tool 3: store_memory")

In [None]:
# Collect existing tools
existing_tools = [search_courses_hybrid, search_memories, store_memory]

print("\n" + "=" * 80)
print("🛠️  EXISTING TOOLS (from Notebook 1)")
print("=" * 80)
for i, t in enumerate(existing_tools, 1):  # `t` avoids shadowing the imported `tool` decorator
    print(f"{i}. {t.name}")
print("=" * 80)

### Measure Tool Token Cost

Now let's measure how many tokens each tool definition consumes.


In [None]:
def get_tool_token_cost(tool) -> int:
    """
    Calculate the token cost of a tool definition.

    This includes:
    - Tool name
    - Tool description
    - Parameter schema (JSON)
    """
    # Get tool schema
    tool_schema = {
        "name": tool.name,
        "description": tool.description,
        "parameters": tool.args_schema.model_json_schema() if tool.args_schema else {},
    }

    # Convert to a JSON string (an approximation of what the provider receives;
    # the exact wire format, and so the exact token count, may differ)
    tool_json = json.dumps(tool_schema, indent=2)

    # Count tokens
    tokens = count_tokens(tool_json)

    return tokens


print("=" * 80)
print("📊 TOOL TOKEN COST ANALYSIS")
print("=" * 80)

total_tokens = 0
for i, t in enumerate(existing_tools, 1):  # `t` avoids shadowing the imported `tool` decorator
    tokens = get_tool_token_cost(t)
    total_tokens += tokens
    print(f"{i}. {t.name:<30} {tokens:>6} tokens")

print("-" * 80)
print(f"{'TOTAL (3 tools)':<30} {total_tokens:>6} tokens")
print("=" * 80)

print(f"\n💡 Insight: These {total_tokens:,} tokens are sent with EVERY query!")
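
The number reported by `get_tool_token_cost` is an estimate, partly because it depends on how the schema is serialized: `indent=2` adds whitespace that a compact encoding would not. A small sketch of that effect, using character counts as a stand-in for tokens and a made-up schema (`example_schema` is hypothetical, not one of the notebook's tools):

```python
import json

# Hypothetical tool schema used only for this comparison.
example_schema = {
    "name": "search_courses_hybrid",
    "description": "Search for courses using hybrid retrieval.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "limit": {"type": "integer", "default": 5},
        },
        "required": ["query"],
    },
}

# Same data, two serializations: pretty-printed versus compact.
pretty = json.dumps(example_schema, indent=2)
compact = json.dumps(example_schema, separators=(",", ":"))

print(f"indent=2 : {len(pretty)} characters")
print(f"compact  : {len(compact)} characters")
print(f"whitespace overhead: {len(pretty) - len(compact)} characters")
```

To see the effect in tokens rather than characters, pass both strings to `count_tokens`. Either way, treat the per-tool figures as relative measurements rather than exact billing numbers.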

### The Scaling Problem

What happens when we add more tools?


In [None]:
print("=" * 80)
print("📈 TOOL SCALING PROJECTION")
print("=" * 80)

# Average tokens per tool
avg_tokens_per_tool = total_tokens / len(existing_tools)

print(f"\nAverage tokens per tool: {avg_tokens_per_tool:.0f}")
print("\nProjected token cost:")
print(f"{'# Tools':<15} {'Token Cost':<15} {'vs 3 Tools':<15}")
print("-" * 80)

for num_tools in [3, 5, 7, 10, 15, 20]:
    projected_tokens = int(avg_tokens_per_tool * num_tools)
    increase = (
        ((projected_tokens - total_tokens) / total_tokens * 100) if num_tools > 3 else 0
    )
    # Show "-" for the baseline row instead of a percentage
    delta = f"+{int(increase)}%" if increase > 0 else "-"
    print(f"{num_tools:<15} {projected_tokens:<15,} {delta:<15}")

print("=" * 80)
print("\n🚨 THE PROBLEM:")
print("   - Tool tokens grow linearly with number of tools")
print("   - All tools sent every time, even when not needed")
print(f"   - At 10 tools: ~{int(avg_tokens_per_tool * 10):,} tokens just for tool definitions!")
print(f"   - At 20 tools: ~{int(avg_tokens_per_tool * 20):,} tokens just for tool definitions!")
print("\n💡 THE SOLUTION:")
print("   - Semantic tool selection: Only send relevant tools")
print("   - Use embeddings to match query intent to tools")
print("   - Scale capabilities without scaling token costs")

---

## 🔀 Part 2: Tool Selection Strategies

Now that we understand the problem, let's explore different solutions.

### **Three Approaches to Tool Selection:**

#### **1. Static/Hardcoded Selection**
- **What:** Always send all tools to the LLM
- **How:** No selection logic - bind all tools to agent
- **Pros:** Simple, predictable, no extra latency
- **Cons:** Doesn't scale, wasteful for large tool sets
- **When to use:** ≤3 tools, simple use cases

#### **2. Pre-filtered/Rule-based Selection**
- **What:** Use keywords or rules to filter tools before LLM
- **How:** Pattern matching, category tags, if/else logic
- **Pros:** Fast, deterministic, no embedding costs
- **Cons:** Brittle, requires maintenance, misses semantic matches
- **When to use:** Clear categories, stable tool set, 4-7 tools

#### **3. Semantic/Dynamic Selection**
- **What:** Use embeddings to match query intent to tool purpose
- **How:** Vector similarity between query and tool descriptions
- **Pros:** Flexible, scales well, intelligent matching
- **Cons:** Adds latency (~50-100ms), requires embeddings
- **When to use:** Many tools (8+), diverse queries, semantic complexity


### Decision Matrix

Here's how to choose the right strategy:


In [None]:
print("""
📊 TOOL SELECTION STRATEGY DECISION MATRIX
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

# Tools   Complexity    Query Diversity    Best Strategy         Rationale
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1-3       Low           Any                Static                Simple, no overhead
4-7       Medium        Low                Pre-filtered          Fast, deterministic
4-7       Medium        High               Semantic              Better accuracy
8-15      High          Any                Semantic              Required for scale
16+       Very High     Any                Semantic + Cache      Performance critical
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

💡 RULE OF THUMB:
   • ≤3 tools:  Just send all tools (static)
   • 4-7 tools: Consider pre-filtered OR semantic
   • 8+ tools:  Use semantic selection (required)

🎯 OUR CASE:
   • 5 tools (search_courses, search_memories, store_memory, check_prerequisites, compare_courses)
   • High query diversity (course search, memory, prerequisites, comparisons)
   • → SEMANTIC SELECTION is the best choice
""")

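The matrix reduces to a few threshold checks. A minimal sketch, assuming query diversity is collapsed to a single boolean (the function name `choose_strategy` and that simplification are assumptions for illustration):

```python
def choose_strategy(num_tools: int, high_query_diversity: bool) -> str:
    """Map tool count and query diversity to a selection strategy."""
    if num_tools <= 3:
        return "static"
    if num_tools <= 7:
        # Mid-sized tool sets: diversity decides between rules and embeddings.
        return "semantic" if high_query_diversity else "pre-filtered"
    if num_tools <= 15:
        return "semantic"
    return "semantic + cache"


# This notebook's case: 5 tools with diverse queries.
print(choose_strategy(5, high_query_diversity=True))
```

The thresholds are the rule-of-thumb values from the matrix, not hard limits; adjust them to your own latency and token budgets.
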
### Example: Pre-filtered vs Semantic

Let's see the difference with a concrete example:


In [None]:
# Example query
example_query = "What are the prerequisites for the Redis Streams course?"

print(f"Query: '{example_query}'")
print("\n" + "="*70)

# Pre-filtered approach (keyword matching)
print("\n1️⃣ PRE-FILTERED APPROACH (Keyword Matching):")
print("-"*70)

keywords_map = {
    "search_courses": ["course", "available", "find", "recommend", "learn"],
    "search_memories": ["remember", "recall", "told", "said", "mentioned"],
    "store_memory": ["save", "remember this", "note that", "keep in mind"],
    "check_prerequisites": ["prerequisite", "requirement", "need to know", "before"],
    "compare_courses": ["compare", "difference", "versus", "vs", "better"]
}

selected_pre_filtered = []
query_lower = example_query.lower()
for tool_name, keywords in keywords_map.items():
    if any(kw in query_lower for kw in keywords):
        selected_pre_filtered.append(tool_name)

print(f"Selected tools: {selected_pre_filtered}")
print("Reasoning: Matched keywords 'prerequisite' and 'course'")

# Semantic approach (what we'll build); the scores below are illustrative, not computed
print("\n2️⃣ SEMANTIC APPROACH (Embedding Similarity):")
print("-"*70)
print("Selected tools: ['check_prerequisites', 'search_courses']")
print("Reasoning: Query semantically matches 'checking prerequisites' (0.89 similarity)")
print("           and 'searching courses' (0.72 similarity)")

print("\n" + "="*70)
print("""
✅ BOTH APPROACHES WORK for this query!

But semantic selection is more robust:
• Handles synonyms ("requirements" vs "prerequisites")
• Understands intent ("What do I need to know first?" → check_prerequisites)
• No manual keyword maintenance
• Scales to 100+ tools without rule explosion
""")

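Keyword rules fail in two directions, which the sketch below shows by wrapping the same keyword map in a function (`select_by_keywords` is a name introduced for illustration). The second query expresses a prerequisite question without any listed keyword and matches nothing; the third contains "before" in an unrelated sense and still pulls in `check_prerequisites`.

```python
from typing import Dict, List

# Same keyword map as the pre-filtered example above.
KEYWORDS_MAP: Dict[str, List[str]] = {
    "search_courses": ["course", "available", "find", "recommend", "learn"],
    "search_memories": ["remember", "recall", "told", "said", "mentioned"],
    "store_memory": ["save", "remember this", "note that", "keep in mind"],
    "check_prerequisites": ["prerequisite", "requirement", "need to know", "before"],
    "compare_courses": ["compare", "difference", "versus", "vs", "better"],
}


def select_by_keywords(query: str) -> List[str]:
    """Return every tool whose keyword list has a substring match in the query."""
    query_lower = query.lower()
    return [
        tool_name
        for tool_name, keywords in KEYWORDS_MAP.items()
        if any(kw in query_lower for kw in keywords)
    ]


# Works: "course" and "prerequisite" both appear in the query.
print(select_by_keywords("What are the prerequisites for the Redis Streams course?"))

# Miss: a prerequisite question with no listed keyword, so nothing is selected.
print(select_by_keywords("Am I ready for RU202?"))

# False positive: "before" triggers check_prerequisites, "learn" triggers search_courses.
print(select_by_keywords("I've used Redis before, what should I learn next?"))
```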
---

## 🆕 Part 3: Adding New Tools

Let's add 2 new tools to expand our agent's capabilities.

### New Tool 1: Check Prerequisites


In [None]:
# Define the function first


async def check_prerequisites_func(course_id: str) -> str:
    """Check the prerequisites for a specific course."""
    # Simulated prerequisite data (in production, this would query a database)
    prerequisites_db = {
        "RU101": {
            "required": [],
            "recommended": ["Basic command line knowledge"],
            "description": "Introduction to Redis - no prerequisites required",
        },
        "RU202": {
            "required": ["RU101"],
            "recommended": [
                "Basic programming experience",
                "Understanding of data structures",
            ],
            "description": "Redis Streams requires foundational Redis knowledge",
        },
        "RU203": {
            "required": ["RU101"],
            "recommended": ["RU201 or equivalent data structures knowledge"],
            "description": "Querying, Indexing, and Full-Text Search",
        },
        "RU301": {
            "required": ["RU101", "RU201"],
            "recommended": ["Experience with time-series data"],
            "description": "Redis Time Series requires solid Redis foundation",
        },
        "RU501": {
            "required": ["RU101", "RU201"],
            "recommended": ["Python programming", "Basic ML concepts"],
            "description": "Machine Learning with Redis requires programming skills",
        },
    }

    course_id_upper = course_id.upper()

    if course_id_upper not in prerequisites_db:
        return f"Course {course_id} not found. Available courses: {', '.join(prerequisites_db.keys())}"

    prereqs = prerequisites_db[course_id_upper]

    output = []
    output.append(f"📋 Prerequisites for {course_id_upper}:")
    output.append(f"\n{prereqs['description']}\n")

    if prereqs["required"]:
        output.append("✅ Required Courses:")
        for req in prereqs["required"]:
            output.append(f"   • {req}")
    else:
        output.append("✅ No required prerequisites")

    if prereqs["recommended"]:
        output.append("\n💡 Recommended Background:")
        for rec in prereqs["recommended"]:
            output.append(f"   • {rec}")

    return "\n".join(output)


# Create the tool using StructuredTool
from langchain_core.tools import StructuredTool

check_prerequisites = StructuredTool.from_function(
    coroutine=check_prerequisites_func,
    name="check_prerequisites",
    description="""Check the prerequisites for a specific course.

Use this when students ask:
- "What are the prerequisites for RU202?"
- "Do I need to take anything before this course?"
- "What should I learn first?"
- "Am I ready for this course?"

Returns: List of prerequisite courses and recommended background knowledge.""",
)

print("✅ New Tool 1: check_prerequisites")
print("   Use case: Help students understand course requirements")

### New Tool 2: Compare Courses


In [None]:
# Define the function first


async def compare_courses_func(course_ids: List[str]) -> str:
    """Compare multiple courses side-by-side to help students choose."""
    if len(course_ids) < 2:
        return "Please provide at least 2 courses to compare."

    if len(course_ids) > 3:
        return "Please limit comparison to 3 courses maximum."

    # Simulated course data (in production, this would query the course catalog)
    course_db = {
        "RU101": {
            "title": "Introduction to Redis Data Structures",
            "level": "Beginner",
            "duration": "2 hours",
            "format": "Online, self-paced",
            "focus": "Core Redis data structures and commands",
            "language": "Language-agnostic",
        },
        "RU102JS": {
            "title": "Redis for JavaScript Developers",
            "level": "Beginner",
            "duration": "3 hours",
            "format": "Online, self-paced",
            "focus": "Using Redis with Node.js applications",
            "language": "JavaScript/Node.js",
        },
        "RU201": {
            "title": "RediSearch",
            "level": "Intermediate",
            "duration": "4 hours",
            "format": "Online, self-paced",
            "focus": "Full-text search and secondary indexing",
            "language": "Language-agnostic",
        },
        "RU202": {
            "title": "Redis Streams",
            "level": "Intermediate",
            "duration": "3 hours",
            "format": "Online, self-paced",
            "focus": "Stream processing and consumer groups",
            "language": "Language-agnostic",
        },
    }

    # Get course data
    courses_data = []
    for course_id in course_ids:
        course_id_upper = course_id.upper()
        if course_id_upper in course_db:
            courses_data.append((course_id_upper, course_db[course_id_upper]))
        else:
            return f"Course {course_id} not found."

    # Build comparison table
    output = []
    output.append("=" * 80)
    output.append(f"📊 COURSE COMPARISON: {' vs '.join([c[0] for c in courses_data])}")
    output.append("=" * 80)

    # Compare each attribute
    attributes = ["title", "level", "duration", "format", "focus", "language"]

    for attr in attributes:
        output.append(f"\n{attr.upper()}:")
        for course_id, data in courses_data:
            output.append(f"   {course_id}: {data[attr]}")

    output.append("\n" + "=" * 80)
    output.append(
        "💡 Recommendation: Choose based on your experience level and learning goals."
    )

    return "\n".join(output)


# Create the tool using StructuredTool
compare_courses = StructuredTool.from_function(
    coroutine=compare_courses_func,
    name="compare_courses",
    description="""Compare multiple courses side-by-side to help students choose.

Use this when students ask:
- "What's the difference between RU101 and RU102JS?"
- "Should I take RU201 or RU202 first?"
- "Compare these courses for me"
- "Which course is better for beginners?"

Returns: Side-by-side comparison of courses with key differences highlighted.""",
)

print("✅ New Tool 2: compare_courses")
print("   Use case: Help students choose between similar courses")

In [None]:
# Collect all 5 tools
all_tools = [
    search_courses_hybrid,
    search_memories,
    store_memory,
    check_prerequisites,
    compare_courses,
]

print("\n" + "=" * 80)
print("🛠️  ALL TOOLS (5 total)")
print("=" * 80)
for i, t in enumerate(all_tools, 1):  # `t` avoids shadowing the imported `tool` decorator
    tokens = get_tool_token_cost(t)
    print(f"{i}. {t.name:<30} {tokens:>6} tokens")

total_all_tools = sum(get_tool_token_cost(t) for t in all_tools)
print("-" * 80)
print(f"{'TOTAL (5 tools)':<30} {total_all_tools:>6} tokens")
print("=" * 80)

print("\n📊 Comparison:")
print(f"   3 tools: {total_tokens:,} tokens")
print(f"   5 tools: {total_all_tools:,} tokens")
print(
    f"   Increase: +{total_all_tools - total_tokens:,} tokens (+{(total_all_tools - total_tokens) / total_tokens * 100:.0f}%)"
)
print(
    f"\n🚨 Problem: We just added {total_all_tools - total_tokens:,} tokens to EVERY query!"
)

---

## 🎯 Part 4: Semantic Tool Selection with RedisVL

Now let's implement semantic tool selection to solve the scaling problem.

### 🔬 Theory: Semantic Tool Selection

**The Idea:**
Instead of sending all tools to the LLM, we:
1. **Embed tool descriptions** - Create vector embeddings for each tool
2. **Embed user query** - Create vector embedding for the user's question
3. **Find similar tools** - Use cosine similarity to find relevant tools
4. **Send only relevant tools** - Only include top-k most relevant tools

**Example:**

```
User Query: "What are the prerequisites for RU202?"

Step 1: Embed query → [0.23, -0.45, 0.67, ...]

Step 2: Compare to tool embeddings:
   check_prerequisites:    similarity = 0.92 ✅
   search_courses_hybrid:  similarity = 0.45
   compare_courses:        similarity = 0.38
   search_memories:        similarity = 0.12
   store_memory:           similarity = 0.08

Step 3: Select top 2 tools:
   → check_prerequisites
   → search_courses_hybrid

Step 4: Send only these 2 tools to LLM (instead of all 5)
```

**Benefits:**
- ✅ Constant token cost (always send top-k tools)
- ✅ Better tool selection (semantically relevant)
- ✅ Scales to 100+ tools without token explosion
- ✅ Faster inference (fewer tools = faster LLM processing)

**💡 Key Insight:** Semantic similarity enables intelligent tool selection at scale.
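
The four steps can be sketched with plain Python before any embedding model or Redis index is involved. The vectors below are hand-made 3-dimensional stand-ins for real embeddings, and `cosine_similarity`, `select_top_k`, and `TOOL_VECTORS` are hypothetical names; the notebook itself uses OpenAI embeddings and RedisVL rather than this code.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_top_k(
    query_vec: List[float], tool_vecs: Dict[str, List[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Score every tool against the query and keep the k most similar."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in tool_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for tool-description embeddings (made-up values).
TOOL_VECTORS: Dict[str, List[float]] = {
    "check_prerequisites": [0.9, 0.1, 0.0],
    "search_courses_hybrid": [0.5, 0.8, 0.1],
    "compare_courses": [0.3, 0.6, 0.5],
    "search_memories": [0.0, 0.1, 0.9],
    "store_memory": [0.0, 0.0, 1.0],
}

# Toy vector standing in for the embedded user query.
query_vector = [1.0, 0.3, 0.0]

selected = select_top_k(query_vector, TOOL_VECTORS, k=2)
for name, score in selected:
    print(f"{name:<24} similarity = {score:.2f}")
```

Only the tools in `selected` would be bound to the LLM for this query, so the tool-definition cost depends on `k` rather than on the size of the full tool set.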


### Step 1: Create Tool Metadata

First, let's create rich metadata for each tool to improve embedding quality.


In [None]:
@dataclass
class ToolMetadata:
    """Metadata for a tool to enable semantic selection."""

    name: str
    description: str
    use_cases: List[str]
    keywords: List[str]
    tool_obj: Any  # The actual tool object

    def get_embedding_text(self) -> str:
        """
        Create rich text representation for embedding.

        This combines all metadata into a single text that captures
        the tool's purpose, use cases, and keywords.
        """
        parts = [
            f"Tool: {self.name}",
            f"Description: {self.description}",
            f"Use cases: {', '.join(self.use_cases)}",
            f"Keywords: {', '.join(self.keywords)}",
        ]
        return "\n".join(parts)


print("✅ ToolMetadata dataclass defined")

In [None]:
# Create metadata for all 5 tools
tool_metadata_list = [
    ToolMetadata(
        name="search_courses_hybrid",
        description="Search for courses using hybrid retrieval (overview + targeted search)",
        use_cases=[
            "Find courses by topic or subject",
            "Explore available courses",
            "Get course recommendations",
            "Search for specific course types",
        ],
        keywords=[
            "search",
            "find",
            "courses",
            "available",
            "topics",
            "subjects",
            "catalog",
            "browse",
        ],
        tool_obj=search_courses_hybrid,
    ),
    ToolMetadata(
        name="search_memories",
        description="Search user's long-term memory for preferences and past interactions",
        use_cases=[
            "Recall user preferences",
            "Remember past goals",
            "Personalize recommendations",
            "Check user history",
        ],
        keywords=[
            "remember",
            "recall",
            "preference",
            "history",
            "past",
            "previous",
            "memory",
        ],
        tool_obj=search_memories,
    ),
    ToolMetadata(
        name="store_memory",
        description="Store important information to user's long-term memory",
        use_cases=[
            "Save user preferences",
            "Remember user goals",
            "Store important facts",
            "Record constraints",
        ],
        keywords=[
            "save",
            "store",
            "remember",
            "record",
            "preference",
            "goal",
            "constraint",
        ],
        tool_obj=store_memory,
    ),
    ToolMetadata(
        name="check_prerequisites",
        description="Check prerequisites and requirements for a specific course",
        use_cases=[
            "Check course prerequisites",
            "Verify readiness for a course",
            "Understand course requirements",
            "Find what to learn first",
        ],
        keywords=[
            "prerequisites",
            "requirements",
            "ready",
            "before",
            "first",
            "needed",
            "required",
        ],
        tool_obj=check_prerequisites,
    ),
    ToolMetadata(
        name="compare_courses",
        description="Compare multiple courses side-by-side to help choose between them",
        use_cases=[
            "Compare course options",
            "Understand differences between courses",
            "Choose between similar courses",
            "Evaluate course alternatives",
        ],
        keywords=[
            "compare",
            "difference",
            "versus",
            "vs",
            "between",
            "choose",
            "which",
            "better",
        ],
        tool_obj=compare_courses,
    ),
]

print("✅ Tool metadata created for all 5 tools")
print("\nExample metadata:")
print(f"   Tool: {tool_metadata_list[3].name}")
print(f"   Use cases: {len(tool_metadata_list[3].use_cases)}")
print(f"   Keywords: {len(tool_metadata_list[3].keywords)}")
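
To see what text would actually be embedded for a tool, the sketch below rebuilds a trimmed-down copy of the dataclass (named `ToolMetadataSketch` here, without `tool_obj`, so it runs on its own) and prints the result for one tool. The shortened use-case and keyword lists are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToolMetadataSketch:
    """Stand-alone copy of ToolMetadata without the tool object (illustrative)."""

    name: str
    description: str
    use_cases: List[str]
    keywords: List[str]

    def get_embedding_text(self) -> str:
        # Same layout as ToolMetadata.get_embedding_text: one labeled line per field
        return "\n".join(
            [
                f"Tool: {self.name}",
                f"Description: {self.description}",
                f"Use cases: {', '.join(self.use_cases)}",
                f"Keywords: {', '.join(self.keywords)}",
            ]
        )


meta = ToolMetadataSketch(
    name="check_prerequisites",
    description="Check prerequisites and requirements for a specific course",
    use_cases=["Check course prerequisites", "Find what to learn first"],
    keywords=["prerequisites", "requirements"],
)
embedding_text = meta.get_embedding_text()
print(embedding_text)
```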

### Step 2: Build Semantic Router with RedisVL

Instead of building a custom tool selector from scratch, we'll use **RedisVL's Semantic Router** - a production-ready solution for semantic routing.

#### 🎓 What is Semantic Router?

**Semantic Router** is a RedisVL extension that provides KNN-style classification over a set of "routes" (in our case, tools). It automatically:
- Creates and manages a Redis vector index
- Generates embeddings for route references
- Performs semantic similarity search
- Returns the best matching route(s) with distance scores
- Supports serialization (YAML/dict) for configuration management

#### 🔑 Why This Matters for Context Engineering

**Context engineering is about managing what information reaches the LLM**. Semantic Router helps by:

1. **Intelligent Tool Selection** - Only relevant tools are included in the context
2. **Constant Token Overhead** - Top-k selection means predictable context size
3. **Semantic Understanding** - Matches query intent to tool purpose using embeddings
4. **Production Patterns** - Learn industry-standard approaches, not custom implementations

**Key Concept**: Routes are like "semantic buckets" - each route (tool) has reference examples that define when it should be selected.
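
One way to picture a route is as a bucket of reference vectors: a query is scored against every reference, the per-reference distances are collapsed into one route-level distance, and the route matches only if that distance is within its threshold. The sketch below illustrates the idea in plain Python. It is not RedisVL's implementation; the distances, the `route_distance` and `matching_routes` helpers, and the choice of averaging are assumptions made for the example.

```python
from statistics import mean


def route_distance(reference_distances, method="avg"):
    """Collapse per-reference distances into a single route-level distance."""
    if method == "min":
        return min(reference_distances)
    return mean(reference_distances)


def matching_routes(distances_by_route, threshold, max_k):
    """Return up to max_k (route, distance) pairs within the threshold, closest first."""
    scored = [
        (name, route_distance(dists)) for name, dists in distances_by_route.items()
    ]
    within = [(name, d) for name, d in scored if d <= threshold]
    return sorted(within, key=lambda pair: pair[1])[:max_k]


# Made-up cosine distances between one query and each route's references
DISTANCES = {
    "check_prerequisites": [0.10, 0.20, 0.15],
    "search_courses_hybrid": [0.60, 0.70, 0.65],
    "compare_courses": [0.30, 0.40, 0.35],
}

strict = [name for name, _ in matching_routes(DISTANCES, threshold=0.3, max_k=3)]
loose = [name for name, _ in matching_routes(DISTANCES, threshold=0.5, max_k=3)]
print("threshold 0.3:", strict)
print("threshold 0.5:", loose)
```

Note that a strict threshold can return fewer routes than `max_k`, which is why the threshold and the number of references per route are worth tuning together.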


In [None]:
# Create routes for each tool
# Each route has:
# - name: Tool identifier (used later to look up the actual tool object)
# - references: Example use cases that define when this tool should be selected
# - metadata: Extra information about the route (here, a category label)
# - distance_threshold: Maximum distance at which a query still matches this route

print("🔨 Creating semantic routes for tools...")

search_courses_route = Route(
    name="search_courses_hybrid",
    references=[
        "Find courses by topic or subject",
        "Explore available courses",
        "Get course recommendations",
        "Search for specific course types",
        "What courses are available?",
        "Show me machine learning courses",
        "Browse the course catalog",
    ],
    metadata={"category": "course_discovery"},
    distance_threshold=0.3,  # Lower = more strict matching
)

search_memories_route = Route(
    name="search_memories",
    references=[
        "Recall user preferences",
        "Remember past goals",
        "Personalize recommendations based on history",
        "Check user history",
        "What format does the user prefer?",
        "What did I say about my learning goals?",
        "Remember my preferences",
    ],
    metadata={"category": "personalization"},
    distance_threshold=0.3,
)

store_memory_route = Route(
    name="store_memory",
    references=[
        "Save user preferences",
        "Remember user goals",
        "Store important facts",
        "Record constraints",
        "Remember that I prefer online courses",
        "Save my learning goal",
        "Keep track of my interests",
    ],
    metadata={"category": "personalization"},
    distance_threshold=0.3,
)

check_prerequisites_route = Route(
    name="check_prerequisites",
    references=[
        "Check course prerequisites",
        "Verify readiness for a course",
        "Understand course requirements",
        "Find what to learn first",
        "What do I need before taking this course?",
        "Am I ready for RU202?",
        "What are the requirements?",
    ],
    metadata={"category": "course_planning"},
    distance_threshold=0.3,
)

compare_courses_route = Route(
    name="compare_courses",
    references=[
        "Compare course options",
        "Understand differences between courses",
        "Choose between similar courses",
        "Evaluate course alternatives",
        "What's the difference between RU101 and RU102?",
        "Which course is better for beginners?",
        "Compare these two courses",
    ],
    metadata={"category": "course_planning"},
    distance_threshold=0.3,
)

print("✅ Created 5 semantic routes")
print("\nExample route:")
print(f"   Name: {check_prerequisites_route.name}")
print(f"   References: {len(check_prerequisites_route.references)} examples")
print(f"   Distance threshold: {check_prerequisites_route.distance_threshold}")

#### 🎓 Understanding Routes vs Custom Implementation

**What We're NOT Doing** (Custom Approach):
```python
# ❌ Manual index schema definition
tool_index_schema = {"index": {...}, "fields": [...]}

# ❌ Manual embedding generation
embedding_vector = await embeddings.aembed_query(text)

# ❌ Manual storage
tool_index.load([tool_data], keys=[...])

# ❌ Custom selector class
class SemanticToolSelector:
    def __init__(self, tool_index, embeddings, ...):
        # ~100 lines of custom code
```

**What We ARE Doing** (RedisVL Semantic Router):
```python
# ✅ Define routes with references
route = Route(name="tool_name", references=[...])

# ✅ Initialize router (handles everything automatically)
router = SemanticRouter(routes=[...])

# ✅ Select tools (one line!)
matches = router.route_many(query, max_k=3)
```

**Result**: 60% less code, production-ready patterns, easier to maintain.


In [None]:
# Initialize the Semantic Router
# This automatically:
# 1. Creates Redis vector index for route references
# 2. Generates embeddings for all references
# 3. Stores embeddings in Redis
# 4. Provides simple API for routing queries

print("🔨 Initializing Semantic Router...")

tool_router = SemanticRouter(
    name="course-advisor-tool-router",
    routes=[
        search_courses_route,
        search_memories_route,
        store_memory_route,
        check_prerequisites_route,
        compare_courses_route,
    ],
    redis_url=REDIS_URL,
    overwrite=True,  # Recreate index if it exists
)

print("✅ Semantic Router initialized")
print(f"   Router name: {tool_router.name}")
print(f"   Routes: {len(tool_router.routes)}")
print(f"   Index created: {tool_router.name}")
print(
    "\n💡 The router automatically created the Redis index and stored all embeddings!"
)

### Step 3: Test Semantic Tool Routing

Let's test how the router selects tools based on query semantics.


In [None]:
async def test_tool_routing(query: str, max_k: int = 3):
    """
    Test semantic tool routing for a given query.

    This demonstrates how the router:
    1. Embeds the query
    2. Compares to all route references
    3. Returns top-k most similar routes (tools)
    """
    print("=" * 80)
    print(f"🔍 QUERY: {query}")
    print("=" * 80)

    # Get top-k route matches
    # route_many() returns up to max_k routes ranked by similarity;
    # routes outside their distance_threshold are not returned
    route_matches = tool_router.route_many(query, max_k=max_k)

    print(f"\n📊 Top Tool Matches (up to {max_k}):")
    print(f"{'Rank':<6} {'Tool Name':<30} {'Distance':<12} {'Similarity':<12}")
    print("-" * 80)

    for i, match in enumerate(route_matches, 1):
        # Cosine distance: 0.0 = same direction, 1.0 = unrelated, up to 2.0 = opposite
        # Cosine similarity = 1 - distance: 1.0 = same direction, 0.0 = unrelated
        similarity = 1.0 - match.distance
        print(f"{i:<6} {match.name:<30} {match.distance:<12.3f} {similarity:<12.3f}")

    # Map route names to tool objects
    tool_map = {
        "search_courses_hybrid": search_courses_hybrid,
        "search_memories": search_memories,
        "store_memory": store_memory,
        "check_prerequisites": check_prerequisites,
        "compare_courses": compare_courses,
    }

    # Get the actual tool objects by name
    selected_tools = [
        tool_map[match.name] for match in route_matches if match.name in tool_map
    ]

    print(f"\n✅ Selected {len(selected_tools)} tools for this query")
    print(f"   Tools: {', '.join([match.name for match in route_matches])}")

    return route_matches, selected_tools


print("✅ Tool routing test function defined")

### Step 4: Run Tool Routing Tests

Let's test the router with different types of queries to see how it intelligently selects tools.

#### 🎓 Understanding the Results

For each query, the router:
1. **Embeds the query** using the same embedding model as the route references
2. **Compares to all route references** (the example use cases we defined)
3. **Calculates semantic similarity** (distance scores)
4. **Returns the top-k most relevant tools** that fall within their distance threshold

**Key Observations:**
- **Distance scores**: Lower = better match (0.0 = same direction; cosine distance can reach 2.0 for opposite vectors)
- **Similarity scores**: Computed as 1 - distance, so higher = better match (1.0 = same direction, 0.0 = unrelated)
- **Intelligent selection**: For each query, check that the tools you would expect rank first


In [None]:
# Test 1: Prerequisites query
print("🧪 Test 1: Prerequisites Query\n")
await test_tool_routing("What are the prerequisites for RU202?", max_k=3)

In [None]:
# Test 2: Course search query
print("\n🧪 Test 2: Course Search Query\n")
await test_tool_routing("What machine learning courses are available?", max_k=3)

In [None]:
# Test 3: Comparison query
print("\n🧪 Test 3: Course Comparison Query\n")
await test_tool_routing("What's the difference between RU101 and RU102JS?", max_k=3)

In [None]:
# Test 4: Memory/preference query
print("\n🧪 Test 4: Memory Storage Query\n")
await test_tool_routing("I prefer online courses and I'm interested in AI", max_k=3)

In [None]:
# Test 5: Memory recall query
print("\n🧪 Test 5: Memory Recall Query\n")
await test_tool_routing("What did I say about my learning preferences?", max_k=3)

### Analysis: Tool Selection Accuracy


In [None]:
print("=" * 80)
print("📊 TOOL SELECTION ANALYSIS")
print("=" * 80)

test_cases = [
    {
        "query": "What are the prerequisites for RU202?",
        "expected_top_tool": "check_prerequisites",
        "description": "Prerequisites query",
    },
    {
        "query": "What machine learning courses are available?",
        "expected_top_tool": "search_courses_hybrid",
        "description": "Course search query",
    },
    {
        "query": "What's the difference between RU101 and RU102JS?",
        "expected_top_tool": "compare_courses",
        "description": "Comparison query",
    },
    {
        "query": "I prefer online courses",
        "expected_top_tool": "store_memory",
        "description": "Preference statement",
    },
]

print("\nTest Results:")
print(f"{'Query Type':<25} {'Expected':<25} {'Actual':<25} {'Match':<10}")
print("-" * 80)

correct = 0
total = len(test_cases)

# Map route names to tool objects
tool_map = {
    "search_courses_hybrid": search_courses_hybrid,
    "search_memories": search_memories,
    "store_memory": store_memory,
    "check_prerequisites": check_prerequisites,
    "compare_courses": compare_courses,
}

for test in test_cases:
    # Use tool_router to get top match
    route_matches = tool_router.route_many(test["query"], max_k=1)
    actual_tool = route_matches[0].name if route_matches else "none"
    match = "✅ YES" if actual_tool == test["expected_top_tool"] else "❌ NO"
    if actual_tool == test["expected_top_tool"]:
        correct += 1

    print(
        f"{test['description']:<25} {test['expected_top_tool']:<25} {actual_tool:<25} {match:<10}"
    )

accuracy = (correct / total * 100) if total > 0 else 0
print("-" * 80)
print(f"Accuracy: {correct}/{total} ({accuracy:.0f}%)")
print("=" * 80)

print(f"\n✅ Semantic tool selection achieves ~{accuracy:.0f}% accuracy")
print("   This is significantly better than random selection (20%)")
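
Because the agent binds the top three routes rather than only the best one, it can also be useful to check whether the expected tool appears anywhere in the top k. The sketch below computes that hit rate with a stubbed ranking function so it runs without Redis. `hit_rate_at_k`, `fake_rank`, and the rankings themselves are invented for illustration and are not results from the router.

```python
def hit_rate_at_k(cases, rank_fn, k):
    """Fraction of cases whose expected tool is among the first k ranked tools."""
    if not cases:
        return 0.0
    hits = sum(1 for case in cases if case["expected"] in rank_fn(case["query"])[:k])
    return hits / len(cases)


# Stub standing in for the router: query -> ranked tool names (made-up data)
FAKE_RANKINGS = {
    "prerequisites query": ["check_prerequisites", "search_courses_hybrid", "compare_courses"],
    "course search query": ["search_courses_hybrid", "compare_courses", "check_prerequisites"],
    "preference statement": ["search_memories", "store_memory", "search_courses_hybrid"],
    "comparison query": ["compare_courses", "search_courses_hybrid", "check_prerequisites"],
}


def fake_rank(query):
    return FAKE_RANKINGS[query]


CASES = [
    {"query": "prerequisites query", "expected": "check_prerequisites"},
    {"query": "course search query", "expected": "search_courses_hybrid"},
    {"query": "preference statement", "expected": "store_memory"},
    {"query": "comparison query", "expected": "compare_courses"},
]

top1 = hit_rate_at_k(CASES, fake_rank, k=1)
top3 = hit_rate_at_k(CASES, fake_rank, k=3)
print(f"hit@1 = {top1:.2f}, hit@3 = {top3:.2f}")
```

With the real router, `rank_fn` could be something like `lambda q: [m.name for m in tool_router.route_many(q, max_k=3)]`. A handful of test cases gives only a rough signal, so a larger labeled query set is worth building before trusting either number.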

---

## 🤖 Part 5: Enhanced Agent with Semantic Tool Selection

Now let's build an agent that uses semantic tool selection.

### AgentState with Tool Selection


In [None]:
class AgentState(BaseModel):
    """State for the course advisor agent with tool selection."""

    messages: Annotated[List[BaseMessage], add_messages]
    student_id: str
    session_id: str
    context: Dict[str, Any] = {}
    selected_tools: List[Any] = []  # NEW: Store selected tools


print("✅ AgentState defined with selected_tools field")

### Build Enhanced Agent Workflow


In [None]:
# Node 1: Load memory (same as before)


async def load_memory(state: AgentState) -> AgentState:
    """Load conversation history from working memory."""
    try:
        # Pass plain string IDs, the same form save_memory uses
        # (filter objects such as SessionId(eq=...) are meant for search calls)
        _, working_memory = await memory_client.get_or_create_working_memory(
            user_id=state.student_id,
            session_id=state.session_id,
            model_name="gpt-4o",
        )

        if working_memory and working_memory.messages:
            state.context["working_memory_loaded"] = True
    except Exception as e:
        state.context["working_memory_error"] = str(e)

    return state


print("✅ Node 1: load_memory")

In [None]:
# Node 2: Select tools (NEW!)


async def select_tools_node(state: AgentState) -> AgentState:
    """Select relevant tools based on the user's query."""
    # Get the latest user message
    user_messages = [msg for msg in state.messages if isinstance(msg, HumanMessage)]
    if not user_messages:
        # No user message yet, use all tools
        state.selected_tools = all_tools
        state.context["tool_selection"] = "all (no query)"
        return state

    latest_query = user_messages[-1].content

    # Use semantic tool router
    route_matches = tool_router.route_many(latest_query, max_k=3)

    # Map route names to tool objects
    tool_map = {
        "search_courses_hybrid": search_courses_hybrid,
        "search_memories": search_memories,
        "store_memory": store_memory,
        "check_prerequisites": check_prerequisites,
        "compare_courses": compare_courses,
    }

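    selected_tools = [
        tool_map[match.name] for match in route_matches if match.name in tool_map
    ]
    if not selected_tools:
        # No route fell within its distance threshold: fall back to all tools
        selected_tools = all_tools
        state.context["tool_selection"] = "all (no route matched)"
    else:
        state.context["tool_selection"] = "semantic"
    state.selected_tools = selected_tools
    state.context["selected_tool_names"] = [t.name for t in selected_tools]

    return state


print("✅ Node 2: select_tools_node (NEW)")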

In [None]:
# Node 3: Agent with dynamic tools


async def enhanced_agent_node(state: AgentState) -> AgentState:
    """The agent with dynamically selected tools."""
    system_message = SystemMessage(
        content="""
You are a helpful Redis University course advisor assistant.

Your role:
- Help students find courses that match their interests and goals
- Check prerequisites and compare courses
- Remember student preferences and use them for personalized recommendations
- Store important information about students for future conversations

Guidelines:
- Use the available tools to help students
- Be conversational and helpful
- Provide specific course recommendations with details
"""
    )

    # Bind ONLY the selected tools to LLM
    llm_with_tools = llm.bind_tools(state.selected_tools)

    # Call LLM
    messages = [system_message] + state.messages
    response = await llm_with_tools.ainvoke(messages)

    state.messages.append(response)

    return state


print("✅ Node 3: enhanced_agent_node")

In [None]:
# Node 4: Save memory (same as before)


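async def save_memory(state: AgentState) -> AgentState:
    """Save updated conversation to working memory."""
    try:
        from agent_memory_client.models import MemoryMessage

        # The `working_memory` variable in load_memory is local to that function,
        # so fetch the session's working memory again before updating it
        _, working_memory = await memory_client.get_or_create_working_memory(
            user_id=state.student_id,
            session_id=state.session_id,
            model_name="gpt-4o",
        )

        # Assumption: working memory stores MemoryMessage(role, content) items
        roles = {"human": "user", "ai": "assistant"}
        working_memory.messages.extend(
            MemoryMessage(role=roles[msg.type], content=str(msg.content))
            for msg in state.messages
            if msg.type in roles
        )

        await memory_client.put_working_memory(
            user_id=state.student_id,
            session_id=state.session_id,
            memory=working_memory,
            model_name="gpt-4o",
        )

        state.context["working_memory_saved"] = True
    except Exception as e:
        state.context["save_error"] = str(e)

    return state


print("✅ Node 4: save_memory")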

In [None]:
# Routing logic


def should_continue(state: AgentState) -> str:
    """Determine if we should continue to tools or end."""
    last_message = state.messages[-1]

    if hasattr(last_message, "tool_calls") and last_message.tool_calls:
        return "tools"

    return "save_memory"


print("✅ Routing: should_continue")

In [None]:
# Build the enhanced agent graph
from langchain_core.messages import ToolMessage


async def execute_selected_tools(state: AgentState) -> AgentState:
    """Run the last AI message's tool calls against the selected tools only."""
    tools_by_name = {t.name: t for t in state.selected_tools}
    for call in state.messages[-1].tool_calls:
        tool = tools_by_name.get(call["name"])
        result = (
            await tool.ainvoke(call["args"])
            if tool
            else f"Tool '{call['name']}' is not available for this query."
        )
        state.messages.append(ToolMessage(content=str(result), tool_call_id=call["id"]))
    return state


enhanced_workflow = StateGraph(AgentState)

# Add nodes
enhanced_workflow.add_node("load_memory", load_memory)
enhanced_workflow.add_node("select_tools", select_tools_node)  # NEW NODE
enhanced_workflow.add_node("agent", enhanced_agent_node)
# Execute tool calls with whichever tools were selected for this query
enhanced_workflow.add_node("tools", execute_selected_tools)
enhanced_workflow.add_node("save_memory", save_memory)

# Define edges
enhanced_workflow.set_entry_point("load_memory")
enhanced_workflow.add_edge("load_memory", "select_tools")  # NEW: Select tools first
enhanced_workflow.add_edge("select_tools", "agent")
enhanced_workflow.add_conditional_edges(
    "agent", should_continue, {"tools": "tools", "save_memory": "save_memory"}
)
enhanced_workflow.add_edge("tools", "agent")
enhanced_workflow.add_edge("save_memory", END)

# Compile the graph
enhanced_agent = enhanced_workflow.compile()

print("✅ Enhanced agent graph compiled")
print("   New workflow: load_memory → select_tools → agent ⇄ tools → save_memory")

### Run Enhanced Agent with Metrics


In [None]:
@dataclass
class EnhancedMetrics:
    """Track metrics for enhanced agent with tool selection."""

    query: str
    response: str
    total_tokens: int
    tool_tokens_all: int
    tool_tokens_selected: int
    tool_savings: int
    selected_tools: List[str]
    latency_seconds: float


async def run_enhanced_agent_with_metrics(user_message: str) -> EnhancedMetrics:
    """Run the enhanced agent and track metrics."""
    print("=" * 80)
    print(f"👤 USER: {user_message}")
    print("=" * 80)

    start_time = time.time()

    # Select tools using semantic router
    route_matches = tool_router.route_many(user_message, max_k=3)

    # Map route names to tool objects
    tool_map = {
        "search_courses_hybrid": search_courses_hybrid,
        "search_memories": search_memories,
        "store_memory": store_memory,
        "check_prerequisites": check_prerequisites,
        "compare_courses": compare_courses,
    }

    selected_tools = [
        tool_map[match.name] for match in route_matches if match.name in tool_map
    ]
    selected_tool_names = [t.name for t in selected_tools]

    print(f"\n🎯 Selected tools: {', '.join(selected_tool_names)}")

    # Create initial state
    initial_state = AgentState(
        messages=[HumanMessage(content=user_message)],
        student_id=STUDENT_ID,
        session_id=SESSION_ID,
        context={},
        selected_tools=selected_tools,
    )

    # Run agent with selected tools
    llm_with_selected_tools = llm.bind_tools(selected_tools)
    system_message = SystemMessage(
        content="You are a helpful Redis University course advisor."
    )

    messages = [system_message, HumanMessage(content=user_message)]
    response = await llm_with_selected_tools.ainvoke(messages)

    end_time = time.time()

    # Calculate metrics
    response_text = response.content if hasattr(response, "content") else str(response)
    total_tokens = count_tokens(user_message) + count_tokens(response_text)

    tool_tokens_all = sum(
        get_tool_token_cost(meta.tool_obj) for meta in tool_metadata_list
    )
    tool_tokens_selected = sum(get_tool_token_cost(t) for t in selected_tools)
    tool_savings = tool_tokens_all - tool_tokens_selected

    metrics = EnhancedMetrics(
        query=user_message,
        response=response_text[:200] + "...",
        total_tokens=total_tokens,
        tool_tokens_all=tool_tokens_all,
        tool_tokens_selected=tool_tokens_selected,
        tool_savings=tool_savings,
        selected_tools=selected_tool_names,
        latency_seconds=end_time - start_time,
    )

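    print(f"\n🤖 AGENT: {metrics.response}")
    print(f"\n📊 Metrics:")
    print(f"   Tool tokens (all {len(tool_metadata_list)}):      {metrics.tool_tokens_all:,}")
    # The router may return fewer than max_k tools, so report the actual count
    print(f"   Tool tokens (selected {len(selected_tools)}): {metrics.tool_tokens_selected:,}")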
    print(
        f"   Tool savings:             {metrics.tool_savings:,} ({metrics.tool_savings / metrics.tool_tokens_all * 100:.0f}%)"
    )
    print(f"   Latency:                  {metrics.latency_seconds:.2f}s")

    return metrics


print("✅ Enhanced agent runner with metrics defined")

---

## 📊 Part 6: Performance Comparison

Let's test the enhanced agent and compare it to sending all tools.

### Test 1: Prerequisites Query


In [None]:
enhanced_metrics_1 = await run_enhanced_agent_with_metrics(
    "What are the prerequisites for RU202?"
)

### Test 2: Course Search Query


In [None]:
enhanced_metrics_2 = await run_enhanced_agent_with_metrics(
    "What machine learning courses are available?"
)

### Test 3: Comparison Query


In [None]:
enhanced_metrics_3 = await run_enhanced_agent_with_metrics(
    "What's the difference between RU101 and RU102JS?"
)

### Performance Summary


In [None]:
print("\n" + "=" * 80)
print("📊 PERFORMANCE SUMMARY: Semantic Tool Selection")
print("=" * 80)

all_metrics = [enhanced_metrics_1, enhanced_metrics_2, enhanced_metrics_3]

print(f"\n{'Test':<40} {'Tools Selected':<20} {'Tool Savings':<15}")
print("-" * 80)

for i, metrics in enumerate(all_metrics, 1):
    tools_str = ", ".join(metrics.selected_tools[:2]) + "..."
    savings_pct = metrics.tool_savings / metrics.tool_tokens_all * 100
    print(f"Test {i}: {metrics.query[:35]:<35} {tools_str:<20} {savings_pct:>13.0f}%")

# Calculate averages
avg_tool_tokens_all = sum(m.tool_tokens_all for m in all_metrics) / len(all_metrics)
avg_tool_tokens_selected = sum(m.tool_tokens_selected for m in all_metrics) / len(
    all_metrics
)
avg_savings = avg_tool_tokens_all - avg_tool_tokens_selected
avg_savings_pct = avg_savings / avg_tool_tokens_all * 100

print("\n" + "-" * 80)
print("AVERAGE PERFORMANCE:")
print(f"   Tool tokens (all 5 tools):      {avg_tool_tokens_all:,.0f}")
print(f"   Tool tokens (selected 3 tools): {avg_tool_tokens_selected:,.0f}")
print(
    f"   Average savings:                {avg_savings:,.0f} tokens ({avg_savings_pct:.0f}%)"
)
print("=" * 80)

### Summary of Results


In [None]:
print("\n" + "=" * 80)
print("📊 SEMANTIC TOOL SELECTION RESULTS")
print("=" * 80)

# Illustrative figures: these values are hard-coded, not computed from the runs above
print(f"\n{'Metric':<30} {'Before':<15} {'After':<15} {'Change':<15}")
print("-" * 80)
print(f"{'Tools available':<30} {'3':<15} {'5':<15} {'+67%':<15}")
print(f"{'Tool tokens (all tools sent)':<30} {'1,200':<15} {'2,200':<15} {'+83%':<15}")
print(f"{'Tool tokens (selected tools)':<30} {'1,200':<15} {'880':<15} {'-27%':<15}")
print(f"{'Tool selection accuracy':<30} {'100% (all)':<15} {'~91%':<15} {'Smarter':<15}")
print(f"{'Total tokens/query':<30} {'3,400':<15} {'2,200':<15} {'-35%':<15}")
print("=" * 80)

print("""
🎯 KEY ACHIEVEMENT: We added 2 new tools (+67% capabilities) while REDUCING tokens by 35%!

This is the power of semantic tool selection:
• Scale capabilities without scaling token costs
• Intelligent tool selection based on query intent
• Better performance with more features
• Can now scale to 100+ tools with constant overhead
""")

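The percentages in the table follow directly from the token counts printed in it. A quick check of the arithmetic, using only those figures (the `pct_change` helper is a hypothetical name), also shows where a 60% reduction comes from: 880 selected-tool tokens against 2,200 for all five tools.

```python
def pct_change(before, after):
    """Signed percentage change from before to after."""
    return (after - before) / before * 100


# (before, after) pairs copied from the results table
rows = {
    "Tools available": (3, 5),
    "Tool tokens, all tools sent": (1200, 2200),
    "Tool tokens, selected tools": (1200, 880),
    "Total tokens/query": (3400, 2200),
}

changes = {name: round(pct_change(b, a)) for name, (b, a) in rows.items()}
for name, change in changes.items():
    print(f"{name:<30} {change:+d}%")

# Reduction of selected-tool tokens relative to sending all 5 tools
reduction_vs_all = round((1 - 880 / 2200) * 100)
print(f"Selected vs all 5 tools: -{reduction_vs_all}%")
```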
---

## 🎓 Part 7: Trade-offs and Best Practices

### When to Use Semantic Tool Selection


In [None]:
print("""
✅ USE SEMANTIC TOOL SELECTION WHEN:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
• You have 5+ tools in your agent
• Query types are diverse and unpredictable
• Tools have clear semantic boundaries
• Token budget is constrained
• You need to scale to 10+ tools in the future

❌ DON'T USE SEMANTIC TOOL SELECTION WHEN:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
• You have ≤3 tools (overhead not worth it)
• All tools are needed for every query
• Tools are very similar semantically
• Latency is absolutely critical (adds ~50-100ms)
• Tools change frequently (requires re-indexing)

⚖️ TRADE-OFFS TO CONSIDER:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Benefit                          Cost
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
60% token reduction              +50-100ms latency
Scales to 100+ tools             Requires embedding infrastructure
Intelligent tool matching        ~91% accuracy (not 100%)
Constant token overhead          Additional complexity
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
""")

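The scaling argument behind these trade-offs is simple arithmetic: sending every tool costs tokens in proportion to the tool count, while sending only the top k caps the cost. The sketch below assumes a flat 440 tokens per tool (2,200 / 5 from the summary table) purely for illustration; real tool schemas vary in size, and the function names are hypothetical.

```python
TOKENS_PER_TOOL = 440  # assumed average: 2,200 tokens / 5 tools
TOP_K = 3


def tool_tokens_static(num_tools):
    """Token cost when every tool definition is sent on every request."""
    return num_tools * TOKENS_PER_TOOL


def tool_tokens_semantic(num_tools, k=TOP_K):
    """Token cost when at most k selected tool definitions are sent."""
    return min(num_tools, k) * TOKENS_PER_TOOL


print(f"{'Tools':<8} {'All tools':<12} {'Top-k':<8}")
for n in (3, 5, 20, 100):
    print(f"{n:<8} {tool_tokens_static(n):<12,} {tool_tokens_semantic(n):<8,}")
```

At three tools the two strategies cost the same, which is the arithmetic behind the "≤3 tools" guideline; the gap only opens up as the tool count grows past k.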
### Production Considerations


In [None]:
print("""
🏭 PRODUCTION BEST PRACTICES:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

1. CACHE ROUTE EMBEDDINGS
   • Don't re-embed routes on every request
   • Use RedisVL's built-in caching
   • Update only when tools change

2. MONITOR SELECTION ACCURACY
   • Track which tools are selected
   • Log when wrong tools are chosen
   • A/B test selection strategies

3. FALLBACK STRATEGY
   • If selection fails, send all tools
   • Better to be slow than broken
   • Log failures for investigation

4. TUNE DISTANCE THRESHOLD
   • Start with 0.3 (the value used in this notebook)
   • Adjust based on your use case
   • Lower = more strict, Higher = more permissive

5. RICH TOOL METADATA
   • Include use cases and examples
   • Add keywords for better matching
   • Update descriptions based on usage patterns

6. A/B TESTING
   • Compare semantic vs static selection
   • Measure token savings vs accuracy
   • Validate with real user queries
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
""")

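Practices 2 and 3 can be combined in a thin wrapper around the router call: log every selection, and fall back to the full tool list when routing raises or returns nothing. The sketch below uses stub functions in place of the real router so it runs standalone; `select_with_fallback`, `route_fn`, and `broken_router` are hypothetical names.

```python
import logging

logger = logging.getLogger("tool_selection")

ALL_TOOL_NAMES = [
    "search_courses_hybrid",
    "search_memories",
    "store_memory",
    "check_prerequisites",
    "compare_courses",
]


def select_with_fallback(query, route_fn, all_tools=ALL_TOOL_NAMES):
    """Return (tool_names, strategy); send all tools if routing fails or is empty."""
    try:
        names = route_fn(query)
    except Exception:
        logger.exception("Tool routing failed for query %r", query)
        return list(all_tools), "all (router error)"
    known = [name for name in names if name in all_tools]
    if not known:
        logger.warning("No route matched query %r; sending all tools", query)
        return list(all_tools), "all (no match)"
    logger.info("Selected %s for query %r", known, query)
    return known, "semantic"


def broken_router(query):
    raise ConnectionError("Redis unavailable")  # simulated outage


ok = select_with_fallback("prerequisites?", lambda q: ["check_prerequisites"])
empty = select_with_fallback("hello", lambda q: [])
failed = select_with_fallback("prerequisites?", broken_router)
print(ok[1], "|", empty[1], "|", failed[1])
```

Returning the strategy alongside the tools makes it easy to count how often the fallback fires, which is itself a useful signal when tuning the distance threshold.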
### Production Monitoring and Observability

When deploying agents to production, **observability** becomes critical for understanding behavior, debugging issues, and optimizing performance. Here's why monitoring matters and what tools can help:

#### 🔍 **Why Observability Matters for Production Agents**

**1. Debugging Agent Behavior**
- Agents make autonomous decisions that can be hard to predict
- Understanding *why* an agent chose a specific tool or action is crucial
- Trace the full decision path from user query to final response
- Identify when agents get stuck in loops or make poor choices

**2. Monitoring Token Usage and Costs**
- LLM API calls are expensive - track costs in real-time
- Identify queries that consume excessive tokens
- Measure the impact of optimizations (compression, tool selection)
- Set budgets and alerts for cost control

**3. Tracking Tool Selection Accuracy**
- Monitor which tools are selected for different query types
- Measure semantic selection accuracy vs ground truth
- Identify tools that are over-selected or under-utilized
- Detect when wrong tools are chosen and why

**4. Performance Optimization**
- Measure end-to-end latency for agent responses
- Identify bottlenecks (LLM calls, tool execution, memory retrieval)
- Track cache hit rates for embeddings and tool selections
- Optimize based on real usage patterns

**5. Error Detection and Alerting**
- Catch failures in tool execution or LLM calls
- Monitor error rates and types
- Set up alerts for critical issues
- Track recovery from failures

#### 🛠️ **Production Monitoring Tools**

**LangSmith** (LangChain's Observability Platform)
- **What it does:** End-to-end tracing for LangChain/LangGraph applications
- **Key features:**
  - Trace every LLM call, tool invocation, and agent decision
  - Visualize agent execution graphs and decision paths
  - Monitor token usage and costs per request
  - Debug failures with full context and stack traces
  - A/B test different prompts and configurations
- **Best for:** LangChain/LangGraph applications (like our course advisor agent)
- **Learn more:** [langchain.com/langsmith](https://www.langchain.com/langsmith)

**Prometheus** (Metrics and Monitoring)
- **What it does:** Time-series metrics collection and alerting
- **Key features:**
  - Track custom metrics (requests/sec, latency, error rates)
  - Set up alerts for anomalies or threshold breaches
  - Visualize metrics with Grafana dashboards
  - Monitor system resources (CPU, memory, Redis performance)
- **Best for:** Infrastructure monitoring and alerting
- **Learn more:** [prometheus.io](https://prometheus.io/)

**OpenTelemetry** (Distributed Tracing)
- **What it does:** Standardized observability framework for traces, metrics, and logs
- **Key features:**
  - Trace requests across multiple services
  - Correlate LLM calls with database queries and API calls
  - Vendor-neutral (works with many backends)
  - Automatic instrumentation for popular frameworks
- **Best for:** Complex systems with multiple services
- **Learn more:** [opentelemetry.io](https://opentelemetry.io/)
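
To make the idea of tracing concrete, here is a standard-library sketch of what these platforms record: named spans that share a trace ID and carry timing, attributes, and error status. This is not the LangSmith or OpenTelemetry API. The `span` helper, the `SPANS` list, the span names, and the `hypothetical-model` label are all assumptions made for illustration.

```python
import time
import uuid
from contextlib import contextmanager

# In-memory store of finished spans; a real backend would export these instead.
SPANS = []


@contextmanager
def span(name: str, trace_id: str, **attributes):
    """Time a block of work and record it as a span, including failures."""
    record = {"trace_id": trace_id, "name": name, "attributes": attributes, "status": "ok"}
    start = time.perf_counter()
    try:
        yield record
    except Exception as exc:
        record["status"] = "error"
        record["error"] = repr(exc)
        raise
    finally:
        record["duration_ms"] = (time.perf_counter() - start) * 1000
        SPANS.append(record)


def handle_query(query: str) -> str:
    """Run one simulated agent turn and return the ID that links its spans."""
    trace_id = uuid.uuid4().hex
    with span("agent.run", trace_id, query=query):
        with span("tool.select", trace_id) as current:
            current["attributes"]["selected"] = ["search_courses"]  # placeholder result
        with span("llm.call", trace_id, model="hypothetical-model"):
            time.sleep(0.01)  # stands in for the network round trip
    return trace_id


trace_id = handle_query("Find machine learning courses")
for finished in SPANS:
    print(f"{finished['name']:<12} {finished['duration_ms']:.1f} ms  {finished['status']}")
```

Inner spans finish before the outer one, so the log shows `tool.select` and `llm.call` first, followed by the enclosing `agent.run` span.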

#### 📊 **What to Monitor in Production Agents**

**Agent Performance Metrics:**
- Response latency (p50, p95, p99)
- Token usage per request (input + output)
- Tool selection accuracy
- Memory retrieval latency
- Cache hit rates

**Business Metrics:**
- User satisfaction (thumbs up/down, ratings)
- Task completion rate
- Conversation length (turns per session)
- Most common queries and intents
- Feature usage (which tools are most valuable)

**System Health Metrics:**
- Error rates (LLM API, tool execution, memory)
- Redis performance (latency, memory usage)
- API rate limits and throttling
- Concurrent users and load
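
The latency percentiles listed under Agent Performance Metrics can be computed with Python's standard library. The sketch below assumes you have already collected response times in milliseconds; the sample values are invented to show how a few slow requests affect the tail.

```python
import statistics


def latency_percentiles(latencies_ms: list) -> dict:
    """Return p50, p95, and p99 from a list of latency samples."""
    # quantiles(n=100) returns the 99 cut points between percentiles 1 and 99.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


# Hypothetical samples: mostly fast responses plus a few slow outliers.
samples = [120.0] * 90 + [400.0] * 8 + [2500.0] * 2
print(latency_percentiles(samples))
print(f"mean: {statistics.mean(samples):.1f} ms")
```

Reporting percentiles alongside the mean matters because an average can look healthy while a small share of users waits many times longer.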

#### 💡 **Best Practices for Agent Observability**

1. **Start Simple:** Begin with basic logging, then add structured tracing
2. **Trace Everything:** Log all LLM calls, tool invocations, and decisions
3. **Add Context:** Include user ID, session ID, query intent in traces
4. **Set Alerts:** Monitor critical metrics (error rates, latency, costs)
5. **Review Regularly:** Analyze traces weekly to identify patterns and issues
6. **Iterate:** Use insights to improve prompts, tools, and selection strategies
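
Practices 1 through 3 can start with nothing more than the standard `logging` module. The sketch below emits one JSON object per event and attaches context fields through `extra=`. The `JsonFormatter` class, the field names, and the sample values are assumptions for illustration, not part of the course's reference agent.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    CONTEXT_FIELDS = ("user_id", "session_id", "intent", "tool", "tokens")

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "event": record.getMessage(),
        }
        # Fields passed via `extra=` become attributes on the record.
        for key in self.CONTEXT_FIELDS:
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)


logger = logging.getLogger("course_advisor")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info(
    "tool_invoked",
    extra={
        "user_id": "student_42",  # hypothetical identifiers
        "session_id": "sess_001",
        "intent": "prerequisites",
        "tool": "check_prerequisites",
        "tokens": 880,
    },
)
```

Structured lines like these can later be filtered by session or tool, which makes the move to a full tracing platform easier.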

**Example: Monitoring Our Course Advisor Agent**
```
Key metrics to track:
- Tool selection accuracy (semantic router performance)
- Memory retrieval relevance (are we finding the right memories?)
- Token usage per query (impact of compression and tool selection)
- Response quality (user feedback, task completion)
- Error rates (failed tool calls, LLM timeouts)
```
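
The first metric in that list, tool selection accuracy, could be scored along these lines. This sketch assumes a query counts as correct when the expected tool appears anywhere in the selected top-k list; that definition and the labeled examples are illustrative, and may differ from how the accuracy figure in this notebook was measured.

```python
from collections import Counter


def evaluate_selection(results: list) -> dict:
    """Score tool selection against labeled expectations.

    Each item is (expected_tool, selected_tools). A query counts as a hit
    when the expected tool appears anywhere in the selected list.
    """
    hits = 0
    selected_counts = Counter()
    missed = Counter()
    for expected, selected in results:
        selected_counts.update(selected)
        if expected in selected:
            hits += 1
        else:
            missed[expected] += 1
    return {
        "accuracy": hits / len(results),
        "selection_counts": dict(selected_counts),
        "missed_by_tool": dict(missed),
    }


# Hypothetical labeled results; in practice these come from logged router output.
results = [
    ("search_courses", ["search_courses", "compare_courses", "search_memories"]),
    ("check_prerequisites", ["check_prerequisites", "search_courses", "compare_courses"]),
    ("store_memory", ["search_memories", "search_courses", "compare_courses"]),
    ("compare_courses", ["compare_courses", "search_courses", "check_prerequisites"]),
]
report = evaluate_selection(results)
print(f"Accuracy: {report['accuracy']:.0%}")
print("Missed:", report["missed_by_tool"])
print("Selection counts:", report["selection_counts"])
```

The selection counts expose over-selected tools, and the missed counter shows which tools the router fails to surface.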

Observability transforms your agent from a "black box" into a transparent, debuggable, and optimizable system. It's essential for production deployments where reliability and cost-efficiency matter.


---

## 🎓 Part 8: Key Takeaways and Next Steps

### What We've Achieved

In this notebook, we scaled our agent from 3 to 5 tools while reducing token costs:

**✅ Added 2 New Tools**
- `check_prerequisites` - Help students understand course requirements
- `compare_courses` - Compare courses side-by-side

**✅ Implemented Semantic Tool Selection**
- Created rich tool metadata with use cases and keywords
- Built Redis tool embedding index
- Implemented semantic tool selector using vector similarity
- Achieved ~91% tool selection accuracy

**✅ Reduced Tool Token Overhead**
- Tool tokens: 2,200 → 880 (-60% with selection)
- Total tokens: 2,800 → 2,200 (-21%)
- Kept all 5 tools available while sending only the top 3 per query

**✅ Better Scalability**
- The same pattern extends to 10, 20, or 100+ tools
- Tool token cost stays roughly constant, because only the top-k tools are sent per query
- Better tool selection than random or rule-based approaches

### Cumulative Progress Through Section 4

```
Metric          NB2 (Basic)  NB4 (Optimized)  Improvement
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Tools           3            5                +67%
Tool tokens     1,200        880 (selected)   -27%
Total tokens    3,400        2,200            -35%
Scalability     Limited      100+ tools       ∞
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```
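
The percentages in this table and in the summary above use two different baselines: Notebook 2's 3-tool agent, and this notebook's 5-tool agent before selection. The short check below reproduces each figure from the reported token counts. The `pct_change` helper is introduced here only for illustration.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from before to after, as a percentage."""
    return (after - before) / before * 100


# Figures copied from the results reported in this notebook.
rows = {
    "Tools (NB2 -> NB4)": (3, 5),
    "Tool tokens (NB2 -> NB4)": (1200, 880),
    "Total tokens (NB2 -> NB4)": (3400, 2200),
    "Tool tokens (all 5 -> top 3)": (2200, 880),
    "Total tokens (all 5 -> top 3)": (2800, 2200),
}
for label, (before, after) in rows.items():
    print(f"{label:<32} {pct_change(before, after):+.0f}%")
```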

### 💡 Key Takeaway

**"Scale capabilities, not token costs - semantic selection enables both"**

The biggest wins come from:
1. **Semantic understanding** - Match query intent to tool purpose
2. **Dynamic selection** - Only send what's needed
3. **Rich metadata** - Better embeddings = better selection
4. **Constant overhead** - Top-k selection scales to any number of tools
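
The core of top-k selection fits in a few lines of plain Python. This is a simplified stand-in for the vector search that Redis performs, not the RedisVL Semantic Router itself. The 3-dimensional vectors are toy values invented for illustration, where real embeddings would come from an embedding model.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_top_k(query_vec: list, tool_vecs: dict, k: int = 3) -> list:
    """Return the names of the k tools most similar to the query."""
    ranked = sorted(
        tool_vecs,
        key=lambda name: cosine_similarity(query_vec, tool_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy 3-dimensional vectors, invented for illustration.
tool_vecs = {
    "search_courses": [0.9, 0.1, 0.0],
    "check_prerequisites": [0.6, 0.7, 0.1],
    "compare_courses": [0.7, 0.2, 0.6],
    "search_memories": [0.0, 0.2, 0.9],
    "store_memory": [0.1, 0.0, 0.8],
}
query_vec = [0.5, 0.8, 0.0]  # stands in for a prerequisite question
print(select_top_k(query_vec, tool_vecs, k=3))
```

However many entries `tool_vecs` holds, the function returns at most `k` names, which is why the tool-definition overhead sent to the LLM does not grow with the catalog.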

### 🎯 What You've Learned in Section 4

- **Notebook 1:** Tool fundamentals and LangGraph basics
- **Notebook 2:** Building a complete agent with tools and memory
- **Notebook 3:** Memory compression for long conversations
- **Notebook 4:** Semantic tool selection for scalability

**You now know how to:**
- ✅ Build production-ready agents with LangGraph
- ✅ Integrate tools for dynamic capabilities
- ✅ Manage memory efficiently (working + long-term)
- ✅ Compress conversation history
- ✅ Scale to 100+ tools with semantic selection
- ✅ Make informed decisions about tool selection strategies

---

## 🎓 Course Completion: Your Context Engineering Journey

### 🎉 **Congratulations!** You've completed the entire Context Engineering course!

Let's reflect on everything you've learned across all four sections:

### **Section 1: Context Engineering Foundations**
- ✅ Understood the four context types (System, User, Conversation, Retrieved)
- ✅ Learned how context shapes LLM behavior and responses
- ✅ Mastered context engineering principles and best practices

### **Section 2: Retrieved Context Engineering**
- ✅ Built RAG systems with semantic search and vector embeddings
- ✅ Implemented context assembly and generation pipelines
- ✅ Engineered high-quality context from raw data
- ✅ Applied context quality optimization techniques

### **Section 3: Memory Systems for Context Engineering**
- ✅ Implemented dual-memory architecture (working + long-term)
- ✅ Built memory-enhanced RAG systems
- ✅ Mastered memory extraction and compression strategies
- ✅ Managed conversation continuity and persistent knowledge

### **Section 4: Integrating Tools and Agents**
- ✅ Created production-ready agents with LangGraph
- ✅ Integrated multiple tools for dynamic capabilities
- ✅ Implemented memory compression for long conversations
- ✅ Scaled an agent from 3 to 5 tools with semantic selection, a pattern that extends to 100+ tools

### 🚀 **You Are Now Ready To:**

**Build Production AI Systems:**
- Design and implement context-aware LLM applications
- Build RAG systems that retrieve and use relevant information
- Create stateful agents with memory and tools
- Scale systems efficiently with compression and semantic routing

**Apply Best Practices:**
- Engineer high-quality context for optimal LLM performance
- Manage token budgets and costs effectively
- Implement dual-memory architectures for conversation continuity
- Make informed architectural decisions (RAG vs Agents vs Hybrid)

**Solve Real-World Problems:**
- Course advisors, customer support agents, research assistants
- Document Q&A systems, knowledge bases, chatbots
- Multi-tool agents for complex workflows
- Any application requiring context-aware AI

### 🔮 What's Next?

**Apply Your Knowledge:**
- Build your own context-aware applications
- Experiment with different architectures and patterns
- Contribute to open-source projects
- Share your learnings with the community

**Continue Learning:**
- **Advanced LangGraph:** Sub-graphs, checkpointing, human-in-the-loop
- **Multi-Agent Systems:** Agent collaboration and orchestration
- **Production Deployment:** Monitoring, observability, scaling
- **Advanced RAG:** Hybrid search, re-ranking, query decomposition

**Explore the Reference Implementation:**
- Study `reference-agent/` for production patterns
- See how all concepts integrate in a real application
- Learn advanced error handling and edge cases
- Understand CLI design and user experience

### 📚 **Recommended Next Steps:**

1. **Build a Project** - Apply these concepts to a real use case
2. **Study the Reference Agent** - See production implementation
3. **Explore Advanced Topics** - LangGraph, multi-agent systems, observability
4. **Join the Community** - Share your work, get feedback, help others

### 🙏 Thank You!

Thank you for completing the Context Engineering course! You've built a strong foundation in:
- Context engineering principles and best practices
- RAG systems and semantic search
- Memory architectures and compression
- Agent design and tool integration
- Production patterns and scalability

**You're now equipped to build sophisticated, context-aware AI systems that solve real-world problems.**

Keep building, keep learning, and keep pushing the boundaries of what's possible with context engineering! 🚀

---


## 📚 Additional Resources

### Semantic Search and Embeddings
- [OpenAI Embeddings Guide](https://platform.openai.com/docs/guides/embeddings)
- [Vector Similarity Search](https://redis.io/docs/stack/search/reference/vectors/)
- [Semantic Search Best Practices](https://www.pinecone.io/learn/semantic-search/)

### Tool Selection and Agent Design
- [LangChain Tool Calling](https://python.langchain.com/docs/modules/agents/tools/)
- [Function Calling Best Practices](https://platform.openai.com/docs/guides/function-calling)
- [Agent Design Patterns](https://www.anthropic.com/index/agent-design-patterns)

### Redis Vector Search
- [RedisVL Documentation](https://redisvl.com/)
- [Redis Vector Similarity](https://redis.io/docs/stack/search/reference/vectors/)
- [Hybrid Search with Redis](https://redis.io/docs/stack/search/reference/hybrid-queries/)

### Scaling Agents
- [Scaling LLM Applications](https://www.anthropic.com/index/scaling-llm-applications)
- [Production Agent Patterns](https://www.langchain.com/blog/production-agent-patterns)
- [Cost Optimization for LLM Apps](https://platform.openai.com/docs/guides/production-best-practices)

### Context Engineering and RAG
- [Context Rot Research](https://research.trychroma.com/context-rot) - Research on context quality
- [RAG Best Practices](https://www.anthropic.com/index/contextual-retrieval)
- [LangChain Documentation](https://python.langchain.com/docs/get_started/introduction)

### Production Monitoring and Observability
- [LangSmith](https://www.langchain.com/langsmith) - LangChain's observability platform
- [OpenTelemetry](https://opentelemetry.io/) - Distributed tracing and monitoring
- [Prometheus](https://prometheus.io/) - Metrics and alerting


