# Lesson 5: Hybrid Architectures and Rule-Based Agents

## 🎯 Learning Objectives

By the end of this lesson, you will be able to:

1. **Understand** that "agent" doesn't always mean "LLM"
2. **Create** rule-based agents using pure Python logic (no LLM calls)
3. **Build** hybrid systems combining rules and LLMs intelligently
4. **Optimize** costs by using rules for 70%+ of requests
5. **Implement** LoopAgent for iterative workflows
6. **Design** production-grade cost-effective architectures

## 📚 Quick Recap: Lessons 1-4

So far, you've learned:
- ✅ LLM agents with tools (function calling)
- ✅ Hierarchical routing (coordinator + specialists)
- ✅ Sequential workflows (pipeline processing)
- ✅ Parallel execution (concurrent information gathering)

**All used LLM calls for decisions.** Now we'll learn when NOT to use LLMs!

## 🚀 What's New: Rule-Based and Hybrid Patterns

Not every decision needs an LLM:
- 💰 **Rule-based agents**: $0.00 per request (pure Python logic)
- 🎯 **Hybrid systems**: Rules + LLMs = best of both worlds
- 🔄 **LoopAgent**: Iterative workflows with termination conditions

## 🏢 Use Case: Cost-Optimized IT Support

We'll build a production-grade IT support system that:
- Uses rules for simple, predictable cases (70% of tickets)
- Uses LLMs only when complexity requires it (30% of tickets)
- Saves ~70% on API costs compared to an all-LLM approach

---

## 💡 Part 1: When Rules Beat LLMs

### The LLM Tax

Every LLM call costs:
- 💸 **Money**: $0.0002+ per request (even gpt-5-nano)
- ⏱️ **Time**: 500-2000ms latency
- 🎲 **Variance**: Slight inconsistency in outputs

### When Rules Are Better

Use rule-based logic when:

| Scenario | Rule-Based | LLM-Based |
|----------|-----------|----------|
| **Simple keyword matching** | ✅ Perfect | ❌ Overkill |
| **Binary decisions** | ✅ Instant | ❌ Wasteful |
| **Deterministic logic** | ✅ 100% consistent | ❌ Slight variation |
| **High volume** | ✅ Scales cheaply | ❌ Expensive |
| **Speed critical** | ✅ <10ms | ❌ 500ms+ |
| **Complex reasoning** | ❌ Limited | ✅ Excellent |
| **Nuanced understanding** | ❌ Can't handle | ✅ Best use case |

### Real-World Hybrid Examples

**E-commerce:**
- Rules: "out of stock" → auto-response (90% of queries)
- LLM: Complex product comparisons (10% of queries)

**Customer Support:**
- Rules: FAQ matching, business hours check (70%)
- LLM: Complex troubleshooting, empathy required (30%)

**IT Support (our use case):**
- Rules: Simple keyword triage "password", "wifi", "printer" (70%)
- LLM: Ambiguous or complex issues (30%)

### Cost Comparison Example

**10,000 daily tickets:**

**All-LLM approach:**
- 10,000 tickets × 500 tokens × $0.0002 per 1K tokens = **$1.00/day** = **$365/year**

**Hybrid approach (70% rules, 30% LLM):**
- Rules: 7,000 tickets × $0 = **$0**
- LLM: 3,000 tickets × 500 tokens × $0.0002 per 1K tokens = **$0.30/day** ≈ **$110/year**
- **Savings: ≈ $255/year (70%)**

**The hybrid approach is a production best practice, not just a learning exercise!**
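
The arithmetic above can be reproduced in a few lines. This is a minimal sketch: `daily_llm_cost` is a hypothetical helper, and the 500 tokens per ticket and $0.0002 per 1K tokens are the illustrative figures from this section, not measured prices.

```python
def daily_llm_cost(tickets: int, llm_share: float,
                   tokens_per_ticket: int = 500,
                   price_per_1k_tokens: float = 0.0002) -> float:
    """Daily LLM spend in dollars; tickets handled by rules cost nothing."""
    llm_tickets = tickets * llm_share
    return llm_tickets * tokens_per_ticket / 1000 * price_per_1k_tokens

all_llm = daily_llm_cost(10_000, llm_share=1.0)   # every ticket hits the LLM
hybrid = daily_llm_cost(10_000, llm_share=0.3)    # rules absorb 70% of tickets
savings_pct = (1 - hybrid / all_llm) * 100

print(f"All-LLM: ${all_llm:.2f}/day, ${all_llm * 365:.2f}/year")
print(f"Hybrid:  ${hybrid:.2f}/day, ${hybrid * 365:.2f}/year")
print(f"Savings: ${(all_llm - hybrid) * 365:.2f}/year ({savings_pct:.0f}%)")
```

Because rule-handled tickets cost nothing, the savings percentage equals the share of tickets the rules absorb, whatever the token price.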

---

## 🔧 Part 2: Environment Setup

In [None]:
# Install the Google Agent Development Kit and dependencies
!pip install -q google-adk litellm openai python-dotenv nest-asyncio

print("✅ Packages installed successfully!")

In [None]:
# Core ADK imports - including BaseAgent for custom agents!
from google.adk.agents import LlmAgent, BaseAgent, SequentialAgent, LoopAgent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.adk.models.lite_llm import LiteLlm
from google.genai import types

# System imports
import os
import asyncio
import time
import re
from typing import Dict, List, Any, AsyncGenerator
from pydantic import Field
from datetime import datetime, timezone

print("✅ Imports successful!")
print("   Key import: BaseAgent for creating rule-based agents")

In [None]:
# Configure OpenAI API key
try:
    from google.colab import userdata
    OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
    print("✅ API key loaded from Colab secrets")
except Exception:
    # Not running in Colab, or the secret is not set: prompt for the key instead
    from getpass import getpass
    print("💡 To use Colab secrets: Go to 🔑 (left sidebar) → Add new secret → Name: OPENAI_API_KEY")
    OPENAI_API_KEY = getpass("Enter your OpenAI API Key: ")

# Validate before exporting: os.environ only accepts strings, so a missing key
# would otherwise fail with an unhelpful TypeError
if not OPENAI_API_KEY or OPENAI_API_KEY.strip() == "":
    raise ValueError("❌ ERROR: No API key provided!")

os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

print("✅ Authentication configured!")

# Model configuration
OPENAI_MODEL = "gpt-5-nano"  # For LLM components only

print(f"\n🤖 Model (for LLM agents): {OPENAI_MODEL}")
print(f"💡 Rule-based agents: $0 (no API calls!)")

---

## 🎯 Part 3: Building Rule-Based Agents

### What is a Rule-Based Agent?

A rule-based agent:
- Subclasses `BaseAgent` from ADK
- Implements `_run_async_impl()` with pure Python logic
- Makes **zero LLM calls**
- Uses if/else, regex, keyword matching, etc.
- Returns deterministic, instant results

### Use Case: Keyword-Based Ticket Triage

We'll create an agent that routes tickets based on simple keyword matching:
- "password" → security_team
- "wifi", "network", "internet" → network_team
- "laptop", "computer", "hardware" → hardware_team
- "software", "application", "program" → software_team

No LLM needed for this!
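
Stripped of any agent framework, the mapping above is a dictionary lookup. The sketch below is illustrative only: `ROUTES` and `route_by_keyword` are hypothetical names, and the word-boundary regex is one possible matching strategy, not the one used by the agent built in this lesson.

```python
import re
from typing import Dict, List, Optional

# Keyword lists copied from the mapping above.
ROUTES: Dict[str, List[str]] = {
    "security_team": ["password"],
    "network_team": ["wifi", "network", "internet"],
    "hardware_team": ["laptop", "computer", "hardware"],
    "software_team": ["software", "application", "program"],
}

def route_by_keyword(text: str) -> Optional[str]:
    """Return the first team with a whole-word keyword match, else None."""
    lowered = text.lower()
    for team, keywords in ROUTES.items():
        for keyword in keywords:
            # \b anchors the match to word boundaries, so "network"
            # does not fire inside "networking".
            if re.search(rf"\b{re.escape(keyword)}\b", lowered):
                return team
    return None

print(route_by_keyword("I forgot my password"))
print(route_by_keyword("The WiFi in my office is down"))
print(route_by_keyword("Something unusual happened"))
```

Whole-word matching trades recall for precision: it avoids accidental hits inside longer words, but it also misses plurals such as "programs" unless they are listed explicitly. A plain substring test (`keyword in text`) makes the opposite trade.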

### 3.1: Create Sample Ticket Data

In [None]:
# Sample tickets for testing
SAMPLE_TICKETS = [
    {"id": "T-6001", "description": "I forgot my password and can't log in"},
    {"id": "T-6002", "description": "The WiFi in my office is not working"},
    {"id": "T-6003", "description": "My laptop screen is cracked"},
    {"id": "T-6004", "description": "Microsoft Word keeps crashing when I open documents"},
    {"id": "T-6005", "description": "I need help with something complicated and unusual"},  # Fallback to LLM
    {"id": "T-6006", "description": "Can't connect to company VPN from home"},
    {"id": "T-6007", "description": "Printer is jammed and won't print"},
    {"id": "T-6008", "description": "Excel application won't start"},
]

print("✅ Sample tickets loaded!")
print(f"   Total tickets: {len(SAMPLE_TICKETS)}")

### 3.2: Implement Rule-Based Triage Agent

In [None]:
class RuleBasedTriageAgent(BaseAgent):
    """
    A rule-based agent that routes tickets using keyword matching.
    Makes ZERO LLM calls - pure Python logic.
    Fast, cheap, and deterministic.
    """
    
    # BaseAgent is a Pydantic model, so instance attributes are declared as fields
    routing_rules: Dict[str, List[str]] = Field(default_factory=dict)

    def __init__(self, name: str = "rule_triage"):
        super().__init__(name=name)
        
        # Define keyword rules for each team
        self.routing_rules = {
            "security_team": [
                "password", "login", "access", "credentials", 
                "locked out", "forgot password", "can't log in",
                "authentication", "2fa", "mfa"
            ],
            "network_team": [
                "wifi", "wi-fi", "network", "internet", "connection",
                "vpn", "ethernet", "connectivity", "can't connect",
                "slow internet", "no internet"
            ],
            "hardware_team": [
                "laptop", "computer", "desktop", "monitor", "screen",
                "keyboard", "mouse", "hardware", "device", "printer",
                "physical", "broken", "cracked", "jammed"
            ],
            "software_team": [
                "software", "application", "program", "app",
                "microsoft", "excel", "word", "outlook", "teams",
                "crash", "won't start", "won't open", "freezing"
            ]
        }
    
    async def _run_async_impl(
        self,
        ctx: Any,
    ) -> AsyncGenerator[Any, None]:
        """
        Core logic: Route ticket based on keyword matching.
        This is where the magic happens - no LLM calls!
        """
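        # Get the user's message from the invocation context.
        # ADK's InvocationContext exposes it as `user_content`;
        # the `new_message` lookup is kept only as a fallback.
        user_message = ""
        content = getattr(ctx, 'user_content', None) or getattr(ctx, 'new_message', None)
        if content and content.parts:
            user_message = content.parts[0].text or ""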
        
        print(f"\n🔍 [RULE-BASED TRIAGE] Analyzing: {user_message[:60]}...")
        
        # Convert to lowercase for matching
        message_lower = user_message.lower()
        
        # Check each team's keywords
        team_scores = {}
        for team, keywords in self.routing_rules.items():
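            # Count keywords found as substrings of the message. Short keywords
            # can also fire inside longer words (e.g. "word" inside "password").
            score = sum(1 for keyword in keywords if keyword in message_lower)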
            if score > 0:
                team_scores[team] = score
        
        # Determine routing
        if team_scores:
            # Route to team with highest score
            best_team = max(team_scores, key=team_scores.get)
            confidence = "high" if team_scores[best_team] >= 2 else "medium"
            
            response = f"""
ROUTING: {best_team}
CONFIDENCE: {confidence}
MATCHED_KEYWORDS: {team_scores[best_team]}
METHOD: Rule-based keyword matching
COST: $0.00 (no LLM call)
TIME: <10ms

This ticket has been automatically routed to {best_team} based on keyword analysis.
            """.strip()
            
            print(f"✅ [RULE-BASED] Routed to: {best_team} (confidence: {confidence})")
        else:
            # No clear match - escalate to LLM
            response = f"""
ROUTING: escalate_to_llm
CONFIDENCE: low
MATCHED_KEYWORDS: 0
METHOD: No keyword matches found
COST: $0.00 (rule check only)
TIME: <10ms

This ticket requires LLM analysis - no clear keyword matches found.
Escalating to intelligent triage system.
            """.strip()
            
            print(f"⚠️  [RULE-BASED] No clear match - escalating to LLM")
        
        # Yield the response as an event
        response_content = types.Content(
            role='model',
            parts=[types.Part(text=response)]
        )
        
        # Wrap the response in an ADK Event so the Runner can record it
        # in the session like any other agent output.
        from google.adk.events import Event

        yield Event(
            author=self.name,
            invocation_id=ctx.invocation_id,
            content=response_content,
        )

print("✅ Rule-Based Triage Agent created!")
print("   Cost per routing: $0.00")
print("   Speed: <10ms")
print("   Deterministic: 100% consistent")

### 3.3: Test the Rule-Based Agent

In [None]:
# Create the rule-based triage agent
rule_triage = RuleBasedTriageAgent(name="keyword_triage")

# Setup runner
rule_session_service = InMemorySessionService()
RULE_APP = "rule_triage_app"

rule_runner = Runner(
    app_name=RULE_APP,
    agent=rule_triage,
    session_service=rule_session_service
)

print("✅ Rule-based system initialized!")

In [None]:
# Helper function to test rule-based routing
_rule_sessions = set()

async def test_rule_routing_async(ticket: Dict, session_id: str = None):
    """Test rule-based routing for a ticket."""
    if session_id is None:
        session_id = f"rule_session_{ticket['id']}"
    
    user_id = "rule_system"
    
    # Create session
    session_key = (session_id, user_id)
    if session_key not in _rule_sessions:
        await rule_session_service.create_session(
            app_name=RULE_APP,
            user_id=user_id,
            session_id=session_id,
            state={}
        )
        _rule_sessions.add(session_key)
    
    content = types.Content(role='user', parts=[types.Part(text=ticket['description'])])
    
    print(f"\n{'='*80}")
    print(f"🎫 TICKET {ticket['id']}")
    print(f"{'='*80}")
    print(f"Description: {ticket['description']}")
    
    start_time = time.time()
    
    events = rule_runner.run_async(user_id=user_id, session_id=session_id, new_message=content)
    
    final_response = None
    async for event in events:
        if event.is_final_response():
            final_response = event.content.parts[0].text
    
    elapsed_time = (time.time() - start_time) * 1000  # Convert to ms
    
    if final_response:
        print(f"\n📊 ROUTING RESULT:")
        print(f"{'-'*80}")
        print(final_response)
        print(f"{'-'*80}")
        print(f"⏱️  Processing time: {elapsed_time:.1f}ms")
        print(f"{'='*80}\n")
    
    return final_response

def test_rule_routing(ticket: Dict, session_id: str = None):
    """Synchronous wrapper."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No event loop is running (plain Python script)
        return asyncio.run(test_rule_routing_async(ticket, session_id))

    # A loop is already running (Jupyter/Colab): patch it so asyncio.run can nest
    import nest_asyncio
    nest_asyncio.apply()
    return asyncio.run(test_rule_routing_async(ticket, session_id))

print("✅ Test function ready!")

### 3.4: Run Rule-Based Routing Tests

In [None]:
# Test with various tickets
for ticket in SAMPLE_TICKETS[:4]:  # Test first 4
    test_rule_routing(ticket)

### 3.5: Rule-Based Agent Key Observations

**What you just saw:**

1. ✅ **Zero LLM calls**: Pure Python keyword matching
2. ✅ **Instant results**: <10ms response time
3. ✅ **$0 cost**: No API charges whatsoever
4. ✅ **100% deterministic**: Same input → same output always
5. ✅ **Transparent logic**: You can see exactly why decisions were made
6. ✅ **Escalation to LLM**: Handles edge cases gracefully

**When to Use Rule-Based Agents:**
- Clear, predictable patterns (keywords, thresholds)
- High-volume, low-complexity decisions
- Cost optimization is critical
- Speed is essential (real-time requirements)
- Determinism is required (compliance, auditing)

---

## 🎨 Part 4: Hybrid Architecture - Best of Both Worlds

### The Hybrid Strategy

```
            User Ticket
                |
                ▼
        ┌───────────────┐
        │ Rule-Based    │  ← Fast, free triage
        │ Triage Agent  │     (70% of tickets)
        └───────┬───────┘
                │
        ┌───────┴────────┐
        │                │
        ▼                ▼
    Clear match?    No match?
        │                │
        ▼                ▼
    Route to      LLM-Based
    Team ($0)     Analysis
                  ($0.0002)
```

### Use Case: Production IT Support System

Let's build a complete hybrid system:
1. **Rule triage** catches 70% of tickets (instant, free)
2. **LLM fallback** handles complex cases (30% of tickets)
3. **Specialist agents** resolve the issues

This is how production systems actually work!
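
Before wiring this into ADK, the control flow is worth seeing on its own. The sketch below is self-contained and runs offline: `rule_triage`, `hybrid_triage`, and `fake_llm` are hypothetical stand-ins, and the fallback simply returns a fixed team where a real system would call a model.

```python
from typing import Callable, Dict, Optional

def rule_triage(text: str) -> Optional[str]:
    """Hypothetical rule layer: return a team name, or None when unsure."""
    lowered = text.lower()
    if "password" in lowered:
        return "security_team"
    if "wifi" in lowered or "vpn" in lowered:
        return "network_team"
    return None

def hybrid_triage(text: str, llm_fallback: Callable[[str], str],
                  stats: Dict[str, int]) -> str:
    """Try the free rule layer first; call the LLM only on a miss."""
    team = rule_triage(text)
    if team is not None:
        stats["rule_count"] += 1
        return team
    stats["llm_count"] += 1
    return llm_fallback(text)

def fake_llm(text: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return "software_team"

stats = {"rule_count": 0, "llm_count": 0}
for ticket in ["I forgot my password", "VPN is down", "Something odd happened"]:
    print(ticket, "->", hybrid_triage(ticket, fake_llm, stats))
print(stats)
```

Passing the fallback in as a callable keeps the rule layer testable without network access, and the `stats` counters are what later let you measure the rule hit rate.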

### 4.1: Create LLM Fallback Agent

In [None]:
# LLM-based triage for complex cases
llm_triage = LlmAgent(
    model=LiteLlm(model=f"openai/{OPENAI_MODEL}"),
    name="llm_triage",
    instruction="""
    You are an intelligent IT support triage agent.
    
    YOUR TASK:
    Analyze the ticket and determine which team should handle it:
    - security_team: Password, authentication, access control issues
    - network_team: Internet, WiFi, VPN, connectivity issues
    - hardware_team: Physical devices, broken equipment
    - software_team: Applications, programs, software crashes
    
    OUTPUT FORMAT:
    ROUTING: [team_name]
    CONFIDENCE: [high/medium/low]
    REASONING: [brief explanation]
    METHOD: LLM-based analysis
    
    Analyze the ticket carefully and use your reasoning to make the best routing decision.
    """
)

print("✅ LLM Triage Agent created!")
print(f"   Model: {OPENAI_MODEL}")
print(f"   Cost: ~$0.0002 per routing")
print(f"   Use: Complex/ambiguous cases only")

### 4.2: Build Hybrid Triage System

In [None]:
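class HybridTriageAgent(BaseAgent):
    """
    Hybrid agent that tries rules first, falls back to LLM.
    Optimizes for cost and speed.
    """

    # BaseAgent is a Pydantic model: attributes must be declared as fields
    # before they can be assigned, otherwise the assignment raises an error.
    rule_agent: BaseAgent
    llm_agent: LlmAgent
    stats: Dict[str, int] = Field(default_factory=lambda: {"rule_count": 0, "llm_count": 0})

    def __init__(self, rule_agent: BaseAgent, llm_agent: LlmAgent, name: str = "hybrid_triage"):
        super().__init__(
            name=name,
            rule_agent=rule_agent,
            llm_agent=llm_agent,
            sub_agents=[rule_agent, llm_agent],
        )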
    
    async def _run_async_impl(self, ctx: Any) -> AsyncGenerator[Any, None]:
        """
        Try rules first. If no match, use LLM.
        """
        print(f"\n🔀 [HYBRID] Attempting rule-based triage first...")
        
        # Try rule-based first
        rule_response = None
        async for event in self.rule_agent._run_async_impl(ctx):
            if event.is_final_response():
                rule_response = event.content.parts[0].text
        
        # Check if rule found a match
        if rule_response and "escalate_to_llm" not in rule_response:
            # Rule worked! Use it.
            self.stats["rule_count"] += 1
            print(f"✅ [HYBRID] Rule-based routing successful!")
            print(f"   Cost: $0.00 | Stats: {self.stats['rule_count']} rule, {self.stats['llm_count']} LLM")
            
            response_content = types.Content(
                role='model',
                parts=[types.Part(text=f"[RULE-BASED ROUTING]\n{rule_response}")]
            )
        else:
            # Need LLM
            self.stats["llm_count"] += 1
            print(f"⚡ [HYBRID] Escalating to LLM...")
            
            # Run LLM agent
            llm_response = None
            # Create a runner for the LLM agent
            llm_session_service = InMemorySessionService()
            await llm_session_service.create_session(
                app_name="llm_fallback",
                user_id="hybrid_system",
                session_id="llm_session",
                state={}
            )
            llm_runner = Runner(
                app_name="llm_fallback",
                agent=self.llm_agent,
                session_service=llm_session_service
            )
            
            events = llm_runner.run_async(
                user_id="hybrid_system",
                session_id="llm_session",
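                # InvocationContext stores the user's message as `user_content`
                new_message=getattr(ctx, 'user_content', None) or getattr(ctx, 'new_message', None)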
            )
            
            async for event in events:
                if event.is_final_response():
                    llm_response = event.content.parts[0].text
            
            print(f"✅ [HYBRID] LLM routing complete!")
            print(f"   Cost: ~$0.0002 | Stats: {self.stats['rule_count']} rule, {self.stats['llm_count']} LLM")
            
            response_content = types.Content(
                role='model',
                parts=[types.Part(text=f"[LLM-BASED ROUTING]\n{llm_response}")]
            )
        
        # Wrap the response in an ADK Event so the Runner can record it
        from google.adk.events import Event

        yield Event(
            author=self.name,
            invocation_id=ctx.invocation_id,
            content=response_content,
        )

print("✅ Hybrid Triage Agent class created!")

In [None]:
# Create hybrid system
hybrid_triage = HybridTriageAgent(
    rule_agent=rule_triage,
    llm_agent=llm_triage,
    name="cost_optimized_triage"
)

# Setup runner
hybrid_session_service = InMemorySessionService()
HYBRID_APP = "hybrid_triage_app"

hybrid_runner = Runner(
    app_name=HYBRID_APP,
    agent=hybrid_triage,
    session_service=hybrid_session_service
)

print("✅ Hybrid System initialized!")
print("   Strategy: Try rules first, LLM fallback")
print("   Expected: 70% rule-based, 30% LLM")

### 4.3: Test Hybrid System

In [None]:
# Helper for testing hybrid system
_hybrid_sessions = set()

async def test_hybrid_routing_async(ticket: Dict, session_id: str = None):
    """Test hybrid routing."""
    if session_id is None:
        session_id = f"hybrid_session_{ticket['id']}"
    
    user_id = "hybrid_system"
    
    session_key = (session_id, user_id)
    if session_key not in _hybrid_sessions:
        await hybrid_session_service.create_session(
            app_name=HYBRID_APP,
            user_id=user_id,
            session_id=session_id,
            state={}
        )
        _hybrid_sessions.add(session_key)
    
    content = types.Content(role='user', parts=[types.Part(text=ticket['description'])])
    
    print(f"\n{'='*80}")
    print(f"🎫 TICKET {ticket['id']}")
    print(f"{'='*80}")
    print(f"Description: {ticket['description']}")
    
    start_time = time.time()
    
    events = hybrid_runner.run_async(user_id=user_id, session_id=session_id, new_message=content)
    
    final_response = None
    async for event in events:
        if event.is_final_response():
            final_response = event.content.parts[0].text
    
    elapsed_time = (time.time() - start_time) * 1000
    
    if final_response:
        print(f"\n📊 ROUTING RESULT:")
        print(f"{'-'*80}")
        print(final_response)
        print(f"{'-'*80}")
        print(f"⏱️  Total time: {elapsed_time:.1f}ms")
        print(f"{'='*80}\n")
    
    return final_response

def test_hybrid_routing(ticket: Dict, session_id: str = None):
    """Synchronous wrapper."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No event loop is running (plain Python script)
        return asyncio.run(test_hybrid_routing_async(ticket, session_id))

    # A loop is already running (Jupyter/Colab): patch it so asyncio.run can nest
    import nest_asyncio
    nest_asyncio.apply()
    return asyncio.run(test_hybrid_routing_async(ticket, session_id))

print("✅ Hybrid test function ready!")

In [None]:
# Test all tickets with hybrid system
print(f"Testing all {len(SAMPLE_TICKETS)} tickets with hybrid triage...\n")

for ticket in SAMPLE_TICKETS:
    test_hybrid_routing(ticket)

# Show final statistics
print(f"\n{'='*80}")
print(f"📊 HYBRID SYSTEM PERFORMANCE")
print(f"{'='*80}")
print(f"Rule-based routings: {hybrid_triage.stats['rule_count']} ({hybrid_triage.stats['rule_count']/len(SAMPLE_TICKETS)*100:.0f}%)")
print(f"LLM-based routings: {hybrid_triage.stats['llm_count']} ({hybrid_triage.stats['llm_count']/len(SAMPLE_TICKETS)*100:.0f}%)")
print(f"\nCost analysis (per ticket):")
print(f"  Rule-based: $0.00")
print(f"  LLM-based: ~$0.0002")
print(f"  Average: ${(hybrid_triage.stats['llm_count']/len(SAMPLE_TICKETS))*0.0002:.6f}")
print(f"\nVs. all-LLM approach: ${len(SAMPLE_TICKETS)*0.0002:.4f}")
savings = (1 - (hybrid_triage.stats['llm_count']/len(SAMPLE_TICKETS))) * 100
print(f"Cost savings: {savings:.0f}%")
print(f"{'='*80}\n")

### 4.4: Hybrid Architecture Key Observations

**What you just saw:**

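1. ✅ **Intelligent routing**: Rules handle the clear-cut tickets and the LLM handles the rest (in the sample set only T-6005 has no keyword match; ~70/30 is the assumed split for real traffic)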
2. ✅ **Cost optimization**: Significant savings vs all-LLM approach
3. ✅ **Speed optimization**: Rules are instant, LLM only when needed
4. ✅ **Best of both worlds**: Determinism + Intelligence
5. ✅ **Transparent metrics**: See exactly what's using LLM calls

**Production Scaling:**

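For 10,000 daily tickets at the flat ~$0.0002 per request used in this part (Part 1 priced a 500-token request at $0.0001, so its totals are half these):
- All-LLM: 10,000 × $0.0002 = **$2.00/day** = $730/year
- Hybrid (70/30): 3,000 × $0.0002 = **$0.60/day** = $219/year
- **Annual savings: $511 (70%)**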

At scale, this matters!

---

## 🔄 Part 5: LoopAgent for Iterative Workflows

### What is LoopAgent?

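LoopAgent repeats its sub-agents in order until:
- `max_iterations` is reached, OR
- A sub-agent signals that the loop should stop (in ADK, by emitting an event with `actions.escalate` set)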

### Use Case: Iterative Troubleshooting

IT support often requires iteration:
1. Try solution A
2. Check if problem solved
3. If not, try solution B
4. Repeat until fixed or max attempts

Let's build this!
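
The four steps above boil down to a bounded loop with an early exit. Here is a minimal, framework-free sketch: `run_until_solved` is a hypothetical helper, and the "check" is simulated with a lambda where a real system would ask the user or a checker agent.

```python
from typing import Callable, List, Tuple

def run_until_solved(steps: List[str], is_solved: Callable[[str], bool],
                     max_iterations: int = 5) -> Tuple[bool, int]:
    """Try steps in order; stop on the first success or at the iteration cap."""
    attempts = 0
    for step in steps[:max_iterations]:
        attempts += 1
        print(f"Attempt {attempts}: {step}")
        if is_solved(step):
            return True, attempts
    return False, attempts

steps = [
    "Restart the router",
    "Check adapter settings",
    "Update the WiFi driver",
    "Run advanced diagnostics",
    "Escalate to specialist",
]

# Simulated check: pretend only the driver update fixes the problem.
solved, attempts = run_until_solved(steps, lambda step: "driver" in step)
print(f"solved={solved}, attempts={attempts}")
```

The two exit paths (success and the iteration cap) correspond to the break condition and the `max_iterations` limit described for LoopAgent.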

### 5.1: Create Troubleshooting Agents

In [None]:
# Agent that suggests solutions
solution_suggester = LlmAgent(
    model=LiteLlm(model=f"openai/{OPENAI_MODEL}"),
    name="solution_suggester",
    instruction="""
    You are an IT troubleshooting expert.
    
    Given a problem description, suggest ONE specific troubleshooting step.
    Each iteration, suggest a different approach if previous didn't work.
    
    OUTPUT FORMAT:
    STEP: [step number]
    ACTION: [specific action to try]
    EXPECTED RESULT: [what should happen if this works]
    
    Common progression:
    1. Simple restart/reconnect
    2. Check settings/configuration
    3. Update drivers/software
    4. Advanced troubleshooting
    5. Escalate to specialist
    
    Be specific and actionable.
    """
)

# Agent that checks if problem is solved
solution_checker = LlmAgent(
    model=LiteLlm(model=f"openai/{OPENAI_MODEL}"),
    name="solution_checker",
    instruction="""
    You are evaluating if a troubleshooting step resolved the issue.
    
    For this simulation, use logic:
    - Step 1-2: Usually don't solve complex issues (continue)
    - Step 3: Often solves the problem (can stop)
    - Step 4+: Definitely solved or needs escalation
    
    OUTPUT FORMAT:
    SOLVED: [yes/no]
    REASONING: [why you think it's solved or not]
    RECOMMENDATION: [stop/continue/escalate]
    
    If you output SOLVED: yes, the loop should terminate.
    """
)

print("✅ Troubleshooting agents created!")

### 5.2: Create LoopAgent with Termination

In [None]:
# Create a loop agent for iterative troubleshooting
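# Note: LoopAgent stops at max_iterations or when a sub-agent escalates.
# Neither agent below escalates, so this loop always runs all 5 iterations;
# the checker's "SOLVED: yes" text is not acted on by the loop itself.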

troubleshooting_loop = LoopAgent(
    name="troubleshooting_loop",
    sub_agents=[
        solution_suggester,
        solution_checker
    ],
    max_iterations=5  # Safety limit
)

print("✅ LoopAgent created!")
print(f"   Sub-agents: {len(troubleshooting_loop.sub_agents)}")
print(f"   Max iterations: 5")
print(f"   Strategy: Suggest → Check → Repeat")
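
The checker reports its verdict as plain text, so something has to parse that text before it can influence control flow. The sketch below shows one way to read the `SOLVED:` line. `checker_says_solved` is a hypothetical helper; connecting its result to the loop (for example through an event or tool that sets `actions.escalate`) is left as an assumption about how you would integrate it.

```python
import re

def checker_says_solved(checker_output: str) -> bool:
    """Return True if the text contains a line like 'SOLVED: yes'."""
    match = re.search(r"^\s*SOLVED:\s*(yes|no)\b", checker_output,
                      flags=re.IGNORECASE | re.MULTILINE)
    return bool(match) and match.group(1).lower() == "yes"

sample = """SOLVED: yes
REASONING: Updating the driver restored a stable connection.
RECOMMENDATION: stop"""

print(checker_says_solved(sample))
print(checker_says_solved("SOLVED: no\nRECOMMENDATION: continue"))
```

Treating a missing or malformed `SOLVED:` line as "not solved" is the conservative choice: the loop keeps going until the iteration cap rather than stopping on output it cannot interpret.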

### 5.3: Test LoopAgent

In [None]:
# Setup runner for loop agent
loop_session_service = InMemorySessionService()
LOOP_APP = "loop_troubleshooting_app"

loop_runner = Runner(
    app_name=LOOP_APP,
    agent=troubleshooting_loop,
    session_service=loop_session_service
)

print("✅ LoopAgent runner initialized!")

In [None]:
# Test the loop agent
async def test_loop_troubleshooting_async(problem: str):
    """Test iterative troubleshooting."""
    session_id = "loop_test_session"
    user_id = "loop_user"
    
    await loop_session_service.create_session(
        app_name=LOOP_APP,
        user_id=user_id,
        session_id=session_id,
        state={}
    )
    
    content = types.Content(role='user', parts=[types.Part(text=problem)])
    
    print(f"\n{'='*80}")
    print(f"🔄 ITERATIVE TROUBLESHOOTING")
    print(f"{'='*80}")
    print(f"Problem: {problem}")
    print(f"\n⏳ Starting iterative loop (max 5 iterations)...\n")
    
    events = loop_runner.run_async(user_id=user_id, session_id=session_id, new_message=content)
    
    final_response = None
    
    async for event in events:
        if event.is_final_response():
            final_response = event.content.parts[0].text
    
    if final_response:
        print(f"\n📊 TROUBLESHOOTING COMPLETE:")
        print(f"{'-'*80}")
        print(final_response)
        print(f"{'-'*80}")
        print(f"\n✅ Loop terminated")
        print(f"{'='*80}\n")
    
    return final_response

def test_loop_troubleshooting(problem: str):
    """Synchronous wrapper."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No event loop is running (plain Python script)
        return asyncio.run(test_loop_troubleshooting_async(problem))

    # A loop is already running (Jupyter/Colab): patch it so asyncio.run can nest
    import nest_asyncio
    nest_asyncio.apply()
    return asyncio.run(test_loop_troubleshooting_async(problem))

print("✅ Loop test function ready!")

In [None]:
# Test with a WiFi problem
test_loop_troubleshooting("WiFi connection keeps dropping every few minutes")

### 5.4: LoopAgent Key Observations

**What you just saw:**

1. ✅ **Iterative execution**: Agents run multiple times in sequence
2. ✅ **Max iterations**: Safety mechanism prevents infinite loops
3. ✅ **Stateful iteration**: Each iteration can build on previous
4. ✅ **Termination conditions**: Can stop early if a sub-agent escalates (the loop above has no such signal, so it runs to `max_iterations`)
5. ✅ **Real-world pattern**: Common in troubleshooting, optimization, refinement

**When to Use LoopAgent:**
- Iterative refinement (code review cycles, document editing)
- Trial-and-error troubleshooting
- Optimization problems (keep trying until good enough)
- Multi-step verification (try, check, retry if needed)

**Cost Consideration:**
- Each iteration = 2 LLM calls (suggester + checker)
- 5 iterations = 10 LLM calls total
- Set max_iterations appropriately for your use case
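
To put numbers on this, the sketch below computes the call budget for a loop. `loop_cost` is a hypothetical helper and the $0.0002 per call is this lesson's illustrative estimate, not a quoted price.

```python
def loop_cost(iterations: int, calls_per_iteration: int = 2,
              cost_per_call: float = 0.0002) -> float:
    """Total LLM spend in dollars for a loop that runs `iterations` times."""
    return iterations * calls_per_iteration * cost_per_call

worst_case = loop_cost(5)   # loop runs to max_iterations
early_stop = loop_cost(3)   # loop exits after the third iteration

print(f"Worst case (5 iterations): {5 * 2} calls, ${worst_case:.4f}")
print(f"Stop after 3 iterations:   {3 * 2} calls, ${early_stop:.4f}")
```

A single troubleshooting loop costs about ten times a single triage call, which is why an early-exit signal matters as volume grows.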

---

## 🎓 Part 6: Student Exercises

### Exercise 1: Enhance Rule-Based Triage (Intermediate)

**Task:** Add priority detection to the rule-based agent.

**Requirements:**
1. Detect urgency keywords: "urgent", "asap", "critical", "emergency"
2. Assign priority: critical/high/medium/low
3. Include priority in routing decision
4. Still maintain $0 cost (no LLM calls)

**Hint:** Add another rules dictionary for priority keywords.

In [None]:
# Exercise 1: Your code here

# TODO: Extend RuleBasedTriageAgent with priority detection
# class EnhancedRuleBasedTriageAgent(BaseAgent):
#     def __init__(self, name: str = "enhanced_rule_triage"):
#         super().__init__(name=name)
#         # Add priority rules
#         self.priority_rules = {
#             "critical": ["urgent", "emergency", ...],
#             ...
#         }
#     ...

# TODO: Test with tickets containing urgency keywords

### Exercise 2: Cost Analysis Tool (Beginner)

**Task:** Create a function that calculates cost savings for different hybrid ratios.

**Requirements:**
1. Function takes: daily_tickets, rule_percentage
2. Calculates: all-LLM cost vs hybrid cost
3. Shows: daily, monthly, annual savings
4. Compare: 50%, 70%, 90% rule coverage

**Goal:** Understand the economics of hybrid architectures.

In [None]:
# Exercise 2: Your code here

def calculate_hybrid_savings(daily_tickets: int, rule_percentage: float):
    """
    Calculate cost savings of hybrid vs all-LLM approach.
    
    Args:
        daily_tickets: Number of tickets per day
        rule_percentage: Percentage handled by rules (0.0 to 1.0)
    """
    # TODO: Implement cost calculation
    # LLM cost per request: $0.0002
    # Rule cost per request: $0.00
    pass

# TODO: Test with different scenarios
# calculate_hybrid_savings(10000, 0.70)
# calculate_hybrid_savings(10000, 0.90)

### Exercise 3: Build Custom Loop Logic (Advanced)

**Task:** Create a custom agent that implements retry logic manually.

**Requirements:**
1. Subclass BaseAgent
2. Implement custom retry logic in _run_async_impl
3. Try an action up to 3 times
4. Track success/failure
5. Stop on first success

**Challenge:** Implement this without using LoopAgent.

In [None]:
# Exercise 3: Your code here

# class CustomRetryAgent(BaseAgent):
#     def __init__(self, action_agent: LlmAgent, max_retries: int = 3, name: str = "retry_agent"):
#         super().__init__(name=name, sub_agents=[action_agent])
#         self.action_agent = action_agent
#         self.max_retries = max_retries
#     
#     async def _run_async_impl(self, ctx: Any) -> AsyncGenerator[Any, None]:
#         # TODO: Implement retry logic
#         for attempt in range(self.max_retries):
#             # Try action
#             # Check if successful
#             # If success, break
#             # Otherwise, retry
#         ...

# TODO: Test the retry logic

---

## 📋 Part 7: Design Principles - Choosing the Right Approach

### Decision Tree

```
Is the logic deterministic and rule-based?
├─ YES → Use rule-based agent ($0, <10ms)
└─ NO → Is it a simple decision?
    ├─ YES → Try hybrid (rules first, LLM fallback)
    └─ NO → Is complex reasoning required?
        ├─ YES → Use LLM agent
        └─ NO → Is iteration needed?
            ├─ YES → Use LoopAgent
            └─ NO → Use appropriate workflow pattern
```
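
The tree reads naturally as a chain of early returns. This is only an illustrative encoding: `choose_pattern` and its boolean parameters are hypothetical names for the four questions in the diagram.

```python
def choose_pattern(deterministic: bool, simple_decision: bool,
                   complex_reasoning: bool, needs_iteration: bool) -> str:
    """Walk the decision tree top to bottom and return the suggested pattern."""
    if deterministic:
        return "rule-based agent"
    if simple_decision:
        return "hybrid (rules first, LLM fallback)"
    if complex_reasoning:
        return "LLM agent"
    if needs_iteration:
        return "LoopAgent"
    return "other workflow pattern"

print(choose_pattern(True, False, False, False))
print(choose_pattern(False, True, False, False))
print(choose_pattern(False, False, False, True))
```

The order of the checks matters: the cheapest option is tested first, so a task that is both deterministic and iterative still lands on the rule-based branch.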

### Pattern Selection Matrix

| Pattern | Cost | Speed | Use When | Example |
|---------|------|-------|----------|--------|
| **Rule-based** | $0 | <10ms | Clear keywords/thresholds | "password" → security |
| **Hybrid** | ~$0.00006 avg (30% × $0.0002) | 10-500ms | 70% simple, 30% complex | Try rules, fallback LLM |
| **LLM** | $0.0002 | 500ms+ | Nuanced understanding | Complex reasoning |
| **Sequential** | Nx cost | Nx time | Pipeline processing | Classify → Prioritize → Route |
| **Parallel** | Nx cost | 1x time | Independent searches | Multi-source research |
| **Loop** | Iteration × cost | Iteration × time | Trial-and-error | Troubleshooting steps |
| **Hierarchical** | 1-2x cost | Variable | Complex triage | Route to specialist |
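To make the "Nx cost" and "Nx time" entries concrete, here is a rough estimator. It is a sketch under simplifying assumptions: every LLM call has the same price and latency, and parallel calls overlap perfectly. The function name `estimate` is hypothetical.

```python
def estimate(pattern: str, n_calls: int, cost_per_call: float, latency_ms: float) -> tuple[float, float]:
    """Return (total cost in dollars, approximate wall-clock time in ms) for one request."""
    total_cost = n_calls * cost_per_call  # every pattern pays for each call it makes
    if pattern in ("sequential", "loop"):
        # Calls run one after another, so latency adds up as well.
        # For a loop, n_calls is the number of iterations actually executed.
        return total_cost, n_calls * latency_ms
    if pattern == "parallel":
        # Calls run concurrently, so wall-clock time is roughly one call.
        return total_cost, latency_ms
    raise ValueError(f"unknown pattern: {pattern}")


for name in ("sequential", "parallel", "loop"):
    cost, ms = estimate(name, n_calls=3, cost_per_call=0.0002, latency_ms=500.0)
    print(f"{name:<10} cost=${cost:.4f}  time={ms:.0f}ms")
```

Parallel execution saves time, not money: the three calls cost the same either way.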

### Cost Optimization Strategies

**Layer 1: Rule-Based Filtering (70%)**
- Handle obvious cases with keywords
- Instant, free, deterministic
- Example: FAQ matching, status checks

**Layer 2: Hybrid Triage (20%)**
- Rules couldn't decide confidently
- Use cheap model (gpt-5-nano) for classification
- Route to appropriate specialist

**Layer 3: LLM Specialists (10%)**
- Complex reasoning required
- Use appropriate model for complexity
- May use tools, multiple steps

**Result: ~70-90% cost reduction vs all-LLM**
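The three layers can be sketched in pure Python as follows. Everything here is illustrative: the keyword lists, the `route` function, and `stub_classifier` (a stand-in for a cheap model call) are assumptions for this example, not ADK APIs.

```python
from typing import Callable, Optional

# Hypothetical keyword lists; a real system would derive these from ticket history.
KEYWORDS = {
    "security": ("password", "locked out", "phishing"),
    "network": ("vpn", "wifi", "dns"),
    "hardware": ("laptop", "monitor", "printer"),
}


def route(ticket: str, cheap_classifier: Callable[[str], Optional[str]]) -> tuple[str, str]:
    """Return (layer, destination) for a ticket."""
    text = ticket.lower()
    matched = [team for team, kws in KEYWORDS.items() if any(kw in text for kw in kws)]

    # Layer 1: exactly one team matched, so the rules are confident. No LLM call.
    if len(matched) == 1:
        return "layer1-rules", matched[0]

    # Layer 2: rules were ambiguous or silent, so ask the cheap classifier.
    team = cheap_classifier(ticket)
    if team is not None:
        return "layer2-triage", team

    # Layer 3: nothing cheap could decide, so hand off to a full LLM specialist.
    return "layer3-specialist", "general"


def stub_classifier(ticket: str) -> Optional[str]:
    # Stand-in for a cheap model call; it only "understands" VPN tickets.
    return "network" if "vpn" in ticket.lower() else None


for t in ("I forgot my password",
          "VPN drops when my laptop wakes",
          "Everything feels slow since the update"):
    print(route(t, stub_classifier))
```

Only tickets that fall through Layer 1 incur any LLM cost, which is where the savings come from.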

### Production Best Practices

1. **Start with rules**: Can you solve it without an LLM?
2. **Measure everything**: Track rule hit rate, LLM usage
3. **Iterate on rules**: As patterns emerge, add more rules
4. **Right-size models**: Don't use gpt-4o for simple tasks
5. **Cache when possible**: Identical queries = cached responses
6. **Set limits**: max_iterations, timeouts, fallbacks
7. **Monitor costs**: Alert on unexpected LLM usage spikes
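Practices 2, 5, and 7 (measure, cache, monitor) can be combined in a small helper. This is a minimal sketch: `UsageTracker`, its 40% alert threshold, and the stub `rules` and `llm` functions are all hypothetical, and the $0.0002 price is the figure assumed throughout this lesson.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class UsageTracker:
    cost_per_llm_call: float = 0.0002  # assumed price per LLM call
    max_llm_share: float = 0.40        # hypothetical alert threshold
    rule_hits: int = 0
    llm_calls: int = 0
    cache_hits: int = 0
    _cache: dict = field(default_factory=dict, repr=False)

    def handle(self, query: str,
               rules: Callable[[str], Optional[str]],
               llm: Callable[[str], str]) -> str:
        key = query.strip().lower()  # normalize so identical queries share a cache entry
        if key in self._cache:
            self.cache_hits += 1
            return self._cache[key]
        answer = rules(query)
        if answer is not None:
            self.rule_hits += 1
        else:
            answer = llm(query)      # only reached when no rule matched
            self.llm_calls += 1
        self._cache[key] = answer
        return answer

    @property
    def llm_share(self) -> float:
        total = self.rule_hits + self.llm_calls + self.cache_hits
        return self.llm_calls / total if total else 0.0

    @property
    def llm_cost(self) -> float:
        return self.llm_calls * self.cost_per_llm_call

    def should_alert(self) -> bool:
        return self.llm_share > self.max_llm_share


def rules(query: str) -> Optional[str]:
    return "Reset link sent" if "password" in query.lower() else None


def llm(query: str) -> str:
    return "Escalated to a specialist"  # stub standing in for a real LLM call


tracker = UsageTracker()
for q in ("Reset my password", "reset my password ", "Printer prints blank pages"):
    tracker.handle(q, rules, llm)
print(tracker.rule_hits, tracker.cache_hits, tracker.llm_calls)
print(f"LLM share: {tracker.llm_share:.0%}, cost: ${tracker.llm_cost:.4f}, alert: {tracker.should_alert()}")
```

In production you would likely export these counters to your monitoring system rather than print them.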

---

## 🎯 Part 8: Key Takeaways

Congratulations! You've learned advanced production patterns!

### What You Learned Today ✅

1. **Rule-Based Agents**
   - Subclass BaseAgent for custom logic
   - $0 cost, <10ms speed
   - Perfect for deterministic decisions
   - 100% consistent and debuggable

2. **Hybrid Architectures**
   - Combine rules + LLMs intelligently
   - 70-90% cost reduction possible
   - Rules handle simple cases; LLMs handle complex ones
   - Production best practice, not theory

3. **LoopAgent Pattern**
   - Iterative workflows with max_iterations
   - Useful for troubleshooting, refinement
   - Built-in safety mechanisms
   - Stateful iteration support

4. **Design Principles**
   - "Agent" does not always mean "LLM"
   - Right tool for the job
   - Cost-awareness in design
   - Measure and optimize

### Production Architecture Example

```
                    User Request
                         │
                         ▼
                ┌─────────────────┐
                │  Rule-Based     │  70% handled
                │  Triage         │  Cost: $0
                └────────┬────────┘
                         │
                ┌────────┴────────┐
                │                 │
           Clear match       No match
                │                 │
                ▼                 ▼
         Route to Team    LLM Triage  30% handled
         Cost: $0         Cost: $0.0002
                                  │
                                  ▼
                          Specialist Agent
                          Cost: $0.0002-0.0010

Result: 70% savings, same quality
```

### Real-World Impact

**Startup (1,000 daily tickets):**
- All-LLM: $73/year
- Hybrid: $22/year
- Savings: $51/year

**Scale-up (10,000 daily tickets):**
- All-LLM: $730/year
- Hybrid: $219/year
- Savings: $511/year

**Enterprise (100,000 daily tickets):**
- All-LLM: $7,300/year
- Hybrid: $2,190/year
- Savings: $5,110/year

**At scale, architecture matters!**
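The figures above follow from simple arithmetic, sketched below. The calculation assumes $0.0002 per LLM call, exactly one call per LLM-handled ticket, and a 70% rule hit rate. Multi-step specialist calls, as in the architecture diagram, would raise both totals.

```python
COST_PER_LLM_CALL = 0.0002  # dollars per request, the lesson's assumed price
RULE_HIT_RATE = 0.70        # share of tickets resolved by rules alone
DAYS_PER_YEAR = 365


def annual_costs(daily_tickets: int) -> tuple[float, float, float]:
    """Return (all-LLM cost, hybrid cost, savings) in dollars per year."""
    all_llm = daily_tickets * COST_PER_LLM_CALL * DAYS_PER_YEAR
    hybrid = all_llm * (1 - RULE_HIT_RATE)  # only the remaining 30% reach an LLM
    return all_llm, hybrid, all_llm - hybrid


for label, daily in (("Startup", 1_000), ("Scale-up", 10_000), ("Enterprise", 100_000)):
    all_llm, hybrid, savings = annual_costs(daily)
    print(f"{label}: all-LLM ${all_llm:,.0f}, hybrid ${hybrid:,.0f}, savings ${savings:,.0f}")
```

Try changing `RULE_HIT_RATE` to see how sensitive the savings are to your rule coverage.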

### Complete ADK Pattern Library 🎨

You now know:
1. **Basic agents** (Lesson 1): LLM agents with personality
2. **Tools** (Lesson 2): Function calling
3. **Hierarchical routing** (Lesson 3): Coordinator + specialists
4. **Sequential workflows** (Lesson 4): Pipeline processing
5. **Parallel execution** (Lesson 4): Concurrent operations
6. **Rule-based agents** (Lesson 5): Zero-cost deterministic logic
7. **Hybrid systems** (Lesson 5): Cost-optimized architectures
8. **Loop patterns** (Lesson 5): Iterative workflows

### Next Steps 🚀

You're ready to build production systems! Consider:
- Persistent state (Firestore)
- Error handling and retries
- Monitoring and observability
- MCP integration
- Production deployment

### Resources 📚

- [ADK Custom Agents](https://google.github.io/adk-docs/agents/custom-agents/)
- [ADK LoopAgent](https://google.github.io/adk-docs/agents/workflow-agents/loop-agents/)
- [ADK Documentation](https://google.github.io/adk-docs/)

---

### 🎓 Final Challenge

Build a complete production IT support system that:
1. Uses rule-based triage for 70%+ of tickets
2. Falls back to LLM for complex cases
3. Routes to appropriate specialist agents
4. Implements retry logic for failed actions
5. Tracks and reports cost metrics

**You now have all the tools to build cost-effective, production-grade AI systems! 🎉**

---