# Testing ADK Agents: From Simple LLMs to Agentic Behavior

**Course:** LLM and Agent Testing - Lesson 2  
**Domain:** IT Support/Services Company  
**Environment:** Google Colab

---

## 📚 Learning Objectives

By the end of this notebook, you will be able to:

1. Build testable agents using Google's ADK (Agent Development Kit)
2. Test agent tool selection (which tools the agent chooses to call)
3. Test tool parameters (accuracy of extracted information)
4. Test multi-step reasoning (tool call sequences)
5. Test edge cases (invalid inputs, ambiguous requests)

---

## 🎯 Context: Why Agent Testing is Different

### Quick Recap: Notebook 01

In the previous lesson, you learned to:
- ✅ Write automated tests with pytest
- ✅ Test LLM text outputs for factual correctness
- ✅ Validate structured outputs with Pydantic
- ✅ Use parameterized tests

### What's Different with Agents?

**Simple LLM (Notebook 01):**
```
User: "What port does SSH use?"
LLM: "Port 22"
Test: Check if response contains "22" ✅
```

**Agent with Tools (This Notebook):**
```
User: "What's the status of ticket #5678?"
Agent: Thinks... I need to look up this ticket
       Calls: lookup_ticket("5678")
       Tool returns: {ticket_id: "5678", status: "In Progress", ...}
       Agent: "Ticket #5678 is currently In Progress..."
       
Test: Did agent call the right tool? ✅
Test: Did agent extract correct ticket ID? ✅
Test: Did agent use the tool results properly? ✅
```

### Key Differences

| Simple LLM Testing | Agent Testing |
|-------------------|---------------|
| Test final text output | Test tool selection & parameters |
| Single-step response | Multi-step reasoning |
| Stateless | Stateful (tool results affect next steps) |
| Assert on a single output | Assert on tool call sequences |

### What We're Building Today

An **IT Support Agent** with these tools:
- 🎫 `lookup_ticket(ticket_id)` - Retrieve ticket details
- 📚 `search_knowledge_base(query)` - Find help articles
- 🔍 `check_system_status(service_name)` - Check if systems are up

**Testing scenarios:**
- User asks about a ticket → Agent calls `lookup_ticket` with correct ID
- User has a problem → Agent searches KB for solution
- User asks multi-step question → Agent calls multiple tools in order

---

Let's get started! 🚀

## 1. Environment Setup

First, we'll install the required packages.

In [None]:
# Install required packages
!pip install -q google-adk litellm openai python-dotenv nest-asyncio deprecated google-genai pydantic

In [None]:
# Import required libraries
import os
from openai import OpenAI
from pydantic import BaseModel, Field
import json
import asyncio
from typing import List, Optional, Dict, Any
import nest_asyncio

# Enable nested event loops (required for Colab)
nest_asyncio.apply()

# Core ADK imports
from google.adk.agents import LlmAgent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.adk.models.lite_llm import LiteLlm
from google.genai import types

print("✅ All imports successful!")

### API Key Setup

In [None]:
# Configure OpenAI API key
try:
    from google.colab import userdata
    OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
    print("✅ API key loaded from Colab secrets")
except Exception:
    # Not running in Colab, or the secret is missing or inaccessible
    from getpass import getpass
    print("💡 To use Colab secrets: Go to 🔑 (left sidebar) → Add new secret → Name: OPENAI_API_KEY")
    OPENAI_API_KEY = getpass("Enter your OpenAI API Key: ")

# Validate before exporting: os.environ rejects None values
if not OPENAI_API_KEY or OPENAI_API_KEY.strip() == "":
    raise ValueError("❌ ERROR: No API key provided!")

os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

print("✅ Authentication configured!")

# Model configuration
OPENAI_MODEL = "gpt-5-nano"

# Initialize OpenAI client
client = OpenAI(api_key=OPENAI_API_KEY)

## 2. Creating a Testable IT Support Agent

Let's build a simple IT support agent with three tools. These tools will return mock data for fast testing.

### Understanding ADK Tool Structure

An ADK tool consists of:
1. **Function** - The actual code that runs
2. **Tool definition** - Describes the function to the agent
3. **Registration** - Adding the tool to the agent
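
To make the "tool definition" step more concrete, the sketch below uses only the standard library to show what a framework can read from a plain Python function: its name, the first line of its docstring, and its type-hinted parameters. This is an illustration, not ADK's actual implementation; `describe_tool`, `demo_lookup`, and the dictionary layout are hypothetical names chosen for this example.

```python
import inspect


def describe_tool(func) -> dict:
    """Build a simple description of a function from its signature and docstring.

    Hypothetical helper for illustration only; not part of ADK.
    """
    parameters = {}
    for name, param in inspect.signature(func).parameters.items():
        annotation = param.annotation
        if annotation is inspect.Parameter.empty:
            # No type hint was provided for this parameter
            parameters[name] = "any"
        else:
            parameters[name] = getattr(annotation, "__name__", str(annotation))

    doc = inspect.getdoc(func) or ""
    return {
        "name": func.__name__,
        "description": doc.splitlines()[0] if doc else "",
        "parameters": parameters,
    }


def demo_lookup(ticket_id: str) -> dict:
    """Look up details for a support ticket."""
    return {"ticket_id": ticket_id}


tool_description = describe_tool(demo_lookup)
print(tool_description)
```

The takeaway is that clear docstrings and type hints are part of the tool's interface: they are likely the main information the model has when deciding whether and how to call a tool.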

In [None]:
# Tool 1: Lookup Ticket
def lookup_ticket(ticket_id: str) -> dict:
    """
    Look up details for a support ticket.
    
    Args:
        ticket_id: The ticket ID to look up (e.g., "5678")
    
    Returns:
        Dictionary with ticket details
    """
    mock_tickets = {
        "5678": {
            "ticket_id": "5678",
            "status": "In Progress",
            "priority": "High",
            "user": "Alice Johnson",
            "issue": "Cannot access email",
            "assigned_to": "Tech Support Team"
        },
        "1234": {
            "ticket_id": "1234",
            "status": "Resolved",
            "priority": "Medium",
            "user": "Bob Smith",
            "issue": "Printer not working",
            "assigned_to": "Hardware Team"
        },
        "9999": {
            "ticket_id": "9999",
            "status": "Open",
            "priority": "Critical",
            "user": "Charlie Brown",
            "issue": "Server down",
            "assigned_to": "Infrastructure Team"
        }
    }
    
    if ticket_id in mock_tickets:
        return mock_tickets[ticket_id]
    else:
        return {"error": f"Ticket {ticket_id} not found"}


# Tool 2: Search Knowledge Base
def search_knowledge_base(query: str) -> dict:
    """
    Search the IT knowledge base for help articles.
    
    Args:
        query: Search query (e.g., "how to reset password")
    
    Returns:
        Dictionary with search results
    """
    mock_kb = {
        "password": [
            {"title": "How to Reset Your Password", "article_id": "KB001"},
            {"title": "Password Requirements", "article_id": "KB002"}
        ],
        "email": [
            {"title": "Troubleshooting Email Access", "article_id": "KB010"},
            {"title": "Email Configuration Guide", "article_id": "KB011"}
        ],
        "vpn": [
            {"title": "VPN Setup Instructions", "article_id": "KB020"},
            {"title": "VPN Connection Issues", "article_id": "KB021"}
        ],
        "printer": [
            {"title": "Printer Offline Solutions", "article_id": "KB030"},
            {"title": "How to Install Printer Drivers", "article_id": "KB031"}
        ]
    }
    
    query_lower = query.lower()
    results = []
    
    for keyword, articles in mock_kb.items():
        if keyword in query_lower:
            results.extend(articles)
    
    if results:
        return {"query": query, "results": results, "count": len(results)}
    else:
        return {"query": query, "results": [], "count": 0}


# Tool 3: Check System Status
def check_system_status(service_name: str) -> dict:
    """
    Check the operational status of a service or system.
    
    Args:
        service_name: Name of the service (e.g., "email", "vpn", "database")
    
    Returns:
        Dictionary with service status
    """
    mock_status = {
        "email": {"service": "email", "status": "operational", "uptime": "99.9%"},
        "vpn": {"service": "vpn", "status": "operational", "uptime": "100%"},
        "database": {"service": "database", "status": "degraded", "uptime": "95.2%"},
        "file_server": {"service": "file_server", "status": "down", "uptime": "0%"},
        "web_portal": {"service": "web_portal", "status": "operational", "uptime": "99.5%"}
    }
    
    service_lower = service_name.lower().replace(" ", "_")
    
    if service_lower in mock_status:
        return mock_status[service_lower]
    else:
        return {"service": service_name, "status": "unknown"}


print("✅ Tools defined successfully!")

### Creating the ADK Agent

Now we'll create an ADK agent and register our tools.

**Note:** In ADK, you can pass Python functions directly as tools. ADK describes each tool to the model using the function's name, docstring, and type hints, so keep those accurate!

In [None]:
# Create the LiteLlm model instance for OpenAI
llm_model = LiteLlm(
    model=f"openai/{OPENAI_MODEL}",
    api_key=OPENAI_API_KEY
)

# Create the IT Support Agent
it_support_agent = LlmAgent(
    name="it_support_agent",
    model=llm_model,
    description="An IT support agent that helps users with tickets, knowledge base searches, and system status checks",
    instruction="""You are an IT support agent. Help users with their IT issues by:
    1. Looking up ticket information when asked about specific tickets
    2. Searching the knowledge base for solutions to problems
    3. Checking system status when users report service issues
    
    Always use the appropriate tools to get accurate information.""",
    tools=[lookup_ticket, search_knowledge_base, check_system_status]
)

print("✅ IT Support Agent created!")
print(f"Agent has {len(it_support_agent.tools)} tools available")

## 3. Helper Function for Testing

We need a simple way to run the agent and capture tool calls for testing.

In [None]:
async def run_agent_and_get_tools(user_message: str, session_id: str = "test_session"):
    """
    Run the agent and capture tool calls.
    
    Returns:
        dict: {
            'tool_calls': list of {name, parameters},
            'tool_count': int,
            'response': str
        }
    """
    # Create session
    session_service = InMemorySessionService()
    user_id = "test_user"
    
    await session_service.create_session(
        app_name="it_support_test",
        user_id=user_id,
        session_id=session_id,
        state={}
    )
    
    # Create runner
    runner = Runner(
        app_name="it_support_test",
        agent=it_support_agent,
        session_service=session_service
    )
    
    # Format message and run
    content = types.Content(role='user', parts=[types.Part(text=user_message)])
    events = runner.run_async(user_id=user_id, session_id=session_id, new_message=content)
    
    # Collect tool calls and response
    tool_calls = []
    final_response = ""
    
    async for event in events:
        # ADK events expose requested tool calls via get_function_calls();
        # each call carries a .name and an .args mapping
        for call in event.get_function_calls():
            tool_calls.append({
                'name': call.name,
                'parameters': dict(call.args or {})
            })

        # Guard against final events that carry no text part
        if event.is_final_response() and event.content and event.content.parts:
            final_response = event.content.parts[0].text or ""
    
    return {
        'tool_calls': tool_calls,
        'tool_count': len(tool_calls),
        'response': final_response
    }

print("✅ Helper function defined!")

## 4. Testing Tool Selection

The first thing to test: **Does the agent choose the right tool?**

This is fundamental - if the agent picks the wrong tool, nothing else matters!

### Pattern: Test Tool Selection

```
1. Give agent a task that requires a specific tool
2. Run the agent
3. Assert that the expected tool was called
```
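
The assertion step of this pattern can be wrapped in a small reusable helper. The sketch below assumes the result dictionary has the shape returned by `run_agent_and_get_tools`; `assert_tool_called` is a hypothetical name, and the sample result is hand-written so the example runs without calling an LLM.

```python
def assert_tool_called(result: dict, expected_tool: str) -> None:
    """Raise AssertionError unless expected_tool appears in result['tool_calls']."""
    called = [call["name"] for call in result["tool_calls"]]
    assert expected_tool in called, f"Expected '{expected_tool}', got: {called}"


# Hand-written result in the helper's output shape (no agent run involved)
sample_result = {
    "tool_calls": [{"name": "lookup_ticket", "parameters": {"ticket_id": "5678"}}],
    "tool_count": 1,
    "response": "Ticket 5678 is In Progress.",
}

assert_tool_called(sample_result, "lookup_ticket")
print("lookup_ticket was called")
```

Keeping the failure message in one place means every tool-selection test reports the actual tool names in the same format.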

### Test 1: Agent calls lookup_ticket tool

In [None]:
# Test: When user asks about a ticket, agent should call lookup_ticket
user_message = "Can you check the status of ticket 5678?"
result = await run_agent_and_get_tools(user_message, session_id="test_1")

print(f"User: {user_message}")
print(f"\nTools called: {result['tool_count']}")
print(f"Tool names: {[tc['name'] for tc in result['tool_calls']]}")
print(f"\nAgent response: {result['response'][:200]}...\n")

# Assertions
assert result['tool_count'] > 0, "❌ Agent should have called at least one tool"
tool_names = [tc['name'] for tc in result['tool_calls']]
assert 'lookup_ticket' in tool_names, f"❌ Expected 'lookup_ticket', got: {tool_names}"

print("✅ TEST PASSED: Agent correctly called lookup_ticket")

### Test 2: Agent calls knowledge base search

In [None]:
# Test: When user has a problem, agent should search knowledge base
user_message = "How do I reset my password?"
result = await run_agent_and_get_tools(user_message, session_id="test_2")

print(f"User: {user_message}")
print(f"\nTools called: {result['tool_count']}")
print(f"Tool names: {[tc['name'] for tc in result['tool_calls']]}")
print(f"\nAgent response: {result['response'][:200]}...\n")

# Assertions
tool_names = [tc['name'] for tc in result['tool_calls']]
assert 'search_knowledge_base' in tool_names, f"❌ Expected 'search_knowledge_base', got: {tool_names}"

print("✅ TEST PASSED: Agent correctly called search_knowledge_base")

### Test 3: Agent calls system status check

In [None]:
# Test: When user asks about system status, agent should check status
user_message = "Is the email service working?"
result = await run_agent_and_get_tools(user_message, session_id="test_3")

print(f"User: {user_message}")
print(f"\nTools called: {result['tool_count']}")
print(f"Tool names: {[tc['name'] for tc in result['tool_calls']]}")
print(f"\nAgent response: {result['response'][:200]}...\n")

# Assertions
tool_names = [tc['name'] for tc in result['tool_calls']]
assert 'check_system_status' in tool_names, f"❌ Expected 'check_system_status', got: {tool_names}"

print("✅ TEST PASSED: Agent correctly called check_system_status")

### Key Insight: Testing Behavior, Not Text

Notice what we're testing:
- ✅ **We test**: Which tool was called
- ❌ **We don't test**: The exact text of the response

Why? Because:
1. Tool calls are **structured** (a tool name plus arguments), so they can be compared exactly and tend to vary less between runs than wording does
2. Text responses are **variable** (natural language)
3. Tool calls prove the agent **understood** the task
4. Tool calls are what **actually matter** for functionality
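
In practice, it can still be worth checking that a tool-selection test passes consistently across repeated runs rather than relying on a single run. The sketch below shows one possible approach: repeat an async check several times and compute the pass rate. `pass_rate` and `fake_check` are hypothetical names, and `fake_check` is a random stand-in for a real agent call.

```python
import asyncio
import random


async def pass_rate(check, runs: int = 5) -> float:
    """Await check() `runs` times and return the fraction of runs that returned True."""
    passed = 0
    for _ in range(runs):
        if await check():
            passed += 1
    return passed / runs


async def fake_check() -> bool:
    """Stand-in for a real agent check; passes roughly 90% of the time."""
    return random.random() < 0.9


rate = asyncio.run(pass_rate(fake_check, runs=20))
print(f"Pass rate: {rate:.0%}")
```

With the real agent, `check` could call `run_agent_and_get_tools` and return whether `lookup_ticket` appears among the tool names. Inside the notebook you can also write `await pass_rate(...)` directly instead of `asyncio.run(...)`. The acceptable threshold is a judgment call that depends on how critical the behavior is.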

## 5. Testing Tool Parameters

Choosing the right tool is good. But did the agent extract the correct parameters?

**Example:**
- User: "Check ticket 5678"
- Agent calls: `lookup_ticket("5678")` ✅
- Agent calls: `lookup_ticket("1234")` ❌ Wrong ticket!

### Pattern: Test Parameter Extraction

```
1. Give agent a task with specific information
2. Run the agent
3. Assert the tool was called with correct parameters
```
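
One practical detail: a model may pass the same value in slightly different forms, for example `5678`, `"5678"`, or `"#5678"`. The sketch below shows one way to pull out a tool's arguments and normalize them before comparing. Both helper names are hypothetical, and whether to tolerate such variations at all is an assumption about your requirements.

```python
def get_tool_args(result: dict, tool_name: str) -> list:
    """Return the parameter dicts of every call to tool_name, in call order."""
    return [
        call["parameters"]
        for call in result["tool_calls"]
        if call["name"] == tool_name
    ]


def normalize_ticket_id(value) -> str:
    """Reduce 5678, '5678', or ' #5678 ' to the canonical string '5678'."""
    return str(value).strip().lstrip("#")


# Hand-written result in the helper's output shape (no agent run involved)
sample_result = {
    "tool_calls": [{"name": "lookup_ticket", "parameters": {"ticket_id": "#5678"}}],
    "tool_count": 1,
    "response": "",
}

args = get_tool_args(sample_result, "lookup_ticket")
assert len(args) == 1
assert normalize_ticket_id(args[0]["ticket_id"]) == "5678"
```

If the tool itself only accepts the exact string (as the mock `lookup_ticket` does), keep the strict comparison instead, because a normalized test would hide a call that fails in practice.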

### Test 4: Agent extracts ticket ID correctly

In [None]:
# Test: Agent correctly extracts ticket ID from user message
user_message = "What's the status of ticket 5678?"
result = await run_agent_and_get_tools(user_message, session_id="test_4")

print(f"User: {user_message}")
print(f"\nTools called: {[tc['name'] for tc in result['tool_calls']]}")

# Find the lookup_ticket call
ticket_calls = [tc for tc in result['tool_calls'] if tc['name'] == 'lookup_ticket']
assert len(ticket_calls) > 0, "❌ Agent should have called lookup_ticket"

# Check the ticket_id parameter
ticket_id = ticket_calls[0]['parameters'].get('ticket_id')
print(f"Extracted ticket_id: {ticket_id}\n")

assert ticket_id == "5678", f"❌ Expected ticket_id '5678', but got: {ticket_id}"

print("✅ TEST PASSED: Agent correctly extracted ticket ID")

### Test 5: Agent extracts search query correctly

In [None]:
# Test: Agent correctly extracts and formats search query
user_message = "I need help with VPN connection issues"
result = await run_agent_and_get_tools(user_message, session_id="test_5")

print(f"User: {user_message}")
print(f"\nTools called: {[tc['name'] for tc in result['tool_calls']]}")

# Find the KB search call
kb_calls = [tc for tc in result['tool_calls'] if tc['name'] == 'search_knowledge_base']
assert len(kb_calls) > 0, "❌ Agent should have called search_knowledge_base"

# Check that query contains relevant keywords
query = kb_calls[0]['parameters'].get('query', '').lower()
print(f"Search query: {query}\n")

assert 'vpn' in query, f"❌ Query should contain 'vpn', but got: {query}"

print("✅ TEST PASSED: Agent correctly extracted search query")

### Test 6: Agent extracts service name correctly

In [None]:
# Test: Agent correctly identifies service name from user query
user_message = "Is the email service operational right now?"
result = await run_agent_and_get_tools(user_message, session_id="test_6")

print(f"User: {user_message}")
print(f"\nTools called: {[tc['name'] for tc in result['tool_calls']]}")

# Find the status check call
status_calls = [tc for tc in result['tool_calls'] if tc['name'] == 'check_system_status']
assert len(status_calls) > 0, "❌ Agent should have called check_system_status"

# Check the service name
service = status_calls[0]['parameters'].get('service_name', '').lower()
print(f"Service name: {service}\n")

assert 'email' in service, f"❌ Service name should contain 'email', but got: {service}"

print("✅ TEST PASSED: Agent correctly extracted service name")

### Why Parameter Testing Matters

Imagine these scenarios:

**Scenario 1: Correct Parameters** ✅
```
User: "Check ticket 5678"
Agent: lookup_ticket("5678") → Returns correct ticket
User: Happy! Gets the right information
```

**Scenario 2: Wrong Parameters** ❌
```
User: "Check ticket 5678"
Agent: lookup_ticket("1234") → Returns wrong ticket
User: Confused! Gets incorrect information
```

**Testing parameters ensures data integrity!**

## 6. Testing Multi-Step Reasoning

Real-world agent tasks often require multiple steps:

**Example:**
```
User: "Check ticket 5678 and find solutions for the issue"

Step 1: Agent calls lookup_ticket("5678")
        Returns: {issue: "Cannot access email"}
        
Step 2: Agent calls search_knowledge_base("email access")
        Returns: [KB articles about email]
        
Step 3: Agent synthesizes information and responds
```

### Pattern: Test Tool Call Sequences

```
1. Give agent a multi-step task
2. Run the agent
3. Assert on tool count
4. Assert on tool call order
5. Assert on parameter correctness across calls
```
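
For the ordering step, one option is to check that the expected tools appear in the right relative order while tolerating extra calls in between. The helper below is a minimal sketch of that idea; `calls_in_order` is a hypothetical name.

```python
def calls_in_order(tool_sequence: list, expected_order: list) -> bool:
    """Return True if expected_order appears within tool_sequence in that order.

    Other calls may appear in between; only the relative order is checked.
    """
    remaining = iter(tool_sequence)
    # Each `in` test consumes the iterator up to the match, so later names
    # must be found after earlier ones.
    return all(name in remaining for name in expected_order)


observed = ["lookup_ticket", "check_system_status", "search_knowledge_base"]

assert calls_in_order(observed, ["lookup_ticket", "search_knowledge_base"])
assert not calls_in_order(observed, ["search_knowledge_base", "lookup_ticket"])
```

This is more tolerant than comparing the full list with `==`, which would fail whenever the agent adds a harmless extra call.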

### Test 7: Multi-step reasoning (ticket then KB)

In [None]:
# Test: Agent performs multi-step reasoning
# 1. Looks up ticket to understand the issue
# 2. Searches KB for solutions
user_message = "Check ticket 5678 and help me find solutions for the issue"
result = await run_agent_and_get_tools(user_message, session_id="test_7")

print(f"User: {user_message}")
print(f"\nTools called: {result['tool_count']}")
tool_sequence = [tc['name'] for tc in result['tool_calls']]
print(f"Tool sequence: {tool_sequence}\n")

# Assert multiple tools were called
assert result['tool_count'] >= 2, f"❌ Expected at least 2 tool calls, but got: {result['tool_count']}"

# Assert that both tools were called
assert 'lookup_ticket' in tool_sequence, "❌ Agent should look up the ticket"
assert 'search_knowledge_base' in tool_sequence, "❌ Agent should search KB for solutions"

# Assert correct order: ticket lookup should come before KB search
ticket_index = tool_sequence.index('lookup_ticket')
kb_index = tool_sequence.index('search_knowledge_base')
assert ticket_index < kb_index, f"❌ Agent should look up ticket before searching KB, but order was: {tool_sequence}"

print("✅ TEST PASSED: Agent correctly performed multi-step reasoning")

### Test 8: Single tool when appropriate

In [None]:
# Test: Agent uses single tool when that's all that's needed
# Not every query requires multiple steps!
user_message = "What's the status of ticket 1234?"
result = await run_agent_and_get_tools(user_message, session_id="test_8")

print(f"User: {user_message}")
print(f"\nTools called: {result['tool_count']}")
print(f"Tool names: {[tc['name'] for tc in result['tool_calls']]}\n")

# For this simple query, should only need lookup_ticket
assert result['tool_count'] == 1, f"❌ Simple ticket lookup should use 1 tool, but used: {result['tool_count']}"
assert result['tool_calls'][0]['name'] == 'lookup_ticket', "❌ Should use lookup_ticket"

print("✅ TEST PASSED: Agent correctly used single tool")

### Understanding Tool Call Order

Why does order matter?

**Good Order:**
```
1. lookup_ticket("5678") → Get issue: "email access"
2. search_knowledge_base("email access") → Find relevant articles
3. Provide informed response
```

**Bad Order:**
```
1. search_knowledge_base("unknown") → Generic results
2. lookup_ticket("5678") → Too late, already gave bad advice
```

**Testing ensures logical reasoning flow!**
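
Order alone does not show that the second call actually used the first call's result. A rough additional check is to confirm that the knowledge base query shares keywords with the issue returned by the ticket lookup. The sketch below is one simple way to do that; `shared_keywords` is a hypothetical helper, the query string is an invented example, and whitespace splitting ignores punctuation handling.

```python
def shared_keywords(text_a: str, text_b: str, min_length: int = 4) -> set:
    """Return lowercase words of at least min_length characters found in both texts."""
    words_a = {w for w in text_a.lower().split() if len(w) >= min_length}
    words_b = {w for w in text_b.lower().split() if len(w) >= min_length}
    return words_a & words_b


ticket_issue = "Cannot access email"        # issue field of mock ticket 5678
kb_query = "email access troubleshooting"   # example of a query an agent might send

overlap = shared_keywords(ticket_issue, kb_query)
assert overlap, f"KB query {kb_query!r} shares no keywords with issue {ticket_issue!r}"
print(sorted(overlap))  # ['access', 'email']
```

In a real test, `ticket_issue` would come from the mock ticket data and `kb_query` from the captured `search_knowledge_base` parameters.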

## 7. Testing Edge Cases

What happens when things go wrong or are unclear?

**Edge cases to test:**
1. ❓ **Ambiguous requests** - "Help me with my problem" (what problem?)
2. ❌ **Invalid data** - "Check ticket XYZ" (invalid ticket ID)
3. 🤷 **No tool needed** - "What is IT support?" (general question)
4. 🔀 **Multiple interpretations** - "Check the email" (ticket or system status?)

### Why Test Edge Cases?

In production:
- Users won't always provide perfect input
- Systems may return errors
- Requests may be vague or ambiguous

**Your agent needs to handle these gracefully!**

### Test 9: General question (no tool needed)

In [None]:
# Test: Agent handles general questions without calling tools
user_message = "What is IT support?"
result = await run_agent_and_get_tools(user_message, session_id="test_9")

print(f"User: {user_message}")
print(f"\nTools called: {result['tool_count']}")
print(f"\nAgent response: {result['response'][:200]}...\n")

# For a general question, agent shouldn't need tools
assert result['tool_count'] == 0, f"❌ General question shouldn't require tools, but {result['tool_count']} tools were called"

# Should still provide a response
assert len(result['response']) > 0, "❌ Agent should provide a response"

print("✅ TEST PASSED: Agent handled general question without tools")

### Test 10: Invalid ticket ID

In [None]:
# Test: Agent attempts to look up non-existent ticket
# The tool will return an error, agent should handle it
user_message = "Check ticket 99999999"
result = await run_agent_and_get_tools(user_message, session_id="test_10")

print(f"User: {user_message}")
print(f"\nTools called: {[tc['name'] for tc in result['tool_calls']]}")
print(f"\nAgent response: {result['response'][:200]}...\n")

# Agent should still try to look up the ticket
tool_names = [tc['name'] for tc in result['tool_calls']]
assert 'lookup_ticket' in tool_names, "❌ Agent should attempt ticket lookup"

# The final response should indicate the ticket wasn't found
response_lower = result['response'].lower()
assert ('not found' in response_lower or 'doesn\'t exist' in response_lower or
        'invalid' in response_lower or 'error' in response_lower), \
       f"❌ Response should indicate ticket not found, but got: {result['response'][:200]}"

print("✅ TEST PASSED: Agent handled invalid ticket ID")

### Test 11: Noisy input

In [None]:
# Test: Agent extracts ticket ID from messy/noisy user input
user_message = "Hey, so like, I was wondering, could you maybe check ticket 5678 for me? Thanks!"
result = await run_agent_and_get_tools(user_message, session_id="test_11")

print(f"User: {user_message}")
print(f"\nTools called: {[tc['name'] for tc in result['tool_calls']]}")

# Agent should extract the ticket ID despite the noise
ticket_calls = [tc for tc in result['tool_calls'] if tc['name'] == 'lookup_ticket']
assert len(ticket_calls) > 0, "❌ Agent should extract ticket ID from noisy input"

ticket_id = ticket_calls[0]['parameters'].get('ticket_id')
print(f"Extracted ticket_id: {ticket_id}\n")

assert ticket_id == "5678", f"❌ Expected ticket '5678', but got: {ticket_id}"

print("✅ TEST PASSED: Agent extracted info from noisy input")

### Edge Case Testing Strategy

When testing edge cases, consider:

1. **Invalid inputs** - What if data is malformed?
2. **Missing information** - What if user doesn't provide required details?
3. **Ambiguity** - What if request could mean multiple things?
4. **Error conditions** - What if tools fail or return errors?
5. **Boundary conditions** - What about extreme or boundary values?

**Good agents degrade gracefully, not catastrophically!**
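
When an edge-case test has to inspect the response text (for example, to confirm the agent reported a missing ticket), it helps to keep the accepted phrasings in one place. The sketch below is one possible approach; the phrase list and the name `indicates_not_found` are assumptions and will likely need tuning for your model.

```python
NOT_FOUND_PHRASES = (
    "not found",
    "doesn't exist",
    "does not exist",
    "couldn't find",
    "could not find",
    "unable to find",
)


def indicates_not_found(response: str) -> bool:
    """Return True if the response contains any known 'not found' phrasing."""
    # Normalize typographic apostrophes so "doesn’t" matches "doesn't"
    text = response.lower().replace("\u2019", "'")
    return any(phrase in text for phrase in NOT_FOUND_PHRASES)


assert indicates_not_found("I couldn\u2019t find ticket 99999999.")
assert not indicates_not_found("Ticket 5678 is In Progress.")
```

Keyword checks like this remain brittle by nature, so treat them as a coarse safety net rather than a full evaluation of response quality.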

## 8. Student Exercises 🎓

Now it's your turn! Apply what you've learned about testing agents.

### Exercise 1: Test Different Ticket IDs

Write tests for ticket IDs: 1234, 9999. Verify the agent:
1. Calls lookup_ticket
2. Extracts the correct ticket ID
3. Provides an appropriate response

In [None]:
# Exercise 1: Your code here
# Test ticket 1234

# Test ticket 9999


### Exercise 2: Test Tool Disambiguation

Write a test where the user says "Check the database". The agent could:
- Call `lookup_ticket` if it interprets the request as being about a ticket
- Call `check_system_status` if it interprets the request as being about the database service

Which one should the agent choose? Test your hypothesis!

In [None]:
# Exercise 2: Your code here


### Exercise 3: Add a New Tool and Test It

Create a new tool `restart_service(service_name: str)` that simulates restarting an IT service.

**Requirements:**
1. Write the tool function
2. Add it to the agent (you'll need to recreate the agent)
3. Write 2 tests:
   - Test that agent calls restart_service when asked to restart
   - Test that agent extracts correct service name

In [None]:
# Exercise 3: Your code here

def restart_service(service_name: str) -> dict:
    """
    TODO: Implement this function
    Should return a dict with restart status
    """
    pass

# TODO: Recreate agent with the new tool
# TODO: Write tests


### Exercise 4: Test Multiple Tickets

Write a test where the user says "Compare tickets 5678 and 1234".

**Assert:**
1. Agent looks up both tickets
2. Both ticket IDs are correctly extracted

In [None]:
# Exercise 4: Your code here


### Exercise 5: Create a Failing Test

Write a test that you EXPECT to fail. Then explain:
1. Why it fails
2. Is it a bug in the agent or a test problem?
3. How would you fix it?

**Example failing scenarios:**
- Agent calls wrong tool
- Agent extracts wrong parameter
- Agent doesn't handle edge case
- Test is too strict

In [None]:
# Exercise 5: Your failing test here


**Your explanation here:**

<!-- 
TODO: Explain:
- What test did you write?
- Why does it fail?
- Is it an agent bug or a test bug?
- How would you fix it?
-->

## 9. Best Practices for Agent Testing

### ✅ DO:

1. **Test behavior, not text** - Focus on tool calls and parameters
2. **Test tool selection first** - Ensure agent picks the right tool
3. **Test parameter accuracy** - Verify extracted information is correct
4. **Test tool call sequences** - Multi-step reasoning matters
5. **Test edge cases** - Invalid input, missing data, ambiguity
6. **Use mock data** - Fast tests with fake/mock tool responses
7. **Test happy path AND failures** - Both success and error cases
8. **Write descriptive test names** - Make it clear what you're testing

### ❌ DON'T:

1. **Don't test only final text** - Tool calls are more important
2. **Don't expect exact tool sequences** - Some variation is OK
3. **Don't skip edge cases** - That's where bugs hide
4. **Don't use real external services** - Slow and unreliable
5. **Don't test too many things in one test** - Keep tests focused

### Testing Hierarchy

**Priority 1: Critical Functionality**
- Does agent call the right tool?
- Does agent extract correct parameters?

**Priority 2: Complex Behavior**
- Multi-step reasoning
- Tool call ordering

**Priority 3: Edge Cases**
- Invalid inputs
- Error handling
- Ambiguous requests

**Priority 4: Output Quality**
- Response helpfulness
- Text clarity
- (Covered in the next lesson: LLM-as-judge!)

### Agent Testing vs LLM Testing

| Aspect | LLM Testing (Notebook 01) | Agent Testing (This Notebook) |
|--------|---------------------------|-------------------------------|
| **What to test** | Text outputs | Tool calls & parameters |
| **Assertions** | String contains, JSON structure | Tool names, parameter values |
| **Complexity** | Single response | Multi-step sequences |
| **Determinism** | Lower (free-form wording varies) | Higher (structured tool calls, though still chosen by the LLM) |
| **Focus** | Content correctness | Behavior correctness |

### Common Pitfalls

**Pitfall 1: Over-testing text**
```python
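# ❌ Brittle: exact match on generated text
assert response == "The ticket status is In Progress"
# ✅ Robust: assert on the names of the tools that were called
assert 'lookup_ticket' in tool_names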
```

**Pitfall 2: Brittle sequence assertions**
```python
# ❌ Too strict: fails if the agent adds a harmless extra call
assert tool_sequence == ['tool1', 'tool2', 'tool3']
# ✅ Flexible: only requires that the needed tools were called
assert 'tool1' in tool_sequence and 'tool2' in tool_sequence
```

**Pitfall 3: Not testing edge cases**
```python
# ❌ Only test: "Check ticket 5678"
# ✅ Also test: "Check ticket XYZ", "Check ticket", "Check tickets 1, 2, 3"
```

## 10. Key Takeaways & Next Steps

### 🎉 What You've Learned

1. **Agents are different** - Test behavior (tool calls) not just text
2. **Tool selection matters** - Right tool = right functionality
3. **Parameters must be accurate** - Wrong parameters = wrong results
4. **Multi-step reasoning is testable** - Assert on sequences and order
5. **Edge cases reveal bugs** - Test invalid, ambiguous, and error cases
6. **ADK makes testing easier** - Structured tool calls are inspectable

### 🚀 What You Can Do Now

- ✅ Build testable agents with ADK
- ✅ Test tool selection and parameters
- ✅ Test multi-step agent reasoning
- ✅ Test edge cases and error handling
- ✅ Distinguish between agent testing and LLM testing

### 📚 Testing Levels Completed

**✅ Lesson 1:** Simple LLM Testing
- Text outputs
- Factual correctness
- Structured output validation

**✅ Lesson 2:** Agent Testing (This Lesson)
- Tool selection
- Parameter extraction
- Multi-step reasoning
- Edge cases

**🔜 Lesson 3:** Advanced Testing (Coming Next)
- LLM-as-judge for subjective quality
- Testing response helpfulness
- Testing conversation flow
- Integration testing

### 💡 Real-World Application

You've learned skills applicable to:
- 🎫 Customer support agents
- 📊 Data analysis agents
- 🔧 DevOps automation agents
- 📝 Content generation agents
- 🔍 Research and retrieval agents

---

**Excellent work! You've completed Lesson 2: Testing ADK Agents** 🎓

You can now test both simple LLMs and complex agents with tools. Next, we'll learn advanced techniques for testing subjective quality using LLM-as-judge!