# Chapter 16: Resource-Aware Agent Patterns

Key Takeaways:
- **Cost Optimization**: Choose the right model for the task to minimize costs while maintaining quality.
- **Intelligent Routing**: Analyze query complexity to route requests to appropriate models.
- **Resource Awareness**: Build agents that understand and optimize their resource consumption.

### Heuristic: *Not every query requires the most powerful model. Route intelligently to balance cost and capability.*

## Setup and Initialization

In [None]:
import os
import sys
import asyncio
import nest_asyncio
from dotenv import load_dotenv

# Allow nested event loops for Jupyter
nest_asyncio.apply()

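# Add scripts directory to path to import custom modules.
# Assumes the notebook runs from a subdirectory of the project root,
# so the parent of the working directory contains scripts/.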
PROJECT_ROOT = os.path.dirname(os.getcwd())
SCRIPTS_DIR = os.path.join(PROJECT_ROOT, "scripts")
sys.path.insert(0, SCRIPTS_DIR)

# Load environment variables
load_dotenv()

print("✅ Configuration Loaded")

## 1. The Cost Optimization Problem

LLMs differ in capability and cost:

- **Gemini Flash**: Fast, cost-effective, excellent for simple queries
- **Gemini Thinking**: Advanced reasoning, higher cost, best for complex problems

The challenge: How do we automatically route queries to the most cost-effective model that can handle them?

## 2. Creating Cost-Optimized Agents

Let's create two agents with different cost/capability profiles:

In [None]:
from google.adk.agents import LlmAgent

# Cost-effective agent for simple queries
flash_agent = LlmAgent(
    name="FlashAgent",
    model="gemini-2.0-flash-exp",
    description="A fast and efficient agent for simple queries.",
    instruction="You are a quick assistant for straightforward questions. Provide concise, accurate answers."
)

# Advanced reasoning agent for complex queries
thinking_agent = LlmAgent(
    name="ThinkingAgent",
    model="gemini-2.0-flash-thinking-exp-01-21",
    description="A highly capable agent for complex queries requiring deep reasoning.",
    instruction="You are an expert assistant for complex problem-solving. Think through problems step-by-step."
)

print(f"✅ Created {flash_agent.name} (model: {flash_agent.model})")
print(f"✅ Created {thinking_agent.name} (model: {thinking_agent.model})")

## 3. Query Complexity Analysis

To route intelligently, we need to analyze query complexity. We'll use a simple heuristic based on query length and structure:

In [None]:
from query_router import QueryComplexityAnalyzer

# Create analyzer with a threshold of 20 words
analyzer = QueryComplexityAnalyzer(word_threshold=20)

# Test with different queries
simple_query = "What is the capital of France?"
complex_query = "Explain the implications of quantum entanglement for secure communication systems, including the technical challenges and potential solutions for implementing quantum key distribution at scale."

print("Simple Query Analysis:")
simple_analysis = analyzer.analyze(simple_query)
print(f"  Query: {simple_query}")
print(f"  Complexity: {simple_analysis['complexity']}")
print(f"  Word Count: {simple_analysis['word_count']}")
print(f"  Recommended Model: {simple_analysis['recommended_model']}")
print()

print("Complex Query Analysis:")
complex_analysis = analyzer.analyze(complex_query)
print(f"  Query: {complex_query[:60]}...")
print(f"  Complexity: {complex_analysis['complexity']}")
print(f"  Word Count: {complex_analysis['word_count']}")
print(f"  Recommended Model: {complex_analysis['recommended_model']}")
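
The `query_router` module is not shown in this notebook, so the following is only a minimal sketch of what a word-count analyzer could look like. `WordCountAnalyzer`, its field names, and the strict "more than `word_threshold` words" rule are assumptions chosen to match the dictionary keys used above; the real `QueryComplexityAnalyzer` may work differently.

```python
from dataclasses import dataclass


@dataclass
class WordCountAnalyzer:
    """Hypothetical stand-in for QueryComplexityAnalyzer (not the real module)."""

    word_threshold: int = 20
    simple_model: str = "gemini-2.0-flash-exp"
    complex_model: str = "gemini-2.0-flash-thinking-exp-01-21"

    def analyze(self, query: str) -> dict:
        # Whitespace tokenization: punctuation stays attached to its word.
        word_count = len(query.split())
        # Assumed rule: strictly more words than the threshold means "complex".
        is_complex = word_count > self.word_threshold
        return {
            "complexity": "complex" if is_complex else "simple",
            "word_count": word_count,
            "recommended_model": self.complex_model if is_complex else self.simple_model,
        }


sketch_analyzer = WordCountAnalyzer(word_threshold=20)
print(sketch_analyzer.analyze("What is the capital of France?"))
```

A pure length heuristic like this one is cheap but blunt: a short query can still demand deep reasoning, and under this sketch any hard question of 20 words or fewer would be labeled simple. That is one reason to treat the threshold as a tunable parameter.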

## 4. The Query Router Pattern

The router analyzes incoming queries and dispatches them to the most appropriate agent:

In [None]:
from query_router import QueryRouter

# Create the router
router = QueryRouter()

print("✅ Query Router initialized")
print(f"   - Flash Agent: {router.flash_agent.name}")
print(f"   - Thinking Agent: {router.thinking_agent.name}")
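
To make the dispatch step concrete, here is a small sketch of a router that returns an `(agent, analysis)` pair, the same shape the demonstration in the next section unpacks. `StubAgent`, `count_words`, and `SketchRouter` are hypothetical names used for illustration; they stand in for `LlmAgent` and the real `QueryRouter`, whose internals are not shown here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class StubAgent:
    """Hypothetical placeholder for an LlmAgent; only the name matters here."""

    name: str


def count_words(query: str) -> dict:
    """Toy analysis: label a query by whitespace word count (assumed threshold: 20)."""
    word_count = len(query.split())
    return {
        "complexity": "complex" if word_count > 20 else "simple",
        "word_count": word_count,
    }


class SketchRouter:
    """Hypothetical router: maps an analysis label to one of two agents."""

    def __init__(self, flash_agent, thinking_agent,
                 analyze: Callable[[str], dict] = count_words):
        self.flash_agent = flash_agent
        self.thinking_agent = thinking_agent
        self.analyze = analyze

    def route(self, query: str):
        analysis = self.analyze(query)
        # Only queries labeled "complex" go to the more expensive agent.
        if analysis["complexity"] == "complex":
            return self.thinking_agent, analysis
        return self.flash_agent, analysis


sketch_router = SketchRouter(StubAgent("FlashAgent"), StubAgent("ThinkingAgent"))
chosen, details = sketch_router.route("Who wrote Romeo and Juliet?")
print(chosen.name, details)
```

Passing the analysis function in as a parameter keeps the routing decision separate from the complexity heuristic, so either can be swapped without touching the other.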

## 5. Demonstration: Routing in Action

Let's see the router in action with different types of queries:

In [None]:
# Test queries of varying complexity
queries = [
    "What is 2 + 2?",
    "Who wrote Romeo and Juliet?",
    "Explain the Byzantine Generals Problem and its relevance to distributed systems and blockchain consensus mechanisms.",
    "Compare and contrast different approaches to handling state management in modern web applications, including the tradeoffs between centralized and decentralized state."
]

print("Routing Demonstration:\n")
for i, query in enumerate(queries, 1):
    print(f"Query {i}: {query[:60]}{'...' if len(query) > 60 else ''}")
    agent, analysis = router.route(query)
    print(f"   → Selected: {agent.name}")
    print(f"   → Reason: {analysis['complexity']} ({analysis['word_count']} words)")
    print()

## 6. Cost Estimation

Let's estimate the cost savings from intelligent routing:

In [None]:
from query_router import estimate_cost, COST_PER_1K_TOKENS

# Simulated token counts for our queries
# (In production, you'd track actual token usage)
simple_query_tokens = 100  # input + output
complex_query_tokens = 500

print("Cost Comparison:\n")

print(f"Simple Query ({simple_query_tokens} simulated tokens):")
flash_cost = estimate_cost('gemini-2.0-flash-exp', simple_query_tokens, 0)
thinking_cost = estimate_cost('gemini-2.0-flash-thinking-exp-01-21', simple_query_tokens, 0)
print(f"  Flash Model: ${flash_cost:.6f}")
print(f"  Thinking Model: ${thinking_cost:.6f}")
# Guard against a zero price (e.g. a free experimental tier) before dividing
if thinking_cost > 0:
    print(f"  Savings: {((thinking_cost - flash_cost) / thinking_cost * 100):.1f}%")
print()

print(f"Complex Query ({complex_query_tokens} simulated tokens):")
flash_cost_complex = estimate_cost('gemini-2.0-flash-exp', complex_query_tokens, 0)
thinking_cost_complex = estimate_cost('gemini-2.0-flash-thinking-exp-01-21', complex_query_tokens, 0)
print(f"  Flash Model: ${flash_cost_complex:.6f}")
print(f"  Thinking Model: ${thinking_cost_complex:.6f}")
print("  Note: Thinking model justified for complex reasoning")
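
The per-query comparison above extends naturally to a whole workload. The sketch below is an illustration under stated assumptions: `SKETCH_COST_PER_1K_TOKENS` holds placeholder prices (not real Gemini pricing), and `sketch_estimate_cost` and `savings_fraction` are hypothetical helpers, not the functions exported by `query_router`.

```python
# Placeholder prices in USD per 1,000 tokens. These are NOT real Gemini prices.
SKETCH_COST_PER_1K_TOKENS = {
    "flash": {"input": 0.0001, "output": 0.0004},
    "thinking": {"input": 0.001, "output": 0.004},
}


def sketch_estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call: tokens times the per-1K rate, input and output priced separately."""
    rates = SKETCH_COST_PER_1K_TOKENS[model]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1000


def savings_fraction(workload: list[tuple[int, int, bool]]) -> float:
    """Fraction saved by routing, versus sending every query to the 'thinking' tier.

    Each workload item is (input_tokens, output_tokens, is_complex).
    """
    baseline = sum(sketch_estimate_cost("thinking", i, o) for i, o, _ in workload)
    routed = sum(
        sketch_estimate_cost("thinking" if is_complex else "flash", i, o)
        for i, o, is_complex in workload
    )
    # An empty or zero-cost workload has nothing to save.
    return 0.0 if baseline == 0 else (baseline - routed) / baseline


# Hypothetical traffic mix: three simple queries and one complex query.
sketch_workload = [(80, 20, False)] * 3 + [(300, 200, True)]
print(f"Estimated savings: {savings_fraction(sketch_workload):.1%}")
```

The savings depend entirely on the traffic mix and the price gap between tiers, so measured token counts and current published prices should replace the placeholders before drawing conclusions.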

## 7. Processing Queries End-to-End

Let's process a query through the entire pipeline (analysis → routing → execution):

In [None]:
# Process a simple query
test_query = "What is the speed of light?"

print(f"Processing query: {test_query}\n")
result = await router.process_query(test_query)

print(f"Query: {result['query']}")
print(f"Complexity: {result['analysis']['complexity']}")
print(f"Agent Used: {result['agent_used']}")
response = result['response']
# Truncate long responses to 200 characters for display
print(f"Response: {response[:200]}{'...' if len(response) > 200 else ''}")
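
The three pipeline stages can be sketched as one coroutine. This is a simplified illustration, not the real `process_query`: `sketch_process_query`, `stub_route`, and `stub_run_agent` are hypothetical, and the stub returns a canned string instead of calling a model. The returned dictionary mirrors the keys read from `result` above.

```python
import asyncio
from typing import Awaitable, Callable


async def sketch_process_query(
    query: str,
    route: Callable[[str], tuple[str, dict]],
    run_agent: Callable[[str, str], Awaitable[str]],
) -> dict:
    """Analyze and route the query, then await the selected agent's answer."""
    agent_name, analysis = route(query)            # analysis + routing
    response = await run_agent(agent_name, query)  # execution
    return {
        "query": query,
        "analysis": analysis,
        "agent_used": agent_name,
        "response": response,
    }


def stub_route(query: str) -> tuple[str, dict]:
    """Toy routing rule: more than 20 words goes to the thinking agent."""
    word_count = len(query.split())
    is_complex = word_count > 20
    analysis = {"complexity": "complex" if is_complex else "simple", "word_count": word_count}
    return ("ThinkingAgent" if is_complex else "FlashAgent"), analysis


async def stub_run_agent(agent_name: str, query: str) -> str:
    """Stand-in for a model call; yields control once, then returns a canned reply."""
    await asyncio.sleep(0)
    return f"[{agent_name}] stub answer to: {query}"


# In a script, asyncio.run drives the coroutine; in a notebook cell you can
# await sketch_process_query(...) directly instead.
sketch_result = asyncio.run(
    sketch_process_query("What is the speed of light?", stub_route, stub_run_agent)
)
print(sketch_result["agent_used"], "->", sketch_result["response"])
```

Injecting `route` and `run_agent` as parameters keeps the pipeline testable without network access or API keys.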

## Conclusion

This notebook demonstrates resource-aware agent patterns that optimize for cost and efficiency:

1. **Cost-Aware Design**: Separate agents for different capability/cost profiles
2. **Intelligent Routing**: Automatic dispatch based on query complexity analysis
3. **Optimization**: Send each query to the cheapest model that can handle it

In production systems, you can extend this pattern with:
- More sophisticated complexity analysis (sentiment, domain, intent)
- Dynamic model selection based on availability and pricing
- Performance monitoring and adaptive thresholds
- Multi-tier routing with fallback strategies
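
As one illustration of the last extension, multi-tier routing with fallback can be sketched as a loop that tries the cheapest tier first and escalates when a call fails or the answer is rejected. Everything here is hypothetical: `run_with_fallback`, the stub tiers, and the non-empty-answer check are assumptions for the sake of the example, and a real acceptance check would need to be considerably more careful.

```python
from typing import Callable, Sequence

# (tier name, function that answers a query); both are illustrative placeholders.
Tier = tuple[str, Callable[[str], str]]


def run_with_fallback(
    query: str,
    tiers: Sequence[Tier],
    is_acceptable: Callable[[str], bool],
) -> tuple[str, str]:
    """Try tiers in order (cheapest first); escalate on an error or a rejected answer."""
    last_error = None
    for name, call_model in tiers:
        try:
            answer = call_model(query)
        except Exception as exc:  # e.g. quota exhausted or model unavailable
            last_error = exc
            continue
        if is_acceptable(answer):
            return name, answer
    raise RuntimeError("No tier produced an acceptable answer") from last_error


def cheap_stub(query: str) -> str:
    """Simulates a cheap model that fails to produce a usable answer."""
    return ""


def strong_stub(query: str) -> str:
    """Simulates a stronger model that answers the query."""
    return f"Detailed answer to: {query}"


tier_used, fallback_answer = run_with_fallback(
    "Explain Paxos.",
    tiers=[("flash", cheap_stub), ("thinking", strong_stub)],
    is_acceptable=lambda text: bool(text.strip()),
)
print(tier_used, "->", fallback_answer)
```

Escalation trades latency for cost: a rejected cheap answer means paying for two calls, so this pattern only pays off when most queries succeed on the first tier.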