# Lesson 10: Production-Ready Agents

This interactive notebook teaches you how to build production-ready agents:

- ✅ Guardrails for content filtering and safety
- ✅ PII redaction with hooks for privacy protection
- ✅ OpenTelemetry for metrics and traces
- ✅ Agent performance monitoring and evaluation
- ✅ Production deployment best practices

**Estimated time:** 6-7 hours

**What you'll build:** Production-ready agents with safety, observability, and quality assurance!

## Setup

Import necessary modules and configure the environment:

In [None]:
from lesson_utils import load_environment, create_working_model, check_api_keys
from strands import Agent
from strands.hooks import HookProvider, HookRegistry, MessageAddedEvent, AfterInvocationEvent

# Load environment and check API keys
load_environment()
check_api_keys()

print("🎯 Lesson 10: Production-Ready Agents")
print("=" * 50)

## Part 1: Safety & Security - Guardrails and PII Redaction

Production agents require safety mechanisms:

### Guardrails
- **Content filtering** - Block harmful content
- **Topic boundaries** - Enforce allowed domains
- **PII protection** - Detect and redact sensitive data
- **Quality enforcement** - Maintain response standards

### Implementation Patterns:
1. **Bedrock Guardrails** - Built-in AWS protection
2. **Notify-Only Mode** - Shadow testing with hooks
3. **PII Redaction** - Third-party libraries (LLM Guard, Presidio)

**Reference:** [Strands Guardrails](https://strandsagents.com/latest/documentation/docs/user-guide/safety-security/guardrails/)
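
The PII redaction pattern can be illustrated without any third-party dependency. The sketch below is a deliberately small stand-in for LLM Guard or Presidio, not a replacement: `PII_PATTERNS` and `redact_pii` are hypothetical names, and the two regexes cover only email addresses and US-style SSNs. In a hook, a function like this would be applied to message text before it is logged or stored.

```python
import re

# Hypothetical patterns for illustration only; real PII detection needs a
# dedicated library and far broader coverage than two regexes.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact_pii(text: str) -> str:
    """Replace each match with a placeholder such as <EMAIL>."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


print(redact_pii("Contact jane.doe@example.com, SSN 123-45-6789."))
# Contact <EMAIL>, SSN <US_SSN>.
```

Typed placeholders (rather than a blanket `***`) keep redacted logs readable, because you can still see what kind of value was removed.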

In [None]:
model = create_working_model()

if model:
    print("\n📋 Implementing Notify-Only Guardrails with Hooks...\n")

    class NotifyOnlyGuardrailsHook(HookProvider):
        """Hook-based guardrails that notify without blocking."""
        def __init__(self):
            self.blocked_patterns = ["sensitive", "confidential", "secret", "password"]

        def register_hooks(self, registry: HookRegistry) -> None:
            registry.add_callback(MessageAddedEvent, self.check_user_input)
            registry.add_callback(AfterInvocationEvent, self.check_assistant_response)

        def evaluate_content(self, content: str, source: str = "INPUT"):
            content_lower = content.lower()
            violations = [p for p in self.blocked_patterns if p in content_lower]
            if violations:
                print(f"\n[GUARDRAIL] WOULD BLOCK - {source}: {content[:100]}...")
                print(f"[GUARDRAIL] Violations: {', '.join(violations)}\n")

        def check_user_input(self, event: MessageAddedEvent) -> None:
            if event.message.get("role") == "user":
                content = "".join(block.get("text", "") for block in event.message.get("content", []))
                if content:
                    self.evaluate_content(content, "INPUT")

        def check_assistant_response(self, event: AfterInvocationEvent) -> None:
            if event.agent.messages and event.agent.messages[-1].get("role") == "assistant":
                content = "".join(block.get("text", "") for block in event.agent.messages[-1].get("content", []))
                if content:
                    self.evaluate_content(content, "OUTPUT")

    # Create agent with guardrail monitoring
    agent = Agent(
        model=model,
        system_prompt="You are a helpful assistant.",
        hooks=[NotifyOnlyGuardrailsHook()]
    )

    print("✓ Agent created with notify-only guardrails\n")
    
    # Test with safe and potentially sensitive content
    response = agent("What is machine learning? Answer in 2 sentences.")
    print(f"Response: {response}\n")
    
    response = agent("Tell me about sensitive data handling in 2 sentences")
    print(f"Response: {str(response)[:150]}...")

    print("\n💡 Key Takeaway: Hooks enable flexible safety monitoring without blocking content")
else:
    print("⚠️ No API key available")
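
The `evaluate_content` method above uses a plain substring check, so a word like "secretary" would be flagged because it contains "secret". One possible refinement, sketched here with a hypothetical `find_violations` helper that is not part of Strands, is to match blocked terms as whole words only.

```python
import re

BLOCKED_PATTERNS = ["sensitive", "confidential", "secret", "password"]

# One compiled regex that matches any blocked term as a whole word
_BLOCKED_RE = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in BLOCKED_PATTERNS) + r")\b",
    re.IGNORECASE,
)


def find_violations(content: str) -> list[str]:
    """Return the distinct blocked terms found, lowercased and sorted."""
    return sorted({m.group(1).lower() for m in _BLOCKED_RE.finditer(content)})


print(find_violations("The secretary reset my Password."))    # ['password']
print(find_violations("Keep this SECRET and confidential."))  # ['confidential', 'secret']
```

Whole-word matching trades false positives for false negatives: "passwords" would no longer match, so add plural forms to the list if you need them.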

## Part 2: Observability - Metrics and Traces

**Observability** enables monitoring agent behavior and performance.

### Key Metrics:
- **Token Usage** - Input, output, total tokens (cost optimization)
- **Performance** - Latency, cycle count, execution time
- **Tool Usage** - Call counts, success rates, execution times
- **Traces** - Complete execution flow with OpenTelemetry

### Built-in Metrics:
Every agent invocation returns an `AgentResult` whose `metrics` attribute carries token usage, latency, and cycle data.

**Reference:** [Strands Observability](https://strandsagents.com/latest/documentation/docs/user-guide/observability-evaluation/observability/)

In [None]:
model = create_working_model()

if model:
    print("\n📊 Capturing and Analyzing Agent Metrics...\n")
    
    agent = Agent(
        model=model,
        system_prompt="You are a helpful assistant that provides concise responses."
    )

    # Invoke agent and capture metrics
    result = agent("Explain what an AI agent is in 2 sentences")
    print(f"Response: {result}\n")

    # Access metrics
    print("📈 Metrics Analysis:")
    print("-" * 70)
    
    usage = result.metrics.accumulated_usage
    print("\n💰 Token Usage:")
    print(f"   Input tokens:  {usage['inputTokens']}")
    print(f"   Output tokens: {usage['outputTokens']}")
    print(f"   Total tokens:  {usage['totalTokens']}")

    metrics_data = result.metrics.accumulated_metrics
    print("\n⚡ Performance:")
    print(f"   Latency: {metrics_data['latencyMs']}ms")
    print(f"   Cycles:  {result.metrics.cycle_count}")

    if result.metrics.cycle_durations:
        avg_cycle = sum(result.metrics.cycle_durations) / len(result.metrics.cycle_durations)
        print(f"   Avg cycle time: {avg_cycle:.3f}s")

    # Comprehensive summary
    print("\n📋 Comprehensive Summary:")
    summary = result.metrics.get_summary()
    print(f"   Total Duration: {summary.get('total_duration', 0):.3f}s")
    print(f"   Total Cycles: {summary.get('total_cycles', 0)}")
    print(f"   Avg Cycle Time: {summary.get('average_cycle_time', 0):.3f}s")

    print("\n💡 Use these metrics to optimize performance and control costs!")
else:
    print("⚠️ No API key available")
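
To turn token counts into a cost figure, multiply each count by a per-token price. The sketch below assumes a usage dict with the same `inputTokens` and `outputTokens` keys as `accumulated_usage`. `estimate_cost_usd` is a hypothetical helper, and the prices are placeholders rather than real rates for any model.

```python
def estimate_cost_usd(usage: dict, input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Estimate spend from a usage dict with inputTokens/outputTokens keys."""
    input_cost = usage["inputTokens"] / 1000 * input_price_per_1k
    output_cost = usage["outputTokens"] / 1000 * output_price_per_1k
    return input_cost + output_cost


# Example usage dict shaped like result.metrics.accumulated_usage
sample_usage = {"inputTokens": 2000, "outputTokens": 500, "totalTokens": 2500}

# Placeholder prices, not real rates: look up your model's pricing
cost = estimate_cost_usd(sample_usage, input_price_per_1k=0.003, output_price_per_1k=0.015)
print(f"Estimated cost: ${cost:.4f}")  # Estimated cost: $0.0135
```

Input and output tokens are priced separately because many providers charge different rates for each.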

### OpenTelemetry Integration

For production monitoring, integrate with OpenTelemetry:

```python
# Installation (run in a shell, not in Python):
#   pip install 'strands-agents[otel]'

# Setup
from strands import Agent
from strands.telemetry import StrandsTelemetry

telemetry = StrandsTelemetry()
telemetry.setup_otlp_exporter()      # Send to collector
telemetry.setup_console_exporter()   # Print to console

agent = Agent(
    model=model,
    trace_attributes={
        'session.id': 'abc-1234',
        'user.id': 'user@example.com',
        'environment': 'production'
    }
)
```

**Integration platforms:**
- Jaeger - Open-source tracing
- AWS X-Ray - AWS native
- Datadog, New Relic - Commercial
- Langfuse - AI-specific

## Part 3: Evaluation - Testing and Quality Assurance

**Agent evaluation** checks that quality and performance stay consistent as prompts, tools, and models change.

### Evaluation Patterns:

1. **Metrics-Based Testing**
   - Latency < SLA threshold
   - Token usage within budget
   - Tool success rate > 95%

2. **Functional Testing**
   - Core capabilities work
   - Tools execute correctly
   - Error handling works

3. **Quality Testing**
   - Response relevance
   - Tone consistency
   - Safety compliance

4. **Regression Testing**
   - Previous bugs don't return
   - Quality maintained
   - Performance stable
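
Functional and regression tests can share one small harness. The sketch below is one possible shape, not a Strands feature: `run_functional_tests` is a hypothetical helper that accepts any callable taking a query string, and `stub_agent` stands in for a real agent so the example runs without an API key. Keyword matching is a crude proxy for correctness, but it is cheap enough to run on every change.

```python
from typing import Callable


def run_functional_tests(agent_fn: Callable[[str], str], cases: list[dict]) -> list[dict]:
    """Run each query and check that every expected keyword is in the reply."""
    results = []
    for case in cases:
        reply = str(agent_fn(case["query"])).lower()
        missing = [kw for kw in case["expected_keywords"] if kw.lower() not in reply]
        results.append({"query": case["query"], "passed": not missing, "missing": missing})
    return results


def stub_agent(query: str) -> str:
    """Stand-in for a real agent so the sketch runs without an API key."""
    return "2 + 2 equals 4." if "2+2" in query else "I am not sure."


cases = [
    {"query": "What is 2+2?", "expected_keywords": ["4"]},
    {"query": "What is the capital of France?", "expected_keywords": ["Paris"]},
]
for r in run_functional_tests(stub_agent, cases):
    print(r)
```

With the stub, the first case passes and the second fails with `missing == ['Paris']`. To guard against a regression, add the query that exposed the bug as a new case and keep it in the suite.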

In [None]:
model = create_working_model()

if model:
    print("\n📊 Running Performance Evaluation...\n")

    agent = Agent(
        model=model,
        system_prompt="You are a helpful assistant."
    )

    # Define test cases with performance criteria
    test_cases = [
        {
            "query": "What is 2+2?",
            "max_latency_ms": 5000,
            "max_tokens": 100,
            "description": "Simple arithmetic"
        },
        {
            "query": "Explain quantum computing in one sentence",
            "max_latency_ms": 8000,
            "max_tokens": 150,
            "description": "Concise explanation"
        },
    ]

    results = []

    for i, test_case in enumerate(test_cases, 1):
        print(f"Test {i}/{len(test_cases)}: {test_case['description']}")
        print("-" * 70)

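        # Use a fresh agent per test so conversation history (and its tokens)
        # from earlier tests does not count against this test's budget
        test_agent = Agent(model=model, system_prompt="You are a helpful assistant.")
        result = test_agent(test_case["query"])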

        # Evaluate metrics
        latency = result.metrics.accumulated_metrics["latencyMs"]
        total_tokens = result.metrics.accumulated_usage["totalTokens"]

        passed = True
        issues = []

        if latency > test_case["max_latency_ms"]:
            passed = False
            issues.append(f"Latency {latency}ms > {test_case['max_latency_ms']}ms")

        if total_tokens > test_case["max_tokens"]:
            passed = False
            issues.append(f"Tokens {total_tokens} > {test_case['max_tokens']}")

        status = "✅ PASS" if passed else "❌ FAIL"
        print(f"Status: {status}")
        print(f"Latency: {latency}ms (max: {test_case['max_latency_ms']}ms)")
        print(f"Tokens: {total_tokens} (max: {test_case['max_tokens']})")

        if issues:
            print(f"Issues: {', '.join(issues)}")

        results.append({"test": test_case["description"], "passed": passed})
        print()

    # Summary
    passed_count = sum(1 for r in results if r["passed"])
    total_count = len(results)
    pass_rate = (passed_count / total_count) * 100 if total_count > 0 else 0

    print("📈 Evaluation Summary:")
    print("-" * 70)
    print(f"Tests Run: {total_count}")
    print(f"Passed: {passed_count}")
    print(f"Failed: {total_count - passed_count}")
    print(f"Pass Rate: {pass_rate:.1f}%")

    print("\n💡 Regular evaluation ensures production quality!")
else:
    print("⚠️ No API key available")
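
A single run per test case gives a noisy latency reading, so SLA checks are often made against a percentile over several runs. The sketch below computes nearest-rank percentiles with the standard library. `percentile` is a hypothetical helper, the latency values are made up, and the 5000 ms threshold simply mirrors the test case above.

```python
import math


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up latencies in milliseconds from ten runs of the same query
latencies_ms = [120, 340, 200, 180, 950, 210, 190, 230, 170, 205]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
print(f"p50={p50}ms p95={p95}ms")                   # p50=200ms p95=950ms
print("SLA met" if p95 <= 5000 else "SLA missed")   # SLA met
```

The gap between p50 and p95 shows how much a single slow run can hide behind a healthy-looking median.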

## Part 4: Production Deployment Best Practices

### 🛡️ Security and Privacy:
- ✓ Enable guardrails for content safety
- ✓ Implement PII redaction
- ✓ Use secure credential management
- ✓ Encrypt data in transit and at rest
- ✓ Regular security audits

### 📊 Monitoring and Alerting:
- ✓ Set up OpenTelemetry tracing
- ✓ Monitor token usage and costs
- ✓ Track latency and error rates
- ✓ Alert on anomalies
- ✓ Dashboard for key metrics

### ⚡ Scalability and Performance:
- ✓ Async/streaming for responsiveness
- ✓ Caching for repeated queries
- ✓ Connection pooling
- ✓ Load balancing
- ✓ Auto-scaling

### 💰 Cost Optimization:
- ✓ Monitor token usage patterns
- ✓ Use caching to reduce API calls
- ✓ Optimize prompts
- ✓ Choose appropriate models
- ✓ Set budget alerts
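
Caching for repeated queries can be as simple as a dictionary keyed by a hash of the prompt. The sketch below is a minimal in-memory version with hypothetical names (`ResponseCache`, `stub_agent`). It assumes stateless, repeatable queries, so it is not suitable when answers depend on conversation history.

```python
import hashlib
from typing import Callable


class ResponseCache:
    """In-memory cache keyed by a hash of the normalized prompt."""

    def __init__(self, agent_fn: Callable[[str], str]):
        self.agent_fn = agent_fn
        self.store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def _key(self, prompt: str) -> str:
        # Lowercase and collapse whitespace so trivial variants share a key
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def ask(self, prompt: str) -> str:
        key = self._key(prompt)
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        answer = str(self.agent_fn(prompt))
        self.store[key] = answer
        return answer


calls = []


def stub_agent(prompt: str) -> str:
    """Stand-in for a real agent; records each call it receives."""
    calls.append(prompt)
    return f"answer to: {prompt}"


cache = ResponseCache(stub_agent)
cache.ask("What is 2+2?")
cache.ask("  what is 2+2?  ")  # same prompt after normalization: cache hit
print(f"hits={cache.hits} misses={cache.misses} model_calls={len(calls)}")
# hits=1 misses=1 model_calls=1
```

A production cache would also need an eviction policy and an expiry time, both omitted here for brevity.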

### 🚨 Incident Response:
- ✓ Defined escalation procedures
- ✓ Runbooks for common issues
- ✓ Fallback mechanisms
- ✓ Circuit breakers
- ✓ Post-incident reviews
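
Circuit breakers and fallbacks work together: after repeated failures, the breaker stops calling the failing dependency for a cool-down period and returns a fallback answer instead. The sketch below is a simplified version with hypothetical names and thresholds. A production breaker would usually add a proper half-open state and per-error-type handling.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Stops calling a failing function until a cool-down has elapsed."""

    def __init__(self, max_failures: int = 3, reset_after_s: float = 30.0):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], str], fallback: str) -> str:
        # While open, skip the call until the cool-down has elapsed
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after_s:
                return fallback
            # Cool-down over: close the breaker and start counting again
            self.opened_at = None
            self.failures = 0
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            return fallback
        self.failures = 0
        return result


def flaky() -> str:
    """Simulates a model endpoint that is currently down."""
    raise RuntimeError("model endpoint unavailable")


breaker = CircuitBreaker(max_failures=2, reset_after_s=60.0)
for _ in range(3):
    print(breaker.call(flaky, fallback="Service is busy, please retry."))
print(f"breaker open: {breaker.opened_at is not None}")  # breaker open: True
```

The third call never reaches `flaky` because the breaker opened after the second failure. That is the point of the pattern: a struggling backend stops receiving traffic while it recovers.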

## Experiments

### Exercises:
1. **Integrate AWS Bedrock Guardrails** - Add real guardrails with AWS
2. **Implement PII Redaction** - Use LLM Guard or Presidio
3. **Set up OpenTelemetry** - Connect to Jaeger locally
4. **Create Test Suite** - Build 20+ structured test cases
5. **LLM Judge** - Implement automated quality evaluation
6. **Metrics Dashboard** - Visualize performance data
7. **Alerting** - Set up latency/error thresholds
8. **Caching Layer** - Reduce costs with caching
9. **Incident Runbook** - Document response procedures
10. **Canary Deployment** - Implement safe rollout strategy

Use the cell below for your experiments:

In [None]:
# Your experiments here!


## ✅ Success Criteria

You've completed Lesson 10 if you:

- ✅ Understand guardrails for content safety
- ✅ Can implement PII redaction with hooks
- ✅ Can access and interpret agent metrics
- ✅ Understand OpenTelemetry integration
- ✅ Can design structured test suites
- ✅ Can evaluate agents with quantitative metrics
- ✅ Know production deployment best practices
- ✅ Understand security, monitoring, and scalability

## 💡 Key Concepts Learned

- **Guardrails** - Content filtering, topic blocking, PII protection
- **Hooks** - Flexible integration points for safety/monitoring
- **Observability** - Metrics, traces, logs for production monitoring
- **OpenTelemetry** - Industry standard for distributed tracing
- **Evaluation** - Systematic testing and quality assurance
- **Best Practices** - Security, monitoring, scalability, cost optimization

## 🎓 Congratulations!

You've completed all 10 lessons of the Strands Agent Framework!

### What You've Mastered:
- Foundation patterns (basic agents, tools)
- Intermediate patterns (state, async, multi-modal)
- Advanced patterns (hooks, context, conversation management)
- Multi-agent patterns (Graph, Swarm, Workflow, A2A)
- Production patterns (safety, observability, evaluation)

### Next Steps:
- Build your own production agent
- Explore AWS deployment options
- Contribute to the Strands community
- Share your projects!

**You're now ready to build production-ready AI agents!** 🚀