# Chapter 7: Monitoring and Observability in Strands Agents

## Introduction to Agent Observability

As you deploy Strands Agents into production environments, understanding how they're performing, debugging issues, and ensuring reliable operation becomes critical. This chapter focuses on monitoring and observability techniques for Strands Agents, helping you build reliable AI systems that can be effectively maintained in production.

We'll cover:
- Logging the full agent lifecycle
- Monitoring performance and latency
- Tracing agent operations

As with previous chapters, we'll use the Nova Lite model (`us.amazon.nova-lite-v1:0`) as specified for our course.

## Setup and Prerequisites

Let's start by installing the necessary packages:

In [None]:
%pip install -U strands-agents strands-agents-tools
%pip install -U matplotlib

## Logging

The Strands Agents SDK uses Python's standard `logging` module, and all of its loggers live under the `strands` namespace. Setting the level on the top-level `strands` logger controls how much detail you see:

In [None]:
import logging

# Configure the root strands logger
logging.getLogger("strands").setLevel(logging.DEBUG)

In [None]:
from strands import Agent
from strands_tools import current_time

# Create a simple agent
simple_agent = Agent(
    model="us.amazon.nova-lite-v1:0",  # Using Nova Lite model
    tools=[current_time],
    system_prompt="You are a helpful assistant that provides clear and informative responses."
)

result = simple_agent("What's the time in London?")

In [None]:
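# Attach a handler so strands log records are actually printed. Without one, Python only
# shows WARNING and above, so the DEBUG output of the previous run was not displayed.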
logging.basicConfig(
    format="%(levelname)s | %(name)s | %(message)s", 
    handlers=[logging.StreamHandler()],
    force=True
)
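Plain-text logs work well in a notebook, but log aggregators usually parse structured records more easily. The sketch below uses only the standard library to emit each record from the `strands` logger as one JSON object per line. `JsonFormatter` and `attach_json_handler` are our own hypothetical helpers, not part of the Strands SDK.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (one per line)."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Include the formatted traceback when the record carries an exception
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def attach_json_handler(logger_name="strands", stream=None):
    """Send records from `logger_name` to `stream` (stderr if None) as JSON lines."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(logger_name)
    logger.addHandler(handler)
    # Stop records from also reaching the root handler configured above (no duplicates)
    logger.propagate = False
    return handler
```

Calling `attach_json_handler()` in the notebook switches subsequent `strands` log output to JSON lines on stderr. Turning off propagation is a design choice: it keeps each record from being printed twice, once by this handler and once by the root handler set up with `basicConfig`.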

In [None]:
import time


class AgentObservabilityMiddleware:
    """Decorator-style wrapper that records call, error, latency, and tool-usage metrics.

    The Strands `Agent` constructor has no `middleware` parameter, so this class wraps
    the function used to invoke the agent instead of being registered on the agent.
    """

    def __init__(self):
        self.metrics = {
            "total_calls": 0,
            "total_tool_calls": 0,
            "total_errors": 0,
            "total_execution_time": 0.0,
            "tool_usage": {}
        }

    def __call__(self, next_handler):
        def handle(agent_instance, query, **kwargs):
            # Pre-processing: count the call and note how long the conversation is now
            self.metrics["total_calls"] += 1
            history_length = len(agent_instance.messages)
            start_time = time.perf_counter()

            try:
                # Execute the agent
                response = next_handler(agent_instance, query, **kwargs)
            except Exception:
                self.metrics["total_errors"] += 1
                raise
            finally:
                # Record latency for successful and failed calls alike
                self.metrics["total_execution_time"] += time.perf_counter() - start_time

            # Post-processing: the result object has no `tool_calls` attribute, so count
            # the toolUse content blocks added to the conversation during this call
            for message in agent_instance.messages[history_length:]:
                for block in message.get("content", []):
                    if "toolUse" in block:
                        tool_name = block["toolUse"]["name"]
                        self.metrics["total_tool_calls"] += 1
                        self.metrics["tool_usage"][tool_name] = (
                            self.metrics["tool_usage"].get(tool_name, 0) + 1
                        )

            return response

        return handle

    def get_metrics(self):
        # Calculate average execution time
        avg_time = 0.0
        if self.metrics["total_calls"] > 0:
            avg_time = self.metrics["total_execution_time"] / self.metrics["total_calls"]

        # Add calculated metrics (copy tool_usage so callers cannot mutate internal state)
        metrics = dict(self.metrics, tool_usage=dict(self.metrics["tool_usage"]))
        metrics["average_execution_time"] = avg_time
        metrics["error_rate"] = self.metrics["total_errors"] / max(1, self.metrics["total_calls"])

        return metrics

Let's test our middleware with a Strands Agent:

In [None]:
import json

from strands import Agent, tool
from strands_tools import calculator


# Create a custom tool for testing
@tool
def weather_info(location: str) -> str:
    """Get weather information for a specified location."""
    # Mock data for demonstration
    weather_data = {
        "new york": "72°F, Partly Cloudy",
        "london": "64°F, Rainy",
        "tokyo": "78°F, Sunny",
        "sydney": "70°F, Clear",
        "paris": "68°F, Cloudy"
    }
    return weather_data.get(location.lower(), "Weather information not available")


# Create the agent (Agent() has no `middleware` parameter; the middleware wraps the call instead)
middleware = AgentObservabilityMiddleware()
observable_agent = Agent(
    model="us.amazon.nova-lite-v1:0",  # Nova Lite, as used throughout the course
    tools=[calculator, weather_info],
    system_prompt="You are a helpful assistant that can check weather and perform calculations.",
    callback_handler=None  # suppress streaming output; the loop below prints each response
)

# Wrap a plain "invoke the agent" function with the middleware
observed_call = middleware(lambda agent, query, **kwargs: agent(query, **kwargs))

# Run some test queries
queries = [
    "What's the weather like in Tokyo?",
    "Calculate 573 * 218",
    "What's the weather in Paris and how much is 100 * 1.2?"
]

for query in queries:
    print(f"\nQuery: {query}")
    response = observed_call(observable_agent, query)
    print(f"Response: {response}")  # str() of the result returns the final text

# Show the collected metrics
print("\n\nAgent Metrics:")
print(json.dumps(middleware.get_metrics(), indent=2))
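The middleware keeps only a running total, so `average_execution_time` can hide slow outliers. If you also keep each call's duration in a list (a hypothetical addition; the class above does not do this as written), a small standard-library helper can report tail-latency percentiles. `latency_percentiles` is our own illustrative function and assumes the requested percentiles are integers between 1 and 99.

```python
import statistics


def latency_percentiles(durations, percentiles=(50, 95, 99)):
    """Return {"p50": ..., "p95": ..., ...} for a list of call durations in seconds."""
    if not durations:
        return {}
    if len(durations) == 1:
        # statistics.quantiles() needs at least two data points on older Python versions
        return {f"p{p}": durations[0] for p in percentiles}
    # With n=100 the result holds 99 cut points; cut_points[k - 1] is the k-th percentile
    cut_points = statistics.quantiles(durations, n=100, method="inclusive")
    return {f"p{p}": cut_points[p - 1] for p in percentiles}
```

The `"inclusive"` method treats the recorded durations as the whole population, which suits a fixed window of observed calls. p95 and p99 show what your slowest users experience, something the mean alone cannot reveal.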

In [None]:
from strands.handlers.callback_handler import PrintingCallbackHandler

# Re-run the agent: with the handler now attached, the DEBUG logs are printed
simple_agent = Agent(
    model="us.amazon.nova-lite-v1:0",  # Using Nova Lite model
    tools=[current_time],
    system_prompt="You are a helpful assistant that provides clear and informative responses."
)

result = simple_agent("What's the time in London?")

### Callback Handler
The Strands Agents SDK provides two distinct mechanisms for surfacing what an agent is doing:

- **Standard logging**: For internal operations, debugging, and errors (primarily for developers)
- **Callback system**: For user-facing output, streaming responses, and tool execution notifications

Callbacks are configured through the `callback_handler` parameter when creating an `Agent`. You can use a built-in handler (`PrintingCallbackHandler` is the default and prints streamed text and tool usage to the console), pass `callback_handler=None` to suppress this output, or write a custom handler that processes streaming events according to your application's requirements.
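A custom callback handler is any callable that accepts keyword arguments for each streaming event. The sketch below assumes the event keys described in the Strands callback-handler documentation: `data` for a streamed text chunk and `current_tool_use` for a dict with `toolUseId` and `name`. The `ToolLoggingCallbackHandler` class itself is our own illustration, not a built-in.

```python
class ToolLoggingCallbackHandler:
    """Sketch of a custom callback handler that collects text and tool names.

    Assumes the `data` and `current_tool_use` event keys; other events are ignored.
    """

    def __init__(self, echo=True):
        self.echo = echo          # print streamed text as it arrives
        self.chunks = []          # every streamed text chunk, in order
        self.tools_seen = []      # tool names, one entry per distinct tool use
        self._seen_ids = set()

    def __call__(self, **kwargs):
        if "data" in kwargs:
            self.chunks.append(kwargs["data"])
            if self.echo:
                print(kwargs["data"], end="", flush=True)

        tool_use = kwargs.get("current_tool_use") or {}
        name = tool_use.get("name")
        # One tool use is reported in several events while its input streams in,
        # so record it only the first time its id (or, failing that, name) appears
        key = tool_use.get("toolUseId") or name
        if name and key not in self._seen_ids:
            self._seen_ids.add(key)
            self.tools_seen.append(name)
            if self.echo:
                print(f"\n[tool] {name}")

    def text(self):
        """Return all streamed text joined together."""
        return "".join(self.chunks)
```

To use it, pass an instance when building the agent, for example `Agent(model="us.amazon.nova-lite-v1:0", tools=[current_time], callback_handler=ToolLoggingCallbackHandler())`. Afterwards, `handler.tools_seen` lists the tools the agent invoked.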

In [None]:
import logging

# Configure the root strands logger
logging.getLogger("strands").setLevel(logging.INFO)

In [None]:
simple_agent = Agent(
    model="us.amazon.nova-lite-v1:0",  # Using Nova Lite model
    tools=[current_time],
    system_prompt="You are a helpful assistant that provides clear and informative responses.",
    callback_handler=PrintingCallbackHandler()
)

result = simple_agent("What's the time in London?")

In [None]:
simple_agent = Agent(
    model="us.amazon.nova-lite-v1:0",  # Using Nova Lite model
    tools=[current_time],
    system_prompt="You are a helpful assistant that provides clear and informative responses.",
    callback_handler=None  # disable streaming console output; the result is printed below
)

result = simple_agent("What's the time in London?")

In [None]:
print(result)

## Monitoring
Strands Agents automatically collects metrics for every invocation, with no additional configuration: token usage, model latency, event-loop cycle counts and durations, and per-tool call statistics. After the agent runs, they are available through the `metrics` attribute of the returned result. This data helps you optimize your agent's performance, monitor resource usage, and control operational costs.

You can access these metrics programmatically to integrate with your monitoring systems or to generate performance reports for your application.

In [None]:
import pprint
pprint.pprint(result.metrics.get_summary())
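Most metrics backends expect flat `name: number` pairs rather than nested dictionaries. The sketch below flattens the summary into dotted metric names. The excluded keys (`traces`, `tool_info`) are assumptions based on the summary printed above, so compare them with your own output. `flatten_metrics` is a hypothetical helper, not part of the SDK.

```python
from numbers import Number


def flatten_metrics(summary, prefix="strands", exclude=("traces", "tool_info")):
    """Flatten a nested metrics dict into {"dotted.name": number} pairs.

    Keys listed in `exclude` and non-numeric values (strings, lists) are skipped,
    since most metrics backends only accept numbers.
    """
    flat = {}
    for key, value in summary.items():
        if key in exclude:
            continue
        name = f"{prefix}.{key}"
        if isinstance(value, dict):
            # Recurse into nested sections such as token usage or per-tool stats
            flat.update(flatten_metrics(value, name, exclude))
        elif isinstance(value, Number) and not isinstance(value, bool):
            flat[name] = value
    return flat
```

For example, `flatten_metrics(result.metrics.get_summary())` yields names such as `strands.accumulated_usage.totalTokens`, which you can forward to your monitoring system.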

### Tracking Token Usage and Costs

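For production systems, monitoring token usage is critical, because model costs scale with the number of input and output tokens. The `accumulated_usage` attribute reports the input, output, and total token counts accumulated across the model calls the agent has made: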

In [None]:
result.metrics.accumulated_usage
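Token counts become costs once you multiply them by your model's rates. Here is a minimal sketch, assuming the usage dict has `inputTokens` and `outputTokens` keys as shown above. `estimate_cost` is a hypothetical helper, and it deliberately has no default prices: supply the current per-1,000-token rates for your model and region.

```python
def estimate_cost(usage, input_price_per_1k, output_price_per_1k):
    """Estimate spend from a usage dict with `inputTokens` / `outputTokens` counts.

    Prices are per 1,000 tokens and must be supplied by the caller.
    """
    input_cost = usage.get("inputTokens", 0) / 1000 * input_price_per_1k
    output_cost = usage.get("outputTokens", 0) / 1000 * output_price_per_1k
    return {
        "input_cost": input_cost,
        "output_cost": output_cost,
        "total_cost": input_cost + output_cost,
    }
```

Input and output tokens are priced separately because output tokens are typically billed at a higher rate. Call it as `estimate_cost(result.metrics.accumulated_usage, input_price_per_1k=..., output_price_per_1k=...)` with your own rates.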

## Tracing

Tracing provides in-depth visibility into your agent's operations. Because Strands Agents follows the OpenTelemetry standard, its traces record the full path of a request as it moves through your agent, including LLM interactions, data retrieval, tool usage, and each cycle of the agent's event loop.

<img src="jeager.jpg" width="750" alt="Jaeger UI showing a Strands agent trace">

(Trace of a Strands agent visualized in the Jaeger UI)
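To make the structure of a trace concrete, here is a toy, dependency-free illustration of nested spans. It is not OpenTelemetry and not how Strands records spans, and the span names are invented for the example. It only shows the idea behind the image: each span has a name, a parent, and a duration, and a parent span covers the time of its children.

```python
import time
from contextlib import contextmanager


class ToyTracer:
    """Illustration of nested spans; a real setup would use OpenTelemetry."""

    def __init__(self):
        self.finished = []   # completed spans, in the order they ended
        self._open = []      # names of spans that are currently open

    @contextmanager
    def span(self, name, **attributes):
        parent = self._open[-1] if self._open else None
        self._open.append(name)
        start = time.perf_counter()
        try:
            yield
        finally:
            self._open.pop()
            self.finished.append({
                "name": name,
                "parent": parent,
                "duration_s": time.perf_counter() - start,
                "attributes": attributes,
            })


tracer = ToyTracer()
with tracer.span("invoke_agent", prompt="What's the time in London?"):
    with tracer.span("cycle_1"):
        with tracer.span("model_call"):
            time.sleep(0.01)   # stand-in for an LLM request
        with tracer.span("tool:current_time", timezone="Europe/London"):
            time.sleep(0.005)  # stand-in for tool execution
    with tracer.span("cycle_2"):
        with tracer.span("model_call"):
            time.sleep(0.01)

for s in tracer.finished:
    print(f"{s['name']:<20} parent={s['parent']!s:<14} {s['duration_s'] * 1000:.1f} ms")
```

Spans finish child-first, which is why the root `invoke_agent` span is printed last. Tools such as Jaeger reassemble these parent links into the waterfall view shown above.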

In [None]:
# Pull and run the Jaeger all-in-one container
# (assumes Docker is installed; the Jaeger UI is then served at http://localhost:16686)
!sudo docker run -d --name jaeger \
  -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 6831:6831/udp \
  -p 6832:6832/udp \
  -p 5778:5778 \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  -p 14250:14250 \
  -p 14268:14268 \
  -p 14269:14269 \
  -p 9411:9411 \
  jaegertracing/all-in-one:latest

In [None]:
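from strands.telemetry.tracer import get_tracer

# Configure the tracer to export spans to the local Jaeger instance (OTLP over HTTP, port 4318).
# The arguments accepted by get_tracer have changed between strands-agents releases; if this
# call fails, check the telemetry documentation for your installed version.
tracer = get_tracer(
    service_name="strands-agents-svc",
    otlp_endpoint="http://localhost:4318",
    otlp_headers={"Authorization": "Bearer TOKEN"},  # placeholder; a local Jaeger needs no auth
    enable_console_export=True  # also print finished spans to the console
)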

# Create agent
agent = Agent(
    model="us.amazon.nova-lite-v1:0",
    tools=[current_time],
    system_prompt="You are a helpful assistant that provides clear and informative responses."
)

# Execute a series of interactions that will be traced
response = agent("Hi!")
print(response)

# Ask a follow-up that uses tools
response = agent("What's the time in London?")
print(response)

# Each interaction creates a complete trace that can be visualized in your tracing tool

## Best Practices for Agent Observability

When implementing observability for your Strands Agents, consider these best practices:

1. Standardize instrumentation using OpenTelemetry
2. Design for multiple consumers using fan-out architecture
3. Control data volume through filtering and sampling
4. Shift observability left during agent development

These practices are easiest to adopt from day one, rather than retrofitting them once agents are already running in production.
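Sampling keeps trace volume manageable. A common approach is deterministic head sampling based on the trace id. The sketch below mirrors how the OpenTelemetry SDK's `TraceIdRatioBased` sampler makes its decision, which you would normally configure instead, for example via the standard `OTEL_TRACES_SAMPLER=traceidratio` and `OTEL_TRACES_SAMPLER_ARG` environment variables. `should_sample` is our own illustrative function.

```python
import random

TRACE_ID_MASK = (1 << 64) - 1  # only the low 64 bits of the 128-bit trace id are compared


def should_sample(trace_id, ratio):
    """Keep a trace if the low 64 bits of its id fall below ratio * 2**64."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    bound = round(ratio * (1 << 64))
    return (trace_id & TRACE_ID_MASK) < bound


# Simulate 10,000 random 128-bit trace ids and keep roughly 10% of them
rng = random.Random(42)
trace_ids = [rng.getrandbits(128) for _ in range(10_000)]
kept = sum(should_sample(t, 0.10) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

Because the decision depends only on the trace id, every service that sees the same trace makes the same keep-or-drop choice, so sampled traces stay complete instead of losing random spans.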

## Summary

In this chapter, we've explored various approaches to monitoring and observability for Strands Agents:

1. Basic logging techniques for tracking agent activity
2. Tracking token usage and costs
3. Agent monitoring
4. Tracing

These techniques allow you to maintain visibility into how your agents are performing in production, identify issues early, and ensure cost-effective operation. As your agent systems grow in complexity, a robust observability strategy becomes increasingly important.

## Exercises

1. Create a visualization that shows the distribution of response times for an agent over multiple requests
2. Build a simple web dashboard (using a library like Dash or Streamlit) to display agent metrics in real-time
3. Implement a token budget system that can automatically pause agent operations when a daily token limit is reached

## Ending

This concludes our course on Strands Agents. We've covered everything from the basics of creating agents, to using tools, customizing functionality, integrating with MCP, deploying to production, and building multi-agent systems. With the knowledge from all chapters in this course, you now have a comprehensive understanding of building, deploying, monitoring, and optimizing AI agents with the Strands Agents framework. 

🎉🎉🎉