## Lab 4: Deploy to Production - Use AgentCore Runtime with Observability

### Overview

In Lab 3 we scaled our Customer Support Agent by centralizing tools through AgentCore Gateway with secure authentication. Now we'll complete the production journey by deploying our agent to AgentCore Runtime with comprehensive observability. This will transform our prototype into a production-ready system that can handle real-world traffic with full monitoring and automatic scaling.

[Amazon Bedrock AgentCore Runtime](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/agents-tools-runtime.html) is a secure, fully managed runtime that empowers organizations to deploy and scale AI agents in production, regardless of framework, protocol, or model choice. It provides enterprise-grade reliability, automatic scaling, and comprehensive monitoring capabilities.

**Workshop Journey:**

- **Lab 1 (Done):** Create Agent Prototype - Built a functional customer support agent
- **Lab 2 (Done):** Enhance with Memory - Added conversation context and personalization
- **Lab 3 (Done):** Scale with Gateway & Identity - Shared tools across agents securely
- **Lab 4 (Current):** Deploy to Production - Use AgentCore Runtime with observability
- **Lab 5:** Build User Interface - Create a customer-facing application

### Why AgentCore Runtime & Production Deployment Matter

Current state (Labs 1-3): the agent runs locally with centralized tools but faces production challenges:

- Agent runs locally in a single session
- No comprehensive monitoring or debugging capabilities
- Cannot handle multiple concurrent users reliably

After this lab, we will have a production-ready agent infrastructure with:

- Serverless auto-scaling to handle variable demand
- Comprehensive observability with traces, metrics, and logging
- Enterprise reliability with automatic error recovery
- Secure deployment with proper access controls
- Easy management through the AWS console and APIs, with support for real-world production workloads


### Adding comprehensive observability with AgentCore Observability

AgentCore Runtime integrates with [AgentCore Observability](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/observability.html) to provide full visibility into your agent's behavior in production. AgentCore Observability automatically captures traces, metrics, and logs from your agent interactions, tool usage, and memory access patterns. In this lab we will see how AgentCore Runtime integrates with CloudWatch GenAI Observability to provide comprehensive monitoring and debugging capabilities.

For request tracing, AgentCore Observability captures the complete conversation flow including tool invocations, memory retrievals, and model interactions. For performance monitoring, it tracks response times, success rates, and resource utilization to help optimize your agent's performance.

During the observability flow, AgentCore Runtime automatically instruments your agent code and sends telemetry data to CloudWatch. You can then use CloudWatch dashboards and GenAI Observability features to analyze patterns, identify bottlenecks, and troubleshoot issues in real-time.

### Architecture for Lab 4
<div style="text-align:left"> 
    <img src="images/architecture_lab4_runtime.png" width="75%"/> 
</div>

*Agent now runs in AgentCore Runtime with full observability through CloudWatch, serving production traffic with auto-scaling and comprehensive monitoring. Memory and Gateway integrations from previous labs remain fully functional in the production environment.*

### Key Features

- **Serverless Agent Deployment:** Transform your local agent into a scalable production service using AgentCore Runtime with minimal code changes
- **Comprehensive Observability:** Full request tracing, performance metrics, and debugging capabilities through CloudWatch GenAI Observability

### Prerequisites

- Python 3.12+
- AWS account with appropriate permissions
- Docker, Finch or Podman installed and running
- Amazon Bedrock AgentCore SDK
- Strands Agents framework
- **Lab 3 Completion:** This lab builds on Lab 3 (AgentCore Gateway). You MUST run [lab-03-agentcore-gateway](lab-03-agentcore-gateway.ipynb) to provision the gateway before running this lab.

**Note**: You MUST enable [CloudWatch Transaction Search](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Enable-TransactionSearch.html) to be able to see AgentCore Observability traces in CloudWatch.


---
## 📚 Theory: Understanding Production Deployment

### What is Production Deployment?

Production deployment is the process of making your AI agent available to end-users in a reliable, scalable, and monitored environment. Unlike development or testing environments where agents run on local machines, production deployment requires:

1. **High Availability**: The agent must be accessible 24/7 with minimal downtime
2. **Scalability**: Automatically handle varying numbers of concurrent users
3. **Security**: Protect user data and control access through authentication
4. **Observability**: Monitor performance, track errors, and debug issues
5. **Reliability**: Gracefully handle failures and recover automatically

### Why AgentCore Runtime?

AgentCore Runtime is AWS's managed service for deploying AI agents to production. It provides:

- **Serverless Architecture**: No infrastructure to manage - AWS handles servers, scaling, and maintenance
- **Auto-scaling**: Automatically adjusts capacity based on incoming traffic
- **Built-in Observability**: Integrated with CloudWatch for comprehensive monitoring
- **Framework Agnostic**: Works with any AI framework (LangChain, Strands, custom code)
- **Container-based**: Your agent runs in Docker containers for consistency across environments

### Key Concepts

**1. Containerization**
- Your agent code is packaged into a Docker container image
- The container includes your code, dependencies, and runtime environment
- Ensures your agent runs identically in development and production

**2. Entrypoint Function**
- The entry point is the function that AgentCore Runtime calls when a request arrives
- It receives the user's input and returns the agent's response
- Decorated with `@app.entrypoint` to register it with the runtime

**3. Session Management**
- Each conversation has a unique session ID
- Sessions maintain conversation context and memory
- Different users have different sessions (isolation)

**4. Authentication & Authorization**
- JWT (JSON Web Tokens) verify user identity
- Bearer tokens are passed in HTTP headers
- Tokens are validated before processing requests

### The Deployment Pipeline

```
Local Code → Docker Image → ECR Registry → AgentCore Runtime → Production
     ↓             ↓              ↓                ↓              ↓
  Python       Package       Store Image      Deploy          Serve Users
```

### Observability: Why It Matters

In production, you need to answer questions like:
- Is the agent responding quickly enough?
- Which tools are being used most frequently?
- Are there any errors or failures?
- How many concurrent users are active?

AgentCore Observability provides:
- **Traces**: Complete record of each request (what happened, in what order)
- **Metrics**: Quantitative measurements (response time, success rate, etc.)
- **Logs**: Detailed event records for debugging

---

### Step 1: Import Required Libraries

#### 📖 What's Happening Here?

This step imports the necessary Python libraries and ensures our memory resource from Lab 2 is available. The `create_or_get_memory_resource()` function either retrieves an existing memory configuration or creates a new one if it doesn't exist.

**Key Components:**
- **boto3**: AWS SDK for Python - allows interaction with AWS services
- **get_ssm_parameter**: Retrieves configuration values stored in AWS Systems Manager Parameter Store
- **create_or_get_memory_resource**: Ensures the AgentCore Memory from Lab 2 is ready to use
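For orientation, a Parameter Store lookup with boto3 might look roughly like the sketch below. This is an assumption about what a helper such as `get_ssm_parameter` does, not the workshop's actual implementation; `read_parameter` and `StubSsmClient` are hypothetical names, and the stub exists only so the example runs without AWS credentials.

```python
def read_parameter(name: str, ssm_client=None, with_decryption: bool = True) -> str:
    """Return the value of a Parameter Store entry.

    Sketch only: the workshop's lab_helpers.utils helper may differ.
    """
    if ssm_client is None:
        import boto3  # imported lazily so the sketch also works with a stub client

        ssm_client = boto3.client("ssm")
    response = ssm_client.get_parameter(Name=name, WithDecryption=with_decryption)
    return response["Parameter"]["Value"]


class StubSsmClient:
    """Stand-in for the boto3 SSM client, used only for this offline demo."""

    def __init__(self, values: dict[str, str]) -> None:
        self._values = values

    def get_parameter(self, Name: str, WithDecryption: bool = False) -> dict:
        return {"Parameter": {"Name": Name, "Value": self._values[Name]}}


stub = StubSsmClient({"/demo/gateway_url": "https://example.invalid/mcp"})
print(read_parameter("/demo/gateway_url", ssm_client=stub))
```

Passing the client in as a parameter keeps the function easy to test and lets callers reuse one client across many lookups.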

In [None]:
# Import required libraries
import boto3
from lab_helpers.utils import get_ssm_parameter
from lab_helpers.lab2_memory import create_or_get_memory_resource

# Ensure memory resource exists (from Lab 2)
# This creates or retrieves the AgentCore Memory configuration
# Returns the memory resource name that will be used by our agent
create_or_get_memory_resource()  # Just in case the memory lab wasn't executed

---
## 📚 Theory: Transforming Local Code to Runtime-Ready

### The Challenge: From Local to Production

Your agent currently runs on your local machine. To deploy it to production, we need to:

1. **Make it invokable**: The agent must respond to external requests (not just local function calls)
2. **Handle authentication**: Verify that requests come from authorized users
3. **Manage sessions**: Keep conversations separate for different users
4. **Enable monitoring**: Automatically report performance and errors

### The AgentCore Runtime Pattern

AgentCore Runtime uses a simple pattern to transform your local code:

```python
# Local code pattern:
def my_function():
    # Your agent logic
    return result

# Runtime-ready pattern:
from bedrock_agentcore.runtime import BedrockAgentCoreApp

app = BedrockAgentCoreApp()  # Initialize the runtime

@app.entrypoint  # Register this as the entry point
def my_function(payload, context):
    # Same agent logic, but now receives payload and context
    return result

app.run()  # Start the runtime server
```

### Understanding the Four Key Lines

**Line 1: Import the Runtime App**
```python
from bedrock_agentcore.runtime import BedrockAgentCoreApp
```
This imports the main class that manages your agent in production.

**Line 2: Initialize the App**
```python
app = BedrockAgentCoreApp()
```
Creates an instance that will:
- Set up HTTP server to receive requests
- Configure authentication validation
- Enable automatic observability
- Manage session routing

**Line 3: Decorate the Entrypoint**
```python
@app.entrypoint
```
Marks your function as the entry point. When a request arrives, this function is called.

**Line 4: Run the Runtime**
```python
app.run()
```
Starts the server and begins listening for requests.
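To show what a registration decorator does in general, the toy class below mimics the pattern in a few lines. `MiniApp` and its `dispatch` method are hypothetical and are not part of the AgentCore SDK; the real `BedrockAgentCoreApp` additionally runs an HTTP server and handles the production concerns listed above.

```python
from typing import Any, Callable, Optional


class MiniApp:
    """Toy stand-in that mimics entrypoint registration (not the real SDK class)."""

    def __init__(self) -> None:
        self._handler: Optional[Callable[[dict, Any], Any]] = None

    def entrypoint(self, func: Callable[[dict, Any], Any]) -> Callable[[dict, Any], Any]:
        # Remember the function and return it unchanged, so it stays directly callable.
        self._handler = func
        return func

    def dispatch(self, payload: dict, context: Any = None) -> Any:
        # A real runtime would invoke the handler once per incoming request.
        if self._handler is None:
            raise RuntimeError("no entrypoint registered")
        return self._handler(payload, context)


mini_app = MiniApp()


@mini_app.entrypoint
def echo(payload: dict, context: Any) -> str:
    return f"You said: {payload.get('prompt', '')}"


print(mini_app.dispatch({"prompt": "hello"}))
```

Because the decorator returns the original function, the handler can still be called directly in unit tests without starting any server.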

### What You Receive in the Entrypoint

**payload**: Dictionary containing the request data
```python
{
    "prompt": "User's question or message",
    # Other custom fields you might include
}
```

**context**: Object with request metadata
```python
context.request_headers  # HTTP headers including Authorization
context.session_id       # Unique ID for this conversation
context.user_id          # Authenticated user identifier
```

### Authentication Flow

1. User sends request with JWT token in `Authorization` header
2. AgentCore Runtime validates the token
3. Your entrypoint extracts the token from `context.request_headers`
4. Token is passed to AgentCore Gateway for tool access
5. Gateway validates token again before allowing tool usage

This ensures secure, end-to-end authentication.
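Step 3 of the flow can be sketched as a small helper that pulls the token out of an `Authorization: Bearer <token>` header. `StubContext` and `extract_bearer_token` are hypothetical names for local experimentation; the stub only imitates the `request_headers` and `session_id` attributes described above and is not the runtime's own context class.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StubContext:
    """Hypothetical stand-in for the runtime context, for local experiments only."""
    request_headers: dict = field(default_factory=dict)
    session_id: str = "local-session"


def extract_bearer_token(headers: dict) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <token>' header, else None."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {key.lower(): value for key, value in headers.items()}
    scheme, _, token = normalized.get("authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()


ctx = StubContext(request_headers={"Authorization": "Bearer abc.def.ghi"})
print(extract_bearer_token(ctx.request_headers))
print(extract_bearer_token({}))
```

Extracting the token is not the same as validating it: the signature and claims still have to be checked by the runtime and the Gateway.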

---

### Step 2: Preparing Your Agent for AgentCore Runtime

#### Creating the Runtime-Ready Agent

Let's first add the necessary AgentCore Runtime components, using the Python SDK, to our previous local agent implementation.

#### 📖 Code Walkthrough

Observe the `#### AGENTCORE RUNTIME - LINE X ####` comments below to see where the relevant deployment code is added. You'll find 4 such lines that prepare the runtime-ready agent:

1. **Import the Runtime App** with `from bedrock_agentcore.runtime import BedrockAgentCoreApp`
2. **Initialize the App** with `app = BedrockAgentCoreApp()`
3. **Decorate our invocation function** with `@app.entrypoint`
4. **Let AgentCore Runtime control the execution** with `app.run()`

##### Key Implementation Details:

**Authentication Handling:**
- Extracts JWT token from request headers: `context.request_headers.get('Authorization', '')`
- Validates that authentication is present before processing
- Propagates the token to AgentCore Gateway: `headers={"Authorization": auth_header}`

**Request Processing:**
- Receives user input from `payload['prompt']`
- Maintains session context via memory integration
- Returns plain text responses for synchronous invocation

**Error Handling:**
- Returns an error payload with status code 401 if authentication is missing, and 400 if the prompt is missing
- Catches and logs any runtime errors
- Provides clear error messages to users

This implementation preserves all memory and tool functionality from previous labs while adding production-ready features.

#### Step 2a: Create the Runtime File - Part 1 (Imports and Configuration)

**📖 What's in this cell?**

This cell creates the beginning of our runtime file with:
- **Runtime imports** (LINE 1): Import BedrockAgentCoreApp framework
- **Core libraries**: Agent framework, model, memory, and tools
- **Configuration retrieval**: Get AWS resource ARNs from Parameter Store
- **App initialization** (LINE 2): Create the runtime application instance

In [None]:
%%writefile ./lab_helpers/lab4_runtime.py
# ==============================================================================
# AGENTCORE RUNTIME - LINE 1: Import the Runtime Application Framework
# ==============================================================================
from bedrock_agentcore.runtime import (
    BedrockAgentCoreApp,
)  #### AGENTCORE RUNTIME - LINE 1 ####

from strands import Agent
from strands.models.bedrock import BedrockChatModel
from bedrock_agentcore.memory import BedrockMemoryWrapper
from bedrock_agentcore.client.mcp import MCPClient
import json

# Import helper functions for retrieving configuration from AWS Parameter Store
from lab_helpers.utils import get_ssm_parameter

# ==============================================================================
# AGENTCORE RUNTIME - LINE 2: Initialize the Runtime Application
# ==============================================================================
# This creates the app instance that will manage our agent in production
# It automatically configures:
# - HTTP server to receive requests
# - Authentication validation
# - Session management
# - Observability integration
app = BedrockAgentCoreApp()  #### AGENTCORE RUNTIME - LINE 2 ####

# ==============================================================================
# CONFIGURATION: Retrieve AWS Resources
# ==============================================================================
# These values were created in previous labs and stored in Parameter Store
MEMORY_RESOURCE_ARN = get_ssm_parameter("MEMORY_RESOURCE_ARN")
GATEWAY_STACK_NAME = get_ssm_parameter("GATEWAY_STACK_NAME")
GATEWAY_URL = get_ssm_parameter(f"/{GATEWAY_STACK_NAME}/GatewayUrl")

#### Step 2b: Create the Runtime File - Part 2 (Entrypoint Function Header)

**📖 What's in this cell?**

This cell adds the entrypoint function definition with:
- **@app.entrypoint decorator** (LINE 3): Registers this as the main entry point
- **Function signature**: Defines payload and context parameters
- **Documentation**: Complete docstring explaining the function
- **Authentication validation**: Checks for JWT token in headers

In [None]:
%%writefile -a ./lab_helpers/lab4_runtime.py

# ==============================================================================
# AGENTCORE RUNTIME - LINE 3: Define the Entrypoint Function
# ==============================================================================
# The @app.entrypoint decorator registers this function as the entry point
# When a request arrives, AgentCore Runtime will call this function
@app.entrypoint  #### AGENTCORE RUNTIME - LINE 3 ####
def invoke(payload: dict, context):
    """
    Main entrypoint for agent invocations in AgentCore Runtime.
    
    This function:
    1. Validates authentication from request headers
    2. Extracts user input from the payload
    3. Configures the agent with memory and tools
    4. Processes the request and returns a response
    
    Parameters:
    -----------
    payload : dict
        Request data containing:
        - prompt: str - The user's question or message
    
    context : RequestContext
        Runtime context containing:
        - request_headers: dict - HTTP headers including Authorization
        - session_id: str - Unique identifier for this conversation
        - user_id: str - Authenticated user identifier
    
    Returns:
    --------
    str : The agent's response text on success
    dict : {"statusCode", "body"} if the Authorization header or prompt is missing
    """
    
    # ==========================================================================
    # STEP 1: Extract and Validate Authentication
    # ==========================================================================
    # Get the JWT token from the Authorization header
    # Format: "Bearer <token>"
    auth_header = context.request_headers.get('Authorization', '')
    
    if not auth_header:
        # No authentication provided - reject the request
        return {
            "statusCode": 401,
            "body": "Missing Authorization header"
        }
    
    # ==========================================================================
    # STEP 2: Extract User Input
    # ==========================================================================
    # Get the user's question/message from the payload
    user_prompt = payload.get("prompt", "")
    
    if not user_prompt:
        return {
            "statusCode": 400,
            "body": "Missing prompt in payload"
        }

#### Step 2c: Create the Runtime File - Part 3 (Model and Memory Setup)

**📖 What's in this cell?**

This cell adds the model configuration:
- **Language Model**: Initialize Claude 3.5 Sonnet v2
- **Memory Integration**: Wrap the model with BedrockMemoryWrapper for context retention
- This enables the agent to remember previous conversations within the session

In [None]:
%%writefile -a ./lab_helpers/lab4_runtime.py
    
    # ==========================================================================
    # STEP 3: Initialize the Language Model
    # ==========================================================================
    # Configure Claude 3.5 Sonnet v2 as our AI model
    model = BedrockChatModel(
        model_id="us.anthropic.claude-3-5-sonnet-20241022-v2:0",
        runtime="bedrock",
    )
    
    # ==========================================================================
    # STEP 4: Configure Memory (from Lab 2)
    # ==========================================================================
    # Wrap the model with AgentCore Memory for conversation context
    # This allows the agent to remember previous interactions in this session
    model_with_memory = BedrockMemoryWrapper(
        model=model,
        memory_resource_arn=MEMORY_RESOURCE_ARN,
    )

#### Step 2d: Create the Runtime File - Part 4 (Tools Configuration)

**📖 What's in this cell?**

This cell adds tools integration:
- **Gateway Connection**: Connect to AgentCore Gateway from Lab 3
- **Authentication Propagation**: Pass JWT token to Gateway for authorization
- **Tool Retrieval**: Get the list of available tools (warranty check, knowledge base, etc.)

In [None]:
%%writefile -a ./lab_helpers/lab4_runtime.py
    
    # ==========================================================================
    # STEP 5: Configure Tools via AgentCore Gateway (from Lab 3)
    # ==========================================================================
    # Connect to the AgentCore Gateway to access centralized tools
    # Pass the authentication token so the Gateway can verify access
    mcp_client = MCPClient(
        url=GATEWAY_URL,
        headers={"Authorization": auth_header},  # Token propagation
    )
    
    # Retrieve the list of available tools from the Gateway
    tools = mcp_client.get_tools()

#### Step 2e: Create the Runtime File - Part 5 (Agent Creation and Execution)

**📖 What's in this cell?**

This cell completes the agent setup:
- **Agent Initialization**: Create the Agent with model, tools, and instructions
- **Agent Instructions**: Define the agent's role and behavior guidelines
- **Request Processing**: Execute the agent and return the response

In [None]:
%%writefile -a ./lab_helpers/lab4_runtime.py
    
    # ==========================================================================
    # STEP 6: Create and Configure the Agent
    # ==========================================================================
    # Define the agent's role and capabilities
    agent = Agent(
        name="Customer Support Agent",
        model=model_with_memory,  # Model with memory integration
        tools=tools,  # Tools from AgentCore Gateway
        instructions="""
        You are a helpful customer support agent with access to warranty check, 
        knowledge base search, and order tracking tools.
        
        Your responsibilities:
        - Answer customer questions accurately using available tools
        - Maintain a friendly and professional tone
        - Use tools when needed to retrieve specific information
        - Remember context from previous messages in this conversation
        
        Guidelines:
        - Always verify warranty information using the warranty_check tool
        - Search the knowledge base for technical questions
        - Track orders using the order_tracking tool when customers ask
        - If you don't know something, say so - don't make up information
        """,
    )
    
    # ==========================================================================
    # STEP 7: Process the Request
    # ==========================================================================
    # Run the agent synchronously and get the response
    # The agent will:
    # 1. Analyze the user's prompt
    # 2. Decide which tools (if any) to use
    # 3. Call those tools via the Gateway
    # 4. Retrieve relevant memories from the session
    # 5. Generate a final response
    result = agent.run_sync(user_prompt)
    
    # ==========================================================================
    # STEP 8: Return the Response
    # ==========================================================================
    # Extract the text response from the result
    # The result object contains the agent's final answer
    return result.text

#### Step 2f: Create the Runtime File - Part 6 (Runtime Execution)

**📖 What's in this cell?**

This final cell completes the runtime file:
- **app.run()** (LINE 4): Starts the HTTP server to listen for requests
- This is what makes the agent continuously available in production
- When deployed to AgentCore Runtime, this runs inside a container

In [None]:
%%writefile -a ./lab_helpers/lab4_runtime.py

# ==============================================================================
# AGENTCORE RUNTIME - LINE 4: Start the Runtime Server
# ==============================================================================
# This starts the HTTP server and begins listening for requests
# In production, this runs continuously in a container
# During local testing, you can run this file directly
if __name__ == "__main__":
    app.run()  #### AGENTCORE RUNTIME - LINE 4 ####

#### ✅ Step 2 Complete: Runtime File Created

**What we just built:**

The runtime file (`lab4_runtime.py`) is now complete with all 4 critical runtime lines:
1. ✅ **LINE 1**: Imported `BedrockAgentCoreApp`
2. ✅ **LINE 2**: Initialized `app = BedrockAgentCoreApp()`
3. ✅ **LINE 3**: Decorated function with `@app.entrypoint`
4. ✅ **LINE 4**: Added `app.run()` to start the server

**File Structure:**
```
lab4_runtime.py
├─ Imports and Configuration
├─ Entrypoint Function (@app.entrypoint)
│  ├─ Authentication validation
│  ├─ Model setup with memory
│  ├─ Tools configuration via Gateway
│  ├─ Agent creation
│  └─ Request processing
└─ Runtime execution (app.run())
```

This file is now ready to be packaged into a Docker container and deployed to AgentCore Runtime!

---
## 📚 Theory: Docker Containerization

### Why Containers?

A container is like a lightweight, portable box that contains everything your agent needs to run:
- Your Python code
- All dependencies (libraries, packages)
- The Python runtime itself
- Configuration files

**Benefits:**
1. **Consistency**: Runs the same everywhere (your laptop, AWS, colleague's machine)
2. **Isolation**: Doesn't interfere with other applications
3. **Portability**: Easy to move between environments
4. **Reproducibility**: Same image always produces same results

### The Dockerfile: Building Instructions

A Dockerfile is a recipe for building your container image. Let's understand each part:

```dockerfile
# Start from a base image with Python pre-installed
FROM python:3.12-slim

# Set the working directory inside the container
WORKDIR /app

# Copy requirements file and install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy your application code
COPY . .

# Define the command to run when container starts
CMD ["python", "lab_helpers/lab4_runtime.py"]
```

### Container Registry (ECR)

Amazon Elastic Container Registry (ECR) is like a library for container images:
- Stores your Docker images securely in AWS
- AgentCore Runtime pulls images from here when deploying
- Supports versioning and access control

### Build Process

```
Code + Dockerfile → docker build → Container Image → docker push → ECR
```

1. **Build**: Docker reads the Dockerfile and creates an image
2. **Tag**: Image is labeled with a version/name
3. **Push**: Image is uploaded to ECR
4. **Deploy**: AgentCore Runtime pulls the image and runs it
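The tag and push steps both refer to the image by its full registry URI. As a small sketch, the helper below assembles that URI in the format ECR uses in standard commercial regions; `ecr_image_uri` is a hypothetical name, and the account ID, region, and repository shown are placeholders.

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build the fully qualified image URI used when tagging and pushing to ECR."""
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account IDs are 12-digit numbers")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Placeholder values for illustration only.
print(ecr_image_uri("123456789012", "us-east-1", "customer-support-agent", "v1"))
```

Using an explicit version tag instead of `latest` makes it easier to tell which image a given deployment is running.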

### Multi-Architecture Builds

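The `--platform` flag sets the CPU architecture the image is built for, so the image matches the hosts that run it even when you build on a different type of machine. AgentCore Runtime runs ARM64 containers, so build with `--platform linux/arm64` (this matters most when building on an x86-64 machine; Apple M1/M2 computers are already ARM64).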

---

### Step 3: Deploy Agent to AgentCore Runtime

#### 📖 What Happens in This Step?

This step moves the agent into production: we'll package our agent code into a Docker container and deploy it to AgentCore Runtime. The process involves:

1. **Building** a Docker image with our agent code
2. **Pushing** the image to Amazon ECR (container registry)
3. **Creating** an AgentCore Runtime resource that runs our container

The helper function `deploy_to_agentcore_runtime()` automates this entire process:

**What It Does:**
- Creates an ECR repository to store your container image
- Builds a Docker image from your code using the Dockerfile
- Authenticates with ECR and pushes the image
- Creates an IAM execution role with proper permissions
- Provisions an AgentCore Runtime resource
- Configures memory integration and logging
- Waits for the deployment to complete

**Expected Duration:** 3-5 minutes

This deployment makes your agent accessible via AWS APIs with automatic scaling, monitoring, and production-grade reliability.
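The final "wait for the deployment to complete" step is typically a polling loop. The sketch below shows one way to write it; `wait_until_ready` is a hypothetical helper, and the status strings (`CREATING`, `READY`, `*_FAILED`) are assumptions chosen for illustration rather than values confirmed by this lab.

```python
import time
from typing import Callable


def wait_until_ready(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,
    poll_interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it reports READY, fails, or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == "READY":
            return status
        if status.endswith("FAILED"):
            raise RuntimeError(f"deployment ended in status {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status} after {timeout_s} seconds")
        sleep(poll_interval_s)


# Offline demo: a scripted status sequence stands in for a real API call.
statuses = iter(["CREATING", "CREATING", "READY"])
print(wait_until_ready(lambda: next(statuses), sleep=lambda _: None))
```

Injecting `get_status` and `sleep` keeps the loop independent of any particular AWS call and makes it testable without waiting.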

In [None]:
# Import the deployment helper function
from lab_helpers.lab4_deployment import deploy_to_agentcore_runtime

# Deploy the agent to AgentCore Runtime
# This function will:
# 1. Build a Docker container with your agent code
# 2. Push it to Amazon ECR
# 3. Create an AgentCore Runtime resource
# 4. Configure all necessary permissions
print("Starting deployment to AgentCore Runtime...")
print("This will take approximately 3-5 minutes.")
print("")

runtime_arn = deploy_to_agentcore_runtime()

print("")
print("✅ Deployment complete!")
print(f"Runtime ARN: {runtime_arn}")
print("")
print("Your agent is now running in production and ready to handle requests!")

---
## 📚 Theory: Authentication & Security

### Why Authentication Matters

In production, you need to ensure:
1. **Identity**: Know who is making the request
2. **Authorization**: Verify they have permission to use the agent
3. **Security**: Protect against unauthorized access
4. **Audit**: Track who did what and when

### JWT (JSON Web Tokens)

JWT is a secure way to transmit identity information. A JWT contains:

```
Header.Payload.Signature
```

**Header**: Metadata about the token type and algorithm
```json
{
  "alg": "RS256",
  "typ": "JWT"
}
```

**Payload**: Claims about the user
```json
{
  "sub": "user123",
  "name": "John Doe",
  "exp": 1735689600
}
```

**Signature**: Cryptographic proof of authenticity
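- Created by signing the header and payload with the issuer's private key (for RS256, as in the header above)
- Cannot be forged without that key; anyone holding the matching public key can verify it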
- Ensures the token hasn't been tampered with
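The three-part structure and the tamper check can be demonstrated with the standard library alone. Note the deliberate substitution: this sketch signs with HS256 (HMAC with a shared secret) because it needs no third-party packages, whereas the header above uses RS256, which relies on an RSA key pair. All function names and the demo secret are hypothetical.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    # JWTs strip base64 padding, so restore it before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(header: dict, payload: dict, secret: bytes) -> str:
    """Return 'header.payload.signature' with each part base64url-encoded."""
    signing_input = ".".join(
        b64url_encode(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url_encode(signature)}"


def verify_hs256(token: str, secret: bytes) -> dict:
    """Return the payload if the signature matches, else raise ValueError."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature)):
        raise ValueError("signature mismatch")
    return json.loads(b64url_decode(signing_input.split(".")[1]))


demo_secret = b"demo-secret-not-for-production"
token = sign_hs256({"alg": "HS256", "typ": "JWT"}, {"sub": "user123", "exp": 1735689600}, demo_secret)
print(token)
print(verify_hs256(token, demo_secret))
```

Changing even one character of the payload segment makes `verify_hs256` raise, which is the property that lets a service trust the claims it reads. In real code, prefer a maintained JWT library over hand-rolled verification.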

### Amazon Cognito

Cognito is AWS's managed authentication service:
- **User Pools**: Store and manage user accounts
- **Authentication**: Verify usernames and passwords
- **Token Issuance**: Create JWTs for authenticated users
- **Token Validation**: Verify JWTs are valid and not expired

### Authentication Flow

```
1. User logs in with username/password
   ↓
2. Cognito verifies credentials
   ↓
3. Cognito issues JWT tokens (ID token, Access token)
   ↓
4. User includes token in requests: Authorization: Bearer <token>
   ↓
5. AgentCore Runtime validates the token
   ↓
6. If valid, request is processed
```

### Token Types

**ID Token**: Contains user identity information
- Who the user is (name, email, etc.)
- Used for personalization

**Access Token**: Grants permission to access resources
- What the user can do
- Used for authorization

**Refresh Token**: Used to get new tokens
- Long-lived (days/weeks)
- Exchanges for fresh ID/Access tokens when they expire

### Security Best Practices

1. **Never share tokens**: Keep them secret like passwords
2. **Use HTTPS**: Always transmit tokens over encrypted connections
3. **Short expiry**: Tokens should expire quickly (minutes/hours)
4. **Validate thoroughly**: Check signature, expiration, and claims
5. **Rotate regularly**: Use refresh tokens to get new access tokens
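Practice 4 ("validate thoroughly") includes checking claims after the signature has been verified. The sketch below checks expiry and issuer; `check_claims` is a hypothetical helper, the issuer URL is a placeholder, and the 30-second leeway is an arbitrary allowance for clock skew.

```python
import time
from typing import Optional


def check_claims(
    claims: dict,
    expected_issuer: str,
    now: Optional[float] = None,
    leeway_s: int = 30,
) -> None:
    """Raise ValueError if already signature-verified claims are unacceptable."""
    now = time.time() if now is None else now
    if "exp" not in claims:
        raise ValueError("token has no exp claim")
    if now > claims["exp"] + leeway_s:
        raise ValueError("token has expired")
    if claims.get("iss") != expected_issuer:
        raise ValueError("unexpected issuer")


claims = {"sub": "user123", "iss": "https://issuer.example", "exp": 1_735_689_600}
# A fixed "now" keeps the demo deterministic; omit it to use the current time.
check_claims(claims, "https://issuer.example", now=1_735_689_000)
print("claims accepted")
```

Claim checks only mean something once the signature has been verified, since an unverified payload can contain anything.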

---

### Step 4: Invoke the Production Agent

#### 📖 Testing the Production Deployment

Now that our agent is running in AgentCore Runtime, we can test it! This section demonstrates:

1. **Authentication**: Getting a JWT token from Cognito
2. **Session Management**: Creating unique sessions for different users
3. **Agent Invocation**: Sending requests to the production agent
4. **Context Preservation**: Verifying memory works across messages

#### Understanding the Test Flow

**Step A: Get Authentication Token**
- Retrieve a JWT token from Cognito (simulating user login)
- This token proves the user's identity

**Step B: Create Session ID**
- Generate a unique identifier for this conversation
- Different sessions = different users/conversations

**Step C: Invoke Agent**
- Send user input along with token and session ID
- Agent processes request using its tools and memory
- Response is returned

**Step D: Test Context**
- Send follow-up questions in the same session
- Agent should remember previous conversation

Let's see it in action!

#### Step 4a: Import Required Components and Get Authentication

In [None]:
# Import necessary libraries for testing
import uuid
from lab_helpers.lab3_cognito import get_cognito_bearer_token
from lab_helpers.lab4_client import AgentCoreRuntime

# Get an authentication token from Cognito
# This simulates a user logging in and receiving credentials
print("Authenticating with Cognito...")
access_token = get_cognito_bearer_token()
print("✅ Authentication successful")
print(f"Token type: {access_token['token_type']}")
print(f"Token: {access_token['bearer_token'][:50]}...")
print("")

---
## 📚 Theory: Session Management

### What is a Session?

A session represents a single conversation between a user and the agent. Think of it like a phone call - each call is separate, even if it's the same person calling.

**Session Characteristics:**
- **Unique ID**: Each session has a unique identifier (UUID)
- **Isolated Context**: Messages in one session don't affect another
- **Memory Scope**: Agent remembers previous messages within the same session
- **User-Specific**: Different users have different sessions

### Why Session IDs?

Imagine a customer support agent handling multiple customers:
```
Session 1: Customer A asking about iPhone warranty
Session 2: Customer B tracking an order
Session 3: Customer A (again) asking follow-up question
```

The agent needs to:
- Keep Customer A's conversations separate from Customer B
- Remember context when Customer A returns (Session 3)
- Not mix up information between customers

### UUID (Universally Unique Identifier)

We use UUIDs as session IDs because:
- **Practically Unique**: Collisions between random UUIDs are so unlikely that they can be ignored
- **Random**: Version 4 UUIDs are built from random bits, so they are hard to guess or predict
- **No Coordination Needed**: Can generate IDs independently

Example UUID: `550e8400-e29b-41d4-a716-446655440000`
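
As a quick illustration of these properties, the sketch below uses Python's standard `uuid` module. The variable names are illustrative only.

```python
import uuid

demo_id = uuid.uuid4()            # random (version 4) UUID
demo_id_str = str(demo_id)        # canonical text form: 32 hex digits plus 4 hyphens

print(demo_id_str)
print(len(demo_id_str))           # length of the canonical form
print(demo_id.version)            # version number of the generated UUID

# Parsing the example value from the text confirms it is a well-formed UUID.
example = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")
print(example.version)

# IDs generated independently need no coordination to stay distinct in practice.
other_id = uuid.uuid4()
print(demo_id != other_id)
```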

### Session Lifecycle

```
1. User starts conversation
   → Generate new session ID

2. User sends messages
   → Use same session ID for all messages
   → Agent remembers previous context

3. Conversation ends
   → Session remains in memory for some time
   → Can resume with same session ID

4. Eventually expires
   → Memory is cleared after inactivity period
   → New conversation needs new session ID
```
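
The lifecycle above can be sketched as a toy in-memory store. This is an illustration of the idea only, not how AgentCore Runtime or AgentCore Memory is implemented: `SessionStore`, its methods, and the 900-second inactivity period are all assumptions made for the example. The clock is injectable so that expiry can be demonstrated without waiting.

```python
import time
import uuid
from typing import Callable, Dict, List, Optional, Tuple


class SessionStore:
    """Toy store: per-session message history that expires after inactivity."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        # session_id -> (last_activity_timestamp, messages)
        self._sessions: Dict[str, Tuple[float, List[str]]] = {}

    def start(self) -> str:
        """Step 1: a new conversation gets a new session ID."""
        session_id = str(uuid.uuid4())
        self._sessions[session_id] = (self._clock(), [])
        return session_id

    def add_message(self, session_id: str, message: str) -> None:
        """Step 2: messages with the same ID share one history."""
        history = self.history(session_id)
        if history is None:
            raise KeyError("session expired or unknown; start a new one")
        history.append(message)
        self._sessions[session_id] = (self._clock(), history)

    def history(self, session_id: str) -> Optional[List[str]]:
        """Steps 3 and 4: resumable until the inactivity period has passed."""
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        last_seen, messages = entry
        if self._clock() - last_seen > self._ttl:
            del self._sessions[session_id]
            return None
        return messages


# Fake clock (seconds) so the example is deterministic.
now = [0.0]
store = SessionStore(ttl_seconds=900, clock=lambda: now[0])

sid = store.start()
store.add_message(sid, "Check my iPhone warranty")
now[0] = 600.0
store.add_message(sid, "What does it cover?")   # within the inactivity period
resumed = store.history(sid)                     # both messages are still there
now[0] = 600.0 + 901.0
expired = store.history(sid)                     # inactivity period exceeded
print(resumed, expired)
```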

### Memory Integration

AgentCore Memory uses session IDs to:
- Store conversation history per session
- Retrieve relevant context when processing new messages
- Maintain user preferences and state
- Enable multi-turn conversations

---

#### Step 4b: First User - Initial Query

In [None]:
# Create a client to interact with our deployed agent
agentcore_runtime = AgentCoreRuntime()

# Generate a unique session ID for this user's conversation
# This ensures their messages are kept separate from other users
session_id = uuid.uuid4()
print(f"Created session ID: {session_id}")
print("")

# First query: User asks about their iPhone warranty
user_query = "I have an iPhone 13 Pro with serial number ABC123456789. Can you check my warranty status?"

print(f"User Question: {user_query}")
print("Invoking agent...")
print("")

# Send the request to our production agent
# The agent will:
# 1. Validate the JWT token
# 2. Use the warranty_check tool via AgentCore Gateway
# 3. Store this interaction in memory
# 4. Generate a response
response = agentcore_runtime.invoke(
    {"prompt": user_query},
    bearer_token=access_token["bearer_token"],
    session_id=str(session_id),
)

print("Agent Response:")
print(response["response"])
print("")
print("✅ First query successful")

#### Step 4c: Same User - Follow-up Query (Testing Memory)

Now let's test if the agent remembers the context from the previous message. We'll ask a follow-up question WITHOUT mentioning the iPhone again. The agent should recall it from memory.

In [None]:
# Follow-up question in the same session
# Notice: We don't mention "iPhone" or the serial number
# The agent should remember this from the previous message
user_query = "What specific coverage does my warranty include?"

print(f"User Question: {user_query}")
print("Invoking agent with same session ID...")
print("")

# Same session_id as before - this is key!
# AgentCore Memory will retrieve the previous conversation context
response = agentcore_runtime.invoke(
    {"prompt": user_query},
    bearer_token=access_token["bearer_token"],
    session_id=str(session_id),  # Same session = remembered context
)

print("Agent Response:")
print(response["response"])
print("")
print("✅ Memory test successful - agent remembered the iPhone context!")

#### Step 4d: Same User - Technical Question (Testing Knowledge Base)

Let's test the knowledge base search tool by asking a technical question.

In [None]:
# Technical question that requires knowledge base search
user_query = "Tell me detailed information about the technical documentation on installing a new CPU"

print(f"User Question: {user_query}")
print("Invoking agent (will use knowledge base tool)...")
print("")

response = agentcore_runtime.invoke(
    {"prompt": user_query},
    bearer_token=access_token["bearer_token"],
    session_id=str(session_id),
)

print("Agent Response:")
print(response["response"])
print("")
print("✅ Knowledge base integration working")

#### Step 4e: Different User - Testing Session Isolation

Now let's create a completely different session (simulating a different user). The agent should NOT remember anything from the previous conversation.

In [None]:
# Create a NEW session ID for a different user/conversation
session_id2 = uuid.uuid4()
print(f"Created NEW session ID: {session_id2}")
print("This simulates a completely different customer.")
print("")

# This user asks about a Gaming Console
user_query = "I have a Gaming Console Pro device, I want to check my warranty status, warranty serial number is MNO33333333."

print(f"User Question: {user_query}")
print("Invoking agent with NEW session ID...")
print("")

# Different session_id = completely separate conversation
# Agent will NOT know about the iPhone from session_id (first user)
response = agentcore_runtime.invoke(
    {"prompt": user_query},
    bearer_token=access_token["bearer_token"],
    session_id=str(session_id2),  # NEW session = no shared context
)

print("Agent Response:")
print(response["response"])
print("")
print("✅ Session isolation working - this user knows nothing about the iPhone!")
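
The success message in the cell above is printed regardless of what the agent replied, so isolation is confirmed by reading the response. One way to make that check explicit is sketched below. `find_leaked_terms` is a hypothetical helper, and the two reply strings are made-up stand-ins for `response["response"]`.

```python
from typing import Iterable, List


def find_leaked_terms(response_text: str, forbidden_terms: Iterable[str]) -> List[str]:
    """Return the forbidden terms that appear in the response (case-insensitive)."""
    lowered = response_text.lower()
    return [term for term in forbidden_terms if term.lower() in lowered]


# Details that only the first session should know about.
first_session_terms = ["iPhone", "ABC123456789"]

# Hypothetical replies used only to exercise the helper.
isolated_reply = "Your Gaming Console Pro (serial MNO33333333) is under warranty."
leaky_reply = "Like your iPhone 13 Pro, the console is under warranty."

print(find_leaked_terms(isolated_reply, first_session_terms))  # []
print(find_leaked_terms(leaky_reply, first_session_terms))     # ['iPhone']
```

A substring check like this only catches literal mentions, so it complements reading the response rather than replacing it.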

#### 📊 Test Summary

We've successfully verified that our production agent:

**✅ Authentication Works**
- Validates JWT tokens from Cognito
- Is configured to reject requests without a valid token (not exercised in the tests above)

**✅ Memory Works**
- Remembers context within the same session
- Answers follow-up questions without re-explaining

**✅ Tools Work**
- Warranty check via AgentCore Gateway
- Knowledge base search for technical questions

**✅ Session Isolation Works**
- Different sessions are completely separate
- No information leakage between users

**And it all runs in production with:**
- No infrastructure management needed
- Automatic scaling for multiple users
- Built-in monitoring and logging
- Secure, enterprise-grade deployment

---
## 📚 Theory: Observability & Monitoring

### What is Observability?

Observability is the ability to understand what's happening inside your system by examining its outputs. It's like having X-ray vision into your agent's behavior.

**The Three Pillars of Observability:**

1. **Logs**: Text records of events that happened
   - "User asked about warranty at 10:15 AM"
   - "Tool execution started at 10:15:02 AM"
   - "Response generated at 10:15:05 AM"

2. **Metrics**: Numerical measurements over time
   - Average response time: 3.2 seconds
   - Success rate: 98.5%
   - Requests per minute: 45

3. **Traces**: End-to-end journey of a single request
   - Request arrives → Token validated → Agent invoked → Tool called → Response sent
   - Shows timing and dependencies between steps

### Why Observability Matters in Production

**Without Observability:**
- "The agent is slow" - But where is the bottleneck?
- "Some requests are failing" - Which ones and why?
- "Users are unhappy" - What's causing the issues?

**With Observability:**
- "The knowledge base search takes 5 seconds" - Optimize that tool
- "Requests fail when warranty tool times out" - Add timeout handling
- "90% of slow requests happen during peak hours" - Scale up capacity

### AgentCore Observability

AgentCore automatically instruments your agent code to capture:

**Request-Level Tracing:**
```
Request ID: 123-456-789
├─ Received user input (0ms)
├─ Validated token (50ms)
├─ Retrieved memory (200ms)
├─ Executed warranty_check tool (1500ms)
├─ Generated response (800ms)
└─ Returned to user (2550ms total)
```

**Performance Metrics:**
- P50 latency (median): 2.1s
- P95 latency (95th percentile): 4.5s
- P99 latency (99th percentile, near worst case): 8.2s
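
To make the percentile figures concrete, here is a minimal sketch that computes them from raw latency samples using the nearest-rank method. The `percentile` function and the sample values are illustrative assumptions. CloudWatch computes its own percentile statistics and may use a different estimation method.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)   # 1-based rank
    return ordered[rank - 1]


# Made-up request latencies in seconds.
latencies_s = [1.8, 2.0, 2.1, 2.1, 2.3, 2.6, 3.0, 3.9, 4.5, 8.2]

for p in (50, 95, 99):
    print(f"P{p}: {percentile(latencies_s, p)}s")
```

With only a handful of samples, the high percentiles collapse onto the slowest request, which is why P95 and P99 are most informative over larger windows of traffic.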

**Error Tracking:**
- Authentication failures: 2%
- Tool timeout errors: 0.5%
- Memory retrieval errors: 0.1%

### CloudWatch GenAI Observability

AWS CloudWatch provides a specialized dashboard for AI agents:

**Agents View:**
- See all deployed agents
- High-level health metrics
- Quick access to details

**Sessions View:**
- List of all conversations
- Filter by user, time, or status
- See session duration and message count

**Traces View:**
- Detailed request timeline
- Tool invocation sequences
- Performance bottlenecks
- Error stack traces

### Reading a Trace

Example trace breakdown:
```
Span 1: HTTP Request (2550ms total)
  Span 2: Authentication (50ms)
  Span 3: Agent Invocation (2500ms)
    Span 4: Memory Retrieval (200ms)
    Span 5: Tool Execution (1500ms)
      Span 6: Gateway Request (1400ms)
      Span 7: Tool Processing (100ms)
    Span 8: Response Generation (800ms)
```

This shows:
- Most time spent in tool execution (1500ms)
- The Gateway request accounts for most of that time (1400ms)
- Opportunity to optimize tool performance
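
The same reading can be done programmatically. The sketch below models the example trace as a tree and finds the slowest leaf operation. The `Span` class is a hypothetical structure for illustration, not the OpenTelemetry or CloudWatch data model, and it makes the simplifying assumption that a parent span's duration is the sum of its children.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Span:
    name: str
    own_ms: int = 0                                   # measured duration (leaf spans only)
    children: List["Span"] = field(default_factory=list)

    def total_ms(self) -> int:
        """Leaf: its own duration. Parent: sum of its children (simplifying assumption)."""
        if not self.children:
            return self.own_ms
        return sum(child.total_ms() for child in self.children)

    def leaves(self) -> Iterator["Span"]:
        """Yield the spans that have no children, depth first."""
        if not self.children:
            yield self
        for child in self.children:
            yield from child.leaves()


# Leaf durations taken from the example trace above.
trace = Span("HTTP Request", children=[
    Span("Authentication", 50),
    Span("Agent Invocation", children=[
        Span("Memory Retrieval", 200),
        Span("Tool Execution", children=[
            Span("Gateway Request", 1400),
            Span("Tool Processing", 100),
        ]),
        Span("Response Generation", 800),
    ]),
])

slowest = max(trace.leaves(), key=lambda span: span.own_ms)
print(f"Total: {trace.total_ms()}ms")
print(f"Slowest leaf: {slowest.name} ({slowest.own_ms}ms)")
```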

### Using Observability for Optimization

**Identify Slow Operations:**
1. Look at P95/P99 latencies
2. Find spans taking longest
3. Optimize those components

**Debug Errors:**
1. Filter traces by error status
2. Examine error messages and stack traces
3. Identify patterns (same tool? same time?)
4. Fix root cause
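
The error-debugging steps above can be sketched over exported trace summaries. The record layout is an assumption made for this example, and `knowledge_base_search` is a hypothetical tool name. Real records would come from your tracing backend in whatever shape it provides.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical trace summaries.
traces: List[Dict[str, str]] = [
    {"trace_id": "t1", "status": "ok", "tool": "warranty_check", "error": ""},
    {"trace_id": "t2", "status": "error", "tool": "warranty_check", "error": "timeout"},
    {"trace_id": "t3", "status": "error", "tool": "warranty_check", "error": "timeout"},
    {"trace_id": "t4", "status": "error", "tool": "knowledge_base_search", "error": "throttled"},
    {"trace_id": "t5", "status": "ok", "tool": "knowledge_base_search", "error": ""},
]

# Step 1: keep only failed traces.
failed = [t for t in traces if t["status"] == "error"]

# Steps 2 and 3: group by (tool, error message) to surface repeating patterns.
errors_by_tool = Counter((t["tool"], t["error"]) for t in failed)

for (tool, error), count in errors_by_tool.most_common():
    print(f"{tool}: {error} x{count}")
```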

**Monitor User Experience:**
1. Track session success rate
2. Measure time to first response
3. Monitor message round-trip time
4. Set alerts for degradation

### Best Practices

1. **Set up alerts**: Get notified when error rate spikes or latency increases
2. **Review regularly**: Check dashboards weekly to spot trends
3. **Investigate anomalies**: When metrics deviate, dig into traces
4. **Correlate with changes**: Did a deployment cause the issue?
5. **Document learnings**: Build a knowledge base of common issues

---

### Step 5: AgentCore Observability

[AgentCore Observability](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/observability.html) provides monitoring and tracing capabilities for AI agents using Amazon OpenTelemetry Python Instrumentation and Amazon CloudWatch GenAI Observability.

#### 📖 How to View Observability Data

The default AgentCore Runtime configuration automatically logs your agent's traces to CloudWatch through **AgentCore Observability**. These traces provide complete visibility into your agent's behavior in production.

**To Access the Dashboard:**
1. Open the AWS Console
2. Navigate to **CloudWatch**
3. Select **GenAI Observability**
4. Choose **Bedrock AgentCore**

#### Agents Overview

The Agents view shows all your deployed agents with key metrics:
- **Agent Name**: Identifier for your agent
- **Request Count**: Total number of invocations
- **Success Rate**: Percentage of successful requests
- **Average Latency**: Mean response time
- **Error Rate**: Percentage of failed requests

![Agents Overview on CloudWatch](images/observability_agents.png)

**What to Look For:**
- Declining success rates (investigate errors)
- Increasing latencies (performance degradation)
- Unusual traffic patterns (potential issues)

#### Sessions View

The Sessions view lists all conversations across all agents:
- **Session ID**: Unique identifier for each conversation
- **User ID**: Who initiated the session
- **Duration**: How long the conversation lasted
- **Message Count**: Number of exchanges
- **Status**: Completed, active, or errored

![Sessions](images/sessions_lab5_observability.png)

**Use Cases:**
- Find specific user conversations
- Identify long-running sessions
- Analyze conversation patterns
- Track user engagement

#### Traces View

The Traces view provides detailed, request-level insights:

**Available Actions:**
- **Filter traces**: Search by time, status, session, or user
- **Sort by column**: Organize by latency, timestamp, or status
- **Logs Insights**: Deep-dive into log data with queries
- **Export traces**: Download data for external analysis

![Traces](images/traces_lab4_observability.png)

**Key Information in Traces:**
- **Timeline**: Visual representation of request flow
- **Spans**: Individual operations (auth, tool calls, memory access)
- **Durations**: Time spent in each operation
- **Metadata**: Request/response payloads, headers, errors

**Practical Examples:**

**Example 1: Debugging Slow Requests**
1. Filter traces by latency > 5 seconds
2. Click on a slow trace
3. Examine the span timeline
4. Identify the bottleneck (e.g., slow tool execution)
5. Optimize that specific component

**Example 2: Investigating Errors**
1. Filter traces by error status
2. Look for common patterns
3. Check error messages and stack traces
4. Correlate with deployments or config changes
5. Implement fix and verify in new traces

**Example 3: Analyzing Tool Usage**
1. Filter traces containing specific tool names
2. Calculate average tool execution time
3. Identify most frequently used tools
4. Optimize popular tools for better performance
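
For Example 3, a minimal sketch of the aggregation is shown below. It assumes tool spans have already been exported as `(tool_name, duration_ms)` pairs. The data and the `knowledge_base_search` name are made up for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical tool spans extracted from traces: (tool name, duration in ms).
tool_spans: List[Tuple[str, int]] = [
    ("warranty_check", 1500),
    ("warranty_check", 1300),
    ("knowledge_base_search", 5000),
    ("warranty_check", 1400),
    ("knowledge_base_search", 4000),
]

# Collect durations per tool.
durations: Dict[str, List[int]] = defaultdict(list)
for tool, duration_ms in tool_spans:
    durations[tool].append(duration_ms)

# Call count and average execution time per tool.
usage = {
    tool: {"calls": len(values), "avg_ms": mean(values)}
    for tool, values in durations.items()
}

# Most frequently used tools first.
ranked = sorted(usage, key=lambda tool: usage[tool]["calls"], reverse=True)
for tool in ranked:
    print(tool, usage[tool])
```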

#### Setting Up Alerts (Recommended)

Configure CloudWatch alarms to get notified of issues:

**High Error Rate Alert:**
- Metric: Error percentage > 5%
- Action: Send SNS notification to team

**High Latency Alert:**
- Metric: P95 latency > 10 seconds
- Action: Trigger auto-scaling or investigation

**Low Traffic Alert:**
- Metric: Request count drops by 50%
- Action: Check for service issues
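
The three alert conditions can be expressed as simple threshold checks. The sketch below evaluates them locally over two windows of aggregated statistics. It is not CloudWatch alarm configuration: `WindowStats` and `evaluate_alerts` are hypothetical names, and the thresholds are the ones listed above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WindowStats:
    requests: int          # total requests in the window
    errors: int            # failed requests in the window
    p95_latency_s: float   # 95th percentile latency in seconds


def evaluate_alerts(current: WindowStats, previous: WindowStats) -> List[str]:
    """Return the names of the alert conditions that the current window trips."""
    alerts: List[str] = []
    # High error rate: more than 5% of requests failed.
    if current.requests and current.errors / current.requests > 0.05:
        alerts.append("high_error_rate")
    # High latency: P95 above 10 seconds.
    if current.p95_latency_s > 10:
        alerts.append("high_latency")
    # Low traffic: request count dropped by 50% or more versus the previous window.
    if previous.requests and current.requests <= 0.5 * previous.requests:
        alerts.append("low_traffic")
    return alerts


baseline = WindowStats(requests=100, errors=1, p95_latency_s=4.5)
degraded = WindowStats(requests=100, errors=6, p95_latency_s=11.0)
quiet = WindowStats(requests=40, errors=0, p95_latency_s=2.0)

print(evaluate_alerts(degraded, baseline))
print(evaluate_alerts(quiet, baseline))
```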

---

**💡 Pro Tip:** Regularly review your traces after deployments to catch performance regressions early. Set up a dashboard with your most important metrics for at-a-glance monitoring.

---
## 📚 Summary: What You've Learned

### Key Concepts Covered

**1. Production Deployment**
- Transformed local code into production-ready service
- Understood serverless architecture benefits
- Learned about auto-scaling and reliability

**2. Containerization**
- Packaged agent code into Docker containers
- Pushed images to Amazon ECR
- Understood container benefits (consistency, portability)

**3. Authentication & Security**
- Implemented JWT-based authentication
- Integrated with Amazon Cognito
- Secured agent access with bearer tokens

**4. Session Management**
- Created unique session IDs for conversations
- Maintained context within sessions
- Ensured isolation between different users

**5. Observability**
- Enabled comprehensive monitoring with CloudWatch
- Learned to read and interpret traces
- Understood metrics for performance optimization

### Technical Skills Gained

**Code Transformation:**
- Added only 4 lines to make agent runtime-ready
- Learned entrypoint pattern
- Understood payload and context handling

**AWS Services:**
- AgentCore Runtime for serverless agent hosting
- ECR for container image storage
- CloudWatch for monitoring and logging
- Cognito for user authentication

**Best Practices:**
- Minimal code changes for maximum impact
- End-to-end authentication flow
- Proper session isolation
- Comprehensive error handling

### Architecture Evolution

- **Lab 1**: Local agent prototype
- **Lab 2**: Added memory for context
- **Lab 3**: Centralized tools via Gateway
- **Lab 4**: Full production deployment with observability

Your agent now has:
- ✅ Serverless auto-scaling
- ✅ Enterprise security
- ✅ Session management
- ✅ Memory persistence
- ✅ Centralized tools
- ✅ Comprehensive monitoring

---

### Congratulations! 🎉

You have successfully completed **Lab 4: Deploy to Production - Use AgentCore Runtime with Observability!**

Here is what you accomplished:

##### Production-Ready Deployment:

- ✅ Prepared your agent for production with minimal code changes (only 4 lines added)
- ✅ Validated proper session isolation between different customers
- ✅ Confirmed session continuity, with memory persistence and context awareness within each session
- ✅ Containerized your agent using Docker
- ✅ Deployed to AgentCore Runtime with automatic scaling

##### Enterprise-Grade Security & Identity:

- ✅ Implemented secure authentication using Cognito integration with JWT tokens
- ✅ Configured proper IAM roles and execution permissions for production workloads
- ✅ Established identity-based access control for secure agent invocation
- ✅ Tested end-to-end authentication flow

##### Comprehensive Observability:

- ✅ Enabled AgentCore Observability for full request tracing across all customer sessions
- ✅ Configured CloudWatch GenAI Observability dashboard monitoring
- ✅ Learned to interpret traces, metrics, and logs
- ✅ Understood performance optimization techniques

##### Current Limitations (We'll fix these next!):

- **Developer-Focused Interaction** - The agent is accessible via SDK/API calls but has no user-friendly web interface
- **Manual Session Management** - Sessions must be created programmatically rather than through an intuitive user experience

##### Next Up: [Lab 5: Build User Interface →](lab-05-frontend.ipynb)

In Lab 5, you'll complete the customer experience by building a user-friendly web interface that:
- Provides an intuitive chat interface for end-users
- Handles authentication and session management automatically
- Displays real-time agent responses
- Creates a professional, production-ready application

**Let's go build that interface! 🚀**
