## Lab 4: Deploy to Production - Use AgentCore Runtime with Observability and a Frontend Application

### Overview

In Lab 3 we designed robust architectures with Retrieval Augmented Generation (RAG), agentic AI using Strands, and security with Guardrails. In this lab, we deploy an agent to production using AgentCore Runtime with secure authentication, memory, and comprehensive observability. This transforms our prototype into a production-ready system that can handle real-world traffic with full monitoring and automatic scaling.

[Amazon Bedrock AgentCore Runtime](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/agents-tools-runtime.html) is a secure, fully managed runtime that empowers organizations to deploy and scale AI agents in production, regardless of framework, protocol, or model choice. It provides enterprise-grade reliability, automatic scaling, and comprehensive monitoring capabilities.

**Workshop Journey:**

- **Lab 1 (Done):** Getting Started with Bedrock APIs - Learned basic API calls, streaming responses, and conversational interactions using Claude Haiku 4.5
- **Lab 2 (Done):** Prompt Engineering - Mastered system prompts, structured output, zero-shot/few-shot prompting, and chain-of-thought reasoning
- **Lab 3 (Done):** Enterprise Bedrock Architectures with Claude Models - Built RAG with Knowledge Base, implemented Agentic AI using Strands framework, and configured security with Guardrails
- **Lab 4 (Current):** Production Deployment with Monitoring & Frontend - Deploy agents to production using AgentCore Runtime with comprehensive observability, secure authentication with Cognito, and create a customer-facing Streamlit frontend application
- **Lab 5:** Cost Optimization - Implement dynamic model selection with prompt routing, optimize prompts and outputs, leverage caching strategies, and use batch processing for non-real-time scenarios

### Why AgentCore Runtime & Production Deployment Matter

**Current state:** The agent runs locally with centralized tools but faces production challenges:

- No comprehensive monitoring or debugging capabilities
- No automatic scaling in or out as traffic increases or decreases

After this lab, we will have a production-ready agent infrastructure with:

- Serverless auto-scaling to handle variable demand
- Comprehensive observability with traces, metrics, and logging
- Enterprise reliability with automatic error recovery
- Secure deployment with proper access controls
- Easy management through the AWS console and APIs, with support for real-world production workloads


### Adding comprehensive observability with AgentCore Observability

AgentCore Runtime integrates with [AgentCore Observability](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/observability.html) to provide full visibility into your agent's behavior in production. AgentCore Observability automatically captures traces, metrics, and logs from your agent interactions, tool usage, and memory access patterns. In this lab we will see how AgentCore Runtime integrates with CloudWatch GenAI Observability to provide comprehensive monitoring and debugging capabilities.

For request tracing, AgentCore Observability captures the complete conversation flow including tool invocations, memory retrievals, and model interactions. For performance monitoring, it tracks response times, success rates, and resource utilization to help optimize your agent's performance.

During the observability flow, AgentCore Runtime automatically instruments your agent code and sends telemetry data to CloudWatch. You can then use CloudWatch dashboards and GenAI Observability features to analyze patterns, identify bottlenecks, and troubleshoot issues in real time.

### Architecture for Lab 4
<div style="text-align:left"> 
    <img src="images/lab4_architecture_runtime.png" width="75%"/> 
</div>

*The agent now runs in AgentCore Runtime with full observability through CloudWatch, serving production traffic with auto-scaling and comprehensive monitoring. AgentCore Memory enables the AI agent to maintain context over time, remember important facts, and deliver consistent, personalized experiences.*

### Key Features

- **Serverless Agent Deployment:** Transform your local agent into a scalable production service using AgentCore Runtime with minimal code changes
- **Comprehensive Observability:** Full request tracing, performance metrics, and debugging capabilities through CloudWatch GenAI Observability

### Prerequisites

- Python 3.12+
- AWS account with appropriate permissions
- Docker, Finch or Podman installed and running
- Amazon Bedrock AgentCore SDK
- Strands Agents framework

### Step 1: Install Dependencies and Import Required Libraries

In [None]:
# Install required packages
%pip install -U -r requirements.txt -q

In [None]:
# Import required libraries
import os
import json
import boto3
from strands import Agent
from strands.models import BedrockModel
from utils import get_param_value

session = boto3.Session()

sts = session.client('sts')
identity = sts.get_caller_identity()
account_id = identity['Account']
region = boto3.Session().region_name or 'us-west-2'

print(f"Account ID: {account_id}")
print(f"Region: {region}")

#### Step 1.5: Prepare Memory and Knowledge Base

**Note:** The following cell might take a few minutes to run. Please be patient.  

While this is running, click the **Chat icon on the left bar**. This opens **Amazon Q**, where you can ask questions about any AWS service, request code snippets, and more.  

While we wait for the cell to finish, you can ask questions like:  
- *What is Bedrock AgentCore Memory?*  
- *Why is it needed?*  
- *What are the various types of Memory?*

In [None]:
from bedrock_agentcore_starter_toolkit.operations.memory.manager import MemoryManager
from bedrock_agentcore.memory import MemoryClient
from bedrock_agentcore.memory.constants import StrategyType

MEMORY_NAME="ReturnRefundAssisantMemory"

memory_manager = MemoryManager(region_name=region)

# Create or get memory with strategies
memory = memory_manager.get_or_create_memory(
    name=MEMORY_NAME,
    description="Memory for returns and refunds assistant",
    strategies=[
                {
                    StrategyType.USER_PREFERENCE.value: {
                        "name": "CustomerPreferences",
                        "description": "Captures customer preferences and behavior",
                        "namespaces": ["returns/customer/{actorId}/preferences"],
                    }
                },
                {
                    StrategyType.SEMANTIC.value: {
                        "name": "CustomerSupportSemantic",
                        "description": "Stores facts from conversations",
                        "namespaces": ["returns/customer/{actorId}/semantic"],
                    }
                },
            ]
)

memory_id = memory["id"]
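
The `{actorId}` placeholder in the strategy namespaces above is filled in per customer, which is why the runtime code later builds its retrieval keys with a concrete actor ID. Here is a minimal sketch of that mapping, assuming plain string substitution. The helper `resolve_namespace` is hypothetical and only for illustration; it is not part of the AgentCore SDK.

```python
# Illustrative only: resolve_namespace is a hypothetical helper, not an SDK call.
NAMESPACE_TEMPLATES = {
    "preferences": "returns/customer/{actorId}/preferences",
    "semantic": "returns/customer/{actorId}/semantic",
}


def resolve_namespace(template: str, actor_id: str) -> str:
    """Substitute the {actorId} placeholder with a concrete actor ID."""
    return template.replace("{actorId}", actor_id)


# Resolve both namespaces for one example customer.
resolved = {
    kind: resolve_namespace(template, "customer_001")
    for kind, template in NAMESPACE_TEMPLATES.items()
}
print(resolved["preferences"])  # returns/customer/customer_001/preferences
```

Keeping the templates in one place makes it harder for the namespaces used at creation time and at retrieval time to drift apart.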

In [None]:
# Retrieve the knowledge base ID from AWS Systems Manager Parameter Store

kb_id = get_param_value("/app/workshop/kb/knowledge-base-id")
print(f"Knowledge Base ID: {kb_id}")

### Step 2: Preparing Your Agent for AgentCore Runtime

#### Creating the Runtime-Ready Agent

Let's first add the necessary AgentCore Runtime components, via the Python SDK, to our previous local agent implementation.

Look for the `#### AGENTCORE RUNTIME - LINE i ####` comments below to see where the relevant deployment code is added. You'll find 4 such lines that prepare the runtime-ready agent:

1. Import the Runtime App with `from bedrock_agentcore.runtime import BedrockAgentCoreApp`
2. Initialize the App with `app = BedrockAgentCoreApp()`
3. Decorate our invocation function with `@app.entrypoint`
4. Let AgentCore Runtime control the execution with `app.run()`

**Note:** The `%%writefile` magic command writes the code in this cell to a `.py` file. This is the agent code that we will push to AgentCore Runtime.

In [None]:
%%writefile ./lab4_runtime.py
import os
from bedrock_agentcore.runtime import (
    BedrockAgentCoreApp,
)  #### AGENTCORE RUNTIME - LINE 1 ####

from strands import Agent, tool
from strands.models import BedrockModel
from strands_tools import retrieve, current_time
from bedrock_agentcore.memory.integrations.strands.config import AgentCoreMemoryConfig, RetrievalConfig
from bedrock_agentcore.memory.integrations.strands.session_manager import AgentCoreMemorySessionManager
from utils import get_param_value
from utils.agent_memory import REGION, SESSION_ID, ACTOR_ID


MODEL_ID = "us.anthropic.claude-haiku-4-5-20251001-v1:0"

bedrock_model = BedrockModel(
    model_id=MODEL_ID,
    temperature=0.3,

)

# Initialize the AgentCore Runtime App
app = BedrockAgentCoreApp()  #### AGENTCORE RUNTIME - LINE 2 ####

kb_id = os.environ.get("KNOWLEDGE_BASE_ID", "NOT AVAILABLE")

system_prompt=f"""
You are an Amazon Returns & Refunds assistant.

Do not answer based on your own knowledge.  
You have access to relevant policies through the retrieve tool, which queries a knowledge base. (KNOWLEDGE_BASE_ID="{kb_id}")
You have access to memory regarding the customer's purchase history and preferences.
First retrieve and check the relevant policies for return & refund requests.
Use metadata filtering to get the policies for the country where the customer made the transaction.
[Metadata]
{{
    "country": <ISO2 Country Code> // e.g. "US", "UK", "IN"
}}

When a user asks about returns or refunds, use the content to give accurate advice.
Always be accurate, concise, and do not deviate from known policies.
"""
# Add Memory
# Get memory ID from environment variable



memory_id = os.environ.get("MEMORY_ID")
if not memory_id:
    raise Exception("Environment variable MEMORY_ID is required")


@app.entrypoint  #### AGENTCORE RUNTIME - LINE 3 ####
def invoke(payload, context=None):
    """AgentCore Runtime entrypoint function"""
    
    # session_id tracks the conversation context for the model, and actor_id identifies the user or agent 
    # interacting with the AgentCore runtime.

    session_id = context.session_id if context else SESSION_ID
    actor_id = payload.get("actor_id", ACTOR_ID) 

    # Configure memory
    agentcore_memory_config = AgentCoreMemoryConfig(
        memory_id=memory_id,
        session_id=session_id,
        actor_id=actor_id,
        retrieval_config={
            f"returns/customer/{actor_id}/semantic": RetrievalConfig(top_k=3, relevance_score=0.2),
            f"returns/customer/{actor_id}/preferences": RetrievalConfig(top_k=3, relevance_score=0.2)
        }
    )

    # Create session manager
    session_manager = AgentCoreMemorySessionManager(
        agentcore_memory_config=agentcore_memory_config,
        region_name=REGION
    )

    # Create the agent with all customer support tools and memory
    agent = Agent(
        model=bedrock_model,
        tools=[retrieve, current_time],
        system_prompt=system_prompt,
        session_manager = session_manager
    )

    user_input = payload.get("prompt", "")

    # Invoke the agent
    response = agent(user_input)
    return response.message["content"][0]["text"]


if __name__ == "__main__":
    app.run()  #### AGENTCORE RUNTIME - LINE 4 ####
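
The system prompt above asks the agent to filter policies by country. As a rough sketch of what such a filter can look like, the helper below builds a metadata filter in the `equals` shape used by Bedrock Knowledge Bases retrieval. The function names are hypothetical, and how the `retrieve` tool forwards the filter is an assumption to check against the tool's documentation.

```python
from typing import Any, Dict


def country_filter(country_code: str) -> Dict[str, Any]:
    """Build an equality filter on the `country` metadata key (hypothetical helper)."""
    code = country_code.strip().upper()
    if len(code) != 2 or not code.isalpha():
        raise ValueError(f"expected a two-letter country code, got {country_code!r}")
    return {"equals": {"key": "country", "value": code}}


def retrieval_configuration(country_code: str, number_of_results: int = 3) -> Dict[str, Any]:
    """Wrap the filter in a vector search configuration (shape assumed, verify before use)."""
    return {
        "vectorSearchConfiguration": {
            "numberOfResults": number_of_results,
            "filter": country_filter(country_code),
        }
    }


print(retrieval_configuration("in"))
```

Validating the country code up front keeps a malformed value from silently matching no documents.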


#### What happens behind the scenes?

When you use `BedrockAgentCoreApp`, it automatically:

- Creates an HTTP server that listens on port 8080
- Implements the required `/invocations` endpoint for processing requests
- Implements the `/ping` endpoint for health checks
- Handles proper content types and response formats
- Manages error handling according to AWS standards
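
To make the bullets above concrete, here is a framework-free sketch of the request routing they describe. It is not how `BedrockAgentCoreApp` is implemented; `handle_request`, `echo_agent`, the response bodies, and the status codes are illustrative assumptions.

```python
import json
from typing import Any, Callable, Dict, Tuple

Entrypoint = Callable[[Dict[str, Any]], Any]


def handle_request(method: str, path: str, body: bytes, entrypoint: Entrypoint) -> Tuple[int, str]:
    """Route one request to the health check or the agent entrypoint (simplified)."""
    if method == "GET" and path == "/ping":
        # Health check; the real response body may differ.
        return 200, json.dumps({"status": "healthy"})
    if method == "POST" and path == "/invocations":
        try:
            payload = json.loads(body or b"{}")
        except json.JSONDecodeError:
            return 400, json.dumps({"error": "request body must be valid JSON"})
        try:
            result = entrypoint(payload)
        except Exception as exc:  # surface entrypoint failures as a 500
            return 500, json.dumps({"error": str(exc)})
        return 200, json.dumps(result)
    return 404, json.dumps({"error": f"no route for {method} {path}"})


def echo_agent(payload: Dict[str, Any]) -> str:
    """Stand-in for the decorated invoke() function."""
    return f"You asked: {payload.get('prompt', '')}"


status, response = handle_request("POST", "/invocations", b'{"prompt": "hi"}', echo_agent)
print(status, response)
```

The point of the wrapper is that your own code only has to supply the entrypoint function; everything around it is handled for you.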


### Step 3: Deploying to AgentCore Runtime

Now let's deploy our agent to AgentCore Runtime using the [AgentCore Starter Toolkit](https://github.com/aws/bedrock-agentcore-starter-toolkit).

#### Configure the Secure Runtime Deployment (AgentCore Runtime + AgentCore Identity)

First, we use the starter toolkit to configure the AgentCore Runtime deployment with an entrypoint, the execution role we create, and a requirements file. We also configure identity authorization using an Amazon Cognito user pool, and we configure the starter toolkit to automatically create the Amazon ECR repository on launch.

During the configure step, a Dockerfile is generated based on your application code.

<div style="text-align:left"> 
    <img src="images/lab4_configure.png" width="75%"/> 
</div>

**Note**: The Cognito access_token is valid for 2 hours only. If the access_token expires you can vend another access_token by using the `reauthenticate_user` method.
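
Because the token expires, it can help to check its remaining lifetime before each call instead of waiting for a failed request. The sketch below reads the standard `exp` claim from a JWT without verifying the signature, so it is only suitable for deciding when to refresh, never for trusting a token. The helper names and the demo token are hypothetical.

```python
import base64
import json
import time
from typing import Optional


def jwt_expiry(token: str) -> int:
    """Return the `exp` claim (Unix seconds) from a JWT, without verifying it."""
    payload_segment = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload_segment += "=" * (-len(payload_segment) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_segment))
    return int(claims["exp"])


def needs_refresh(token: str, margin_seconds: int = 300, now: Optional[float] = None) -> bool:
    """True when the token expires within `margin_seconds` of `now`."""
    current = time.time() if now is None else now
    return jwt_expiry(token) - current <= margin_seconds


def make_demo_token(exp: int) -> str:
    """Build an unsigned, JWT-shaped string for local experiments only."""
    def segment(obj: dict) -> str:
        raw = json.dumps(obj).encode("utf-8")
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

    header = segment({"alg": "none"})
    payload = segment({"exp": exp})
    return f"{header}.{payload}.demo-signature"


demo_token = make_demo_token(int(time.time()) + 2 * 60 * 60)
print(needs_refresh(demo_token))  # False
```

In this lab, a `True` result would be the cue to call `reauthenticate_user` again before invoking the agent.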


In [None]:
# For those unfamiliar with Cognito: here we are creating or configuring a Cognito User Pool 
# so that users can authenticate securely and we can obtain a bearer token for API access.

from utils.identity_ssm_utils import setup_cognito_user_pool, reauthenticate_user

print("Setting up Amazon Cognito user pool...")
cognito_config = (
    setup_cognito_user_pool()
)  # You'll get your bearer token from this output cell.
print("Cognito setup completed ✓")

In [None]:
from bedrock_agentcore_starter_toolkit import Runtime

# Initialize the AgentCore runtime toolkit to interact with AgentCore services.
boto_session = boto3.session.Session()
region = boto_session.region_name or "us-west-2"  # same fallback as in Step 1

# Initialize an instance of the AgentCore Runtime, which allows deploying, configuring, and managing agents.
agentcore_runtime = Runtime()

# Create a new IAM role specifically for the AgentCore Runtime to use when executing the agent.
# This ensures the runtime has the necessary permissions for tasks like accessing S3, Bedrock models, and other AWS services.
from utils.identity_ssm_utils import create_agentcore_runtime_execution_role
agentcore_runtime_execution_role = create_agentcore_runtime_execution_role()

# Configure the AgentCore agent deployment.
# This sets up the agent's entrypoint script, runtime environment, execution role, memory mode, dependencies, 
# agent name, and authentication settings (here using Cognito JWT authorizer for secure client access).
response = agentcore_runtime.configure(
    entrypoint="lab4_runtime.py",
    auto_create_ecr=True,
    execution_role=agentcore_runtime_execution_role,
    auto_create_execution_role=False,
    memory_mode="NO_MEMORY",  # Memory was already created in a previous step, so we disable it here.
    requirements_file="requirements.txt",
    region=region,
    agent_name="returns_refunds_agent",
    authorizer_configuration={
        "customJWTAuthorizer": {
            "allowedClients": [cognito_config.get("client_id")],
            "discoveryUrl": cognito_config.get("discovery_url"),
        }
    },
)

print("Configuration completed:", response)
print(f"AllowedClients: {cognito_config.get('client_id')}")
print(f"discoveryUrl: {cognito_config.get('discovery_url')}")

# Summary of what we've done so far:
# 1. Initialized the AgentCore runtime toolkit.
# 2. Created an execution role for AgentCore.
# 3. Configured the agent deployment, linking it to an entrypoint script, runtime, and Cognito authorizer.
# 4. Memory is disabled because we already created a memory instance in a previous step; it does not need to be re-created.
# Next, we will deploy and start the agent so it can handle user queries securely and with the configured runtime environment.

In [None]:
!cat .bedrock_agentcore.yaml

#### Launch the Agent

Now let's launch our agent to AgentCore Runtime. This will create an AWS CodeBuild pipeline, the Amazon ECR repository and the AgentCore Runtime components.

<div style="text-align:left"> 
    <img src="images/lab4_launch.png" width="70%"/> 
</div>

In [None]:
# Launch the agent (this will build and deploy the container)
# This step can take a few minutes, depending on your environment.
# Tip: While waiting, press CTRL + A, right-click, and select **Generative AI → Explain Code** 
# to explore or experiment with the code in the notebook interactively.

from utils.identity_ssm_utils import get_ssm_parameter, put_ssm_parameter

# Retrieve the knowledge base ID from SSM Parameter Store
kb_id = get_param_value("/app/workshop/kb/knowledge-base-id")

# Launch the AgentCore agent, passing in environment variables for memory and knowledge base
launch_result = agentcore_runtime.launch(
    env_vars={
        "MEMORY_ID": memory_id,
        "KNOWLEDGE_BASE_ID": kb_id
    }
)
print("Launch completed:", launch_result.agent_arn)

# Store the agent ARN in SSM Parameter Store for easy retrieval later
agent_arn = put_ssm_parameter(
    "/app/returnsrefunds/agentcore/runtime_arn",
    launch_result.agent_arn
)

# Important note: 
# If you accidentally run this cell twice, the second run may attempt to redeploy the same agent.
# Depending on your setup, it could overwrite the previous deployment or fail due to naming conflicts.
# It's safest to run it only once to avoid unintended redeployments.

#### Check Deployment Status

Let's wait for the deployment to complete:


In [None]:
import time

# Wait for the agent to be ready
status_response = agentcore_runtime.status()
status = status_response.endpoint["status"]

end_status = ["READY", "CREATE_FAILED", "DELETE_FAILED", "UPDATE_FAILED"]
while status not in end_status:
    print(f"Waiting for deployment... Current status: {status}")
    
    delay = 5 * 2
    # nosemgrep: arbitrary-sleep
    time.sleep(delay) 
    status_response = agentcore_runtime.status()
    status = status_response.endpoint["status"]

print(f"Final status: {status}")
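
The loop above waits indefinitely if the deployment never reaches a terminal status. A bounded variant is sketched below, assuming a caller-supplied `get_status` function. The helper `wait_for_status`, its defaults, and the canned status sequence are illustrative, not part of the starter toolkit.

```python
import time
from typing import Callable, Sequence

TERMINAL_STATUSES = ("READY", "CREATE_FAILED", "DELETE_FAILED", "UPDATE_FAILED")


def wait_for_status(
    get_status: Callable[[], str],
    terminal: Sequence[str] = TERMINAL_STATUSES,
    interval: float = 10.0,
    timeout: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it returns a terminal status or the timeout passes."""
    deadline = clock() + timeout
    status = get_status()
    while status not in terminal:
        if clock() >= deadline:
            raise TimeoutError(f"Deployment still {status!r} after {timeout} seconds")
        sleep(interval)
        status = get_status()
    return status


# Demo with a canned status sequence instead of a real deployment.
fake_statuses = iter(["CREATING", "CREATING", "READY"])
final_status = wait_for_status(lambda: next(fake_statuses), interval=0.0)
print(final_status)  # READY
```

With the objects from this lab, the call would look like `wait_for_status(lambda: agentcore_runtime.status().endpoint["status"])`.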

### Step 4: Invoking Your Deployed Agent

Now that our agent is deployed and ready, let's test it with some queries. We invoke the agent with the right authorization token type; in our case, that is a Cognito access token. You can copy the access token from the Cognito setup cell above, or obtain a fresh one with `reauthenticate_user`, as the cell below does.

<div style="text-align:left"> 
    <img src="images/lab4_invoke.png" width="70%"/> 
</div>

#### Using the AgentCore Starter Toolkit

We can validate that the agent works using the AgentCore Starter Toolkit for invocation. The starter toolkit can automatically create a session id for us to query our agent. Alternatively, you can pass the session id as a parameter during invocation. For demonstration purposes, we will create our own session id.

In [None]:
import uuid

# Create our own session id for this conversation
session_id = str(uuid.uuid4())

# Test a customer support scenario
user_query = "I bought a Kindle Book three days ago by accident in India. I want to get a refund, what date is the ETA if I request it now?"

# Get a fresh Cognito access token
bearer_token = reauthenticate_user(
    cognito_config.get("client_id"),
    cognito_config.get("client_secret")
)

response = agentcore_runtime.invoke(
    {"prompt": user_query},
    bearer_token=bearer_token,
    session_id=session_id
)
response

### Step 4.5: Demonstrating AgentCore Memory Capabilities

Now that our agent is deployed and running in production, let's explore how **AgentCore Memory** enables personalized, context-aware interactions across sessions.

AgentCore Memory operates on two levels:
- **Short-Term Memory (STM)**: Immediate conversation context within the current session
- **Long-Term Memory (LTM)**: Persistent information extracted across multiple conversations, including customer preferences and facts

In this section, we'll:
1. Seed some customer interaction history
2. Wait for Long-Term Memory processing to complete
3. Test how the agent uses memory to provide personalized responses

#### Seeding Customer Interaction History

Let's seed some previous customer interactions to demonstrate how memory works. In production, this happens naturally through customer conversations, but for this demo we'll create some historical context.

In [None]:
import time
from bedrock_agentcore.memory import MemoryClient
from utils import get_param_value

# Initialize memory client
memory_client = MemoryClient(region_name=region)

print(f"Memory ID: {memory_id}")

# Define a customer ID for this demo
ACTOR_ID = "customer_001"

# Seed previous customer interactions
previous_interactions = [
    ("I bought a Fire TV Stick last week in the US but it's not working properly. Can I return it?", "USER"),
    ("Yes, you can return your Fire TV Stick. According to the US return policy, most items can be returned within 30 days of receipt. Since you purchased it last week, you're well within the return window.", "ASSISTANT"),
    ("I also purchased a Kindle Paperwhite 2 months ago in the UK. Can I still return that?", "USER"),
    ("For the Kindle Paperwhite purchased in the UK 2 months ago, you're outside the standard 30-day return window. However, Kindle devices may have different warranty coverage. Let me check the specific UK policy for you.", "ASSISTANT"),
    ("I prefer shopping for electronics and usually buy Amazon devices. I'm in the US.", "USER"),
    ("Thank you for sharing that! I've noted your preference for Amazon electronics devices and that you're based in the US. This will help me provide more relevant assistance in the future.", "ASSISTANT"),
]

print("📝 Seeding customer interaction history...")
try:
    memory_client.create_event(
        memory_id=memory_id,
        actor_id=ACTOR_ID,
        session_id="previous_session",
        messages=previous_interactions
    )
    print("✅ Customer history seeded successfully!")
    print("⏳ Long-Term Memory processing will begin automatically...")
    print("   This typically takes 20-30 seconds as the system:")
    print("   • Analyzes conversation patterns")
    print("   • Extracts customer preferences")
    print("   • Creates semantic embeddings for facts")
except Exception as e:
    print(f"⚠️ Error seeding history: {e}")

#### Waiting for Long-Term Memory Processing

After creating events, AgentCore Memory processes the data asynchronously:
1. **Immediate**: Messages stored in Short-Term Memory (STM)
2. **Asynchronous**: STM processed into Long-Term Memory (LTM) strategies

Let's wait for the processing to complete and verify the memories were extracted:

In [None]:
# Wait for Long-Term Memory processing
print("🔍 Checking for processed Long-Term Memories...")
retries = 0
max_retries = 6  # six checks, 10 seconds apart (just under a minute)
memories_found = False

while retries < max_retries:
    try:
        # Check for preference memories
        preference_memories = memory_client.retrieve_memories(
            memory_id=memory_id,
            namespace=f"returns/customer/{ACTOR_ID}/preferences",
            query="customer preferences and location"
        )
        
        if preference_memories:
            print(f"✅ Found {len(preference_memories)} preference memories after {retries * 10} seconds!")
            memories_found = True
            break
    except Exception as e:
        print(f"⚠️ Error retrieving memories: {e}")
    
    retries += 1
    if retries < max_retries:
        print(f"⏳ Still processing... waiting 10 more seconds (attempt {retries}/{max_retries})")
        time.sleep(10)

if not memories_found:
    print("⚠️ Memory processing is taking longer than expected. This can happen with system load.")
    print("   You can proceed with the lab - memories will be available shortly.")
else:
    print("\n🎯 AgentCore Memory extracted these customer preferences:")
    print("=" * 80)
    for i, memory in enumerate(preference_memories, 1):
        if isinstance(memory, dict):
            content = memory.get('content', {})
            if isinstance(content, dict):
                text = content.get('text', '')
                if text:
                    print(f"  {i}. {text}")
    
    # Also check semantic memories
    try:
        semantic_memories = memory_client.retrieve_memories(
            memory_id=memory_id,
            namespace=f"returns/customer/{ACTOR_ID}/semantic",
            query="previous return requests and purchases"
        )
        
        if semantic_memories:
            print("\n🧠 AgentCore Memory identified these factual details:")
            print("=" * 80)
            for i, memory in enumerate(semantic_memories, 1):
                if isinstance(memory, dict):
                    content = memory.get('content', {})
                    if isinstance(content, dict):
                        text = content.get('text', '')
                        if text:
                            print(f"  {i}. {text}")
    except Exception as e:
        print(f"⚠️ Error retrieving semantic memories: {e}")
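
The cell above repeats the same nested checks for preference and semantic memories. One way to tidy this up is a small helper that pulls the text out of each record. This is a sketch that assumes the record shape used above (`{"content": {"text": ...}}`); `memory_texts` is a hypothetical name.

```python
from typing import Any, Iterable, List, Optional


def memory_texts(memories: Optional[Iterable[Any]]) -> List[str]:
    """Collect the non-empty content.text values from memory records."""
    texts: List[str] = []
    for memory in memories or []:
        if not isinstance(memory, dict):
            continue  # skip records that are not dictionaries
        content = memory.get("content", {})
        if isinstance(content, dict) and content.get("text"):
            texts.append(content["text"])
    return texts


sample_records = [
    {"content": {"text": "Prefers Amazon devices"}},
    {"content": {}},        # no text, skipped
    "unexpected record",    # not a dict, skipped
]
for i, text in enumerate(memory_texts(sample_records), 1):
    print(f"  {i}. {text}")
```

Unlike the original loops, this numbers only the records that are actually printed, so the list has no gaps.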

#### Testing Memory-Aware Responses

Now let's test how the agent uses memory to provide personalized responses. We'll ask questions that should trigger memory retrieval and demonstrate context awareness.

**Note**: The agent's memory hooks automatically:
1. Retrieve relevant customer context before processing queries
2. Inject that context into the conversation
3. Save new interactions for future use

In [None]:
# Create a new session for this customer
import uuid
demo_session_id = str(uuid.uuid4())

print("🧪 Testing memory-aware agent responses...\n")
print("=" * 80)

# Test 1: Ask about previous purchases (should recall Fire TV Stick and Kindle)
print("\n📱 Test 1: Asking about previous purchases...\n")
test_query_1 = "What products have I purchased recently?"

bearer_token = reauthenticate_user(
    cognito_config.get("client_id"), 
    cognito_config.get("client_secret")
)

response_1 = agentcore_runtime.invoke(
    {"prompt": test_query_1, "actor_id": ACTOR_ID},  # use the actor whose history we seeded
    bearer_token=bearer_token,
    session_id=demo_session_id
)
print(f"Agent Response: {response_1}\n")

# Test 2: Ask for recommendations (should use preference for Amazon devices)
print("=" * 80)
print("\n🎯 Test 2: Asking for product recommendations...\n")
test_query_2 = "I'm thinking about buying a new tablet. What would you recommend?"

response_2 = agentcore_runtime.invoke(
    {"prompt": test_query_2, "actor_id": ACTOR_ID},  # use the actor whose history we seeded
    bearer_token=bearer_token,
    session_id=demo_session_id
)
print(f"Agent Response: {response_2}\n")

# Test 3: Follow-up question (should maintain session context)
print("=" * 80)
print("\n💬 Test 3: Follow-up question using session context...\n")
test_query_3 = "What's the return policy for that?"

response_3 = agentcore_runtime.invoke(
    {"prompt": test_query_3, "actor_id": ACTOR_ID},  # use the actor whose history we seeded
    bearer_token=bearer_token,
    session_id=demo_session_id
)
print(f"Agent Response: {response_3}\n")

print("=" * 80)
print("\n✅ Memory demonstration complete!")
print("\nNotice how the agent:")
print("  • Recalled previous purchases (Fire TV Stick, Kindle Paperwhite)")
print("  • Used customer preferences (Amazon devices, US location)")
print("  • Maintained context within the session for follow-up questions")
print("\nThis is the power of AgentCore Memory in production! 🚀")

### Step 5: AgentCore Observability

[AgentCore Observability](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/observability.html) provides monitoring and tracing capabilities for AI agents using Amazon OpenTelemetry Python Instrumentation and Amazon CloudWatch GenAI Observability.

#### Agents

The default AgentCore Runtime configuration logs our agent's traces to CloudWatch by means of **AgentCore Observability**. These traces can be seen on the Amazon CloudWatch GenAI Observability dashboard. Navigate to CloudWatch &rarr; GenAI Observability &rarr; Bedrock AgentCore.

![Agents Overview on CloudWatch](images/lab4_observability_agents.png)

#### Sessions

The Sessions view shows the list of all the sessions associated with all agents in your account.

![sessions](images/lab4_sessions_observability.png)

#### Traces

Trace view lists all traces from your agents in this account. To work with traces:

- Choose Filter traces to search for specific traces.
- Sort by column name to organize results.
- Under Actions, select Logs Insights to refine your search by querying across your log and span data or select Export selected traces to export.

![traces](images/lab4_traces_observability.png)


### Step 6: Building a Customer-Facing Frontend Application

You can now invoke your agent runtime from any application. In real-world applications, customers expect a user interface, so it's time to create a user-friendly frontend that customers can actually use to interact with our agent.

We'll now create a **Streamlit-based web application** that provides customers with an intuitive chat interface to interact with our deployed Customer Support Agent. The frontend will include:

- **Secure Authentication** - User login via Amazon Cognito
- **Real-time Chat Interface** - Streamlit-powered conversational UI
- **Streaming Responses** - Live response streaming for better user experience
- **Session Management** - Persistent conversations with memory
- **Response Timing** - Performance metrics for transparency


#### Install Frontend Dependencies

In [None]:
# Install frontend-specific dependencies
%pip install -r utils/lab4_frontend/requirements.txt -q
print("✅ Frontend dependencies installed successfully!")

#### Understanding the Frontend Architecture

Our Streamlit application consists of several key components:

#### Core Components:

1. **main.py** - Main Streamlit application with UI and authentication
2. **chat.py** - Chat management and AgentCore Runtime integration
3. **chat_utils.py** - Utility functions for message formatting and display
4. **sagemaker_helper.py** - Helper for generating accessible URLs

#### Authentication Flow:

1. User accesses the Streamlit application
2. Amazon Cognito handles user authentication
3. Valid JWT tokens are used to authorize AgentCore Runtime requests
4. User can interact with the Customer Support Agent securely
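
Step 3 of this flow can be sketched as building an HTTPS request that carries the JWT as a bearer token. The helper below only constructs the request and does not send it. The URL pattern and the session header name are assumptions based on the public AgentCore invocation examples and should be verified against the current documentation; the ARN and token are placeholders.

```python
import json
import urllib.parse
import uuid
from typing import Dict, Optional, Tuple


def build_invoke_request(
    agent_arn: str,
    region: str,
    bearer_token: str,
    prompt: str,
    session_id: Optional[str] = None,
) -> Tuple[str, Dict[str, str], bytes]:
    """Return (url, headers, body) for a bearer-token invocation (assumed format)."""
    escaped_arn = urllib.parse.quote(agent_arn, safe="")
    # Assumed endpoint pattern; confirm against the AgentCore Runtime docs.
    url = (
        f"https://bedrock-agentcore.{region}.amazonaws.com"
        f"/runtimes/{escaped_arn}/invocations?qualifier=DEFAULT"
    )
    headers = {
        "Authorization": f"Bearer {bearer_token}",
        "Content-Type": "application/json",
        # Assumed header name for pinning the conversation to one session.
        "X-Amzn-Bedrock-AgentCore-Runtime-Session-Id": session_id or str(uuid.uuid4()),
    }
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return url, headers, body


url, headers, body = build_invoke_request(
    agent_arn="arn:aws:bedrock-agentcore:us-west-2:123456789012:runtime/example-agent",  # placeholder
    region="us-west-2",
    bearer_token="demo-token",  # placeholder, not a real token
    prompt="hello",
)
print(url)
```

Reusing the same session id across requests is what lets the agent keep conversational context, as the memory tests earlier in this lab showed.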

#### Launch the Returns and Refunds Agent Frontend 🚀

Now let's start our Streamlit application. The application will:

1. **Generate an accessible URL** for the application
2. **Start the Streamlit server** on port 8501
3. **Connect to your deployed AgentCore Runtime** from earlier in this lab
4. **Provide a complete customer support interface**

**Important Notes:**
- The application will run continuously until you stop it (Ctrl+C)
- Make sure the AgentCore Runtime you deployed earlier in this lab is still deployed and running
- The Cognito authentication tokens are valid for 2 hours

In [None]:
# Get the accessible URL for the Streamlit application
from utils.lab4_frontend.sagemaker_helper import get_streamlit_url

streamlit_url = get_streamlit_url()
print(f'\n🚀 Returns and Refunds Agent Streamlit Application URL:\n{streamlit_url}\n')

# Start the Streamlit application
!cd utils/lab4_frontend/ && streamlit run main.py

#### Testing Your Returns and Refunds Agent Application

Once your Streamlit application is running, you can test the complete customer support experience:

#### Authentication Testing:
1. **Access the application** using the Returns and Refunds Agent Streamlit Application URL provided above
2. **Sign in** with the test credentials provided in the output
3. **Verify** that you see the welcome message with your username

<div style="text-align:left">
    <img src="images/lab4_streamlit_login.png"/>
</div>
<div>
    <img src="images/lab4_welcome_user.png"/>
</div>    


#### Scenarios to Test:



Return Policy Questions:
"I bought a Kindle Book three days ago by accident in India. I want to get a refund, what date is the ETA if I request it now?"


<div style="text-align:left">    
    <img src="images/lab4_agent_question.png" width="75%"/>
</div>

Memory and Personalization Testing

### Congratulations! 🎉

You have successfully completed **Lab 4: Deploy to Production - Use AgentCore Runtime with Observability and a Frontend Application!**

Here is what you accomplished:

##### Production-Ready Deployment:

- Prepared your agent for production with minimal code changes (only 4 lines added)
- Validated proper session isolation between different customers
- Confirmed session continuity, memory persistence, and context awareness per session

##### Enterprise-Grade Security & Identity:

- Implemented secure authentication using Cognito integration with JWT tokens
- Configured proper IAM roles and execution permissions for production workloads
- Established identity-based access control for secure agent invocation

##### Comprehensive Observability:

- Enabled AgentCore Observability for full request tracing across all customer sessions
- Configured CloudWatch GenAI Observability dashboard monitoring

##### Customer-Facing Frontend Application:

- **Web Interface** - Streamlit-based customer support application
- **Secure Authentication** - Amazon Cognito integration for user management
- **Real-time Streaming** - Live response streaming for better user experience
- **Session Management** - Persistent conversations with memory across interactions
- **Complete Integration** - Frontend connected to your AgentCore Runtime

### Next Steps

To further enhance your customer support solution, consider:

- **Custom Styling** - Brand the frontend with your company's design system
- **Additional Tools** - Integrate with your existing CRM, ticketing, or knowledge base systems
- **Multi-language Support** - Add internationalization for global customers
- **Advanced Analytics** - Implement custom dashboards for support team insights
- **Mobile Optimization** - Ensure the interface works well on mobile devices

### Cleanup

When you're ready to clean up the resources created in this workshop:

**Ready to clean up?** [Proceed to Lab 6: Cleanup →](lab6_Cleanup.ipynb)

---