# Lab 02: Quick Wins

## Overview

In this notebook, we apply **low-effort, high-impact** optimizations to reduce costs and improve response quality. The key insight is that **prompt structure matters more than length**.

**What you'll learn:**
- How to structure system prompts for clarity and consistency
- How to use `max_tokens` to limit output length
- How to use `stop_sequences` to end responses early
- How to set appropriate `temperature` for accuracy

**Optimizations in this notebook:**
1. Well-structured prompt with sections and numbered instructions (~1024 tokens vs ~1500 unstructured)
2. `max_tokens=1024` limit (allows detailed responses)
3. Stop sequences `["###", "END_RESPONSE"]`
4. `temperature=0.1` for accurate, consistent customer support responses

## Prerequisites

- Completed Lab 01 (baseline agent deployed)
- Baseline metrics recorded

## Workshop Journey

```
01 Baseline → [02 Quick Wins] → 03 Caching → 04 Routing → 05 Guardrails → 06 Gateway → 07 Evaluations
                   ↑
              You are here
```

## Step 1: Setup

In [22]:
import os
import json
import uuid
from pathlib import Path
from dotenv import load_dotenv

load_dotenv(override=True)

import boto3
from bedrock_agentcore_starter_toolkit import Runtime

region = os.environ.get("AWS_DEFAULT_REGION", "us-east-1")
control_client = boto3.client("bedrock-agentcore-control", region_name=region)
data_client = boto3.client("bedrock-agentcore", region_name=region)
agentcore_runtime = Runtime()

print(f"Region: {region}")
print(f"Langfuse Host: {os.environ.get('LANGFUSE_HOST', 'Not set')}")

Region: us-east-1
Langfuse Host: https://d2rhlwziq3nnbf.cloudfront.net


## Step 2: Review the Quick Wins Optimizations

Let's compare the baseline vs optimized configurations.

## Optimization 1: Well-Structured System Prompt

### Baseline Prompt (v1)

In [23]:
from agents.v1_baseline import SYSTEM_PROMPT as BASELINE_PROMPT

print(BASELINE_PROMPT)

You are a helpful, friendly, professional, knowledgeable, and empathetic customer support assistant working for an electronics e-commerce company called TechMart Electronics, which is a leading retailer of consumer electronics, computers, smartphones, tablets, audio equipment, smart home devices, gaming consoles, and various other technology products and accessories. Your primary responsibility is to assist customers with their inquiries, concerns, questions, and issues related to products, services, policies, and technical matters. You should always strive to provide the best possible customer experience by being patient, understanding, thorough, and comprehensive in your responses.

As a customer support representative for TechMart Electronics, your role encompasses a wide variety of responsibilities and duties that you must fulfill to the best of your abilities. First and foremost, you are expected to provide accurate, helpful, and detailed information to customers using the various

### Opportunities for Improvement

| Area | Observation |
|------|-------------|
| **Structure** | Dense paragraphs without visual hierarchy make it difficult for the model to quickly locate relevant instructions |
| **Hedging language** | Phrases like "try to", "as best you can", "hopefully", "if possible" introduce ambiguity about expected behavior |
| **Filler phrases** | "Please", "Can you please", "It would be great if" add tokens without providing actionable guidance |
| **Task definition** | The expected output format and response structure are not specified |
| **Redundancy** | Adjective chains like "helpful and friendly and professional and also knowledgeable and empathetic" could be condensed |

**Estimated token count: ~1500 tokens**

---

### Optimized Prompt (v2)

In [24]:
from agents.v2_quick_wins import SYSTEM_PROMPT as OPTIMIZED_PROMPT

print(OPTIMIZED_PROMPT)


# ROLE AND PERSONA

You are Alex, a senior customer support specialist at TechMart Electronics.

## Personality
- Professional yet friendly
- Patient and empathetic
- Solution-focused

---

# AVAILABLE TOOLS

1. **get_return_policy**: Return/refund policies by product category
2. **get_product_info**: Product specs, features, availability
3. **web_search**: Current information from the web
4. **get_technical_support**: Troubleshooting guides and technical docs

---

# RESPONSE FORMAT

Your response MUST be structured with these fields:
- **answer**: A clear, concise, and helpful response to the customer's question. Use bullet points for lists.
- **category**: Classify as one of: "product", "policy", "technical", or "general"
- **confidence**: Rate as "high" (verified with tools), "medium" (partial info), or "low" (uncertain)

---

# GUIDELINES

1. Always use tools to verify information before responding
2. Keep answers concise but complete
3. For technical issues, provide step-by-step

### What Makes This Prompt Effective

| Technique | Implementation |
|-----------|----------------|
| **Visual hierarchy** | Headers (`# ROLE`, `# TOOLS`) and horizontal rules (`---`) organize content into scannable sections |
| **Direct language** | "You are Alex" instead of "Please try to be..."; "MUST" instead of "It would be nice if..." |
| **Structured lists** | Tools numbered 1-4, guidelines as numbered steps, personality as bullet points |
| **Explicit output format** | Defines required fields: answer, category, confidence |
| **Clear boundaries** | "Do NOT discuss competitor products" states an unambiguous constraint |

**Key insight:** Structure often matters more than brevity. A well-organized prompt helps the model locate and follow instructions efficiently.

**Estimated token count: ~1024 tokens** (32% reduction with improved clarity)
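
The 32% figure follows from the two estimates quoted in this section. A minimal sketch of the arithmetic, where `percent_reduction` is a hypothetical helper name and the token counts are the approximate values given above:

```python
def percent_reduction(before, after):
    """Return the percentage decrease from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


# Estimated prompt sizes quoted in this notebook (approximate values)
baseline_tokens = 1500
optimized_tokens = 1024

print(f"Prompt reduction: {percent_reduction(baseline_tokens, optimized_tokens):.1f}%")
```

The same formula drives the input, output, and cost comparisons in Step 4.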

---

### Further Reading

For more on prompt engineering best practices, see:

- [Anthropic: Prompt Engineering Overview](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview)
- [AWS: Prompt Engineering Techniques with Claude on Amazon Bedrock](https://aws.amazon.com/blogs/machine-learning/prompt-engineering-techniques-and-best-practices-learn-by-doing-with-anthropics-claude-3-on-amazon-bedrock/)
- [OpenAI: Prompt Engineering Guide](https://platform.openai.com/docs/guides/prompt-engineering)
- [Anthropic: Use XML Tags to Structure Your Prompts](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags)

## Optimizations 2, 3, 4: Model Configuration

### Optimization 2: `max_tokens=1024`

Limits output length to prevent runaway responses. 1024 tokens is enough for detailed troubleshooting while keeping costs predictable.
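
A hard limit can also cut a response off mid-sentence, so it helps to check why generation ended. The sketch below assumes a response dictionary shaped like the Bedrock Converse API output (`stopReason` and `usage.outputTokens`); `describe_stop` is a hypothetical helper, and the agent in this lab may surface these fields differently.

```python
def describe_stop(response):
    """Summarize why generation ended for a Converse-style response dict."""
    reason = response.get("stopReason", "unknown")
    output_tokens = response.get("usage", {}).get("outputTokens", 0)
    # "max_tokens" means the limit was hit before the model finished
    truncated = reason == "max_tokens"
    return {"reason": reason, "output_tokens": output_tokens, "truncated": truncated}


# Hypothetical response for illustration only
sample = {"stopReason": "max_tokens", "usage": {"inputTokens": 2755, "outputTokens": 1024}}
print(describe_stop(sample))
```

If truncation shows up often, the limit is probably too low for the task rather than the responses being too long.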

### Optimization 3: `stop_sequences=["###", "END_RESPONSE"]`

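Stops generation as soon as the model emits one of the listed strings, so a response can end before reaching `max_tokens`. This only takes effect when the model actually produces a stop string, for example if the prompt asks it to finish with `END_RESPONSE`.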
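
The effect of a stop sequence can be illustrated client-side. This is only a sketch of the semantics (the service applies stop sequences during generation, not afterwards), and `apply_stop_sequences` is a hypothetical helper name:

```python
def apply_stop_sequences(text, stop_sequences):
    """Return text up to (not including) the earliest stop sequence, if any."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


raw = "Laptops have a 30-day return window.END_RESPONSE This part is never generated."
print(apply_stop_sequences(raw, ["###", "END_RESPONSE"]))
```

One thing to watch: `###` is also how a level-3 Markdown heading begins, so a response that uses such headings would be cut at that point.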

### Optimization 4: `temperature=0.1`

Low temperature for accuracy and consistency. Customer support needs factual, predictable responses, not creative variations.

| Temperature | Use Case |
|-------------|----------|
| 0.0 - 0.3 | Factual tasks, customer support, code generation |
| 0.4 - 0.7 | Balanced creativity and accuracy |
| 0.8 - 1.0 | Creative writing, brainstorming |
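
To see why low temperature gives more consistent output, here is a rough sketch of temperature-scaled softmax over a few candidate tokens. The scores are made up for illustration, and this is the textbook formulation rather than a description of how any particular model is implemented:

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0.1, 0.7, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At `0.1` nearly all probability lands on the top-scoring token, while at `1.0` the alternatives keep a meaningful share, which is where the variation between runs comes from.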

### Final Model Configuration

```python
model = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-5-20250929-v1:0",
    temperature=0.1,                          # Optimization 4
    max_tokens=1024,                          # Optimization 2
    stop_sequences=["###", "END_RESPONSE"],   # Optimization 3
)
```

In [25]:
# Review the v2 agent code
agent_file = Path("agents/v2_quick_wins.py")
print(agent_file.read_text())

"""
V2 Quick Wins Agent - Low-effort optimizations.
- Concise system prompt
- max_tokens limit
- stop_sequences
- Low temperature for accuracy
"""

import base64
import os
from bedrock_agentcore.runtime import BedrockAgentCoreApp
from dotenv import load_dotenv
from strands import Agent
from strands.models import BedrockModel
from strands.telemetry import StrandsTelemetry

import sys
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from utils.tools import get_return_policy, get_product_info, web_search, get_technical_support

load_dotenv()

# Langfuse configuration
langfuse_public_key = os.environ.get("LANGFUSE_PUBLIC_KEY")
langfuse_secret_key = os.environ.get("LANGFUSE_SECRET_KEY")
langfuse_host = os.environ.get("LANGFUSE_HOST", "https://cloud.langfuse.com")
LANGFUSE_AUTH = base64.b64encode(f"{langfuse_public_key}:{langfuse_secret_key}".encode()).decode()

os.environ["LANGFUSE_PROJECT_NAME"] = "my-llm-project"
os.environ["DISABLE_ADOT_OBSERVABILITY"] = "t

## Step 3: Deploy the Quick Wins Agent

In [26]:
agent_name = "customer_support_v2_quick_wins"
agent_file = str(Path("agents/v2_quick_wins.py").absolute())
requirements_file = str(Path("requirements-for-agentcore.txt").absolute())

print(f"Agent name: {agent_name}")
print(f"Agent file: {agent_file}")
print(f"Requirements: {requirements_file}")

print(f"Configuring agent: {agent_name}")
agentcore_runtime.configure(
    entrypoint=agent_file,
    auto_create_execution_role=True,
    auto_create_ecr=True,
    requirements_file=requirements_file,
    region=region,
    agent_name=agent_name,
)

Entrypoint parsed: file=/Users/tracilim/Projects/aws-bedrock-prompt-optimization-workshop/03-developer-journey/agents/v2_quick_wins.py, bedrock_agentcore_name=v2_quick_wins


Memory disabled - agent will be stateless
Configuring BedrockAgentCore agent: customer_support_v2_quick_wins


Agent name: customer_support_v2_quick_wins
Agent file: /Users/tracilim/Projects/aws-bedrock-prompt-optimization-workshop/03-developer-journey/agents/v2_quick_wins.py
Requirements: /Users/tracilim/Projects/aws-bedrock-prompt-optimization-workshop/03-developer-journey/requirements-for-agentcore.txt
Configuring agent: customer_support_v2_quick_wins


Memory disabled
Network mode: PUBLIC
Generated Dockerfile: Dockerfile
Generated .dockerignore: /Users/tracilim/Projects/aws-bedrock-prompt-optimization-workshop/03-developer-journey/.dockerignore
Changing default agent from 'customer_support_v1_baseline' to 'customer_support_v2_quick_wins'
Bedrock AgentCore configured: /Users/tracilim/Projects/aws-bedrock-prompt-optimization-workshop/03-developer-journey/.bedrock_agentcore.yaml


ConfigureResult(config_path=PosixPath('/Users/tracilim/Projects/aws-bedrock-prompt-optimization-workshop/03-developer-journey/.bedrock_agentcore.yaml'), dockerfile_path=PosixPath('/Users/tracilim/Projects/aws-bedrock-prompt-optimization-workshop/03-developer-journey/Dockerfile'), dockerignore_path=PosixPath('/Users/tracilim/Projects/aws-bedrock-prompt-optimization-workshop/03-developer-journey/.dockerignore'), runtime='None', runtime_type=None, region='us-east-1', account_id='739907928487', execution_role=None, ecr_repository=None, auto_create_ecr=True, s3_path=None, auto_create_s3=False, memory_id=None, network_mode='PUBLIC', network_subnets=None, network_security_groups=None, network_vpc_id=None)

In [27]:
# Modify Dockerfile for Langfuse
import re

dockerfile_path = Path("Dockerfile")
if dockerfile_path.exists():
    content = dockerfile_path.read_text()
    # Replace the opentelemetry-instrument wrapper with a direct python call,
    # preserving the original module path via the regex capture group
    if "opentelemetry-instrument" in content:
        content = re.sub(
            r'CMD \["opentelemetry-instrument", "python", "-m", "([^"]+)"\]',
            r'CMD ["python", "-m", "\1"]',
            content
        )
        dockerfile_path.write_text(content)
        print("Dockerfile modified for Langfuse")
    else:
        print("Dockerfile already configured or using different format")
else:
    print("Dockerfile not found - will be created during deployment")

Dockerfile modified for Langfuse
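
To make the substitution in the cell above concrete, this sketch applies the same pattern to a single sample `CMD` line. The module path in the sample is hypothetical; the generated Dockerfile may use a different one.

```python
import re

# Same pattern and replacement as the cell above
PATTERN = r'CMD \["opentelemetry-instrument", "python", "-m", "([^"]+)"\]'
REPLACEMENT = r'CMD ["python", "-m", "\1"]'


def strip_otel_wrapper(dockerfile_text):
    """Drop the opentelemetry-instrument wrapper, keeping the captured module path."""
    return re.sub(PATTERN, REPLACEMENT, dockerfile_text)


sample = 'CMD ["opentelemetry-instrument", "python", "-m", "agents.v2_quick_wins"]'
print(strip_otel_wrapper(sample))
```

Lines that do not match the pattern pass through unchanged.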


In [28]:
env_vars = {
    "LANGFUSE_HOST": os.environ.get("LANGFUSE_HOST"),
    "LANGFUSE_PUBLIC_KEY": os.environ.get("LANGFUSE_PUBLIC_KEY"),
    "LANGFUSE_SECRET_KEY": os.environ.get("LANGFUSE_SECRET_KEY"),
    "PYTHONUNBUFFERED": "1",
}

print("Deploying to AgentCore Runtime...")
launch_result = agentcore_runtime.launch(env_vars=env_vars, auto_update_on_conflict=True)
agent_arn = launch_result.agent_arn
print(f"Agent deployed: {agent_arn}")

🚀 Launching Bedrock AgentCore (cloud mode - RECOMMENDED)...
   • Deploy Python code directly to runtime
   • No Docker required (DEFAULT behavior)
   • Production-ready deployment

💡 Deployment options:
   • runtime.launch()                → Cloud (current)
   • runtime.launch(local=True)      → Local development
Memory disabled - skipping memory creation
Starting CodeBuild ARM64 deployment for agent 'customer_support_v2_quick_wins' to account 739907928487 (us-east-1)
Setting up AWS resources (ECR repository, execution roles)...
Getting or creating ECR repository for agent: customer_support_v2_quick_wins


Deploying to AgentCore Runtime...


ECR repository available: 739907928487.dkr.ecr.us-east-1.amazonaws.com/bedrock-agentcore-customer_support_v2_quick_wins
Getting or creating execution role for agent: customer_support_v2_quick_wins
Using AWS region: us-east-1, account ID: 739907928487
Role name: AmazonBedrockAgentCoreSDKRuntime-us-east-1-449c4d3c64


✅ Reusing existing ECR repository: 739907928487.dkr.ecr.us-east-1.amazonaws.com/bedrock-agentcore-customer_support_v2_quick_wins


✅ Reusing existing execution role: arn:aws:iam::739907928487:role/AmazonBedrockAgentCoreSDKRuntime-us-east-1-449c4d3c64
Execution role available: arn:aws:iam::739907928487:role/AmazonBedrockAgentCoreSDKRuntime-us-east-1-449c4d3c64
Preparing CodeBuild project and uploading source...
Getting or creating CodeBuild execution role for agent: customer_support_v2_quick_wins
Role name: AmazonBedrockAgentCoreSDKCodeBuild-us-east-1-449c4d3c64
Reusing existing CodeBuild execution role: arn:aws:iam::739907928487:role/AmazonBedrockAgentCoreSDKCodeBuild-us-east-1-449c4d3c64
Using dockerignore.template with 46 patterns for zip filtering
Uploaded source to S3: customer_support_v2_quick_wins/source.zip
Updated CodeBuild project: bedrock-agentcore-customer_support_v2_quick_wins-builder
Starting CodeBuild build (this may take several minutes)...
Starting CodeBuild monitoring...
🔄 QUEUED started (total: 0s)
✅ QUEUED completed in 1.3s
🔄 PROVISIONING started (total: 1s)
✅ PROVISIONING completed 

Agent deployed: arn:aws:bedrock-agentcore:us-east-1:739907928487:runtime/customer_support_v2_quick_wins-WwfeJI2Ll9


## Step 4: Test the Optimized Agent

Run the same test scenarios as the baseline to compare metrics.

In [29]:
def invoke_agent(prompt):
    """Invoke the agent via AgentCore API."""
    response = data_client.invoke_agent_runtime(
        agentRuntimeArn=agent_arn,
        runtimeSessionId=str(uuid.uuid4()),
        payload=json.dumps({"prompt": prompt}).encode(),
    )
    return json.loads(response["response"].read().decode("utf-8"))

In [30]:
# Import Langfuse metrics helper
from utils.langfuse_metrics import (
    get_latest_trace_metrics,
    print_metrics,
    clear_metrics,
    collect_metric,
    print_metrics_table,
    get_collected_metrics
)

# Clear any previously collected metrics
clear_metrics()

# Standard test prompts - each demonstrates a specific tool usage pattern
TEST_PROMPTS = [
    # Single tool: get_return_policy
    ("Return Policy", "What is your return policy for laptops?"),

    # Single tool: get_product_info
    ("Product Info", "Tell me about your smartphone options"),

    # Single tool: get_technical_support (Bedrock KB)
    ("Technical Support", "My laptop won't turn on, can you help me troubleshoot?"),

    # Multi-tool: get_product_info + get_return_policy
    ("Multi-part Question", "I want to buy a laptop. What are the specs and what's the return policy?"),

    # No tool: General greeting
    ("General Question", "Hello! What can you help me with today?"),
]

# Run all tests and collect metrics
for test_name, prompt in TEST_PROMPTS:
    print("=" * 60)
    print(f"Test: {test_name}")
    print("=" * 60)

    response = invoke_agent(prompt)
    print(response)

    # Fetch and collect metrics
    metrics = get_latest_trace_metrics(
        agent_name="customer-support-v2-quick-wins",
        wait_seconds=5,
        max_retries=5,
        timeout_seconds=120,
    )
    print_metrics(metrics, test_name)
    collect_metric(metrics, test_name)

Test: Return Policy
**Answer:**

I'm happy to help you with our laptop return policy! Here are the key details:

**Return Window:**
• 30 days from purchase date

**Condition Requirements:**
• Must include original packaging
• All accessories must be included
• No physical damage to the laptop

**How to Return:**
• Use our online RMA portal, or
• Visit any TechMart store location

**Refund Timeline:**
• 7-10 business days after we inspect the returned item

**Return Shipping:**
• FREE for defective products
• Customer pays shipping for change of mind returns

**Restocking Fee:**
• No fee for defective items
• 15% restocking fee for change of mind returns

**Warranty:**
• All laptops include a 1-year manufacturer warranty
• Extended warranty options available for purchase

If you need to start a return, I can guide you through our online RMA portal process. Do you have a specific laptop you're looking to return?

**Category:** policy

**Confidence:** high
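
The response above follows the answer/category/confidence format requested in the system prompt. A small sketch for checking that programmatically, assuming the `**Category:**` and `**Confidence:**` labels appear as they do in this output (`extract_fields` and `is_valid` are hypothetical helper names):

```python
import re

ALLOWED_CATEGORIES = {"product", "policy", "technical", "general"}
ALLOWED_CONFIDENCE = {"high", "medium", "low"}


def extract_fields(response_text):
    """Pull the category and confidence labels out of a formatted response."""
    fields = {}
    for name in ("Category", "Confidence"):
        match = re.search(rf"\*\*{name}:\*\*\s*(\w+)", response_text)
        fields[name.lower()] = match.group(1).lower() if match else None
    return fields


def is_valid(fields):
    """Check both labels against the values the system prompt allows."""
    return (fields["category"] in ALLOWED_CATEGORIES
            and fields["confidence"] in ALLOWED_CONFIDENCE)


sample = "**Answer:**\n\n30 days from purchase date.\n\n**Category:** policy\n\n**Confidence:** high\n"
fields = extract_fields(sample)
print(fields, is_valid(fields))
```

A check like this could run inside the test loop to flag responses that drift from the required format.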

     

In [36]:
# Print summary table
print_metrics_table()

# Compare with baseline metrics (from notebook 01)
# Update these values with your actual baseline results
BASELINE_TOTAL_INPUT_TOKENS = 19623  # From v1-baseline (5 tests)
BASELINE_TOTAL_OUTPUT_TOKENS = 2096  # From v1-baseline (5 tests)
BASELINE_TOTAL_COST = 0.0903  # From v1-baseline

# Calculate improvements
collected = get_collected_metrics()
if collected:
    valid_metrics = [m for m in collected if "error" not in m]
    if valid_metrics:
        total_input = sum(m.get('input_tokens', 0) for m in valid_metrics)
        total_output = sum(m.get('output_tokens', 0) for m in valid_metrics)
        total_cost = sum(m.get('cost_usd', 0) for m in valid_metrics)

        input_reduction = ((BASELINE_TOTAL_INPUT_TOKENS - total_input) / BASELINE_TOTAL_INPUT_TOKENS) * 100
        output_reduction = ((BASELINE_TOTAL_OUTPUT_TOKENS - total_output) / BASELINE_TOTAL_OUTPUT_TOKENS) * 100
        cost_reduction = ((BASELINE_TOTAL_COST - total_cost) / BASELINE_TOTAL_COST) * 100

        print("\n" + "=" * 60)
        print("           COMPARISON: BASELINE (v1) vs QUICK WINS (v2)")
        print("=" * 60)
        print(f"  Total Input Tokens:  {total_input:,} (Baseline: {BASELINE_TOTAL_INPUT_TOKENS:,})")
        print(f"  Input Reduction:     {input_reduction:+.1f}%")
        print(f"  Total Output Tokens: {total_output:,} (Baseline: {BASELINE_TOTAL_OUTPUT_TOKENS:,})")
        print(f"  Output Reduction:    {output_reduction:+.1f}%")
        print(f"  Total Cost:          ${total_cost:.4f} (Baseline: ${BASELINE_TOTAL_COST:.4f})")
        print(f"  Cost Reduction:      {cost_reduction:+.1f}%")
        print("=" * 60)


                                  METRICS SUMMARY
               Test Latency    Cost Input Output Cache Read Tokens Cache Write Tokens
      Return Policy   7.82s $0.0127 2,755    293                 0                  0
       Product Info   7.87s $0.0135 2,802    340                 0                  0
  Technical Support  10.37s $0.0146 2,811    412                 0                  0
Multi-part Question   8.67s $0.0167 3,099    491                 0                  0
   General Question   5.50s $0.0069 1,278    205                 0                  0
---------------------------------------------------------------------------------------------------------
  TOTALS: Latency(avg): 8.05s | Cost: $0.0643 | Input: 12,745 | Output: 1,741
          Cache Read Tokens: 0 | Cache Write Tokens: 0


           COMPARISON: BASELINE (v1) vs QUICK WINS (v2)
  Total Input Tokens:  12,745 (Baseline: 19,623)
  Input Reduction:     +35.1%
  Total Output Tokens: 1,741 (Baseline: 2,096)
  Output R

### Results Analysis

The comparison above shows the impact of simple, low-effort optimizations. By restructuring the system prompt and tuning model parameters, we cut total input tokens from 19,623 to 12,745 (35.1%), output tokens from 2,096 to 1,741, and total cost from $0.0903 to $0.0643 across the five tests, without changing the agent's functionality.

**Key takeaways:**

- **Structure matters more than brevity.** A well-organized prompt with headers, bullet points, and clear sections helps the model locate instructions efficiently. This reduces both input tokens (shorter prompt) and output tokens (more focused responses).

- **Explicit constraints improve consistency.** Defining the expected output format and boundaries ("Do NOT...") leads to more predictable behavior across requests.

- **Model parameters are free optimizations.** Setting `max_tokens`, `temperature`, and `stop_sequences` costs nothing to implement but can meaningfully reduce token usage and improve response quality.
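
The per-test costs in the metrics summary can be sanity-checked from the token counts. The sketch below assumes prices of $3 per million input tokens and $15 per million output tokens. These rates are an assumption (they are not stated in this notebook), chosen because they reproduce the table's figures, so check current Bedrock pricing before relying on them.

```python
INPUT_PRICE_PER_MILLION = 3.00    # assumed USD per 1M input tokens
OUTPUT_PRICE_PER_MILLION = 15.00  # assumed USD per 1M output tokens


def estimate_cost(input_tokens, output_tokens):
    """Estimate request cost in USD from token counts and the assumed prices."""
    return (input_tokens * INPUT_PRICE_PER_MILLION
            + output_tokens * OUTPUT_PRICE_PER_MILLION) / 1_000_000


# Token counts taken from the metrics summary table
tests = {
    "Return Policy": (2755, 293),
    "General Question": (1278, 205),
}
for name, (inp, out) in tests.items():
    print(f"{name}: ${estimate_cost(inp, out):.4f}")
```

Under the same assumed output price, `max_tokens=1024` caps the output cost of a single model call at roughly $0.015.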

---

**Next:** In Lab 03, we'll explore **prompt caching**, a technique that can dramatically reduce costs when the same system prompt is used across multiple requests.

---

## Cleanup

To delete the agent deployed in this notebook, uncomment and run the following code.

In [None]:
# # Delete the agent
# control_client.delete_agent_runtime(agentRuntimeId=agent_arn.split("/")[-1])
# print(f"Agent deleted: {agent_arn}")