# Lab 02: Quick Wins

## Overview

In this notebook, we apply **low-effort, high-impact** optimizations to reduce costs and improve response quality. The key insight is that **prompt structure matters more than length**.

**What you'll learn:**
- How to structure system prompts for clarity and consistency
- How to use `max_tokens` to limit output length
- How to use `stop_sequences` to end responses early
- How to set appropriate `temperature` for accuracy

**Optimizations in this notebook:**
1. Well-structured prompt with sections, guidelines, and few-shot examples (~1,030 tokens vs ~1,500 unstructured)
2. `max_tokens=1024` limit (allows detailed responses)
3. Stop sequences `["###", "END_RESPONSE"]`
4. `temperature=0.1` for accurate, consistent customer support responses

## Prerequisites

- Completed Lab 01 (baseline agent deployed)
- Baseline metrics recorded

## Workshop Journey

```
01 Baseline → [02 Quick Wins] → 03 Caching → 04 Routing → 05 Guardrails → 06 Gateway → 07 Evaluations
                     ↑
               You are here
```

## Step 1: Setup

In [3]:
from __future__ import annotations

import json
import os
import uuid
from pathlib import Path

from dotenv import load_dotenv

load_dotenv(override=True)

import boto3
from bedrock_agentcore_starter_toolkit import Runtime

region = os.environ.get("AWS_DEFAULT_REGION", "us-east-1")
control_client = boto3.client("bedrock-agentcore-control", region_name=region)
data_client = boto3.client("bedrock-agentcore", region_name=region)
agentcore_runtime = Runtime()

print(f"Region: {region}")
print(f"Langfuse Host: {os.environ.get('LANGFUSE_BASE_URL', 'Not set')}")

Region: us-east-1
Langfuse Host: https://d2rhlwziq3nnbf.cloudfront.net


## Step 2: Review the Quick Wins Optimizations

Let's compare the baseline vs optimized configurations.

## Optimization 1: Well-Structured System Prompt

### Baseline Prompt (v1)

In [4]:
from agents.v1_baseline import SYSTEM_PROMPT as BASELINE_PROMPT

print(BASELINE_PROMPT)


You are a customer support assistant and your job is to try to help customers as best
you can with whatever they need. You work for TechMart Electronics which is a company
that sells electronics and technology products. Please try to be helpful and friendly
and professional and also knowledgeable and empathetic and patient and understanding
too if you can. TechMart Electronics is a retailer that sells things like consumer
electronics and computers and laptops and smartphones and mobile phones and tablets
and also audio equipment and headphones and speakers and smart home devices and gaming
consoles and gaming accessories and various other products and accessories that are
related to technology and electronics. Can you please help customers with their
questions and concerns and issues and problems and inquiries about products and
services and policies and returns and technical stuff and troubleshooting and other
things they might need help with or want to know about?

Please try your b

### Opportunities for Improvement

| Area | Observation |
|------|-------------|
| **Structure** | Dense paragraphs without visual hierarchy make it difficult for the model to quickly locate relevant instructions |
| **Hedging language** | Phrases like "try to", "as best you can", "hopefully", "if possible" introduce ambiguity about expected behavior |
| **Filler phrases** | "Please", "Can you please", "It would be great if" add tokens without providing actionable guidance |
| **Task definition** | The expected output format and response structure are not specified |
| **Redundancy** | Adjective chains like "helpful and friendly and professional and also knowledgeable and empathetic" could be condensed |

**Estimated token count: ~1,500 tokens**

---

### Optimized Prompt (v2)

In [None]:
from utils.agent_config import SYSTEM_PROMPT_TEXT as OPTIMIZED_PROMPT

print(OPTIMIZED_PROMPT)

### What Makes This Prompt Effective

| Technique | Implementation |
|-----------|----------------|
| **Visual hierarchy** | Headers (`# ROLE`, `# GUIDELINES`, `# EXAMPLES`) organize content into scannable sections |
| **Direct language** | "You are Alex" instead of "Please try to be..."; action-oriented guidelines |
| **Numbered guidelines** | Clear, prioritized instructions (1-6) that the model can reference |
| **Explicit output format** | Defines required fields: answer, category, confidence |
| **Few-shot examples** | 4 examples demonstrate tool usage, response structure, and expected behavior |

**Key insight:** Few-shot examples are one of the most effective prompting techniques. They show the model exactly what you want (tool calling patterns, output structure, and tone) without verbose explanations.

**Estimated token count: ~1,030 tokens** (roughly 31% fewer than the ~1,500-token baseline, and still above the 1,024-token minimum for prompt caching)
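
The arithmetic behind these estimates can be checked with a short sketch. It takes the two rounded token counts quoted in this notebook at face value; real counts depend on the tokenizer, and the helper name `percent_reduction` is illustrative only.

```python
# Rounded estimates quoted in this notebook; real counts depend on the tokenizer.
BASELINE_TOKENS = 1500
OPTIMIZED_TOKENS = 1030
CACHE_MINIMUM_TOKENS = 1024  # prompt-caching minimum cited above


def percent_reduction(before: int, after: int) -> float:
    """Return the relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


reduction = percent_reduction(BASELINE_TOKENS, OPTIMIZED_TOKENS)
cache_headroom = OPTIMIZED_TOKENS - CACHE_MINIMUM_TOKENS

print(f"Reduction: {reduction:.1f}%")
print(f"Tokens above caching minimum: {cache_headroom}")
```

With these estimates the optimized prompt clears the caching minimum by only a handful of tokens, so trimming it further could drop it below the threshold.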

---

### Further Reading

For more on prompt engineering best practices, see:

- [Anthropic: Prompt Engineering Overview](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview)
- [AWS: Prompt Engineering Techniques with Claude on Amazon Bedrock](https://aws.amazon.com/blogs/machine-learning/prompt-engineering-techniques-and-best-practices-learn-by-doing-with-anthropics-claude-3-on-amazon-bedrock/)
- [OpenAI: Prompt Engineering Guide](https://platform.openai.com/docs/guides/prompt-engineering)
- [Anthropic: Use XML Tags to Structure Your Prompts](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags)

## Optimizations 2, 3, 4: Model Configuration

### Optimization 2: `max_tokens=1024`

Limits output length to prevent runaway responses. 1024 tokens is enough for detailed troubleshooting while keeping costs predictable.
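
To make "predictable costs" concrete, here is a minimal sketch of the worst-case output cost implied by a `max_tokens` cap. The function name and the price are placeholders for illustration, not values from this workshop. The `model_calls` parameter assumes the cap applies to each model call separately, which matters for requests that trigger several tool-calling rounds.

```python
def worst_case_output_cost(
    max_tokens: int,
    usd_per_million_output_tokens: float,
    model_calls: int = 1,
) -> float:
    """Upper bound on output-token cost if every model call hits max_tokens."""
    return max_tokens * model_calls * usd_per_million_output_tokens / 1_000_000


# Placeholder price for illustration only; check current Bedrock pricing.
PLACEHOLDER_PRICE = 15.0

single_call = worst_case_output_cost(1024, PLACEHOLDER_PRICE)
three_calls = worst_case_output_cost(1024, PLACEHOLDER_PRICE, model_calls=3)
print(f"1 call:  ${single_call:.5f}")
print(f"3 calls: ${three_calls:.5f}")
```

The bound covers output tokens only; input tokens (system prompt, tool results, conversation history) are billed separately.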

### Optimization 3: `stop_sequences=["###", "END_RESPONSE"]`

Allows early termination when the response is complete. Generation halts as soon as the model emits one of these strings, so a response that ends with a delimiter stops there instead of running on toward `max_tokens`.
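
The effect of a stop sequence can be simulated locally. The sketch below is not how Bedrock implements it; `apply_stop_sequences` is a hypothetical helper that cuts a string at the earliest match, assuming the matched sequence itself is not returned.

```python
def apply_stop_sequences(text: str, stop_sequences: list[str]) -> str:
    """Cut `text` at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


STOPS = ["###", "END_RESPONSE"]

# Made-up model output for illustration.
raw_output = "Refunds take 7-10 business days.\n###\nUnneeded extra text"
print(repr(apply_stop_sequences(raw_output, STOPS)))

# Caveat: a Markdown level-3 heading also contains "###".
markdown_output = "Here are the steps:\n### Step 1\nHold the power button."
print(repr(apply_stop_sequences(markdown_output, STOPS)))
```

The second call shows a practical risk: if the model formats an answer with `###` headings, the response is cut at the first heading, so the stop strings should not collide with the output format you ask for.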

### Optimization 4: `temperature=0.1`

Low temperature for accuracy and consistency. Customer support needs factual, predictable responses, not creative variations.

| Temperature | Use Case |
|-------------|----------|
| 0.0 - 0.3 | Factual tasks, customer support, code generation |
| 0.4 - 0.7 | Balanced creativity and accuracy |
| 0.8 - 1.0 | Creative writing, brainstorming |
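
To see why low temperatures give more consistent output, the sketch below applies the textbook temperature-scaled softmax to a few made-up token scores. It illustrates the general idea only and does not claim to match the model's internal sampling.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; treat 0 as greedy argmax")
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.1, 0.7, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At `0.1` nearly all probability mass lands on the top-scoring token, while at `1.0` the alternatives keep a substantial share. That is the behavior the table above summarizes.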

### Final Model Configuration

```python
model = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-5-20250929-v1:0",
    temperature=0.1,                          # Optimization 4
    max_tokens=1024,                          # Optimization 2
    stop_sequences=["###", "END_RESPONSE"],   # Optimization 3
)
```

In [6]:
# Review the v2 agent code
agent_file = Path("agents/v2_quick_wins.py")
print(agent_file.read_text())

"""
V2 Quick Wins Agent - Low-effort optimizations.
- Concise system prompt
- max_tokens limit
- stop_sequences
- Low temperature for accuracy
"""

import base64
import os
from bedrock_agentcore.runtime import BedrockAgentCoreApp
from dotenv import load_dotenv
from strands import Agent
from strands.models import BedrockModel
from strands.telemetry import StrandsTelemetry

import sys
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from utils.tools import get_return_policy, get_product_info, web_search, get_technical_support

load_dotenv()

# Langfuse configuration
langfuse_public_key = os.environ.get("LANGFUSE_PUBLIC_KEY")
langfuse_secret_key = os.environ.get("LANGFUSE_SECRET_KEY")
langfuse_base_url = os.environ.get("LANGFUSE_BASE_URL", "https://cloud.langfuse.com")
LANGFUSE_AUTH = base64.b64encode(f"{langfuse_public_key}:{langfuse_secret_key}".encode()).decode()

os.environ["LANGFUSE_PROJECT_NAME"] = "my-llm-project"
os.environ["DISABLE_ADOT_OBSERVABILIT

## Step 3: Deploy the Quick Wins Agent

In [None]:
agent_name = "customer_support_v2_quick_wins"
agent_file = str(Path("agents/v2_quick_wins.py").absolute())
requirements_file = str(Path("requirements-for-agentcore.txt").absolute())

# Clean up any existing build files from previous labs
for f in ["Dockerfile", ".dockerignore", ".bedrock_agentcore.yaml"]:
    p = Path(f)
    if p.exists():
        p.unlink()
        print(f"Removed existing: {f}")

print(f"Configuring agent: {agent_name}")
agentcore_runtime.configure(
    entrypoint=agent_file,
    auto_create_execution_role=True,
    auto_create_ecr=True,
    requirements_file=requirements_file,
    region=region,
    agent_name=agent_name,
)

In [8]:
# Modify Dockerfile for Langfuse
import re

dockerfile_path = Path("Dockerfile")
if dockerfile_path.exists():
    content = dockerfile_path.read_text()
    # Replace the opentelemetry-instrument wrapper with a direct python call,
    # capturing the module path so it is preserved in the new CMD line
    content, replacements = re.subn(
        r'CMD \["opentelemetry-instrument", "python", "-m", "([^"]+)"\]',
        r'CMD ["python", "-m", "\1"]',
        content,
    )
    # Only write and report success if the pattern actually matched
    if replacements:
        dockerfile_path.write_text(content)
        print("Dockerfile modified for Langfuse")
    else:
        print("Dockerfile already configured or using different format")
else:
    print("Dockerfile not found - will be created during deployment")

Dockerfile modified for Langfuse
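
To see what the substitution does, the sketch below runs the same pattern and replacement against a sample line. The module path `agents.v2_quick_wins` in the sample is a placeholder; the generated Dockerfile may use a different one.

```python
import re

# Same pattern and replacement as the Dockerfile cell above.
PATTERN = r'CMD \["opentelemetry-instrument", "python", "-m", "([^"]+)"\]'
REPLACEMENT = r'CMD ["python", "-m", "\1"]'

# Hypothetical Dockerfile line; the module path is a placeholder.
sample = 'CMD ["opentelemetry-instrument", "python", "-m", "agents.v2_quick_wins"]'
rewritten, count = re.subn(PATTERN, REPLACEMENT, sample)
print(rewritten, count)

# A CMD line in a different format is left unchanged (zero substitutions).
other = 'CMD ["python", "app.py"]'
unchanged, other_count = re.subn(PATTERN, REPLACEMENT, other)
print(unchanged, other_count)
```

`re.subn` returns the number of substitutions alongside the new text, which is a convenient way to tell whether the pattern matched at all.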


In [9]:
env_vars = {
    "LANGFUSE_BASE_URL": os.environ.get("LANGFUSE_BASE_URL"),
    "LANGFUSE_PUBLIC_KEY": os.environ.get("LANGFUSE_PUBLIC_KEY"),
    "LANGFUSE_SECRET_KEY": os.environ.get("LANGFUSE_SECRET_KEY"),
    "PYTHONUNBUFFERED": "1",
}

print("Deploying to AgentCore Runtime...")
launch_result = agentcore_runtime.launch(env_vars=env_vars, auto_update_on_conflict=True)
agent_arn = launch_result.agent_arn
print(f"Agent deployed: {agent_arn}")

🚀 Launching Bedrock AgentCore (cloud mode - RECOMMENDED)...
   • Deploy Python code directly to runtime
   • No Docker required (DEFAULT behavior)
   • Production-ready deployment

💡 Deployment options:
   • runtime.launch()                → Cloud (current)
   • runtime.launch(local=True)      → Local development
Memory disabled - skipping memory creation
Starting CodeBuild ARM64 deployment for agent 'customer_support_v2_quick_wins' to account 739907928487 (us-east-1)
Setting up AWS resources (ECR repository, execution roles)...
Getting or creating ECR repository for agent: customer_support_v2_quick_wins


Deploying to AgentCore Runtime...


ECR repository available: 739907928487.dkr.ecr.us-east-1.amazonaws.com/bedrock-agentcore-customer_support_v2_quick_wins
Getting or creating execution role for agent: customer_support_v2_quick_wins
Using AWS region: us-east-1, account ID: 739907928487
Role name: AmazonBedrockAgentCoreSDKRuntime-us-east-1-449c4d3c64


✅ Reusing existing ECR repository: 739907928487.dkr.ecr.us-east-1.amazonaws.com/bedrock-agentcore-customer_support_v2_quick_wins


✅ Reusing existing execution role: arn:aws:iam::739907928487:role/AmazonBedrockAgentCoreSDKRuntime-us-east-1-449c4d3c64
Execution role available: arn:aws:iam::739907928487:role/AmazonBedrockAgentCoreSDKRuntime-us-east-1-449c4d3c64
Preparing CodeBuild project and uploading source...
Getting or creating CodeBuild execution role for agent: customer_support_v2_quick_wins
Role name: AmazonBedrockAgentCoreSDKCodeBuild-us-east-1-449c4d3c64
Reusing existing CodeBuild execution role: arn:aws:iam::739907928487:role/AmazonBedrockAgentCoreSDKCodeBuild-us-east-1-449c4d3c64
Using dockerignore.template with 46 patterns for zip filtering
Uploaded source to S3: customer_support_v2_quick_wins/source.zip
Updated CodeBuild project: bedrock-agentcore-customer_support_v2_quick_wins-builder
Starting CodeBuild build (this may take several minutes)...
Starting CodeBuild monitoring...
🔄 QUEUED started (total: 0s)
✅ QUEUED completed in 11.5s
🔄 PROVISIONING started (total: 12s)
✅ PROVISIONING complete

Agent deployed: arn:aws:bedrock-agentcore:us-east-1:739907928487:runtime/customer_support_v2_quick_wins-WwfeJI2Ll9
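
Because `os.environ.get` returns `None` for unset variables, the `env_vars` dictionary can end up with empty Langfuse settings without any error at this point. A minimal pre-flight check might look like the sketch below. `find_missing` is a hypothetical helper and the example values are made up; how `launch` itself treats `None` values is not shown in this notebook.

```python
from typing import Optional


def find_missing(env_vars: dict[str, Optional[str]]) -> list[str]:
    """Return the names whose values are None or empty, sorted for stable output."""
    return sorted(name for name, value in env_vars.items() if not value)


# Made-up values for illustration; in the notebook these come from os.environ.get(...).
example_env = {
    "LANGFUSE_BASE_URL": "https://langfuse.example.com",
    "LANGFUSE_PUBLIC_KEY": None,
    "LANGFUSE_SECRET_KEY": "",
    "PYTHONUNBUFFERED": "1",
}

missing = find_missing(example_env)
if missing:
    print(f"Missing before launch: {', '.join(missing)}")
```

Running a check like this before `launch` surfaces a missing key immediately rather than after a multi-minute build.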


In [None]:
# Save the agent ARN for later use
agent_arn = launch_result.agent_arn
print(f"Agent ARN: {agent_arn}")

## Step 4: Test the Optimized Agent

Run the same test scenarios as the baseline to compare metrics.

In [10]:
def invoke_agent(prompt):
    """Invoke the agent via AgentCore API."""
    response = data_client.invoke_agent_runtime(
        agentRuntimeArn=agent_arn,
        runtimeSessionId=str(uuid.uuid4()),
        payload=json.dumps({"prompt": prompt}).encode(),
    )
    return json.loads(response["response"].read().decode("utf-8"))
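
`invoke_agent` assumes the response body is always valid JSON. A more defensive variant could fall back to the raw text, as in the sketch below. `decode_agent_response` is a hypothetical helper, and `io.BytesIO` stands in for the file-like `response["response"]` object used in the cell above.

```python
import io
import json
from typing import Any, BinaryIO


def decode_agent_response(stream: BinaryIO) -> Any:
    """Read a byte stream and parse it as JSON, falling back to the raw text."""
    raw = stream.read().decode("utf-8")
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Not JSON (for example a plain-text error message): return it unchanged
        return raw


# Simulated response bodies for illustration.
json_body = io.BytesIO(json.dumps({"answer": "30 days"}).encode())
text_body = io.BytesIO(b"plain text reply")

print(decode_agent_response(json_body))
print(decode_agent_response(text_body))
```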

In [11]:
# Import Langfuse metrics helper
from utils.langfuse_metrics import (
    clear_metrics,
    collect_metric,
    get_latest_trace_metrics,
    print_metrics,
    print_metrics_table,
)

# Clear any previously collected metrics
clear_metrics()

# Standard test prompts - each demonstrates a specific tool usage pattern
TEST_PROMPTS = [
    # Single tool: get_return_policy
    ("Return Policy", "What is your return policy for laptops?"),
    # Single tool: get_product_info
    ("Product Info", "Tell me about your smartphone options"),
    # Single tool: get_technical_support (Bedrock KB)
    ("Technical Support", "My laptop won't turn on, can you help me troubleshoot?"),
    # Multi-tool: get_product_info + get_return_policy
    ("Multi-part Question", "I want to buy a laptop. What are the specs and what's the return policy?"),
    # No tool: General greeting
    ("General Question", "Hello! What can you help me with today?"),
]

# Run all tests and collect metrics
for test_name, prompt in TEST_PROMPTS:
    print("=" * 60)
    print(f"Test: {test_name}")
    print("=" * 60)

    response = invoke_agent(prompt)
    print(response)

    # Fetch and collect metrics
    metrics = get_latest_trace_metrics(
        agent_name="customer-support-v2-quick-wins",
        wait_seconds=5,
        max_retries=5,
        timeout_seconds=120,
    )
    print_metrics(metrics, test_name)
    collect_metric(metrics, test_name)

Test: Return Policy
- **answer:** Our laptop return policy offers flexibility with the following details:

  **Return Window:** 30 days from purchase date
  
  **Requirements:**
  - Original packaging with all accessories included
  - No physical damage to the device
  - Receipt or order confirmation
  
  **Return Process:**
  - Submit through our online RMA portal, or
  - Bring to any TechMart store location
  
  **Refund Timeline:** 7-10 business days after we inspect the returned item
  
  **Shipping Costs:**
  - FREE return shipping for defective items
  - Customer pays shipping for change-of-mind returns
  
  **Restocking Fees:**
  - No fee for defective products
  - 15% restocking fee for change-of-mind returns
  
  **Warranty:** All laptops include a 1-year manufacturer warranty, with extended warranty options available at purchase.
  
  Would you like help starting a return or do you have questions about a specific laptop purchase?

- **category:** policy
- **confidence:** high

In [None]:
# Print summary table
print_metrics_table()

# Save metrics for comparison in later notebooks
from utils.langfuse_metrics import save_metrics
save_metrics("v2")

## Step 5: Compare with v1 (Baseline)

Load your saved metrics from Lab 01 (v1 baseline), or enter them manually, to compare cost, latency, and token usage.

In [None]:
from utils.langfuse_metrics import load_metrics, print_comparison

# Load metrics from Lab 01 (saved automatically when you ran print_metrics_table())
v1 = load_metrics("v1")

# Or enter manually if Lab 01 metrics weren't saved:
# v1 = {"total_cost": 0.1042, "avg_latency": 8.90, "total_input_tokens": 24523, "total_output_tokens": 2040}

# Print comparison (current metrics auto-calculated from collected)
print_comparison(
    prev_name="v1 (Baseline)",
    curr_name="v2 (Quick Wins)",
    prev_cost=v1["total_cost"],
    prev_latency=v1["avg_latency"],
    prev_input_tokens=v1["total_input_tokens"],
    prev_output_tokens=v1["total_output_tokens"],
)
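
If you want to see the arithmetic behind such a comparison, the sketch below computes the percent change per metric. It does not show how `print_comparison` works internally. The `v1` values are the manual example from the cell above, and the `v2` values are made-up placeholders, not measured results.

```python
def percent_change(previous: float, current: float) -> float:
    """Return the change from `previous` to `current` in percent (negative = reduction)."""
    return (current - previous) / previous * 100


# v1: the manual example values from the cell above.
v1_example = {"total_cost": 0.1042, "avg_latency": 8.90,
              "total_input_tokens": 24523, "total_output_tokens": 2040}
# v2: placeholder numbers for illustration only; substitute your own run.
v2_placeholder = {"total_cost": 0.08, "avg_latency": 8.0,
                  "total_input_tokens": 20000, "total_output_tokens": 1800}

changes = {
    key: percent_change(v1_example[key], v2_placeholder[key])
    for key in v1_example
}
for key, change in changes.items():
    print(f"{key:>20}: {change:+.1f}%")
```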

### Results Analysis

The comparison above shows the impact of simple, low-effort optimizations. By restructuring the system prompt and tuning model parameters, we achieved meaningful reductions in both token usage and cost without changing the agent's functionality.

**Key takeaways:**

- **Structure matters more than brevity.** A well-organized prompt with headers, bullet points, and clear sections helps the model locate instructions efficiently. This reduces both input tokens (shorter prompt) and output tokens (more focused responses).

- **Explicit constraints improve consistency.** Defining the expected output format and boundaries ("Do NOT...") leads to more predictable behavior across requests.

- **Model parameters are free optimizations.** Setting `max_tokens`, `temperature`, and `stop_sequences` costs nothing to implement but can meaningfully reduce token usage and improve response quality.

---

**Next:** In Lab 03, we'll explore **prompt caching**, a technique that can dramatically reduce costs when the same system prompt is used across multiple requests.

---

## Cleanup

To delete the agent deployed in this notebook, uncomment and run the following code.

In [None]:
# Uncomment to delete resources created in this lab
# agentcore_runtime.destroy(delete_ecr_repo=True)
# print(f"Deleted agent and ECR repository: {agent_name}")