# Amazon Bedrock Workshop: Getting Started with Amazon Bedrock APIs using Claude Sonnet 3.7

Welcome to this hands-on workshop where you'll learn how to interact with Amazon Bedrock APIs using Anthropic's Claude Sonnet 3.7 model. 

## What is Amazon Bedrock?

[Amazon Bedrock](https://aws.amazon.com/bedrock/) is a comprehensive, secure, and flexible service for building generative AI applications and agents. Amazon Bedrock connects you to leading foundation models (FMs), services to deploy and operate agents, and tools for fine-tuning, safeguarding, and optimizing models. It also provides knowledge bases that connect applications to your latest data, so you have everything you need to move quickly from experimentation to real-world deployment.

### Key Benefits:
- **Easy Integration**: Single API to access multiple foundation models
- **Serverless**: No infrastructure to manage
- **Security**: Your data stays within your AWS account
- **Scalability**: Automatically scales based on demand


## Anthropic Claude
[Claude](https://docs.claude.com/en/docs/about-claude/models/overview) is a family of state-of-the-art large language models developed by Anthropic. The linked overview introduces the current models and compares them with legacy models.


## Lab Overview

In this lab, you'll learn several Amazon Bedrock APIs:
1. Basic API calls using `invoke_model`
2. Streaming responses with `invoke_model_with_response_stream`
3. Conversational interactions using `converse` 
4. Streaming conversations with `converse_stream`

Let's explore each one with practical examples!

## Setup and Prerequisites

First, let's install the required Python library. We'll use `boto3`, the AWS SDK for Python, to interact with Bedrock APIs.

**Note**: Make sure you have enabled Claude Sonnet model access in the Amazon Bedrock console.

In [None]:
%pip install boto3 -q

## Import Required Libraries

Now let's import the libraries we'll need throughout this workshop.


In [None]:
import boto3
import json

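# Create a Bedrock runtime client in the session's configured region.
# If no region is configured, fall back to us-east-1; change this to the
# region where you enabled model access.
region_name = boto3.Session().region_name or "us-east-1"
bedrock = boto3.client("bedrock-runtime", region_name=region_name)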

print("Bedrock client created successfully!")
print(f"Region: {bedrock.meta.region_name}")

## Define Our Context Data

Throughout this lab, we'll use Amazon's Returns & Refunds policy as our context. This provides a practical, real-world scenario for our API examples.

**What this cell does:**
- Defines a context string containing Amazon's return policy information
- This context will be used in our prompts to give the AI model relevant information
- Using consistent context helps demonstrate how the same information can be used across different API methods


In [None]:
# Context from Amazon Returns & Refunds FAQ
returns_context = """
Amazon Returns & Refunds Policy:
- Most items can be returned within 30 days of receipt of delivery.
- You can initiate a return from 'Your Orders'.
- Refunds are issued to the original payment method once the item is received.
- Some items may not be returnable (e.g., perishable goods, digital products).
- A-to-Z Guarantee covers purchases from third-party sellers if the item doesn't arrive or isn't as described.
"""

print("Context data defined:")
print(returns_context)

## 1. Basic API Invocation with `invoke_model`

The `invoke_model` API is the most basic way to interact with Bedrock models. It sends a request and waits for the complete response.

### Understanding the Parameters

- **`modelId`**: Specifies which model to use (Claude Sonnet 3.7 in our case; the `us.` prefix in the ID used below selects a US cross-Region inference profile)
- **`body`**: Contains the request payload as JSON, including:
  - `anthropic_version`: API version for Anthropic models
  - `max_tokens`: Maximum number of tokens in the response
  - `messages`: Array of conversation messages with roles and content
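
As a minimal sketch of how these parameters fit together, the helper below assembles the JSON string that goes into `body`. The function name `build_claude_body` and its `context` argument are hypothetical conveniences for this lab, not part of boto3 or Bedrock.

```python
import json

def build_claude_body(prompt, max_tokens=1000, context=None):
    """Return the JSON string passed as the `body` argument of invoke_model."""
    # Optionally prepend context text to the question (a convention assumed for this sketch)
    content = f"{context}\n\nQuestion: {prompt}" if context else prompt
    payload = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": content}],
    }
    return json.dumps(payload)

body = build_claude_body("How can I return an item on Amazon?", max_tokens=200)
print(body)
```

The returned string can be passed directly as `body=` in a call to `bedrock.invoke_model`.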

### Expected Output
You'll receive a JSON response containing the model's answer about returning items on Amazon, along with metadata like token usage and stop reason.


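### Understanding Inference Configuration Parameters

The examples in this lab control generation with a few inference parameters. The names below are the ones used in the `inferenceConfig` of the Converse APIs; in the `invoke_model` request body for Claude, the equivalents are `max_tokens`, `temperature`, and `top_p`.

#### `maxTokens`
- Controls the maximum length of the response
- Higher values allow longer responses
- Output tokens are billed, so choose a limit that fits your use case and budget

#### `temperature` (0.0 - 1.0)
- **0.0**: Near-deterministic; strongly favors the most likely next token
- **0.3-0.5**: Good balance for most applications
- **0.7**: More creative and varied responses (used in the `converse_stream` example later in this lab)
- **1.0**: Very creative, potentially less coherent

#### `topP` (not used in this lab, but available)
- Limits sampling to the most likely tokens whose combined probability reaches `topP`
- Lower values = more focused responses
- Higher values = more diverse responses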
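
To build intuition for these two parameters, here is a toy illustration in plain Python. It is not how Bedrock or Claude implements sampling; the scores are made up, and the function names are hypothetical. It only shows the general idea: temperature sharpens or flattens a probability distribution, and top-p keeps the smallest set of most likely candidates.

```python
import math

def apply_temperature(logits, temperature):
    """Convert raw scores to probabilities; requires temperature > 0."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose probabilities sum to at least top_p."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    kept, cumulative = [], 0.0
    for index, p in ranked:
        kept.append(index)
        cumulative += p
        if cumulative >= top_p:
            break
    return sorted(kept)

logits = [2.0, 1.0, 0.5, 0.1]  # made-up scores for four candidate tokens
for t in (0.2, 0.7, 1.0):
    probs = apply_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

# Indices of the candidates that survive a top-p cutoff of 0.8 at temperature 1.0
print(top_p_filter(apply_temperature(logits, 1.0), 0.8))
```

Lower temperatures concentrate almost all probability on the top candidate, which is why low-temperature output varies less from run to run.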

In [None]:
prompt = """Question: How can I return an item on Amazon?"""

response = bedrock.invoke_model(
    modelId="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": [
            {
                "role": "user", 
                "content": prompt
            }
        ]
    })
)

result = json.loads(response["body"].read())
print(result)

### Understanding the Response

The response above contains several important fields:

- **`content`**: Array containing the actual response text
- **`usage`**: Token usage information (input and output tokens)
- **`stop_reason`**: Why the model stopped generating (usually "end_turn")
- **`model`**: Confirms which model was used

Let's extract just the text content for easier reading:

**What this cell does:**
- Checks if the response contains content
- Extracts the text from the first content block
- Displays it in a formatted way
- Shows token usage statistics

**Expected outcome:** You'll see Claude's response in a clean, readable format along with information about how many tokens were used.

In [None]:
# Extract and display just the text content
if result.get('content') and len(result['content']) > 0:
    response_text = result['content'][0]['text']
    print("Claude's Response:")
    print("=" * 50)
    print(response_text)
    print("=" * 50)
    print(f"Tokens used: {result['usage']['input_tokens']} input, {result['usage']['output_tokens']} output")
else:
    print("No content found in response")
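
The cell above reads only the first content block. As a hedged sketch, the hypothetical helper `summarize_response` below joins every text block and flags replies that were cut off because they hit `max_tokens`. The sample dictionary is hand-written to match the fields described in this section; it is not real model output.

```python
def summarize_response(result):
    """Join all text blocks and flag responses that stopped at the max_tokens limit."""
    text = "".join(
        block.get("text", "")
        for block in result.get("content", [])
        if block.get("type") == "text"
    )
    truncated = result.get("stop_reason") == "max_tokens"
    usage = result.get("usage", {})
    total_tokens = usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
    return {"text": text, "truncated": truncated, "total_tokens": total_tokens}

# Hand-written sample shaped like an invoke_model result (illustrative only)
sample = {
    "content": [
        {"type": "text", "text": "Go to Your Orders"},
        {"type": "text", "text": " and select the item."},
    ],
    "stop_reason": "max_tokens",
    "usage": {"input_tokens": 12, "output_tokens": 8},
}
print(summarize_response(sample))
```

If `truncated` is true, consider raising `max_tokens` and sending the request again.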

## 2. Streaming Responses with `invoke_model_with_response_stream`

Sometimes you want to see the response as it's being generated, especially for longer responses. This creates a better user experience in interactive applications.

### When to use:
- Interactive chat applications
- Long responses where users want to see progress
- Better user experience with immediate feedback

### How it works:
- Response comes in chunks
- Each chunk contains a piece of the text
- You process chunks as they arrive

### What this cell does:
- Uses our returns context to provide background information
- Asks a specific question about Amazon's refund policy
- Uses `invoke_model_with_response_stream` instead of `invoke_model`
- Processes streaming chunks in real-time
- Displays text as it arrives from the model

### Expected Output
You'll see the response appear incrementally, a few tokens at a time, producing a typing effect. This demonstrates how streaming works in real-time applications.

In [None]:
prompt = f"""{returns_context}

Question: What is Amazon's refund policy?
"""

response = bedrock.invoke_model_with_response_stream(
    modelId="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 200,
        "messages": [
            {
                "role": "user",
                "content": prompt
            }
        ]
    })
)

print("Streaming response:")
print("=" * 50)

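for event in response["body"]:
    if "chunk" in event:
        # Each chunk carries one JSON-encoded event from the model
        chunk = json.loads(event["chunk"]["bytes"].decode("utf-8"))
        if chunk["type"] == "content_block_delta":
            # Print each text fragment immediately, without a trailing newline
            print(chunk["delta"].get("text", ""), end="", flush=True)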

print("\n" + "=" * 50)
print("Streaming complete!")

### Understanding Streaming Events

When using streaming, the response comes as a series of events:

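- **`chunk`**: Wrapper whose `bytes` field holds one JSON-encoded event
- **`type`**: The event type, such as `message_start`, `content_block_delta`, or `message_stop`; the text arrives in `content_block_delta` events
- **`delta`**: Contains the incremental text to append to what you have so far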
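
The event structure above can be wrapped in a small generator so the parsing logic is reusable and testable without calling the service. This is a sketch: `iter_text_deltas` and `fake_event` are hypothetical helper names, and the fake stream is hand-built to mimic the event shape.

```python
import json

def iter_text_deltas(events):
    """Yield text fragments from invoke_model_with_response_stream events."""
    for event in events:
        if "chunk" not in event:
            continue  # skip anything that is not a data chunk
        chunk = json.loads(event["chunk"]["bytes"].decode("utf-8"))
        if chunk.get("type") == "content_block_delta":
            text = chunk.get("delta", {}).get("text")
            if text:
                yield text

def fake_event(payload):
    """Build an event shaped like the real stream (for offline experiments only)."""
    return {"chunk": {"bytes": json.dumps(payload).encode("utf-8")}}

fake_stream = [
    fake_event({"type": "message_start"}),
    fake_event({"type": "content_block_delta",
                "delta": {"type": "text_delta", "text": "Refunds go "}}),
    fake_event({"type": "content_block_delta",
                "delta": {"type": "text_delta", "text": "to the original payment method."}}),
    fake_event({"type": "message_stop"}),
]

full_text = "".join(iter_text_deltas(fake_stream))
print(full_text)
```

With a real response, `iter_text_deltas(response["body"])` would yield the same kind of fragments.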

### Streaming vs Non-Streaming Comparison

| Aspect | `invoke_model` | `invoke_model_with_response_stream` |
|--------|----------------|------------------------------------|
| **Response Time** | Wait for complete response | Immediate feedback |
| **User Experience** | All-at-once | Progressive display |
| **Complexity** | Simple | Requires event handling |
| **Best For** | Batch processing | Interactive apps |
| **Token Limits** | Same limits apply | Same limits apply |
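
To make the "Response Time" row concrete, the sketch below measures how long it takes to receive the first fragment versus the whole reply. It uses a simulated stream with artificial pauses, so the numbers say nothing about real Bedrock latency; `measure_stream` and `simulated_stream` are hypothetical names.

```python
import time

def measure_stream(chunks):
    """Return (seconds until first fragment, total seconds, joined text) for any iterable of text."""
    start = time.perf_counter()
    first = None
    parts = []
    for piece in chunks:
        if first is None:
            first = time.perf_counter() - start
        parts.append(piece)
    total = time.perf_counter() - start
    return first, total, "".join(parts)

def simulated_stream():
    """Stand-in for a model stream: three fragments with a short pause before each."""
    for piece in ("Refunds ", "are ", "issued."):
        time.sleep(0.01)
        yield piece

first, total, text = measure_stream(simulated_stream())
print(f"first fragment after {first:.3f}s, complete after {total:.3f}s")
```

In an interactive app, the time to the first fragment is what the user perceives as responsiveness, even though the total time is similar for both APIs.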

## 3. Conversational API with `converse`

The `converse` method provides a more structured way to handle conversations. It's designed specifically for multi-turn conversations and offers better support for system messages and conversation context.

### Advantages:
- Simpler syntax
- Consistent across different models
- Better support for conversations
- Separate system and user messages
- Built-in inference configuration

### Key Features:
- `system`: For setting context and behavior
- `messages`: For the actual conversation
- `inferenceConfig`: For model parameters
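
As a minimal sketch of how these three pieces combine, the hypothetical helper below assembles the keyword arguments for a Converse call. The system prompt text is an illustrative placeholder.

```python
def build_converse_request(model_id, system_text, user_text,
                           max_tokens=500, temperature=0.5, top_p=None):
    """Assemble keyword arguments for converse or converse_stream."""
    inference_config = {"maxTokens": max_tokens, "temperature": temperature}
    if top_p is not None:
        inference_config["topP"] = top_p  # only include topP when it is set
    return {
        "modelId": model_id,
        "system": [{"text": system_text}],
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
        "inferenceConfig": inference_config,
    }

request = build_converse_request(
    "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    "You answer questions about the returns policy.",  # placeholder system prompt
    "How do I cancel an Amazon order?",
    max_tokens=300,
)
print(request["inferenceConfig"])
```

The resulting dictionary can be unpacked into the call, for example `bedrock.converse(**request)`.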

### What this cell does:
- Uses the `converse` method instead of `invoke_model`
- Passes our returns context as a system message
- Asks a question about canceling Amazon orders
- Demonstrates the cleaner conversation structure

### Expected Output
You'll receive a response about canceling Amazon orders, with the context provided through system messages. Notice how the API structure is cleaner and more intuitive.

In [None]:
response = bedrock.converse(
    modelId="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    system=[{"text": returns_context}],
    messages=[
        {"role": "user", "content": [{"text": "How do I cancel an Amazon order?"}]}
    ]
)

# The reply is a list of content blocks; print the text of each one
for block in response["output"]["message"]["content"]:
    if "text" in block:
        print(block["text"])
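
The example above sends a single turn. For a multi-turn conversation, the usual pattern is to keep a history list and append both the user message and the assistant reply before the next call. The sketch below shows that pattern; `ask` and `StubClient` are hypothetical names, and the stub stands in for the real client so the flow can be tried offline.

```python
def ask(client, model_id, system_text, history, question):
    """Send one user turn through converse and record both sides in history."""
    history.append({"role": "user", "content": [{"text": question}]})
    response = client.converse(
        modelId=model_id,
        system=[{"text": system_text}],
        messages=history,
    )
    reply = response["output"]["message"]
    history.append(reply)  # keep the assistant turn so the next call has context
    return "".join(block.get("text", "") for block in reply["content"])

class StubClient:
    """Offline stand-in for the Bedrock runtime client (illustration only)."""
    def converse(self, modelId, system, messages):
        turn = sum(1 for m in messages if m["role"] == "user")
        return {"output": {"message": {
            "role": "assistant",
            "content": [{"text": f"reply {turn}"}],
        }}}

history = []
stub = StubClient()
first_reply = ask(stub, "example-model-id", "policy text", history, "How do I start a return?")
second_reply = ask(stub, "example-model-id", "policy text", history, "How long does the refund take?")
print(first_reply, "|", second_reply, "|", len(history), "messages in history")
```

To use it against the service, pass the `bedrock` client and the model ID from this lab in place of the stub.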

## 4. Streaming Conversations with `converse_stream`

The `converse_stream` method combines the best of both worlds: the structured conversation format of `converse` with the real-time streaming of `invoke_model_with_response_stream`.

### What this cell does:
- Uses `converse_stream` for real-time streaming with conversation structure
- Includes system context about Amazon's return policy
- Asks about Amazon's A-to-Z Guarantee
- Sets inference parameters to control response generation
- Displays the response as it streams in real-time

### Expected Output
You'll see a streaming explanation of Amazon's A-to-Z Guarantee appearing in real-time, demonstrating how to combine conversational structure with streaming responses.

In [None]:
response = bedrock.converse_stream(
    modelId="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    system=[{"text": returns_context}],
    messages=[
        {"role": "user", "content": [{"text": "Explain Amazon's A-to-Z Guarantee"}]}
    ],
    inferenceConfig={
        "maxTokens": 500,
        "temperature": 0.7
    }
)

print("Streaming conversational response:")
print("=" * 50)

for event in response["stream"]:
    if "contentBlockDelta" in event:
        # Print each text fragment as soon as it arrives
        print(event["contentBlockDelta"]["delta"].get("text", ""), end="", flush=True)

print("\n" + "=" * 50)
print("Streaming conversation complete!")
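
Besides text deltas, the `converse_stream` event stream also reports why generation stopped (`messageStop`) and token usage (`metadata`). The sketch below collects all three; `consume_converse_stream` is a hypothetical helper, and the sample events are hand-written to mimic the stream's shape rather than captured from a real call.

```python
def consume_converse_stream(stream):
    """Collect text, stop reason, and token usage from converse_stream events."""
    parts, stop_reason, usage = [], None, {}
    for event in stream:
        if "contentBlockDelta" in event:
            parts.append(event["contentBlockDelta"]["delta"].get("text", ""))
        elif "messageStop" in event:
            stop_reason = event["messageStop"].get("stopReason")
        elif "metadata" in event:
            usage = event["metadata"].get("usage", {})
    return "".join(parts), stop_reason, usage

# Hand-written events for offline experimentation (illustrative values)
sample_events = [
    {"messageStart": {"role": "assistant"}},
    {"contentBlockDelta": {"delta": {"text": "The A-to-Z Guarantee "}, "contentBlockIndex": 0}},
    {"contentBlockDelta": {"delta": {"text": "covers third-party purchases."}, "contentBlockIndex": 0}},
    {"contentBlockStop": {"contentBlockIndex": 0}},
    {"messageStop": {"stopReason": "end_turn"}},
    {"metadata": {"usage": {"inputTokens": 90, "outputTokens": 12, "totalTokens": 102}}},
]

text, stop_reason, usage = consume_converse_stream(sample_events)
print(text)
print(stop_reason, usage)
```

With a real response, call `consume_converse_stream(response["stream"])`; note that this version buffers the text instead of printing it as it arrives.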


Congratulations! You've successfully completed this lab. You now have hands-on experience with:

### ✅ What You've Learned:

1. **Basic API Invocation** - Using `invoke_model` for simple requests
2. **Streaming Responses** - Real-time text generation with `invoke_model_with_response_stream`
3. **Conversational APIs** - Structured conversations using `converse`
4. **Streaming Conversations** - Real-time chat with `converse_stream`


Let's continue with Lab 2, which covers prompt engineering.