# Amazon Bedrock Model Access and Inference using boto3

This notebook demonstrates how to:
1. Set up boto3 for Amazon Bedrock
1. Enable model access
1. Run basic inference requests
1. Use streaming inference
1. Enable extended thinking mode for complex reasoning
1. Use the Converse API

## Setup and Configuration

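First, let's import the required packages. This assumes `boto3` is already installed in your environment (for example, via `pip install boto3`). We set up AWS credentials in the next step.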

In [1]:
import boto3
import json
import time
import uuid

Create a session mapped to your `dev` profile. If you need help setting up a profile, see the accompanying video for an end-to-end walkthrough.

In [2]:
# Configure AWS credentials using boto3 session
session = boto3.Session(region_name='us-east-1',profile_name='dev')  # Change to your preferred region

## Create Bedrock clients

We'll create two clients:
1. `bedrock` - The management client for enabling model access
2. `bedrock_runtime` - The runtime client for making inference requests

In [3]:
# Create Bedrock management client for enabling model access
bedrock = session.client(service_name='bedrock')

# Create Bedrock runtime client for inference
bedrock_runtime = session.client(service_name='bedrock-runtime')

## List available foundation models

Before enabling access, let's see which foundation models are available in Bedrock.

In [4]:
# List foundation models available in Bedrock
response = bedrock.list_foundation_models()

print("Available Foundation Models:\n")
for model in response['modelSummaries'][:5]:  # Showing first 5 models as an example
    print(f"Model ID: {model['modelId']}")
    print(f"Provider Name: {model['providerName']}")
    print(f"Model Name: {model['modelName']}")
    print("-------------------------------------")

Available Foundation Models:

Model ID: amazon.titan-tg1-large
Provider Name: Amazon
Model Name: Titan Text Large
-------------------------------------
Model ID: amazon.titan-image-generator-v1:0
Provider Name: Amazon
Model Name: Titan Image Generator G1
-------------------------------------
Model ID: amazon.titan-image-generator-v1
Provider Name: Amazon
Model Name: Titan Image Generator G1
-------------------------------------
Model ID: amazon.titan-image-generator-v2:0
Provider Name: Amazon
Model Name: Titan Image Generator G1 v2
-------------------------------------
Model ID: amazon.titan-text-premier-v1:0
Provider Name: Amazon
Model Name: Titan Text G1 - Premier
-------------------------------------
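
The full list is long, so it can help to narrow it down before choosing a model. The sketch below filters entries shaped like those in `response['modelSummaries']` by provider and by streaming support. The helper name `filter_models` and the sample entries are hypothetical and only illustrate the shape of the data. `providerName` and `responseStreamingSupported` are fields of the model summaries returned by `list_foundation_models`.

```python
def filter_models(model_summaries, provider=None, streaming_only=False):
    """Return the summaries that match the given provider and streaming flag."""
    selected = []
    for model in model_summaries:
        if provider and model.get("providerName") != provider:
            continue
        if streaming_only and not model.get("responseStreamingSupported", False):
            continue
        selected.append(model)
    return selected


# Hypothetical entries shaped like response['modelSummaries'] (not real models)
sample_summaries = [
    {"modelId": "example.text-model-v1", "providerName": "ExampleCo",
     "responseStreamingSupported": True},
    {"modelId": "example.image-model-v1", "providerName": "ExampleCo",
     "responseStreamingSupported": False},
    {"modelId": "other.text-model-v1", "providerName": "OtherCo",
     "responseStreamingSupported": True},
]

streaming_models = filter_models(sample_summaries, provider="ExampleCo", streaming_only=True)
print([m["modelId"] for m in streaming_models])
```

With a real client you would pass `response['modelSummaries']` instead of `sample_summaries`.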


In [5]:
# List inference profiles available in Bedrock
response = bedrock.list_inference_profiles()

print("Available Inference Profiles:\n")
for profile in response['inferenceProfileSummaries'][:5]:  # Showing first 5 profiles as an example
    print(f"Inference Profile Name: {profile['inferenceProfileName']}")
    print(f"Description: {profile['description']}")
    print(f"Inference Profile ID: {profile['inferenceProfileId']}")
    print("-------------------------------------")

Available Inference Profiles:

Inference Profile Name: US Anthropic Claude 3 Sonnet
Description: Routes requests to Anthropic Claude 3 Sonnet in us-east-1 and us-west-2.
Inference Profile ID: us.anthropic.claude-3-sonnet-20240229-v1:0
-------------------------------------
Inference Profile Name: US Anthropic Claude 3 Opus
Description: Routes requests to Anthropic Cluade 3 Opus in us-east-1 and us-west-2.
Inference Profile ID: us.anthropic.claude-3-opus-20240229-v1:0
-------------------------------------
Inference Profile Name: US Anthropic Claude 3 Haiku
Description: Routes requests to Anthropic Claude 3 Haiku in us-east-1 and us-west-2.
Inference Profile ID: us.anthropic.claude-3-haiku-20240307-v1:0
-------------------------------------
Inference Profile Name: US Meta Llama 3.2 11B Instruct
Description: Routes requests to Meta Llama 3.2 11B Instruct in us-east-1 and us-west-2.
Inference Profile ID: us.meta.llama3-2-11b-instruct-v1:0
-------------------------------------
Inference Prof

## Enable access to a specific model

Now, let's enable access to a specific model. For this example, we'll use Claude 3.7 Sonnet. Navigate to the Bedrock service in the Management Console and follow along with the screenshots below.

On the landing page of Bedrock service, scroll all the way down to the Bedrock Configurations section on the left.

![Model Access First Step](images/model-access-1.png)

Then filter for "Sonnet" and confirm that you see the Claude 3.7 Sonnet model.

![Model Access Second Step](images/model-access-2.png)

Filter by "Sonnet" again on the next page and make sure you've checked its checkbox before proceeding.

![Model Access Third Step](images/model-access-3.png)

Review and submit your request.

![Model Access Fourth Step](images/model-access-4.png)


## Send an inference request to the model

Now that we have access to the model, let's send an inference request. Different model providers expect different request formats, so the helper below builds the request body in the format that Anthropic models expect.

In [6]:
# Define a helper function to create the appropriate request body based on model provider
def create_request_body(model_id, prompt, max_tokens=1000, temperature=0.7, thinking_mode="standard", thinking_budget=8000):
    if "anthropic." in model_id:
        # Claude models (Anthropic)
        request = {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "temperature": temperature,
            "messages": [
                {"role": "user", "content": prompt}
            ]
        }
        
        # Add thinking mode if requested (this check only matches Claude 3.7 model IDs)
        if "anthropic.claude-3-7" in model_id and thinking_mode == "extended":
            request["thinking"] = {
                "type": "enabled",
                "budget_tokens": thinking_budget
            }
            
        return json.dumps(request)
    else:
        raise ValueError(f"Unsupported model ID: {model_id}")

In [7]:
# Define a helper function to parse the model response
def parse_response(model_id, response):
    response_body = json.loads(response['body'].read().decode('utf-8'))
    return response_body['content']
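
`parse_response` returns the raw list of content blocks, which is why the printed result further down is a list of dictionaries rather than plain text. As a minimal sketch, a small helper can join only the `text` blocks. The name `extract_text` and the sample blocks are hypothetical; the block shapes mirror the outputs shown in this notebook.

```python
def extract_text(content_blocks):
    """Concatenate the text of all blocks whose type is 'text'."""
    return "".join(
        block.get("text", "")
        for block in content_blocks
        if block.get("type") == "text"
    )


# Hypothetical content blocks shaped like the model output in this notebook
sample_blocks = [
    {"type": "thinking", "thinking": "Work through the question first."},
    {"type": "text", "text": "Here is the answer."},
]
print(extract_text(sample_blocks))
```

With a real response you would call `extract_text(parse_response(model_id, response))`.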

Since on-demand Claude 3.7 Sonnet is currently available only through inference profiles, we have to provide an inference profile ID below. If you specify the model ID instead, you'll get the following error:

![Throughput Error](images/throughput-error.png)
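
To find the profile that fronts a given model, you can search the summaries returned by `list_inference_profiles`. The sketch below assumes each summary carries a `models` list of `modelArn` entries and that a foundation-model ARN ends with `/<model ID>`. The helper name `find_profiles_for_model` and the sample summary are hypothetical.

```python
def find_profiles_for_model(profile_summaries, model_id):
    """Return the IDs of inference profiles that route to the given model ID."""
    matches = []
    for profile in profile_summaries:
        for model in profile.get("models", []):
            # Assumption: a foundation-model ARN ends with "/<model id>"
            if model.get("modelArn", "").endswith("/" + model_id):
                matches.append(profile["inferenceProfileId"])
                break
    return matches


# Hypothetical summary shaped like response['inferenceProfileSummaries']
sample_profiles = [
    {
        "inferenceProfileId": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
        "models": [
            {"modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-7-sonnet-20250219-v1:0"},
            {"modelArn": "arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-3-7-sonnet-20250219-v1:0"},
        ],
    },
]

print(find_profiles_for_model(sample_profiles, "anthropic.claude-3-7-sonnet-20250219-v1:0"))
```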

In [8]:
# Define the model ID or inference profile ID that you want to use
model_id = "us.anthropic.claude-3-7-sonnet-20250219-v1:0"  # Change to your desired model

# Test a simple inference request
prompt = "How would you distinguish between understanding language and simulating understanding?"

# Create the request body based on the model provider
request_body = create_request_body(model_id, prompt)

# Send the inference request
response = bedrock_runtime.invoke_model(
    modelId=model_id,
    body=request_body
)

# Parse and display the response
result = parse_response(model_id, response)
print("Model Response:\n")
print(result)

Model Response:

[{'type': 'text', 'text': "To distinguish between understanding language and simulating understanding:\n\nTrue understanding involves:\n- Grasping meaning beyond statistical patterns\n- Connecting language to real-world concepts and experiences\n- Drawing appropriate inferences and implications\n- Applying knowledge flexibly to new situations\n- Having awareness of what is understood and what isn't\n\nSimulating understanding involves:\n- Recognizing patterns without grasping underlying concepts\n- Producing seemingly appropriate responses based on training data\n- Creating an impression of comprehension without actual conceptual grounding\n- Following algorithmic procedures without semantic awareness\n- Lacking true metacognition about knowledge boundaries\n\nThe challenge is that these distinctions are difficult to assess from external behavior alone. Even advanced language models like myself can produce outputs that appear to demonstrate understanding while operatin

## Streaming inference example

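Some models support streaming responses, which return the output incrementally as it is generated. This is useful for long responses, because you can display text as it arrives instead of waiting for the full completion. Let's see how to implement that.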

In [9]:
# Example of streaming inference (supported by some models)
prompt = "Can a system trained on human texts develop novel ideas, or is it fundamentally limited to recombining existing human thoughts?"

# For Claude models, we need to adjust the request body for streaming
if "anthropic." in model_id:
    request_body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "temperature": 0.7,
        "messages": [
            {"role": "user", "content": prompt}
        ]
    })
    
    # Send the streaming inference request
    response = bedrock_runtime.invoke_model_with_response_stream(
        modelId=model_id,
        body=request_body
    )
    
    # Process the streaming response and print individual chunks
    stream = response['body']
    collected_response = ""
    n = 1
    print("*"*90)
    print("Streaming chunks:\n")
    print("*"*90)

    for event in stream:
        if 'chunk' in event:
            chunk_data = json.loads(event['chunk']['bytes'].decode('utf-8'))
            if chunk_data.get('type') == 'content_block_delta':
                chunk_text = chunk_data.get('delta', {}).get('text', "")
                print(chunk_text)
                print("-"*10 + f" Chunk {n} " + "-"*10)
                collected_response += chunk_text
                n += 1
    # Print a consolidated response
    print("*"*90)
    print("Consolidated streaming response:\n")
    print("*"*90)
    print(collected_response + "...\n")

******************************************************************************************
Streaming chunks:

******************************************************************************************
#
---------- Chunk 1 ----------
 AI
---------- Chunk 2 ----------
 and Novel
---------- Chunk 3 ----------
 Ideas
---------- Chunk 4 ----------


This
---------- Chunk 5 ----------
 is a fascinating philosophical
---------- Chunk 6 ----------
 question about the nature
---------- Chunk 7 ----------
 of creativity
---------- Chunk 8 ----------
 and intelligence.
---------- Chunk 9 ----------


AI
---------- Chunk 10 ----------
 systems like
---------- Chunk 11 ----------
 me
---------- Chunk 12 ----------
 are
---------- Chunk 13 ----------
 fundament
---------- Chunk 14 ----------
ally pattern
---------- Chunk 15 ----------
 recognition
---------- Chunk 16 ----------
 and prediction
---------- Chunk 17 ----------
 systems
---------- Chunk 18 ----------
 traine
---------- Chunk 19 --------
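
The event handling in the loop above can also be isolated into a generator, which makes it easier to reuse and to test without calling the service. The sketch below is one way to do that. The names `iter_text_deltas` and `fake_event` are hypothetical, and the sample events are hand-built to mimic the `chunk` / `bytes` layout used in the cell above.

```python
import json


def iter_text_deltas(stream):
    """Yield the text of each text delta found in a Bedrock response stream."""
    for event in stream:
        chunk = event.get("chunk")
        if not chunk:
            continue
        data = json.loads(chunk["bytes"].decode("utf-8"))
        if data.get("type") != "content_block_delta":
            continue
        delta = data.get("delta", {})
        # Skip non-text deltas (for example, thinking deltas)
        if delta.get("type") == "text_delta":
            yield delta.get("text", "")


def fake_event(payload):
    """Build a hand-made event for illustration (not produced by the service)."""
    return {"chunk": {"bytes": json.dumps(payload).encode("utf-8")}}


sample_stream = [
    fake_event({"type": "message_start"}),
    fake_event({"type": "content_block_delta",
                "delta": {"type": "text_delta", "text": "Hello"}}),
    fake_event({"type": "content_block_delta",
                "delta": {"type": "text_delta", "text": ", world"}}),
    fake_event({"type": "message_stop"}),
]

for piece in iter_text_deltas(sample_stream):
    print(piece, end="", flush=True)  # print pieces on one line as they arrive
print()
```

With a real response you would iterate over `iter_text_deltas(response['body'])`. Printing with `end=""` shows the text as one continuous answer instead of one chunk per line.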

## Extended Thinking Mode

Claude 3.7 models support an extended thinking mode that enables more thorough reasoning for complex tasks. Let's use it by calling the `create_request_body` helper defined earlier with `thinking_mode="extended"`.

**Note:** The temperature inference parameter must always be set to 1 when thinking is enabled.
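
A small pre-flight check can catch invalid combinations before a request is sent. This is a minimal sketch: the temperature rule comes from the note above, while the minimum budget of 1024 tokens and the requirement that the budget stay below `max_tokens` are assumptions based on Anthropic's extended thinking documentation and should be verified for your model. The helper name `check_thinking_params` is hypothetical.

```python
MIN_THINKING_BUDGET = 1024  # assumed minimum; verify against current documentation


def check_thinking_params(max_tokens, temperature, thinking_budget):
    """Return a list of problems found in the extended thinking settings."""
    problems = []
    if temperature != 1:
        problems.append("temperature must be 1 when thinking is enabled")
    if thinking_budget < MIN_THINKING_BUDGET:
        problems.append(f"thinking_budget should be at least {MIN_THINKING_BUDGET}")
    if thinking_budget >= max_tokens:
        problems.append("thinking_budget should be smaller than max_tokens")
    return problems


# Settings used in the cell below: no problems expected
print(check_thinking_params(max_tokens=24000, temperature=1, thinking_budget=16000))

# Defaults of create_request_body combined with extended thinking
for problem in check_thinking_params(max_tokens=1000, temperature=0.7, thinking_budget=8000):
    print("Problem:", problem)
```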

In [10]:
prompt = "Are there an infinite number of prime numbers such that n mod 4 == 3?"

request_body = create_request_body(model_id, prompt, max_tokens=24000, temperature=1, thinking_mode="extended", thinking_budget=16000)

# print(request_body)
# Send the inference request
response = bedrock_runtime.invoke_model(
    modelId=model_id,
    body=request_body
)

# Parse and display the response
result = parse_response(model_id, response)
print("Model Response:\n")
for element in result:
    print("*"*90)
    print(element)
    print("*"*90)

Model Response:

******************************************************************************************
{'type': 'thinking', 'thinking': "Let's think about this problem.\n\nWe want to know whether there are infinitely many primes $p$ such that $p \\equiv 3 \\pmod 4$. The primes $p$ such that $p \\equiv 3 \\pmod 4$ are often called primes of the form $4k + 3$. Some examples are:\n- $3 = 4 \\cdot 0 + 3$\n- $7 = 4 \\cdot 1 + 3$\n- $11 = 4 \\cdot 2 + 3$\n- $19 = 4 \\cdot 4 + 3$\n- $23 = 4 \\cdot 5 + 3$\n\nOne way to approach this is to use a classic proof technique that Euclid used to show there are infinitely many primes. The idea is to assume there are only finitely many, and then find a contradiction.\n\nSuppose there are only finitely many primes $p$ such that $p \\equiv 3 \\pmod 4$. Let's say these primes are $p_1, p_2, \\ldots, p_n$.\n\nWe then consider the number $N = 4 \\cdot p_1 \\cdot p_2 \\cdot \\ldots \\cdot p_n - 1$. This number has the form $4k - 1$, which is congruent to

## Converse
You can use the Amazon Bedrock Converse API to create conversational applications that send messages to and receive messages from an Amazon Bedrock model. The Converse API uses the same request structure and inference parameter names across model providers, which makes it easier to swap models without changing code.
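
Converse messages are dictionaries with a `role` and a list of content blocks. The sketch below shows that shape with two hypothetical helpers: `user_message` builds a user turn, and `roles_alternate` checks that a conversation starts with a user turn and then alternates between `user` and `assistant`. The alternation requirement is stated here as an assumption about how the service validates conversations.

```python
def user_message(text):
    """Build a user message in the shape the Converse API expects."""
    return {"role": "user", "content": [{"text": text}]}


def roles_alternate(messages):
    """Check that the conversation starts with 'user' and roles alternate."""
    expected = "user"
    for message in messages:
        if message["role"] != expected:
            return False
        expected = "assistant" if expected == "user" else "user"
    return True


conversation = [
    user_message("First question"),
    {"role": "assistant", "content": [{"text": "First answer"}]},
    user_message("Follow-up question"),
]
print(roles_alternate(conversation))
```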

In [15]:
def generate_conversation(bedrock_client,
                          model_id,
                          system_prompts,
                          messages,
                          thinking_mode = 'standard',
                          max_tokens = 10000,
                          thinking_budget = 8000,
                          temperature = 0.7,
                          top_k = 200):

    print(f'Generating message with model {model_id}')

    # Base inference parameters to use which are common across all FMs.
    inference_config = {"temperature": temperature,
                        "maxTokens": max_tokens}

    # Additional inference parameters to use for Anthropic Claude Models.
    additional_model_fields = {"top_k": top_k}
    
    # Add thinking mode if using extended thinking
    if thinking_mode == "extended":
        # Temperature must be 1 when thinking is enabled
        inference_config["temperature"] = 1
        del additional_model_fields["top_k"] # top K must be unset when using extended mode
        additional_model_fields["thinking"] = {
            "type": "enabled",
            "budget_tokens": thinking_budget
        }

    # Send the message.
    response = bedrock_client.converse(
        modelId=model_id,
        messages=messages,
        system=system_prompts,
        inferenceConfig=inference_config,
        additionalModelRequestFields=additional_model_fields
    )

    # Log token usage.
    token_usage = response['usage']
    print(f"Input tokens: {token_usage['inputTokens']}")
    print(f"Output tokens: {token_usage['outputTokens']}")
    print(f"Total tokens: {token_usage['totalTokens']}")
    print(f"Stop reason: {response['stopReason']}")

    return response

In [16]:
# Setup the system prompts and messages to send to the model.
system_prompts = [{"text": "You are a philosopher delving into the study of consciousness. "
                    "Only discuss safe philosophical topics."}]
message_1 = {
    "role": "user",
    "content": [{"text": "How would you distinguish between understanding language and simulating understanding?"}]
}
message_2 = {
    "role": "user",
    "content": [{"text": "Is there any potential interesting angle that you haven't considered?"}]
}
messages = []

# Start the conversation with the 1st message.
messages.append(message_1)
response = generate_conversation(
    bedrock_runtime, model_id, system_prompts, messages)

# Add the response message to the conversation.
output_message = response['output']['message']
messages.append(output_message)

# Continue the conversation with the 2nd message.
messages.append(message_2)
response = generate_conversation(
    bedrock_runtime, model_id, system_prompts, messages)

output_message = response['output']['message']
messages.append(output_message)

# Show the complete conversation.
for message in messages:
    print(f"Role: {message['role']}")
    for content in message['content']:
        print(f"Text: {content['text']}")

Generating message with model us.anthropic.claude-3-7-sonnet-20250219-v1:0
Input tokens: 38
Output tokens: 244
Total tokens: 282
Stop reason: end_turn
Generating message with model us.anthropic.claude-3-7-sonnet-20250219-v1:0
Input tokens: 297
Output tokens: 286
Total tokens: 583
Stop reason: end_turn
Role: user
Text: How would you distinguish between understanding language and simulating understanding?
Role: assistant
Text: # Understanding vs. Simulating Understanding

This is a profound philosophical distinction that touches on consciousness, meaning, and the nature of mind.

From a philosophical perspective, true understanding might involve:

- Having genuine intentionality or "aboutness" toward concepts
- Possessing phenomenal experiences related to meaning
- Connecting symbols to grounded experiences in the world
- Having an integrated conceptual framework where meanings relate to one another

Simulation of understanding, by contrast, might involve:
- Processing patterns without i

In [22]:
# Enable thinking mode

thinking_mode = 'extended'
max_tokens = 16000
thinking_budget = 9000

# Setup the system prompts and messages to send to the model.
system_prompts = [{"text": "You are a mathematician. "
                    "Only discuss safe topics in mathematics."}]
message = {
    "role": "user",
    "content": [{"text": "Are there an infinite number of prime numbers such that n mod 4 == 3?"}]
}
messages = []

# Start the conversation with the 1st message.
messages.append(message)
response = generate_conversation(
    bedrock_runtime, model_id, system_prompts, messages, thinking_mode, max_tokens, thinking_budget)

# Add the response message to the conversation.
output_message = response['output']['message']
messages.append(output_message)

# Show the complete conversation.
for message in messages:
    print(f"Role: {message['role']}")
    for content in message['content']:
        print(content)

Generating message with model us.anthropic.claude-3-7-sonnet-20250219-v1:0
Input tokens: 67
Output tokens: 1312
Total tokens: 1379
Stop reason: end_turn
Role: user
{'text': 'Are there an infinite number of prime numbers such that n mod 4 == 3?'}
Role: assistant
{'SDK_UNKNOWN_MEMBER': {'name': 'reasoningContent'}}
{'text': "# Prime Numbers Congruent to 3 Modulo 4\n\nYes, there are infinitely many prime numbers such that n ≡ 3 (mod 4). This is a classic result in number theory.\n\nHere's an elegant proof by contradiction:\n\n1) Suppose there are only finitely many such primes: p₁, p₂, ..., pₙ, all congruent to 3 mod 4\n\n2) Consider the number N = 4(p₁·p₂·...·pₙ) - 1\n\n3) Observe that N ≡ 3 (mod 4)\n\n4) N must have a prime divisor q\n\n5) If q ≡ 1 (mod 4), this leads to a contradiction because products of numbers ≡ 1 (mod 4) remain ≡ 1 (mod 4)\n\n6) If q ≡ 3 (mod 4), then q must be one of our original primes pᵢ\n   But then pᵢ divides both N and 4(p₁·p₂·...·pₙ), so pᵢ must divide their
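
The `SDK_UNKNOWN_MEMBER` entry in the output above most likely means that the installed boto3 version does not yet recognize the `reasoningContent` block, so upgrading boto3 may be needed to see the model's reasoning. As a sketch under that assumption, the helper below separates reasoning from the final answer, assuming reasoning blocks have the shape `{"reasoningContent": {"reasoningText": {"text": ...}}}`. The name `split_converse_content` and the sample blocks are hypothetical.

```python
def split_converse_content(content_blocks):
    """Split Converse content blocks into (reasoning, answer) strings."""
    reasoning, answer = [], []
    for block in content_blocks:
        if "reasoningContent" in block:
            # Assumed shape: {"reasoningContent": {"reasoningText": {"text": ...}}}
            reasoning_text = block["reasoningContent"].get("reasoningText", {})
            if "text" in reasoning_text:
                reasoning.append(reasoning_text["text"])
        elif "text" in block:
            answer.append(block["text"])
    return "\n".join(reasoning), "\n".join(answer)


# Hypothetical blocks shaped like response['output']['message']['content']
sample_content = [
    {"reasoningContent": {"reasoningText": {"text": "Consider N = 4k - 1."}}},
    {"text": "Yes, there are infinitely many such primes."},
]
reasoning, answer = split_converse_content(sample_content)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

Blocks the helper does not recognize, including the `SDK_UNKNOWN_MEMBER` placeholder, are skipped.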

## Conclusion

In this notebook, we've covered:

1. Setting up boto3 for Amazon Bedrock
1. Enabling model access
1. Running basic inference requests
1. Using streaming inference
1. Leveraging extended thinking mode for complex reasoning
1. Using the Converse API

These tools and techniques should help you build robust applications using Amazon Bedrock's foundation models.