# **Lab: Building Conversational AI Solutions with Amazon Bedrock Using the Converse API**


### **Introduction**

Welcome to this introduction to building conversational AI with Amazon Bedrock's Converse API! The primary goal of this notebook is to introduce the Amazon Bedrock APIs for generating text. While we'll explore various use cases like summarization and code generation, our focus is on understanding the API patterns.

In this notebook, you will:

1. Learn the basics of the Amazon Bedrock **Invoke API**
2. Explore the more powerful **Converse API** and its features, such as multi-turn conversation, streaming, and function calling
3. Apply these APIs across various foundation models
4. Compare results across different state-of-the-art models

## **1. Setup**

### **1.1 Import the required libraries**

In [1]:
import json
import boto3
import botocore
from IPython.display import display, Markdown
import time

**Output:** (No output for successful import. If run in a live environment, the code would execute silently.)

### **Library Imports Explanation**

In this step, we are importing specific libraries that will allow us to interact with AWS services, handle JSON data, display outputs in a Jupyter notebook, and manage timing.

- **`json`**: Handles **JSON data** for parsing and generating JSON objects, commonly used in API responses from AWS.
  
- **`boto3`**: AWS SDK for Python, allowing interaction with **AWS services** (e.g., S3, EC2) to manage resources.
  
- **`botocore`**: Low-level library that **boto3** uses to manage requests, errors, and session handling with AWS.
  
- **`IPython.display`**: Displays **rich content** (e.g., **Markdown**) in Jupyter notebooks for better presentation.
  
- **`time`**: Provides time functions like **`sleep()`** to pause execution and manage time intervals.

These libraries enable efficient interaction with AWS, data handling, and display management in Jupyter notebooks.


### **1.2 Initial setup for clients, global variables and helper functions**

In [2]:
# Initialize a boto3 session, which allows interaction with AWS services.
# A session is used to manage AWS credentials, configurations, and region settings.
session = boto3.session.Session()

# Get the region name of the session to configure the AWS client appropriately.
region = session.region_name

# Initialize the Bedrock client using the session's region.
# This client will be used to interact with Amazon Bedrock's API for running foundation models.
bedrock = boto3.client(service_name='bedrock-runtime', region_name=region)


**Output:** *(No output for successful client initialization. If run in a live environment, the code would execute silently and the Bedrock client would be ready.)*

In [3]:
# Define model IDs that will be used in this module

# These model IDs are required to interact with the specific foundation models using Amazon Bedrock API.

MODELS = {

    "Claude 3.7 Sonnet": "us.anthropic.claude-3-7-sonnet-20250219-v1:0", # Claude 3.7 Sonnet model from Anthropic, using the model ID with versioning
    "Claude 3.5 Sonnet": "us.anthropic.claude-3-5-sonnet-20240620-v1:0", # Claude 3.5 Sonnet model from Anthropic with versioning
    "Claude 3.5 Haiku": "us.anthropic.claude-3-5-haiku-20241022-v1:0", # Claude 3.5 Haiku model from Anthropic, another variation with versioning
    "Amazon Nova Pro": "us.amazon.nova-pro-v1:0",  # Amazon Nova Pro model, a generative AI model from Amazon
    "Amazon Nova Micro": "us.amazon.nova-micro-v1:0",  # Amazon Nova Micro model, a smaller version of Amazon Nova for resource-constrained environments
    "Meta Llama 3.1 70B Instruct": "us.meta.llama3-1-70b-instruct-v1:0" # Meta Llama 3.1 70B Instruct model from Meta, used for instruction-following tasks
}


**Output:** (No output. The MODELS dictionary is defined in memory.)

This block of code helps you easily reference different AI models by their unique IDs, making it simpler to switch models or call them in subsequent parts of the code.

By organizing the models in a dictionary, you can easily iterate or look up specific models as needed in your application.

In [4]:
# Utility function to display model responses in a more readable format

def display_response(response, model_name=None):
    # If a model name is provided, display it as a Markdown header
    if model_name:
        display(Markdown(f"### Response from {model_name}"))

    # Display the model's response as Markdown content (formatted text)
    display(Markdown(response))

    # Print a separator line for better readability of the output
    print("\n" + "-"*80 + "\n")


**Output:** (No output. The `display_response` function is defined in memory.)

## **2. Text Summarization with Foundation Models**

Let's start by exploring how to leverage Amazon Bedrock APIs for text summarization. We'll first use the basic Invoke API, then introduce the more powerful Converse API.

As an example, let's take a paragraph about Amazon Bedrock from an [AWS blog post](https://aws.amazon.com/jp/blogs/machine-learning/announcing-new-tools-for-building-with-generative-ai-on-aws/).


In [5]:
text_to_summarize = """
AWS took all of that feedback from customers, and today we are excited to announce Amazon Bedrock, \
a new service that makes FMs from AI21 Labs, Anthropic, Stability AI, and Amazon accessible via an API. \
Bedrock is the easiest way for customers to build and scale generative AI-based applications using FMs, \
democratizing access for all builders. Bedrock will offer the ability to access a range of powerful FMs \
for text and images—including Amazons Titan FMs, which consist of two new LLMs we're also announcing \
today—through a scalable, reliable, and secure AWS managed service. With Bedrock's serverless experience, \
customers can easily find the right model for what they're trying to get done, get started quickly, privately \
customize FMs with their own data, and easily integrate and deploy them into their applications using the AWS \
tools and capabilities they are familiar with, without having to manage any infrastructure (including integrations \
with Amazon SageMaker ML features like Experiments to test different models and Pipelines to manage their FMs at scale).
"""

**Output:** *(No output. The `text_to_summarize` variable is defined.)*

### **2.1 Text Summarization using the Invoke Model API**

Amazon Bedrock's **Invoke Model API** serves as the most basic method for sending requests to foundation models. Since each model family has its own distinct request and response format, you'll need to craft specific JSON payloads tailored to each model.
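To make the per-family difference concrete, here is a minimal sketch that builds and parses Invoke Model payloads for two families. The helper names `build_invoke_body` and `extract_invoke_text` are hypothetical, and the field names reflect the request formats as commonly documented for Anthropic Claude and Meta Llama on Bedrock; check the model parameter reference before relying on them.

```python
import json


def build_invoke_body(model_id, prompt, max_tokens=500):
    """Return a JSON request body shaped for the model family named in model_id."""
    if "anthropic." in model_id:
        # Anthropic Claude models use the Messages format.
        body = {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": [
                {"role": "user", "content": [{"type": "text", "text": prompt}]}
            ],
        }
    elif "meta.llama" in model_id:
        # Meta Llama models take a flat prompt string and use max_gen_len.
        body = {"prompt": prompt, "max_gen_len": max_tokens}
    else:
        raise ValueError(f"No payload builder for model: {model_id}")
    return json.dumps(body)


def extract_invoke_text(model_id, response_body):
    """Pull the generated text out of a parsed Invoke Model response body."""
    if "anthropic." in model_id:
        return response_body["content"][0]["text"]
    if "meta.llama" in model_id:
        return response_body["generation"]
    raise ValueError(f"No response parser for model: {model_id}")


claude_payload = json.loads(
    build_invoke_body("us.anthropic.claude-3-7-sonnet-20250219-v1:0", "Hi")
)
llama_payload = json.loads(
    build_invoke_body("us.meta.llama3-1-70b-instruct-v1:0", "Hi")
)
print(sorted(claude_payload))  # keys differ between the two families
print(sorted(llama_payload))
```

Every new model family needs another branch in both helpers, which is the maintenance cost the Converse API is designed to remove.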

For this example, we will call Claude 3.7 Sonnet via Invoke Model API (using the `invoke_model` function of the Bedrock Runtime Client) to generate a summary of our text.

In [6]:
# Create the prompt for summarization
# We are using an f-string to dynamically insert the `text_to_summarize` into the prompt for the model.

prompt = f"""Please provide a summary of the following text. Do not add any information that is not mentioned in the text below.
<text>
{text_to_summarize}
</text>
"""


**Output:** *(No output. The `prompt` variable is defined.)*

In [7]:
# Create request body for Claude 3.7 Sonnet

claude_body = json.dumps({                     # Convert the dictionary into a JSON-formatted string for the API
    "anthropic_version": "bedrock-2023-05-31", # Required version string for the Anthropic Messages API on Bedrock (not the model version)
    "max_tokens": 1000,                        # Maximum number of tokens the model is allowed to generate
    "temperature": 0.5,                        # Controls creativity (0 = deterministic, 1 = creative)
    "top_p": 0.9,                              # Nucleus sampling to control diversity of the output
    "messages": [                              # List of messages representing the conversation history
        {
            "role": "user",                    # Indicates that the message is coming from the user
            "content": [{"type": "text", "text": prompt}]  # Actual text prompt asking for summarization
        }
    ],
})


**Output:** *(No output. The `claude_body` JSON string is created.)*
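The `top_p` parameter above refers to nucleus sampling. As a rough illustration only (the function `nucleus_filter` and the toy probabilities are hypothetical, and real models apply this over their full vocabulary), the idea is to keep the most likely tokens until their cumulative probability reaches `top_p`, then sample from that reduced set:

```python
def nucleus_filter(probs, top_p):
    """Keep the most likely tokens whose cumulative probability reaches top_p, then renormalize."""
    kept = {}
    cumulative = 0.0
    # Walk through tokens from most to least likely.
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}


# Toy next-token distribution (made-up numbers for illustration).
toy = {"service": 0.5, "platform": 0.3, "tool": 0.15, "banana": 0.05}
filtered = nucleus_filter(toy, top_p=0.9)
print(sorted(filtered))
```

With `top_p=0.9` the unlikely token `banana` is dropped before sampling, which is why lower values of `top_p` tend to produce more focused output.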

In [8]:
# Send request to Claude 3.7 Sonnet

try:
    response = bedrock.invoke_model(                     # Call the Bedrock Invoke API to run the model
        modelId=MODELS["Claude 3.7 Sonnet"],             # The model ID for Claude 3.7 Sonnet
        body=claude_body,                                # JSON request body created earlier
        accept="application/json",                       # Expected response format
        contentType="application/json"                   # Content type of the request body
    )
    response_body = json.loads(response.get('body').read())  # Read and parse the model's JSON response

    # Extract and display the response text
    claude_summary = response_body["content"][0]["text"]     # Extract the actual summary text from the response
    display_response(claude_summary, "Claude 3.7 Sonnet (Invoke Model API)")  # Display formatted output

except botocore.exceptions.ClientError as error:             # Handle AWS API errors
    if error.response['Error']['Code'] == 'AccessDeniedException':
        # Provide helpful message for access issues
        print(f"\x1b[41m{error.response['Error']['Message']}\
            \nTo troubleshoot this issue please refer to the following resources.\
            \nhttps://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_access-denied.html\
            \nhttps://docs.aws.amazon.com/bedrock/latest/userguide/security-iam.html\x1b[0m\n")
    else:
        raise error                                          # Re-throw other unexpected errors


### Response from Claude 3.7 Sonnet (Invoke Model API)

# Summary

Amazon has announced Amazon Bedrock, a new service that provides API access to foundation models (FMs) from AI21 Labs, Anthropic, Stability AI, and Amazon. Bedrock aims to democratize access to generative AI by offering a user-friendly way for customers to build and scale applications. The service includes text and image models, including Amazon's new Titan LLMs. As a serverless AWS managed service, Bedrock allows customers to find appropriate models, get started quickly, customize models with their own data, and integrate them into applications using familiar AWS tools without managing infrastructure. The service also integrates with Amazon SageMaker features like Experiments and Pipelines.


--------------------------------------------------------------------------------



**Expected Output:** (Assuming a successful API call, the output would look similar to this. The actual text will vary slightly each time.)

### **2.2 Text Summarization using the Converse API (Recommended Approach)**

While the **Invoke Model API** allows direct access to foundation models, it has several limitations:
1. it uses different request/response formats for each model family;
2. there is no built-in support for multi-turn conversations;
3. it requires custom handling for different model capabilities.

The **Converse API** addresses these limitations by providing a unified interface. Let's explore it on our text summarization task:

In [9]:
# Create a converse request with our summarization task

converse_request = {
    "messages": [  # List of messages representing the conversation
        {
            "role": "user",  # Role of the sender (user in this case)
            "content": [  # Content of the message
                {
                    "text": f"Please provide a concise summary of the following text in 2-3 sentences. Text to summarize: {text_to_summarize}"  # The summarization prompt, including the text to summarize
                }
            ]
        }
    ],
    "inferenceConfig": {  # Configuration settings for controlling the model’s response
        "temperature": 0.4,  # Controls the randomness of the model's output (lower value for more deterministic output)
        "topP": 0.9,  # Nucleus sampling for controlling diversity of the output
        "maxTokens": 500  # Maximum number of tokens (sub-word units, not words) the model may generate
    }
}



**Output:** *(No output. The `converse_request` dictionary is defined.)*

In [10]:
# Call Claude 3.7 Sonnet with Converse API

try:
    # Send the conversation request to the model using the Converse API
    response = bedrock.converse(
        modelId=MODELS["Claude 3.7 Sonnet"],  # Model ID for Claude 3.7 Sonnet from the MODELS dictionary
        messages=converse_request["messages"],  # The message list containing the user prompt
        inferenceConfig=converse_request["inferenceConfig"]  # Configuration parameters for the model (temperature, maxTokens)
    )


    # Extract the model's response from the JSON response body
    claude_converse_response = response["output"]["message"]["content"][0]["text"]  # Extracts the text from the response
    display_response(claude_converse_response, "Claude 3.7 Sonnet (Converse API)")  # Display the model’s response using the display_response function


except botocore.exceptions.ClientError as error:  # Handle any AWS API client errors
    if error.response['Error']['Code'] == 'AccessDeniedException':  # If the error is due to access denial
        # Print the error message in red, along with instructions for troubleshooting access issues
        print(f"\x1b[41m{error.response['Error']['Code']}: {error.response['Error']['Message']}\x1b[0m")
        print("Please ensure you have the necessary permissions for Amazon Bedrock.")

    else:
        raise error  # For other errors, raise the exception for further investigation


### Response from Claude 3.7 Sonnet (Converse API)

AWS has launched Amazon Bedrock, a new service providing API access to foundation models (FMs) from various AI companies, including Amazon's own Titan models. Bedrock simplifies the process for customers to build and scale generative AI applications through a serverless experience, allowing them to find appropriate models, customize them with private data, and integrate them using familiar AWS tools without managing infrastructure.


--------------------------------------------------------------------------------



**Expected Output:** (Assuming a successful API call, the output would look similar to this. The actual text will vary slightly but adhere to the 2-3 sentence limit.)
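The cell above reads only the first content block. As a slightly more defensive sketch, the hypothetical helper `extract_converse_text` below joins every text block in the message; `sample_response` is a hand-written stand-in with the same general shape as a Converse response, not real API output.

```python
def extract_converse_text(response):
    """Join every text block found in a Converse-style response message."""
    blocks = response.get("output", {}).get("message", {}).get("content", [])
    # Non-text blocks (for example tool-use blocks) are skipped.
    return "".join(block["text"] for block in blocks if "text" in block)


# Hand-written example mirroring the response shape used in this notebook.
sample_response = {
    "output": {
        "message": {
            "role": "assistant",
            "content": [{"text": "Bedrock "}, {"text": "summary."}],
        }
    },
    "stopReason": "end_turn",
    "usage": {"inputTokens": 120, "outputTokens": 4, "totalTokens": 124},
}

print(extract_converse_text(sample_response))
print(sample_response["stopReason"], sample_response["usage"]["totalTokens"])
```

Because the response shape is the same for every model behind the Converse API, one helper like this can serve all entries in `MODELS`.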

### **2.4 Easily switch between models**

One of the biggest advantages of the Converse API is the ability to easily switch between models using the exact same request format. Let's compare summaries across different foundation models by looping over the model dictionary we defined above:

In [11]:
# call different models with the same converse request

results = {}  # Initialize an empty dictionary to store results for each model

# Loop over all models defined in the MODELS dictionary
for model_name, model_id in MODELS.items():  # model_name is the name of the model, model_id is its identifier
    try:
        # Record the start time to calculate response time
        start_time = time.time()

        # Send the converse request to the model
        response = bedrock.converse(
            modelId=model_id,  # The model ID to be used
            messages=converse_request["messages"],  # The messages to be sent to the model
            inferenceConfig=converse_request.get("inferenceConfig", {})  # Fall back to an empty config; passing None fails boto3 parameter validation
        )

        # Record the end time after receiving the response
        end_time = time.time()

        # Extract the model's response from the API response
        model_response = response["output"]["message"]["content"][0]["text"]

        # Calculate the response time
        response_time = round(end_time - start_time, 2)

        # Store the response and time in the results dictionary
        results[model_name] = {
            "response": model_response,  # Store the model's response
            "time": response_time  # Store the time taken to get the response
        }

        # Print success message with model name and response time
        print(f"✅ Successfully called {model_name} (took {response_time} seconds)")

    except Exception as e:  # If an error occurs during the request
        # Print the error message
        print(f"❌ Error calling {model_name}: {str(e)}")

        # Store the error message and time in the results dictionary
        results[model_name] = {
            "response": f"Error: {str(e)}",  # Store the error message
            "time": None  # No time in case of an error
        }


✅ Successfully called Claude 3.7 Sonnet (took 2.88 seconds)
✅ Successfully called Claude 3.5 Sonnet (took 3.77 seconds)
✅ Successfully called Claude 3.5 Haiku (took 2.18 seconds)
✅ Successfully called Amazon Nova Pro (took 1.25 seconds)
✅ Successfully called Amazon Nova Micro (took 0.63 seconds)
✅ Successfully called Meta Llama 3.1 70B Instruct (took 3.49 seconds)


**Expected Output:** (The output below simulates successful, time-tracked calls. Actual times will vary.)

In [12]:
# Display results in a formatted way

for model_name, result in results.items():  # Loop through all models and their results
    if result["time"] is not None:  # Failed calls are stored with time=None, so this skips only the errors
        # Display the model name and the time taken to process the request
        display(Markdown(f"### {model_name} (took {result['time']} seconds)"))

        # Display the model's response (summarization or output text)
        display(Markdown(result["response"]))

        # Print a separator line for readability
        print("-" * 80)


### Claude 3.7 Sonnet (took 2.88 seconds)

AWS has launched Amazon Bedrock, a new API service that provides access to foundation models (FMs) from various AI companies, including Amazon's own Titan LLMs. The service democratizes generative AI by offering a serverless experience where customers can easily find, customize, and deploy text and image models without managing infrastructure. Bedrock integrates with existing AWS tools and allows private customization with customers' own data.

--------------------------------------------------------------------------------


### Claude 3.5 Sonnet (took 3.77 seconds)

Amazon has announced Bedrock, a new service that provides easy API access to various foundation models (FMs) from AI21 Labs, Anthropic, Stability AI, and Amazon itself, including two new Amazon Titan language models. Bedrock aims to democratize access to generative AI by offering a serverless experience that allows customers to easily find, customize, and deploy FMs for text and image applications. The service integrates with existing AWS tools and eliminates the need for infrastructure management, making it simpler for builders to scale their AI-based applications.

--------------------------------------------------------------------------------


### Claude 3.5 Haiku (took 2.18 seconds)

AWS has launched Amazon Bedrock, a new service that provides easy access to foundation models from various AI companies through an API, making generative AI more accessible to developers. Bedrock offers a serverless experience that allows customers to quickly find, customize, and integrate foundation models into their applications without managing infrastructure. The service includes models from AI21 Labs, Anthropic, Stability AI, and Amazon, with the ability to work with text and image models.

--------------------------------------------------------------------------------


### Amazon Nova Pro (took 1.25 seconds)

AWS has launched Amazon Bedrock, a new service providing easy access to Foundation Models (FMs) from various providers like AI21 Labs, Anthropic, Stability AI, and Amazon via an API, enabling customers to build and scale generative AI applications. Bedrock offers a serverless experience for selecting, customizing, and deploying FMs, including Amazon's new Titan LLMs, with seamless integration into existing AWS tools and capabilities.

--------------------------------------------------------------------------------


### Amazon Nova Micro (took 0.63 seconds)

AWS has launched Amazon Bedrock, a new service that provides easy access to a range of generative AI models from AI21 Labs, Anthropic, Stability AI, and Amazon via an API, enabling developers to quickly build and scale AI-based applications without managing infrastructure. Bedrock offers scalable, reliable, and secure access to powerful models for text and images, including new Amazon Titan models, and integrates seamlessly with AWS tools like Amazon SageMaker.

--------------------------------------------------------------------------------


### Meta Llama 3.1 70B Instruct (took 3.49 seconds)



Here is a concise summary of the text in 3 sentences:

Amazon has announced Amazon Bedrock, a new service that provides access to foundation models (FMs) from leading AI labs via an API. Bedrock makes it easy for customers to build and scale generative AI-based applications using FMs, with a scalable, reliable, and secure AWS managed service. The service offers a serverless experience, allowing customers to easily find, customize, and deploy FMs into their applications without managing infrastructure.

--------------------------------------------------------------------------------


**Expected Output:** (The following are representative examples of what the models might output, using the time placeholders from the previous step. The actual summary text would be the result of the prompt asking for a concise 2-3 sentence summary.)
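Once the results are collected, a simple comparison is to rank the models by response time. The sketch below uses a hand-written `sample_results` dictionary in the same shape as `results` (two timings are copied from the run above, and the failing entry is hypothetical); `rank_by_latency` is an illustrative helper, not part of the lab code.

```python
def rank_by_latency(results):
    """Return (model_name, seconds) pairs for successful calls, fastest first."""
    timed = [
        (name, entry["time"])
        for name, entry in results.items()
        if entry["time"] is not None  # failed calls carry time=None
    ]
    return sorted(timed, key=lambda pair: pair[1])


# Same structure as the `results` dictionary built in the loop above.
sample_results = {
    "Claude 3.7 Sonnet": {"response": "...", "time": 2.88},
    "Amazon Nova Micro": {"response": "...", "time": 0.63},
    "Failing model": {"response": "Error: ...", "time": None},
}

for name, seconds in rank_by_latency(sample_results):
    print(f"{name:<20} {seconds:>5.2f}s")
```

A single timed call is a noisy measurement, so treat a ranking like this as indicative rather than as a benchmark.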

### **2.5 Cross-Regional Inference in Amazon Bedrock**

Amazon Bedrock offers Cross-Regional Inference which automatically selects the optimal AWS Region within your geography to process your inference requests.

Cross-Regional Inference offers higher throughput limits (up to 2x allocated quotas) and seamlessly manages traffic bursts by dynamically routing requests across multiple AWS regions, enhancing application resilience during peak demand periods without additional routing or data transfer costs.

Customers can control where their inference data flows by selecting from a pre-defined set of regions, helping them comply with applicable data residency requirements and sovereignty laws. Moreover, this capability prioritizes the connected Bedrock API source region when possible, helping to minimize latency and improve responsiveness.



Let's see how easy it is to use Cross-Regional Inference by invoking the Claude 3.5 Sonnet model:

In [13]:
# Regular model invocation (standard region)

standard_response = bedrock.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # Standard model ID (default region)
    messages=converse_request["messages"]  # Passing the conversation history
)

# Cross-region inference (note the "us." prefix)
cris_response = bedrock.converse(
    modelId="us.anthropic.claude-3-5-sonnet-20240620-v1:0",  # Cross-region inference profile ID (US geography)
    messages=converse_request["messages"]  # Passing the conversation history
)

# Print responses
print("Standard response:", standard_response["output"]["message"]["content"][0]["text"])  # Print the model response from the standard region
print("Cross-region response:", cris_response["output"]["message"]["content"][0]["text"])  # Print the model response from the cross-region


Standard response: Amazon has announced Bedrock, a new service that provides API access to foundation models (FMs) from various AI companies and Amazon itself. Bedrock aims to democratize access to generative AI by offering a user-friendly, serverless platform for building and scaling AI applications. The service allows customers to easily select, customize, and deploy FMs for text and image generation using familiar AWS tools and infrastructure, without the need to manage complex systems.
Cross-region response: Amazon has introduced Bedrock, a new service that provides API access to various foundation models (FMs) from AI21 Labs, Anthropic, Stability AI, and Amazon itself, including two new large language models from Amazon's Titan series. Bedrock aims to democratize access to generative AI by offering a serverless, scalable, and secure platform for developers to easily integrate and deploy FMs into their applications. The service allows customers to find suitable models, customize th

**Expected Output:** (The actual text will be very similar or identical, as the underlying model is the same, but they will be sourced from different endpoints.)

**Purpose**:
   - This setup invokes the same model twice: once with the standard model ID, which is served from the client's Region, and once with a cross-region inference profile, selected by adding a **geography prefix** (`us.`) to the model ID.
   - Because the underlying model is identical, differences in wording come from sampling rather than from the Region. What changes is where the request may be routed, which can affect throughput and response time.

Switching to cross-region inference therefore requires only a different model ID; the request and response formats stay the same.
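The prefix convention can be wrapped in a small helper. This is a sketch under assumptions: the function names are hypothetical, and the geography prefixes listed (`us`, `eu`, `apac`) are examples only, since the inference profiles actually available depend on the model and on your account's Region.

```python
def to_inference_profile_id(model_id, geography="us"):
    """Prefix a base model ID with a geography to form a cross-region inference profile ID."""
    prefix = f"{geography}."
    if model_id.startswith(prefix):
        return model_id  # already prefixed, leave unchanged
    return prefix + model_id


def strip_geography_prefix(profile_id, geographies=("us", "eu", "apac")):
    """Return the base model ID, removing a known geography prefix if present."""
    for geography in geographies:
        prefix = f"{geography}."
        if profile_id.startswith(prefix):
            return profile_id[len(prefix):]
    return profile_id


base_id = "anthropic.claude-3-5-sonnet-20240620-v1:0"
profile_id = to_inference_profile_id(base_id)
print(profile_id)
print(strip_geography_prefix(profile_id))
```

Keeping base IDs in configuration and adding the prefix at call time makes it easy to switch between standard and cross-region invocation.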


### **2.6 Multi-turn Conversations**

The Converse API makes multi-turn conversations simple. Let's see it in action:

In [14]:
# Example of a multi-turn conversation with Converse API

multi_turn_messages = [
    {
        "role": "user",  # First message from the user (initial summary request)
        "content": [{"text": f"Please summarize this text: {text_to_summarize}"}]  # User asks to summarize the text
    },
    {
        "role": "assistant",  # Response from the assistant (Claude model)
        "content": [{"text": results["Claude 3.7 Sonnet"]["response"]}]  # Assistant’s first response (summary)
    },
    {
        "role": "user",  # Follow-up message from the user (asking for a shorter summary)
        "content": [{"text": "Can you make this summary even shorter, just 1 sentence?"}]  # User asks for a more concise summary
    }
]

try:
    # Send the multi-turn conversation to the model using the Converse API
    response = bedrock.converse(
        modelId=MODELS["Claude 3.7 Sonnet"],  # Specify the model (Claude 3.7 Sonnet)
        messages=multi_turn_messages,  # Provide the conversation history (multi-turn)
        inferenceConfig={"temperature": 0.2, "maxTokens": 500}  # Set inference configuration to control creativity and length
    )

    # Extract the model's response using the correct structure
    follow_up_response = response["output"]["message"]["content"][0]["text"]  # Extract the summarized response

    # Display the follow-up response from the assistant
    display_response(follow_up_response, "Claude 3.7 Sonnet (Multi-turn conversation)")

except Exception as e:
    # Catch and display any errors encountered during the request
    print(f"Error: {str(e)}")



### Response from Claude 3.7 Sonnet (Multi-turn conversation)

Amazon Bedrock is AWS's new serverless API service that provides easy access to foundation models from multiple AI companies, allowing customers to find, customize, and deploy generative AI applications without managing infrastructure.


--------------------------------------------------------------------------------



**Expected Output:** (The response will be a single sentence, demonstrating the model maintained context from the previous turn.)

In this step, we explore how to handle **multi-turn conversations** using the Converse API, where the model generates responses based on the previous messages in the conversation.

#### Key Points:

1. **Multi-turn Conversation Setup**:
   - The `multi_turn_messages` list contains multiple messages, each representing an interaction between the **user** and the **assistant** (model).
   - **First message**: A summarization prompt from the **user** asking the model to summarize the given text.
   - **Second message**: The **assistant's** response (model's output) to the summarization prompt.
   - **Third message**: A follow-up question from the **user**, requesting the model to shorten the summary even further.

2. **Calling the Converse API**:
   - The `bedrock.converse()` function sends the entire **multi-turn conversation** to the model specified by the **modelId**.
   - The **`inferenceConfig`** includes parameters like **`temperature`** (controls creativity) and **`maxTokens`** (limits response length).

3. **Extracting and Displaying the Response**:
   - After receiving the response from the API, the `follow_up_response` is extracted from the returned JSON structure.
   - The response is then displayed using the `display_response()` function, which presents it in a user-friendly format.

4. **Error Handling**:
   - If any error occurs (e.g., network issues, model errors), an exception is caught, and an error message is printed for troubleshooting.
   - This helps ensure smooth execution of the API calls and helps identify and resolve any issues quickly.


This example demonstrates how to **manage multi-turn conversations** with the **Converse API**, enabling more **interactive communication** with the model. It allows the model to reference previous exchanges, providing more meaningful and contextually aware responses.
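In a longer chat, building the message list by hand becomes error-prone. The sketch below shows one way to accumulate turns in the Converse message format; the `Conversation` class is hypothetical, and the strict user/assistant alternation it enforces is an assumption about what the API expects, so adjust it if your use case differs.

```python
class Conversation:
    """Accumulates Converse-style messages and enforces user/assistant alternation."""

    def __init__(self):
        self.messages = []

    def add(self, role, text):
        if role not in ("user", "assistant"):
            raise ValueError(f"Unsupported role: {role}")
        # Even positions (0, 2, ...) are user turns, odd positions are assistant turns.
        expected = "user" if len(self.messages) % 2 == 0 else "assistant"
        if role != expected:
            raise ValueError(f"Expected a {expected!r} turn, got {role!r}")
        self.messages.append({"role": role, "content": [{"text": text}]})


chat = Conversation()
chat.add("user", "Please summarize this text: ...")
chat.add("assistant", "Amazon Bedrock provides API access to foundation models.")
chat.add("user", "Can you make this summary even shorter, just 1 sentence?")

print([message["role"] for message in chat.messages])
```

The resulting `chat.messages` list has the same structure as `multi_turn_messages` and could be passed as the `messages` argument of `bedrock.converse()`; after each call, append the model's reply with `chat.add("assistant", ...)` before adding the next user turn.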


### **2.7 Streaming Responses with ConverseStream API**

For longer generations, you might want to receive the content as it's being generated. The ConverseStream API supports streaming, which allows you to process the response incrementally:

In [21]:
# Example of streaming with Converse API

def stream_converse(model_id, messages, inference_config=None):
    if inference_config is None:
        inference_config = {}  # Set default inference config if none provided

    print("Streaming response (chunks will appear as they are received):\n")
    print("-" * 80)

    full_response = ""  # Initialize an empty string to store the full response

    try:
        # Sending the conversation to the model and enabling streaming
        response = bedrock.converse_stream(
            modelId=model_id,  # Model ID to specify which model to use
            messages=messages,  # The conversation history to be sent to the model
            inferenceConfig=inference_config  # Inference configuration (temperature, max tokens, etc.)
        )

        # Retrieve the stream of the response from the model
        response_stream = response.get('stream')
        if response_stream:
            for event in response_stream:  # Iterate through each event in the response stream

                # Check for message start event and display the role (user or assistant)
                if 'messageStart' in event:
                    print(f"\nRole: {event['messageStart']['role']}")

                # If a content block delta carries text, print it and add it to the full response
                if 'contentBlockDelta' in event:
                    chunk = event['contentBlockDelta']['delta'].get('text', '')
                    print(chunk, end="")
                    full_response += chunk

                # Check for message stop event and display the stop reason
                if 'messageStop' in event:
                    print(f"\nStop reason: {event['messageStop']['stopReason']}")

                # If metadata is present, display usage and latency information
                if 'metadata' in event:
                    metadata = event['metadata']
                    if 'usage' in metadata:  # Display token usage information
                        print("\nToken usage")
                        print(f"Input tokens: {metadata['usage']['inputTokens']}")
                        print(f"Output tokens: {metadata['usage']['outputTokens']}")
                        print(f"Total tokens: {metadata['usage']['totalTokens']}")
                    if 'metrics' in metadata:  # Display latency information
                        print(f"Latency: {metadata['metrics']['latencyMs']} milliseconds")

            # End of the stream, display a separator
            print("\n" + "-" * 80)

        return full_response  # Return the text accumulated from all streamed chunks

    except Exception as e:
        # Handle any errors that occur during the streaming process
        print(f"Error in streaming: {str(e)}")
        return None  # Return None if an error occurs

print("Done")


Done


**Output:** *(The cell prints `Done` once the `stream_converse` function is defined in memory. No request is sent to the model yet.)*
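The event handling inside `stream_converse` can also be written as a pure function over any iterable of events, which makes it easy to check without calling AWS. The following is a minimal sketch, not part of the original notebook: `collect_stream` is a hypothetical helper name, and `sample_events` is hand-written data that mimics the event shapes handled above rather than real model output.

```python
def collect_stream(events):
    """Gather role, text, stop reason, usage, and latency from stream events."""
    result = {
        "role": None,
        "text": "",
        "stop_reason": None,
        "usage": None,
        "latency_ms": None,
    }
    for event in events:
        if "messageStart" in event:
            result["role"] = event["messageStart"]["role"]
        if "contentBlockDelta" in event:
            # Some deltas may not carry text, so fall back to an empty string.
            result["text"] += event["contentBlockDelta"]["delta"].get("text", "")
        if "messageStop" in event:
            result["stop_reason"] = event["messageStop"]["stopReason"]
        if "metadata" in event:
            metadata = event["metadata"]
            result["usage"] = metadata.get("usage")
            result["latency_ms"] = metadata.get("metrics", {}).get("latencyMs")
    return result


# Hand-written events for illustration only (assumed shapes, not real output).
sample_events = [
    {"messageStart": {"role": "assistant"}},
    {"contentBlockDelta": {"delta": {"text": "Hello, "}}},
    {"contentBlockDelta": {"delta": {"text": "world"}}},
    {"messageStop": {"stopReason": "end_turn"}},
    {
        "metadata": {
            "usage": {"inputTokens": 5, "outputTokens": 3, "totalTokens": 8},
            "metrics": {"latencyMs": 120},
        }
    },
]

summary = collect_stream(sample_events)
print(summary["text"])
```

In a live session, the same function could be given the `stream` value returned by `bedrock.converse_stream(...)` in place of `sample_events`.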

In [23]:
# Let's try streaming a longer summary

# Define the streaming request, which contains the user's conversation history.
# In this case, we are asking the model to provide a detailed summary of the text provided above.

streaming_request = [
    {
        "role": "user",  # The 'role' indicates that the user is sending the request to the model.
        "content": [
            {
                # Everything inside the f-string below is sent to the model verbatim,
                # so explanatory notes belong in these comments, not in the string.
                # {text_to_summarize} inserts the text we want summarized; the closing
                # sentence asks the model for a clear, comprehensive summary.
                "text": f"""Please provide a detailed summary of the following text, explaining its key points and implications:

                {text_to_summarize}

                Make your summary comprehensive but clear.
                """
            }
        ]
    }
]


**Output:** *(No output. The `streaming_request` list is defined.)*
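Because the text between the triple quotes reaches the model exactly as written, including indentation and anything that looks like a comment, it can help to build the prompt in a small function and inspect it before sending. The sketch below is one possible approach, not the notebook's own code: `build_summary_request` is a hypothetical helper, and the sample text is a placeholder.

```python
import textwrap


def build_summary_request(text):
    """Build a single-turn user message asking for a detailed summary."""
    # Dedent the template first so source indentation never reaches the model.
    template = textwrap.dedent(
        """\
        Please provide a detailed summary of the following text, explaining its key points and implications:

        {text}

        Make your summary comprehensive but clear.
        """
    )
    prompt = template.format(text=text.strip()).strip()
    return [{"role": "user", "content": [{"text": prompt}]}]


# Placeholder input for illustration only.
request = build_summary_request("  Amazon Bedrock is a managed service.  ")
print(request[0]["content"][0]["text"])
```

Printing the prompt before the call is a quick way to confirm that no stray comments or leading spaces were included.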

In [24]:
# Only run this when you're ready to see streaming output

# This line of code calls the 'stream_converse' function to initiate the streaming conversation with the specified model.

streamed_response = stream_converse(
    MODELS["Claude 3.7 Sonnet"],  # The model being used for this conversation, "Claude 3.7 Sonnet" in this case.
    streaming_request,  # The conversation request (including the detailed prompt) defined earlier.
    inference_config={"temperature": 0.4, "maxTokens": 1000}  # Inference parameters to control creativity and output length.
)


Streaming response (chunks will appear as they are received):

--------------------------------------------------------------------------------

Role: assistant
# Summary of Amazon Bedrock Announcement

## Key Points

Amazon has announced **Amazon Bedrock**, a new service that provides API access to foundation models (FMs) from multiple AI companies including AI21 Labs, Anthropic, Stability AI, and Amazon itself. This announcement comes in response to customer feedback seeking easier access to generative AI capabilities.

The service features:

- **Multiple model options** for both text and image generation
- **Amazon's own Titan foundation models**, including two newly announced large language models (LLMs)
- A **serverless architecture** eliminating infrastructure management needs
- **Private customization** capabilities allowing customers to fine-tune models with their own data
- **Integration with existing AWS tools** including SageMaker ML features like Experiments and Pipelines



**Expected Output:** (In a live run, the text appears in real-time chunks, followed by the stop reason, token usage, and latency once the stream completes. The output above is a simulated excerpt.)

## **Conclusion**

In this notebook, we explored how to interact with **Amazon Bedrock** through the **Invoke API** and the **Converse API** to perform tasks like **text summarization** and **real-time streaming**. Here's a summary of what we've learned:

1. **Text Summarization with the Invoke and Converse APIs**:
   - We began by summarizing text with the **Invoke Model API**, where we sent a single request to the model and extracted the response.
   - We then repeated the task with the **Converse API**, which uses one request format across models and supports **multi-turn conversations** and **real-time streaming**.

2. **Cross-Region Requests**:
   - We explored how to call models both in the **default region** and with **cross-region inference**, allowing us to better understand the impact of latency and model availability.

3. **Streaming Responses**:
   - We integrated **real-time streaming** of model responses using the **Converse API**, which enabled us to receive and display outputs as they are generated, making interactions more dynamic.

4. **Error Handling**:
   - Throughout the examples, we wrapped API calls in **error handling** that catches common issues such as permission errors, so a failure produces a clear message and troubleshooting resources when necessary.

5. **Benefits of Converse API**:
   - The **Converse API** simplifies requests and responses by standardizing their format, supporting multi-turn conversations, and exposing configuration options that control model behavior (such as `temperature` and `maxTokens`).
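The standardized message format mentioned in the list above can be sketched as a small helper that builds a multi-turn conversation one turn at a time. This is an illustrative sketch only: `add_turn` is a hypothetical function, the model ID is a placeholder, and the rule that the conversation starts with a user turn and then alternates roles is an assumption of this sketch.

```python
def add_turn(messages, role, text):
    """Return a new message list with one more turn in the Converse message shape."""
    if role not in ("user", "assistant"):
        raise ValueError(f"Unsupported role: {role!r}")
    if not messages and role != "user":
        raise ValueError("This sketch expects the conversation to start with a user turn")
    if messages and messages[-1]["role"] == role:
        raise ValueError("This sketch expects user and assistant turns to alternate")
    # Build a new list so the caller's original history is left untouched.
    return messages + [{"role": role, "content": [{"text": text}]}]


conversation = []
conversation = add_turn(conversation, "user", "Summarize Amazon Bedrock in one sentence.")
conversation = add_turn(conversation, "assistant", "It is a managed service for foundation models.")
conversation = add_turn(conversation, "user", "Now shorten that to five words.")

# Keyword arguments in the shape used by the calls in this notebook.
request_kwargs = {
    "modelId": "example-model-id",  # hypothetical placeholder, not a real model ID
    "messages": conversation,
    "inferenceConfig": {"temperature": 0.4, "maxTokens": 1000},
}
print([message["role"] for message in conversation])
```

With a configured client, these arguments could be passed as `bedrock.converse(**request_kwargs)` after replacing the placeholder with an entry from `MODELS`.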


Together, **Amazon Bedrock**, the **Converse API**, and **Claude 3.7 Sonnet** offer a streamlined way to work with advanced foundation models. The same patterns support real-time text generation tasks such as summarization, multi-turn conversations, and interactive feedback, and they carry over to applications like chatbots and content generation.
