# **Lab: Building Conversational AI Solutions with AWS Bedrock using Converse API**
----

This notebook provides sample code with step-by-step instructions for using Amazon Bedrock's **Converse API**.

----


### **Introduction**

Welcome to this introduction to building conversational AI with Amazon Bedrock's Converse API! The primary goal of this chapter is to provide a comprehensive introduction to Amazon Bedrock APIs for generating text. While we'll explore various use cases like summarization and code generation, our focus is on understanding the API patterns.

In this notebook, you will:

1. Learn the basics of the Amazon Bedrock **Invoke API**
2. Explore the more powerful **Converse API** and its features, such as multi-turn conversation, streaming, and function calling
3. Apply these APIs across various foundation models
4. Compare results across different state-of-the-art models

### **Scenario**

You are developing a **conversational AI** application for a healthcare provider to assist patients with booking appointments, retrieving medical records, and answering health-related queries.

The system needs to leverage **AWS Bedrock** and the Amazon Titan embedding model to process and generate accurate, contextually relevant responses.

Your task is to integrate **AWS Bedrock** with the **Titan embedding model** for natural language understanding and response generation, ensuring the system can handle multiple types of queries efficiently while providing reliable and relevant answers.

### **Description**

In this lab, you will integrate **AWS Bedrock** and the Amazon Titan embedding model to build a conversational AI system.

You will configure the model to process user queries and generate intelligent responses. The lab will guide you through setting up **AWS API Gateway** and **Lambda functions** to expose the conversational API, optimizing the model for performance and accuracy.

You will also learn error handling, troubleshooting, and best practices to ensure the system delivers seamless and accurate responses in real-time.

## **1. Setup**

### **1.1 Import the required libraries**

In [None]:
import json
import boto3
import botocore
from IPython.display import display, Markdown
import time

**Output:** (No output for successful import. If run in a live environment, the code would execute silently.)

### **Library Imports Explanation**

In this step, we are importing specific libraries that will allow us to interact with AWS services, handle JSON data, display outputs in a Jupyter notebook, and manage timing.

- **`json`**: Handles **JSON data** for parsing and generating JSON objects, commonly used in API responses from AWS.
  
- **`boto3`**: AWS SDK for Python, allowing interaction with **AWS services** (e.g., S3, EC2) to manage resources.
  
- **`botocore`**: Low-level library that **boto3** uses to manage requests, errors, and session handling with AWS.
  
- **`IPython.display`**: Displays **rich content** (e.g., **Markdown**) in Jupyter notebooks for better presentation.
  
- **`time`**: Provides time functions like **`sleep()`** to pause execution and manage time intervals.

These libraries enable efficient interaction with AWS, data handling, and display management in Jupyter notebooks.


### **1.2 Initial setup for clients, global variables and helper functions**

In [None]:
# Initialize a boto3 session, which allows interaction with AWS services.
# A session is used to manage AWS credentials, configurations, and region settings.
session = boto3.session.Session()

# Get the region name of the session to configure the AWS client appropriately.
region = session.region_name

# Initialize the Bedrock client using the session's region.
# This client will be used to interact with Amazon Bedrock's API for running foundation models.
bedrock = boto3.client(service_name='bedrock-runtime', region_name=region)


**Output:** *(No output for successful client initialization. If run in a live environment, the code would execute silently and the Bedrock client would be ready.)*

In this step, we are setting up the necessary clients and global variables required for interacting with **AWS Bedrock**.

- **`boto3.session.Session()`**:
  - Initializes a new **Boto3 session**. This session is responsible for managing the configuration and credentials that Boto3 uses to interact with AWS services. By creating a session, we ensure that all interactions with AWS are properly authenticated and configured.

- **`region = session.region_name`**:
  - Retrieves the **AWS region** that is currently configured for the session, so the application uses the region from your environment or AWS configuration. If no region is configured, `region_name` is `None` and creating the client raises a `NoRegionError`.

- **`bedrock = boto3.client(service_name='bedrock-runtime', region_name=region)`**:
  - Initializes the **AWS Bedrock client** using **Boto3**. The `bedrock-runtime` client is responsible for interacting with the Bedrock service, which provides access to foundation models and other AI services. The region is passed dynamically from the session, ensuring that the client is created for the correct AWS region.

This setup prepares the environment to interact with **AWS Bedrock** and sets up the necessary client configuration for making API requests.
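Because `session.region_name` can be `None`, it can help to resolve the region explicitly before creating the client. The following is a minimal sketch, not part of the original lab: `resolve_region` is a hypothetical helper, and the `us-east-1` default is an assumption that you should replace with a region where your account has Bedrock model access.

```python
import os

# Hypothetical default (an assumption, not from the lab); pick a region
# where your account has Bedrock model access.
DEFAULT_REGION = "us-east-1"

def resolve_region(session_region, env=None):
    """Return the session region, else the AWS_REGION variable, else the default."""
    env = os.environ if env is None else env
    return session_region or env.get("AWS_REGION") or DEFAULT_REGION

# Offline demonstration with explicit inputs
print(resolve_region("eu-west-1", env={}))
print(resolve_region(None, env={"AWS_REGION": "us-west-2"}))
print(resolve_region(None, env={}))

# In the notebook this could be used as:
# region = resolve_region(session.region_name)
# bedrock = boto3.client(service_name="bedrock-runtime", region_name=region)
```

Passing the environment as a parameter keeps the helper easy to test without changing real environment variables.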


In [None]:
# Define model IDs that will be used in this module

# These model IDs are required to interact with the specific foundation models using Amazon Bedrock API.

MODELS = {

    "Claude 3.7 Sonnet": "us.anthropic.claude-3-7-sonnet-20250219-v1:0", # Claude 3.7 Sonnet model from Anthropic, using the model ID with versioning
    "Claude 3.5 Sonnet": "us.anthropic.claude-3-5-sonnet-20240620-v1:0", # Claude 3.5 Sonnet model from Anthropic with versioning
    "Claude 3.5 Haiku": "us.anthropic.claude-3-5-haiku-20241022-v1:0", # Claude 3.5 Haiku model from Anthropic, another variation with versioning
    "Amazon Nova Pro": "us.amazon.nova-pro-v1:0",  # Amazon Nova Pro model, a generative AI model from Amazon
    "Amazon Nova Micro": "us.amazon.nova-micro-v1:0",  # Amazon Nova Micro model, a smaller version of Amazon Nova for resource-constrained environments
    "Meta Llama 3.1 70B Instruct": "us.meta.llama3-1-70b-instruct-v1:0" # Meta Llama 3.1 70B Instruct model from Meta, used for instruction-following tasks
}


**Output:** (No output. The MODELS dictionary is defined in memory.)

This block of code helps you easily reference different AI models by their unique IDs, making it simpler to switch models or call them in subsequent parts of the code.

By organizing the models in a dictionary, you can easily iterate or look up specific models as needed in your application.

In [None]:
# Utility function to display model responses in a more readable format

def display_response(response, model_name=None):
    # If a model name is provided, display it as a Markdown header
    if model_name:
        display(Markdown(f"### Response from {model_name}"))

    # Display the model's response as Markdown content (formatted text)
    display(Markdown(response))

    # Print a separator line for better readability of the output
    print("\n" + "-"*80 + "\n")


**Output:** (No output. The `display_response` function is defined in memory.)

This utility function, `display_response`, is designed to display the **response from a model** in a more readable and formatted way. It enhances the readability of the output in Jupyter notebooks, making it easier to interpret results from the AI models.

- **Function Parameters**:
  - **`response`**: The model's output (usually a string) that you want to display.
  - **`model_name`** (optional): The name of the model that generated the response. If provided, it will be included in the display header.

- **What the Function Does**:
  1. **If `model_name` is provided**: It displays a markdown header with the model's name (e.g., "Response from Claude 3.7 Sonnet").
  2. **Displays the response**: The function then displays the actual **response** (the model output) using **Markdown** for proper formatting.
  3. **Prints a separator line**: After the response, a line of dashes (`-`) is printed to clearly separate the outputs when multiple responses are displayed.

- **Why We Use It**:
  - The function makes the model responses easy to read and well-structured, especially in Jupyter notebooks where markdown formatting is useful for visual clarity.

## **2. Text Summarization with Foundation Models**

Let's start by exploring how to leverage Amazon Bedrock APIs for text summarization. We'll first use the basic Invoke API, then introduce the more powerful Converse API.

As an example, let's take a paragraph about Amazon Bedrock from an [AWS blog post](https://aws.amazon.com/jp/blogs/machine-learning/announcing-new-tools-for-building-with-generative-ai-on-aws/).


In [None]:
text_to_summarize = """
AWS took all of that feedback from customers, and today we are excited to announce Amazon Bedrock, \
a new service that makes FMs from AI21 Labs, Anthropic, Stability AI, and Amazon accessible via an API. \
Bedrock is the easiest way for customers to build and scale generative AI-based applications using FMs, \
democratizing access for all builders. Bedrock will offer the ability to access a range of powerful FMs \
for text and images—including Amazon's Titan FMs, which consist of two new LLMs we're also announcing \
today—through a scalable, reliable, and secure AWS managed service. With Bedrock's serverless experience, \
customers can easily find the right model for what they're trying to get done, get started quickly, privately \
customize FMs with their own data, and easily integrate and deploy them into their applications using the AWS \
tools and capabilities they are familiar with, without having to manage any infrastructure (including integrations \
with Amazon SageMaker ML features like Experiments to test different models and Pipelines to manage their FMs at scale).
"""

**Output:** *(No output. The `text_to_summarize` variable is defined.)*

In this step, we will use Amazon Bedrock's **Invoke API** to perform text summarization on a sample paragraph. This will demonstrate how to leverage **Foundation Models (FMs)** from **Amazon Bedrock** to quickly summarize lengthy content.

**Text to Summarize:**

The `text_to_summarize` variable defined above holds a paragraph about **Amazon Bedrock** from an AWS blog post.

The paragraph describes Amazon Bedrock, an AWS service that enables easy access to foundation models (FMs) from various providers, simplifying the process of building and scaling generative AI applications.

### **2.1 Text Summarization using the Invoke Model API**

Amazon Bedrock's **Invoke Model API** serves as the most basic method for sending requests to foundation models. Since each model family has its own distinct request and response format, you'll need to craft specific JSON payloads tailored to each model.

For this example, we will call Claude 3.7 Sonnet via Invoke Model API (using the `invoke_model` function of the Bedrock Runtime Client) to generate a summary of our text.

In [None]:
# Create the prompt for summarization
# We are using an f-string to dynamically insert the `text_to_summarize` into the prompt for the model.

prompt = f"""Please provide a summary of the following text. Do not add any information that is not mentioned in the text below.
<text>
{text_to_summarize}
</text>
"""


**Output:** *(No output. The `prompt` variable is defined.)*

This prompt is designed to guide the model to generate an accurate and concise summary of the provided paragraph without introducing any additional information.

**Example Usage:**

This prompt will be sent to a foundation model through Amazon Bedrock to generate a summary of the content. The `<text>` tags separate the content to summarize from the instruction, and the result will be a shortened, easy-to-read version of the input paragraph.

In [None]:
# Create request body for Claude 3.7 Sonnet

claude_body = json.dumps({                     # Convert the dictionary into a JSON-formatted string for the API
    "anthropic_version": "bedrock-2023-05-31", # API version string required for Anthropic models on Bedrock (not the model version)
    "max_tokens": 1000,                        # Maximum number of tokens the model is allowed to generate
    "temperature": 0.5,                        # Controls creativity (0 = deterministic, 1 = creative)
    "top_p": 0.9,                              # Nucleus sampling to control diversity of the output
    "messages": [                              # List of messages representing the conversation history
        {
            "role": "user",                    # Indicates that the message is coming from the user
            "content": [{"type": "text", "text": prompt}]  # Actual text prompt asking for summarization
        }
    ],
})


**Output:** *(No output. The `claude_body` JSON string is created.)*

In this step, we are using the `json.dumps()` method to prepare the **request body** for the **Claude 3.7 Sonnet** model. This request body is in **JSON format**, which is required by the **Amazon Bedrock API** to process the request.


1. **`json.dumps()`**:
   - We are using the `json.dumps()` method to convert a **Python dictionary** into a **JSON string**. This is necessary because APIs typically expect the data in **JSON format**.

2. **`anthropic_version`**:
   - This specifies the **API version** of the Anthropic request format on Bedrock, not the version of the Claude model. It is set to `bedrock-2023-05-31`; the model itself is selected by the `modelId` passed to `invoke_model`.

3. **`max_tokens`**:
   - This parameter defines the **maximum number of tokens** the model should generate in the response. Here, it is set to `1000` tokens. A **token** typically represents a word or part of a word, and this controls the length of the response generated by the model.

4. **`temperature`**:
   - This controls the **creativity** of the generated response. A value of `0.5` strikes a balance between **randomness** and **determinism**.
   - Lower values (closer to 0) make the model's output more focused and deterministic, while higher values (closer to 1) make it more creative and diverse.

5. **`top_p`**:
   - This parameter controls **nucleus sampling**. It defines a **cumulative probability threshold** for selecting tokens.
   - A value of `0.9` means the model samples only from the smallest set of most likely tokens whose combined probability reaches **90%**. This cuts off the unlikely tail of the distribution and helps keep the output coherent and contextually relevant.

6. **`messages`**:
   - This section contains the **conversation history**.
   - The model receives a **message** from the **user** (you), containing the **summarization prompt** as input content.
   - The **role** (`user`) indicates who is sending the message, and the **content** is the actual **text prompt** that needs to be summarized.

This request body is then sent to the **Claude 3.7 Sonnet** model, which processes the input and generates a response based on the provided prompt and settings.
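The request body described above can be wrapped in a small helper so the same structure is reused with different prompts. This is a minimal sketch: `build_claude_body` is a hypothetical name introduced here for illustration, and the defaults simply mirror the values used in this lab.

```python
import json

def build_claude_body(prompt, max_tokens=1000, temperature=0.5, top_p=0.9):
    """Return the JSON string for an Anthropic request body on Bedrock."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",  # API version, not model version
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": prompt}]}
        ],
    })

# Build a body and parse it back to confirm the structure round-trips
body = build_claude_body("Summarize: Bedrock offers foundation models via an API.")
payload = json.loads(body)
print(payload["messages"][0]["role"], payload["max_tokens"])
```

Parsing the string back with `json.loads()` is a quick way to check the payload before sending it.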




In [None]:
# Send request to Claude 3.7 Sonnet

try:
    response = bedrock.invoke_model(                     # Call the Bedrock Invoke API to run the model
        modelId=MODELS["Claude 3.7 Sonnet"],             # The model ID for Claude 3.7 Sonnet
        body=claude_body,                                # JSON request body created earlier
        accept="application/json",                       # Expected response format
        contentType="application/json"                   # Content type of the request body
    )
    response_body = json.loads(response.get('body').read())  # Read and parse the model's JSON response

    # Extract and display the response text
    claude_summary = response_body["content"][0]["text"]     # Extract the actual summary text from the response
    display_response(claude_summary, "Claude 3.7 Sonnet (Invoke Model API)")  # Display formatted output

except botocore.exceptions.ClientError as error:             # Handle AWS API errors
    if error.response['Error']['Code'] == 'AccessDeniedException':
        # Provide helpful message for access issues
        print(f"\x1b[41m{error.response['Error']['Message']}\
            \nTo troubleshoot this issue please refer to the following resources.\
            \nhttps://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_access-denied.html\
            \nhttps://docs.aws.amazon.com/bedrock/latest/userguide/security-iam.html\x1b[0m\n")
    else:
        raise error                                          # Re-throw other unexpected errors


**Expected Output:** (On a successful API call, the cell renders the header "Response from Claude 3.7 Sonnet (Invoke Model API)", followed by the model's summary and a separator line. The actual text will vary slightly each time.)

In this step, we send a request to the **Claude 3.7 Sonnet** model using the **Amazon Bedrock API** and handle the response.

#### `bedrock.invoke_model()`:

- **Purpose**: This function sends a request to the **Claude 3.7 Sonnet** model (using the model ID defined in the `MODELS` dictionary) via the **Amazon Bedrock API**.
  
- **Parameters**:
  - **`modelId`**: Specifies the model we are using (Claude 3.7 Sonnet).
  - **`body`**: The prepared request body (`claude_body`), which contains the input for the model (in this case, the summarization prompt).
  - **`accept`**: Specifies that we expect the response to be in **JSON format** (`application/json`).
  - **`contentType`**: Specifies that the request body is also in **JSON format**.

#### Response Parsing:

- The **API response** is in JSON format. We use **`json.loads()`** to parse the JSON response into a Python dictionary.
  
- **Extracting the Summary**:
  - After parsing the response, we extract the summary text from the response body.
  - The expression `response_body["content"][0]["text"]` reads the **`text`** field of the first block in the **`content`** array.
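The parsing step can be tried offline with a stand-in response. In the sketch below, `extract_claude_text` is a hypothetical helper, `io.BytesIO` mimics the streaming `body` object (which exposes `.read()`), and the payload is made up for illustration. Joining every text block, rather than reading only index 0, is a design choice of this sketch and not something the lab requires.

```python
import io
import json

def extract_claude_text(response):
    """Join all text blocks from an invoke_model response for a Claude model."""
    body = json.loads(response["body"].read())
    return "".join(
        block["text"]
        for block in body.get("content", [])
        if block.get("type") == "text"
    )

# Stand-in for a real response; the payload below is invented for the demo.
fake_response = {
    "body": io.BytesIO(json.dumps({
        "content": [{"type": "text", "text": "Bedrock exposes FMs through an API."}],
        "stop_reason": "end_turn",
    }).encode("utf-8"))
}

summary = extract_claude_text(fake_response)
print(summary)
```

Note that the body can be read only once, so parse it into a variable before using it in several places.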

#### Error Handling:

- If the request fails due to an **Access Denied error** (denoted as `AccessDeniedException`), we display a **troubleshooting message** with relevant resources to resolve IAM permission issues.
  
- For other errors, the error is raised for further investigation.
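The same pattern extends to other error codes. The sketch below works on the dictionary found in `error.response` of a `ClientError`; `describe_error` and `ERROR_HINTS` are hypothetical names, and the hint texts are illustrative wording rather than official AWS guidance.

```python
# Illustrative hints only; the wording is an assumption, not AWS guidance.
ERROR_HINTS = {
    "AccessDeniedException": "Check IAM permissions and that model access is enabled.",
    "ThrottlingException": "Request rate exceeded; retry after a short wait.",
    "ValidationException": "Check the request body and the model ID.",
}

def describe_error(error_response):
    """Build a one-line message from the dict stored on ClientError.response."""
    err = error_response.get("Error", {})
    code = err.get("Code", "Unknown")
    message = err.get("Message", "")
    hint = ERROR_HINTS.get(code, "See the exception details.")
    return f"{code}: {message} ({hint})"

# Made-up payload shaped like error.response
sample = {"Error": {"Code": "ThrottlingException", "Message": "Too many requests"}}
print(describe_error(sample))
```

Inside an `except botocore.exceptions.ClientError as error:` block, this would be called as `describe_error(error.response)`.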

#### `display_response()`:

- After extracting the summary from the API response, the **`display_response()`** function (defined earlier) is used to display the result in a **readable format** in the notebook.
- This function ensures the response is formatted and displayed clearly for easy interpretation.


This setup ensures that you can **send a request**, **process the response**, handle errors, and **display the result** from the **Claude 3.7 Sonnet** model in a structured and efficient manner.


### **2.2 Text Summarization using the Converse API (Recommended Approach)**

While the **Invoke Model API** allows direct access to foundation models, it has several limitations:
1. it uses different request/response formats for each model family;
2. there is no built-in support for multi-turn conversations;
3. it requires custom handling for different model capabilities.

The **Converse API** addresses these limitations by providing a unified interface. Let's explore it on our text summarization task:

In [None]:
# Create a converse request with our summarization task

converse_request = {
    "messages": [  # List of messages representing the conversation
        {
            "role": "user",  # Role of the sender (user in this case)
            "content": [  # Content of the message
                {
                    "text": f"Please provide a concise summary of the following text in 2-3 sentences. Text to summarize: {text_to_summarize}"  # The summarization prompt, including the text to summarize
                }
            ]
        }
    ],
    "inferenceConfig": {  # Configuration settings for controlling the model’s response
        "temperature": 0.4,  # Controls the randomness of the model's output (lower value for more deterministic output)
        "topP": 0.9,  # Nucleus sampling for controlling diversity of the output
        "maxTokens": 500  # Maximum number of tokens (words or word pieces) in the summary response
    }
}



**Output:** *(No output. The `converse_request` dictionary is defined.)*

In this step, we are sending a **request to the Converse API** for text summarization. Here's a breakdown of the key points:

1. **`messages`**:
   - The `messages` array contains the communication between the **user** and the **model**. In this case, the **user** is sending a request to the model to summarize the provided text in **2-3 sentences**.

2. **`role`**:
   - The `role` field specifies **who is sending the message**. Here, the role is **`user`**, indicating that the input is coming from the user.

3. **`content`**:
   - The `content` key holds the **actual text content** that we want the model to process. The **text field** contains a prompt that explicitly asks the model to **summarize** the provided text.

4. **`inferenceConfig`**:
   - **`temperature`** (0.4): This controls the **randomness** of the model’s response. A value of 0.4 keeps the output more focused and predictable; it reduces randomness but does not remove it.
   - **`topP`** (0.9): This sets the **cumulative probability threshold** for token selection. A value of 0.9 means the model samples only from the smallest set of most likely tokens whose combined probability reaches **90%**, which helps keep the output coherent and relevant.
   - **`maxTokens`** (500): This defines the **maximum length** of the model's response. It caps the output at **500 tokens**; a response that reaches the cap is cut off, so the 2-3 sentence length is requested by the prompt, not enforced by this setting.

##### **Purpose:**
This request structure allows you to send the **text summarization prompt** to the **Converse API**. The model will process the input and generate a **summarized version** of the provided content, limited to **2-3 sentences**.


This setup ensures that we can interact with the **Converse API** to perform text summarization efficiently, using well-defined configurations for response control.
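Because `messages` is a list, a multi-turn conversation is built by appending each reply and each follow-up question to it. Below is a minimal sketch with a hypothetical `add_turn` helper. The alternation checks reflect how conversations are typically validated (start with a user turn, then alternate roles); treat them as an assumption rather than a full statement of the API's rules. The assistant reply is made up.

```python
def add_turn(messages, role, text):
    """Append one turn in Converse message format, keeping roles alternating."""
    if role not in ("user", "assistant"):
        raise ValueError(f"unsupported role: {role}")
    if not messages and role != "user":
        raise ValueError("conversation must start with a user message")
    if messages and messages[-1]["role"] == role:
        raise ValueError("roles must alternate between user and assistant")
    messages.append({"role": role, "content": [{"text": text}]})
    return messages

history = []
add_turn(history, "user", "Summarize Amazon Bedrock in 2-3 sentences.")
# Made-up reply; in practice this is the text returned by the model
add_turn(history, "assistant", "Amazon Bedrock offers foundation models via an API.")
add_turn(history, "user", "Now shorten that to one sentence.")

print([m["role"] for m in history])
```

The full `history` list is what would be passed as `messages` on the next call, so the model sees the earlier turns.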


In [None]:
# Call Claude 3.7 Sonnet with Converse API

try:
    # Send the conversation request to the model using the Converse API
    response = bedrock.converse(
        modelId=MODELS["Claude 3.7 Sonnet"],  # Model ID for Claude 3.7 Sonnet from the MODELS dictionary
        messages=converse_request["messages"],  # The message list containing the user prompt
        inferenceConfig=converse_request["inferenceConfig"]  # Configuration parameters for the model (temperature, maxTokens)
    )


    # Extract the model's response from the JSON response body
    claude_converse_response = response["output"]["message"]["content"][0]["text"]  # Extracts the text from the response
    display_response(claude_converse_response, "Claude 3.7 Sonnet (Converse API)")  # Display the model’s response using the display_response function


except botocore.exceptions.ClientError as error:  # Handle any AWS API client errors
    if error.response['Error']['Code'] == 'AccessDeniedException':  # If the error is due to access denial
        # Print the error message in red, along with instructions for troubleshooting access issues
        print(f"\x1b[41m{error.response['Error']['Code']}: {error.response['Error']['Message']}\x1b[0m")
        print("Please ensure you have the necessary permissions for Amazon Bedrock.")

    else:
        raise error  # For other errors, raise the exception for further investigation


**Expected Output:** (On a successful API call, the cell renders the header "Response from Claude 3.7 Sonnet (Converse API)", followed by a summary of roughly 2-3 sentences and a separator line. The actual text will vary slightly between runs.)

In this step, we send a **Converse request** to the **Claude 3.7 Sonnet** model using the **Amazon Bedrock API** and handle the response.

#### `bedrock.converse()`:

- **Purpose**: This function sends the **Converse request** to the **Claude 3.7 Sonnet** model via the **Amazon Bedrock API**.
  
- **Parameters**:
  - **`modelId`**: Specifies the model we are using (Claude 3.7 Sonnet).
  - **`messages`**: Contains the user input, which in this case is the **summarization prompt**.
  - **`inferenceConfig`**: Defines the configuration for generating the response (e.g., **temperature**, **topP**, **maxTokens**).

##### **Response Parsing:**

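- After sending the request, the **summarized text** is read from the returned dictionary at `response["output"]["message"]["content"][0]["text"]`. Unlike `invoke_model`, `converse` returns an already-parsed dictionary, so no `json.loads()` call is needed.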


##### **Error Handling:**

- **Access Denied Error**:
  - If an **Access Denied** error (`AccessDeniedException`) occurs, a troubleshooting message is displayed to guide the user on resolving **AWS IAM permission** issues.
  
- **Other Errors**:
  - For any other errors, the exception is raised for further investigation.

##### `display_response()`:

- Once the summary is extracted, the **summarized content** is displayed in a readable format using the **`display_response()`** function. This function was defined earlier to present the output clearly.


This code demonstrates how to send a request to the **Claude 3.7 Sonnet** model using the **Converse API**, process the response, and handle potential errors effectively.
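Besides the text, a Converse response carries metadata such as the stop reason and token usage. The sketch below uses a made-up response dictionary with the same top-level keys so it runs offline; `summarize_converse_response` is a hypothetical helper name.

```python
def summarize_converse_response(response):
    """Pull the text, stop reason, and token usage out of a Converse response."""
    blocks = response["output"]["message"]["content"]
    text = "".join(block["text"] for block in blocks if "text" in block)
    usage = response.get("usage", {})
    return {
        "text": text,
        "stop_reason": response.get("stopReason"),
        "input_tokens": usage.get("inputTokens"),
        "output_tokens": usage.get("outputTokens"),
    }

# Invented values; only the shape follows the Converse response
fake = {
    "output": {
        "message": {
            "role": "assistant",
            "content": [{"text": "Bedrock offers FMs via an API."}],
        }
    },
    "stopReason": "end_turn",
    "usage": {"inputTokens": 250, "outputTokens": 12, "totalTokens": 262},
}

info = summarize_converse_response(fake)
print(info)
```

A `stopReason` of `max_tokens` indicates that the reply was cut off by the `maxTokens` limit, which is worth checking before treating a summary as complete.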


### **2.3 Overview of the Converse API**

Now that we have used the **Converse API**, let's take a closer look. To use the Converse API, you call the <a href="https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html" target="_blank">Converse</a> or <a href="https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ConverseStream.html" target="_blank">ConverseStream</a> (for streaming responses) operations to send messages to a model.

While it is possible to use the existing base inference operations (<a href="https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html" target="_blank">InvokeModel</a> or <a href="https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html" target="_blank">InvokeModelWithResponseStream</a>) for conversation applications as well, we recommend using the Converse API because it provides a consistent API that works with <a href="https://docs.aws.amazon.com/bedrock/latest/userguide/converse-api.html" target="_blank">all Amazon Bedrock models that support messages</a>.

This means you can write code once and use it with different models. Should a model have unique inference parameters, the Converse API also allows you to pass those parameters in a model-specific structure.

You can use the Amazon Bedrock Converse API to create conversational applications that send and receive messages to and from an Amazon Bedrock model.

For example, you can create a chatbot that maintains a conversation over many turns and uses a persona or tone customization that is unique to your needs, such as a helpful technical support assistant. The Converse API also supports other Bedrock capabilities, like <a href="https://docs.aws.amazon.com/bedrock/latest/userguide/tool-use.html" target="_blank">tool use</a> and <a href="https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails-use-converse-api.html" target="_blank">guardrails</a>.


Let's break down its key components (you can also review the <a href="https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-runtime/client/converse.html" target="_blank">documentation</a> for a full list of parameters):

```json
{
  "modelId": "us.anthropic.claude-3-7-sonnet-20250219-v1:0", // Required: Model identifier for the Claude 3.7 Sonnet model
  
  "messages": [ // Required: Conversation history
    {
      "role": "user", // Specifies that the message is from the user
      "content": [
        {
          "text": "Your prompt or message here" // The actual message content or prompt to the model
        }
      ]
    }
  ],
  
  "system": [ // Optional: System instructions that guide the model's behavior
    {
      "text": "You are a helpful AI assistant." // Example: A simple instruction for the model to act as a helpful assistant
    }
  ],
  
  "inferenceConfig": { // Optional: Parameters for controlling model inference behavior
    "temperature": 0.7, // Controls the randomness of the response (0.0 = deterministic, 1.0 = very random)
    "topP": 0.9, // Controls diversity by setting a threshold for token selection, making the response more diverse
    "maxTokens": 2000, // Maximum number of tokens (words/parts of words) allowed in the response
    "stopSequences": [] // Optional: Define sequences to stop the model's response generation (e.g., specific words/phrases)
  },
  
  "toolConfig": { // Optional: Settings for function calling setup (if using external tools or APIs)
    "tools": [], // List of tools or functions the model can use (empty in this case)
    "toolChoice": {
      "auto": {} // Let the model decide automatically when to use the tools
    }
  }
}

```
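The structure above maps directly onto keyword arguments of `bedrock.converse()`. The sketch below assembles them with a hypothetical `build_converse_kwargs` helper. Optional keys are left out when unused, because boto3 validates parameter types and does not accept `None` for dictionary parameters. Model-specific parameters go in `additionalModelRequestFields`; the `top_k` value shown is an illustrative assumption for Anthropic models.

```python
def build_converse_kwargs(model_id, prompt, system_text=None, extra_fields=None,
                          temperature=0.7, top_p=0.9, max_tokens=2000):
    """Assemble keyword arguments for bedrock.converse(**kwargs)."""
    kwargs = {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {
            "temperature": temperature,
            "topP": top_p,
            "maxTokens": max_tokens,
        },
    }
    if system_text:
        kwargs["system"] = [{"text": system_text}]
    if extra_fields:
        # Model-specific parameters that are not part of inferenceConfig
        kwargs["additionalModelRequestFields"] = extra_fields
    return kwargs

kwargs = build_converse_kwargs(
    "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    "Summarize Amazon Bedrock in one sentence.",
    system_text="You are a helpful AI assistant.",
    extra_fields={"top_k": 50},  # illustrative value
)
print(sorted(kwargs))

# In the notebook: response = bedrock.converse(**kwargs)
```

Building the arguments separately from the call also makes it easy to log or inspect a request before sending it.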

### **2.4 Easily switch between models**

One of the biggest advantages of the Converse API is the ability to easily switch between models using the exact same request format. Let's compare summaries across different foundation models by looping over the model dictionary we defined above:

In [None]:
# call different models with the same converse request

results = {}  # Initialize an empty dictionary to store results for each model

# Loop over all models defined in the MODELS dictionary
for model_name, model_id in MODELS.items():  # model_name is the name of the model, model_id is its identifier
    try:
        # Record the start time to calculate response time
        start_time = time.time()

        # Send the converse request to the model
        response = bedrock.converse(
            modelId=model_id,  # The model ID to be used
            messages=converse_request["messages"],  # The messages to be sent to the model
            inferenceConfig=converse_request.get("inferenceConfig", {})  # Optional inference config; an empty dict (not None) leaves the model defaults in place
        )

        # Record the end time after receiving the response
        end_time = time.time()

        # Extract the model's response from the API response
        model_response = response["output"]["message"]["content"][0]["text"]

        # Calculate the response time
        response_time = round(end_time - start_time, 2)

        # Store the response and time in the results dictionary
        results[model_name] = {
            "response": model_response,  # Store the model's response
            "time": response_time  # Store the time taken to get the response
        }

        # Print success message with model name and response time
        print(f"✅ Successfully called {model_name} (took {response_time} seconds)")

    except Exception as e:  # If an error occurs during the request
        # Print the error message
        print(f"❌ Error calling {model_name}: {str(e)}")

        # Store the error message and time in the results dictionary
        results[model_name] = {
            "response": f"Error: {str(e)}",  # Store the error message
            "time": None  # No time in case of an error
        }


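**Expected Output:** (Each successful call prints one line of the form `✅ Successfully called <model name> (took <N> seconds)`; a failed call prints `❌ Error calling <model name>: <error>`. Actual times will vary.)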

This step demonstrates how we call multiple models using the **Converse API** and track their responses and processing time.

#### Key Points:

1. **Looping through Models**:
   - The `for` loop iterates over all the models listed in the `MODELS` dictionary. For each model, we extract the **`model_name`** (e.g., "Claude 3.7 Sonnet", "Amazon Nova Pro") and the **`model_id`** (the model's unique identifier).

2. **Sending the Converse Request**:
   - For each model, the `bedrock.converse()` function is called with the following parameters:
     - **`modelId`**: The unique identifier for the model.
     - **`messages`**: The input prompt or conversation history sent to the model.
     - **`inferenceConfig`** (optional): Parameters for controlling the response, like temperature, topP, and maxTokens.

3. **Tracking Time**:
   - **`start_time`** is recorded before making the request to monitor how long the model takes to process the input.
   - **`end_time`** is recorded after receiving the response.
   - The **`response_time`** is calculated by subtracting **`start_time`** from **`end_time`**.

4. **Extracting the Model’s Response**:
   - After receiving the response from the API, the **text content** of the model's reply (here, the summary) is read from the returned structure at `response["output"]["message"]["content"][0]["text"]`.

5. **Handling Errors**:
   - If an **error** occurs while calling the model (e.g., permission issues, API errors), an **exception** is caught.
   - The error message is printed, and the error response is stored in the `results` dictionary for future review and troubleshooting.

6. **Storing Results**:
   - The **response** and **response time** for each model are stored in the `results` dictionary.
   - This allows us to track both the **output** and the **time taken** for each API call.

7. **Output**:
   - If the **call is successful**, a **success message** with the model name and **response time** is printed. The response text itself is stored in `results` rather than printed.
   - If there’s an **error**, an **error message** is printed, and the error string is stored in `results` with a time of `None`.

---

This approach ensures that we can call multiple models efficiently, compare their responses, and monitor how long each model takes to process the request. The results for each model, including the **response** and **response time**, are stored in the `results` dictionary for further analysis.
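The loop described above can be sketched as a reusable function. This is a minimal sketch under stated assumptions: `compare_models` and `StubClient` are hypothetical names that do not appear in the notebook, and the stub stands in for the real `bedrock` client so the example runs without AWS credentials. It also records an explicit `ok` flag, which is an addition to the notebook's `results` layout.

```python
import time


def compare_models(client, models, messages, inference_config=None):
    """Call each model through client.converse() and record text, time, and status."""
    results = {}
    for model_name, model_id in models.items():
        start = time.perf_counter()
        try:
            response = client.converse(
                modelId=model_id,
                messages=messages,
                inferenceConfig=inference_config or {},
            )
            text = response["output"]["message"]["content"][0]["text"]
            elapsed = round(time.perf_counter() - start, 2)
            results[model_name] = {"ok": True, "response": text, "time": elapsed}
        except Exception as exc:  # broad on purpose, mirroring the notebook
            results[model_name] = {"ok": False, "response": f"Error: {exc}", "time": None}
    return results


class StubClient:
    """Hypothetical stand-in for the Bedrock runtime client (makes no network calls)."""

    def converse(self, modelId, messages, inferenceConfig):
        if modelId == "broken-model":
            raise RuntimeError("access denied")
        reply = f"summary from {modelId}"
        return {"output": {"message": {"role": "assistant", "content": [{"text": reply}]}}}


demo_messages = [{"role": "user", "content": [{"text": "Summarize this text."}]}]
demo_results = compare_models(
    StubClient(),
    {"Model A": "model-a", "Model B": "broken-model"},
    demo_messages,
)
```

In the notebook itself, you would pass the real `bedrock` client and the `MODELS` dictionary in place of the stub and the made-up model IDs.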


In [None]:
# Display results in a formatted way

for model_name, result in results.items():  # Loop through all models and their results
    if "Error" not in result["response"]:  # Check if there is no error in the model's response
        # Display the model name and the time taken to process the request
        display(Markdown(f"### {model_name} (took {result['time']} seconds)"))

        # Display the model's response (summarization or output text)
        display(Markdown(result["response"]))

        # Print a separator line for readability
        print("-" * 80)


**Expected Output:** (For each successful model: a Markdown heading with the model name and response time, the model's summary, and a separator line. The summary text depends on the prompt, which asks for a concise 2-3 sentence summary.)

In this step, we loop through the results of each model and display them in a **structured and readable format**.

#### Key Points:

1. **Looping through Results**:
   - The `for` loop iterates through the `results` dictionary, where each **`model_name`** (e.g., "Claude", "Titan") and its corresponding **`result`** (response and time) are processed.

2. **Checking for Errors**:
   - The condition `if "Error" not in result["response"]:` ensures that **only successful responses** are displayed. If the stored response is an error string (e.g., "Error: Access Denied"), that model’s result is skipped.
   - Note that this is a substring check, so a legitimate response that happens to contain the word "Error" would also be skipped. Because failed calls store `None` as their time, checking `result["time"] is not None` is a stricter alternative.

3. **Displaying Results**:
   - For each successful model response, the **model name** and the **time taken** to get the response are displayed as a **Markdown heading**.
   - The **model’s response** (summary or output text) is displayed underneath the heading.
   - A separator line (`"-" * 80`) is printed after each model’s result for better readability, visually separating the outputs of different models.


This approach helps organize the results of each model in a **clear and readable format**, making it easier to compare the outputs and analyze the performance of each model.
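As a hedged illustration of the stricter check, the sketch below separates successes from failures by looking at the stored time instead of searching the text for "Error". It assumes the `results` layout from the previous step, where a failed call stores `None` as its time; `split_results` and the sample data are hypothetical.

```python
def split_results(results):
    """Separate successful entries from failed ones using the stored time."""
    succeeded = {name: r for name, r in results.items() if r["time"] is not None}
    failed = {name: r for name, r in results.items() if r["time"] is None}
    return succeeded, failed


# Made-up sample data: the first response is valid but contains the word "Error".
sample_results = {
    "Model A": {"response": "Error rates fell by 12% after the update.", "time": 1.42},
    "Model B": {"response": "Error: access denied", "time": None},
}
succeeded, failed = split_results(sample_results)
```

With the substring check, "Model A" would have been hidden even though its call succeeded; here it lands in `succeeded`.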


### **2.5 Cross-Regional Inference in Amazon Bedrock**

Amazon Bedrock offers Cross-Regional Inference, which automatically selects the optimal AWS Region within your geography to process your inference requests.

Cross-Regional Inference offers higher throughput limits (up to 2x allocated quotas) and seamlessly manages traffic bursts by dynamically routing requests across multiple AWS regions, enhancing application resilience during peak demand periods without additional routing or data transfer costs.

Customers can control where their inference data flows by selecting from a pre-defined set of regions, helping them comply with applicable data residency requirements and sovereignty laws. Moreover, this capability prioritizes the connected Bedrock API source region when possible, helping to minimize latency and improve responsiveness.

As a result, customers can enhance their applications' reliability, performance, and efficiency. Please review the list of <a href="https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-support.html" target="_blank">supported regions and models for inference profiles</a>.

To use Cross-Regional Inference, specify a cross-region inference profile as the `modelId` when making a request. Cross-region inference profiles are identified by a geography prefix (e.g., `us.` or `eu.`) placed before the model ID.

For example:
```python
{
    "Amazon Nova Pro": "amazon.nova-pro-v1:0",  # Regular model ID
    "Amazon Nova Pro (CRIS)": "us.amazon.nova-pro-v1:0"  # Cross-region inference profile ID
}
```
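The prefix convention can be captured in a small helper. This is only a sketch: `to_inference_profile_id` and `KNOWN_GEO_PREFIXES` are hypothetical names, the prefix list contains just the two prefixes mentioned above, and the helper does not check whether an inference profile actually exists for the model (consult the supported list linked earlier for that).

```python
KNOWN_GEO_PREFIXES = ("us", "eu")  # assumption: only the prefixes named in this notebook


def to_inference_profile_id(model_id, geo="us"):
    """Return model_id with a geography prefix, e.g. 'us.' + model_id."""
    if geo not in KNOWN_GEO_PREFIXES:
        raise ValueError(f"unsupported geography prefix: {geo!r}")
    if model_id.startswith(tuple(f"{p}." for p in KNOWN_GEO_PREFIXES)):
        return model_id  # already prefixed, so return it unchanged
    return f"{geo}.{model_id}"


profile_id = to_inference_profile_id("amazon.nova-pro-v1:0")
```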



Let's see how easy it is to use Cross-Regional Inference by invoking the Claude 3.5 Sonnet model:

In [None]:
# Regular model invocation (processed in the region the client is configured for)

standard_response = bedrock.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # Standard model ID (client's region)
    messages=converse_request["messages"]  # Passing the conversation history
)

# Cross-region inference (note the "us." prefix)
cris_response = bedrock.converse(
    modelId="us.anthropic.claude-3-5-sonnet-20240620-v1:0",  # Inference profile ID (US geography; Bedrock picks the region)
    messages=converse_request["messages"]  # Passing the conversation history
)

# Print responses
print("Standard response:", standard_response["output"]["message"]["content"][0]["text"])  # Response from the standard invocation
print("Cross-region response:", cris_response["output"]["message"]["content"][0]["text"])  # Response from the cross-region inference profile


**Expected Output:** (The two responses should be similar, since the underlying model is the same, but they may not be identical because generation is sampled. The cross-region request may also be processed in a different region.)

In this step, we demonstrate how to invoke a model both in the **standard region** and via **cross-region inference** using the `bedrock.converse()` function.

#### Key Points:

1. **Standard Model Invocation**:
   - The first `bedrock.converse()` call is a **standard invocation** using the model ID `"anthropic.claude-3-5-sonnet-20240620-v1:0"`. The request is processed in the region your `bedrock` client is configured for (the **default region** of your AWS setup).

2. **Cross-Region Inference**:
   - The second `bedrock.converse()` call uses a **cross-region inference profile**. The `"us."` prefix does not pin the request to a single region; it selects the **US geography**, and Bedrock chooses a region within it to process the request.
   - Routing within a geography can affect **latency** and **availability**, and the choice of geography matters for **compliance**, because it determines where inference data may flow.

3. **Response Extraction**:
   - After both API calls, the **text content** of each response is read from `["output"]["message"]["content"][0]["text"]`.
   - Both responses are printed for comparison.

4. **Purpose**:
   - This setup demonstrates how to invoke a model both with its standard model ID (default AWS setup) and through **cross-region inference**, where the only change is the **geography prefix** in the model ID.
   - Because the underlying model is the same, the two responses should be comparable. The code does not measure response time; to compare latency, record the time around each call as in the multi-model comparison earlier.

This approach shows that switching between the **default region** and **cross-region inference** requires no code changes beyond the model ID.
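The extraction path used in this section reads only the first content block. The sketch below shows a slightly more defensive alternative that joins every text block in the response; `extract_text` is a hypothetical helper, and the sample dictionary is hand-written to follow the response layout used in this notebook rather than captured from a real call.

```python
def extract_text(response):
    """Join the text of every text block in a Converse response message."""
    blocks = response.get("output", {}).get("message", {}).get("content", [])
    return "".join(block["text"] for block in blocks if "text" in block)


# Hand-written sample that mimics the structure of a Converse response.
sample_response = {
    "output": {
        "message": {
            "role": "assistant",
            "content": [{"text": "Bedrock offers "}, {"text": "managed model access."}],
        }
    },
    "stopReason": "end_turn",
}
sample_text = extract_text(sample_response)
```

The same helper could be applied to both `standard_response` and `cris_response` before printing them.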


### **2.6 Multi-turn Conversations**
The Converse API makes multi-turn conversations simple. Let's see it in action:

In [None]:
# Example of a multi-turn conversation with Converse API

multi_turn_messages = [
    {
        "role": "user",  # First message from the user (initial summary request)
        "content": [{"text": f"Please summarize this text: {text_to_summarize}"}]  # User asks to summarize the text
    },
    {
        "role": "assistant",  # Response from the assistant (Claude model)
        "content": [{"text": results["Claude 3.7 Sonnet"]["response"]}]  # Assistant’s first response (summary)
    },
    {
        "role": "user",  # Follow-up message from the user (asking for a shorter summary)
        "content": [{"text": "Can you make this summary even shorter, just 1 sentence?"}]  # User asks for a more concise summary
    }
]

try:
    # Send the multi-turn conversation to the model using the Converse API
    response = bedrock.converse(
        modelId=MODELS["Claude 3.7 Sonnet"],  # Specify the model (Claude 3.7 Sonnet)
        messages=multi_turn_messages,  # Provide the conversation history (multi-turn)
        inferenceConfig={"temperature": 0.2, "maxTokens": 500}  # Set inference configuration to control creativity and length
    )

    # Extract the model's response using the correct structure
    follow_up_response = response["output"]["message"]["content"][0]["text"]  # Extract the summarized response

    # Display the follow-up response from the assistant
    display_response(follow_up_response, "Claude 3.7 Sonnet (Multi-turn conversation)")

except Exception as e:
    # Catch and display any errors encountered during the request
    print(f"Error: {str(e)}")



**Expected Output:** (The response will be a single sentence, demonstrating the model maintained context from the previous turn.)

In this step, we explore how to handle **multi-turn conversations** using the Converse API, where the model generates responses based on the previous messages in the conversation.

#### Key Points:

1. **Multi-turn Conversation Setup**:
   - The `multi_turn_messages` list contains multiple messages, each representing an interaction between the **user** and the **assistant** (model).
   - **First message**: A summarization prompt from the **user** asking the model to summarize the given text.
   - **Second message**: The **assistant's** response (model's output) to the summarization prompt.
   - **Third message**: A follow-up question from the **user**, requesting the model to shorten the summary even further.

2. **Calling the Converse API**:
   - The `bedrock.converse()` function sends the entire **multi-turn conversation** to the model specified by the **modelId**.
   - The **`inferenceConfig`** includes parameters like **`temperature`** (controls creativity) and **`maxTokens`** (limits response length).

3. **Extracting and Displaying the Response**:
   - After receiving the response from the API, the `follow_up_response` is extracted from the returned JSON structure.
   - The response is then displayed using the `display_response()` function, which presents it in a user-friendly format.

4. **Error Handling**:
   - If any error occurs (e.g., network issues, model errors), an exception is caught, and an error message is printed for troubleshooting.
   - This helps ensure smooth execution of the API calls and helps identify and resolve any issues quickly.


This example demonstrates how to **manage multi-turn conversations** with the **Converse API**, enabling more **interactive communication** with the model. It allows the model to reference previous exchanges, providing more meaningful and contextually aware responses.
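Since the API itself is stateless, the caller has to resend the whole history on every turn. The sketch below shows one way to build that history incrementally. `add_turn` is a hypothetical helper, and the rule that messages start with the user and alternate roles is an assumption that holds for many models but should be checked for the model you use.

```python
def add_turn(messages, role, text):
    """Return a new history with one more message; roles must alternate."""
    if role not in ("user", "assistant"):
        raise ValueError(f"unknown role: {role!r}")
    if not messages and role != "user":
        raise ValueError("a conversation must start with a user message")
    if messages and messages[-1]["role"] == role:
        raise ValueError("consecutive messages must alternate between user and assistant")
    # Build a new list so the caller's original history is left untouched.
    return messages + [{"role": role, "content": [{"text": text}]}]


history = []
history = add_turn(history, "user", "Please summarize this text: ...")
history = add_turn(history, "assistant", "A short summary.")
history = add_turn(history, "user", "Can you make it one sentence?")
```

After each model reply, you would append it with `add_turn(history, "assistant", reply_text)` and pass the updated `history` as `messages` on the next call.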


### **2.7 Streaming Responses with ConverseStream API**

For longer generations, you might want to receive the content as it's being generated. The ConverseStream API supports streaming, which allows you to process the response incrementally:

In [None]:
# Example of streaming with the ConverseStream API

def stream_converse(model_id, messages, inference_config=None):
    if inference_config is None:
        inference_config = {}  # Set default inference config if none provided

    print("Streaming response (chunks will appear as they are received):\n")
    print("-" * 80)

    full_response = ""  # Initialize an empty string to store the full response

    try:
        # Sending the conversation to the model and enabling streaming
        response = bedrock.converse_stream(
            modelId=model_id,  # Model ID to specify which model to use
            messages=messages,  # The conversation history to be sent to the model
            inferenceConfig=inference_config  # Inference configuration (temperature, max tokens, etc.)
        )

        # Retrieve the stream of the response from the model
        response_stream = response.get('stream')
        if response_stream:
            for event in response_stream:  # Iterate through each event in the response stream

                # Check for message start event and display the role (user or assistant)
                if 'messageStart' in event:
                    print(f"\nRole: {event['messageStart']['role']}")

                # If a content block delta is present, print the text chunk and add it to the full response
                if 'contentBlockDelta' in event:
                    chunk = event['contentBlockDelta']['delta'].get('text', '')
                    print(chunk, end="")
                    full_response += chunk

                # Check for message stop event and display the stop reason
                if 'messageStop' in event:
                    print(f"\nStop reason: {event['messageStop']['stopReason']}")

                # If metadata is present, display usage and latency information
                if 'metadata' in event:
                    metadata = event['metadata']
                    if 'usage' in metadata:  # Display token usage information
                        print("\nToken usage")
                        print(f"Input tokens: {metadata['usage']['inputTokens']}")
                        print(f"Output tokens: {metadata['usage']['outputTokens']}")
                        print(f"Total tokens: {metadata['usage']['totalTokens']}")
                    if 'metrics' in metadata:  # Display latency information
                        print(f"Latency: {metadata['metrics']['latencyMs']} milliseconds")

            # End of the stream, display a separator
            print("\n" + "-" * 80)

        return full_response  # Return the complete text assembled from the streamed chunks

    except Exception as e:
        # Handle any errors that occur during the streaming process
        print(f"Error in streaming: {str(e)}")
        return None  # Return None if an error occurs


**Output:** *(No output. The `stream_converse` function is defined in memory.)*

In this step, we explore how to **stream** responses using the **ConverseStream API**, the streaming counterpart of the Converse API. Streaming lets us receive the model's response as it is generated, so the first words appear sooner than they would if we waited for the complete response.

#### Key Points:

1. **Function Definition**:
   - The function `stream_converse()` is used to stream responses from a model via the Converse API. It accepts three parameters:
     - **`model_id`**: The **ID** of the model to be used.
     - **`messages`**: The **conversation history/messages** to be sent to the model.
     - **`inference_config`** (optional): **Configuration options** to control the model’s behavior (e.g., temperature, max tokens).

2. **Streaming the Response**:
   - `bedrock.converse_stream()` is used to send the conversation to the model and receive the response **in real-time** as **streamed chunks**.
   - Each chunk can contain:
     - **Message Start**: The **role** of the message being streamed (for a model response, this is the assistant).
     - **Content Block**: The **actual text** generated by the model.
     - **Message Stop**: A **marker** indicating the end of the message.
     - **Metadata**: Includes **token usage** (input/output tokens) and **latency** (response time).

3. **Real-Time Output**:
   - As the response is streamed, chunks of the message are printed **live**.
   - **Token usage** and **latency** arrive in a metadata event at the end of the stream, so the code prints them once generation has finished.

4. **Error Handling**:
   - If any error occurs during the streaming process (e.g., API connection issues), it is caught and printed for troubleshooting.

5. **Purpose**:
   - This code demonstrates how to **stream responses** from a model using the **Converse API**, allowing for **real-time interaction** with the model. It’s especially useful for tasks that require **fast feedback** or **continuous updates**, such as generating long-form text or interactive conversations.
   
6. **Streaming Benefits**:
   - **Dynamic interaction** with AI models.
   - **Real-time processing** for use cases like **interactive chatbots**, **real-time summarization**, or any application requiring **immediate feedback** from the model.


Streaming with the Converse API provides a **dynamic way** to interact with models, making it ideal for use cases that need real-time responses and iterative communication with the model.
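To make the event handling easier to test without calling the API, the event loop can be separated from the printing. The sketch below folds a sequence of stream events into a single result. `collect_stream` is a hypothetical helper, and the sample events are hand-written to follow the event layout handled in `stream_converse()`, not captured from a real stream.

```python
def collect_stream(events):
    """Fold ConverseStream events into text, stop reason, and usage."""
    parts, stop_reason, usage = [], None, None
    for event in events:
        if "contentBlockDelta" in event:
            # Deltas may carry non-text payloads, so default to an empty string.
            parts.append(event["contentBlockDelta"]["delta"].get("text", ""))
        elif "messageStop" in event:
            stop_reason = event["messageStop"]["stopReason"]
        elif "metadata" in event:
            usage = event["metadata"].get("usage")
    return {"text": "".join(parts), "stop_reason": stop_reason, "usage": usage}


# Hand-written events that mimic the shape of a response stream.
sample_events = [
    {"messageStart": {"role": "assistant"}},
    {"contentBlockDelta": {"delta": {"text": "Hello, "}, "contentBlockIndex": 0}},
    {"contentBlockDelta": {"delta": {"text": "world."}, "contentBlockIndex": 0}},
    {"messageStop": {"stopReason": "end_turn"}},
    {"metadata": {"usage": {"inputTokens": 5, "outputTokens": 3, "totalTokens": 8}}},
]
collected = collect_stream(sample_events)
```

With a real response, the same function could be given `response["stream"]` from `bedrock.converse_stream()`.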


In [None]:
# Let's try streaming a longer summary

# Define the streaming request, which contains the user's conversation history.
# In this case, we are asking the model to provide a detailed summary of the text provided above.

streaming_request = [
    {
        "role": "user",  # The 'role' indicates that the user is sending the request to the model.
        "content": [
            {
                # The prompt wraps the text to summarize (text_to_summarize) in instructions that ask
                # for a comprehensive but clear summary. Comments are kept outside the string so they
                # are not sent to the model as part of the prompt.
                "text": f"""Please provide a detailed summary of the following text, explaining its key points and implications:

                {text_to_summarize}

                Make your summary comprehensive but clear.
                """
            }
        ]
    }
]


**Output:** *(No output. The `streaming_request` list is defined.)*

In this step, we modify the conversation to request a more **detailed summary** using the **Converse API**. The goal is to generate a **comprehensive yet clear summary** of the provided text, which will be streamed as it is processed.

#### Key Points:

1. **Streaming Request**:
   - We define a `streaming_request` list, which contains the **user's message** with a more detailed prompt.
   - The prompt requests the model to **summarize the text** while explaining the key points and implications, making the summary **comprehensive** but **clear**.

2. **Request Format**:
   - The **role** of the speaker is set to **"user"**.
   - The **content** is a list of content blocks; here it holds a single block whose **text** field contains the prompt sent to the model.

3. **Purpose**:
   - This request is designed to test the model’s ability to provide a **longer, more detailed summary** of the given text. It also shows how the model can handle more complex requests and deliver summaries that are **thorough** yet clear.
   
   This setup can be useful when needing deeper insights or a more thorough analysis of the content.
   

This method demonstrates how to interact with the **Converse API** to request more **detailed summaries** and process the information in a **streaming** manner, providing a better, more dynamic user experience.
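One detail worth watching when building prompts with triple-quoted strings is that anything inside the quotes, including text that looks like a `#` comment, is sent to the model. The sketch below builds the same kind of request from a template and keeps all comments outside the string. `build_summary_request` is a hypothetical helper, and the sample text is made up.

```python
import textwrap


def build_summary_request(text):
    """Build a single-user-message request asking for a detailed summary."""
    # dedent() strips the indentation that the source layout adds to the template.
    prompt = textwrap.dedent(
        """\
        Please provide a detailed summary of the following text, explaining its key points and implications:

        {text}

        Make your summary comprehensive but clear.
        """
    ).format(text=text)
    return [{"role": "user", "content": [{"text": prompt}]}]


sample_request = build_summary_request("Amazon Bedrock is a managed service.")
sample_prompt = sample_request[0]["content"][0]["text"]
```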


In [None]:
# Only run this when you're ready to see streaming output

# This line of code calls the 'stream_converse' function to initiate the streaming conversation with the specified model.

streamed_response = stream_converse(
    MODELS["Claude 3.7 Sonnet"],  # The model being used for this conversation, "Claude 3.7 Sonnet" in this case.
    streaming_request,  # The conversation request (including the detailed prompt) defined earlier.
    inference_config={"temperature": 0.4, "maxTokens": 1000}  # Inference parameters to control creativity and output length.
)


**Expected Output:** (The output appears in real-time chunks, followed by the stop reason, token usage, and latency once the stream completes.)

In this step, we execute the **streaming request** to generate a detailed summary using the **Converse API**. This part of the code will allow us to see **real-time streaming output** as the model generates the summary.

#### Key Points:

1. **Function Call**:
   - `stream_converse()` is called with the following parameters:
     - **`model_id`**: The model to use for the request, in this case, **"Claude 3.7 Sonnet"** from the `MODELS` dictionary.
     - **`messages`**: The streaming request (i.e., `streaming_request`), which contains the conversation history and user prompt.
     - **`inference_config`**: Parameters to control the response, such as **temperature** (0.4 for less creativity) and **maxTokens** (1000 to allow for a longer response).

2. **Streaming Output**:
   - The model will stream the output in real-time, and you’ll see each chunk of the response as it’s processed. The model will continue to generate content until it completes the summary or reaches the **maxTokens** limit.

3. **Purpose**:
   - This call demonstrates **real-time interaction** with the model, allowing you to observe the output as it's being generated.
   - Setting **temperature** to 0.4 keeps the output focused and **coherent**, while a **maxTokens** of 1000 leaves room for a **detailed** summary.


This is an effective way to handle **real-time data processing** and interact with the model in a **dynamic manner** to retrieve longer, more detailed summaries as they are generated.
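Because the inference parameters are passed as a plain dictionary, a typo or an out-of-range value only surfaces when the API rejects the request. The sketch below validates the values up front. `make_inference_config` is a hypothetical helper, and the ranges it enforces (0 to 1 for `temperature` and `topP`, at least 1 for `maxTokens`) are assumptions to confirm against the API reference for your model.

```python
def make_inference_config(temperature=None, max_tokens=None, top_p=None):
    """Build an inferenceConfig dict, including only the values that were set."""
    config = {}
    if temperature is not None:
        if not 0.0 <= temperature <= 1.0:  # assumed valid range
            raise ValueError("temperature must be between 0 and 1")
        config["temperature"] = temperature
    if top_p is not None:
        if not 0.0 <= top_p <= 1.0:  # assumed valid range
            raise ValueError("top_p must be between 0 and 1")
        config["topP"] = top_p
    if max_tokens is not None:
        if max_tokens < 1:  # assumed minimum
            raise ValueError("max_tokens must be at least 1")
        config["maxTokens"] = max_tokens
    return config


streaming_config = make_inference_config(temperature=0.4, max_tokens=1000)
```

The resulting dictionary matches the `inference_config` passed to `stream_converse()` above.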


## **Conclusion**

In this notebook, we explored how to effectively interact with Amazon **Bedrock** and **Converse API** to perform tasks like **text summarization** and **real-time streaming**. Here's a summary of what we've learned:

1. **Text Summarization with Invoke API**:
   - We began by demonstrating how to summarize text using the **Invoke Model API**, where we sent a single request to the model and extracted the response.
   - We then moved to the **Converse API**, which standardizes the request format across models and supports **multi-turn conversations** and **real-time streaming**.

2. **Cross-Region Requests**:
   - We called a model both with its standard model ID in the **default region** and through a **cross-region inference profile**, which lets Bedrock route requests across regions to handle traffic bursts and improve resilience.

3. **Streaming Responses**:
   - We integrated **real-time streaming** of model responses using the **ConverseStream API**, which enabled us to receive and display outputs as they are generated, making interactions more dynamic.

4. **Error Handling**:
   - Throughout the examples, we wrapped API calls in `try`/`except` blocks so that common issues like permission errors are caught and reported, which keeps the notebook running and makes troubleshooting easier.

5. **Benefits of Converse API**:
   - The **Converse API** proved to be a powerful tool for simplifying requests and responses by standardizing the format, supporting multi-turn conversations, and offering easy configuration options for controlling model behavior (like temperature, maxTokens, etc.).


The combination of **Amazon Bedrock**, the **Converse API**, and foundation models such as **Claude 3.7 Sonnet** offers a streamlined way to interact with advanced models. It supports text generation tasks such as summarization, multi-turn conversations, and streamed responses, which makes it a good fit for applications such as chatbots and content generation.
