# AWS Bedrock


AWS Bedrock is a fully managed service that provides access to foundation models (FMs) from leading AI companies. It allows you to build and scale generative AI applications using these models. In this notebook, we will explore how to use AWS Bedrock with the `boto3` library in Python.

## LLMs and Foundation Models
Foundation models are large-scale machine learning models that are trained on vast amounts of data and can be fine-tuned for specific tasks. AWS Bedrock provides access to various foundation models for text generation, image generation, and more. </br>

* An LLM is a stateless function. When "memory" or "context" is required, it is passed as part of the prompt.
* Fetching information from a database or another source and adding it to the prompt is called "retrieval-augmented generation" (RAG).
* The data source used to fetch information is called a knowledge base.
* LLM [pricing](https://aws.amazon.com/bedrock/pricing/) depends on the model and on the number of tokens in the prompt and the response (one token is roughly 4 characters of English text).
* The longer the prompt, the more expensive and the slower it is.
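
As a rough illustration of the pricing bullets above, the sketch below estimates a token count from the character length using the "about 4 characters per token" rule of thumb and turns it into a cost. The function names and the per-1,000-token prices are placeholders chosen for this example, not actual Bedrock prices; exact token counts come from the `usage` field of a model response.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; real tokenizers differ from model to model."""
    return max(1, round(len(text) / chars_per_token))


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Estimated cost, in the same currency as the (hypothetical) prices."""
    return ((input_tokens / 1000) * input_price_per_1k
            + (output_tokens / 1000) * output_price_per_1k)


prompt = "Summarize the meeting transcript in three bullet points."
input_tokens = estimate_tokens(prompt)

# Placeholder prices for illustration only; check the pricing page for real numbers.
cost = estimate_cost(input_tokens, output_tokens=200,
                     input_price_per_1k=0.001, output_price_per_1k=0.002)
print(f"~{input_tokens} input tokens, estimated cost: {cost:.6f}")
```

Output tokens are usually priced higher than input tokens, so limiting `maxTokens` is often as effective as shortening the prompt.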

### Using AWS Bedrock models and regions

Model availability in AWS Bedrock is region-specific. `us-east-1` (N. Virginia) typically offers the widest selection of models. </br>
To use a model from a specific provider, you need to request access to that model in the [Model Access](https://us-east-1.console.aws.amazon.com/bedrock/home?region=us-east-1#/modelaccess) section. </br>
In this tutorial we'll use the `us-east-1` region with several different models.

### Cross-region inference
Cross-region inference is an AWS feature that enables LLM requests to be processed in a different geographical region than the one where the request originated.
Instead of being limited to the models and compute resources available in a single region, cross-region inference can automatically route your inference requests to other available regions.
To use it, pass an `Inference Profile ID` instead of a `Model ID`. An inference profile identifies a model together with the set of regions that requests can be routed to (for example, IDs with the `us.` prefix route across US regions). </br>
For a list of available models and their Inference Profile IDs, please refer to the [AWS Cross-region inference Console](https://us-east-1.console.aws.amazon.com/bedrock/home?region=us-east-1#/inference-profiles).
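
Inference profiles can also be inspected programmatically. The sketch below assumes the `list_inference_profiles` operation of the `boto3` `bedrock` client and its `inferenceProfileSummaries` response field; the helper name and the sample response are made up for illustration, so the code runs without AWS credentials.

```python
def profile_ids_by_model(response: dict) -> dict:
    """Map each model ID to the inference profile IDs that can serve it."""
    mapping = {}
    for profile in response.get("inferenceProfileSummaries", []):
        for model in profile.get("models", []):
            # The model ID is the last path segment of the model ARN.
            model_id = model["modelArn"].split("/")[-1]
            mapping.setdefault(model_id, set()).add(profile["inferenceProfileId"])
    return mapping


# Illustrative response in the assumed shape (not real account data).
sample_response = {
    "inferenceProfileSummaries": [
        {
            "inferenceProfileId": "us.anthropic.claude-3-5-sonnet-20241022-v2:0",
            "models": [
                {"modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0"},
                {"modelArn": "arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0"},
            ],
        }
    ]
}

# With AWS credentials configured, the real response would come from:
#   import boto3
#   response = boto3.client("bedrock", region_name="us-east-1").list_inference_profiles()
print(profile_ids_by_model(sample_response))
```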



## Converse API
The `Converse` API (`converse` in `boto3`) is used to interact with the LLMs in AWS Bedrock. It allows you to send messages to the model and receive responses. Each message has a role, either `user` or `assistant`; system instructions are passed separately through the `system` parameter. </br>

The `user` role is used for the user's input and the `assistant` role is used for the model's responses. The `system` parameter carries instructions or context that apply to the whole conversation. </br>
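
Because the model is stateless, every request has to resend the earlier turns of the conversation. The sketch below builds such a history in the message shape used throughout this notebook; `add_turn` is a hypothetical helper name, and the actual `converse` call is left as a comment so the example runs without AWS access.

```python
def add_turn(history: list, role: str, text: str) -> list:
    """Return a new history with one more message in the Converse message shape."""
    return history + [{"role": role, "content": [{"text": text}]}]


history = []
history = add_turn(history, "user", "My name is Dana.")
history = add_turn(history, "assistant", "Nice to meet you, Dana.")
history = add_turn(history, "user", "What is my name?")

# The last request carries all earlier turns; without them the model
# would have no way to know the answer.
# With a Bedrock Runtime client:
#   client.converse(modelId=model_id, messages=history)
for message in history:
    print(message["role"], "->", message["content"][0]["text"])
```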

### Important Parameters
* `modelId`: The ID of the model you want to use. This can be either the model ID or the Inference Profile ID.
* `messages`: A list of messages to send to the model. Each message should include a `role` and `content`.
* `inferenceConfig`: A dictionary of inference configuration options, such as `maxTokens`, `stopSequences`, `temperature`, and `topP`.
* `temperature`: Controls the randomness of the model's output. A higher temperature (e.g., 0.8) makes the output more random, while a lower temperature (e.g., 0.2) makes it more deterministic.
* `topP`: Nucleus sampling. The model samples only from the smallest set of most likely tokens whose cumulative probability reaches the threshold. Lower values make the output more focused; a value of 1 considers the entire vocabulary.
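
To make `temperature` and `topP` more concrete, here is a toy version of the sampling step on a three-token vocabulary. It is a simplified sketch for intuition, not Bedrock's implementation; the function names and the logits are invented for the example, and a temperature greater than 0 is assumed.

```python
import math


def apply_temperature(logits, temperature):
    """Turn logits into probabilities after dividing them by the temperature (> 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative probability reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    # Renormalize so the kept tokens form a valid distribution to sample from.
    return {i: probs[i] / total for i in kept}


logits = [2.0, 1.0, 0.1]
sharp = apply_temperature(logits, 0.2)  # low temperature: mass concentrates on the top token
flat = apply_temperature(logits, 0.8)   # higher temperature: mass spreads out

print("T=0.2:", [round(p, 3) for p in sharp])
print("T=0.8:", [round(p, 3) for p in flat])
print("top_p=0.7 keeps token indices:", sorted(top_p_filter([0.5, 0.3, 0.2], 0.7)))
```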

In [17]:
import boto3
from enum import Enum

# Control-plane client, used below to list the models available in the region
boto3_bedrock = boto3.client('bedrock', region_name='us-east-1')

class LLMModel(Enum):
    """Enum for Bedrock models."""
    # Anthropic
    CLAUDE_3_5_V1 = 'anthropic.claude-3-5-sonnet-20240620-v1:0'
    CLAUDE_3_5_v2 = 'us.anthropic.claude-3-5-sonnet-20241022-v2:0' # Inference Profile ID
    # Amazon
    NOVA_LITE = 'amazon.nova-lite-v1:0'
    NOVA_PRO = 'amazon.nova-pro-v1:0'
    TITAN_LITE = 'amazon.titan-text-lite-v1'
    TITAN_EXPRESS = 'amazon.titan-text-express-v1'
    # Meta
    META_LLMA3_1B = 'meta.llama3-2-1b-instruct-v1:0'
    META_LLMA3_3B = 'meta.llama3-2-3b-instruct-v1:0'

# Print the IDs explicitly: a notebook cell only displays its last expression.
print([model['modelId'] for model in boto3_bedrock.list_foundation_models()['modelSummaries']])


LLMModel.TITAN_EXPRESS


## Zero Shot
The following example uses `zero-shot` prompting. </br>
No examples or additional context are provided to the model. The model is expected to understand the task and generate a response based on its training alone. </br>


> Change `model_id` to switch between different LLMs. </br>

In [19]:
import boto3
from botocore.exceptions import ClientError

# Create a Bedrock Runtime client in the AWS Region you want to use.
client = boto3.client("bedrock-runtime", region_name="us-east-1")



# Start a conversation with the user message.
user_message = """Meeting transcript:
Miguel: Hi Brant, I want to discuss the workstream  for our new product launch
Brant: Sure Miguel, is there anything in particular you want to discuss?
Miguel: Yes, I want to talk about how users enter into the product.
Brant: Ok, in that case let me add in Namita.
Namita: Hey everyone
Brant: Hi Namita, Miguel wants to discuss how users enter into the product.
Miguel: its too complicated and we should remove friction. for example, why do I need to fill out additional forms?  I also find it difficult to find where to access the product when I first land on the landing page.
Brant: I would also add that I think there are too many steps.
Namita: Ok, I can work on the landing page to make the product more discoverable but brant can you work on the additional forms?
Brant: Yes but I would need to work with James from another team as he needs to unblock the sign up workflow.  Miguel can you document any other concerns so that I can discuss with James only once?
Miguel: Sure.

From the meeting transcript above, Create a list of action items for each person.
"""
conversation = [
    {
        "role": "user",
        "content": [{"text": user_message}],
    }
]

model_id = LLMModel.NOVA_LITE.value  # Model ID; pick another LLMModel member to compare models
try:
    # Send the message to the model, using a basic inference configuration.
    #  https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-runtime/client/converse.html
    response = client.converse(
        modelId=model_id,
        messages=conversation,
        inferenceConfig={"maxTokens": 4096, "stopSequences": ["User:"], "temperature": 0, "topP": 1},
        additionalModelRequestFields={}
    )

    # Extract and print the response text.
    response_text = response["output"]["message"]["content"][0]["text"]
    print(response_text)

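except Exception as e:  # ClientError is a subclass of Exception, so one clause covers both
    print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
    raise  # Re-raise so the failure shows up with a full traceback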


Based on the meeting transcript, here is a list of action items for each person involved:

**Miguel:**
1. Document any other concerns or issues related to the user entry process, specifically focusing on the sign-up workflow and landing page. This documentation will be used by Brant to discuss with James from the other team.

**Brant:**
1. Collaborate with James from the other team to address the issues related to the additional forms and the sign-up workflow. Use Miguel's documented concerns to streamline the discussion and ensure all relevant points are covered.

**Namita:**
1. Work on improving the landing page to make the product more discoverable for users. This includes addressing the issue of users finding it difficult to access the product when they first land on the landing page.

By following these action items, the team can work together to improve the user entry process for the new product launch, ultimately reducing friction and enhancing the overall user experience.


### Understanding the response

The response returned by `converse` is a dictionary (parsed from JSON) that contains important information in addition to the generated text. </br>
When analyzing and comparing LLM responses, look for the following fields:

* `metrics.latencyMs`: The time taken to process the request and generate a response, in milliseconds.
* `usage`: The number of tokens in the prompt (`inputTokens`) and in the response (`outputTokens`). This is important for cost estimation.
* `ResponseMetadata.RequestId`: A unique identifier for the request. This can be useful for debugging and tracking purposes.
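
The fields listed above can be pulled into a compact summary for comparing models. The sketch below assumes the Converse response layout (`metrics.latencyMs`, `usage`, `stopReason`, `ResponseMetadata.RequestId`); `summarize_response` is a hypothetical helper and the sample values are invented.

```python
def summarize_response(response: dict) -> dict:
    """Collect latency, token usage, stop reason and request ID from a Converse response."""
    usage = response.get("usage", {})
    return {
        "latency_ms": response.get("metrics", {}).get("latencyMs"),
        "input_tokens": usage.get("inputTokens"),
        "output_tokens": usage.get("outputTokens"),
        "total_tokens": usage.get("totalTokens"),
        "stop_reason": response.get("stopReason"),
        "request_id": response.get("ResponseMetadata", {}).get("RequestId"),
    }


# Invented sample in the assumed response shape; replace with a real `response`.
sample_response = {
    "ResponseMetadata": {"RequestId": "example-request-id"},
    "stopReason": "end_turn",
    "metrics": {"latencyMs": 1234},
    "usage": {"inputTokens": 250, "outputTokens": 180, "totalTokens": 430},
}

print(summarize_response(sample_response))
```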


In [None]:
# Print the full response as indented JSON
import json

print(json.dumps(response, indent=4, sort_keys=True))


## One Shot
One-shot prompting provides the model with a single example of the desired output, helping it understand the task better. </br>

In this example, we will use one-shot prompting to create a meeting summary and action items. </br>
The instructions, together with one example of the expected output format, are sent as a message with the `assistant` role that precedes the transcript. </br>
The final user message contains the meeting transcript. </br>
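
The Converse API also accepts a dedicated `system` parameter for instructions, which leaves the `messages` list free for a real example exchange followed by the new input. The sketch below only builds the request arguments; `build_one_shot_request` and the sample texts are hypothetical, and this is an alternative layout rather than the one used in the cell that follows.

```python
def build_one_shot_request(model_id, system_text, example_input, example_output,
                           new_input, max_tokens=1000):
    """Build keyword arguments for `converse`: system prompt, one worked example, new input."""
    def message(role, text):
        return {"role": role, "content": [{"text": text}]}

    return {
        "modelId": model_id,
        "system": [{"text": system_text}],
        "messages": [
            message("user", example_input),        # the single "shot": example input
            message("assistant", example_output),  # ... and the desired output
            message("user", new_input),            # the input to actually process
        ],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": 0},
    }


request = build_one_shot_request(
    model_id="us.anthropic.claude-3-5-sonnet-20241022-v2:0",
    system_text="You are a meeting assistant that lists action items per person.",
    example_input="Meeting transcript:\nDana: I will send the report.\nLee: I will review it.",
    example_output="=== Dana ===\n    - Send the report\n=== Lee ===\n    - Review the report",
    new_input="Meeting transcript:\nShimon: Avi, please set up code coverage.\nAvi: Sure.",
)

# With a Bedrock Runtime client: response = client.converse(**request)
print([m["role"] for m in request["messages"]])
```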


In [25]:
import boto3
from botocore.exceptions import ClientError

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Instructions for the model, including an example of the expected output format.
system_message = """You are a meeting assistant that helps to summarize the meeting and create action items for each person.
You are given a meeting transcript and you need to create a list of action items for each person in the meeting.
The action items should be in the following format:

=== Miguel ===
    - action item 1
    - action item 2
=== Brant ===
    - action item 1
    - action item 2
=== Namita ===
    - action item 1
    - action item 2

The action items should be clear and concise.
The action items should be based on the meeting transcript and should not include any additional information.
At the end of the response, list the meeting participants and their roles in JSON format.
In addition, rank their involvement in the meeting from 1 to 5, where 5 is the most involved and 1 is the least involved.
[
    {"name": "Miguel", "role": "Product Manager", "involvement": 5},
    {"name": "Brant", "role": "Software Engineer", "involvement": 4},
    {"name": "Namita", "role": "UX Designer", "involvement": 3}
]

"""

user_message = """Meeting transcript:
Miguel: Hi Brant, I want to discuss the workstream  for our new product launch
Brant: Sure Miguel, is there anything in particular you want to discuss?
Miguel: Yes, I want to talk about how users enter into the product.
Brant: Ok, in that case let me add in Namita.
Namita: Hey everyone
Brant: Hi Namita, Miguel wants to discuss how users enter into the product.
Miguel: its too complicated and we should remove friction. for example, why do I need to fill out additional forms?  I also find it difficult to find where to access the product when I first land on the landing page.
Brant: I would also add that I think there are too many steps.
Namita: Ok, I can work on the landing page to make the product more discoverable but brant can you work on the additional forms?
Brant: Yes but I would need to work with James from another team as he needs to unblock the sign up workflow.  Miguel can you document any other concerns so that I can discuss with James only once?
Miguel: Sure.
"""

user_message2 = """Meeting transcript:
Attendees: Shimon (PM), Igor (CTO), Avi (Tech Lead)

Shimon: Right, let's discuss integrating code coverage into the main pipeline. What are the main benefits and drawbacks?

Igor: The major pro is improved code quality and maintainability. It gives us objective data on test effectiveness, reducing future bugs and technical debt. It’s a standard best practice.

Avi: Agreed, Igor. The con is potential friction – longer build times initially, and developers needing to potentially refactor or add more tests, impacting velocity slightly. We need to manage thresholds carefully.

Shimon: So, increased confidence in quality versus a possible short-term slowdown?

Igor: Exactly. A worthwhile investment for long-term stability.

Avi: We can mitigate the impact by starting with warnings, not blockers.
Barkoni: We stopped doing code coverage in our planet before we wrote the first line of code.
Shimon: Barkoni, can you elaborate on that?
Barkoni: What's the point? You will not understand.
"""

conversation = [
    {
        "role": "user",
        "content": [{"text": 'You are a meeting assistant that summarizes meetings.'}],
    },
    {
        "role": "assistant",
        "content": [{"text": system_message}],
    },
    {
        "role": "user",
        "content": [{"text": user_message2}],
    }
]

model_id = LLMModel.CLAUDE_3_5_v2.value # Inference Profile ID

try:
    # Send the message to the model, using a basic inference configuration.
    #  https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-runtime/client/converse.html
    response = client.converse(
        modelId=model_id,
        messages=conversation,
        inferenceConfig={"maxTokens": 1000,  "temperature": 0, "topP": 0.9},
        additionalModelRequestFields={}
    )

    # Extract and print the response text.
    response_text = response["output"]["message"]["content"][0]["text"]
    print(response_text)

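except Exception as e:  # ClientError is a subclass of Exception, so one clause covers both
    print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
    raise  # Re-raise so the failure shows up with a full traceback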


Meeting Summary and Action Items:

=== Shimon ===
- Create a proposal document outlining the code coverage implementation plan
- Schedule a follow-up meeting to discuss specific threshold values
- Communicate the planned changes to the development team

=== Igor ===
- Research and recommend code coverage tools that would best integrate with the current pipeline
- Prepare technical documentation for code coverage implementation
- Define initial threshold recommendations based on current codebase

=== Avi ===
- Identify high-priority areas for initial coverage focus
- Prepare developer guidelines for test coverage requirements

=== Barkoni ===
- Share detailed experience and learnings from previous implementation attempts
- Provide specific concerns about code coverage implementation in writing

Participants Information:
{
    {"name": "Shimon", "role": "Product Manager", "involvement": 4},
    {"name": "Igor", "role": "CTO", "involvement": 5},
    {"name": "Avi", "role": "Tech Lead", "i

## Few Shots
Few-shot prompting is a technique used to provide multiple examples to the model, helping it understand the task better. </br>
Each example is called a `shot` and it contains a prompt and a desired response. </br>
The `shots` can be based on real-world examples or synthetic examples that exemplify the task. </br>
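
When the shots are kept as data, the prompt can be assembled programmatically, which makes it easy to add, remove, or reorder examples. The sketch below is one possible way to do this; `build_few_shot_prompt` and the sample shots are hypothetical and much shorter than the prompt used in the next cell.

```python
def build_few_shot_prompt(instructions: str, shots: list, new_request: str) -> str:
    """Assemble instructions, (request, response) example pairs, and the new request."""
    parts = [instructions.strip(), "", "Examples:", ""]
    for request, response in shots:
        parts += [f"**Feature Request:** {request}", "", response.strip(), "", "---", ""]
    parts.append(f"Here is the actual Feature Request: {new_request}")
    return "\n".join(parts)


shots = [
    ("Add a built-in image editor.",
     "**Functional Requirement:**\nUsers can crop and resize images in the builder."),
    ("Add a library of pre-designed sections.",
     "**Functional Requirement:**\nUsers can insert ready-made sections with one click."),
]

prompt_text = build_few_shot_prompt(
    "You are a product manager for a website builder platform.",
    shots,
    "Allow users to embed social media feeds.",
)
print(prompt_text)
```

Joining strings this way, rather than calling `str.format` on one large template, also avoids errors when an example itself contains curly braces.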


In [26]:

import boto3
from botocore.exceptions import ClientError

client = boto3.client("bedrock-runtime", region_name="us-east-1")
model_id = LLMModel.CLAUDE_3_5_v2.value

# Few-shot prompt: instructions, three worked examples, and a placeholder for the new request.

prompt = """You are a product manager for a website builder platform (like Wix). For each new feature request, you need to outline the functional requirements, highlight important UI/UX considerations, and explain the business advantages. Respond in the following format for each feature request:

**Functional Requirement:**
[Clearly describe what the feature should do.]

**UI/UX Important Points:**
[List key considerations for the user interface and user experience of this feature.]

**Business Advantage:**
[Explain how this feature benefits the website builder platform.]

----
Examples:

**Feature Request:** Add a built-in image editor with basic cropping and resizing tools.

**Functional Requirement:**
Users should be able to crop and resize images directly within the website builder without needing to upload pre-edited files. Supported formats should include JPG, PNG, and GIF. The editor should offer standard aspect ratio presets and freeform resizing.

**UI/UX Important Points:**
- The image editor should be easily accessible within the image settings panel.
- Controls for cropping and resizing should be intuitive and visually clear.
- Users should see a real-time preview of their edits.
- An option to revert to the original image should be available.

**Business Advantage:**
- Improves user convenience and efficiency by eliminating the need for external image editing tools.
- Can attract users who need quick image adjustments without complex software.
- May reduce support requests related to image sizing issues.

---

**Feature Request:** Implement a library of pre-designed website sections (e.g., headers, footers, contact forms).

**Functional Requirement:**
Users should be able to browse and insert professionally designed website sections into their pages with a single click. The library should include various categories and styles, and users should be able to customize the content and styling of these sections.

**UI/UX Important Points:**
- The section library should be well-organized and easy to navigate, possibly with categories and search functionality.
- Previews of the sections should be clear and representative of the final design.
- The insertion process should be seamless and not disrupt the user's workflow.
- Users should have clear visual cues on how to customize the content of the inserted sections.

**Business Advantage:**
- Speeds up the website creation process for users, making it more appealing to beginners.
- Provides users with professionally designed elements, potentially leading to more visually appealing websites.
- Can encourage users to build more comprehensive websites by offering readily available components.

---

**Feature Request:** Allow users to embed social media feeds (e.g., Instagram, Twitter) directly onto their websites.

**Functional Requirement:**
Users should be able to connect their social media accounts and display their latest posts on their website. They should have options to customize the number of posts displayed and the layout of the feed.

**UI/UX Important Points:**
- The connection process to social media accounts should be secure and straightforward.
- Embedding options should be easily accessible within the website editor.
- Users should have control over the visual presentation of the feed to match their website's design.
- The embedded feed should be responsive and display correctly on different devices.

**Business Advantage:**
- Enhances user engagement by allowing them to showcase their social media presence.
- Can drive traffic between the user's website and their social media profiles.
- Adds dynamic content to websites, making them more lively and up-to-date.

---

Here is the actual Feature Request: {0}
"""

# feature_request = """Integrate with a third-party payment processor (e.g., Stripe, PayPal) to enable e-commerce functionality."""
feature_request = """I want a no-code feature for my website builder that allows users to create custom forms with drag-and-drop functionality. The forms should support various field types (text, checkbox, radio button, dropdown) and allow users to customize the layout and design. Additionally, users should be able to set up email notifications for form submissions and view submission data in a user-friendly dashboard."""

conversation = [
    {
        "role": "user",
        "content": [{"text": prompt.format(feature_request)}],
        # "content": [{"text": prompt.format("Implement A built-in image editor with basic cropping and resizing tools.")}],
    }
]

try:
    # Send the message to the model, using a basic inference configuration.
    #  https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-runtime/client/converse.html
    response = client.converse(
        modelId=model_id,
        messages=conversation,
        inferenceConfig={"maxTokens": 4096, "stopSequences": ["User:"], "temperature": 0, "topP": 1},
        additionalModelRequestFields={}
    )

    # Extract and print the response text.
    response_text = response["output"]["message"]["content"][0]["text"]
    print(response_text)

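except Exception as e:  # ClientError is a subclass of Exception, so one clause covers both
    print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
    raise  # Re-raise so the failure shows up with a full traceback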


I'll analyze this feature request for a no-code form builder:

**Functional Requirements:**
1. Drag-and-drop form builder interface with:
   - Standard field types (text, email, phone, checkbox, radio, dropdown, file upload)
   - Multi-column layout options
   - Field validation rules (required fields, email format, number ranges)
   - Custom field labels and placeholder text
   - Conditional logic (show/hide fields based on responses)

2. Form submission handling:
   - Secure data storage
   - Automated email notifications to form owner
   - Custom auto-response emails to form submitter
   - Export functionality for submission data (CSV, Excel)
   - File attachment storage and management

3. Form management dashboard:
   - List of all created forms
   - Submission statistics and analytics
   - Search and filter capabilities for submissions
   - Archive/delete options for old submissions

**UI/UX Important Points:**
- Clear visual cues for draggable elements and drop zones
- Real-time 

## Tool Use / Function Calling
LLMs can be enhanced with tools and APIs to provide additional functionality. </br>
For example, you can use APIs to fetch data from external sources, perform calculations, or interact with other services. </br>
In this example, we define a `fetch_weather` tool that calls a weather API, and let the model decide when to use it. </br>

**Tool Use Workflow:**

1.  **Define Tools:** Specify tools with names, descriptions, and argument schemas. Include a user prompt (e.g., "What's the weather like in New York today?").
2.  **LLM Decides:** The LLM determines if a tool is necessary and halts text generation if so.
3.  **JSON Call:** The LLM outputs a JSON object containing the selected tool and its parameter values.
4.  **Execute & Return:** The system extracts parameters, runs the tool, and returns the output to the LLM.
5.  **Generate Answer:** The LLM uses the tool output to create a final response.
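
Step 4 of the workflow (execute the tool and return its output) can be sketched without calling a model at all. The example below uses a hypothetical `get_time_zone` tool with canned data and a hand-written `toolUse` block in the shape the Converse API returns; it shows how the result is wrapped as a `toolResult` and placed in a `user` message.

```python
def get_time_zone(city: str) -> dict:
    """Hypothetical stand-in tool that answers from canned data."""
    zones = {"London": "Europe/London", "New York": "America/New_York"}
    if city not in zones:
        raise ValueError(f"Unknown city: {city}")
    return {"city": city, "time_zone": zones[city]}


# Registry that maps tool names (as declared in the tool config) to Python functions.
TOOLS = {"get_time_zone": get_time_zone}


def run_tool(tool_use: dict) -> dict:
    """Execute one toolUse block and wrap the outcome as a toolResult content block."""
    tool_use_id = tool_use["toolUseId"]
    try:
        tool_fn = TOOLS[tool_use["name"]]
        output = tool_fn(**tool_use["input"])
        return {"toolResult": {"toolUseId": tool_use_id, "content": [{"json": output}]}}
    except Exception as err:
        # Report failures back to the model instead of raising.
        return {"toolResult": {"toolUseId": tool_use_id,
                               "content": [{"text": str(err)}],
                               "status": "error"}}


# Hand-written example of what the model emits in step 3.
tool_use_block = {"toolUseId": "example-id-1", "name": "get_time_zone",
                  "input": {"city": "London"}}

# The tool result goes back to the model in a message with the `user` role.
tool_result_message = {"role": "user", "content": [run_tool(tool_use_block)]}
print(tool_result_message)
```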

![Tool Calling](../resources/images/tool-call-schema-removebg.webp)

### References

* [AWS Bedrock: Converse API tool use examples](https://docs.aws.amazon.com/bedrock/latest/userguide/tool-use-examples.html)
* [Guide to Tool Calling](https://www.analyticsvidhya.com/blog/2024/08/tool-calling-in-llms/)




In [27]:
import requests
import boto3
import json

client = boto3.client("bedrock-runtime", region_name="us-east-1")
model_id = "us.anthropic.claude-3-5-sonnet-20241022-v2:0" # Inference Profile ID

class LocationNotFoundException(Exception):
    """Raised when a location is not found."""
    pass

# Weather API: Fetch weather data from weatherapi.com, api key from user: mistriela@yopmail.com
def fetch_weather(city):
    print('Fetching weather data for city:', city)
    base_url = "https://api.weatherapi.com/v1/current.json"
    params = {
        "q": city,
        "key": "6e7b99ab0f454283ab9125132252104",
        "aqi": "no"  # Skip air-quality data; the temperature is read from temp_c (Celsius)
    }
    try:
        response = requests.get(base_url, params=params, timeout=10)
        response.raise_for_status()  # Raise an error for bad HTTP responses
        weather_data = response.json()
        return {
            "city": weather_data["location"]["name"],
            "temperature": weather_data["current"]["temp_c"],
            "description": weather_data["current"]["condition"]["text"]
        }
    except requests.exceptions.RequestException as e:
        raise LocationNotFoundException(f"Error fetching weather data: {e}")


def invoke_bedrock_llm_with_function_calling(prompt: str, model: LLMModel):
    """
    Invokes an LLM on AWS Bedrock with function calling to get weather information.

    Args:
        prompt (str): The user's query (e.g., "What's the weather in London?").
        model (LLMModel): The Bedrock model (or inference profile) to use.

    Returns:
        dict: The final assistant message returned by the model. Its `content`
              list holds the text of the answer.
    """

    tool_config = {
        "tools": [
            {
                "toolSpec": {
                    "name": "fetch_weather",
                    "description": "fetch weather information for a given city",
                    "inputSchema": {
                        "json": {
                            "type": "object",
                            "properties": {
                                "city": {
                                    "type": "string",
                                    "description": "The name of the city for which to fetch weather information."
                                }
                            },
                            "required": [
                                "city"
                            ]
                        }
                    }
                }
            }
        ]
    }


    # Note: there is no usage of `assistant` role as part of the prompt. It's the LLM responsibility to understand that a tool is required.
    input_messages = [{
        "role": "user",
        "content": [{"text": prompt}]
    }]

    # Send the initial message to the model. if there is a weather request, the model will stop and return a tool use request.
    # Several tools requests can be sent in a single response.
    response = client.converse(
        modelId=model.value,
        messages=input_messages,
        toolConfig=tool_config
    )
    output_message = response['output']['message']
    input_messages.append(output_message)
    stop_reason = response['stopReason']

    # Print the full LLM response, including the stop reason.
    print(f"LLM Message: {json.dumps(response, indent=4)}")
    # A stop reason of 'tool_use' means the content contains one or more "toolUse" blocks.
    if stop_reason == 'tool_use':
        # Tool use requested. Run every requested tool and collect the results:
        # the model expects a result for each toolUse block of its previous turn,
        # returned together in a single user message.
        tool_results = []
        for tool_request in output_message['content']:
            if 'toolUse' not in tool_request:
                continue
            tool = tool_request['toolUse']
            print(f"Requesting tool {tool['name']} Request: {tool['toolUseId']} ")

            if tool['name'] == 'fetch_weather':
                try:
                    weather_data = fetch_weather(tool['input']['city'])
                    tool_result = {
                        "toolUseId": tool['toolUseId'],
                        "content": [{"json": weather_data}]
                    }
                except LocationNotFoundException as err:
                    tool_result = {
                        "toolUseId": tool['toolUseId'],
                        "content": [{"text": err.args[0]}],
                        "status": 'error'
                    }
                tool_results.append({"toolResult": tool_result})

        # Send all tool results to the model in one user message.
        input_messages.append({"role": "user", "content": tool_results})
        response = client.converse(
            modelId=model.value,
            messages=input_messages,
            toolConfig=tool_config
        )
        output_message = response['output']['message']

    # print the final response from the model.
    # for content in output_message['content']:
    #     print(json.dumps(content, indent=4))

    return output_message



In [28]:
user_query = "What is the weather like in London?"
llm_response = invoke_bedrock_llm_with_function_calling(user_query, LLMModel.CLAUDE_3_5_v2)
print(llm_response['content'][0]['text'])

LLM Message: {
    "ResponseMetadata": {
        "RequestId": "3c64c8c5-63da-4ae0-938a-f2df8c23c56a",
        "HTTPStatusCode": 200,
        "HTTPHeaders": {
            "date": "Thu, 15 May 2025 09:39:00 GMT",
            "content-type": "application/json",
            "content-length": "367",
            "connection": "keep-alive",
            "x-amzn-requestid": "3c64c8c5-63da-4ae0-938a-f2df8c23c56a"
        },
        "RetryAttempts": 0
    },
    "output": {
        "message": {
            "role": "assistant",
            "content": [
                {
                    "text": "I'll help you check the weather in London using the fetch_weather function."
                },
                {
                    "toolUse": {
                        "toolUseId": "tooluse_esZ_6Mf5Skeigm1Khu5Jgw",
                        "name": "fetch_weather",
                        "input": {
                            "city": "London"
                        }
                    }
           

## Tool Chaining
Tool chaining is a technique used to combine multiple tools or functions in a sequence to achieve a more complex task. </br>
In this example, we will use tool chaining **recursively** to fetch the weather data and then determine the dress code based on the temperature. </br>

### Execution Flow
We will ask the model to provide a dress code for the hotter city. </br>
The model will need to understand that it needs to call the `fetch_weather` tool to get the weather data for **both** cities. </br>
Then, it will call the `get_dress_code` tool to determine the dress code based on the temperature. </br>
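
The flow above amounts to a loop: call the model, run whatever tools it requests, send the results back, and repeat until the model stops asking for tools. The sketch below is an iterative variant of that idea (the notebook code uses recursion). It replaces the Bedrock call with a scripted `fake_converse` function and canned tools, all of which are hypothetical, so the control flow can be followed without AWS access.

```python
def run_tool_loop(converse_fn, messages, tool_fns, max_rounds=5):
    """Call the model repeatedly, executing requested tools, until it returns a final answer."""
    for _ in range(max_rounds):
        response = converse_fn(messages)
        output_message = response["output"]["message"]
        messages.append(output_message)
        if response["stopReason"] != "tool_use":
            return output_message
        # Run every requested tool and return all results in one user message.
        results = []
        for block in output_message["content"]:
            if "toolUse" in block:
                tool = block["toolUse"]
                output = tool_fns[tool["name"]](**tool["input"])
                results.append({"toolResult": {"toolUseId": tool["toolUseId"],
                                               "content": [{"json": output}]}})
        messages.append({"role": "user", "content": results})
    raise RuntimeError("Tool loop did not finish within max_rounds")


# --- Stand-ins for the model and the tools (canned data, for illustration only) ---
def fake_weather(city):
    return {"city": city, "temperature": {"London": 12.0, "Cairo": 31.0}[city]}


def fake_dress_code(temperature):
    return {"dress_code": "summer clothes" if temperature >= 30 else "long sleeves"}


def assistant(stop_reason, content):
    return {"stopReason": stop_reason,
            "output": {"message": {"role": "assistant", "content": content}}}


scripted = iter([
    assistant("tool_use", [
        {"toolUse": {"toolUseId": "t1", "name": "fetch_weather", "input": {"city": "London"}}},
        {"toolUse": {"toolUseId": "t2", "name": "fetch_weather", "input": {"city": "Cairo"}}},
    ]),
    assistant("tool_use", [
        {"toolUse": {"toolUseId": "t3", "name": "get_dress_code", "input": {"temperature": 31.0}}},
    ]),
    assistant("end_turn", [{"text": "Cairo is hotter; wear summer clothes."}]),
])


def fake_converse(messages):
    return next(scripted)


messages = [{"role": "user", "content": [{"text": "Dress code for the hotter of London and Cairo?"}]}]
final = run_tool_loop(fake_converse, messages,
                      {"fetch_weather": fake_weather, "get_dress_code": fake_dress_code})
print(final["content"][0]["text"])
```

The `max_rounds` limit is a safety net against a model that keeps requesting tools indefinitely.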

### Important Lookouts
* Experiment with different models and check whether `lite` models can be used instead of `pro` models.
* Inspect the input and output of the LLM to see how the conversation is built up step by step.

In [37]:
import requests
import boto3
import json

client = boto3.client("bedrock-runtime", region_name="us-east-1")


class LocationNotFoundException(Exception):
    """Raised when a location is not found."""
    pass

# Weather API: Fetch weather data from weatherapi.com, api key from user: mistriela@yopmail.com
def fetch_weather(city):
    print('Fetching weather data for city:', city)
    base_url = "https://api.weatherapi.com/v1/current.json"
    params = {
        "q": city,
        "key": "6e7b99ab0f454283ab9125132252104",
        "aqi": "no"  # Skip air-quality data; the temperature is read from temp_c (Celsius)
    }
    try:
        response = requests.get(base_url, params=params, timeout=10)
        response.raise_for_status()  # Raise an error for bad HTTP responses
        weather_data = response.json()
        return {
            "city": weather_data["location"]["name"],
            "temperature": weather_data["current"]["temp_c"],
            "description": weather_data["current"]["condition"]["text"]
        }
    except requests.exceptions.RequestException as e:
        raise LocationNotFoundException(f"Error fetching weather data: {e}")

def get_dress_code(temperature: float) -> str:
    """
    Determine the dress code based on the temperature.
    Temperature ranges: -20 to 50 degrees Celsius.
    Args:
        temperature (float): The temperature in Celsius.

    Returns:
        str: The recommended dress code.
    """
    print('Getting dress code for temperature:', temperature)
    if temperature < -10:
        return "Wear a heavy winter coat, gloves, and a warm hat."
    elif -10 <= temperature < 0:
        return "Wear a warm coat and gloves."
    elif 0 <= temperature < 10:
        return "Wear a light jacket."
    elif 10 <= temperature < 20:
        return "Wear a long-sleeve shirt."
    elif 20 <= temperature < 30:
        return "Wear a short-sleeve shirt."
    else:
        return "Wear summer clothes."

tool_config = {
    "tools": [
        {
            "toolSpec": {
                "name": "fetch_weather",
                "description": "fetch weather information for a given city",
                "inputSchema": {
                    "json": {
                        "type": "object",
                        "properties": {
                            "city": {
                                "type": "string",
                                "description": "The name of the city for which to fetch weather information."
                            }
                        },
                        "required": ["city"]
                    }
                }
            }
        },
        {
            "toolSpec": {
                "name": "get_dress_code",
                "description": "Get the dress code based on the temperature.",
                "inputSchema": {
                    "json": {
                        "type": "object",
                        "properties": {
                            "temperature": {
                                "type": "number",
                                "description": "The temperature in Celsius."
                            }
                        },
                        "required": ["temperature"]
                    }
                }
            }
        }
    ]
}


def invoke_bedrock_llm_with_multiple_function_calling(prompt: str, model: LLMModel, input_messages=None):
    if input_messages is None:
        input_messages = [{"role": "user", "content": [{"text": prompt}]}]

    # print('input messages:', json.dumps(input_messages, indent=4))

    response = client.converse(
        modelId=model.value,
        messages=input_messages,
        toolConfig=tool_config
    )
    output_message = response['output']['message']
    # print('output_message:', json.dumps(response['output'], indent=4))
    input_messages.append(output_message)
    stop_reason = response['stopReason']

    if stop_reason == 'tool_use':
        response_contents = output_message['content']
        tool_responses = []
        for response_content in response_contents:
            if 'toolUse' in response_content: ## if there is a tool use request in the response content
                tool_request = response_content['toolUse']
                print(f"Requesting tool '{tool_request['name']}' ID[{tool_request['toolUseId']}] Input: {tool_request['input']}")

                tool_result = {}
                if tool_request['name'] == 'fetch_weather':
                    try:
                        tool_result = fetch_weather(tool_request['input']['city'])

                    except LocationNotFoundException as err:
                        # Exceptions have no `.message` attribute by default; use str(err)
                        tool_result = {"error": str(err)}

                elif tool_request['name'] == 'get_dress_code':
                    tool_result = {"text": get_dress_code(tool_request['input']['temperature'])}


                tool_result_response = {
                    "toolResult": {
                        "toolUseId": tool_request['toolUseId'],
                        "content": [{"json": tool_result}]
                    }
                }

                tool_responses.append(tool_result_response)

        tool_result_message = {
            "role": "user",
            "content": tool_responses
        }

        input_messages.append(tool_result_message)

        # Recursively call the function to process the next step
        return invoke_bedrock_llm_with_multiple_function_calling(prompt, model, input_messages)

    # If no more tools are required, return the final response
    return output_message


user_query = """I want to travel to a cold location.
I'm thinking Tel-Aviv or Moscow. What is the colder city? and what should I wear there?
Provide concise textual answer.
"""

llm_response = invoke_bedrock_llm_with_multiple_function_calling(user_query, LLMModel.CLAUDE_3_5_v2)
print("\n")
print(llm_response['content'][0]['text'])


Requesting tool 'fetch_weather' ID[tooluse_79dfHnB6Qw-GlebEkvhnUw] Input: {'city': 'Tel Aviv'}
Fetching weather data for city: Tel Aviv
Requesting tool 'fetch_weather' ID[tooluse_OVUCz2GiR2GzEH1F6VPCyA] Input: {'city': 'Moscow'}
Fetching weather data for city: Moscow
Requesting tool 'get_dress_code' ID[tooluse_jApkOgPsQJWVrOd6ga2ogg] Input: {'temperature': 15.3}
Getting dress code for temperature: 15.3


Moscow is the colder city at 15.3°C, while Tel Aviv is warmer at 24°C. For Moscow's current temperature, you should wear a long-sleeve shirt.
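
The `if`/`elif` dispatch in the loop above grows with every new tool. A minimal sketch of an alternative maps each tool name to a callable and wraps the outcome in the same `toolResult` shape used above. The names `TOOL_HANDLERS` and `run_tool` and the stub handlers are hypothetical, introduced only for this illustration; they are not part of the notebook or of boto3.

```python
from typing import Any, Callable, Dict

# Hypothetical stub handlers standing in for fetch_weather / get_dress_code.
def fake_fetch_weather(city: str) -> Dict[str, Any]:
    return {"city": city, "temperature": 15.3, "description": "Cloudy"}

def fake_get_dress_code(temperature: float) -> Dict[str, Any]:
    advice = "Wear a long-sleeve shirt." if temperature < 20 else "Wear a short-sleeve shirt."
    return {"text": advice}

# Registry: tool name (as declared in toolSpec) -> handler that takes the
# tool input as keyword arguments.
TOOL_HANDLERS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "fetch_weather": fake_fetch_weather,
    "get_dress_code": fake_get_dress_code,
}

def run_tool(tool_request: Dict[str, Any]) -> Dict[str, Any]:
    """Run one toolUse request and wrap the outcome in a toolResult content block."""
    handler = TOOL_HANDLERS.get(tool_request["name"])
    try:
        if handler is None:
            raise ValueError(f"Unknown tool: {tool_request['name']}")
        result = handler(**tool_request["input"])
    except Exception as err:
        # Report the failure back to the model instead of crashing the loop.
        result = {"error": str(err)}
    return {
        "toolResult": {
            "toolUseId": tool_request["toolUseId"],
            "content": [{"json": result}],
        }
    }

example = run_tool(
    {"toolUseId": "tooluse_example", "name": "get_dress_code", "input": {"temperature": 15.3}}
)
print(example)
```

With a registry like this, adding a tool means adding one `toolSpec` entry and one dictionary entry, while the loop itself stays unchanged.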


## Structured Output
Structured output is a technique used to format the output of the LLM in a specific way. </br>
This can be useful for various applications, such as generating JSON or XML responses. </br>
It also helps to ensure that the output is consistent and compliant with the expected format. </br>

In [None]:
import boto3
from botocore.exceptions import ClientError

# Create a Bedrock Runtime client in the AWS Region you want to use.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

##> Show the differences between
# model_id = "anthropic.claude-3-5-sonnet-20240620-v1:0" # Model ID
# model_id = "us.anthropic.claude-3-5-sonnet-20241022-v2:0" # Inference Profile ID
# model_id = "amazon.titan-text-express-v1"
model_id = "amazon.nova-lite-v1:0"

system_message = """You are a car expert with ability to provide technical details about a specific car you are asked for.
The response must be in a JSON format."""

# Start a conversation with the user message.
# user_message = """Hi i would like to know the technical details about the car Tesla Model Y."""
user_message = """Hi i would like to know the technical details about the car Toyota Prius 2010."""

assistant_message = """```json
{
    "car": {
        "name": "Car name",
        "type": "Type of the car. E.g: Electric SUV",
        "range": "The range of the car in miles",
        "top_speed": "Top speed of the car in mph",
        "acceleration": "0-60 mph time in seconds",
        "battery_capacity": "For electric cars, mention the battery capacity in kWh",
        "features": Array of features of the car. E.g: "Autopilot","All-wheel drive", "Premium interior", "Panoramic glass roof", "Advanced safety features",  "Over-the-air software updates", "ABS brakes"

    }
}
```"""

conversation = [
    {
        "role": "user",
        "content": [{"text": system_message}],
    },
    {
        "role": "assistant",
        "content": [{"text": assistant_message}],
    },
    {
        "role": "user",
        "content": [{"text": user_message}],
    },
]

try:
    # Send the message to the model, using a basic inference configuration.
    #  https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-runtime/client/converse.html
    response = client.converse(
        modelId=model_id,
        messages=conversation,
        inferenceConfig={"maxTokens": 4096, "stopSequences": ["User:"], "temperature": 0, "topP": 1},
        additionalModelRequestFields={}
    )

    # Extract and print the response text.
    response_text = response["output"]["message"]["content"][0]["text"]
    print(response_text)

except (ClientError, Exception) as e:
    print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
    exit(1)
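
The reply is still plain text, and with the template above it is likely to arrive wrapped in a Markdown `json` fence, so it needs to be parsed before it can be used as data. A minimal sketch follows, assuming a hypothetical helper `extract_json` and a made-up sample reply (not actual model output):

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way so the example embeds cleanly in Markdown

def extract_json(response_text: str) -> dict:
    """Parse a JSON object from a reply that may or may not be wrapped in a json code fence."""
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, response_text, re.DOTALL)
    payload = match.group(1) if match else response_text
    # json.loads raises json.JSONDecodeError if the reply is not valid JSON.
    return json.loads(payload)

# Made-up reply, for illustration only.
sample_reply = FENCE + 'json\n{"car": {"name": "Example Car", "features": ["ABS brakes"]}}\n' + FENCE
car = extract_json(sample_reply)
print(car["car"]["name"])
```

Catching `json.JSONDecodeError` around the call is a reasonable place to retry the request or report that the model did not follow the requested format.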


## Prompt Caching

Prompt caching is a powerful feature in Amazon Bedrock that significantly reduces response latency for workloads with repetitive contexts.

### What is Prompt Caching?

Prompt caching allows you to store portions of your conversation context, enabling models to:
- Reuse cached context instead of reprocessing inputs
- Reduce response Time-To-First-Token (TTFT) for subsequent queries

### When to Use Prompt Caching

Prompt caching delivers maximum benefits for:
- **Chat with Document**: By caching the document as input context on the first request, each user query becomes more efficient, perhaps enabling simpler architectures that avoid heavier solutions like vector databases.
- **Coding assistants**: Reusing long code files in prompts enables near real-time inline suggestions, eliminating much of the time spent reprocessing code files.
- **Agentic workflows**: Longer system prompts can be used to refine agent behavior without degrading the end-user experience. By caching the system prompts and complex tool definitions, the time to process each step in the agentic flow can be reduced.
- **Few-Shot Learning**: Including numerous high-quality examples and complex instructions, such as for customer service or technical troubleshooting, can benefit from prompt caching.

### Benefits of Prompt Caching

- **Faster Response Times**: Avoid reprocessing the same context repeatedly
- **Improved User Experience**: Reduced TTFT to create more natural conversations
- **Cost Efficiency**: Potentially lower token usage by avoiding redundant processing

### Model Support
Prompt caching is model-specific. Review the [supported models](https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html#prompt-caching-models), along with the minimum number of tokens per cache checkpoint and the maximum number of cache checkpoints per request.

### How it Works

Prompt caching works by storing the context of the conversation in a cache. When a new request is made, the model checks if the context is already in the cache. If it is, the model uses the cached context instead of reprocessing the input.

As we saw earlier, a prompt is an array of messages. A message can be marked as a cache checkpoint. Once it is marked, the **entire section of the prompt preceding the checkpoint becomes the cached prompt prefix**.

When the model then processes a new request that includes the whole prompt, it skips the messages that belong to the cached prompt prefix and only processes the messages that follow the checkpoint.


![Prompt Caching](../resources/images/prompt-caching-aws-bedrock.png)

The following diagram illustrates how cache hits work. A, B, C, D represent distinct portions of the prompt. A, B and C are marked as the prompt prefix. Cache hits occur when subsequent requests contain the same A, B, C prompt prefix.

![Prompt Caching](../resources/images/cache-prompt-prefix.png)
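
To make the prefix rule concrete, here is a toy model of it in plain Python. It is only an illustration under simplifying assumptions: the class `ToyPrefixCache`, the word-count "tokens", and the hashing of the prefix are all invented for this sketch and say nothing about how Bedrock implements caching internally.

```python
import hashlib
from typing import Dict, List

class ToyPrefixCache:
    """Toy illustration: everything before the checkpoint is cached as one prefix."""

    def __init__(self) -> None:
        self._seen_prefixes = set()

    def process(self, blocks: List[str], checkpoint: int) -> Dict[str, int]:
        """blocks[:checkpoint] is the prefix; blocks[checkpoint:] is always processed."""
        prefix, rest = blocks[:checkpoint], blocks[checkpoint:]
        key = hashlib.sha256("\x00".join(prefix).encode()).hexdigest()
        prefix_tokens = sum(len(b.split()) for b in prefix)  # crude stand-in for tokens
        rest_tokens = sum(len(b.split()) for b in rest)
        hit = key in self._seen_prefixes
        self._seen_prefixes.add(key)
        return {
            "inputTokens": rest_tokens,
            "cacheReadInputTokens": prefix_tokens if hit else 0,
            "cacheWriteInputTokens": 0 if hit else prefix_tokens,
        }

cache = ToyPrefixCache()
# Same A, B, C prefix with a different question: the second call is a cache hit.
first = cache.process(["A a a", "B b", "C c c c", "D question one"], checkpoint=3)
second = cache.process(["A a a", "B b", "C c c c", "E question two"], checkpoint=3)
# Changing any block inside the prefix produces a miss and a fresh cache write.
changed = cache.process(["A a a", "B changed", "C c c c", "E question two"], checkpoint=3)
print(first, second, changed, sep="\n")
```

The point of the toy is that a hit depends on the whole prefix being identical: editing B invalidates the cached A, B, C prefix even though A and C are unchanged.
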
### References
* [Amazon Bedrock: Prompt Caching](https://aws.amazon.com/blogs/machine-learning/effectively-use-prompt-caching-on-amazon-bedrock/)



<div class="alert alert-block alert-info">
<b>Important:</b> Execute the following cell to make sure you have the latest version of the SDK (boto3 min version: boto3-1.37.24.)
</div>

In [None]:
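# Before we start, let's make sure we have the latest version of the SDK (boto3 min version: boto3-1.37.24.)
import boto3
boto3_version = boto3.__version__
print(f"boto3 version: {boto3_version}")
# Compare versions numerically; a plain string comparison mis-orders e.g. "1.9.0" and "1.37.24".
if tuple(int(part) for part in boto3_version.split(".")[:3]) < (1, 37, 24):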
    print("Updating your boto3 version to latest version.")
    !pip install --upgrade boto3
    print("boto3 was updated to the latest version.")
    print("Please restart the kernel and re-run the notebook.")
    exit(1)
else:
    print("boto3 version is up to date.")


### Prompt Caching Example
In the following example we will create a **Terms of Use Analyzer**  that can answer questions about the terms of use of a specific website. </br>
Since the terms of use are long, we will use the prompt caching feature to cache the terms of use and only process the user question. </br>


In [15]:
import json

def chat_with_document(document, user_query, model_id):

    instructions_terms_of_use_analyzer = """
    I will provide you with the Terms of Use document of a specific website, followed by a question about its content. Your task is to analyze the Terms of Use, extract relevant information, and provide a comprehensive answer to the question. Please follow these detailed instructions:

    1. Identifying Relevant Clauses (Quotes):
       - Carefully read through the entire Terms of Use document.
       - Identify sections or clauses of the text that are directly relevant to answering the question.
       - Pay special attention to definitions, user obligations, disclaimers of warranties, limitations of liability, governing law, dispute resolution, and any clauses related to the specific query.
       - Select quotes that provide key information, context, or support for the answer. These are typically specific clauses or parts of clauses.
       - Quotes should be concise and to the point, ideally no more than 2-4 sentences each, capturing the core of the relevant legal statement.
       - Choose a diverse range of quotes if multiple clauses address different aspects of the question.
       - Aim to select between 2 to 5 quotes, depending on the complexity of the question and the structure of the Terms of Use.

    2. Presenting the Relevant Clauses:
       - List the selected quotes under the heading 'Relevant clauses:'
       - Number each quote sequentially, starting from [1].
       - Present each quote exactly as it appears in the original text, enclosed in quotation marks.
       - If no relevant clauses can be found to directly answer the question, write 'No directly relevant clauses found' instead.
       - Example format:
         Relevant clauses:
         [1] "Users agree not to use the service for any illegal or unauthorized purpose."
         [2] "The Company reserves the right to terminate your access to the Service at any time, without notice, for any reason whatsoever."

    3. Formulating the Answer:
       - Begin your answer with the heading 'Answer:' on a new line after the quotes.
       - Provide a clear, concise, and accurate answer to the question based on the information in the Terms of Use.
       - Ensure your answer is comprehensive and addresses all aspects of the question.
       - Use information from the quoted clauses to support your answer, but rephrase and explain the implications rather than repeating them verbatim.
       - Where possible, explain any legal jargon or complex phrasing in simpler terms, while remaining faithful to the original meaning of the clause.
       - Maintain a logical flow and structure in your response.

    4. Referencing Clauses in the Answer:
       - Do not explicitly mention or introduce quotes in your answer (e.g., avoid phrases like 'According to clause [1]').
       - Instead, add the bracketed number of the relevant quote at the end of each sentence or point that uses information from that clause.
       - If a sentence or point is supported by multiple clauses, include all relevant quote numbers.
       - Example: 'The website prohibits users from engaging in unlawful activities. [1] Furthermore, the platform can suspend user accounts without prior notification if terms are violated. [2]'

    5. Handling Ambiguity or Lack of Specific Information:
       - If the Terms of Use do not contain enough specific information to fully answer the question, or if a clause is ambiguous, clearly state this in your answer.
       - Provide any partial information that is available, and explain what aspects are not explicitly covered or remain unclear.
       - If there are multiple possible interpretations of a clause relevant to the question, explain this and provide answers based on plausible interpretations if possible, noting the ambiguity.
       - State clearly that the analysis is based *only* on the provided text and cannot infer unstated terms.

    6. Maintaining Objectivity and Disclaimer:
       - Stick to the facts and statements presented in the Terms of Use document. Do not include personal opinions, interpretations beyond the text, or external information not found in the document.
       - Your analysis is for informational purposes only and should NOT be considered legal advice. Always recommend consulting with a legal professional for specific advice regarding Terms of Use.
       - If the document presents one-sided or particularly restrictive terms, you can note this objectively in your answer without endorsing or refuting the legal validity or fairness of such terms.

    7. Formatting and Style:
       - Use clear paragraph breaks to separate different points or aspects of your answer.
       - Employ bullet points or numbered lists if it helps to organize information about specific rights, obligations, or restrictions more clearly.
       - Ensure proper grammar, punctuation, and spelling throughout your response.
       - Maintain a professional, neutral, and informative tone throughout your answer.

    8. Length and Depth:
       - Provide an answer that is sufficiently detailed to address the question comprehensively based on the Terms of Use.
       - However, avoid unnecessary verbosity. Aim for clarity and conciseness, focusing on the aspects most relevant to the user's query.
       - The length of your answer should be proportional to the complexity of the question and the amount of relevant information within the Terms of Use.

    9. Dealing with Complex or Multi-part Questions about Terms:
       - For questions with multiple parts (e.g., 'What are the user's rights regarding data privacy and what is the process for account termination?'), address each part separately and clearly.
       - Use subheadings or numbered points to break down your answer if necessary, ensuring each component of the query is addressed.

    10. Concluding the Answer:
        - If appropriate, provide a brief summary of the key findings from the Terms of Use related to the question.
        - Reiterate that the information is based solely on the provided document and is not legal advice. If the question implies seeking guidance (e.g., 'Should I be concerned about X clause?'), frame the answer by explaining what the clause means according to the text, rather than advising on concern.

    Remember, your goal is to provide a clear, accurate, and well-supported analysis based solely on the content of the given Terms of Use document. Adhere to these instructions carefully to ensure a high-quality response that effectively addresses the user's query about the website's terms.
    """


    document_content =  f"Here is the document:  <document> {document} </document>"

    messages_body = [
        {
            'role': 'user',
            'content': [
                {
                    'text': instructions_terms_of_use_analyzer
                },
                {
                    'text': document_content
                },
                {
                    "cachePoint": {
                        "type": "default"
                    }
                },
                {
                    'text': user_query
                },
            ]
        },
    ]

    inference_config={
        'maxTokens': 10000,
        'temperature': 0,
        'topP': 1
    }
    client = boto3.client("bedrock-runtime", region_name="us-east-1")


    response = client.converse(
        messages=messages_body,
        modelId=model_id,
        inferenceConfig=inference_config
    )

    output_message = response["output"]["message"]
    response_text = output_message["content"][0]["text"]

    print("Response text:")
    print(response_text)

    print("Usage:")
    print(json.dumps(response["usage"], indent=2))


Now let's chat with the document and pay attention to:
* `cacheWriteInputTokens` - the number of input tokens written to the cache
* `cacheReadInputTokens` - the number of input tokens read from the cache

In the first request, the model will process the entire document and write it to the cache. </br>
In the second request, the model will only process the user question and read the rest from the cache. </br>

In [18]:
import requests

terms_of_use = requests.get('https://www.pexels.com/terms-of-service/').text
model_id="amazon.nova-lite-v1:0"


questions = [
    'Is my information used by 3rd parties?',
    'Is the service GDPR compliant?',
]


In [19]:
chat_with_document(
    document=terms_of_use,
    user_query=questions[0],
    model_id=model_id
)

Response text:
Relevant clauses:
[1] "No, your information is not shared with third parties except as described in this document."

Answer:
Based on the provided Terms of Use document, your information is not used by third parties unless explicitly stated in the document. [1] The document clearly mentions that user information is not shared with third parties except as described within the terms themselves. If you have concerns about how your data might be used or shared, it would be best to review the specific sections of the Terms of Use that address data sharing and privacy practices. Remember, this analysis is based solely on the provided text and does not constitute legal advice. For specific concerns or detailed understanding, consulting with a legal professional is recommended.
Usage:
{
  "inputTokens": 16,
  "outputTokens": 144,
  "totalTokens": 5753,
  "cacheReadInputTokens": 0,
  "cacheWriteInputTokens": 5593
}
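
A small helper can make the `usage` block easier to compare across requests. This is a sketch: the function name `summarize_cache_usage` is hypothetical, and it assumes, as the numbers above suggest (16 + 144 + 5593 = 5753), that `inputTokens` counts only the prompt tokens that were neither read from nor written to the cache.

```python
def summarize_cache_usage(usage: dict) -> dict:
    """Summarize how much of the prompt was served from the cache."""
    read = usage.get("cacheReadInputTokens", 0)
    written = usage.get("cacheWriteInputTokens", 0)
    uncached = usage.get("inputTokens", 0)
    total_prompt = read + written + uncached
    return {
        "prompt_tokens": total_prompt,
        "cache_hit": read > 0,
        "cached_fraction": round(read / total_prompt, 3) if total_prompt else 0.0,
    }

# Usage values copied from the first request's output above.
first_request_usage = {
    "inputTokens": 16,
    "outputTokens": 144,
    "totalTokens": 5753,
    "cacheReadInputTokens": 0,
    "cacheWriteInputTokens": 5593,
}
summary = summarize_cache_usage(first_request_usage)
print(summary)
```

On the first request `cache_hit` is `False` because the prefix is being written; on a repeat request with the same prefix, `cacheReadInputTokens` should be non-zero and `cached_fraction` close to 1.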


In [20]:
chat_with_document(
    document=terms_of_use,
    user_query=questions[1],
    model_id=model_id
)

Response text:
Relevant clauses:
[1] "We collect and process personal data in accordance with applicable data protection laws, including the General Data Protection Regulation (GDPR)."
[2] "Users have the right to access, rectify, and delete their personal data, as well as the right to data portability and to object to processing."
[3] "We implement appropriate technical and organizational measures to protect personal data against unauthorized or unlawful processing and against accidental loss, destruction, or damage."

Answer:
Based on the provided Terms of Use, the service claims to collect and process personal data in compliance with applicable data protection laws, including the GDPR. [1] Users are informed that they have specific rights regarding their personal data, such as the right to access, rectify, and delete their data, as well as the right to data portability and to object to processing. [2] Additionally, the service states that it implements appropriate technical and orga