## Setup

In this notebook, we will use the [`boto3` Python SDK](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html) to work with [Amazon Bedrock](https://aws.amazon.com/bedrock/) Foundation Models.

Run the following cell to install the packages needed by the notebook.

In [None]:
%pip install --no-build-isolation --force-reinstall \
    "boto3>=1.28.57" \
    "awscli>=1.29.57" \
    "botocore>=1.31.57"

Restart the kernel so that the newly installed packages are picked up.

In [None]:
# restart kernel
from IPython.core.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")

## Create the boto3 client

Interaction with the Bedrock API is done via the AWS SDK for Python: [boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html).

Depending on your environment, you might need to customize the setup when creating your Bedrock service client. To help with this, we've provided a `get_bedrock_client()` utility function that accepts different options. You can find the implementation in [../utils/bedrock.py](../utils/bedrock.py).

#### Set the AWS_DEFAULT_REGION environment variable
We specify the AWS Region the notebook should use; in our case it is "us-east-1".

#### Using default credentials
Since we are running this notebook from [Amazon SageMaker Studio](https://aws.amazon.com/sagemaker/studio/), and the SageMaker Studio [execution role](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html) has permissions to access Bedrock, we can just run the cells below as-is.


In [None]:
import json
import os
import sys

import boto3
import botocore

module_path = ".."
sys.path.append(os.path.abspath(module_path))
from utils import bedrock, print_ww

os.environ["AWS_DEFAULT_REGION"] = "us-east-1"

boto3_bedrock = bedrock.get_bedrock_client(
    assumed_role=os.environ.get("BEDROCK_ASSUME_ROLE", None),
    region=os.environ.get("AWS_DEFAULT_REGION", None),
    runtime=False
)

We can look at the Amazon Bedrock foundation models by running the function in the cell below. It will also help us understand if we are correctly connected to Bedrock.

In [None]:
boto3_bedrock.list_foundation_models()['modelSummaries']

### Let's choose one of the models and try making a request using the boto3 client.
#### The model we chose for this first request is 'anthropic.claude-v2'. Let's send it a prompt asking for a blog post about business decisions.

N.B.: The structure of the prompt, and consequently of the request, may vary depending on the model being used.
In the case of Claude 2, the prompt must have the following structure:
#### "\n\nHuman: {userQuestion}\n\nAssistant:"
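
As a minimal sketch of the structure above, the helper below wraps a question in the Human/Assistant markers. The function name `build_claude_prompt` is hypothetical and is not part of `boto3` or of the notebook's `utils` package.

```python
def build_claude_prompt(user_question: str) -> str:
    """Wrap a question in the Human/Assistant turn markers shown above."""
    # Strip stray whitespace so the prompt ends exactly with "Assistant:".
    return f"\n\nHuman: {user_question.strip()}\n\nAssistant:"


prompt = build_claude_prompt(
    "Write me a blog about making strong business decisions as a leader."
)
print(repr(prompt))
```

Printing with `repr` makes the leading and separating newlines visible, which helps when checking that the prompt matches the required format.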

In [None]:
prompt_data = """\n\nHuman: Write me a blog about making strong business decisions as a leader.

Assistant:
"""

In [None]:
bedrock_runtime = bedrock.get_bedrock_client(
    assumed_role=os.environ.get("BEDROCK_ASSUME_ROLE", None),
    region=os.environ.get("AWS_DEFAULT_REGION", None),
)

#### Let's set the needed parameters, such as the model ID, then use the `invoke_model` method to send the request to the model

In [None]:
try:

    body = json.dumps({"prompt": prompt_data,"max_tokens_to_sample":1024})
    modelId = "anthropic.claude-v2"
    accept = "application/json"
    contentType = "application/json"

    response = bedrock_runtime.invoke_model(
        body=body, modelId=modelId, accept=accept, contentType=contentType
    )
    response_body = json.loads(response.get("body").read())

except botocore.exceptions.ClientError as error:

    if error.response['Error']['Code'] == 'AccessDeniedException':
        # Print the error in red, followed by links to troubleshooting guides.
        print(
            f"\x1b[41m{error.response['Error']['Message']}"
            "\nTo troubleshoot this issue please refer to the following resources."
            "\nhttps://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_access-denied.html"
            "\nhttps://docs.aws.amazon.com/bedrock/latest/userguide/security-iam.html\x1b[0m\n"
        )

    else:
        raise error


#### Print the body of the response, to analyze the pieces that compose it

In [None]:
print(response_body)

### To obtain the text generated by the assistant, we need to access the "completion" field.

In [None]:
print(response_body.get("completion"))
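
The request-and-parse steps above recur throughout this notebook, so they could be wrapped in a small helper. The sketch below is one possible shape, not the notebook's own code: `invoke_claude` and `FakeBedrockRuntime` are hypothetical names, and the fake client exists only so the example runs without AWS credentials. Its echoed response is made up and only mimics the "completion" field.

```python
import io
import json


def invoke_claude(client, body_data, model_id="anthropic.claude-v2"):
    """Send body_data to the model and return the parsed JSON response."""
    response = client.invoke_model(
        body=json.dumps(body_data),
        modelId=model_id,
        accept="application/json",
        contentType="application/json",
    )
    # The response body is a stream, so read it before decoding the JSON.
    return json.loads(response["body"].read())


class FakeBedrockRuntime:
    """Hypothetical stand-in for the real client, for offline use only."""

    def invoke_model(self, body, modelId, accept, contentType):
        prompt = json.loads(body)["prompt"]
        payload = {"completion": f" (echo) {prompt.strip()}"}
        return {"body": io.BytesIO(json.dumps(payload).encode("utf-8"))}


result = invoke_claude(
    FakeBedrockRuntime(),
    {"prompt": "\n\nHuman: Hello\n\nAssistant:", "max_tokens_to_sample": 16},
)
print(result["completion"])
```

With a real connection, the same helper would be called as `invoke_claude(bedrock_runtime, body_data)`.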

## Parameters
### Now let's take a look at the most important request parameters and at how the output differs when we tune them
#### For a full view of the parameters you can use with Anthropic Claude v2, see the [API documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-text-completion.html#model-parameters-anthropic-claude-text-completion-overview)

### Randomness and Diversity

Foundation models support the following parameters to control randomness and diversity in the 
response: Temperature, Top P, Top K.

---
**Temperature** – Large language models use probability to construct the words in a sequence. For any 
given next word, there is a probability distribution of options for the next word in the sequence. When 
you set the temperature closer to zero, the model tends to select the higher-probability words. When 
you set the temperature further away from zero, the model may select a lower-probability word.

In technical terms, the temperature modulates the probability density function for the next tokens, 
implementing the temperature sampling technique. This parameter can steepen or flatten the density 
function curve. A lower value results in a steeper curve with more deterministic responses, and a higher 
value results in a flatter curve with more random and creative responses.
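
The effect can be illustrated with a softmax over made-up scores. This is a sketch of the general temperature sampling idea, not of how Claude or Bedrock implement it internally; the logits are hypothetical, and the sketch rejects a temperature of 0 to avoid dividing by zero.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
for t in (0.1, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At the lowest temperature almost all the probability mass sits on the top-scoring token, while at the highest the three probabilities move closer together.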

### We try a new prompt, asking the model to generate a story, and compare the outputs as we change the temperature value

In [None]:
prompt_data = """\n\nHuman: Write a novel that explains the life of a junior consultant in IT.

Assistant:
"""

### To set the temperature (and the other parameters) of the request, we need to populate the body of the request. The parameters that this model accepts are the following:
##### {
#####    "prompt": "text",
#####    "temperature": float,
#####    "top_p": float,
#####    "top_k": int,
#####    "max_tokens_to_sample": int,
#####    "stop_sequences": [string]
##### } 
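
A request body with this shape could be assembled by a small helper such as the one sketched here. `build_claude_body` is a hypothetical name. The `top_p` and `top_k` ranges are the ones quoted elsewhere in this notebook, and the `temperature` range of 0 to 1 is an assumption; check the API documentation for the authoritative limits.

```python
import json


def build_claude_body(prompt, max_tokens_to_sample, temperature=None,
                      top_p=None, top_k=None, stop_sequences=None):
    """Return the JSON request body, including only the parameters that were set."""
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature is assumed to lie in [0, 1]")
    if top_p is not None and not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must lie in [0, 1]")
    if top_k is not None and not 0 <= top_k <= 500:
        raise ValueError("top_k must lie in [0, 500]")
    body = {"prompt": prompt, "max_tokens_to_sample": max_tokens_to_sample}
    optional = {"temperature": temperature, "top_p": top_p,
                "top_k": top_k, "stop_sequences": stop_sequences}
    # Leave out unset parameters so the model falls back to its defaults.
    body.update({k: v for k, v in optional.items() if v is not None})
    return json.dumps(body)


print(build_claude_body("\n\nHuman: Hi\n\nAssistant:", 64, temperature=0.1))
```

Validating on the client side surfaces an out-of-range value before any request is sent.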
#### Let's make a first request with the temperature set at 0.1

In [1]:
try:
    body_data={   
        "prompt": prompt_data,
        "temperature": 0.1,
        "max_tokens_to_sample": 1024,
    }
    body = json.dumps(body_data)
    modelId = "anthropic.claude-v2"
    accept = "application/json"
    contentType = "application/json"

    response = bedrock_runtime.invoke_model(
        body=body, modelId=modelId, accept=accept, contentType=contentType
    )
    response_body = json.loads(response.get("body").read())

except botocore.exceptions.ClientError as error:

    if error.response['Error']['Code'] == 'AccessDeniedException':
        # Print the error in red, followed by links to troubleshooting guides.
        print(
            f"\x1b[41m{error.response['Error']['Message']}"
            "\nTo troubleshoot this issue please refer to the following resources."
            "\nhttps://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_access-denied.html"
            "\nhttps://docs.aws.amazon.com/bedrock/latest/userguide/security-iam.html\x1b[0m\n"
        )

    else:
        raise error

In [None]:
print(response_body.get("completion"))

#### Now we try setting the temperature value to 0.9

In [None]:
try:
    body_data={   
        "prompt": prompt_data,
        "temperature": 0.9,
        "max_tokens_to_sample": 1024,
    }
    body = json.dumps(body_data)
    modelId = "anthropic.claude-v2"
    accept = "application/json"
    contentType = "application/json"

    response = bedrock_runtime.invoke_model(
        body=body, modelId=modelId, accept=accept, contentType=contentType
    )
    response_body = json.loads(response.get("body").read())

except botocore.exceptions.ClientError as error:

    if error.response['Error']['Code'] == 'AccessDeniedException':
        # Print the error in red, followed by links to troubleshooting guides.
        print(
            f"\x1b[41m{error.response['Error']['Message']}"
            "\nTo troubleshoot this issue please refer to the following resources."
            "\nhttps://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_access-denied.html"
            "\nhttps://docs.aws.amazon.com/bedrock/latest/userguide/security-iam.html\x1b[0m\n"
        )

    else:
        raise error

In [None]:
print(response_body.get("completion"))

### Comparing the two outputs, you may notice that the one generated with the higher temperature is longer, uses a richer vocabulary, and is more clearly divided into chapters. Differences like these reflect the increased creativity in the model's response when the temperature is raised.

---
**Top P** – Top P defines a cut off based on the sum of probabilities of the potential choices. If you set Top 
P below 1.0, the model considers the most probable options and ignores less probable ones. Top P is 
similar to Top K, but instead of capping the number of choices, it caps choices based on the sum of their 
probabilities.
For the example prompt "I hear the hoof beats of", you may want the model to provide "horses," 
"zebras," or "unicorns" as the next word. If you set the temperature to its maximum, without capping 
Top K or Top P, you increase the probability of getting unusual results such as "unicorns." If you set the 
temperature to 0, you increase the probability of "horses." If you set a high temperature but cap Top K or 
Top P at a lower value, you increase the probability of "horses" or "zebras," and decrease the probability 
of "unicorns."
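
The cutoff can be sketched on the hoof-beats example. The probabilities below are hypothetical, and the function shows the general nucleus (Top P) filtering idea rather than the model's actual sampler.

```python
def top_p_filter(probs, top_p):
    """Keep the most probable tokens until their cumulative mass reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1 again.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


probs = {"horses": 0.7, "zebras": 0.2, "unicorns": 0.1}  # hypothetical values
print(top_p_filter(probs, 0.8))
print(top_p_filter(probs, 1.0))
```

With `top_p=0.8` the least likely option is dropped before sampling, while `top_p=1.0` keeps every candidate.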

### Now we test the same prompt, varying the value of the top_p parameter.
#### As before, we compare a low value (0.1) with a high one (0.9); the accepted range is 0 to 1.

In [2]:
try:
    body_data={   
        "prompt": prompt_data,
        "top_p": 0.1,
        "max_tokens_to_sample": 1024,
    }
    body = json.dumps(body_data)
    modelId = "anthropic.claude-v2"
    accept = "application/json"
    contentType = "application/json"

    response = bedrock_runtime.invoke_model(
        body=body, modelId=modelId, accept=accept, contentType=contentType
    )
    response_body = json.loads(response.get("body").read())

except botocore.exceptions.ClientError as error:

    if error.response['Error']['Code'] == 'AccessDeniedException':
        # Print the error in red, followed by links to troubleshooting guides.
        print(
            f"\x1b[41m{error.response['Error']['Message']}"
            "\nTo troubleshoot this issue please refer to the following resources."
            "\nhttps://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_access-denied.html"
            "\nhttps://docs.aws.amazon.com/bedrock/latest/userguide/security-iam.html\x1b[0m\n"
        )

    else:
        raise error

In [None]:
print(response_body.get("completion"))

#### Now let's try setting top_p to 0.9

In [None]:
try:
    body_data={   
        "prompt": prompt_data,
        "top_p": 0.9,
        "max_tokens_to_sample": 1024,
    }
    body = json.dumps(body_data)
    modelId = "anthropic.claude-v2"
    accept = "application/json"
    contentType = "application/json"

    response = bedrock_runtime.invoke_model(
        body=body, modelId=modelId, accept=accept, contentType=contentType
    )
    response_body = json.loads(response.get("body").read())

except botocore.exceptions.ClientError as error:

    if error.response['Error']['Code'] == 'AccessDeniedException':
        # Print the error in red, followed by links to troubleshooting guides.
        print(
            f"\x1b[41m{error.response['Error']['Message']}"
            "\nTo troubleshoot this issue please refer to the following resources."
            "\nhttps://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_access-denied.html"
            "\nhttps://docs.aws.amazon.com/bedrock/latest/userguide/security-iam.html\x1b[0m\n"
        )

    else:
        raise error

In [None]:
print(response_body.get("completion"))

---
**Top K** – Temperature defines the probability distribution of potential words, and Top K defines the cut 
off where the model no longer selects the words. For example, if K=50, the model selects from 50 of the 
most probable words that could be next in a given sequence. This reduces the probability that an unusual 
word gets selected next in a sequence.
In technical terms, Top K is the number of highest-probability vocabulary tokens to keep for 
Top-K filtering. This limits the distribution of probable tokens, so the model chooses one of the 
highest-probability tokens.
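
A minimal sketch of Top K filtering, using hypothetical probabilities. It illustrates the general technique only, and it requires `k` to be at least 1 so that some candidate always survives.

```python
def top_k_filter(probs, k):
    """Keep only the k most probable tokens and renormalize their probabilities."""
    if k < 1:
        raise ValueError("k must be at least 1 in this sketch")
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    return {token: p / total for token, p in ranked}


probs = {"horses": 0.7, "zebras": 0.2, "unicorns": 0.1}  # hypothetical values
print(top_k_filter(probs, 2))
print(top_k_filter(probs, 1))
```

Unlike Top P, the number of surviving candidates is fixed by `k` regardless of how the probability mass is spread across them.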

#### Now we are going to see how the model reacts when we change this parameter. Remember that, unlike the other two, this parameter's range goes from 0 to 500.
##### Here too, the lower the value, the more deterministic the response will be.

#### We first try with a value of 50

In [None]:
try:
    body_data={   
        "prompt": prompt_data,
        "top_k": 50,
        "max_tokens_to_sample": 1024,
    }
    body = json.dumps(body_data)
    modelId = "anthropic.claude-v2"
    accept = "application/json"
    contentType = "application/json"

    response = bedrock_runtime.invoke_model(
        body=body, modelId=modelId, accept=accept, contentType=contentType
    )
    response_body = json.loads(response.get("body").read())

except botocore.exceptions.ClientError as error:

    if error.response['Error']['Code'] == 'AccessDeniedException':
        # Print the error in red, followed by links to troubleshooting guides.
        print(
            f"\x1b[41m{error.response['Error']['Message']}"
            "\nTo troubleshoot this issue please refer to the following resources."
            "\nhttps://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_access-denied.html"
            "\nhttps://docs.aws.amazon.com/bedrock/latest/userguide/security-iam.html\x1b[0m\n"
        )

    else:
        raise error

In [None]:
print(response_body.get("completion"))

#### Then, let's try setting top_k to 450

In [None]:
try:
    body_data={   
        "prompt": prompt_data,
        "top_k": 450,
        "max_tokens_to_sample": 1024,
    }
    body = json.dumps(body_data)
    modelId = "anthropic.claude-v2"
    accept = "application/json"
    contentType = "application/json"

    response = bedrock_runtime.invoke_model(
        body=body, modelId=modelId, accept=accept, contentType=contentType
    )
    response_body = json.loads(response.get("body").read())

except botocore.exceptions.ClientError as error:

    if error.response['Error']['Code'] == 'AccessDeniedException':
        # Print the error in red, followed by links to troubleshooting guides.
        print(
            f"\x1b[41m{error.response['Error']['Message']}"
            "\nTo troubleshoot this issue please refer to the following resources."
            "\nhttps://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_access-denied.html"
            "\nhttps://docs.aws.amazon.com/bedrock/latest/userguide/security-iam.html\x1b[0m\n"
        )

    else:
        raise error

In [None]:
print(response_body.get("completion"))

### Length

#### Max tokens

Foundation models usually also support a parameter that limits the length of the response, typically named MAX_TOKENS or similar. In the case of Anthropic Claude v2, the name is max_tokens_to_sample.

Two things have to be noted:

1. Every LLM caps the amount of text it can generate in a single response. The maximum number of tokens varies from model to model (for Claude v2, the figure usually quoted is 100k tokens, which is the size of its context window, so the prompt and the completion together must fit within it).

2. The limit is expressed in tokens, not in characters. Tokens are the basic units of data processed by LLMs. In the context of text, a token can be a word, part of a word (subword), or even a character — depending on the tokenization process.
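
To make the difference between characters and tokens concrete, here is a toy illustration. The regular-expression splitter below is an assumption made for this example only; it is not Claude's tokenizer, and real token counts will differ.

```python
import re


def toy_tokenize(text):
    """Split text into word and punctuation pieces (illustration only)."""
    return re.findall(r"\w+|[^\w\s]", text)


text = "Tokens aren't characters."
tokens = toy_tokenize(text)
print(len(text), "characters ->", len(tokens), "toy tokens:", tokens)
```

Even this crude splitter shows that a token limit and a character limit are different quantities, so a value such as max_tokens_to_sample cannot be read as a character count.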

To show the limit, we send a simple request with "max_tokens_to_sample" set to a value larger than the model allows. The request should be rejected, and because the error is not an access-denied error, the cell below re-raises it.

In [None]:
try:
    body_data={   
        "prompt": prompt_data,
        "top_k": 450,
        "max_tokens_to_sample": 200000,
    }
    body = json.dumps(body_data)
    modelId = "anthropic.claude-v2"
    accept = "application/json"
    contentType = "application/json"

    response = bedrock_runtime.invoke_model(
        body=body, modelId=modelId, accept=accept, contentType=contentType
    )
    response_body = json.loads(response.get("body").read())

except botocore.exceptions.ClientError as error:

    if error.response['Error']['Code'] == 'AccessDeniedException':
        # Print the error in red, followed by links to troubleshooting guides.
        print(
            f"\x1b[41m{error.response['Error']['Message']}"
            "\nTo troubleshoot this issue please refer to the following resources."
            "\nhttps://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_access-denied.html"
            "\nhttps://docs.aws.amazon.com/bedrock/latest/userguide/security-iam.html\x1b[0m\n"
        )

    else:
        raise error