# Retrieval Augmented Generation with Amazon Bedrock - Workshop Setup

> *PLEASE NOTE: This notebook should work well with the **`Data Science 3.0`** kernel in SageMaker Studio*

---

In this notebook, we will set up the [`boto3` Python SDK](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html) to work with [Amazon Bedrock](https://aws.amazon.com/bedrock/) foundation models and install the extra dependencies needed for this workshop. Specifically, we will use the following libraries throughout the workshop:

* [LangChain](https://python.langchain.com/docs/get_started/introduction) for large language model (LLM) utilities
* [FAISS](https://github.com/facebookresearch/faiss) for vector similarity searching
* [Streamlit](https://streamlit.io/) for user interface (UI) building

---
## Install External Dependencies

The code below will install the rest of the Python packages required for the workshop.

In [7]:
%pip install --upgrade pip
%pip install --quiet -r ../requirements.txt

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
jupyter-scheduler 2.5.1 requires sqlalchemy~=1.0, but you have sqlalchemy 2.0.21 which is incompatible.
Note: you may need to restart the kernel to use updated packages.


---
## Create the `boto3` client connection to Amazon Bedrock

Interaction with the Bedrock API is done via the AWS SDK for Python: [boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html).

As you are running this notebook from [Amazon SageMaker Studio](https://aws.amazon.com/sagemaker/studio/) and your SageMaker Studio [execution role](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html) has permissions to access Bedrock, you can run the cells below as-is to create a connection to Amazon Bedrock. The same applies if you are running these notebooks from a computer whose default AWS credentials have access to Bedrock.

In [9]:
import boto3
import os
from IPython.display import Markdown, display

region = os.environ.get("AWS_REGION")
bedrock_service = boto3.client(
    service_name='bedrock',
    region_name=region,
)
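
Note that `os.environ.get("AWS_REGION")` returns `None` when the variable is unset, in which case `boto3` falls back to its own configuration. A minimal sketch of a more explicit lookup follows. The helper name `resolve_region` and the `us-east-1` default are assumptions made for illustration, not part of the workshop code:

```python
import os


def resolve_region(default="us-east-1", env=None):
    """Return the first region found in the environment, else `default`.

    `default` is an illustrative assumption; use a region where you have
    Bedrock access.
    """
    env = os.environ if env is None else env
    # AWS_REGION is what the cell above reads; AWS_DEFAULT_REGION is the
    # variable boto3 and the AWS CLI consult for a default region.
    return env.get("AWS_REGION") or env.get("AWS_DEFAULT_REGION") or default


print(resolve_region())
```

The returned value could then be passed as `region_name` when creating the client.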

#### Validate the connection

We can check that the client works by calling the `list_foundation_models()` method, which lists all the models available for us to use.

In [12]:
bedrock_service.list_foundation_models()

{'ResponseMetadata': {'RequestId': 'd680f274-65f8-4e53-b061-e47018c67471',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'date': 'Wed, 15 May 2024 15:48:35 GMT',
   'content-type': 'application/json',
   'content-length': '23438',
   'connection': 'keep-alive',
   'x-amzn-requestid': 'd680f274-65f8-4e53-b061-e47018c67471'},
  'RetryAttempts': 0},
 'modelSummaries': [{'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-tg1-large',
   'modelId': 'amazon.titan-tg1-large',
   'modelName': 'Titan Text Large',
   'providerName': 'Amazon',
   'inputModalities': ['TEXT'],
   'outputModalities': ['TEXT'],
   'responseStreamingSupported': True,
   'customizationsSupported': [],
   'inferenceTypesSupported': ['ON_DEMAND'],
   'modelLifecycle': {'status': 'ACTIVE'}},
  {'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-image-generator-v1:0',
   'modelId': 'amazon.titan-image-generator-v1:0',
   'modelName': 'Titan Image Generator G1',
   'providerName': 'Amazon',
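
The full response is long, so it can help to narrow it down. The sketch below filters the `modelSummaries` list by output modality. The helper name `models_with_output` is hypothetical, and the sample dictionary is an abbreviated stand-in (its second model ID is made up), not real API output:

```python
def models_with_output(response, modality="TEXT"):
    """Return the model IDs whose `outputModalities` include `modality`."""
    return [
        summary["modelId"]
        for summary in response.get("modelSummaries", [])
        if modality in summary.get("outputModalities", [])
    ]


# Abbreviated stand-in with the same shape as the response shown above.
sample_response = {
    "modelSummaries": [
        {"modelId": "amazon.titan-tg1-large", "outputModalities": ["TEXT"]},
        # Hypothetical entry, used only to show the filter excluding a model.
        {"modelId": "example.image-model-v1", "outputModalities": ["IMAGE"]},
    ]
}

print(models_with_output(sample_response))
```

In the notebook, the same function could be applied to the dictionary returned by `bedrock_service.list_foundation_models()`.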

---

## `InvokeModel` body and output

The `invoke_model()` method of the Amazon Bedrock client (`InvokeModel` API) will be the primary method we use for most of our Text Generation and Processing tasks - whichever model we're using.

Although the method is shared, the format of input and output varies depending on the foundation model used - as described below:

### Anthropic Claude

#### Input

```json
{
    "prompt": "\n\nHuman:<prompt>\n\nAssistant:",
    "max_tokens_to_sample": 300,
    "temperature": 0.5,
    "top_k": 250,
    "top_p": 1,
    "stop_sequences": ["\n\nHuman:"]
}
```

#### Output

```json
{
    "completion": "<output>",
    "stop_reason": "stop_sequence"
}
```
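
As a rough illustration of the two formats above, the following sketch builds a request body and extracts the completion from a response body. The helper names `build_claude_body` and `parse_claude_completion` are hypothetical, and the default parameter values simply mirror the example input:

```python
import json


def build_claude_body(user_text, max_tokens_to_sample=300, temperature=0.5,
                      top_k=250, top_p=1.0):
    """Serialize a Claude text-completion request in the input format above."""
    return json.dumps({
        # The prompt wraps the user text in the Human/Assistant turn markers.
        "prompt": f"\n\nHuman: {user_text}\n\nAssistant:",
        "max_tokens_to_sample": max_tokens_to_sample,
        "temperature": temperature,
        "top_k": top_k,
        "top_p": top_p,
        "stop_sequences": ["\n\nHuman:"],
    })


def parse_claude_completion(raw_body):
    """Extract the generated text from a response body (bytes or str)."""
    return json.loads(raw_body)["completion"].strip()


body = build_claude_body("Say hello.")
# Stand-in response body in the output format above; no API call is made.
sample = b'{"completion": " Hello! ", "stop_reason": "stop_sequence"}'
print(parse_claude_completion(sample))
```

The string returned by `build_claude_body` is what would be passed as `body` to `invoke_model()`.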

---

## Common inference parameter definitions

### Randomness and Diversity

Foundation models support the following parameters to control randomness and diversity in the 
response.

**Temperature** – Large language models use probability to construct the words in a sequence. For any 
given next word, there is a probability distribution of options for the next word in the sequence. When 
you set the temperature closer to zero, the model tends to select the higher-probability words. When 
you set the temperature further away from zero, the model may select a lower-probability word.

In technical terms, the temperature modulates the probability density function for the next tokens, 
implementing the temperature sampling technique. This parameter can steepen or flatten the density 
function curve. A lower value results in a steeper curve with more deterministic responses, and a higher 
value results in a flatter curve with more random responses.
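
To make the steeper and flatter curves concrete, here is a minimal sketch of temperature sampling as a temperature-scaled softmax. The scores are hypothetical, and this illustrates the general technique rather than how any particular Bedrock model implements it:

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities, scaled by `temperature` (> 0)."""
    if temperature <= 0:
        # A temperature of exactly 0 is usually treated as greedy selection;
        # this sketch only handles positive values.
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next words.
logits = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(logits, temperature=0.5)  # steeper curve
flat = softmax_with_temperature(logits, temperature=2.0)   # flatter curve
print(sharp)
print(flat)
```

With the lower temperature, most of the probability mass sits on the top-scoring word. With the higher temperature, the three probabilities move closer together.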

**Top K** – Temperature shapes the probability distribution of potential words, and Top K defines the cutoff beyond which the model no longer selects words. For example, if K=50, the model selects from the 50 most probable words that could be next in a given sequence. This reduces the probability that an unusual word gets selected next in a sequence.

In technical terms, Top K is the number of highest-probability vocabulary tokens to keep for Top-K filtering. This limits the distribution of probable tokens, so the model chooses one of the highest-probability tokens.
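
A minimal sketch of Top-K filtering, assuming the next-token distribution is available as a dictionary of probabilities. The words and probability values are hypothetical:

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities.

    `probs` maps each candidate token to its probability.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    kept = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Hypothetical next-word probabilities.
probs = {"horses": 0.7, "zebras": 0.2, "unicorns": 0.1}
top2 = top_k_filter(probs, k=2)
print(top2)
```

With `k=2` the least likely word is dropped entirely, and the remaining probabilities are rescaled so they sum to 1.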

**Top P** – Top P defines a cutoff based on the sum of probabilities of the potential choices. If you set Top P below 1.0, the model considers the most probable options and ignores less probable ones. Top P is similar to Top K, but instead of capping the number of choices, it caps choices based on the sum of their probabilities.

For the example prompt "I hear the hoof beats of", you may want the model to provide "horses," "zebras" or "unicorns" as the next word. If you set the temperature to its maximum, without capping Top K or Top P, you increase the probability of getting unusual results such as "unicorns." If you set the temperature to 0, you increase the probability of "horses." If you set a high temperature but cap Top K or Top P at a low value, you increase the probability of "horses" or "zebras," and decrease the probability of "unicorns."
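
The hoof-beats example can be sketched with a simple Top-P (nucleus) filter. The probability values below are hypothetical and chosen only to illustrate the cutoff:

```python
def top_p_filter(probs, p):
    """Keep the smallest set of most probable tokens whose cumulative
    probability reaches `p`, then renormalize.

    `probs` maps each candidate token to its probability.
    """
    kept, cumulative = [], 0.0
    for token, prob in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


# Hypothetical next-word probabilities for "I hear the hoof beats of".
probs = {"horses": 0.7, "zebras": 0.2, "unicorns": 0.1}
nucleus = top_p_filter(probs, p=0.8)
print(nucleus)
```

With `p=0.8`, "horses" and "zebras" together reach the threshold, so "unicorns" is never sampled. With `p=1.0`, all three words remain candidates.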

### Length

The following parameters control the length of the generated response.

**Response length** – Configures the minimum and maximum number of tokens to use in the generated 
response.

**Length penalty** – Length penalty optimizes the model to be more concise in its output by penalizing 
longer responses. Length penalty differs from response length as the response length is a hard cut off for 
the minimum or maximum response length.

In technical terms, the length penalty penalizes the model exponentially for lengthy responses. 0.0 
means no penalty. Set a value less than 0.0 for the model to generate longer sequences, or set a value 
greater than 0.0 for the model to produce shorter sequences.

### Repetitions

The following parameters help control repetition in the generated response.

**Repetition penalty (presence penalty)** – Discourages repetition of the same words (tokens) in responses. 
A value of 1.0 means no penalty. Values greater than 1.0 decrease repetition.
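
The text above does not say how the penalty is applied internally. As a sketch under that caveat, one widely used formulation rescales the raw score of every token that has already been generated. The function name and the scores below are hypothetical:

```python
def apply_repetition_penalty(logits, generated_tokens, penalty=1.0):
    """Lower the scores of tokens that already appear in the output.

    `logits` maps each token to its raw score. A penalty of 1.0 leaves the
    scores unchanged; values above 1.0 make repeated tokens less likely.
    """
    adjusted = dict(logits)
    for token in set(generated_tokens):
        if token in adjusted:
            score = adjusted[token]
            # Dividing a positive score, or multiplying a negative one,
            # pushes the score down when penalty > 1.
            adjusted[token] = score / penalty if score > 0 else score * penalty
    return adjusted


# Hypothetical scores; "the" and "cat" have already been generated.
logits = {"the": 2.0, "cat": -1.0, "sat": 0.5}
penalized = apply_repetition_penalty(logits, ["the", "cat"], penalty=2.0)
print(penalized)
```

Tokens that have not been generated yet ("sat" here) keep their original score.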

---

## Try out the text generation model

With some theory out of the way, let's see the models in action! Run the cells below to see how to generate text with the Anthropic Claude Instant model.

### Client side `boto3` bedrock-runtime connection

In [11]:
bedrock = boto3.client(
    service_name='bedrock-runtime',
    region_name=region,
)

### Anthropic Claude Instant

In [13]:
import json

# Claude text-completion prompts use "\n\nHuman:" and "\n\nAssistant:" turn markers.
PROMPT = """

H: Write me a blog about making strong business decisions as a leader.

A:"""

body = json.dumps({"prompt": PROMPT, "max_tokens_to_sample": 500})
modelId = "anthropic.claude-instant-v1"
accept = "application/json"
contentType = "application/json"

response = bedrock.invoke_model(
    body=body, modelId=modelId, accept=accept, contentType=contentType
)
response_body = json.loads(response.get("body").read())
output = response_body.get("completion").strip()
display(Markdown(output))

Here is a draft blog post on making strong business decisions as a leader:

How to Make Strong Business Decisions as a Leader

Making tough business decisions is one of the most important responsibilities of being a leader. However, choosing the right course of action is often difficult, with many factors to consider and potential downsides to weigh. Here are some tips for making strong, effective decisions that will help move your business in the right direction:

Gather All Necessary Information
Before deciding on anything major, take the time to gather all relevant information. Understand the issue fully by gathering data, consulting with knowledgeable colleagues, and identifying risks and opportunities. Don't rush into a decision before you have a clear picture of the situation.

Consider Multiple Options 
Force yourself to consider more than one option before settling on a choice. Brainstorming alternative strategies will help uncover new ideas you may not have thought of otherwise. Take the time to fully flesh out two or three viable options to compare pros and cons.

Weigh Short and Long-Term Impacts
Assess not just the immediate impacts but how each option might affect the business in the short, medium, and long run. Will any decisions have unintended consequences years down the line that you need to account for? Think through possible second and third order effects. 

Get Input From Others
Discuss major decisions with your management team and get varied perspectives on alternatives. Don't be afraid to let others challenge your assumptions. Multiple viewpoints can help catch flaws in reasoning and surface issues you may have overlooked on your own.

Follow a Logical Decision Process
Use a step-by-step process like listing criteria, scoring options against those criteria, and selecting the best alternative based on objective analysis. Relying on facts and data will lead to decisions viewed as well-reasoned whether or not they turn out perfectly.

Communicate Decisions Clearly  
Once a choice is made, communicate it to all affected parties along with your rationale. Provide context so people understand the reasoning even if they don't fully agree. This will gain trust and buy-in which are critical for successful implementation.

Overall, taking the time for thorough consideration, objective analysis of options, and clear communication of the decision process will serve you well in leading the business confidently through both calm and turbulent times. Strong decision-making is key to navigating challenges and capitalizing on new opportunities for growth.

## Generate streaming output

For large language models, it can take noticeable time to generate long output sequences. Rather than waiting for the entire response to be available, latency-sensitive applications may prefer to **stream** the response to users.

Run the code below to see how you can achieve this with Bedrock's `invoke_model_with_response_stream()` method - returning the response body in separate chunks.

In [14]:
from IPython.display import clear_output, display, display_markdown, Markdown

response = bedrock.invoke_model_with_response_stream(
    body=body, modelId=modelId, accept=accept, contentType=contentType
)
stream = response.get('body')
output = []

if stream:
    for event in stream:
        chunk = event.get('chunk')
        if chunk:
            chunk_obj = json.loads(chunk.get('bytes').decode())
            text = chunk_obj['completion']
            clear_output(wait=True)
            output.append(text)
            display_markdown(Markdown(''.join(output)))

 Here is a draft blog post on making strong business decisions as a leader:

How to Make Strong Business Decisions as a Leader

As a leader, one of your most important responsibilities is making decisions that help drive your company's success. However, making the right calls is often challenging given the complexity of business issues and high stakes involved. Here are some tips for approaching decision-making in a strategic, thoughtful manner:

Gather All Relevant Information
Make sure you understand all aspects of the situation before committing to a course of action. Seek input from various teams and individuals who can provide different perspectives. Look at past performance data, market research, financial projections, and anything else that gives useful context. The more informed you are, the stronger your decisions will be. 

Weigh Multiple Options Carefully
Don't stop at the first idea that seems workable. Challenge yourself and others to brainstorm alternative approaches. Consider pros and cons of two or three viable options so you can compare them systematically. Think about potential downsides or unintended consequences of each path. Thoroughly exploring alternatives leads to optimal solutions.

Consult With your Team
As a leader, you don't need to go it alone. Discuss important decisions with your management team and get their feedback and recommendations. Allow for debate so different viewpoints surface. You may realize an approach you hadn't considered or that a consensus is forming around one option. Team involvement fosters buy-in and commitment to following through on the decision.

Consider Short and Long-Term Impact
Focus on decisions that balance both immediate needs and long-term strategic interests. Avoid fixes that only address pressing problems but risk undermining future goals. Likewise, don't pursue opportunities that may not pay off for years when speed or cash flow is essential now. Choosing strategies with longevity ensures the sustainable success of your business.  

Trust Your Instincts But Get Second Opinions
You have expertise and experience to rely on, but getting input from knowledgeable outsiders provides an objective reality check. Consult advisors, industry contacts, mentors etc. who can point out flaws in your reasoning or assumptions. Then weigh all perspectives carefully before deciding. Outsider perspectives challenge you to defend your position and strengthen the quality of your chosen path.

Make Decisions Decisively
Once considering all factors, make a clear decision and confidently lead its implementation. Wavering or protracted deliberation
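
The chunk-handling logic in the loop above can also be factored into a small generator, which makes it easy to try out without calling Bedrock. This is a sketch: `iter_completion_chunks` is a hypothetical helper, and `fake_stream` is hand-built stand-in data in the shape the loop reads, not real service output:

```python
import json


def iter_completion_chunks(stream):
    """Yield the `completion` text from each event that carries a chunk."""
    for event in stream or []:
        chunk = event.get("chunk")
        if chunk:
            payload = json.loads(chunk["bytes"].decode())
            yield payload.get("completion", "")


# Stand-in events; no Bedrock call is made here.
fake_stream = [
    {"chunk": {"bytes": json.dumps({"completion": " Hello"}).encode()}},
    {"metadata": {}},  # hypothetical non-chunk event, skipped by the helper
    {"chunk": {"bytes": json.dumps({"completion": ", world"}).encode()}},
]

text = "".join(iter_completion_chunks(fake_stream))
print(text)
```

In the notebook, the same generator could be given `response.get('body')` in place of `fake_stream`.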

## Generate embeddings

Use text embeddings to convert text into meaningful vector representations. You input a body of text 
and the output is a (1 x n) vector. You can use embedding vectors for a wide variety of applications. 
Bedrock currently offers Titan Embeddings for text embedding that supports text similarity (finding the 
semantic similarity between bodies of text) and text retrieval (such as search).

At the time of writing, you can use `amazon.titan-embed-text-v1` as the embedding model via the API. It accepts input text of up to 8192 tokens and returns an output vector of length 1536.

To use a text embeddings model, call the `InvokeModel` API operation (for example, through the Python SDK) to retrieve the vector representation of the input text from the specified model.



#### Input

```json
{
    "inputText": "<text>"
}
```

#### Output

```json
{
    "embedding": []
}
```


Let's see how to generate embeddings of some text:

In [15]:
prompt_data = "Amazon Bedrock supports foundation models from industry-leading providers such as \
AI21 Labs, Anthropic, Stability AI, and Amazon. Choose the model that is best suited to achieving \
your unique goals."

In [16]:
body = json.dumps({"inputText": prompt_data})
modelId = "amazon.titan-embed-text-v1"
accept = "application/json"
contentType = "application/json"

response = bedrock.invoke_model(
    body=body, modelId=modelId, accept=accept, contentType=contentType
)
response_body = json.loads(response.get("body").read())

embedding = response_body.get("embedding")
print(f"The embedding vector has {len(embedding)} values\n{embedding[0:3]+['...']+embedding[-3:]}")

The embedding vector has 1536 values
[0.16601562, 0.23632812, 0.703125, '...', 0.26953125, -0.609375, -0.55078125]
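
Text similarity, mentioned earlier in this section, is commonly measured as the cosine similarity between two embedding vectors. A minimal sketch in plain Python follows. The short vectors are made-up stand-ins for real 1536-value embeddings:

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Made-up 3-value vectors standing in for real embeddings.
vec_a = [0.1, 0.3, -0.5]
vec_b = [0.2, 0.1, -0.4]
print(cosine_similarity(vec_a, vec_b))
```

Values close to 1 indicate vectors pointing in a similar direction, which for embeddings suggests semantically similar text. Values near 0 suggest unrelated text.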


## Next steps

In this notebook, we have successfully set up our Bedrock-compatible environment and shown some basic examples of invoking Amazon Bedrock models using the AWS Python SDK. You're now ready to move on to the next notebook to start building our retrieval augmented generation (RAG) application!