# Maximizing AI Potentials: Leveraging Foundational Models from Amazon Bedrock and Amazon OpenSearch Serverless as Vector Engine

### Context
Amazon Bedrock is a fully managed service that provides access to FMs from third-party providers and Amazon; available via an API. With Bedrock, you can choose from a variety of models to find the one that’s best suited for your use case. On one hand Amazon Bedrock provides an option to generate vectors as well as summarizezation of texts, then on other hands vector engine for Amazon OpenSearch Serverless complements it by providing a machinsm to store those vectors and run semantic search against those vectors. 

In this sample notebook you will explore some of the most common usage patterns we are seeing with our customers for Generative AI such as generating text and images, creating value for organizations by improving productivity. This is achieved by leveraging foundation models to help in composing emails, summarizing text, answering questions, building chatbots, and creating images.

### Challenges
- How to manage large document(s) that exceed the token limit
- How to find the document(s) relevant to the question being asked

### Proposal
To the above challenges, this notebook proposes the following strategy
#### Prepare documents
![Embeddings](./images/Embeddings_lang.png)

Before being able to answer the questions, the documents must be processed and a stored in a document store index
- Load the documents
- Process and split them into smaller chunks
- Create a numerical vector representation of each chunk using Amazon Bedrock Titan Embeddings model
- Create an index using the chunks and the corresponding embeddings and store into OpenSearch Serverless
#### Ask question
![Question](./images/Chatbot_lang.png)

When the documents index is prepared, you are ready to ask the questions and relevant documents will be fetched based on the question being asked. Following steps will be executed.
- Create an embedding of the input question
- Compare the question embedding with the embeddings stored in OpenSearch Serverless
- Fetch the (top N) relevant document chunks using vector engine
- Add those chunks as part of the context in the prompt
- Send the prompt to the model under Amazon Bedrock
- Get the contextual answer based on the documents retrieved

## Usecase
#### Dataset
To explain this architecture pattern we are using the documents from IRS. These documents explain topics such as:
- Original Issue Discount (OID) Instruments
- Reporting Cash Payments of Over $10,000 to IRS
- Employer's Tax Guide

The model will try to answer from the documents in easy language.


## Implementation
In order to follow the RAG approach this notebook is using the LangChain framework where it has integrations with different services and tools that allow efficient building of patterns such as RAG. We will be using the following tools:

- **LLM (Large Language Model)**: Anthropic Claude V1 available through Amazon Bedrock
- **Vector Store**: vector engine for Amazon OpenSearch Serverless
  In this notebook we are using OpenSearch Serverless as a vector-store to store both the embeddings and the documents. 
- **Index**: VectorIndex - This model will be used to understand the document chunks and provide an answer in human friendly manner.
- **Embeddings Model**: Amazon Titan Embeddings available through Amazon Bedrock

  This model will be used to generate a numerical representation of the textual documents
- **Document Loader**: PDF Loader available through LangChain

  This is the loader that can load the documents from a source, in this example we are loading the vector embeddings generated from those file chunks to OpenSearch Serverless. 

  The index helps to compare the input embedding and the document embeddings to find relevant document
- **Wrapper**: wraps index, vector store, embeddings model and the LLM to abstract away the logic from the user.

### Setup
To run this notebook you would need to install dependencies such as, [PyPDF](https://pypi.org/project/pypdf/)



Then begin with instantiating the LLM and the Embeddings model. Here we are using Amazon Titan to demonstrate the use case.

Note: It is possible to choose other models available with Bedrock. You can replace the `model_id` as follows to change the model.

`llm = Bedrock(model_id="amazon.titan-tg1-large")`

Available models under Bedrock have the following IDs:
- `amazon.titan-tg1-large`
- `ai21.j2-grande-instruct`
- `ai21.j2-jumbo-instruct`
- `anthropic.claude-instant-v1`
- `anthropic.claude-v1`

#### ⚠️⚠️⚠️ Execute the following cells before running this notebook ⚠️⚠️⚠️

For a detailed description on what the following cells do refer to [Bedrock boto3 setup](../00_Intro/bedrock_boto3_setup.ipynb) notebook.

In [1]:
# Make sure you run `download-dependencies.sh` from the root of the repository to download the dependencies before running this cell
%pip install ./dependencies/botocore-1.29.162-py3-none-any.whl ./dependencies/boto3-1.26.162-py3-none-any.whl ./dependencies/awscli-1.27.162-py3-none-any.whl --force-reinstall
%pip install langchain==0.0.245 --quiet
%pip install pypdf==3.8.1 faiss-cpu==1.7.4 --quiet
%pip install requests_aws4auth opensearch-py

[0mProcessing ./dependencies/botocore-1.29.162-py3-none-any.whl
[31mERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: '/Users/mankhanu/Github/test/amazon-bedrock-workshop/03_QuestionAnswering/dependencies/botocore-1.29.162-py3-none-any.whl'
[0m[31m
[0mNote: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


In [1]:
#### Un comment the following lines to run from your local environment outside of the AWS account with Bedrock access

# import os
# os.environ['BEDROCK_ASSUME_ROLE'] = '<enter role>'
# os.environ['AWS_PROFILE'] = '<aws-profile>'

In [2]:
import boto3
import json
import os
import sys

module_path = ".."
sys.path.append(os.path.abspath(module_path))
from utils import bedrock, print_ww

os.environ['AWS_DEFAULT_REGION'] = 'us-east-1'
boto3_bedrock = bedrock.get_bedrock_client(os.environ.get('BEDROCK_ASSUME_ROLE', None))
print (f"bedrock client {boto3_bedrock}")

Create new client
  Using region: us-east-1
boto3 Bedrock client successfully created!
bedrock(https://bedrock.us-east-1.amazonaws.com)
bedrock client <botocore.client.Bedrock object at 0x7fb140475060>


In [3]:
## set up opensearch
from opensearchpy import OpenSearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth
import json

# create open search collection public endpoint from public preview in us-east-2
host = 'jobm91bhqffl30fsl22a.us-east-2.aoss.amazonaws.com' # OpenSearch Serverless collection endpoint
region = 'us-east-2' # e.g. us-west-2

service = 'aoss'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service,
session_token=credentials.token)

# Create an OpenSearch client
client = OpenSearch(
    hosts = [{'host': host, 'port': 443}],
    http_auth = awsauth,
    timeout = 300,
    use_ssl = True,
    verify_certs = True,
    connection_class = RequestsHttpConnection
)

### Setup langchain

We create an instance of the Bedrock classes for the LLM and the embedding models. In this example we are showing an example with "titan" model from Amazon, and "claude" model from Anthropic.

In [5]:
# We will be using the Titan Embeddings Model to generate our Embeddings.
from langchain.embeddings import BedrockEmbeddings
from langchain.llms.bedrock import Bedrock

# - create the Anthropic Model
claude_llm = Bedrock(model_id="anthropic.claude-v1", client=boto3_bedrock, model_kwargs={'max_tokens_to_sample':200})
titan_llm = Bedrock(model_id= "amazon.titan-tg1-large", client=boto3_bedrock)
bedrock_embeddings = BedrockEmbeddings(client=boto3_bedrock)

### Data Preparation
Let's first download some of the files to build our document store. For this example we will be using public IRS documents from [here](https://www.irs.gov/publications).

In [6]:
!mkdir data

from urllib.request import urlretrieve
files = [
    'https://www.irs.gov/pub/irs-pdf/p1544.pdf',
    'https://www.irs.gov/pub/irs-pdf/p15.pdf',
    'https://www.irs.gov/pub/irs-pdf/p1212.pdf'
]
for url in files:
    file_path = './data/' + url.split('/')[-1]
    urlretrieve(url, file_path)

mkdir: data: File exists


After downloading we can load the documents with the help of [DirectoryLoader from PyPDF available under LangChain](https://python.langchain.com/en/latest/reference/modules/document_loaders.html) and splitting them into smaller chunks.

Note: The retrieved document/text should be large enough to contain enough information to answer a question; but small enough to fit into the LLM prompt. Also the embeddings model has a limit of the length of input tokens limited to 512 tokens, which roughly translates to ~2000 characters. For the sake of this use-case we are creating chunks of roughly 1000 characters with an overlap of 100 characters using [RecursiveCharacterTextSplitter](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/recursive_text_splitter.html).

In [34]:
import numpy as np
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader, PyPDFDirectoryLoader

loader = PyPDFDirectoryLoader("./data/")

documents = loader.load()
# - in our testing Character split works better with this PDF data set
text_splitter = RecursiveCharacterTextSplitter(
    # Set a really small chunk size, just to show.
    chunk_size = 1000,
    chunk_overlap  = 100,
)
docs = text_splitter.split_documents(documents)

In [35]:
avg_doc_length = lambda documents: sum([len(doc.page_content) for doc in documents])//len(documents)
avg_char_count_pre = avg_doc_length(documents)
avg_char_count_post = avg_doc_length(docs)
print(f'Average length among {len(documents)} documents loaded is {avg_char_count_pre} characters.')
print(f'After the split we have {len(docs)} documents more than the original {len(documents)}.')
print(f'Average length among {len(docs)} documents (after split) is {avg_char_count_post} characters.')

Average length among 73 documents loaded is 5850 characters.
After the split we have 503 documents more than the original 73.
Average length among 503 documents (after split) is 910 characters.


We had 3 PDF documents which have been split into smaller ~500 chunks.

Now we can see how a sample embedding would look like for one of those chunks

In [36]:
query_embedding = np.array(bedrock_embeddings.embed_query(docs[0].page_content))
np.array(query_embedding)

array([-0.15917969,  0.76171875,  0.24511719, ...,  0.12109375,
       -0.25585938,  0.06591797])

 
The below function will establish a connection with OpenSearch Serverless, create a new index, create embeddings for the documents and then store the embeddings in OpenSearch serverless. For details on documentation refer this link: https://python.langchain.com/docs/integrations/vectorstores/opensearch

*Note: Wait for a minute or two after the below command to excute, before the new index can be queried.*

In [15]:
# TODO - Direct langchain integration with version 0.0.245 gives timeout error, therefore, commenting the following code. 

# from langchain.vectorstores import OpenSearchVectorSearch

# docsearch = OpenSearchVectorSearch.from_documents(
#     docs,
#     bedrock_embeddings,
#     opensearch_url=host,
#     http_auth=awsauth,
#     timeout = 300,
#     use_ssl = True,
#     verify_certs = True,
#     connection_class = RequestsHttpConnection,
#     index_name="bedrock-aos-irs-index2",
#     engine="faiss",
#     bulk_size=len(docs)
# )

In [43]:
index_name = "bedrock-aos-demo-new1"
vector_size = 4096

In [44]:
# create a new index
index_body = {
    "settings": {
        "index.knn": True
  },
  'mappings': {
    'properties': {
      "title": { "type": "text", "fields": { "keyword": { "type": "keyword" } } }, #the field will be title.keyword and the data type will be keyword, this will act as sub field for
      "v_title": { "type": "knn_vector", "dimension": vector_size },
    }
  }
}

client.indices.create(
  index=index_name, 
  body=index_body
)

{'acknowledged': True,
 'shards_acknowledged': True,
 'index': 'bedrock-aos-demo-new1'}

In [45]:
# python code to view schema for OpenSearch Serverless. 
client.indices.get_mapping(index_name)

{'bedrock-aos-demo-new1': {'mappings': {'properties': {'title': {'type': 'text',
     'fields': {'keyword': {'type': 'keyword'}}},
    'v_title': {'type': 'knn_vector', 'dimension': 4096}}}}}

In [46]:
actions =[]
bulk_size = 0
action = {"index": {"_index": index_name}}


# # Prepare bulk request
# actions.append(action)
# actions.append(json_data.copy())

In [47]:
# Bulk API to ingest documents in OSS.
# it will take about 5 mins to ingest the 503 vectors
for document in docs: 
    sample_embedding = np.array(bedrock_embeddings.embed_query(document.page_content))
    actions.append(action)
    json_data = {
             "title" : document.page_content,
            "v_title" : sample_embedding
        }
    actions.append(json_data)
    bulk_size+=1
    if(bulk_size > 200 ):
        client.bulk(body=actions)
        print(f"bulk request sent with size: {bulk_size}")
        bulk_size = 0

#ingest remaining documents
print("remaining documents: ", bulk_size)
client.bulk(body=actions)

bulk request sent with size: 201
bulk request sent with size: 201
remaining documents:  101


{'took': 1008,
 'errors': False,
 'items': [{'index': {'_index': 'bedrock-aos-demo-new1',
    '_id': '1%3A0%3AYF9Ws4kBYEx0HOrvedTs',
    '_version': 1,
    'result': 'created',
    '_shards': {'total': 0, 'successful': 0, 'failed': 0},
    '_seq_no': 0,
    '_primary_term': 0,
    'status': 201}},
  {'index': {'_index': 'bedrock-aos-demo-new1',
    '_id': '1%3A0%3AYV9Ws4kBYEx0HOrvedTs',
    '_version': 1,
    'result': 'created',
    '_shards': {'total': 0, 'successful': 0, 'failed': 0},
    '_seq_no': 0,
    '_primary_term': 0,
    'status': 201}},
  {'index': {'_index': 'bedrock-aos-demo-new1',
    '_id': '1%3A0%3AYl9Ws4kBYEx0HOrvedTs',
    '_version': 1,
    'result': 'created',
    '_shards': {'total': 0, 'successful': 0, 'failed': 0},
    '_seq_no': 0,
    '_primary_term': 0,
    'status': 201}},
  {'index': {'_index': 'bedrock-aos-demo-new1',
    '_id': '1%3A0%3AY19Ws4kBYEx0HOrvedTs',
    '_version': 1,
    'result': 'created',
    '_shards': {'total': 0, 'successful': 0, 'failed

Following the similar pattern embeddings could be generated for the entire corpus and stored in a vector store.
**⚠️⚠️⚠️ NOTE: it might take few minutes to run the following cell ⚠️⚠️⚠️**

### Question Answering

Now that we have our vector store in place, we can start asking questions.

In [48]:
query = "Is it possible that I get sentenced to jail due to failure in filings?"

The first step would be to create an embedding of the query such that it could be compared with the documents

In [49]:
query_embedding = np.array(bedrock_embeddings.embed_query(query))
np.array(query_embedding)

array([-0.11181641, -0.20019531,  0.00915527, ..., -0.4921875 ,
       -0.05664062,  0.43359375])

In [50]:
index_name

'bedrock-aos-demo-new1'

In [51]:
query_os = {
  "size": 3,
  "fields": ["title"],
  "_source": False,
  "query": {
    "knn": {
      "v_title": {
        "vector": query_embedding,
        "k": vector_size
      }
    }
  }
}

relevant_documents = client.search(
body = query_os,
index = index_name
)

In [52]:
relevant_documents

{'took': 86,
 'timed_out': False,
 '_shards': {'total': 0, 'successful': 0, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 603, 'relation': 'eq'},
  'max_score': 0.0019617553,
  'hits': [{'_index': 'bedrock-aos-demo-new1',
    '_id': '1%3A0%3Ar19Vs4kBYEx0HOrvINPF',
    '_score': 0.0019617553,
    'fields': {'title': ['There are civil penalties for failure to:\nFile a correct Form 8300 by the date it is \ndue, and\nProvide the required statement to those \nnamed in the Form 8300.\nIf you intentionally disregard the requirement \nto file a correct Form 8300 by the date it is due, \nthe penalty is the greater of:\n1.$25,000, or\n2.The amount of cash you received and \nwere required to report (up to $100,000).\nThere are criminal penalties for:\nWillful failure to file Form 8300,\nWillfully filing a false or fraudulent Form \n8300,\nStopping or trying to stop Form 8300 from \nbeing filed, and\nSetting up, helping to set up, or trying to \nset up a transaction in a way that would 

We can use this embedding of the query to then fetch relevant documents.
Now our query is represented as embeddings we can do a similarity search of our query against our data store providing us with the most relevant information.

In [53]:
print(len(relevant_documents["hits"]["hits"]))
print("--------------------")
context = " "
for i, rel_doc in enumerate(relevant_documents["hits"]["hits"]):
    print_ww(f'## Document {i+1}: {relevant_documents["hits"]["hits"][i]["fields"]["title"][0]}.......')
    print('---')
    context += relevant_documents["hits"]["hits"][i]["fields"]["title"][0]

3
--------------------
## Document 1: There are civil penalties for failure to:
File a correct Form 8300 by the date it is
due, and
Provide the required statement to those
named in the Form 8300.
If you intentionally disregard the requirement
to file a correct Form 8300 by the date it is due,
the penalty is the greater of:
1.$25,000, or
2.The amount of cash you received and
were required to report (up to $100,000).
There are criminal penalties for:
Willful failure to file Form 8300,
Willfully filing a false or fraudulent Form
8300,
Stopping or trying to stop Form 8300 from
being filed, and
Setting up, helping to set up, or trying to
set up a transaction in a way that would
make it seem unnecessary to file Form
8300.
If you willfully fail to file Form 8300, you can
be fined up to $250,000 for individuals
RECORDS($500,000 for corporations) or sentenced to up
to 5 years in prison, or both. These dollar
amounts are based on Section 3571 of Title 18
of the U.S. Code........
---
## Document 

In [54]:
parameters = {
    "maxTokenCount":512,
    "stopSequences":[],
    "temperature":0,
    "topP":0.9
    }

In [55]:
query

'Is it possible that I get sentenced to jail due to failure in filings?'

In [62]:
prompt_data_claude = f"""Human: Answer the question based only on the information provided. If the answer is not in the context, say "I don't know, answer not found in the documents. Provide quote from the document.
<context>
{context}
</context>
<question>
{query}
</question>
Assistant:"""



In [63]:
output_text_claude = claude_llm(prompt_data_claude)

print ("########## Ouput from Claude Model #################\n")
print(output_text_claude)



########## Ouput from Claude Model #################

 Yes, it is possible to be sentenced to jail for failure to file Form 8300. The document states:

"There are criminal penalties for:
Willful failure to file Form 8300, 
Willfully filing a false or fraudulent Form 
8300,
Stopping or trying to stop Form 8300 from 
being filed, and
Setting up, helping to set up, or trying to 
set up a transaction in a way that would 
make it seem unnecessary to file Form 
8300.
If you willfully fail to file Form 8300, you can 
be fined up to $250,000 for individuals 
RECORDS($500,000 for corporations) or sentenced to up 
to 5 years in prison, or both."


In [64]:
prompt_data_titan = f"""Answer the below question based on the context provided. If the answer is not in the context, say "I don't know, answer not found in the documents".
{context}
{query}
"""

In [65]:
output_text_titan = titan_llm(prompt_data_titan)
print ("########## Ouput from Titan Model ################\n")
print(output_text_titan)

########## Ouput from Titan Model ################

Sorry, this model is designed to avoid giving legal advice. Please see our content limitations page for more information. 

Based on the provided context:
The penalty for willful failure to file Form 8300 is a fine of up to $250,000 for individuals or $500,000 for corporations, or up to 5 years in prison, or both.


## Conclusion
Congratulations on completing this moduel on retrieval augmented generation! This is an important technique that combines the power of large language models with the precision of retrieval methods. By augmenting generation with relevant retrieved examples, the responses we recieved become more coherent, consistent and grounded. You should feel proud of learning this innovative approach. I'm sure the knowledge you've gained will be very useful for building creative and engaging language generation systems. Well done!

In the above implementation of RAG based Question Answering we have explored the following concepts and how to implement them using Amazon Bedrock and it's LangChain integration.

- Loading documents and generating embeddings to create a vector store
- Retrieving documents to the question
- Preparing a prompt which goes as input to the LLM
- Present an answer in a human friendly manner

### Take-aways
- Experiment with different Vector Stores
- Leverage various models available under Amazon Bedrock to see alternate outputs
- Explore options such as persistent storage of embeddings and document chunks
- Integration with enterprise data stores

# Thank You