# AIM329 - Build a chat assistant with Amazon Bedrock

Welcome to session AIM329 of AWS re:Invent 2023 - Build a chat assistant with Amazon Bedrock.

This notebook walks you through building a chat assistant using a Large Language Model (LLM) hosted on [Amazon Bedrock](https://aws.amazon.com/bedrock/). We will use the [Retrieval Augmented Generation (RAG)](https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-foundation-models-customize-rag.html) architecture, with the [Amazon Titan Embeddings](https://aws.amazon.com/about-aws/whats-new/2023/09/amazon-titan-embeddings-generally-available/) model to convert raw text to vectors and [Amazon OpenSearch Serverless](https://aws.amazon.com/opensearch-service/features/serverless/) to store the vectors.

<div class="alert alert-block alert-info">
<b>Note:</b>
    <ul>
        <li>This notebook should only be run from within an <a href="https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html">Amazon SageMaker Notebook instance</a> or within an <a href="https://docs.aws.amazon.com/sagemaker/latest/dg/notebooks.html">Amazon SageMaker Studio Notebook</a>.</li>
        <li>At the time of this writing, the most suitable kernel for this notebook was <i>conda_python3</i>.</li>
    </ul>
</div>

**Table of Contents:**

1. [Complete prerequisites](#Complete%20prerequisites)

    1. [Check and configure access to the Internet](#Check%20and%20configure%20access%20to%20the%20Internet)

    2. [Check and upgrade required software versions](#Check%20and%20upgrade%20required%20software%20versions)
    
    3. [Create an Amazon OpenSearch Serverless collection](#Create%20an%20Amazon%20OpenSearch%20Serverless%20collection)
    
    4. [Enable model access in Amazon Bedrock](#Enable%20model%20access%20in%20Amazon%20Bedrock)
    
    5. [Check and configure security permissions](#Check%20and%20configure%20security%20permissions)

    6. [Organize imports](#Organize%20imports)
    
    7. [Create common objects](#Create%20common%20objects)
    
    8. [Create an index in the Amazon OpenSearch Serverless collection](#Create%20index%20in%20collection)
    
 2. [Build the chat assistant](#Build%20the%20chat%20assistant)

    1. [Architecture](#Architecture)
    
    2. [Step 0a: Prepare to load data into the vector database](#Step0a)
    
    3. [Step 0b and 0c: Create the embeddings](#Step0band0c)
    
    4. [Step 0d: Store the embeddings in the vector database](#Step0d)
    
    5. [Step 1 to 6: Build the chat steps](#Step1to6)
    
 3. [Chat with the assistant](#Chat%20with%20the%20assistant)
 
 4. [Cleanup](#Cleanup)
 
 5. [Conclusion](#Conclusion)

##  1. Complete prerequisites <a id ='Complete%20prerequisites'> </a>

Check and complete the prerequisites.

###  A. Check and configure access to the Internet <a id ='Check%20and%20configure%20access%20to%20the%20Internet'> </a>
This notebook requires outbound access to the Internet to download the required software updates and the dataset. You can either provide direct Internet access (default) or provide Internet access through an [Amazon VPC](https://aws.amazon.com/vpc/). For more information, refer to [this page](https://docs.aws.amazon.com/sagemaker/latest/dg/appendix-notebook-and-internet-access.html).

<div class="alert alert-block alert-info">
<b>Note:</b> During the AIM329 session, by default, outbound Internet access will be enabled for this notebook.
</div>

### B. Check and upgrade required software versions <a id ='Check%20and%20upgrade%20required%20software%20versions'> </a>
This notebook requires the following software:
* [SageMaker Python SDK version 2.x](https://sagemaker.readthedocs.io/en/stable/v2.html)
* [Python 3.10.x](https://www.python.org/downloads/release/python-3100/)
* [Boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html)
* [LangChain](https://www.langchain.com/)
* [Unstructured](https://pypi.org/project/unstructured/)
* [OpenSearch Python Client](https://pypi.org/project/opensearch-py/)
* [AWS v4 authentication for the Python Requests library](https://pypi.org/project/requests-aws4auth/)

Run the following cell to install the required libraries.

<div class="alert alert-block alert-warning">  
<b>Note:</b> At the end of the installation, the kernel will be restarted automatically.
</div>

In [None]:
!pip install boto3==1.28.72
!pip install langchain==0.0.324
!pip install unstructured==0.10.27
!pip install opensearch-py==2.3.2
!pip install requests-aws4auth==1.2.3

import IPython

IPython.Application.instance().kernel.do_shutdown(True)

Print the versions of the installed libraries.

In [None]:
import boto3
import langchain
import opensearchpy
import sagemaker
import sys

print("Python version : {}".format(sys.version))
print("Boto3 version : {}".format(boto3.__version__))
print("SageMaker Python SDK version : {}".format(sagemaker.__version__))
print("LangChain version : {}".format(langchain.__version__))
print("OpenSearch Python Client version : {}".format(opensearchpy.__version__))

###  C. Create an Amazon OpenSearch Serverless collection <a id ='Create%20an%20Amazon%20OpenSearch%20Serverless%20collection'> </a>

This notebook uses an [Amazon OpenSearch Serverless collection](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-collections.html) as the vector database that will be used by the chat assistant.

<div class="alert alert-block alert-info">
<b>Note:</b> During the AIM329 session, by default, a collection will be pre-created and ready to use.
</div>

###  D. Enable model access in Amazon Bedrock <a id ='Enable%20model%20access%20in%20Amazon%20Bedrock'> </a>

Before invoking any model in Amazon Bedrock, enable access to that model by following the instructions [here](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html). Otherwise, you will get an authorization error.

<div class="alert alert-block alert-warning">  
<b>Note:</b> You will have to do this manually after reading the End User License Agreement (EULA) for each of the models that you want to enable. Unless you disable it, this is a one-time setup for each model in an AWS account.
</div>

###  E. Check and configure security permissions <a id ='Check%20and%20configure%20security%20permissions'> </a>
This notebook uses the IAM role attached to the underlying notebook instance.  To view the name of this role, run the following cell.

This IAM role should have the following permissions:

1. Full access to invoke Large Language Models (LLMs) on Amazon Bedrock.
2. Full access to read and write to the Amazon OpenSearch Serverless collection created in the previous step.
3. Access to write to Amazon CloudWatch Logs.

In addition, [data access control](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-data-access.html) should be set up on the Amazon OpenSearch Serverless collection to provide create, read and write access to the IAM role associated with this notebook instance.

<div class="alert alert-block alert-info">
<b>Note:</b> During the AIM329 session, by default, all these permissions will be set up.
</div>
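
If you need to set up data access control yourself, the following is a minimal sketch of what a data access policy document could look like. It only builds the JSON and does not call AWS. The collection name, account ID and role name are placeholders (assumptions, not values from this notebook), and the permission list is one reasonable selection for create, read and write access rather than a required set.

```python
import json


def build_data_access_policy(collection_name, principal_arn):
    """Return an AOSS data access policy document as a JSON string."""
    policy = [
        {
            "Description": "Notebook role access (illustrative)",
            "Rules": [
                {
                    # Permissions on the collection itself
                    "ResourceType": "collection",
                    "Resource": ["collection/{}".format(collection_name)],
                    "Permission": [
                        "aoss:CreateCollectionItems",
                        "aoss:DescribeCollectionItems",
                        "aoss:UpdateCollectionItems",
                    ],
                },
                {
                    # Permissions on every index inside the collection
                    "ResourceType": "index",
                    "Resource": ["index/{}/*".format(collection_name)],
                    "Permission": [
                        "aoss:CreateIndex",
                        "aoss:DescribeIndex",
                        "aoss:ReadDocument",
                        "aoss:WriteDocument",
                    ],
                },
            ],
            "Principal": [principal_arn],
        }
    ]
    return json.dumps(policy)


# Placeholder values; replace with your collection name and notebook role ARN
policy_json = build_data_access_policy(
    "my-collection", "arn:aws:iam::111122223333:role/MyNotebookRole"
)
print(policy_json)
```

A document like this could then be supplied as the `policy` argument when creating a data access policy for the collection, for example through the console or the `create_access_policy` API of the `opensearchserverless` boto3 client with `type = "data"`.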

In [None]:
print(sagemaker.get_execution_role())

###  F. Organize imports <a id ='Organize%20imports'> </a>

Organize all the library and module imports for later use.

In [None]:
import os
import sys
import json
import boto3
import time
import datetime
import logging
import pandas as pd
import pprint
import requests
import numpy as np
from glob import glob
from tqdm import tqdm
from typing import Dict, List
from tqdm.contrib.concurrent import process_map
from langchain.chains import RetrievalQA, ConversationalRetrievalChain
from langchain.embeddings import BedrockEmbeddings
from langchain.prompts.prompt import PromptTemplate
from langchain.llms import Bedrock
from langchain.memory import ChatMessageHistory, ConversationBufferMemory
from langchain.chains import ConversationChain, LLMChain
from langchain.agents import load_tools, initialize_agent, AgentType
from langchain.chains.router import MultiPromptChain
from langchain.chains.router.llm_router import LLMRouterChain, RouterOutputParser
from langchain.chains.router.multi_prompt_prompt import MULTI_PROMPT_ROUTER_TEMPLATE
from langchain.retrievers import AmazonKendraRetriever
from langchain.vectorstores import OpenSearchVectorSearch
from IPython.core.display import HTML
from opensearchpy import OpenSearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth

# Import the helper functions from the 'scripts' folder
sys.path.append(os.path.join(os.getcwd(), "scripts"))
#print("Updated sys.path: {}".format(sys.path))
from helper_functions import *

###  G. Create common objects <a id='Create%20common%20objects'></a>

To begin with, create a boto3 session and use it to get the current AWS Region (where this notebook is running). The Region will be used to create some of the clients to AWS services using the boto3 APIs.

In [None]:
# Get the boto3 session and the AWS Region references
my_session = boto3.session.Session()
print("Boto3 Session: {}".format(my_session))
my_region = my_session.region_name
print("AWS Region: {}".format(my_region))

Pick an LLM and an Embeddings model within Amazon Bedrock that you will use in this notebook, and then get their model-ids. To do this, list all the available models in Amazon Bedrock by running the following cell.

In [None]:
# List all the available models in Amazon Bedrock
bedrock_client = boto3.client("bedrock", region_name = my_region)
response = bedrock_client.list_foundation_models()
model_summaries = response["modelSummaries"]
for model_summary in model_summaries:
    print("Model provider: {}; Model name: {}; Model-id: {}".format(model_summary["providerName"],
                                                                    model_summary["modelName"], model_summary["modelId"]))

Now, create common objects to be used in future steps in this notebook.

In [None]:
##### Specify the model-ids
# Model-id of the LLM to be used in the chat assistant
llm_model_id = "anthropic.claude-instant-v1"
# Model-id of the Embeddings model to be used in the chat assistant
embeddings_model_id = "amazon.titan-embed-text-v1"

##### LLM related objects
# Create the Amazon Bedrock runtime client
bedrock_rt_client = boto3.client("bedrock-runtime", region_name = my_region)
# Create the LangChain client for the LLM using the Bedrock client created above.
llm = Bedrock(
    model_id = llm_model_id,
    client = bedrock_rt_client
)

##### Embeddings related objects
# Use the LangChain BedrockEmbeddings class to create the Embeddings client.
br_embeddings = BedrockEmbeddings(client = bedrock_rt_client, model_id = embeddings_model_id, region_name = my_region)

##### Amazon OpenSearch Serverless (AOSS) related objects
# Create the AOSS client
aoss_client = boto3.client("opensearchserverless", region_name = my_region)
# Get the reference to the first Amazon OpenSearch Serverless (AOSS) collection within the same AWS Region
response = aoss_client.list_collections()['collectionSummaries'][0]
collection_id = response['id']
collection_name = response['name']
print("The following Amazon OpenSearch Serverless collection will be used:\nCollection id: {}; Collection name: {}"
      .format(collection_id, collection_name))
# Print the AWS console URL to the AOSS collection
collection_aws_console_url = "https://{}.console.aws.amazon.com/aos/home?region={}#opensearch/collections/{}".format(my_region,
                                                                                                                     my_region,
                                                                                                                     collection_name)
print("If you would like to take a look at this collection, visit {}".format(collection_aws_console_url))
# Create the AOSS Python client using the 'auth_opensearch' helper function
aoss_py_client = auth_opensearch(host = "{}.{}.aoss.amazonaws.com".format(collection_id, my_region),
                            service = 'aoss', region = my_region)
# Specify the name of the index in the AOSS collection; this will be created later in the notebook
index_name = "aim-329-docs-index"
# Specify the max workers for loading data in parallel into the index
max_workers = 8
# To access an Opensearch Collection using LangChain, we can use the OpenSearchVectorSearch class.
doc_search = OpenSearchVectorSearch(
    opensearch_url = "{}.{}.aoss.amazonaws.com".format(collection_id, my_region),
    index_name = index_name,
    embedding_function = br_embeddings)
# Set the doc search client to the AOSS Python client
doc_search.client = aoss_py_client

##### File related objects
# Specify the path to the directory that will contain the RAG data
rag_dir = os.path.join(os.getcwd(), "data/rag")
# Create the directory if it doesn't exist
os.makedirs(rag_dir, exist_ok = True)

###  H. Create an index in the Amazon OpenSearch Serverless collection <a id='Create%20index%20in%20collection'></a>

To create an index in the Amazon OpenSearch Serverless (AOSS) collection, we first need to define a schema for our index. AOSS allows users to specify a simple search index, which utilizes keyword matching, or the vector search feature, which utilizes [k-Nearest Neighbor (k-NN) search](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/knn.html). Vector search differs from standard search in that instead of using a typical keyword matching or fuzzy matching algorithm, vector search compares [embeddings](https://en.wikipedia.org/wiki/Word_embedding) of two pieces of text. An embedding is a numerical representation of a piece of information, like text, that we can compare against other embeddings. To learn more about embeddings, take a look at [this blog](https://huggingface.co/blog/getting-started-with-embeddings). The vector search feature allows us to search for documents that are semantically similar to the questions that our end users send to our chat assistant. This can improve the context that we then give to our LLM to answer the user's questions.
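
To make the idea of comparing embeddings concrete, here is a minimal sketch of an exact nearest-neighbor search over toy three-dimensional vectors. The vectors and document names are made up for illustration, and cosine similarity is used as one common similarity measure. This is not how AOSS works internally: the vector engine uses approximate k-NN over much larger vectors, and the distance metric depends on how the index is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k):
    """Return the k (name, score) pairs most similar to the query."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in doc_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy "embeddings"; real Titan embeddings have 1536 dimensions
toy_vectors = {
    "agents": [0.9, 0.1, 0.0],
    "pricing": [0.1, 0.9, 0.1],
    "embeddings": [0.2, 0.1, 0.9],
}
toy_query = [0.8, 0.3, 0.1]

nearest = top_k(toy_query, toy_vectors, k = 2)
for name, score in nearest:
    print("{}: {:.3f}".format(name, score))
```

The query vector points mostly in the same direction as the "agents" vector, so that document ranks first even though no keywords were compared.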

In [None]:
# Define the schema for the index with a k-NN type vector as the embedding
knn_index = {
    "settings": {
        "index.knn": True,
    },
    "mappings": {
        "properties": {
            "content-embedding": {
                "type": "knn_vector",
                "dimension": 1536 # must match the output size of the Embeddings model (1536 for Titan Embeddings G1 - Text)
            },
            "content": {
                "type": "text"
            },
            "page-name": {
                "type": "text"
            },
            "page-link": {
                "type": "text"
            }
        }
    }
}

# Delete the index
#aoss_py_client.indices.delete(index = index_name)

# Create the index if it does not exist
if aoss_py_client.indices.exists(index = index_name):
    print("AOSS index '{}' already exists.".format(index_name))
else:
    print("Creating AOSS index '{}'...".format(index_name))
    pprint.pprint(aoss_py_client.indices.create(index = index_name, body = knn_index, ignore = 400))

In [None]:
# Print the AWS console URL to the AOSS index
#index_aws_console_url = collection_aws_console_url + "?tabId=collectionIndices"
index_aws_console_url = collection_aws_console_url + "/" + index_name
print("If you would like to take a look at this index, visit {}".format(index_aws_console_url))

## 2. Build the chat assistant <a id ='Build%20the%20chat%20assistant'> </a>

Large language models (LLMs) have a tendency to [hallucinate](https://en.wikipedia.org/wiki/Hallucination_(artificial_intelligence)). In the context of LLMs, a hallucination is a confident but factually incorrect response, often one that tells us what the model thinks we want to hear regardless of whether it is actually correct. One way to reduce the chance of an LLM giving us incorrect information is to use a [Retrieval Augmented Generation (RAG)](https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-foundation-models-customize-rag.html) mechanism.

RAG allows us to provide our model with relevant context that it can use to ground its response in facts, instead of relying on what it remembers from its training data. To set up RAG, we need a document database from which we can retrieve related source documents for the model. There are many ways to set up a document database. In this lab, we will use [Amazon OpenSearch Serverless](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-overview.html).

###  A. Architecture <a id='Architecture'></a>

![Architecture](./images/architecture.png)





###  B. Step 0a: Prepare to load data into the vector database <a id='Step0a'></a>

An Amazon OpenSearch Serverless (AOSS) Collection is a logical grouping of one or more indexes. Run the following cell to populate some documentation on Amazon Bedrock so our chat assistant can answer questions about Amazon Bedrock with factually correct information.

The following cell will download the data from the provided links to the AWS documentation and store it in a local folder named `data/rag/`. This is the directory that was created earlier in the notebook.

In [None]:
# A simple list of AWS documentation pages (Amazon Bedrock and a few other services) to index
link_list = [
    "https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-service.html",
    "https://docs.aws.amazon.com/bedrock/latest/userguide/setting-up.html",
    "https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html",
    "https://docs.aws.amazon.com/bedrock/latest/userguide/text-playground.html",
    "https://docs.aws.amazon.com/bedrock/latest/userguide/image-playground.html",
    "https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters.html",
    "https://docs.aws.amazon.com/bedrock/latest/userguide/embeddings.html",
    "https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-prepare.html",
    "https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-guidelines.html",
    "https://docs.aws.amazon.com/bedrock/latest/userguide/prov-throughput.html",
    "https://docs.aws.amazon.com/bedrock/latest/userguide/agents.html",
    "https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base.html",
    "https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html",
    "https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html",
    "https://docs.aws.amazon.com/lambda/latest/dg/welcome.html",
    "https://docs.aws.amazon.com/lambda/latest/dg/getting-started.html",
    "https://docs.aws.amazon.com/lambda/latest/dg/lambda-python.html",
    "https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html",
    "https://docs.aws.amazon.com/vpc/latest/userguide/vpc-getting-started.html",
    "https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.html",
    "https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SQLtoNoSQL.CreateTable.html",
]

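# Download and store each HTML file; overwrite existing file
for link in link_list:
    page_response = requests.get(link, timeout = 30)
    # Fail fast on HTTP errors instead of saving an error page
    page_response.raise_for_status()
    link_html = page_response.content.decode('utf-8')
    title = link.split("/")[-1]
    with open(os.path.join(rag_dir, title), "w", encoding = "utf-8") as f:
        f.write(link_html)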

print("Downloaded {} HTML files to '{}'.".format(len(link_list), rag_dir))
        
# Display the last downloaded HTML file
#HTML(link_html)

When we index documents for information retrieval, providing an entire document to an LLM as context can overwhelm the LLM, especially for very long documents. A best practice is to divide the document into smaller, partially overlapping chunks. Dividing the document in this way also tends to improve search result relevance, because the answer we are looking for is often contained within a specific passage of a document and providing the entire document is unnecessary.
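
To illustrate what "partially overlapping chunks" means, here is a minimal sketch of a fixed-size sliding-window splitter. It is a simplification written for this explanation and is not the algorithm that LangChain's splitter uses, which also tries to break on separators such as paragraphs and sentences. The function name `chunk_with_overlap` is hypothetical.

```python
def chunk_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into chunks of up to chunk_size characters, where each
    chunk repeats the last chunk_overlap characters of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


sample_chunks = chunk_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size = 10, chunk_overlap = 3)
print(sample_chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk starts with the last three characters of the previous one, so a sentence that straddles a chunk boundary is less likely to be cut off from its context.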

Let's use LangChain's [RecursiveCharacterTextSplitter](https://python.langchain.com/docs/modules/data_connection/document_transformers/text_splitters/recursive_text_splitter) to create a text splitting object that we will use in our loading pipeline.

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 2000,
    chunk_overlap  = 100,
    length_function = len,
    add_start_index = True,
)

Now that we have downloaded all of the documents, let's go ahead and load them using an HTML loader. We are going to use LangChain's [Unstructured HTML Loader](https://python.langchain.com/docs/modules/data_connection/document_loaders/html). This will parse the raw HTML documents that we downloaded and put them in a format that we can then give to our LLM. Finally, we will split each document as per the splitter configuration defined above.

In [None]:
from langchain.document_loaders import UnstructuredHTMLLoader

# Load HTML using LangChain's UnstructuredHTMLLoader
doc_data_list = []
doc_link_list = []
# Process every HTML file
for link in link_list:
    title = link.split("/")[-1]
    new_link = '{}/{}'.format(rag_dir, title)
    
    # Load the file content
    loader = UnstructuredHTMLLoader(new_link)
    data = loader.load()
    
    # Remove irrelevant text
    html_doc = data[0].page_content.replace("""Did this page help you? - Yes

Thanks for letting us know we're doing a good job!

If you've got a moment, please tell us what we did right so we can do more of it.

Did this page help you? - No

Thanks for letting us know this page needs work. We're sorry we let you down.

If you've got a moment, please tell us how we can make the documentation better.""", "").replace("""Javascript is disabled or is unavailable in your browser.

To use the Amazon Web Services Documentation, Javascript must be enabled. Please refer to your browser's Help pages for instructions.""", "")
    
    # Split our document into chunks
    texts = text_splitter.create_documents([html_doc])
    
    # Create a list of document chunks as well as a list of links
    for text in texts:
        doc_data_list.append(text.page_content)
        doc_link_list.append(link)

print("Created {} chunks from {} HTML files.".format(len(doc_data_list), len(link_list)))        

# Print the first chunk
#print(doc_data_list[0])

###  C. Step 0b and 0c: Create the embeddings <a id='Step0band0c'></a>

Now that we have the document data ready, let us vectorize it to create the embeddings by running the cell below. For each document chunk created in the previous step, the cell does the following:

1. Creates a dictionary object for the content of the chunk along with the name and link of its source page.
2. Generates an embedding for the chunk and stores it in the same dictionary object.

<div class="alert alert-block alert-warning">  
<b>Note:</b> The below cell can take up to 3 minutes to complete.
</div>

In [None]:
# Loop through the chunked documents and create embeddings
doc_list = []
for i, doc_data in enumerate(tqdm(doc_data_list)):
    # Get the data formatted for indexing; time.sleep in place to avoid throttling
    doc_dict = {}
    doc_dict['content'] = doc_data
    doc_dict['page-name'] = doc_link_list[i].split('/')[-1].replace('.html','')
    doc_dict['page-link'] = doc_link_list[i] 
    # Create the embedding for the document chunk
    embedding = br_embeddings.embed_query(text = doc_data)
    doc_dict['content-embedding'] = embedding
    # Store all the data in a list
    doc_list.append(doc_dict)
    # Sleep for 2 seconds
    time.sleep(2)

###  D. Step 0d: Store the embeddings in the vector database <a id='Step0d'></a>

We have now created embeddings for all of our document chunks, so the next step is to upload them to the AOSS index that we created earlier. The cell below defines a function that indexes a single chunk and then uses `process_map` to call it in parallel. The number of parallel worker processes is controlled by the `max_workers` variable.

<div class="alert alert-block alert-info">
<b>Note:</b> After executing the below cell, it may take up to 30 seconds for the data to be available for reading.
</div>
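
If you have not used a parallel map before, the following minimal sketch shows the pattern with Python's standard library. It uses a thread pool rather than the process pool behind `process_map`, only to keep the example short, and the `fake_index` function is a hypothetical stand-in for the real call to the AOSS index, so nothing is uploaded.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_index(doc):
    """Stand-in for indexing one document; returns the page name it 'indexed'."""
    return doc["page-name"]


# Toy documents with only the field that this sketch needs
toy_docs_to_index = [{"page-name": "page-{}".format(i)} for i in range(5)]

# pool.map applies the function to every item and returns results in input order
with ThreadPoolExecutor(max_workers = 2) as pool:
    indexed = list(pool.map(fake_index, toy_docs_to_index))

print(indexed)
```

Because the work is dominated by waiting on network calls, running several uploads at once can shorten the total load time compared with indexing one document at a time.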

In [None]:
# Populate the AOSS index. OpenSearch has a flexible schema, so you can add fields you have not previously defined,
# with the exception of vectors and fields with specific data types like dates.
def os_import(article):
    """
    This function imports the documents and their metadata into the AOSS index.
    """
    aoss_py_client.index(index = index_name,
                         body={"content-embedding": article['content-embedding'],
                               "content": article['content'],
                               "page-name": article['page-name'],
                               "page-link": article['page-link'],
                              }
                        )
    
# Parallelize and populate the AOSS Collection's index
process_map(os_import, doc_list, max_workers = max_workers) 

###  E. Step 1 to 6: Build the chat steps <a id='Step1to6'></a>

To demonstrate the usefulness of the [Retrieval Augmented Generation (RAG)](https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-foundation-models-customize-rag.html) architecture, let us first call the LLM directly without any searches on the vector database. Here, we will ask a question about Amazon Bedrock Agents, which we know the LLM will not be aware of because Bedrock Agents was not available when the LLM was trained.

In [None]:
output = llm.generate(["How do I setup Agents in Amazon Bedrock?"])
print(output.generations[0][0].text)

While this response may seem plausible, it is actually incorrect. LLMs generally try to answer any question they are asked, and in this case the model has hallucinated an answer because it has no knowledge of Amazon Bedrock Agents. This is where a RAG architecture provides the remedy.

Let us see if we can find a document that contains information about Amazon Bedrock Agents in our document index. We can do this in two ways:

Method 1: Using the AOSS Python client's `search` directly. 

In [None]:
query = "How do I setup Agents in Amazon Bedrock?"
temp_embedding = br_embeddings.embed_query(text = query)
search_query = {"query": {"knn": {"content-embedding": {"vector": temp_embedding, "k": 4}}}}
results = aoss_py_client.search(index = index_name, body = search_query)
hits = results["hits"]["hits"]
print("Found {} hit(s).".format(len(hits)))
for hit in hits:
    print(hit["_source"]["page-link"])

Method 2: Using LangChain's `similarity_search` which will use the AOSS Python client under the covers. 

In [None]:
max_results = 5
query = "How do I setup Agents in Amazon Bedrock?"
docs = doc_search.similarity_search(
    # Our text query
    query = query,
    # The name of the field that contains our vector
    vector_field = "content-embedding",
    # The actual text field we are looking for
    text_field = "content",
    # The number of results we want to return
    k = max_results
)
print("Specified {} max results. Found {} hit(s).".format(max_results, len(docs)))
for doc in docs:
    # Print the source link of each returned document, not just the first one
    print(doc.metadata['page-link'])

It looks like we do have some information on Amazon Bedrock Agents in the Bedrock user guide that we vectorized and stored in the AOSS index. Now that we know we have the right information in our document index, let us set up a [RetrievalQA chain](https://python.langchain.com/docs/use_cases/question_answering/vector_db_qa). This chain allows us to supply a [prompt template](https://python.langchain.com/docs/modules/model_io/prompts/prompt_templates/), our LLM, and our document index to form a question answering chain that will answer questions based on the returned context documents.

Prompt templates are pre-defined recipes for generating prompts for language models. In the one we create below, we specify context and question input variables, which our RetrievalQA chain will fill in with the query and source documents.
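
To show what filling in the input variables amounts to, here is a minimal sketch that uses plain Python string formatting instead of LangChain. The template text, the passages and the `stuff_prompt` helper are made up for illustration. The sketch assumes that the retrieved passages are simply joined together and placed into the `context` variable, which is the idea behind the "stuff" chain type.

```python
toy_template = """Use the following context to answer the question.

<context>
{context}
</context>

Question: {question}
Answer:"""


def stuff_prompt(template, passages, question):
    """Join the retrieved passages and substitute both template variables."""
    context = "\n\n".join(passages)
    return template.format(context = context, question = question)


toy_prompt = stuff_prompt(
    toy_template,
    ["Passage one.", "Passage two."],
    "What is covered?",
)
print(toy_prompt)
```

The resulting string is what is ultimately sent to the LLM: the instructions, the retrieved passages inside the context tags, and the user's question.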

In [None]:
# Create the prompt template
prompt_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Don't include harmful content.

<context>
{context}
</context>

Question: {question}
Answer:"""

PROMPT = PromptTemplate(
    template = prompt_template, input_variables = ["context", "question"]
)

# Create the Retrieval QA chain
qa = RetrievalQA.from_chain_type(llm = llm, 
                                 chain_type = "stuff", 
                                 retriever = doc_search.as_retriever(search_kwargs = {
                                     "vector_field": "content-embedding",
                                     "text_field": "content",
                                     "k": 5}),
                                 return_source_documents = True,
                                 chain_type_kwargs = {"prompt": PROMPT, "verbose": True},
                                 verbose = True)

# Ask the question to the LLM and print the response along with the references from the source
question = "How do I setup Agents in Amazon Bedrock?"
response = qa(question, return_only_outputs = True)
print("Answer:\n{}".format(response["result"]))
source_metadata = response["source_documents"][0].metadata
print("\n\nSource page name: {}".format(source_metadata["page-name"]))
print("Source page link: {}".format(source_metadata["page-link"]))

Let's now take it one step further with a [ConversationalRetrievalChain](https://python.langchain.com/docs/expression_language/cookbook/retrieval#conversational-retrieval-chain). 

LLMs on their own will not remember the last input you provided them. So we need a mechanism to remember and supply our previous conversation information back to our LLM. We can do this using a LangChain ConversationChain, paired with a [ConversationBufferMemory](https://python.langchain.com/docs/modules/memory/adding_memory) class. This will allow us to hold a conversation with our LLM and retain the previous conversation in memory.

In [None]:
conversation = ConversationChain(
    llm = llm, verbose = True, memory = ConversationBufferMemory()
)

The following cell adds a conversational element to the retrieval chain and allows us to add chat memory to the retrieval. This chain uses an LLM call prior to the document retrieval that condenses the conversation history and the current question into a single new question to improve document retrieval.
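
As a rough illustration of the condensing step, the sketch below builds the kind of prompt that could be sent to the LLM before retrieval. The wording of the prompt, the toy conversation and the helper names are assumptions made for this example and are not LangChain's actual template.

```python
def format_history(turns):
    """Render (human, assistant) pairs as alternating labeled lines."""
    lines = []
    for human, assistant in turns:
        lines.append("Human: {}".format(human))
        lines.append("Assistant: {}".format(assistant))
    return "\n".join(lines)


def build_condense_prompt(turns, follow_up):
    """Combine the chat history and a follow-up into one rephrasing prompt."""
    return (
        "Given the following conversation and a follow-up question, "
        "rephrase the follow-up question to be a standalone question.\n\n"
        "Chat history:\n{}\n\nFollow-up question: {}\nStandalone question:"
    ).format(format_history(turns), follow_up)


# Toy conversation for illustration only
toy_turns = [("What is Amazon Bedrock?", "A managed service for foundation models.")]
condense_prompt = build_condense_prompt(toy_turns, "How do I set up Agents in it?")
print(condense_prompt)
```

The LLM's answer to a prompt like this is a self-contained question, for example one that replaces "it" with "Amazon Bedrock", which is then embedded and used for the vector search.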

In [None]:
# Create the Conversational Retrieval chain
cqa = ConversationalRetrievalChain.from_llm(llm = llm, 
                                            chain_type = "stuff", 
                                            condense_question_llm = llm,
                                            retriever = doc_search.as_retriever(search_kwargs = {
                                                "vector_field": "content-embedding",
                                                "text_field": "content",
                                                "k": 5}),
                                            return_source_documents = True,
                                            memory = ConversationBufferMemory(input_key = "question", output_key = "answer"),
                                            # chain_type_kwargs={"prompt": PROMPT, "verbose": True},
                                            verbose = False)

# Ask the question to the LLM and print the response along with the references from the source
question = "How do I setup Agents in Amazon Bedrock?"
response = cqa.invoke({"question": question, "chat_history": ""})
print("Answer:\n{}".format(response["answer"]))
source_metadata = response["source_documents"][0].metadata
print("\n\nSource page name: {}".format(source_metadata["page-name"]))
print("Source page link: {}".format(source_metadata["page-link"]))

## 3. Chat with the assistant <a id='Chat%20with%20the%20assistant'></a>

Now we are going to put it all together in a single browser interface. The below call will initiate a simple conversational UI inside of our Jupyter Notebook. Run it and start asking questions!

In [None]:
chatux = ChatUX(cqa)
chatux.start_chat()

## 4. Cleanup <a id='Cleanup'></a>

As a best practice, you should delete AWS resources that are no longer required. This will help you avoid incurring unnecessary costs.

**Note:** During the AIM329 session, by default, all resources will be cleaned up at the end of the session.
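
If you run this notebook outside of the session, the following is a minimal sketch of the local part of the cleanup. The `remove_local_rag_data` helper is hypothetical, and it is demonstrated on a throwaway temporary directory so that running the sketch does not delete anything you still need.

```python
import os
import shutil
import tempfile


def remove_local_rag_data(directory):
    """Delete a local directory of downloaded files; return True if it existed."""
    if os.path.isdir(directory):
        shutil.rmtree(directory)
        return True
    return False


# Demonstrate on a temporary directory instead of the real data folder
demo_dir = tempfile.mkdtemp()
removed = remove_local_rag_data(demo_dir)
print("Removed: {}".format(removed))

# To delete the index created earlier, the notebook's own commented-out call is:
# aoss_py_client.indices.delete(index = index_name)
```

To clean up the files downloaded by this notebook, you would pass the `rag_dir` path instead of the temporary directory. The Amazon OpenSearch Serverless collection itself is a separate resource and has to be deleted separately.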


## 5. Conclusion <a id='Conclusion'></a>

We have now seen how to build a chat assistant using a Large Language Model (LLM) hosted on Amazon Bedrock. In the process, we also demonstrated how a Retrieval Augmented Generation (RAG) mechanism can help reduce hallucination. While using RAG, we showed you how to use Amazon Titan Embeddings to convert raw text to vectors and how to store them in an Amazon OpenSearch Serverless collection.