# RAG with Amazon Bedrock Knowledge Base

In this notebook we use the information ingested in the Bedrock knowledge base to answer user queries.

## Import packages and utility functions
Import packages, set up utility functions, and interface with Amazon OpenSearch Service Serverless (AOSS).

In [89]:
import os
import sys
import json
import boto3
from typing import Dict
from urllib.request import urlretrieve
from langchain_aws import ChatBedrock  # chat model wrapper used for Claude below
from langchain_core.prompts import PromptTemplate
from langchain_core.messages import HumanMessage, SystemMessage
from IPython.display import Markdown, display
from langchain.embeddings import BedrockEmbeddings
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth


In [2]:
%pip install opensearch-py


Note: you may need to restart the kernel to use updated packages.


In [44]:
# global constants
SERVICE = 'aoss'

# Do not change the name of the CFN stack: we assume that the
# blog post creates a stack by this name, and we read output
# values from that stack.
CFN_STACK_NAME = "rag-w-bedrock-kb"


In [46]:
# utility functions

def get_cfn_outputs(stackname: str) -> Dict[str, str]:
    """Return the CloudFormation stack outputs as an OutputKey -> OutputValue mapping."""
    cfn = boto3.client('cloudformation')
    outputs = {}
    for output in cfn.describe_stacks(StackName=stackname)['Stacks'][0]['Outputs']:
        outputs[output['OutputKey']] = output['OutputValue']
    return outputs

def printmd(string: str):
    display(Markdown(string))
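
The flattening step inside `get_cfn_outputs` can be tried without AWS credentials. The sketch below is only an illustration: `flatten_stack_outputs` and `sample_response` are hypothetical names, and the sample dictionary is a hand-written stand-in trimmed to the fields the function reads, not a real `describe_stacks` response.

```python
from typing import Any, Dict


def flatten_stack_outputs(response: Dict[str, Any]) -> Dict[str, str]:
    """Map OutputKey -> OutputValue for the first stack in the response."""
    stacks = response.get("Stacks", [])
    if not stacks:
        raise ValueError("response contains no stacks")
    # Use .get() so a stack entry without an 'Outputs' key yields an empty dict
    # instead of raising KeyError.
    return {o["OutputKey"]: o["OutputValue"] for o in stacks[0].get("Outputs", [])}


# Hypothetical response, trimmed to the fields used above.
sample_response = {
    "Stacks": [
        {
            "StackName": "rag-w-bedrock-kb",
            "Outputs": [
                {"OutputKey": "Region", "OutputValue": "us-east-1"},
                {"OutputKey": "AOSSVectorIndexName", "OutputValue": "sagemaker-readthedocs-io"},
            ],
        }
    ]
}

sample_outputs = flatten_stack_outputs(sample_response)
print(sample_outputs["Region"])
```

Compared with the loop in `get_cfn_outputs`, the only behavioral difference is the tolerance for a missing `Outputs` key.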


In [47]:
# Functions to talk to OpenSearch

# Define queries for OpenSearch
def query_docs(query: str, embeddings: BedrockEmbeddings, aoss_client: OpenSearch, index: str, k: int = 3) -> Dict:
    """
    Convert the query into embedding and then find similar documents from AOSS
    """

    # embedding
    query_embedding = embeddings.embed_query(query)

    # query to lookup OpenSearch kNN vector. Can add any metadata fields based filtering
    # here as part of this query.
    query_qna = {
        "size": k,
        "query": {
            "knn": {
                "vector": {
                    "vector": query_embedding,
                    "k": k
                }
            }
        }
    }

    # OpenSearch API call
    relevant_documents = aoss_client.search(
        body = query_qna,
        index = index
    )
    return relevant_documents
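
The comment in `query_docs` mentions that metadata-based filtering can be added to the kNN query. One possible shape for that is sketched below, assuming the index supports a `filter` clause inside the `knn` query (this depends on the k-NN engine and OpenSearch version in use). `build_knn_query` is a hypothetical helper, and `doc_type` is a made-up metadata field used only for illustration; it is not a field this notebook's index is known to contain.

```python
from typing import Any, Dict, List, Optional


def build_knn_query(
    query_embedding: List[float],
    k: int = 3,
    vector_field: str = "vector",
    filter_clause: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a kNN search body, optionally restricted by a filter clause."""
    knn_params: Dict[str, Any] = {"vector": query_embedding, "k": k}
    if filter_clause is not None:
        # Assumption: the engine behind the index accepts a filter inside knn.
        knn_params["filter"] = filter_clause
    return {"size": k, "query": {"knn": {vector_field: knn_params}}}


# 'doc_type' is a hypothetical metadata field.
filtered_query = build_knn_query(
    [0.1, 0.2, 0.3],
    k=2,
    filter_clause={"term": {"doc_type": "api"}},
)
print(filtered_query)
```

Without `filter_clause`, the function returns the same body that `query_docs` builds by hand.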


In [48]:
def create_context_for_query(q: str, embeddings: BedrockEmbeddings, aoss_client: OpenSearch, vector_index: str) -> str:
    """
    Create a context out of the similar docs retrieved from the vector database
    by concatenating the text from the similar documents.
    """
    print(f"query -> {q}")
    aoss_response = query_docs(q, embeddings, aoss_client, vector_index)
    context = ""
    for r in aoss_response['hits']['hits']:
        s = r['_source']
        print(f"{s['metadata']}\n{s['text']}")
        context += f"{s['text']}\n"
        print("----------------")
    return context
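
The context-building step can also be written as a pure function that returns the sources alongside the text, which makes it easy to exercise without a live AOSS collection. This is a sketch under assumptions: `build_context` and `sample_hits` are hypothetical, the sample response is hand-written, and the `metadata` field is assumed to be either a dictionary or a JSON string with a `source` key (as the printed output later in this notebook suggests).

```python
import json
from typing import Any, Dict, List, Tuple


def build_context(aoss_response: Dict[str, Any]) -> Tuple[str, List[str]]:
    """Join the text of every hit and collect the distinct source URIs."""
    chunks: List[str] = []
    sources: List[str] = []
    for hit in aoss_response["hits"]["hits"]:
        src = hit["_source"]
        chunks.append(src["text"])
        metadata = src.get("metadata", {})
        if isinstance(metadata, str):
            # Assumption: string metadata is valid JSON.
            metadata = json.loads(metadata)
        source = metadata.get("source")
        if source and source not in sources:
            sources.append(source)
    return "\n".join(chunks), sources


# Hand-written stand-in for an OpenSearch search response.
sample_hits = {
    "hits": {
        "hits": [
            {"_source": {"text": "chunk one", "metadata": '{"source": "s3://example-bucket/doc.html"}'}},
            {"_source": {"text": "chunk two", "metadata": {"source": "s3://example-bucket/doc.html"}}},
        ]
    }
}

context_text, context_sources = build_context(sample_hits)
print(context_text)
print(context_sources)
```

Returning the sources separately, rather than only printing them, lets the caller show them next to the generated answer.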


## Retrieve parameters needed from the AWS CloudFormation stack

In [49]:

outputs = get_cfn_outputs(CFN_STACK_NAME)

region = outputs["Region"]
aoss_collection_arn = outputs['CollectionARN']
aoss_host = f"{os.path.basename(aoss_collection_arn)}.{region}.aoss.amazonaws.com"
aoss_vector_index = outputs['AOSSVectorIndexName']
print(f"aoss_collection_arn={aoss_collection_arn}\naoss_host={aoss_host}\naoss_vector_index={aoss_vector_index}\naws_region={region}")


aoss_collection_arn=arn:aws:aoss:us-east-1:992382836107:collection/okmgwzpu5dkgddbqiai4
aoss_host=okmgwzpu5dkgddbqiai4.us-east-1.aoss.amazonaws.com
aoss_vector_index=sagemaker-readthedocs-io
aws_region=us-east-1
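
The host name above is derived from the collection ARN. As a minimal sketch, the same derivation can read the Region from the ARN itself instead of a separate stack output. `aoss_host_from_arn` is a hypothetical helper, and the `.aoss.amazonaws.com` suffix is assumed to match the endpoint pattern used in this notebook.

```python
def aoss_host_from_arn(collection_arn: str) -> str:
    """Derive the AOSS endpoint host from a collection ARN."""
    # ARN layout: arn:partition:service:region:account-id:resource
    parts = collection_arn.split(":", 5)
    if len(parts) != 6 or parts[2] != "aoss":
        raise ValueError(f"not an AOSS ARN: {collection_arn!r}")
    region = parts[3]
    # The resource part looks like 'collection/<collection-id>'.
    collection_id = parts[5].rsplit("/", 1)[-1]
    return f"{collection_id}.{region}.aoss.amazonaws.com"


print(aoss_host_from_arn("arn:aws:aoss:us-east-1:992382836107:collection/okmgwzpu5dkgddbqiai4"))
# -> okmgwzpu5dkgddbqiai4.us-east-1.aoss.amazonaws.com
```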


## Setup Embeddings and Text Generation model

We can use LangChain to set up the embeddings and text generation models provided via Amazon Bedrock.

In [80]:
# create a boto3 bedrock client
bedrock_client = boto3.client('bedrock-runtime')

# we will use Anthropic Claude for text generation
claude_llm = ChatBedrock(model_id="anthropic.claude-3-haiku-20240307-v1:0",
                         model_kwargs=dict(temperature=0.5, top_k=250, top_p=1, stop_sequences=[]))

# we will be using the Titan Embeddings Model to generate our Embeddings.
embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-g1-text-02", client=bedrock_client)


## Interface with Amazon OpenSearch Service Serverless
We use the open-source [opensearch-py](https://pypi.org/project/opensearch-py/) package to talk to AOSS.

In [81]:
credentials = boto3.Session().get_credentials()
auth = AWSV4SignerAuth(credentials, region, SERVICE)

client = OpenSearch(
    hosts = [{'host': aoss_host, 'port': 443}],
    http_auth = auth,
    use_ssl = True,
    verify_certs = True,
    connection_class = RequestsHttpConnection,
    pool_maxsize = 20
)


## Use Retrieval Augmented Generation (RAG) for answering queries

Now that we have set up the LLMs through Bedrock and the vector database through AOSS, we are ready to answer queries using RAG. The workflow is as follows:

1. Convert the user query into embeddings.

1. Use the embeddings to find similar documents from the vector database.

1. Create a prompt from the user query and the similar documents retrieved from the vector database.

1. Provide the prompt to the LLM to create an answer to the user query.
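
The four steps above can be sketched end to end with toy stand-ins, so the flow is visible without calling Bedrock or AOSS. Everything in this block is an assumption made for illustration: `toy_embed` is a word-count "embedding" over a three-word vocabulary, the document list stands in for the vector database, and `generate` simply echoes the prompt in place of an LLM.

```python
import math
from typing import Callable, List, Sequence, Tuple

VOCAB = ["xgboost", "debugger", "training"]


def toy_embed(text: str) -> List[float]:
    """Toy embedding: count occurrences of each vocabulary word."""
    lowered = text.lower()
    return [float(lowered.count(word)) for word in VOCAB]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def answer_with_rag(
    question: str,
    embed: Callable[[str], List[float]],
    docs: List[Tuple[str, List[float]]],
    generate: Callable[[str], str],
    k: int = 2,
) -> str:
    q_vec = embed(question)                                  # step 1: embed the query
    ranked = sorted(docs, key=lambda d: cosine(q_vec, d[1]), reverse=True)
    context = "\n".join(text for text, _ in ranked[:k])      # step 2: top-k similar docs
    prompt = f"<context>\n{context}\n</context>\n\nQuestion: {question}"  # step 3: prompt
    return generate(prompt)                                  # step 4: ask the model


texts = ["XGBoost is a gradient boosting library.", "Debugger inspects training jobs."]
toy_docs = [(t, toy_embed(t)) for t in texts]

# The echo 'model' returns the prompt so the retrieved context can be inspected.
result = answer_with_rag("Which XGBoost versions are supported?", toy_embed, toy_docs, lambda p: p, k=1)
print(result)
```

In the notebook itself, the embedding step is handled by `BedrockEmbeddings`, the similarity search by the AOSS kNN query, and generation by the Claude model.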

## Query 1

Let us first ask the model our question without providing any context and look at the result. We then ask the same question with context built from documents retrieved from AOSS and see whether the answer improves.

In [90]:
from langchain_core.messages import HumanMessage, AIMessage
from langchain_core.prompts import ChatPromptTemplate

In [91]:
# The system/human message roles take the place of the Human/Assistant
# terminology in the prompt. Anthropic models work better with XML-style tags.
prompt_template = ChatPromptTemplate.from_messages([
        ('system', """Answer the question in a few sentences, based only on the information provided.
                            <context>
                            {context}
                            </context>
                            Include your answer in the <answer></answer> tags. Do not include any preamble in your answer."""),
        ('human', "{question}")

])
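
Because the prompt asks the model to wrap its answer in `<answer></answer>` tags, the answer can be pulled out of the raw response text before display. The helper below is a minimal sketch; `extract_answer` is a hypothetical name, and it assumes at most one `<answer>` block per response.

```python
import re
from typing import Optional

# DOTALL lets '.' match newlines, so multi-line answers are captured.
_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def extract_answer(response_text: str) -> Optional[str]:
    """Return the text inside the first <answer> block, or None if absent."""
    match = _ANSWER_RE.search(response_text)
    return match.group(1).strip() if match else None


print(extract_answer("Here you go:\n<answer>\n1. First\n2. Second\n</answer>"))
```

Returning `None` when the tags are missing lets the caller fall back to the full response text.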

In [92]:
# 1. Start with the query
question = "What versions of XGBoost are supported by Amazon SageMaker?"

# 2. Now create a prompt by combining the query and the context (which is empty at this time)
context = ""
prompt = prompt_template.format_messages(context=context, question=question)


In [94]:

# 3. Provide the prompt to the LLM to generate an answer to the query without any additional context provided
response = claude_llm.invoke(prompt).content
printmd(f"<span style='color:red'><b>question={question.strip()}<br>answer={response.strip()}</b></span>\n")


<span style='color:red'><b>question=What versions of XGBoost are supported by Amazon SageMaker?<br>answer=<answer>Amazon SageMaker supports XGBoost versions 0.90 and 1.0.</answer></b></span>


**The answer provided above is incorrect**, as can be seen from the [SageMaker XGBoost Algorithm page](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html). The supported version numbers are "1.0, 1.2, 1.3, 1.5, and 1.7".

Now, let us see if we can improve upon this answer by using the additional information available to us in the vector database. **Also notice in the response below that the source of each document used as context is called out (the name of the file in the S3 bucket). This helps create confidence in the response generated by the LLM**.

In [102]:
# 1. Start with the query
question = "What versions of XGBoost are supported by Amazon SageMaker?"

# 2. Create the context by finding similar documents from the knowledge base
context = create_context_for_query(question, embeddings, client, aoss_vector_index)

# 3. Now create a prompt by combining the query and the context
prompt = prompt_template.format_messages(context=context, question=question)

# 4. Provide the prompt to the LLM to generate an answer to the query based on context provided
response = claude_llm.invoke(prompt).content

printmd(f"<span style='color:red'><b>question={question.strip()}<br>answer={response.strip()}</b></span>\n")


query -> What versions of XGBoost are supported by Amazon SageMaker?
{"source":"s3://sagemaker-kb-992382836107/sagemaker.readthedocs.io_en_stable_algorithms_tabular_xgboost.html"}
sagemaker                                                                                                                     stable                                                                                                               Filters:                Example               Dev Guide               SDK Guide                                                                                                                                                                       	Using the SageMaker Python SDK 	Use Version 2.x of the SageMaker Python SDK    	APIs    	Frameworks    	Built-in Algorithms	Amazon Estimators 	Tabular	AutoGluon 	CatBoost 	Factorization Machines 	K-Nearest Neighbors 	LightGBM 	LinearLearner 	TabTransformer 	XGBoost     	Text 	Time-series 	Unsupervised 	Vision        	Workflows  

<span style='color:red'><b>question=What versions of XGBoost are supported by Amazon SageMaker?<br>answer=<answer>For information about the XGBoost versions that are supported by Amazon SageMaker, see the AWS documentation.</answer></b></span>


## Query 2

For the subsequent queries we use RAG directly.

In [110]:
# 1. Start with the query
question = "What are the different types of distributed training supported by SageMaker. Give a short summary of each."

# 2. Create the context by finding similar documents from the knowledge base
context = create_context_for_query(question, embeddings, client, aoss_vector_index)

# 3. Now create a prompt by combining the query and the context
prompt = prompt_template.format_messages(context=context, question=question)

# 4. Provide the prompt to the LLM to generate an answer to the query based on context provided
response = claude_llm.invoke(prompt).content
printmd(f"<span style='color:red'><b>question={question.strip()}<br>answer={response.strip()}</b></span>\n")


query -> What are the different types of distributed training supported by SageMaker. Give a short summary of each.
{"source":"s3://sagemaker-kb-992382836107/sagemaker.readthedocs.io_en_stable_api_training_distributed.html"}
Distributed Training APIs¶   SageMaker distributed training libraries offer both data parallel and model parallel training strategies. They combine software and hardware technologies to improve inter-GPU and inter-node communications. They extend SageMaker’s training capabilities with built-in options that require only small code changes to your training scripts.
----------------
{"source":"s3://sagemaker-kb-992382836107/sagemaker.readthedocs.io_en_stable_api_training_distributed.html"}
Store    	Amazon SageMaker Model Monitor    	Amazon SageMaker Processing    	Amazon SageMaker Model Building Pipeline                                                                                                                  sagemaker                                           

<span style='color:red'><b>question=What are the different types of distributed training supported by SageMaker. Give a short summary of each.<br>answer=The different types of distributed training supported by SageMaker are:
<answer>
1. Data Parallel Training: SageMaker's Distributed Data Parallel Library allows you to scale your training by distributing the data across multiple GPUs or instances. It improves inter-GPU and inter-node communication to speed up training.
2. Model Parallel Training: SageMaker's Distributed Model Parallel Library allows you to scale your training by distributing the model across multiple GPUs or instances. It provides APIs to help you adapt your training script to use model parallelism.
</answer></b></span>


## Query 3

In [111]:
# 1. Start with the query
question = "What advantages does SageMaker debugger provide?"

# 2. Create the context by finding similar documents from the knowledge base
context = create_context_for_query(question, embeddings, client, aoss_vector_index)

# 3. Now create a prompt by combining the query and the context
prompt = prompt_template.format_messages(context=context, question=question)

# 4. Provide the prompt to the LLM to generate an answer to the query based on context provided
response = claude_llm.invoke(prompt).content

printmd(f"<span style='color:red'><b>question={question.strip()}<br>answer={response.strip()}</b></span>\n")


query -> What advantages does SageMaker debugger provide?
{"source":"s3://sagemaker-kb-992382836107/sagemaker.readthedocs.io_en_stable_amazon_sagemaker_debugger.html"}
SageMaker Debugger provides a set of built-in rules curated by data scientists and engineers at Amazon to identify common problems while training machine learning models. There is also support for using custom rule source codes for evaluation. In the following sections, you’ll learn how to use both the built-in and custom rules while training your model.    Relationship between debugger hook and rules¶   Using SageMaker Debugger is, broadly, a two-pronged approach. On one hand you have the production of debugging data, which is done through the Debugger Hook, and on the other hand you have the consumption of this data, which can be with rules (for continuous analyses) or by using the SageMaker Debugger SDK (for interactive analyses).   The production and consumption of data are defined independently. For example, you cou

<span style='color:red'><b>question=What advantages does SageMaker debugger provide?<br>answer=According to the context, SageMaker Debugger provides the following advantages:
<answer>
- It provides a set of built-in rules curated by data scientists and engineers at Amazon to identify common problems while training machine learning models.
- It supports using custom rule source codes for evaluation.
- It allows you to configure the debugging hook to produce and store the debugging data that you care about, and employ rules that operate on that particular data, ensuring that the Debugger is utilized to its maximum potential in detecting anomalies.
</answer></b></span>
