## Building a Q&A application using the Knowledge Bases for Amazon Bedrock Retrieve API and a foundation model

This notebook tests Claude v2.1, Claude 3 Sonnet, and Claude 3 Haiku against the same knowledge base, prompt, and question. It uses the AWS IoT FAQ as the example data source.

In [None]:
!pip install langchain --quiet 
!pip install boto3 --quiet 
!pip install botocore --quiet 
!pip install Pillow --quiet 
%pip install anthropic IPython 

In [179]:
# Only the modules used in this notebook are imported.
import json
import pprint
import time

import boto3
from botocore.client import Config


In [180]:
# Interact with a large language model (LLM) to generate text
# based on a prompt.
#
# Arguments:
#   prompt: The text prompt to provide to the LLM.
#   llm_type: The name of the LLM to use: 'anthropic.claude-3-sonnet',
#             'anthropic.claude-3-haiku', or 'claude'.
#
# Returns:
#   The text generated by the LLM in response to the prompt,
#   or None if llm_type is not one of the names above.
#
# This function:
# 1. Prints the llm_type for debugging.
# 2. Formats the prompt into the JSON payload expected by each LLM API.
# 3. Specifies the parameters for text generation like max tokens, temp.
# 4. Calls the Bedrock client to invoke the LLM model API. 
# 5. Parses the response to extract the generated text.
# 6. Returns the generated text string.

def interactWithLLM(prompt, llm_type):
    if llm_type == 'anthropic.claude-3-sonnet':
        print("**THE LLM TYPE IS -->" + llm_type)
        body = json.dumps({
                          "anthropic_version": "bedrock-2023-05-31",
                          "max_tokens": 1000,
                          "messages": [
                            {
                              "role": "user",
                              "content": [
                                {
                                  "type": "text",
                                  "text": prompt
                                }
                              ]
                            }
                          ]
                        }) 
        modelId = 'anthropic.claude-3-sonnet-20240229-v1:0' # change this to use a different version from the model provider
        accept = 'application/json'
        contentType = 'application/json'
        start_time = time.time()
        response = bedrock_client.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
         # Record the end time
        end_time = time.time()

        # Calculate the runtime
        runtime = end_time - start_time
        print(f"The runtime of the invoke_model was {runtime:.2f} seconds.")
        
        response_body = json.loads(response.get('body').read())
        response_text_sonnet = response_body.get('content')[0]['text']
        return response_text_sonnet
    elif llm_type == 'anthropic.claude-3-haiku':
        print("**THE LLM TYPE IS -->" + llm_type)
        body = json.dumps({
                          "anthropic_version": "bedrock-2023-05-31",
                          "max_tokens": 1000,
                          "messages": [
                            {
                              "role": "user",
                              "content": [
                                {
                                  "type": "text",
                                  "text": prompt
                                }
                              ]
                            }
                          ]
                        }) 
        modelId = 'anthropic.claude-3-haiku-20240307-v1:0' # change this to use a different version from the model provider
        accept = 'application/json'
        contentType = 'application/json'
        start_time = time.time()
        response = bedrock_client.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
         # Record the end time
        end_time = time.time()

        # Calculate the runtime
        runtime = end_time - start_time
        print(f"The runtime of the invoke_model was {runtime:.2f} seconds.")
        
        response_body = json.loads(response.get('body').read())
        response_text_haiku = response_body.get('content')[0]['text']
        return response_text_haiku
    elif llm_type == "claude":
        print("**THE LLM TYPE IS -->" + llm_type)
        body = json.dumps(
            {
                "prompt": prompt,
                "max_tokens_to_sample": 300,
                "temperature": 1,
                "top_k": 250,
                "top_p": 0.999,
                "stop_sequences": [],
            }
        )
        modelId = "anthropic.claude-v2:1"  # change this to use a different version from the model provider
        accept = "application/json"
        contentType = "application/json"
        start_time = time.time()
        response = bedrock_client.invoke_model(
            body=body, modelId=modelId, accept=accept, contentType=contentType
        )
         # Record the end time
        end_time = time.time()

        # Calculate the runtime
        runtime = end_time - start_time
        print(f"The runtime of the invoke_model was {runtime:.2f} seconds.")
        response_body = json.loads(response.get("body").read())

        response_text_claude = response_body.get("completion")

        return response_text_claude
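
The three branches of `interactWithLLM` differ only in the request body and in where the generated text sits in the response. As a minimal sketch of that difference (the helper names `build_messages_body`, `build_completion_body`, and `extract_text` are hypothetical and not part of the notebook), the two payload shapes can be isolated like this:

```python
import json


def build_messages_body(prompt, max_tokens=1000):
    """Request body in the Messages format used by the Claude 3 branches."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": prompt}]}
        ],
    })


def build_completion_body(prompt, max_tokens=300):
    """Request body in the text-completion format used by the 'claude' branch."""
    return json.dumps({
        "prompt": prompt,
        "max_tokens_to_sample": max_tokens,
        "temperature": 1,
        "top_k": 250,
        "top_p": 0.999,
        "stop_sequences": [],
    })


def extract_text(response_body):
    """Return the generated text from either response shape."""
    if "content" in response_body:
        # Messages format: a list of content blocks.
        return response_body["content"][0]["text"]
    # Text-completion format: a single 'completion' string (None if absent).
    return response_body.get("completion")
```

With helpers like these, each branch would reduce to choosing a model ID and a body builder, while the timing and `invoke_model` call stay the same.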

In [181]:
pp = pprint.PrettyPrinter(indent=2)

# Initialize the Bedrock runtime client. Change the region to match your own.

bedrock_client = boto3.client(
    service_name='bedrock-runtime', 
    region_name='us-west-2'
)

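bedrock_config = Config(connect_timeout=120, read_timeout=120, retries={'max_attempts': 0})
# Assumption: the knowledge base lives in the same region as bedrock_client.
bedrock_agent_client = boto3.client("bedrock-agent-runtime",
                              region_name='us-west-2',
                              config=bedrock_config)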

#### Using the RetrieveAndGenerate API

The `retrieveAndGenerate` function below is an example of calling the knowledge base RetrieveAndGenerate API with one of the supported base models. Pass the knowledge base ID and the ARN of the model when calling it. The API queries the knowledge base and generates a response based on the retrieved results. The response cites up to five sources, but only those that are relevant to the query.

**Note**: The model must be one that RetrieveAndGenerate supports.

In [182]:
def retrieveAndGenerate(query, kbId, modelArn):
    # Query the knowledge base and generate an answer with the given model.
    return bedrock_agent_client.retrieve_and_generate(
        input={
            'text': query
        },
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': kbId,
                'modelArn': modelArn
            }
        }
    )
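
The cell that calls this function keeps only `["output"]["text"]`, which discards the cited sources. A minimal sketch for listing them follows. It assumes the response carries a `citations` list whose entries hold `retrievedReferences` with an S3 `location`; the helper name `extract_source_uris` is hypothetical, and data sources other than S3 would need different keys.

```python
def extract_source_uris(rag_response):
    """Collect the distinct S3 URIs cited in a RetrieveAndGenerate response.

    Assumes each citation looks like:
    {'retrievedReferences': [{'location': {'s3Location': {'uri': ...}}}]}
    Missing keys are skipped rather than raising.
    """
    uris = []
    for citation in rag_response.get("citations", []):
        for reference in citation.get("retrievedReferences", []):
            uri = reference.get("location", {}).get("s3Location", {}).get("uri")
            # Keep first-seen order and drop repeats.
            if uri and uri not in uris:
                uris.append(uri)
    return uris
```

To use it, call it on the full dictionary returned by `retrieveAndGenerate`, before indexing into `["output"]["text"]`.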


In [184]:
query = "How do you authenticate to AWS IoT?"
# "ABC" is a placeholder; replace it with your knowledge base ID.
model_arn = 'arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-instant-v1'
start_time = time.time()
response = retrieveAndGenerate(query, "ABC", model_arn)["output"]["text"]
end_time = time.time()

# Calculate the runtime
runtime = end_time - start_time
pp.pprint(response)
print("total time:")
print(runtime)

('AWS IoT Core requires all clients to authenticate using strong '
 'authentication methods like X.509 certificates, AWS IAM credentials, or '
 'third party authentication via AWS Cognito. Communication is encrypted. You '
 'can authenticate to AWS IoT Core using X.509 certificates generated by AWS '
 'IoT Core or signed by your own certificate authority. If using the AWS SDKs '
 'or CLI, SigV4 authentication is handled automatically. HTTPS requests can '
 'also be authenticated with X.509 certificates.')
total time:
5.03143572807312


#### Using the Retrieve API

The Retrieve API queries a knowledge base and returns the matching chunks without generating an answer. You can pass the returned context to the model of your choice.


**Note**: Your account needs access to the model you invoke.

In [185]:
def retrieve(query, kbId, numberOfResults=3):
    return bedrock_agent_client.retrieve(
        retrievalQuery= {
            'text': query
        },
        knowledgeBaseId=kbId,
        retrievalConfiguration= {
            'vectorSearchConfiguration': {
                'numberOfResults': numberOfResults
            }
        }
    )
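
Each retrieved result carries a relevance `score` next to its `content`. As a minimal sketch, weak matches could be dropped before they reach the prompt. The helper name `filter_by_score` and the `0.5` threshold are hypothetical; a suitable cutoff depends on the embedding model and the data.

```python
def filter_by_score(retrieval_results, min_score=0.5):
    """Keep only results whose relevance score is at least min_score.

    Results without a 'score' key are treated as score 0.0 and dropped
    for any positive threshold.
    """
    return [
        result for result in retrieval_results
        if result.get("score", 0.0) >= min_score
    ]
```

It would be applied to the `retrievalResults` list returned by `retrieve`, before the text of each chunk is extracted.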

In [186]:
start_time = time.time()
response_retrieve = retrieve(query, "ABC")["retrievalResults"]
end_time = time.time()

# Calculate the runtime
runtime = end_time - start_time
pp.pprint(response_retrieve)
print("total time:")
print(runtime)

[ { 'content': { 'text': 'If you are using the AWS SDKs or the    AWS CLI, the '
                         'SigV4 authentication is taken care of for you under '
                         'the hood. HTTPS requests can also be     '
                         'https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/  '
                         'http://docs.aws.amazon.com/iot/latest/developerguide/what-is-aws-iot.html  '
                         'https://aws.amazon.com/iot-core/getting-started/  '
                         'http://docs.aws.amazon.com/iot/latest/developerguide/what-is-aws-iot.html  '
                         'https://signin.aws.amazon.com/signin?redirect_uri=https%3A%2F%2Fconsole.aws.amazon.com%2Fiot%2Fhome%3Fstate%3DhashArgs%2523%26isauthcode%3Dtrue&client_id=arn%3Aaws%3Aiam%3A%3A015428540659%3Auser%2Ficebreaker&forceMobileApp=0  '
                         'https://aws.amazon.com/tools/  '
                         'https://aws.amazon.com/iot-platform/

In [187]:
from langchain.prompts import PromptTemplate

PROMPT_TEMPLATE = """
Human: You are an AWS IoT services expert who answers questions using facts from the provided information when possible.
Use the following pieces of information to provide a concise answer to the question enclosed in <question> tags. 
If you don't know the answer, just say that you don't know, don't try to make up an answer.
<context>
{context_str}
</context>

<question>
{query_str}
</question>

The response should be specific.

Assistant:"""
claude_prompt = PromptTemplate(template=PROMPT_TEMPLATE, 
                               input_variables=["context_str","query_str"])

In [188]:
# Extract the text of each retrieved chunk from the Retrieve API results.
def get_contexts(retrievalResults):
    return [retrievedResult['content']['text'] for retrievedResult in retrievalResults]
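
The retrieved chunks contain irregular runs of whitespace, as the printed output of the next cell shows. A minimal sketch for normalizing and numbering them before they go into the prompt follows; the helper name `format_contexts` and the `[n]` labels are hypothetical choices, not something the notebook prescribes.

```python
def format_contexts(contexts):
    """Turn a list of chunk strings into one numbered, whitespace-normalized block."""
    blocks = []
    for index, text in enumerate(contexts, start=1):
        # str.split() with no arguments collapses any run of whitespace.
        cleaned = " ".join(text.split())
        blocks.append(f"[{index}] {cleaned}")
    return "\n\n".join(blocks)
```

Numbering the chunks also lets the prompt ask the model to say which chunk supports each statement.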

In [189]:
contexts = get_contexts(response_retrieve)
pp.pprint(contexts)

[ 'If you are using the AWS SDKs or the    AWS CLI, the SigV4 authentication '
  'is taken care of for you under the hood. HTTPS requests can also be     '
  'https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/  '
  'http://docs.aws.amazon.com/iot/latest/developerguide/what-is-aws-iot.html  '
  'https://aws.amazon.com/iot-core/getting-started/  '
  'http://docs.aws.amazon.com/iot/latest/developerguide/what-is-aws-iot.html  '
  'https://signin.aws.amazon.com/signin?redirect_uri=https%3A%2F%2Fconsole.aws.amazon.com%2Fiot%2Fhome%3Fstate%3DhashArgs%2523%26isauthcode%3Dtrue&client_id=arn%3Aaws%3Aiam%3A%3A015428540659%3Auser%2Ficebreaker&forceMobileApp=0  '
  'https://aws.amazon.com/tools/  https://aws.amazon.com/iot-platform/sdk/  '
  'http://mqtt.org/      authenticated using X.509 certificates. MQTT messages '
  'to AWS IoT Core are authenticated using X.509    certificates.    With AWS '
  'IoT Core you can use AWS IoT Core generated certificates, as well as

In [190]:
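# Join the retrieved chunks into one string so the prompt contains plain text
# rather than the repr of a Python list.
prompt = claude_prompt.format(context_str="\n\n".join(contexts),
                              query_str=query)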

In [191]:
llm_type = 'anthropic.claude-3-sonnet'

final_response = interactWithLLM(prompt,llm_type)
pp.pprint(final_response)

**THE LLM TYPE IS -->anthropic.claude-3-sonnet
The runtime of the invoke_model was 7.07 seconds.
('According to the provided context, there are a few ways to authenticate to '
 'AWS IoT Core:\n'
 '\n'
 '1. If you are using the AWS SDKs or the AWS CLI, the Signature Version 4 '
 '(SigV4) authentication is handled for you automatically under the hood.\n'
 '\n'
 '2. For HTTPS requests, you can authenticate using X.509 certificates.\n'
 '\n'
 '3. For MQTT messages to AWS IoT Core, authentication is done using X.509 '
 'certificates. You can use AWS IoT Core generated certificates or '
 'certificates signed by your preferred Certificate Authority (CA).\n'
 '\n'
 '4. For companion apps, you can use Amazon Cognito which integrates with '
 'identity providers like Facebook and Login with Amazon. Cognito identities '
 'can be authorized to access AWS IoT Core.\n'
 '\n'
 '5. For server applications, you can use IAM roles to access AWS IoT Core.\n'
 '\n'
 'So in summary, the main authentication m

In [192]:
llm_type = 'anthropic.claude-3-haiku'

final_response = interactWithLLM(prompt,llm_type)
pp.pprint(final_response)

**THE LLM TYPE IS -->anthropic.claude-3-haiku
The runtime of the invoke_model was 2.35 seconds.
('Based on the information provided, there are a few ways to authenticate to '
 'AWS IoT:\n'
 '\n'
 '1. Using the AWS SDKs or AWS CLI, the SigV4 authentication is handled '
 'automatically under the hood.\n'
 '\n'
 '2. For HTTPS requests, you can use X.509 certificates for authentication.\n'
 '\n'
 '3. For MQTT messages to AWS IoT Core, you can use X.509 certificates signed '
 'either by AWS IoT Core or your preferred Certificate Authority (CA).\n'
 '\n'
 '4. For companion apps, you can use Amazon Cognito to authenticate using '
 'end-user identities, which can be managed by your own identity store or a '
 'third-party identity provider like Facebook or Login with Amazon.\n'
 '\n'
 '5. For server applications, you can use IAM roles to access AWS IoT Core.\n'
 '\n'
 'The key points are that AWS IoT requires strong authentication using X.509 '
 'certificates, AWS IAM credentials, or third-part

In [193]:
llm_type = 'claude'

final_response = interactWithLLM(prompt,llm_type)
pp.pprint(final_response)

**THE LLM TYPE IS -->claude
The runtime of the invoke_model was 3.26 seconds.
(' <response>\n'
 'To authenticate to AWS IoT, you can use:\n'
 '\n'
 '- X.509 certificates\n'
 '- AWS IAM credentials\n'
 '- Third party authentication via AWS Cognito\n'
 '\n'
 'The AWS SDKs and AWS CLI handle SigV4 authentication under the hood. HTTPS '
 'requests can also be authenticated using X.509 certificates. MQTT messages '
 'to AWS IoT Core are authenticated using X.509 certificates. \n'
 '</response>')
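
The three cells above run the same prompt against each model by hand. As a minimal sketch, the comparison could be collected in one loop. The helper name `compare_models` is hypothetical, and `invoke` stands for any function with the signature `(prompt, llm_type)`, such as `interactWithLLM`.

```python
import time


def compare_models(prompt, llm_types, invoke):
    """Call invoke(prompt, llm_type) for each model and record text and latency."""
    results = {}
    for llm_type in llm_types:
        start = time.perf_counter()
        text = invoke(prompt, llm_type)
        elapsed = time.perf_counter() - start
        results[llm_type] = {"seconds": elapsed, "text": text}
    return results
```

For example, `compare_models(prompt, ['anthropic.claude-3-sonnet', 'anthropic.claude-3-haiku', 'claude'], interactWithLLM)` would return one entry per model. A single run per model is a rough indication only, since latency varies between calls.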
