# Benchmarking responses from Mistral Instruct and Bedrock Knowledge Bases on latency, accuracy, and relevancy, all via the Boto3 SDK

In [2]:
# Install the SDKs used for Bedrock knowledge bases
%pip install --upgrade pip
%pip install boto3 --force-reinstall
%pip install botocore --force-reinstall
%pip install langchain --force-reinstall --quiet

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Note: you may need to restart the kernel to use updated packages.
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting boto3
  Downloading boto3-1.34.9-py3-none-any.whl.metadata (6.6 kB)
Collecting botocore<1.35.0,>=1.34.9 (from boto3)
  Downloading botocore-1.34.9-py3-none-any.whl.metadata (5.6 kB)
Collecting jmespath<2.0.0,>=0.7.1 (from boto3)
  Downloading jmespath-1.0.1-py3-none-any.whl (20 kB)
Collecting s3transfer<0.11.0,>=0.10.0 (from boto3)
  Downloading s3transfer-0.10.0-py3-none-any.whl.metadata (1.7 kB)
Collecting python-dateutil<3.0.0,>=2.1 (from botocore<1.35.0,>=1.34.9->boto3)
  Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting urllib3<2.1,>=1.25.4 (from botocore<1.35.0,>=1.34.9->boto3)
  Downloading urllib3-2.0.7-py

### Configure your Bedrock client:

In [3]:
import boto3
import pprint
from botocore.client import Config
from langchain.llms.bedrock import Bedrock
from IPython.display import Markdown, display
from langchain.embeddings import BedrockEmbeddings

bedrock_config = Config(connect_timeout=120, read_timeout=120, retries={'max_attempts': 0})
bedrock_client = boto3.client('bedrock-runtime')
agent_client = boto3.client('bedrock-agent')
bedrock_agent_client = boto3.client("bedrock-agent-runtime",
                              region_name='us-east-1',
                              config=bedrock_config)

# we will be using the Titan Embeddings Model to generate our Embeddings.
embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-g1-text-02", client=bedrock_client)

## Creating the KB using the API for knowledge bases

In [None]:
role_arn = ''  # Replace with your Role ARN
embedding_model_arn = ''  # Replace with your embedding model ARN

# Choose your vector store type and configure it accordingly
# Example for Amazon OpenSearch Serverless
storage_configuration = {
    'opensearchServerlessConfiguration': {
        'collectionArn': '',  # Replace with your Collection ARN
        'fieldMapping': {
            'metadataField': '',  # Replace with your Metadata Field
            'textField': '',  # Replace with your Text Field
            'vectorField': ''  # Replace with your Vector Field
        },
        'vectorIndexName': ''  # Replace with your Vector Index Name
    },
    'type': ''  # Storage type matching the configuration above, e.g. 'OPENSEARCH_SERVERLESS'
}

# Creating the knowledge base
try:
    medical_KB = agent_client.create_knowledge_base(
        name='medical_KB', 
        description='KB that contains information on medicines and health',
        roleArn=role_arn,
        knowledgeBaseConfiguration={
            'type': 'VECTOR',  # Knowledge base type
            'vectorKnowledgeBaseConfiguration': {
                'embeddingModelArn': embedding_model_arn
            }
        },
        storageConfiguration=storage_configuration
    )

    # Pretty print the response
    pprint.pprint(medical_KB)

except Exception as e:
    print(f"Error occurred: {e}")


## Create a data source that you can attach to this KB:

In [None]:
# Define the S3 configuration for your data source
s3_configuration = {
    'bucketArn': 'arn:aws:s3:::medicaldata2039',  # Replace with the ARN of your S3 bucket
    # 'inclusionPrefixes' takes literal S3 key prefixes, not wildcards;
    # it is omitted here so that all files in the bucket are included
}

# Define the data source configuration
data_source_configuration = {
    's3Configuration': s3_configuration,
    'type': 'S3'  # Type of data source, in this case, S3
}

# Replace with your knowledge base ID
knowledge_base_id = ''

# Create the data source
try:
    data_source_response = agent_client.create_data_source(
        knowledgeBaseId=knowledge_base_id,
        name='MedicalDataSource',
        description='DataSource for medical data',
        dataSourceConfiguration=data_source_configuration
    )

    # Pretty print the response
    pprint.pprint(data_source_response)

except Exception as e:
    print(f"Error occurred: {e}")


Creating a data source in Amazon Bedrock means configuring a connection to the external storage system that hosts your data. Here the data lives in an Amazon S3 bucket named medicaldata2039. The S3 configuration takes the bucket's ARN and, optionally, inclusion prefixes. Inclusion prefixes are S3 key prefixes (for example, a folder path) that restrict which objects are ingested; they are matched literally rather than as wildcards, so leave them out to include every file in the bucket.

The S3 configuration is then embedded in a data source configuration, which is passed to the `create_data_source` API call together with the data source's name and description and the ID of the knowledge base it belongs to.

After the data source is created and linked to the knowledge base, the next step is to start a synchronization (ingestion) job. Amazon Bedrock scans the S3 bucket, restricted to the inclusion prefixes if any are set, splits the documents into chunks, generates embeddings with the configured embedding model, and writes them to the vector store. This is what makes the data searchable through the knowledge base.

Once synchronization is complete, the content of the S3 bucket can be queried through the knowledge base, which returns the passages most relevant to your search text.
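
The synchronization step described above can be sketched with the `bedrock-agent` client's `start_ingestion_job` and `get_ingestion_job` calls. This is a minimal sketch, not the notebook's original code: `sync_data_source` is a hypothetical helper, and the polling interval and poll limit are arbitrary assumptions.

```python
import time

def sync_data_source(agent_client, knowledge_base_id, data_source_id,
                     poll_seconds=10, max_polls=60):
    """Start an ingestion job and poll until it completes, fails, or the poll limit is hit.

    agent_client is assumed to be a boto3 'bedrock-agent' client.
    Returns the last ingestion job description that was seen.
    """
    job = agent_client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )["ingestionJob"]

    for _ in range(max_polls):
        # COMPLETE and FAILED are terminal states; stop polling on either
        if job["status"] in ("COMPLETE", "FAILED"):
            break
        time.sleep(poll_seconds)
        job = agent_client.get_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]
    return job
```

With the clients configured earlier, a call might look like `sync_data_source(agent_client, knowledge_base_id, data_source_id)`, where `data_source_id` is taken from the `create_data_source` response.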

## Retrieving content and evaluating using open-source frameworks

In [None]:
import requests
import boto3
import json
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest

# Replace these variables with your actual values
knowledge_base_id = ''
api_endpoint = 'https://bedrock-agent-runtime.us-east-1.amazonaws.com'
query_text = 'health spending'
number_of_results = 3

# Construct the URL
url = f"{api_endpoint}/knowledgebases/{knowledge_base_id}/retrieve"

# Construct the payload
payload = {
    "retrievalConfiguration": {
        "vectorSearchConfiguration": {
            "numberOfResults": number_of_results
        }
    },
    "retrievalQuery": {
        "text": query_text
    }
}

# Create a request object
request = AWSRequest(method="POST", url=url, data=json.dumps(payload), headers={'Content-Type': 'application/json'})

# Use Boto3 to get the current session's credentials
session = boto3.Session()
credentials = session.get_credentials()

# Create a SigV4Auth object with 'bedrock' as the service name.
# The signing region must match the region in api_endpoint.
auth = SigV4Auth(credentials, 'bedrock', 'us-east-1')

# Sign the request
auth.add_auth(request)

# Make the POST request with the signed headers
response = requests.post(url, headers=dict(request.headers), data=request.body)

# Check if the request was successful
if response.status_code == 200:
    # Process the response
    results = response.json()
    print("Retrieval Results:", results)
else:
    print("Error:", response.status_code, response.text)
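
The same retrieval can also be done through the Boto3 SDK, which handles request signing itself. The sketch below assumes a `bedrock-agent-runtime` client such as `bedrock_agent_client` from the configuration cell; `retrieve_passages` is a hypothetical helper that flattens the `retrievalResults` list into plain dictionaries.

```python
def retrieve_passages(agent_runtime_client, knowledge_base_id, query_text,
                      number_of_results=3):
    """Query a knowledge base and return the retrieved passages as simple dicts.

    agent_runtime_client is assumed to be a boto3 'bedrock-agent-runtime' client.
    """
    response = agent_runtime_client.retrieve(
        knowledgeBaseId=knowledge_base_id,
        retrievalQuery={"text": query_text},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": number_of_results}
        },
    )
    return [
        {
            "text": result["content"]["text"],
            "score": result.get("score"),
            # The source URI is only present for S3-backed data sources
            "source": result.get("location", {}).get("s3Location", {}).get("uri"),
        }
        for result in response["retrievalResults"]
    ]
```

For example, `retrieve_passages(bedrock_agent_client, knowledge_base_id, 'health spending')` would return up to three passages with their relevance scores.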


## Now that you have created your KB, let's retrieve and generate from it

In [4]:
import boto3
import pprint
from botocore.client import Config

pp = pprint.PrettyPrinter(indent=2)

bedrock_config = Config(connect_timeout=120, read_timeout=120, retries={'max_attempts': 0})
bedrock_client = boto3.client('bedrock-runtime')
bedrock_agent_client = boto3.client("bedrock-agent-runtime",
                              config=bedrock_config)

model_id = "anthropic.claude-instant-v1"
region_id = ""  # Replace with the AWS Region of your knowledge base
kb_id = ""  # Replace with your knowledge base ID

## Now, let's retrieve and generate responses and store them in a dictionary

In [5]:
def retrieveAndGenerate(input_text, kbId, sessionId=None, model_id="anthropic.claude-instant-v1", region_id=""):
    # Build the foundation-model ARN from the region and model ID
    model_arn = f'arn:aws:bedrock:{region_id}::foundation-model/{model_id}'
    request = {
        'input': {
            'text': input_text
        },
        'retrieveAndGenerateConfiguration': {
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': kbId,
                'modelArn': model_arn
            }
        }
    }
    # Pass sessionId only when continuing an existing conversation
    if sessionId:
        request['sessionId'] = sessionId
    return bedrock_agent_client.retrieve_and_generate(**request)

In [6]:
query = "What are trends in health insurance coverage?"
response = retrieveAndGenerate(query, kb_id,model_id=model_id,region_id=region_id)
generated_text = response['output']['text']
pp.pprint(generated_text)

('The primary driver of declining enrollment in private health insurance has '
 'been the increasing cost of health care, contributing to the rising '
 'proportion of uninsured Americans. Approximately 16% of Americans lack '
 'health insurance at any given time.')
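
Besides the generated text, the `retrieve_and_generate` response includes a `citations` list with the passages the answer was grounded on, which is useful when judging accuracy and relevancy by hand. The sketch below flattens that list into rows; `extract_citations` is a hypothetical helper, and it assumes S3-backed references.

```python
def extract_citations(response):
    """Flatten the citations of a retrieve_and_generate response into rows.

    Each row pairs a span of the generated answer with one retrieved
    reference passage and the URI of the document it came from.
    """
    rows = []
    for citation in response.get("citations", []):
        generated_part = (
            citation.get("generatedResponsePart", {})
            .get("textResponsePart", {})
            .get("text", "")
        )
        for ref in citation.get("retrievedReferences", []):
            rows.append({
                "generated_part": generated_part,
                "reference_text": ref.get("content", {}).get("text", ""),
                "source": ref.get("location", {}).get("s3Location", {}).get("uri"),
            })
    return rows
```

Calling `extract_citations(response)` on the response above gives a list that can be printed with `pp.pprint` or loaded into a DataFrame next to the generated answer.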


## Retrieving and generating answers to a set of questions on medical health documentation for benchmarking

In [7]:
questions = {
    1: "What are some ways in which an individual’s health can be maintained or improved?",
    2: "How does the current American health care system contribute to innovation in treatments?",
    3: "What are the key objectives of the Administration's policies in health care reform?",
    4: "How does consumer-directed health insurance plans fit into the Administration's health care policies?",
    5: "What proposal did the President make in the State of the Union Address regarding health insurance?",
    6: "What factors contribute to health alongside health care services?",
    7: "How has the trend of obesity changed in the United States since the late 1970s?",
    8: "What are some common conditions that affect job productivity through absenteeism and presenteeism?",
    9: "What has been the trend in national health care spending in the United States since 1960?",
    10: "How has life expectancy in the United States changed since 1900, and what pattern is observed from birth and from age 65?"
}


In [8]:
import time
import pandas as pd

async def generate_responses(model_id, region_id, kb_id, questions):
    responses = []

    for qid, query in questions.items():
        start_time = time.time()
        
        # retrieveAndGenerate is a synchronous function, so it is called directly without await
        response = retrieveAndGenerate(query, kb_id, model_id=model_id, region_id=region_id)
        
        end_time = time.time()
        latency = end_time - start_time
        generated_text = response['output']['text']
        
        responses.append({
            "Question ID": qid,
            "Question": query,
            "Response": generated_text,
            "Inference Latency (s)": latency,
            "Model": model_id
        })

    return pd.DataFrame(responses)

model_id = "anthropic.claude-instant-v1" 
region_id = "" 
kb_id = ""
results_df = await generate_responses(model_id, region_id, kb_id, questions)

# Display or save the results
print(results_df)
results_df.to_csv("query_responses_latency.csv", index=False)


   Question ID                                           Question  \
0            1  What are some ways in which an individual’s he...   
1            2  How does the current American health care syst...   
2            3  What are the key objectives of the Administrat...   
3            4  How does consumer-directed health insurance pl...   
4            5  What proposal did the President make in the St...   
5            6  What factors contribute to health alongside he...   
6            7  How has the trend of obesity changed in the Un...   
7            8  What are some common conditions that affect jo...   
8            9  What has been the trend in national health car...   
9           10  How has life expectancy in the United States c...   

                                            Response  Inference Latency (s)  \
0  Individual's health can be maintained or impro...               5.500159   
1  The American health care system contributes to...               5.462244   
2  

## More metrics


In [54]:
import time
import pandas as pd

def count_words(text):
    return len(text.split())

async def generate_responses(model_id, region_id, kb_id, questions):
    responses = []

    for qid, query in questions.items():
        start_time = time.time()

        # Call the function to get a response
        response = retrieveAndGenerate(query, kb_id, model_id=model_id, region_id=region_id)

        end_time = time.time()
        latency = end_time - start_time
        generated_text = response['output']['text']

        input_word_count = count_words(query)
        output_word_count = count_words(generated_text)
        word_throughput = output_word_count / latency if latency > 0 else 0
        tps = 1 / latency if latency > 0 else 0

        responses.append({
            "Question ID": qid,
            "Question": query,
            "Response": generated_text,
            "Input Word Count": input_word_count,
            "Output Word Count": output_word_count,
            "Word Throughput (words/s)": word_throughput,
            "Transactions per Second (TPS)": tps,
            "Inference Latency (s)": latency,
            "Model": model_id
        })

    return pd.DataFrame(responses)

# Example usage
model_id = "anthropic.claude-instant-v1" 
region_id = "" 
kb_id = ""
results_df = await generate_responses(model_id, region_id, kb_id, questions)

# Display or save the results
print(results_df)
results_df.to_csv("query_responses_latency.csv", index=False)


   Question ID                                           Question  \
0            1  What are some ways in which an individual’s he...   
1            2  How does the current American health care syst...   
2            3  What are the key objectives of the Administrat...   
3            4  How does consumer-directed health insurance pl...   
4            5  What proposal did the President make in the St...   
5            6  What factors contribute to health alongside he...   
6            7  How has the trend of obesity changed in the Un...   
7            8  What are some common conditions that affect jo...   
8            9  What has been the trend in national health car...   
9           10  How has life expectancy in the United States c...   

                                            Response  Input Word Count  \
0  Individual's health can be maintained or impro...                14   
1  The American health care system contributes to...                13   
2  The key objecti

In [None]:
import pandas as pd
import asyncio

async def main():
    questions = {
        1: "What are some ways in which an individual’s health can be maintained or improved?",
        2: "How does the current American health care system contribute to innovation in treatments?",
        3: "What are the key objectives of the Administration's policies in health care reform?",
        4: "How does consumer-directed health insurance plans fit into the Administration's health care policies?",
        5: "What proposal did the President make in the State of the Union Address regarding health insurance?",
        6: "What factors contribute to health alongside health care services?",
        7: "How has the trend of obesity changed in the United States since the late 1970s?",
        8: "What are some common conditions that affect job productivity through absenteeism and presenteeism?",
        9: "What has been the trend in national health care spending in the United States since 1960?",
        10: "How has life expectancy in the United States changed since 1900, and what pattern is observed from birth and from age 65?"
    }

    # Generate responses for the first model
    model_id_1 = "anthropic.claude-instant-v1"
    results_df_1 = await generate_responses(model_id_1, region_id, kb_id, questions)

    # Generate responses for the second model
    model_id_2 = "anthropic.claude-v2"
    results_df_2 = await generate_responses(model_id_2, region_id, kb_id, questions)

    # Combine results
    combined_results = pd.concat([results_df_1, results_df_2])

    # Calculate and print average latency for each model
    avg_latency_1 = results_df_1["Inference Latency (s)"].mean()
    avg_latency_2 = results_df_2["Inference Latency (s)"].mean()
    
    print(f"Average Latency for {model_id_1}: {avg_latency_1} seconds")
    print(f"Average Latency for {model_id_2}: {avg_latency_2} seconds")

    # Optionally, save combined results
    combined_results.to_csv("combined_query_responses_latency.csv", index=False)

# Run the main function
await main()
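
An average alone can hide slow outliers, so it may help to report the median and a tail percentile per model as well. The sketch below uses only the standard library; `summarize_latencies` is a hypothetical helper, and the choice of the 95th percentile with the inclusive quantile method is an assumption rather than part of the original benchmark.

```python
import statistics

def summarize_latencies(latencies):
    """Return count, mean, median, 95th percentile, and max of latency values in seconds."""
    values = sorted(latencies)
    if not values:
        raise ValueError("latencies must not be empty")
    if len(values) == 1:
        # quantiles() needs at least two data points
        p95 = values[0]
    else:
        # n=20 yields 19 cut points; the last one is the 95th percentile.
        # The inclusive method never extrapolates beyond the observed maximum.
        p95 = statistics.quantiles(values, n=20, method="inclusive")[18]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "p95": p95,
        "max": values[-1],
    }
```

Inside `main()`, it could be called as `summarize_latencies(results_df_1["Inference Latency (s)"])` for each model's results.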


## Mistral Instruct using LlamaIndex: retrieving on the same questions and the same data to benchmark and test

In [21]:
!pip install nvidia-smi

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com

In [22]:
# Install the packages used to load the PDF data for the Mistral RAG pipeline
!pip install pypdf
!pip install python-dotenv

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com

In [23]:
!pip -q install git+https://github.com/huggingface/transformers
!pip install -q datasets loralib sentencepiece
!pip install -q einops accelerate langchain bitsandbytes
!pip install sentence_transformers

# Install LlamaIndex
!pip install llama-index

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com

In [24]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.llms import HuggingFaceLLM

In [25]:
# Load the PDF documents from the medical_datapdf directory
documents = SimpleDirectoryReader("medical_datapdf").load_data()

In [26]:
# Define the system prompt and the query wrapper prompt

from llama_index.prompts.prompts import SimpleInputPrompt
system_prompt = "You are a medical assistant. You will answer all medical questions accurately based on the instructions and context you have."


# This will wrap the default prompts that are internal to llama-index.
# Mistral Instruct expects the [INST] ... [/INST] format.
query_wrapper_prompt = SimpleInputPrompt("[INST] {query_str} [/INST]")

In [33]:
import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    # Outside SageMaker, look up the execution role by name
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'mistralai/Mistral-7B-Instruct-v0.1',
    'SM_NUM_GPUS': json.dumps(1)
}

# Create the Hugging Face model class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="1.1.0"),
    env=hub,
    role=role,
)

# Deploy the model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
)

# Send a test request
predictor.predict({
    "inputs": "My name is Julien and I like to",
})

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml
INFO:sagemaker.image_uris:Defaulting to only available Python version: py39
INFO:sagemaker.image_uris:Defaulting to only supported image scope: gpu.
INFO:sagemaker:Creating model with name: huggingface-pytorch-tgi-inference-2023-12-28-17-52-08-640
INFO:sagemak

[{'generated_text': "My name is Julien and I like to play guitar. I'm a beginner and I'm trying to learn how to play."}]

### Creating embeddings using the Hugging Face all-mpnet-base-v2 sentence-transformers model

In [43]:
!pip install pdfplumber

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting pdfplumber
  Downloading pdfplumber-0.10.3-py3-none-any.whl.metadata (38 kB)
Collecting pdfminer.six==20221105 (from pdfplumber)
  Downloading pdfminer.six-20221105-py3-none-any.whl (5.6 MB)
Collecting pypdfium2>=4.18.0 (from pdfplumber)
  Downloading pypdfium2-4.25.0-py3-none-manylinux_2_17_x86_64.whl.metadata (47 kB)
Downloading pdfplumber-0.10.3-py3-none-any.whl (48 kB)
Downloading pypdfium2-4.25.0-py3-none-manylinux_2_17_x86_64.whl (3.0 MB)

In [44]:
import pdfplumber

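def extract_text_from_pdf(pdf_path):
    # Join the text of every page; extract_text() returns None for pages
    # without extractable text, so fall back to an empty string
    with pdfplumber.open(pdf_path) as pdf:
        return '\n'.join(page.extract_text() or '' for page in pdf.pages)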

pdf_text = extract_text_from_pdf('medical_datapdf/ERP-2008-chapter4 (1).pdf')


In [46]:
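query = "What are trends in health insurance coverage?"

# The TGI container expects "inputs" to be a single prompt string, so the
# question and the PDF context are combined using Mistral's [INST] format.
# Assumption: the context is cut to 3000 characters because the full chapter
# is likely to exceed the endpoint's input limit; adjust the cut-off as needed.
context = pdf_text[:3000]
prompt = f"<s>[INST] Answer the question using only the context below.\n\nContext:\n{context}\n\nQuestion: {query} [/INST]"

data_for_model = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 256}
}

In [49]:
# Send the request to the SageMaker endpoint; the predictor serializes the dict to JSON itself
response = predictor.predict(data_for_model)

print(response)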

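To extend the benchmark, run the same `questions` dictionary used for the Bedrock knowledge base through this endpoint and record each response and its latency, so that both setups are measured on identical inputs.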
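
One way to do this is a small harness that times any function mapping a question to an answer. This is a sketch under assumptions: `benchmark` and `answer_fn` are hypothetical names, and the output keys simply mirror the column names used for the Bedrock results.

```python
import time

def benchmark(answer_fn, questions, model_label):
    """Run answer_fn on every question and record the response and its latency.

    answer_fn: callable taking a question string and returning the answer text.
    questions: dict mapping question IDs to question strings.
    """
    rows = []
    for qid, question in questions.items():
        start = time.perf_counter()
        answer = answer_fn(question)
        latency = time.perf_counter() - start
        rows.append({
            "Question ID": qid,
            "Question": question,
            "Response": answer,
            "Inference Latency (s)": latency,
            "Model": model_label,
        })
    return rows
```

For the SageMaker endpoint, `answer_fn` could be something like `lambda q: predictor.predict({"inputs": f"[INST] {q} [/INST]"})[0]["generated_text"]`, and the returned rows can be passed to `pd.DataFrame` for comparison with the Bedrock results.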