# Retrieval-Augmented Generation: Question Answering based on Custom Dataset with Open-sourced [LangChain](https://python.langchain.com/en/latest/index.html) Library



In this notebook we demonstrate how to use multiple large language models, such as **Falcon 7b** and **Llama-2 7b Chat**, to answer questions using a library of documents as a reference, by means of document embeddings and retrieval. The embeddings are generated by the **GPT-J-6B** embedding model.

**This notebook serves as a template: you can replace the example dataset with your own to build a custom question-answering application.**

## Use RAG based approach with [LangChain](https://python.langchain.com/en/latest/index.html) and SageMaker endpoints to build a simplified question and answering application.


We plan to use document embeddings to fetch the most relevant documents in our document knowledge library and combine them with the prompt that we provide to LLM.

To achieve that, we will do the following:

1. **Generate embeddings for each document in the knowledge library with the SageMaker GPT-J-6B embedding model.**
2. **Identify top K most relevant documents based on user query.**
    - 2.1 **For a query of your interest, generate the embedding of the query using the same embedding model.**
    - 2.2 **Search the indexes of top K most relevant documents in the embedding space using in-memory Faiss search.**
    - 2.3 **Use the indexes to retrieve the corresponding documents.**
3. **Combine the retrieved documents with prompt and question and send them into SageMaker LLM.**
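
Steps 2.1 to 2.3 amount to a nearest-neighbour lookup. The following is a minimal sketch in plain Python, assuming hand-written 3-dimensional vectors in place of real GPT-J embeddings and a brute-force L2 search in place of Faiss. The helper `top_k_documents` and the sample data are hypothetical and only illustrate the idea.

```python
import math
from typing import List, Tuple


def l2_distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_documents(
    query_vec: List[float],
    doc_vecs: List[List[float]],
    docs: List[str],
    k: int = 2,
) -> List[Tuple[int, str]]:
    """Return (index, document) pairs for the k vectors closest to the query."""
    ranked = sorted(range(len(doc_vecs)), key=lambda i: l2_distance(query_vec, doc_vecs[i]))
    return [(i, docs[i]) for i in ranked[:k]]


# Toy data: hand-written vectors, not real embeddings.
docs = ["spot training FAQ", "pricing FAQ", "notebook FAQ"]
doc_vecs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.1, 0.0, 0.9]]
query_vec = [1.0, 0.0, 0.0]

nearest = top_k_documents(query_vec, doc_vecs, docs, k=2)
print(nearest)
```

In the notebook itself the same lookup is delegated to the FAISS vector store, which scales far better than this brute-force loop.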



Note: The retrieved document/text should be large enough to contain enough information to answer a question; but small enough to fit into the LLM prompt -- maximum sequence length of 1024 tokens. 
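
One way to respect that limit is to add retrieved passages to the prompt only until a token budget is used up. The sketch below is a rough illustration: it assumes a whitespace word count as a stand-in for the model's real tokenizer (which typically produces more tokens than words), and `fit_to_budget` is a hypothetical helper, not part of LangChain or SageMaker.

```python
from typing import List


def approx_token_count(text: str) -> int:
    """Crude token estimate: number of whitespace-separated words."""
    return len(text.split())


def fit_to_budget(passages: List[str], max_tokens: int) -> List[str]:
    """Keep passages in ranked order until the next one would exceed the budget."""
    kept: List[str] = []
    used = 0
    for passage in passages:
        cost = approx_token_count(passage)
        if used + cost > max_tokens:
            break
        kept.append(passage)
        used += cost
    return kept


# Toy passages, already ordered from most to least relevant.
passages = ["one two three", "four five", "six seven eight nine"]
selected = fit_to_budget(passages, max_tokens=5)
print(selected)
```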

---
To build a simplified QA application with LangChain, we need to:
1. Wrap our SageMaker endpoints for the embedding model and the LLM in `langchain.embeddings.SagemakerEndpointEmbeddings` and `langchain.llms.sagemaker_endpoint.SagemakerEndpoint`. This requires a small override of the `SagemakerEndpointEmbeddings` class to make it compatible with the SageMaker embedding model.
2. Prepare the dataset to build the knowledge database.

---

## Step 1. Deploy large language model (LLM) and embedding model in SageMaker JumpStart

To better illustrate the idea, let's first deploy all the models required for the demo. You can either deploy all the inference models as LLMs to compare their performance, or select a **subset** of the models based on your preference. To do that, modify the `_MODEL_CONFIG_` Python dictionary.

In [None]:
!pip install --upgrade pip

In [None]:
!pip install --upgrade sagemaker --quiet
!pip install ipywidgets==7.0.0 --quiet
!pip install langchain==0.0.148 --quiet
!pip install faiss-cpu --quiet

In [None]:
!pip install transformers -q

In [None]:
!pip install langchain -q

In [None]:
#import the required libraries
import time
import sagemaker, boto3, json
from sagemaker.session import Session
from sagemaker.model import Model
from sagemaker import image_uris, model_uris, script_uris, hyperparameters
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base
from typing import Any, Dict, List, Optional
from langchain.embeddings import SagemakerEndpointEmbeddings
from langchain.llms.sagemaker_endpoint import ContentHandlerBase

sagemaker_session = Session()
aws_role = sagemaker_session.get_caller_identity_arn()
aws_region = boto3.Session().region_name
sess = sagemaker.Session()
model_version = "*"

Deploy SageMaker endpoint(s) for the large language models and the GPT-J 6B embedding model. Uncomment the entries below if you want to deploy multiple LLMs to compare their performance.

In [None]:
_MODEL_CONFIG_ = {
     #"huggingface-text2text-flan-t5-xxl": {
     #    "instance type": "ml.g5.12xlarge",
     #    "env": {"SAGEMAKER_MODEL_SERVER_WORKERS": "1", "TS_DEFAULT_WORKERS_PER_MODEL": "1"},
     #    "parse_function": parse_response_model_flan_t5,
     #    "prompt": """Answer based on context:\n\n{context}\n\n{question}""",
     #    "endpoint_name": "yoar-d3-rag-huggingface-text2text-flan--2023-07-17-15-04-45-378",
     #    "input_key":"text_inputs",
     #},
    "huggingface-textembedding-gpt-j-6b": {
       "instance type": "ml.g5.12xlarge",
        "env": {"SAGEMAKER_MODEL_SERVER_WORKERS": "1", "TS_DEFAULT_WORKERS_PER_MODEL": "1"},
        "endpoint_name":"agupta-d3-rag-huggingface-textembedding-2023-07-31-14-05-13-066",
       
        
    },
    #"huggingface-llm-falcon-40b-instruct-bf16": {
    #    "instance type": "ml.g5.12xlarge",
    #    "env": {"SAGEMAKER_MODEL_SERVER_WORKERS": "1", "TS_DEFAULT_WORKERS_PER_MODEL": "1"},
    #    "parse_function": parse_response_model_falcon,
    #    "endpoint_name":"jumpstart-dft-hf-llm-falcon-40b-instruct-bf16-1",
    #   "prompt": """Please answer the question below based on this context and  If you cannot find reference for the question in the context, please answer that you Dont know:\n\n{context}\n\n{question}""",
    #    "input_key": "inputs"
    #},
    
    "meta-textgeneration-llama-2-7b": {
        "instance type": "ml.g5.2xlarge",
        "env": {"SAGEMAKER_MODEL_SERVER_WORKERS": "1", "TS_DEFAULT_WORKERS_PER_MODEL": "1"},
        "endpoint_name":"jumpstart-dft-agupta-meta-textgeneration-llama-2-7b-f",
        "prompt": """Please answer the question below based on this context:\n\n{context}\n\n{question}""",
    },
    
    
    # "huggingface-llm-falcon-7b-instruct-bf16": {
    #     "instance type": "ml.g5.12xlarge",
    #     "env": {"SAGEMAKER_MODEL_SERVER_WORKERS": "1", "TS_DEFAULT_WORKERS_PER_MODEL": "1"},
    # },
    # "huggingface-textgeneration1-bloomz-7b1-fp16": {
    #     "instance type": "ml.g5.12xlarge",
    #     "env": {},
    #     "parse_function": parse_response_multiple_texts_bloomz,
    #     "prompt": """question: \"{question}"\\n\nContext: \"{context}"\\n\nAnswer:""",
    # },
    # "huggingface-text2text-flan-ul2-bf16": {
    #     "instance type": "ml.g5.24xlarge",
    #     "env": {
    #         "SAGEMAKER_MODEL_SERVER_WORKERS": "1",
    #         "TS_DEFAULT_WORKERS_PER_MODEL": "1"
    #     },
}

In [None]:
newline, bold, unbold = "\n", "\033[1m", "\033[0m"

# **<span style="color:red">Do not run the below block if the models are already deployed</span>.**

In [None]:

for model_id in _MODEL_CONFIG_:
    endpoint_name = name_from_base(f"agupta-d3-rag-{model_id}")
    inference_instance_type = _MODEL_CONFIG_[model_id]["instance type"]

    # Retrieve the inference container uri. This is the base HuggingFace container image for the default model above.
    deploy_image_uri = image_uris.retrieve(
        region=None,
        framework=None,  # automatically inferred from model_id
        image_scope="inference",
        model_id=model_id,
        model_version=model_version,
        instance_type=inference_instance_type,
    )
    # Retrieve the model uri.
    model_uri = model_uris.retrieve(
        model_id=model_id, model_version=model_version, model_scope="inference"
    )
    print("Setting up")
    model_inference = Model(
        image_uri=deploy_image_uri,
        model_data=model_uri,
        role=aws_role,
        predictor_cls=Predictor,
        name=endpoint_name,
        env=_MODEL_CONFIG_[model_id]["env"],
    )
    print("Deploy begin")
    model_predictor_inference = model_inference.deploy(
        initial_instance_count=1,
        instance_type=inference_instance_type,
        predictor_cls=Predictor,
        endpoint_name=endpoint_name,
    )
    print(f"{bold}Model {model_id} has been deployed successfully.{unbold}{newline}")
    _MODEL_CONFIG_[model_id]["endpoint_name"] = endpoint_name


## Step 2: Ask the LLM a question without providing context

To illustrate why we need a retrieval-augmented generation (RAG) approach to question answering, let's ask the model a question directly and see how it responds.

#### Llama2 Chat: 7b 

In [None]:
# function to create a payload for the Llama-2 Chat Model
def create_payload(query=None,context=None):
    if context and query:
        prompt = """Context is\n\n{context}\n\nQuestion is:\n\n{question}"""
        text_input = prompt.replace("{context}", context)
        text_input = text_input.replace("{question}", query)
        system_content="""You are an expert who answers questions only 
        from the context being provided and use your expertise to extract a relevant and correct answer""" 
    elif query:
        text_input = query
        system_content="You are a chat bot who answers questions"
    else:
        text_input = ""  # or you can set it to None or some default value
        system_content="You are a chat bot who answers questions"
        

    payload = {
        "inputs": [
          [
           {"role": "system", "content": system_content},
           {"role": "user", "content": text_input}
          ]
        ],
        "parameters":{
            "max_new_tokens": 1000,
            # "return_full_text": False,
            # "do_sample": False,
            # "top_k":5
        }
    }
    
    return payload

In [None]:
# query function for the Llama-2 7b Chat model

endpoint_name = _MODEL_CONFIG_["meta-textgeneration-llama-2-7b"]["endpoint_name"]

def query_endpoint(payload):
    client = boto3.client('runtime.sagemaker')
    response = client.invoke_endpoint(EndpointName=endpoint_name, ContentType='application/json', Body=json.dumps(payload).encode('utf-8'),CustomAttributes='accept_eula=true')
    model_predictions = json.loads(response['Body'].read())
    #print(model_predictions)
    generated_texts = model_predictions[0]['generation']
    generated_text=generated_texts['content']
    print (
        f"{bold}{generated_text}{unbold}{newline}")
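
The parsing step can be tried without a live endpoint. The sketch below assumes the response shape used in `query_endpoint` (a list whose first item holds a `generation` dict with a `content` field) and feeds it a hand-written body. The helper `extract_generated_text` and the sample text are hypothetical.

```python
import io
import json


def extract_generated_text(body) -> str:
    """Read a JSON response body and return the assistant's text."""
    predictions = json.loads(body.read())
    return predictions[0]["generation"]["content"]


# Hand-written stand-in for response['Body'], a file-like object that yields bytes.
fake_body = io.BytesIO(
    json.dumps([{"generation": {"role": "assistant", "content": "Hello"}}]).encode("utf-8")
)
text = extract_generated_text(fake_body)
print(text)
```

Returning the text instead of printing it makes the function easier to reuse, for example when collecting answers into a DataFrame.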


In [None]:
#Llama 2 7b Chat
question="Which instances can I use with Managed Spot Training in SageMaker?"
payload=create_payload(question)
query_endpoint(payload)

You can see the generated answer is wrong or doesn't make much sense. 

## Step 3: Improve the answer to the same question using **prompt engineering** with insightful context


To answer the question better, we provide extra contextual information, combine it with a prompt, and send it to the model together with the question. Below is an example.

In [None]:
#Answering based on context with LLama2 7B chat model

question="Which instances can I use with Managed Spot Training in SageMaker?"
context="""Managed Spot Training can be used with all instances supported in Amazon SageMaker. Managed Spot Training is supported in all AWS Regions where Amazon SageMaker is currently available."""
payload=create_payload(question,context)

query_endpoint(payload)

The output above shows that the chance of getting a correct response depends strongly on the context you send to the LLM.

**<span style="color:red">Now the question becomes: where can I find relevant context for a given user query? The answer is to use a pre-stored knowledge base with retrieval-augmented generation, as shown below.</span>**


## Step 4:  Use RAG based approach with [LangChain](https://python.langchain.com/en/latest/index.html) and SageMaker endpoints to build a simplified question and answering application.

### Step 4.1: Wrap Sagemaker endpoints for embedding and inference models 

To use the SageMaker LLM endpoint with LangChain, we use langchain.llms.sagemaker_endpoint.SagemakerEndpoint, which abstracts the SageMaker LLM endpoint. We need to perform a transformation for the request and response payload as shown in the following code for the LangChain SageMaker integration. Note that you may need to adjust the code in ContentHandler based on the content_type and accepts format of the LLM model that you choose to use.

Next, wrap the SageMaker endpoint for the embedding model in `langchain.embeddings.SagemakerEndpointEmbeddings`. This requires a small override of the `SagemakerEndpointEmbeddings` class to make it compatible with the SageMaker embedding model.

In [None]:
from langchain.embeddings.sagemaker_endpoint import EmbeddingsContentHandler


class SagemakerEndpointEmbeddingsJumpStart(SagemakerEndpointEmbeddings):
    def embed_documents(self, texts: List[str], chunk_size: int = 5) -> List[List[float]]:
        """Compute doc embeddings using a SageMaker Inference Endpoint.

        Args:
            texts: The list of texts to embed.
            chunk_size: The chunk size defines how many input texts will
                be grouped together as request. If None, will use the
                chunk size specified by the class.

        Returns:
            List of embeddings, one for each text.
        """
        results = [] # To store the results of embeddings
        if not texts:
            return results  # nothing to embed; also avoids a zero step in range()
        _chunk_size = min(chunk_size, len(texts))

        for i in range(0, len(texts), _chunk_size):
            response = self._embedding_func(texts[i : i + _chunk_size])
            results.extend(response)
        return results


class ContentHandler(EmbeddingsContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs={}) -> bytes:
        # Converts input string and model arguments to JSON and encodes it as bytes
        input_str = json.dumps({"text_inputs": prompt, **model_kwargs})
        return input_str.encode("utf-8")

    def transform_output(self, output: bytes) -> List[List[float]]:
        # Decodes the JSON response from the model and extracts embeddings
        response_json = json.loads(output.read().decode("utf-8"))
        embeddings = response_json["embedding"]
        return embeddings


content_handler = ContentHandler()

embeddings = SagemakerEndpointEmbeddingsJumpStart(
    endpoint_name=_MODEL_CONFIG_["huggingface-textembedding-gpt-j-6b"]["endpoint_name"],
    region_name=aws_region,
    content_handler=content_handler,
)
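
The `chunk_size` argument controls how many texts go into each endpoint request. The sketch below shows the same batching logic in isolation, with a hypothetical `fake_embed` function standing in for the SageMaker call so that it runs without an endpoint.

```python
from typing import Callable, List


def embed_in_batches(
    texts: List[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    chunk_size: int = 5,
) -> List[List[float]]:
    """Send texts to embed_batch in groups of at most chunk_size and collect the results."""
    results: List[List[float]] = []
    step = min(chunk_size, len(texts)) or 1  # avoid a zero step for empty input
    for start in range(0, len(texts), step):
        results.extend(embed_batch(texts[start : start + step]))
    return results


calls: List[int] = []  # records the size of each request


def fake_embed(batch: List[str]) -> List[List[float]]:
    """Stand-in for the endpoint: returns a 1-d 'embedding' holding the text length."""
    calls.append(len(batch))
    return [[float(len(text))] for text in batch]


vectors = embed_in_batches(
    ["a", "bb", "ccc", "dddd", "eeeee", "ffffff", "g"], fake_embed, chunk_size=3
)
print(calls, vectors)
```

Seven texts with `chunk_size=3` result in three requests; a larger chunk size means fewer round trips but bigger payloads per request.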

Next, we wrap our SageMaker endpoint for Llama 2 in `langchain.llms.sagemaker_endpoint.SagemakerEndpoint`.

#### **<span style="color:red">The below block only works for Llama2 Chat. If you want to wrap any other LLM, please make the necessary changes to the Content Handler Class</span>.** 

In [None]:
from langchain.llms.sagemaker_endpoint import LLMContentHandler, SagemakerEndpoint


class ContentHandler(LLMContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs: dict) -> bytes:
        input_str = json.dumps({"inputs" : [[{"role" : "system",
        "content" : """You are a helpful, respectful and honest MBA Graduate Teaching Assistant. 
        Always answer as helpfully as possible, while being safe.  
        Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. 
        Please ensure that your responses are socially unbiased and positive in nature.
        If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. 
        If you don't know the answer to a question, please don't share false information."""},
                                             
        {"role" : "user", "content" : prompt}]],
        "parameters" : {**model_kwargs}})
        return input_str.encode('utf-8')
    
    def transform_output(self, output: bytes) -> str:
        response_json = json.loads(output.read().decode("utf-8"))
        return response_json[0]["generation"]["content"]
    


#### Instantiate a LangChain SageMaker Endpoint Object

In [None]:
#content handler class for LLama2 7B Chat Model

parameters={ "max_new_tokens": 1500, 
            "top_p": 0.9, 
            "temperature": 0.6
            
    }

content_handler = ContentHandler()

llm=SagemakerEndpoint(
     endpoint_name=_MODEL_CONFIG_["meta-textgeneration-llama-2-7b"]["endpoint_name"], 
     region_name=aws_region, 
     model_kwargs=parameters,
     endpoint_kwargs={"CustomAttributes": 'accept_eula=true'},
     content_handler=content_handler
 )

#### Create a Prompt Template

In [None]:
from langchain import PromptTemplate
template = "{content}"

prompt = PromptTemplate.from_template(template)

#### <b>Combine your SageMaker endpoint and prompt template to create an LLM chain</b>

The most basic type of chain in LangChain is the LLM chain, which combines an LLM with a prompt template. An LLM chain is instantiated with details related to your LLM and the prompt template you would like to use. You can then run the LLM chain by passing it text. The LLM chain will format that text based on the associated prompt template, and then pass the formatted text to the LLM, and provide the response of the LLM back to you.
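
To make the two steps concrete, here is a toy sketch of what such a chain does: fill in the template, then call the model. `MiniChain` and `echo_llm` are hypothetical stand-ins written for this illustration; they are not LangChain classes and no endpoint is called.

```python
from typing import Callable


class MiniChain:
    """Toy stand-in for an LLM chain: fill a template, then call the model."""

    def __init__(self, llm: Callable[[str], str], template: str) -> None:
        self.llm = llm
        self.template = template

    def run(self, **variables: str) -> str:
        prompt = self.template.format(**variables)  # step 1: format the prompt
        return self.llm(prompt)  # step 2: send it to the model


def echo_llm(prompt: str) -> str:
    """Hypothetical model that just reports what it received."""
    return f"MODEL SAW: {prompt}"


chain = MiniChain(llm=echo_llm, template="Question: {content}")
answer = chain.run(content="What is a balance sheet?")
print(answer)
```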

In [None]:
from langchain import LLMChain
llm_chain = LLMChain(
     llm=llm,
     prompt=prompt
 )

In [None]:
result=llm_chain.run("What factors do you think are important for a company to consider when determining the appropriate amount of leverage (debt) to use? How do lenders and borrowers view this factor differently?")
print(result)

#### <b>Test the LLM hosted on the SageMaker Endpoint</b>

In [None]:
result=llm_chain.run("What is a balance sheet?")
print(result)

## Step 4.2: Ingesting the knowledge database

### Initiate a boto3 client to connect to S3 for getting the data

In [None]:
import boto3
import os

In [None]:
def load_s3_data(bucket,s3_dir,local_dir):
    s3 = boto3.client('s3') #Configure AWS Credentials using AWS CLI

    bucket_name = bucket
    prefix = s3_dir
    local_directory = local_dir #specify the directory where you want to store the data from the s3 bucket

    paginator = s3.get_paginator('list_objects_v2')

    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        for obj in page.get('Contents', []):  # 'Contents' is absent when a page has no objects
            if obj['Key'].endswith('/') or obj['Key'].endswith('.DS_Store'):  # Skip if the key is a directory or a .DS_Store file.
                continue
            target = os.path.join(local_directory, os.path.relpath(obj['Key'], prefix))

            # make sure all necessary directories exist
            os.makedirs(os.path.dirname(target), exist_ok=True)

            # download file
            s3.download_file(bucket_name, obj['Key'], target)

In [None]:
#Uncomment the below lines if data has not been loaded from S3 into the local directory yet. 
#load_s3_data("d3-generative-ai","data/processed/curated_data/", "../accounting_data/") #load accounting data

In [None]:
!pip install unstructured

In [None]:
from langchain.document_loaders import DirectoryLoader

In [None]:
#Specify the path of the folder containing the data
directory="../accounting_data"

In [None]:
loader = DirectoryLoader(directory)

In [None]:
documents = loader.load()

In [None]:
len(documents)

## Step 4.3: Feeding data into the Vector Database and building the context-based Question Answering Application

In [None]:
!pip install tokenizers
!pip install tiktoken -q

In [None]:
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import Chroma, AtlasDB, FAISS
from langchain.text_splitter import CharacterTextSplitter
from langchain import PromptTemplate
from langchain.chains.question_answering import load_qa_chain

In [None]:
# split the documents into chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
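
As a rough illustration of what chunking with overlap means, the sketch below cuts a string into fixed-size windows. It is a simplification and not the algorithm used by LangChain's `CharacterTextSplitter`, which splits on a separator first; `split_with_overlap` is a hypothetical helper.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[start : start + chunk_size] for start in range(0, len(text), step)]


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)
```

Each window starts `chunk_size - chunk_overlap` characters after the previous one, so neighbouring chunks repeat a little text and a sentence cut at a boundary still appears whole in one of them more often.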

In [None]:
# First, generate embeddings for each document in the knowledge library with the SageMaker GPT-J-6B embedding model
docsearch = FAISS.from_documents(docs, embeddings)

In [None]:
# Note: the retriever wraps the FAISS index and returns the k most similar chunks for a query (k=10 in the next cell).

In [None]:
# expose the index in a retriever interface
retriever = docsearch.as_retriever(search_type="similarity", search_kwargs={"k":10})

In [None]:
#create a prompt template for generating questions on the list of summaries
from langchain.prompts import PromptTemplate

prompt_template = """Generate 10 questions from the provided context for an accounting exam on these topics: {question}\n Context is: \n{context}"""
Question_Prompt = PromptTemplate.from_template(prompt_template)

In [None]:
prompt_template="""
Use the following pieces of context to answer the question at the end. 
If you don't know the answer, just say that you don't know, don't try to make up an answer.
\n\n{context}\n\n
Question: Generate 10 questions from the provided context for an accounting exam on these topics: {question}
\nHelpful Answer:
"""
Question_Prompt = PromptTemplate.from_template(prompt_template)

In [None]:
# create a chain to generate questions
q = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=retriever, return_source_documents=True,chain_type_kwargs={"prompt":Question_Prompt})

In [None]:
#print out the template of the question answering chain
print(q.combine_documents_chain.llm_chain.prompt.template)

### Extracting the list of summaries which will be used to iterate over the chunks of documents

In [None]:
#Uncomment the below line to load summary excel file from s3
#load_s3_data("d3-generative-ai","data/processed/Summary/", "../Summary/") #Load the summary excel sheet

In [None]:
import pandas as pd

excel_file = '../Summary/Summary_Per_Class.xlsx'

# Read the Excel sheet into a DataFrame
data = pd.read_excel(excel_file)

In [None]:
#print out the datatypes of the columns in the dataframe
data.dtypes

In [None]:
#drop the null values
data = data.dropna()

In [None]:
#convert the Summary column from Object type to string
data['Summary'] = data['Summary'].astype('string')

In [None]:
data.dtypes

#### The Summary column holds comma-separated values. To iterate over the data, we create a new list of summaries with up to four values each. Each group of comma-separated values corresponds to one topic/term.
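
The grouping can be tried on a single string first. In this small sketch `group_terms` is a hypothetical helper and the sample terms are made up for illustration; they are not taken from the summary file.

```python
from typing import List


def group_terms(summary: str, group_size: int = 4) -> List[str]:
    """Split a comma-separated string and re-join the pieces in groups of group_size."""
    terms = summary.split(",")
    return [",".join(terms[i : i + group_size]) for i in range(0, len(terms), group_size)]


groups = group_terms("assets,liabilities,equity,revenue,expenses,cash flow")
print(groups)
```

The last group can hold fewer than four values when the number of terms is not a multiple of four.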

In [None]:
# Initialize a list to store grouped values
summaries = []

# Process each row in the DataFrame
for index, row in data.iterrows():
    comma_values = row['Summary'].split(',')  # split the Summary string on commas
    
    # Group the comma-separated values into chunks of four
    for i in range(0, len(comma_values), 4):
        # Join the values and append to the list
        summaries.append(','.join(comma_values[i:i+4]))

In [None]:
len(summaries)

### Generate the question by iterating over the summaries 

In [None]:
#create a function to add the generated questions with sources in a dataframe
def add_to_dataframe(df,s,i,sources):
    # Split the string by lines and filter the lines that start with numbered bullet points
    rows = [line.split('. ', 1)[-1] for line in s.split('\n') if line.strip() and line.split(' ')[0].replace('.', '').isdigit()]
    # Append rows to the 'Summary', 'Question' and 'Question_Sources' columns
    df = pd.concat([df, pd.DataFrame({'Summary': [i]*len(rows),'Question': rows,'Question_Sources': [sources]*len(rows)})], ignore_index=True)
    # Return the dataframe
    return df

### Testing with a subset of summaries

In [None]:
sub=summaries[:3]

In [None]:
sub

In [None]:
import re
df = pd.DataFrame(columns=['Summary','Question','Question_Sources'])
questions=[]
question_sources=[]
summary=[]
for i in sub:
        rows=[]
        result = q({"query": i})
        response = result['result']
        print(response)
        sources=result['source_documents']
        # Split the text into lines
        lines = response.split('\n')
        # Extract lines that contain a question mark
        rows = [line for line in lines if '?' in line]
        # Remove any leading formatting by keeping only the part of the line that starts with an uppercase or lowercase letter
        cleaned_rows = [re.sub(r'^[^a-zA-Z]*(?:Question\s+\d+)?[^a-zA-Z]*', '', row, flags=re.IGNORECASE) for row in rows]
        if cleaned_rows:
            for row in cleaned_rows:
                questions.append(row)
                summary.append(i)
                question_sources.append(sources)
        else:
            questions.append(response)
            summary.append(i)
            question_sources.append(sources)

df['Summary'] = summary
df['Question'] = questions
df['Question_Sources']= question_sources
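
The line filtering and regex cleanup used in the loop can be checked on a hand-written response before calling the model. The sketch below reuses the same pattern; `extract_questions` and the sample text are hypothetical.

```python
import re
from typing import List

# Same pattern as in the loop: drop leading bullets, numbers and an optional "Question N" label.
LEADING_LABEL = re.compile(r'^[^a-zA-Z]*(?:Question\s+\d+)?[^a-zA-Z]*', flags=re.IGNORECASE)


def extract_questions(response: str) -> List[str]:
    """Keep lines containing '?' and strip their leading numbering."""
    lines = [line for line in response.split("\n") if "?" in line]
    return [LEADING_LABEL.sub("", line) for line in lines]


sample = (
    "Here are some questions:\n"
    "1. What is a balance sheet?\n"
    "Question 2: How is equity computed?\n"
    "- Why do ledgers matter?"
)
cleaned = extract_questions(sample)
print(cleaned)
```

Lines without a question mark, such as the introductory sentence, are dropped, which is why the loop falls back to storing the whole response when no line qualifies.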

In [None]:
df

In [None]:
df.shape

In [None]:
#store the dataframe in a csv
df.to_csv("questions.csv")

### Generate answer to the questions provided by the model

In [None]:
# create a new retriever or reuse the existing one to fetch chunks for answering the generated questions
retriever = docsearch.as_retriever(search_type="similarity", search_kwargs={"k":10})


In [None]:
# create a chain to answer questions 
qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=retriever, return_source_documents=True)

In [None]:
#print the template 
print(qa.combine_documents_chain.llm_chain.prompt.template)

In [None]:
#testing with a subset
df = df.iloc[0:5].copy()  # copy so new columns can be added without a SettingWithCopyWarning

In [None]:
# Create empty lists to store the results
results_col = []
response_times_col=[]
sources_col=[]

# Iterate through each question
for question in df['Question']:

    #  Measure the response time
    start_time = time.time()

    # Call the llm chain
    result = qa({"query": question})
    response = result['result']
    
    # Calculate the response time
    response_time = time.time() - start_time
    
    sources=result['source_documents']
    # Append the row data, response, and response time to the results list
    results_col.append(response)
    response_times_col.append(response_time)
    sources_col.append(sources)
    
    
df['Answer_With_Context'] = results_col
df['Response_Time_Answers_With_Context'] = response_times_col
df['Answer_Sources']=sources_col

In [None]:
df

In [None]:
#store the results in csv
df.to_csv("prompt_responses.csv")

### Generating Answers without Context

In [None]:
template = "{content}"
prompt = PromptTemplate.from_template(template)
llm_chain = LLMChain(
     llm=llm,
     prompt=prompt
 )
#Create empty lists to store the results
general_answers = []
response_times_col=[]

for question in df['Question']:
    #  Measure the response time
    start_time = time.time()
    # Call the llm chain
    response = llm_chain.run(question)

    # Calculate the response time
    response_time = time.time() - start_time

    # Append the row data, response, and response time to the results list
    general_answers.append(response)
    response_times_col.append(response_time)

#Add the lists as new columns in the dataframe
df['General_Answers'] = general_answers
df['Response_Times_General_Answers'] = response_times_col

In [None]:
df

In [None]:
#store the results in csv
df.to_csv("all_prompt_responses.csv")

## **<span style="color:red"> This marks the end of the notebook. The following blocks of code are part of the experimentation process </span>**

### Query Function for Falcon 40B Model

#### Falcon 40B Model

In [None]:
# function to create a payload for the Falcon40b Model
def create_payload_falcon(query=None,context=None):
    if context and query:
        prompt = """Please answer the question below based on the provided context and If you cannot find reference for the question in the context, 
        please answer that you Dont know:\n\nContext is: \n\n{context}\n\nQuestion is:\n\n{question}"""
        text_input = prompt.replace("{context}", context)
        text_input = text_input.replace("{question}", query)
    
    elif query:
        text_input = query
    else:
        text_input = ""  # or you can set it to None or some default value
        

    payload = {
    "inputs": text_input,
    "parameters":{
        "max_new_tokens": 100,
        # "return_full_text": False,
        # "do_sample": False,
        # "top_k":5
        }
    }
    
    return payload

In [None]:
#query function for falcon model

endpoint_name = 'jumpstart-dft-hf-llm-falcon-40b-instruct-bf16-1'

def query_endpoint_falcon(payload):
    client = boto3.client('runtime.sagemaker')
    response = client.invoke_endpoint(EndpointName=endpoint_name, ContentType='application/json', Body=json.dumps(payload).encode('utf-8'))
    model_predictions = json.loads(response['Body'].read())
    generated_text = model_predictions[0]['generated_text']
    print (
        f"{bold}{generated_text}{unbold}{newline}")


In [None]:
question="Which instances can I use with Managed Spot Training in SageMaker?"
payload=create_payload_falcon(question)
query_endpoint_falcon(payload)

**<span style="color:red">Running this section will override 'documents' variable from the above code. </span>**

### Documents in .csv format

Now, let's download the example data and prepare it for demonstration. We will use [Amazon SageMaker FAQs](https://aws.amazon.com/sagemaker/faqs/) as knowledge library. The data are formatted in a CSV file with two columns Question and Answer. We use the Answer column as the documents of knowledge library, from which relevant documents are retrieved based on a query. 


In [None]:

original_data = "s3://jumpstart-cache-prod-us-east-2/training-datasets/Amazon_SageMaker_FAQs/"

!mkdir -p rag_data
!aws s3 cp --recursive $original_data rag_data


If your data is saved in multiple subsets, the following code reads all files that end with `.csv` and concatenates them. Please ensure each `csv` file has the same format.

In [None]:

import glob
import os
import pandas as pd

all_files = glob.glob(os.path.join("rag_data/", "*.csv"))

df_knowledge = pd.concat(
    (pd.read_csv(f, header=None, names=["Question", "Answer"]) for f in all_files),
    axis=0,
    ignore_index=True,
)


Drop the `Question` column as it is not used in this demonstration.

In [None]:
df_knowledge.drop(["Question"], axis=1, inplace=True)

In [None]:
df_knowledge.head(5)

In [None]:
df_knowledge.to_csv("rag_data/processed_data.csv", header=False, index=False)

In [None]:
from langchain.document_loaders import CSVLoader

loader = CSVLoader(file_path="rag_data/processed_data.csv")

In [None]:
documents = loader.load()

### Alternate approach to creating a FAISS Index

### Method 3 : VectorstoreIndexCreator

`VectorstoreIndexCreator` exposes a higher-level interface that lets you get started in a few lines of code. The following code shows how the class is used to create a concise implementation of question answering with RAG. Next, we use the query method on the created index and pass the user's question and the SageMaker endpoint LLM. LangChain selects the top four closest documents (K=4) and passes the relevant context extracted from the documents to generate an accurate response.

In [None]:
index_creator = VectorstoreIndexCreator(
    vectorstore_cls=FAISS,
    embedding=embeddings,
    text_splitter=CharacterTextSplitter(chunk_size=800, chunk_overlap=50),
)

In [None]:
index = index_creator.from_loaders([loader])

In [None]:
question="What is a Balance Sheet"

In [None]:
question

In [None]:
index.query(question=question, llm=llm)

## **<span style="color:red">Run this section only if you want to use Pinecone vector database for testing with test data</span>** ##

### Testing Pinecone as our Vector database

In [None]:
!pip install pinecone-client -q

In [None]:
#importing libraries
import os
import pinecone
from langchain.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Pinecone
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import Chroma, AtlasDB, FAISS
from langchain.text_splitter import CharacterTextSplitter

In [None]:
#splitting the documents into chunks before storing in the database
from langchain.text_splitter import RecursiveCharacterTextSplitter
def split_docs(documents, chunk_size=1000, chunk_overlap=20):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    docs = text_splitter.split_documents(documents)
    return docs

docs = split_docs(documents)
print(len(docs))
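
The effect of `chunk_size` and `chunk_overlap` can be illustrated with a simplified fixed-width splitter. This is only a sketch: `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraphs and sentences, which the hypothetical `split_fixed` helper below does not attempt.

```python
def split_fixed(text, chunk_size=10, chunk_overlap=2):
    """Split text into windows of chunk_size characters overlapping by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

sample_chunks = split_fixed("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=2)
print(sample_chunks)
```

Each chunk repeats the last two characters of the previous one, so text that straddles a boundary still appears intact in at least one chunk when it is shorter than the overlap.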

In [None]:
# Check the dimensionality of the embeddings; the Pinecone index must be created with the same dimension.
query_result = embeddings.embed_query("Hello world")
len(query_result)

In [None]:
!pip install python-dotenv

In [None]:
import os
from dotenv import load_dotenv

In [None]:
load_dotenv()  # load variables from a local .env file, if one exists
pinecone_key = os.getenv("PINECONE_API_KEY")

Before running the following cell, you need to create an index on Pinecone. Provide a name for the index and the dimensionality of the embeddings being stored (the length computed above).

In [None]:
pinecone.init(
    api_key=pinecone_key,
    environment="us-west4-gcp-free"  # change the environment according to your Pinecone index
)

index_name = "qna"  # must match the name you provided when creating the index

index_p = Pinecone.from_documents(docs, embeddings, index_name=index_name)

In [None]:
#this function performs a search for chunks of documents which might be relevant to answer the question being asked.
def get_similar_docs(query, k=4, score=False):
    if score:
        similar_docs = index_p.similarity_search_with_score(query, k=k)
    else:
        similar_docs = index_p.similarity_search(query, k=k)
    return similar_docs

In [None]:
query="What are the objectives of accounting"
similar_docs = get_similar_docs(query,score=True)

In [None]:
similar_docs

In [None]:
context=""
for doc in similar_docs:
    # Extract the 'page_content' from the Document object
    page_content = doc[0].page_content

    # Append the 'page_content' to the context_variable
    context += page_content + "\n"

In [None]:
context

In [None]:
import tiktoken
def num_tokens_from_string(string: str, encoding_name: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

In [None]:
num_tokens=num_tokens_from_string(context,"cl100k_base")

In [None]:
num_tokens
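
Because the retrieved context has to fit within the model's prompt limit, it can help to cap the context at a token budget before building the payload. The sketch below assumes a whitespace split as a rough stand-in for a real tokenizer (the notebook counts tokens with `tiktoken`); the `trim_to_budget` helper and the budget value are hypothetical.

```python
def trim_to_budget(chunks, max_tokens, count_tokens=lambda s: len(s.split())):
    """Keep chunks in order until adding the next one would exceed max_tokens."""
    kept, used = [], 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > max_tokens:
            break
        kept.append(chunk)
        used += cost
    return "\n".join(kept), used

# Hypothetical retrieved chunks, ordered from most to least relevant.
toy_chunks = [
    "assets equal liabilities plus equity",
    "revenue minus expenses",
    "cash flow statement",
]
toy_context, toy_used = trim_to_budget(toy_chunks, max_tokens=8)
print(toy_used)
print(toy_context)
```

A tokenizer-based counter can be swapped in through the `count_tokens` argument, for example `lambda s: num_tokens_from_string(s, "cl100k_base")`.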

In [None]:
#Falcon 40b Model
query="What are the objectives of accounting"
payload=create_payload_falcon(query,context)
query_endpoint_falcon(payload)

In [None]:
#Llama 2 7b Chat (prompt2)
payload=create_payload(query,context)
query_endpoint(payload)

In [None]:
#Llama 2 7b Chat 
query="Give me a list of all question answer pairs from the provided context that capture all the information in the context."
payload=create_payload(query,context)
query_endpoint(payload)

In [None]:
#llama 2 7b Chat (prompt1)
query_endpoint(payload)

In [None]:
#llama2 13b chat(prompt1)
query_endpoint_13(payload)

In [None]:
#llama2 13b chat (prompt2)
payload=create_payload(query,context)
query_endpoint_13(payload)

In [None]:
prompt_template = """Answer based on context:\n\n{context}\n\n{question}"""

PROMPT = PromptTemplate(template=prompt_template, input_variables=["context", "question"])

In [None]:
chain = load_qa_chain(llm=sm_llm, prompt=PROMPT)
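
Conceptually, the prompt template is a string with named placeholders that are filled in at query time. Here is a minimal sketch using plain `str.format`, with hypothetical context and question values; LangChain's `PromptTemplate` builds validation and chain integration on top of this idea.

```python
template = "Answer based on context:\n\n{context}\n\n{question}"

def render_prompt(context, question):
    """Fill the two placeholders the way a prompt template would."""
    return template.format(context=context, question=question)

# Hypothetical values for illustration only.
rendered = render_prompt(
    context="A balance sheet lists assets, liabilities, and equity.",
    question="What does a balance sheet list?",
)
print(rendered)
```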

### Method 1: Load QA Chain

In this section, we show you an approach to implement RAG using SageMaker and LangChain. This approach offers the flexibility to configure top K parameters for a relevancy search in the documents. It also allows you to use the LangChain feature of prompt templates, which allow you to easily parameterize the prompt creation instead of hard coding the prompts.

In the following code, we explicitly use FAISS to index the embeddings generated for each document in the knowledge library with the SageMaker GPT-J-6B embedding model. Then we identify the top K most relevant documents for the user query (the code below uses K=10).

In [None]:
summary="""M&A deal, revenues per square foot, revenues/sf, roe, return on equity, dupont, adjusted dupont, ratio analysis"""

Based on the topic summary above, we then **identify the top K most relevant documents for the user query**.

In [None]:
similar_docs = docsearch.similarity_search(summary, k=10)

In [None]:
def get_context(documents):
    context = [doc.page_content for doc in documents]
    return context
context=get_context(similar_docs)

In [None]:
#define a method to count the number of tokens being retrieved from the documents

import tiktoken
def num_tokens_from_string(string: str, encoding_name: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens
num_of_tokens=0
for doc in similar_docs:
    num_of_tokens+=num_tokens_from_string(doc.page_content,"cl100k_base")
num_of_tokens

`load_qa_chain` provides the most generic interface for answering questions. It loads a chain that performs QA over the input documents you pass in, and it uses all of the text in those documents.

In [None]:
#Using chain_type=stuff
chain = load_qa_chain(llm=llm,chain_type="stuff")
question="Generate 10 questions from the context provided. Also provide answers to those questions from the context only."
result=chain.run(input_documents=similar_docs,question=question)

In [None]:
print(result)

In [None]:
#Using chain_type=map_reduce
chain = load_qa_chain(llm=llm,chain_type="map_reduce")
question="Generate 10 questions and answer pairs from the context provided."
result=chain.run(input_documents=similar_docs,question=question)
print(result)

In [None]:
#Using chain_type=refine
chain = load_qa_chain(llm=llm,chain_type="refine")
question="Generate 10 questions and answer pairs from the context provided."
result=chain.run(input_documents=similar_docs,question=question)
print(result)
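
The three `chain_type` values differ in how the documents reach the model. The sketch below mimics the control flow only. It assumes a hypothetical `fake_llm` callable that counts its invocations instead of calling a real endpoint, and the prompts are illustrative, not the ones LangChain uses.

```python
def fake_llm(prompt):
    """Hypothetical stand-in for an LLM call; counts how often it is invoked."""
    fake_llm.calls += 1
    return f"<answer to {len(prompt)} chars>"

fake_llm.calls = 0

def stuff(docs, question, llm):
    # One call: all documents are concatenated into a single prompt.
    return llm("\n".join(docs) + "\n\n" + question)

def map_reduce(docs, question, llm):
    # One call per document, then one call to combine the partial answers.
    partials = [llm(doc + "\n\n" + question) for doc in docs]
    return llm("\n".join(partials) + "\n\n" + question)

def refine(docs, question, llm):
    # Sequential calls: each further document updates the running answer.
    answer = llm(docs[0] + "\n\n" + question)
    for doc in docs[1:]:
        answer = llm(f"Existing answer: {answer}\nNew context: {doc}\n\n{question}")
    return answer

toy_docs = ["doc one", "doc two", "doc three"]
call_counts = {}
for strategy in (stuff, map_reduce, refine):
    fake_llm.calls = 0
    strategy(toy_docs, "What is covered?", fake_llm)
    call_counts[strategy.__name__] = fake_llm.calls
print(call_counts)
```

In this simplified model, `stuff` makes a single call with the longest prompt, `map_reduce` makes one call per document plus a combine call, and `refine` makes one call per document in sequence. The actual chains may issue additional calls, for example when intermediate results need to be collapsed further.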

### Method 2: RetrievalQA

The `RetrievalQA` chain uses `load_qa_chain` under the hood. We retrieve the most relevant chunks of text and feed them to the language model.

In [None]:
# expose the index in a retriever interface
retriever = docsearch.as_retriever(search_type="similarity", search_kwargs={"k":10})
# create a chain to answer questions 
qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=retriever, return_source_documents=True)
query = """Generate 10 questions from the provided context on these topics: 
M&A deal, revenues per square foot, revenues/sf, roe, return on equity, dupont, adjusted dupont, ratio analysis.
Also provide answers to those questions from the context only."""
result = qa({"query": query})
result_text = result['result']
print(result_text)

In [None]:
for source_doc in result['source_documents']:
    print(source_doc.page_content)
    print('\n')
    print('Source is')
    print(source_doc.metadata['source'])
    print('\n')

In [None]:
# expose the index in a retriever interface
retriever = docsearch.as_retriever(search_type="similarity", search_kwargs={"k":10})
# create a chain to answer questions 
qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=retriever, return_source_documents=True)
query = """Generate 10 questions from the provided context on these topics: 
M&A deal, revenues per square foot, revenues/sf, roe, return on equity, dupont, adjusted dupont, ratio analysis."""
result = qa({"query": query})
result_text = result['result']
print(result_text)

In [None]:
import csv

def slice_string_to_csv(s, filename):
    with open(filename, 'w', newline='') as file:
        writer = csv.writer(file)
        for line in s.split('\n'):
            # Check if the line starts with a numbered bullet point
            if line.strip() and line.split(' ')[0].replace('.', '').isdigit():
                writer.writerow([line])

In [None]:
filename="questions.csv"
slice_string_to_csv(result_text,filename)
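
As a quick check of which lines the filter keeps, the sketch below applies the same numbered-line test to a hypothetical model response and writes the rows to an in-memory buffer instead of a file. The `numbered_lines` helper and the sample text are illustrative only.

```python
import csv
import io

def numbered_lines(text):
    """Return lines whose first space-separated token is a number such as '1.' or '2'."""
    kept = []
    for line in text.split("\n"):
        if line.strip() and line.split(" ")[0].replace(".", "").isdigit():
            kept.append(line)
    return kept

# Hypothetical model output mixing numbered questions with other text.
sample_response = (
    "Here are the questions:\n"
    "1. What is ROE?\n"
    "2. What is DuPont analysis?\n"
    "- not numbered"
)
rows = numbered_lines(sample_response)

buffer = io.StringIO()
writer = csv.writer(buffer)
for row in rows:
    writer.writerow([row])
print(buffer.getvalue())
```

Note that a numbered line with leading spaces is skipped by this test, because its first space-separated token is empty. Stripping each line before the check would keep such lines.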

In [3]:
%run sagemaker_proc.py

sagemaker role arn: arn:aws:iam::275461957965:role/service-role/AmazonSageMaker-ExecutionRole-20230627T145146
sagemaker bucket: sagemaker-us-east-1-275461957965
sagemaker session region: us-east-1


INFO:sagemaker:Creating model with name: huggingface-pytorch-inference-2023-08-17-19-55-38-863
INFO:sagemaker:Creating transform job with name: huggingface-pytorch-inference-2023-08-17-19-56-17-000


transcript_data/transcript_2016_presentation.jsonl uploaded to s3://d3-data-bucket/labs/digital-value/project-ai-transformation-classifier/data/conference_call_data/data/transcript_2016_presentation.jsonl


ResourceLimitExceeded: An error occurred (ResourceLimitExceeded) when calling the CreateTransformJob operation: The account-level service limit 'ml.p3.2xlarge for transform job usage' is 1 Instances, with current utilization of 1 Instances and a request delta of 1 Instances. Please use AWS Service Quotas to request an increase for this quota. If AWS Service Quotas is not available, contact AWS support to request an increase for this quota.