### Introduction to Retrieval Augmented Generation (RAG)

In this section, we'll learn how Retrieval Augmented Generation (RAG) works. 

RAG combines the power of large language models with a retrieval mechanism to provide more accurate and contextually relevant responses. It enhances the capabilities of LLMs by incorporating external knowledge bases, allowing the model to pull in precise information as needed.
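To make the idea concrete, here is a minimal, hedged sketch of the retrieve-then-generate flow. It is not how Bedrock implements retrieval: the documents are made-up snippets, and simple word overlap stands in for the embedding-based search a real knowledge base would use. The names `DOCUMENTS`, `tokenize`, `retrieve`, and `build_augmented_prompt` are hypothetical.

```python
import re

# Toy knowledge base: hypothetical snippets, not taken from the tutorial's data.
DOCUMENTS = [
    "The Model X phone is rated IP68 for splash, water, and dust resistance.",
    "The Model X phone ships with a USB-C charging port.",
    "Our bakery opens at 7 am on weekdays.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=2):
    """Rank documents by how many words they share with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_augmented_prompt(question, documents, top_k=2):
    """Prepend the retrieved snippets to the question before calling an LLM."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, top_k))
    return f"Use only this context to answer.\n{context}\n\nQuestion: {question}"


prompt = build_augmented_prompt("Is the Model X phone splash resistant?", DOCUMENTS)
print(prompt)
```

The augmented prompt, rather than the bare question, is what would be sent to the model, so the answer can draw on facts the model was never trained on.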

We'll take the following steps:

1. **Set up the LLM**: Just like in the previous tutorial, we'll start by setting up the language model.
2. **Ask a Question to the LLM**: We'll pose a question to the LLM and observe how it responds using its internal knowledge.
3. **Set up the Knowledge Base**: We'll establish a knowledge base (as shown in the video) that the LLM can use to find relevant information.
4. **Retrieve Information**: We'll use the knowledge base to retrieve relevant information to answer questions more accurately.
5. **Evaluate Responses**: We'll use a small dataset to evaluate RAG responses using the metrics provided by the Ragas library.

Let's get started!


### Step 1: Installation and Setup

In [None]:
#Run only once and then restart kernel
%pip install -qU boto3 awscli botocore
%pip install ragas==0.1.21

In [None]:
# Restart the kernel so the newly installed packages are picked up
from IPython.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")

In [None]:
import json
import boto3
import pprint as pp
from botocore.exceptions import ClientError
from botocore.client import Config

from datasets import Dataset
from ragas import evaluate
import pandas as pd

from langchain_community.chat_models.bedrock import BedrockChat
from langchain_community.embeddings import BedrockEmbeddings

### Step 2: Create Bedrock Clients

Retrieval Augmented Generation (RAG) relies on a vector database to store context vectors and retrieve the most relevant ones when a query is made. Amazon Bedrock offers an integrated tool called Knowledge Bases, which simplifies this process. 
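As a rough illustration of what "retrieving the most relevant vectors" means, the sketch below ranks a few hand-written vectors by cosine similarity to a query vector. The three-dimensional vectors and the chunk names are invented for the example; a managed vector store uses real embeddings and approximate search, so treat this only as the underlying idea.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vector, index, k=2):
    """Return the k chunk ids whose vectors are most similar to the query."""
    scored = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [chunk_id for chunk_id, _ in scored[:k]]


# Hypothetical 3-dimensional embeddings; real embedding models produce
# vectors with hundreds or thousands of dimensions.
toy_index = {
    "water_resistance": [0.9, 0.1, 0.0],
    "battery_life": [0.1, 0.9, 0.1],
    "store_hours": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.0]
print(top_k_chunks(query, toy_index, k=2))
```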

Knowledge Bases for Amazon Bedrock is a fully managed capability that supports the entire RAG workflow, from data ingestion to retrieval and prompt augmentation. This eliminates the need for custom integrations with data sources and complex data flow management. Additionally, you can ask questions and summarize data from a single document without needing to set up a vector database.

You can learn more about Bedrock Knowledge Bases [here](https://aws.amazon.com/bedrock/knowledge-bases).

In this step, we'll create the boto3 clients we need: a `bedrock-runtime` client for invoking models directly and a `bedrock-agent-runtime` client for querying Knowledge Bases.


In [None]:
# Create a boto3 session and read the default region from it
boto3_session = boto3.session.Session()
region_name = boto3_session.region_name

# Client configuration for Knowledge Base calls: long timeouts, automatic retries disabled
bedrock_config = Config(connect_timeout=120, read_timeout=120, retries={'max_attempts': 0}, region_name=region_name)
# Runtime client for invoking models directly
chat_client = boto3.client(service_name='bedrock-runtime')
# Agent runtime client for querying Knowledge Bases
bedrock_agent_client = boto3_session.client("bedrock-agent-runtime",
                                            config=bedrock_config)


### Step 3: Choose your LLM

Bedrock offers a wide selection of LLMs to choose from. 

Here are the available options: https://docs.aws.amazon.com/bedrock/latest/userguide/models-supported.html

Let's start by testing Claude.


In [None]:
# We will use Anthropic Claude v2 throughout the notebook
model_id = "anthropic.claude-v2"
region_id = boto3_session.region_name  # the region is required to build the model ARN
model_arn = f'arn:aws:bedrock:{region_id}::foundation-model/{model_id}'
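
The ARN format above can be wrapped in a small helper. This is a sketch under assumptions: `foundation_model_arn` is a hypothetical name, and the check for a missing region is added because `boto3_session.region_name` is `None` when no default region is configured, which would otherwise produce a malformed ARN silently.

```python
def foundation_model_arn(region, model_id):
    """Build the ARN of a Bedrock foundation model.

    Foundation-model ARNs have an empty account-id field, which is why
    two colons follow the region.
    """
    if not region:
        raise ValueError("No AWS region configured; set one before building the ARN.")
    return f"arn:aws:bedrock:{region}::foundation-model/{model_id}"


print(foundation_model_arn("us-east-1", "anthropic.claude-v2"))
```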



### Step 4: Test your LLM
We'll use the function below, already seen in the previous tutorial, to test our LLM. It takes a prompt as input and returns the model's answer.
In this section, we ask the model about an iPhone 15 spec to help us make a better buying decision. Let's see what the model says.


In [None]:
def llm_answer_generator(question_with_prompt):
    """
    Invoke the Amazon Bedrock model with a fully formed prompt.
    :param question_with_prompt: The complete prompt, including the user's question, in the
    Human/Assistant format expected by the Claude text-completion API.
    :return: The model's answer to the user's question.
    """
    # Request body with the inference parameters passed to invoke_model
    # TODO: TUNE THESE PARAMETERS AS YOU SEE FIT
    body = json.dumps({"prompt": question_with_prompt,
                       "max_tokens_to_sample": 8191,
                       "temperature": 0,
                       "top_k": 250,
                       "top_p": 0.5,
                       "stop_sequences": []
                       })
    content_type = 'application/json'
    # Invoke the Bedrock model with the settings above
    response = chat_client.invoke_model(body=body,
                                        modelId=model_id,
                                        contentType=content_type)
    # Parse the JSON body of the response
    response_body = json.loads(response.get('body').read())
    # The generated text is in the 'completion' field
    answer = response_body.get('completion')
    # Return the answer, which is ultimately shown to the end user
    return answer
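
The response-parsing step can be tried without calling AWS. The sketch below uses a hand-built stand-in for the response: `fake_response` and `extract_completion` are hypothetical, and the payload only mimics the `completion` field that the function above reads. Note that the body is a stream, so it can be read only once.

```python
import io
import json


def extract_completion(response):
    """Read the streaming body of an invoke_model response and return the text."""
    payload = json.loads(response["body"].read())
    return payload.get("completion", "").strip()


# Hypothetical response, shaped like a Claude text-completion result.
fake_payload = {"completion": " I don't have that information.", "stop_reason": "stop_sequence"}
fake_response = {"body": io.BytesIO(json.dumps(fake_payload).encode("utf-8"))}
print(extract_completion(fake_response))
```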


In [None]:
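# Prompt the LLM. Claude's text-completion format expects the prompt to start
# with a blank line followed by "Human:" and to end with a blank line followed by "Assistant:".
messages = "\n\nHuman: Is the latest iPhone 15 splash resistant?\n\nAssistant:"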
print(llm_answer_generator(messages))
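
Because the text-completion format is easy to get wrong by hand, a small helper can build the prompt. This is a minimal sketch; `build_claude_prompt` is a hypothetical name, not part of any SDK.

```python
def build_claude_prompt(question):
    """Wrap a question in the Human/Assistant turns used by Claude text completions."""
    return f"\n\nHuman: {question.strip()}\n\nAssistant:"


prompt = build_claude_prompt("  Is the latest iPhone 15 splash resistant? ")
print(repr(prompt))
```

The result could then be passed to `llm_answer_generator` in place of a hand-written prompt string.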

### Step 5: Create a Knowledge Base in the UI

As noted before, the model doesn't have up-to-date information about the latest iPhone. Bedrock Knowledge Bases (KBs) let us ingest relevant documents so that questions like this can be answered accurately.
- *Follow the video and upload the folder data/kb_data to S3 to ingest all the knowledge sources required for this tutorial.*

- As described in Step 2, Knowledge Bases for Amazon Bedrock manage the full RAG workflow (ingestion, retrieval, and prompt augmentation) without custom integrations to data sources or management of data flows.
- KBs help create a repository of structured information that the model can retrieve and use to generate responses.
- Directions for creating and managing Knowledge Bases are provided in the videos.


### Step 6: Create Bedrock Guardrails in the UI
Bedrock Guardrails help establish safety controls and guidelines for generative AI applications, ensuring they adhere to ethical and policy-based standards.
Directions for creating and managing guardrails are provided in the videos.

### Step 7: Test the Knowledge Base Using the retrieve_and_generate() API
`retrieve_and_generate` in Bedrock fetches the chunks most relevant to a query from a Knowledge Base and passes them to the model, which uses them to generate a response grounded in that content.

Here's an explanation of the parameters of the `retrieve_and_generate` wrapper function defined below, and of the API configuration it builds:

- **query**: The input text or question for which you want to retrieve information.

- **kb_id**: The ID of the Knowledge Base (KB) to retrieve from.

- **model_arn**: The ARN (Amazon Resource Name) of the model that generates the response from the retrieved information.

- **max_results**: The maximum number of results (top N chunks) that the retrieval step fetches from the Knowledge Base.

- **retrievalConfiguration**: Specifies the configuration settings for the retrieval process:
  - **vectorSearchConfiguration**: Configures how the retrieval system searches for relevant chunks based on vector similarity. `numberOfResults` within this configuration is set from `max_results`.

- **generationConfiguration**: Controls how the response is generated. In the function below it carries the `guardrailConfiguration`, that is, the ID and version of the guardrail created in Step 6.
  - **promptTemplate**: An optional setting, not used in the function below, that defines the prompt into which the retrieved information is inserted before generation.
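
To show how these settings fit together, here is a hedged sketch that only assembles the configuration dictionary and makes no AWS call. The function name `build_rag_configuration`, the placeholder IDs, and the template wording are assumptions for illustration; the template relies on the `$search_results$` placeholder, which is where the retrieved chunks are expected to be substituted.

```python
def build_rag_configuration(kb_id, model_arn, max_results=5, prompt_template=None):
    """Assemble the retrieveAndGenerateConfiguration dictionary."""
    config = {
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": kb_id,
            "modelArn": model_arn,
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {"numberOfResults": max_results}
            },
        },
    }
    # Only add a generation section when a custom template is supplied.
    if prompt_template is not None:
        config["knowledgeBaseConfiguration"]["generationConfiguration"] = {
            "promptTemplate": {"textPromptTemplate": prompt_template}
        }
    return config


# Hypothetical template and identifiers, for illustration only.
template = "Answer using only these search results:\n$search_results$"
example = build_rag_configuration(
    kb_id="EXAMPLEKBID",
    model_arn="arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2",
    max_results=3,
    prompt_template=template,
)
print(example["knowledgeBaseConfiguration"]["retrievalConfiguration"])
```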


In [None]:
def retrieve_and_generate(query, kb_id, model_arn, max_results):
    """
    Function to perform Retrieval Augmented Generation (RAG) using Bedrock.

    Parameters:
    - query: Input text or query for which information is to be retrieved and generated.
    - kb_id: ID of the Knowledge Base (KB) used for retrieval.
    - model_arn: ARN (Amazon Resource Name) of the language model for response generation.
    - max_results: Maximum number of top chunks to retrieve from the KB.

    Returns:
    - response: Generated response including text output and context citations.
    """
    # Call Bedrock's retrieve_and_generate API with the specified configuration
    response = bedrock_agent_client.retrieve_and_generate(
        input={
            'text': query  # Input query to retrieve and generate information
        },
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',  # Type of retrieval/generation operation
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': kb_id,  # ID of the Knowledge Base used
                'modelArn': model_arn,  # ARN of the language model used
                'retrievalConfiguration': {
                    'vectorSearchConfiguration': {
                        'numberOfResults': max_results  # Number of top chunks to retrieve
                    }
                },
                'generationConfiguration': {
                    # Replace with the ID and version of the guardrail you created in Step 6
                    'guardrailConfiguration': {
                        'guardrailId': 'gtv1a46m6odg',
                        'guardrailVersion': '1'
                    }
                }
            }
        }
    )
    return response



def print_generation_results(response, print_context=True):
    """
    Print the generated text and, optionally, the retrieved context, then return both.

    Parameters:
    - response: Response object containing generated text and context citations.
    - print_context: Flag to indicate whether to print retrieved context (default is True).

    Returns:
    - (generated_text, contexts): The generated text and the list of retrieved context strings.
    """
    generated_text = response['output']['text']  # Extract generated text from response
    print('Generated FM response:\n')
    pp.pprint(generated_text)  # Pretty print the generated text
    
    
    citations = response["citations"]  # Extract citations or references from the response
    contexts = []
    for citation in citations:
        retrievedReferences = citation["retrievedReferences"]
        for reference in retrievedReferences:
            contexts.append(reference["content"]["text"])  # Extract and append retrieved context from references
    if print_context:
        print('\n\n\nRetrieved Context:\n')
        pp.pprint(contexts)  # Pretty print the retrieved context
    return generated_text, contexts
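
The same citation structure can also tell you which source files an answer drew on. The sketch below runs on a hand-written response, so it needs no AWS access. `collect_sources` is a hypothetical helper, and the `location` / `s3Location` / `uri` fields assume the Knowledge Base is backed by an S3 data source.

```python
def collect_sources(response):
    """Return the de-duplicated S3 URIs cited in a retrieve_and_generate response."""
    sources = []
    for citation in response.get("citations", []):
        for reference in citation.get("retrievedReferences", []):
            uri = reference.get("location", {}).get("s3Location", {}).get("uri")
            if uri and uri not in sources:
                sources.append(uri)
    return sources


# Hand-written stand-in for a real response; bucket and file names are made up.
mock_response = {
    "output": {"text": "Yes, it is splash resistant."},
    "citations": [
        {
            "retrievedReferences": [
                {
                    "content": {"text": "Rated for splash and water resistance."},
                    "location": {"type": "S3", "s3Location": {"uri": "s3://example-bucket/kb_data/specs.pdf"}},
                },
                {
                    "content": {"text": "Do not charge the phone while wet."},
                    "location": {"type": "S3", "s3Location": {"uri": "s3://example-bucket/kb_data/specs.pdf"}},
                },
            ]
        },
        {"retrievedReferences": []},
    ],
}
print(collect_sources(mock_response))
```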


### Step 8: Test the Pipeline End to End
#### Your KB should contain all the data from the data/kb_data folder. If you haven't uploaded it yet, go back to Step 5 and upload the data.


In [None]:

# Define the Knowledge Base ID (replace with the ID of the KB you created in Step 5)
kb_id = "QEJY4GZNL3"

#Define the query or question to ask the Knowledge Base
query = "is iphone 15 splash resistant?"
results = retrieve_and_generate(query=query, kb_id=kb_id, model_arn=model_arn, max_results=5)

# Print the generated response text and retrieved context (if enabled)
print_generation_results(results)


### Step 9: Evaluating the RAG Pipeline
In this step, we will assess the performance of our RAG pipeline using a sample dataset. We'll use a small subset named Wiki-Eval, which contains questions about various historical events.

In [None]:
#load the dataset
wiki_dataset = pd.read_parquet("./data/wiki_eval.parquet")
wiki_dataset

In [None]:
llm_answers = []
llm_contexts = []
# Generate answers using the RAG pipeline we set up in the previous steps
for question in wiki_dataset["question"]:
    answer = retrieve_and_generate(query=question, kb_id=kb_id, model_arn=model_arn, max_results=5)
    llm_response, retrieved_context = print_generation_results(answer, print_context=False)
    llm_answers.append(llm_response)
    llm_contexts.append(retrieved_context)
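
A loop like this makes one API call per question, and the client configuration in Step 2 sets `max_attempts` to 0, so a single throttling error would stop the whole run. One option is to wrap each call in a retry helper. This is a sketch under assumptions: `call_with_retries` is hypothetical, and the demo uses a simulated failure rather than a real Bedrock error.

```python
import time


def call_with_retries(func, max_attempts=3, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Call func(), retrying with exponential backoff when it raises."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))


# Demo with a function that fails twice before succeeding.
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated throttling")
    return "ok"


print(call_with_retries(flaky, base_delay=0.01))
```

In the loop above, the call could become `call_with_retries(lambda: retrieve_and_generate(query=question, kb_id=kb_id, model_arn=model_arn, max_results=5), retry_on=(ClientError,))`, using the `ClientError` class imported in Step 1.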
    

In [None]:
#update dataset with generated response and context
wiki_dataset["answer"] = llm_answers
wiki_dataset["contexts"] = llm_contexts
#format the dataset for evaluation
dict_default = wiki_dataset.to_dict(orient='list')
dataset = Dataset.from_dict(dict_default)
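
Before running an evaluation, it can help to check the shape of the column dictionary produced by `to_dict(orient='list')`. The sketch below is an assumption-laden sanity check, not part of Ragas: `check_eval_columns` is a hypothetical helper, and the required column names reflect what Ragas 0.1.x is assumed to expect, with `contexts` holding a list of strings per row.

```python
def check_eval_columns(columns, required=("question", "answer", "contexts")):
    """Raise ValueError if the column dictionary is not shaped for evaluation."""
    missing = [name for name in required if name not in columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    lengths = {len(columns[name]) for name in required}
    if len(lengths) != 1:
        raise ValueError("Columns have different lengths")
    for row in columns["contexts"]:
        if not isinstance(row, list) or not all(isinstance(c, str) for c in row):
            raise ValueError("Each 'contexts' entry must be a list of strings")
    return True


# Toy data for illustration only.
toy_columns = {
    "question": ["Who built the bridge?"],
    "answer": ["It was built by the city."],
    "contexts": [["The city built the bridge in 1932."]],
}
print(check_eval_columns(toy_columns))
```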


In [None]:
# Set up the LLM and embeddings used for evaluation
llm_for_evaluation = BedrockChat(model_id="anthropic.claude-3-sonnet-20240229-v1:0", client=chat_client)
bedrock_embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1", client=chat_client)

In [None]:
# The metrics below are provided by Ragas. You can read about them here:
# https://docs.ragas.io/en/latest/concepts/metrics/index.html

from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_recall,
    context_precision,
    context_entity_recall,
    answer_similarity,
    answer_correctness
)

from ragas.metrics.critique import (
    harmfulness,
    maliciousness,
    coherence,
    correctness,
    conciseness
)





In [None]:
#run evaluation
result = evaluate(
    dataset=dataset,
    metrics=[harmfulness, coherence],  # you can choose any of the metrics imported above
    llm=llm_for_evaluation,
    embeddings=bedrock_embeddings,
)



In [None]:
#see results
result.to_pandas()
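
Per-row scores are easier to read once averaged. The sketch below works on plain dictionaries with invented scores, so it makes no claim about your actual results. It assumes that each metric appears as a column named after the metric and that critique metrics yield 0 or 1 per row; `summarize_scores` is a hypothetical helper.

```python
from statistics import mean


def summarize_scores(rows, metric_names):
    """Average each metric over the rows, skipping missing (None) scores."""
    summary = {}
    for name in metric_names:
        values = [row[name] for row in rows if row.get(name) is not None]
        summary[name] = mean(values) if values else None
    return summary


# Invented scores for illustration only.
rows = [
    {"harmfulness": 0, "coherence": 1},
    {"harmfulness": 0, "coherence": 1},
    {"harmfulness": 1, "coherence": 0},
    {"harmfulness": 0, "coherence": None},
]
print(summarize_scores(rows, ["harmfulness", "coherence"]))
```

With the real DataFrame, calling `.mean()` on the metric columns of `result.to_pandas()` should give the same kind of summary.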