## Loading Generated Synthetic Datasets

This section loads a synthetic question-answer dataset that was generated for testing, then converts its `contexts` column into lists.
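
As a rough illustration of why the conversion step is needed, the sketch below parses a couple of made-up JSONL records in which `contexts` arrives either as a stringified list or as a real list. The field names follow the columns used later in this notebook; the sample values and the `parse_contexts` helper are hypothetical and not part of the project code.

```python
import ast
import io
import json

# Hypothetical records; only the field names come from this notebook.
SAMPLE_JSONL = "\n".join([
    json.dumps({
        "question": "What does the retriever return?",
        "ground_truth": "A ranked list of text chunks.",
        # Contexts stored as a string that looks like a Python list
        "contexts": "['Chunk one about retrieval.', 'Chunk two about ranking.']",
    }),
    json.dumps({
        "question": "How many chunks are requested?",
        "ground_truth": "Three.",
        # Contexts already stored as a JSON array
        "contexts": ["The notebook requests three chunks per query."],
    }),
])


def parse_contexts(raw):
    """Return contexts as a list of strings, whatever form they arrive in."""
    if isinstance(raw, list):
        return [str(c) for c in raw]
    try:
        parsed = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        # Not a Python literal: treat the whole string as a single context
        return [raw]
    if isinstance(parsed, (list, tuple)):
        return [str(c) for c in parsed]
    return [str(parsed)]


records = [json.loads(line) for line in io.StringIO(SAMPLE_JSONL) if line.strip()]
for record in records:
    record["contexts"] = parse_contexts(record["contexts"])
```

After this runs, every record holds `contexts` as a list of strings, which is the shape the retrieval and evaluation steps work with.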

In [None]:
import pandas as pd

df = pd.read_json("data/sample_qa_dataset.jsonl", lines=True)
df.head()

In [None]:
from datasets import Dataset
import ast
import re

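def clean_string(s):
    # Drop non-ASCII characters and normalize single quotes to double quotes
    # so the string can be parsed as a Python list literal.
    s = re.sub(r'[^\x00-\x7F]+', '', s)
    s = s.replace("'", '"')
    return s

def convert_to_list(example):
    contexts = example["contexts"]
    if not isinstance(contexts, str):
        # Already parsed (for example, a list), so leave it unchanged
        return {"contexts": contexts}
    cleaned_context = clean_string(contexts)
    try:
        contexts = ast.literal_eval(cleaned_context)
    except (ValueError, SyntaxError):
        # Fall back to the cleaned string if it is not a valid literal
        contexts = cleaned_context
    return {"contexts": contexts}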


subset_length = 10  # Number of rows to evaluate; adjust as needed
test_dataset = Dataset.from_pandas(df.head(subset_length))

test_dataset = test_dataset.map(convert_to_list)
print(test_dataset)

## RAG Pipeline Setting

The test dataset is used to simulate real-world queries in a RAG pipeline, which combines document retrieval with response generation. 

The RAG configuration here uses a KnowledgeBase in Amazon Bedrock with its default settings.

_1. To utilize the code below for testing, the KnowledgeBase must be pre-configured in the Amazon Bedrock console_

_2. If you have a specific RAG pipeline you want to evaluate, please modify the cells below accordingly_

### Context Retrieval

In this section, we will test the system’s ability to retrieve relevant context from the KnowledgeBase using the provided queries. 

This is a critical step in the RAG pipeline, as the accuracy of the context retrieved has a direct impact on the quality of the generated responses. 
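
The implementation of `context_retrieval_from_kb` lives in `libs/bedrock_kb_util` and is not shown in this notebook. As a minimal sketch, assuming it wraps the Bedrock Agent Runtime `retrieve` call, a retrieval helper might look like the following. The name `retrieve_contexts` and its return shape (a list of dicts with a `content` key, matching how results are consumed in this notebook) are assumptions for illustration.

```python
def retrieve_contexts(client, query, top_k, kb_id, search_type="SEMANTIC"):
    """Query a Bedrock knowledge base and return the top_k text chunks.

    `client` is expected to be a boto3 "bedrock-agent-runtime" client
    (or any object exposing a compatible `retrieve` method).
    """
    response = client.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={"text": query},
        retrievalConfiguration={
            "vectorSearchConfiguration": {
                "numberOfResults": top_k,
                # "SEMANTIC" or "HYBRID"
                "overrideSearchType": search_type,
            }
        },
    )
    # Flatten each result to the fields this notebook uses downstream
    return [
        {"content": result["content"]["text"], "score": result.get("score")}
        for result in response.get("retrievalResults", [])
    ]
```

With real credentials, the client would be created with `boto3.client("bedrock-agent-runtime", region_name="us-west-2")` and passed in as the first argument. Injecting the client keeps the helper easy to exercise with a stub when no knowledge base is available.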

In [None]:
# RAG implementation sample 1 (Replace with RAG pipeline for evaluation)
from libs.bedrock_kb_util import context_retrieval_from_kb

question = test_dataset[0]['question']
search_result = context_retrieval_from_kb(question, 3, 'us-west-2', 'CNDSUOPKAS', 'SEMANTIC')
print("search_result[0]:", search_result[0])

contexts = "\n--\n".join([result['content'] for result in search_result])
print("context:", contexts)

In [None]:
import boto3
from botocore.config import Config

model_id = "anthropic.claude-3-5-sonnet-20240620-v1:0"
region = 'us-west-2'

retry_config = Config(
    region_name=region,
    retries={"max_attempts": 10, "mode": "standard"}
)
boto3_client = boto3.client("bedrock-runtime", config=retry_config)

### Answer Generation

Here we generate an answer from the retrieved context by calling the model through the Bedrock `converse` API.

In [None]:
def generate_answer(question, contexts):
    system_prompt = """You are an AI assistant that uses retrieved context to answer questions accurately. 
    Follow these guidelines:
    1. Use the provided context to inform your answers.
    2. If the context doesn't contain relevant information, say "I don't have enough information to answer that."
    3. Be concise and to the point in your responses."""

    user_prompt = f"""Context: {contexts}

    Question: {question}

    Please answer the question based on the given context."""

    response = boto3_client.converse(
        modelId=model_id,
        messages=[{'role': 'user', 'content': [{'text': user_prompt}]}],
        system=[{'text': system_prompt}]
    )

    answer = response['output']['message']['content'][0]['text']
    return answer

generate_answer(question, contexts)

### Full Process for All Sample Questions

This section runs the entire pipeline, from context retrieval to answer generation, across the set of sample questions.
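
The pipeline cell in this section spaces requests with a fixed `sleep(5)` to avoid throttling. One possible alternative, shown here only as a sketch, is to retry a failed call with exponentially growing delays. The helper name `call_with_backoff` and its parameters are hypothetical; in practice `retry_on` would be narrowed to the throttling exception raised by the client in use.

```python
import time


def call_with_backoff(fn, *args, max_attempts=5, base_delay=1.0,
                      retry_on=(Exception,), sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs), retrying on failure with exponential backoff.

    Delays are base_delay, 2*base_delay, 4*base_delay, ... between attempts.
    The last failure is re-raised once max_attempts is reached.
    """
    for attempt in range(max_attempts):
        try:
            return fn(*args, **kwargs)
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

A call such as `call_with_backoff(generate_answer, question, contexts)` would then wait only when a request actually fails, instead of pausing before every item.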

In [None]:
from time import sleep

kb_region = 'us-west-2'
kb_id = 'CNDSUOPKAS'
top_k = 3

def process_item(item):
    sleep(5)  # Prevent throttling
    question = item['question']
    search_result = context_retrieval_from_kb(question, top_k, kb_region, kb_id, 'SEMANTIC')

    contexts = [result['content'] for result in search_result]
    answer = generate_answer(question, "\n--\n".join(contexts))

    return {
        'question': item['question'],
        'ground_truth': item['ground_truth'],
        'original_contexts': item['contexts'],
        'retrieved_contexts': contexts,
        'answer': answer
    }

updated_dataset = test_dataset.map(process_item)

### Saving Intermediate Results to File

In [None]:
import json
output_file = "data/sample_processed_qa_dataset.jsonl"

with open(output_file, 'w', encoding='utf-8') as f:
    for item in updated_dataset:
        json.dump(item, f, ensure_ascii=False)
        f.write('\n')

print(f"Dataset saved to {output_file}")

### Data Format Verification

In [None]:
import json
from datasets import Dataset

input_file = "data/sample_processed_qa_dataset.jsonl"

def read_jsonl(file_path):
    # Yield one parsed record per non-empty line
    with open(file_path, 'r', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

updated_dataset = Dataset.from_list(list(read_jsonl(input_file)))

item = updated_dataset[0]
print(f"Question: {item['question']}\n\n")
print(f"Answer: {item['answer']}")

## Evaluation for Each Metric

We now evaluate the pipeline on four metrics: answer relevancy, faithfulness, context recall, and context precision.

For detailed implementations, refer to the `libs/custom_ragas.py` file. 

This script contains the specific evaluation criteria that we use to assess the performance of the RAG pipeline across different dimensions, such as accuracy and relevance.
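
The results returned by `evaluate` include both per-row scores and averages. As a rough sketch of how such averages could be derived, assuming each detailed row is a dict mapping a metric name to a float (or `None` when a score could not be computed), the hypothetical helper below averages each metric over the rows that have a value. The actual structure produced by `libs/custom_ragas.py` may differ.

```python
def average_metric_scores(rows):
    """Average each metric across rows, skipping missing (None) scores."""
    totals = {}
    counts = {}
    for row in rows:
        for metric, score in row.items():
            if score is None:
                continue
            totals[metric] = totals.get(metric, 0.0) + score
            counts[metric] = counts.get(metric, 0) + 1
    return {metric: totals[metric] / counts[metric] for metric in totals}


# Made-up scores for two evaluated questions
example_rows = [
    {"faithfulness": 1.0, "context_recall": 0.5},
    {"faithfulness": 0.5, "context_recall": None},
]
example_averages = average_metric_scores(example_rows)
```

Skipping `None` rather than counting it as zero keeps a failed judgment from dragging down a metric's average, at the cost of averaging over fewer rows.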

In [None]:
from libs.custom_ragas import (
    evaluate,
    AnswerRelevancy, 
    Faithfulness, 
    ContextRecall,
    ContextPrecision
)

llm_id = "anthropic.claude-3-5-sonnet-20240620-v1:0"
emb_id = "amazon.titan-embed-text-v2:0"
region = "us-west-2"

metrics = [AnswerRelevancy, Faithfulness, ContextRecall, ContextPrecision]

def map_dataset(example):
    return {
        "user_input": example["question"],
        "retrieved_contexts": example["retrieved_contexts"],
        "referenced_contexts": example["original_contexts"],
        "response": example["answer"],
        "reference": example["ground_truth"]
    }

dataset = updated_dataset.map(map_dataset)
results = evaluate(dataset, metrics, llm_id, emb_id, region)

print("Average Scores:")
print(results['average_scores'])

print("\nDetailed Results:")
for row in results['detailed_results']:
    print(row)

In [None]:
json_results = {
    'average_scores': results['average_scores'],
    'detailed_results': results['detailed_results']
}

json_filename = "data/sample_ragas_result.json"

with open(json_filename, 'w', encoding='utf-8') as f:
    json.dump(json_results, f, ensure_ascii=False, indent=4)

print(f"Results saved to {json_filename}")
print(json_results)
