# Knowledge Base Parameter Optimization using RAGAS
This notebook evaluates and optimizes Amazon Bedrock Knowledge Bases parameters using the RAGAS framework.

In [27]:
!pip install ragas
!pip install datasets
!pip install pandas
!pip install boto3
!pip install langchain
!pip install langchain-aws
!pip install nltk
!pip install seaborn



In [28]:
# Import required libraries
import pandas as pd
import boto3
from datetime import datetime
from langchain_aws import ChatBedrockConverse, BedrockEmbeddings
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import SemanticSimilarity
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
import itertools
import matplotlib.pyplot as plt
import seaborn as sns

## Define parameters and permutations to optimize over
We'll optimize over the number of results retrieved from the knowledge base and the model used to generate the answer.
Other query-time hyperparameters you could try to optimize include:
- Custom prompt
- Semantic vs. hybrid search
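Both query-time options above correspond to fields of the `retrieveAndGenerateConfiguration` request that `query_knowledge_base` sends. The sketch below builds that request dictionary for one combination. It assumes the `overrideSearchType` field of `vectorSearchConfiguration` and the `generationConfiguration.promptTemplate.textPromptTemplate` field of the RetrieveAndGenerate API (check both against the current boto3 reference). `build_rag_config` and the example prompt text are hypothetical. Because it only builds a dict, it runs without AWS credentials.

```python
from typing import Optional


def build_rag_config(
    knowledge_base_id: str,
    model_arn: str,
    number_of_results: int,
    search_type: str = "SEMANTIC",
    prompt_template: Optional[str] = None,
) -> dict:
    """Build a retrieveAndGenerateConfiguration dict (hypothetical helper)."""
    if search_type not in ("SEMANTIC", "HYBRID"):
        raise ValueError(f"Unsupported search type: {search_type}")

    kb_config = {
        "knowledgeBaseId": knowledge_base_id,
        "modelArn": model_arn,
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {
                "numberOfResults": number_of_results,
                # Assumed field: selects semantic-only or hybrid (semantic + keyword) search
                "overrideSearchType": search_type,
            }
        },
    }
    if prompt_template is not None:
        # Assumed field: custom generation prompt; $search_results$ is the
        # placeholder that Bedrock replaces with the retrieved chunks
        kb_config["generationConfiguration"] = {
            "promptTemplate": {"textPromptTemplate": prompt_template}
        }
    return {"type": "KNOWLEDGE_BASE", "knowledgeBaseConfiguration": kb_config}


# Example: hybrid search with a custom prompt (hypothetical template text)
config = build_rag_config(
    knowledge_base_id="GXHTSVCWZI",
    model_arn="cohere.command-r-v1:0",
    number_of_results=3,
    search_type="HYBRID",
    prompt_template="Answer using only these search results:\n$search_results$",
)
```

The resulting dict could be passed as `retrieveAndGenerateConfiguration=` to `bedrock_runtime.retrieve_and_generate`, turning search type and prompt into two more dimensions of the parameter grid.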

Additionally, there are index-time parameters you could optimize, for example:
- Chunking strategy
- Embedding model

To change these, you need to recreate the KB data source or the entire knowledge base. You can do this via the API in a loop; see the example code [here](https://github.com/aws-samples/amazon-bedrock-workshop/blob/main/02_KnowledgeBases_and_RAG/0_create_ingest_documents_test_kb.ipynb).
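For chunking, each variant needs its own data source and ingestion run. A minimal sketch that only builds the ingestion settings for a grid of fixed-size chunking options, assuming the `vectorIngestionConfiguration.chunkingConfiguration` shape of the bedrock-agent `CreateDataSource` API. The grid values and the helper name are hypothetical.

```python
import itertools

# Hypothetical grid of fixed-size chunking settings to compare
MAX_TOKENS_OPTIONS = [300, 500]
OVERLAP_PERCENT_OPTIONS = [10, 20]


def fixed_size_chunking_config(max_tokens: int, overlap_percentage: int) -> dict:
    """Build a vectorIngestionConfiguration dict for fixed-size chunking (assumed API shape)."""
    return {
        "chunkingConfiguration": {
            "chunkingStrategy": "FIXED_SIZE",
            "fixedSizeChunkingConfiguration": {
                "maxTokens": max_tokens,
                "overlapPercentage": overlap_percentage,
            },
        }
    }


# One ingestion config per (max_tokens, overlap) pair; each would need its own
# data source (or knowledge base) and a fresh ingestion job before evaluation
chunking_grid = [
    fixed_size_chunking_config(m, o)
    for m, o in itertools.product(MAX_TOKENS_OPTIONS, OVERLAP_PERCENT_OPTIONS)
]
```

Each dict would then be passed as `vectorIngestionConfiguration=` to `boto3.client("bedrock-agent").create_data_source(...)`, followed by `start_ingestion_job`, before running the same evaluation loop against the new data source.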

In [37]:
# This knowledge base was created in the AWS console with all default settings. Its S3 data source indexes this
# file: https://docs.aws.amazon.com/pdfs/whitepapers/latest/aws-overview/aws-overview.pdf
BEDROCK_KNOWLEDGE_BASE_ID = "GXHTSVCWZI"
REGION_NAME = "us-east-1"
DEFAULT_LLM_MODEL_ID = 'anthropic.claude-3-haiku-20240307-v1:0'

# Parameters to optimize
NUM_RESULTS_OPTIONS = [1, 3, 7]
MODEL_OPTIONS = [
    'mistral.mistral-small-2402-v1:0',
    'ai21.jamba-1-5-mini-v1:0',
    'cohere.command-r-v1:0',
]

# Generate all possible parameter combinations
parameter_combinations = list(itertools.product(NUM_RESULTS_OPTIONS, MODEL_OPTIONS))
parameter_combinations


[(1, 'mistral.mistral-small-2402-v1:0'),
 (1, 'ai21.jamba-1-5-mini-v1:0'),
 (1, 'cohere.command-r-v1:0'),
 (3, 'mistral.mistral-small-2402-v1:0'),
 (3, 'ai21.jamba-1-5-mini-v1:0'),
 (3, 'cohere.command-r-v1:0'),
 (7, 'mistral.mistral-small-2402-v1:0'),
 (7, 'ai21.jamba-1-5-mini-v1:0'),
 (7, 'cohere.command-r-v1:0')]
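The grid grows multiplicatively: 3 × 3 = 9 combinations here, each answering and scoring every test question, and each new dimension multiplies that count again. A sketch of a dictionary-based grid that makes adding a dimension a one-line change and labels every combination by name. `SEARCH_TYPE_OPTIONS` is a hypothetical third dimension and would also have to be threaded through `query_knowledge_base`.

```python
import itertools

NUM_RESULTS_OPTIONS = [1, 3, 7]
MODEL_OPTIONS = [
    "mistral.mistral-small-2402-v1:0",
    "ai21.jamba-1-5-mini-v1:0",
    "cohere.command-r-v1:0",
]
# Hypothetical third dimension: search type
SEARCH_TYPE_OPTIONS = ["SEMANTIC", "HYBRID"]

param_grid = {
    "num_results": NUM_RESULTS_OPTIONS,
    "model_id": MODEL_OPTIONS,
    "search_type": SEARCH_TYPE_OPTIONS,
}

# Cartesian product over all value lists, each combination as a named dict
# (the last list varies fastest, as with itertools.product)
combinations = [
    dict(zip(param_grid.keys(), values))
    for values in itertools.product(*param_grid.values())
]
print(len(combinations), combinations[0])
```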

In [38]:
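# The questions we'll ask the knowledge base and the answers we expect back ("ground truth").
# The last two entries appear to be deliberate negative cases: an off-topic question and an intentionally wrong ground truth.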
test_data_aws_services = [
    {
        "question": "What is AWS Lambda and how does it work?",
        "ground_truth": "AWS Lambda is a serverless compute service that runs code in response to events without managing servers. It automatically scales and only charges for actual compute time used."
    },
    {
        "question": "What is Amazon S3's durability guarantee?",
        "ground_truth": "Amazon S3 provides 99.999999999% (11 9's) durability for objects stored in all S3 storage classes across multiple Availability Zones."
    },
    {
        "question": "How does AWS Direct Connect differ from VPN?",
        "ground_truth": "AWS Direct Connect provides dedicated physical connections to AWS, while VPN creates encrypted tunnels over the public internet. Direct Connect offers more consistent network performance and lower latency."
    },
    {
        "question": "What is Amazon Aurora and its key benefits?",
        "ground_truth": "Amazon Aurora is a MySQL/PostgreSQL-compatible database offering up to 5x performance of MySQL and 3x of PostgreSQL, with automated scaling, backup, and fault tolerance built-in."
    },
    {
        "question": "How does AWS Shield protect against DDoS attacks?",
        "ground_truth": "AWS Shield provides automatic DDoS protection for all AWS customers at the network/transport layer (Standard) and additional protection with advanced monitoring for higher-level attacks (Advanced)."
    },
    {
        "question": "What is Amazon EKS and its primary use case?",
        "ground_truth": "Amazon Elastic Kubernetes Service (EKS) is a managed Kubernetes service for running containerized applications at scale, eliminating the need to manage the Kubernetes control plane."
    },
    {
        "question": "How does AWS CloudFormation enable Infrastructure as Code?",
        "ground_truth": "AWS CloudFormation allows you to define infrastructure using templates (JSON/YAML), enabling automated, version-controlled deployment and management of AWS resources."
    },
    {
        "question": "Which AWS service should I used to store my applicative passwords?",
        "ground_truth": "For storing application passwords securely in AWS use AWS Secrets Manager."
    },
    {
        "question": "How do replace a spare tire?",
        "ground_truth": "Park on flat surface, loosen lug nuts, jack up car, remove flat tire, mount spare, tighten lug nuts in star pattern, lower car, verify lug nut tightness."
    },
    {
        "question": "What is Amazon SageMaker's core functionality?",
        "ground_truth": "Amazon SageMaker generates animations of flying shawarmas using serverless technology."
    }
]
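Every test item needs a non-empty `question` and `ground_truth`, and each combination re-asks every question, so a malformed item wastes one Bedrock call per combination. A minimal sketch of a pre-flight check; `validate_test_data` is a hypothetical helper, not part of RAGAS.

```python
def validate_test_data(test_data: list) -> list:
    """Return a list of problems found in question/ground_truth test items."""
    problems = []
    seen_questions = set()
    for i, item in enumerate(test_data):
        # Both fields must be present, be strings, and contain non-whitespace text
        for key in ("question", "ground_truth"):
            value = item.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"item {i}: missing or empty '{key}'")
        # Duplicate questions would be double-counted in the mean score
        question = item.get("question")
        if isinstance(question, str):
            if question in seen_questions:
                problems.append(f"item {i}: duplicate question")
            seen_questions.add(question)
    return problems


sample = [
    {"question": "What is AWS Lambda?", "ground_truth": "A serverless compute service."},
    {"question": "What is AWS Lambda?", "ground_truth": ""},
]
print(validate_test_data(sample))
```

Calling `validate_test_data(test_data_aws_services)` on the data above should return an empty list.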

In [39]:
def get_bedrock_llm_and_embeddings_for_ragas(llm_model=DEFAULT_LLM_MODEL_ID):
    """Return RAGAS-compatible LangChain wrappers around a Bedrock LLM and Bedrock embeddings."""
    config = {
        "region_name": REGION_NAME,
        "llm": llm_model,
        "embeddings": "amazon.titan-embed-text-v1",
        "temperature": 0.1,
    }

    bedrock_llm = ChatBedrockConverse(
        region_name=config["region_name"],
        model=config["llm"],
        temperature=config["temperature"],
    )

    bedrock_embeddings = BedrockEmbeddings(
        region_name=config["region_name"],
        model_id=config["embeddings"],
    )

    return LangchainLLMWrapper(bedrock_llm), LangchainEmbeddingsWrapper(bedrock_embeddings)

In [40]:
bedrock_runtime = boto3.client(
    service_name = 'bedrock-agent-runtime',
    region_name = REGION_NAME
)

def query_knowledge_base(question: str, model_arn: str, number_of_results: int):
    try:
        response = bedrock_runtime.retrieve_and_generate(
            input={'text': question},
            retrieveAndGenerateConfiguration={
                'type': 'KNOWLEDGE_BASE',
                'knowledgeBaseConfiguration': {
                    'knowledgeBaseId': BEDROCK_KNOWLEDGE_BASE_ID,
                    'modelArn': model_arn,
                    'retrievalConfiguration':{
                        'vectorSearchConfiguration': {
                            'numberOfResults': number_of_results
                        }
                    }
                }
            }
        )
        
        return {
            "output": response["output"]["text"],
            "citations": [ref['content']['text'] for citation in response.get('citations', [])
                         for ref in citation.get('retrievedReferences', [])
                         if ref.get('content', {}).get('text')]
        }
    except Exception as e:
        print(f"Error: {str(e)}")
        return None
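The list comprehension in `query_knowledge_base` walks two levels of nesting: each citation carries its own `retrievedReferences`, and each reference may or may not contain text. The sketch below isolates that logic and runs it on a hand-built dictionary shaped like the response (illustrative, not real API output). Because only cited references are collected, `retrieved_contexts` can contain fewer chunks than `numberOfResults`, and a chunk cited more than once can appear twice.

```python
def extract_citation_texts(response: dict) -> list:
    """Flatten citations -> retrievedReferences -> content.text into a list of strings."""
    return [
        ref["content"]["text"]
        for citation in response.get("citations", [])
        for ref in citation.get("retrievedReferences", [])
        if ref.get("content", {}).get("text")
    ]


# Hand-built response with the nesting the comprehension expects (illustrative only)
fake_response = {
    "output": {"text": "Lambda runs code without servers."},
    "citations": [
        {"retrievedReferences": [
            {"content": {"text": "AWS Lambda lets you run code without provisioning servers."}},
            {"content": {}},  # reference without text: skipped
        ]},
        {"retrievedReferences": []},  # citation without references: contributes nothing
    ],
}
contexts = extract_citation_texts(fake_response)
print(contexts)
```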

In [41]:
def generate_answers(test_data: list[dict], model_arn: str = DEFAULT_LLM_MODEL_ID, number_of_results: int = 3):
    print('Generating answers')
    answers = []
    for item in test_data:
        response = query_knowledge_base(
            question = item["question"], 
            model_arn = model_arn, 
            number_of_results = number_of_results)
                    
        if response:
            answers.append({
                "question": item["question"],
                "answer": response["output"],
                "ground_truth": item["ground_truth"],
                "retrieved_contexts": response["citations"]
            })
    return answers


def evaluate_knowledge_base(answers):
    dataset = Dataset.from_pandas(pd.DataFrame(answers))
    
    metrics = [
        SemanticSimilarity(),
    ]

    llm, embeddings = get_bedrock_llm_and_embeddings_for_ragas()
    print('Evaluating answers')
    results = evaluate(
        dataset=dataset,
        metrics=metrics,
        llm=llm,
        embeddings=embeddings
    )

    return results
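`SemanticSimilarity` scores each row by embedding the generated `answer` and the `ground_truth` and comparing the two vectors, which conceptually is cosine similarity. For this metric only the embeddings model (`amazon.titan-embed-text-v1`) affects the score; the judge LLM is not used. A minimal stdlib sketch of cosine similarity on made-up toy vectors; it illustrates the idea and is not RAGAS's implementation.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (real Titan embeddings have far more dimensions)
answer_vec = [0.2, 0.9, 0.1]
truth_vec = [0.25, 0.85, 0.05]
unrelated_vec = [0.9, -0.1, 0.4]

print(cosine_similarity(answer_vec, truth_vec))
print(cosine_similarity(answer_vec, unrelated_vec))
```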

In [55]:
import statistics
def evaluate_parameter_combination(num_results, model_id, test_data):
    """Evaluate a specific parameter combination"""
    answers = generate_answers(test_data, model_id, num_results)
    results = evaluate_knowledge_base(answers)
    
    return {
        'num_results': num_results,
        'model_id': model_id,
        'semantic_similarity': statistics.mean(results['semantic_similarity']),
    }
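Two details affect the mean above: `statistics.mean` returns `nan` if any row could not be scored, and `generate_answers` silently drops questions whose Bedrock call failed, so two combinations can be averaged over different question sets. A hedged sketch of a more defensive aggregation that skips `nan` scores and reports coverage; `summarize_scores` is a hypothetical helper.

```python
import math
import statistics


def summarize_scores(scores: list, expected_count: int) -> dict:
    """Mean of the finite scores, plus how many questions were actually scored."""
    valid = [s for s in scores if s is not None and not math.isnan(s)]
    return {
        "semantic_similarity": statistics.mean(valid) if valid else float("nan"),
        "scored": len(valid),
        # Fraction of the test set that contributed to the mean
        "coverage": len(valid) / expected_count if expected_count else 0.0,
    }


# Example: 4 questions asked, one answer missing entirely, one row failed scoring
summary = summarize_scores([0.9, float("nan"), 0.7], expected_count=4)
print(summary)
```

In `evaluate_parameter_combination`, `scores` would be `results['semantic_similarity']` and `expected_count` would be `len(test_data)`; combinations with low coverage deserve a second look before they are compared.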

In [56]:
def optimize_parameters(test_data):
    """Run optimization across all parameter combinations"""
    results = []
    
    for num_results, model_id in parameter_combinations:
        print(f"Testing combination: {num_results} results, {model_id}")
        result = evaluate_parameter_combination(num_results, model_id, test_data)
        results.append(result)
    
    # Convert results to DataFrame for analysis
    df_results = pd.DataFrame(results)
    
    # Find optimal combination based on semantic similarity
    optimal_row = df_results.loc[df_results['semantic_similarity'].idxmax()]
    
    return df_results, optimal_row
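`idxmax` picks the single best mean, but with ten questions per combination, small gaps (such as 0.8109 vs. 0.8094 in the results below) may be within noise. One way to gauge this, swapped in here and not part of the original notebook, is a percentile bootstrap over the per-question scores; it requires keeping `results['semantic_similarity']` rather than only its mean. The scores in the example are hypothetical.

```python
import random
import statistics


def bootstrap_mean_ci(scores: list, n_resamples: int = 2000, alpha: float = 0.05, seed: int = 0):
    """Percentile bootstrap confidence interval for the mean of per-question scores."""
    if not scores:
        raise ValueError("need at least one score")
    rng = random.Random(seed)
    # Resample the questions with replacement and record each resample's mean
    means = sorted(
        statistics.fmean(rng.choices(scores, k=len(scores)))
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical per-question scores for one parameter combination
scores = [0.91, 0.88, 0.85, 0.93, 0.79, 0.90, 0.87, 0.84, 0.62, 0.30]
low, high = bootstrap_mean_ci(scores)
print(f"mean={statistics.fmean(scores):.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

If the intervals of two combinations overlap heavily, the ranking between them is not well supported by this test set.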

## Run evaluation

In [57]:
# Run optimization
results_df, optimal_params = optimize_parameters(test_data_aws_services)

Testing combination: 1 results, mistral.mistral-small-2402-v1:0
Generating answers
Evaluating answers


Evaluating: 100%|██████████| 10/10 [00:00<00:00, 40.19it/s]


Testing combination: 1 results, ai21.jamba-1-5-mini-v1:0
Generating answers
Evaluating answers


Evaluating: 100%|██████████| 10/10 [00:00<00:00, 46.66it/s]


Testing combination: 1 results, cohere.command-r-v1:0
Generating answers
Evaluating answers


Evaluating: 100%|██████████| 10/10 [00:00<00:00, 43.43it/s]


Testing combination: 3 results, mistral.mistral-small-2402-v1:0
Generating answers
Evaluating answers


Evaluating: 100%|██████████| 10/10 [00:00<00:00, 38.09it/s]


Testing combination: 3 results, ai21.jamba-1-5-mini-v1:0
Generating answers
Evaluating answers


Evaluating: 100%|██████████| 10/10 [00:00<00:00, 47.12it/s]


Testing combination: 3 results, cohere.command-r-v1:0
Generating answers
Evaluating answers


Evaluating: 100%|██████████| 10/10 [00:00<00:00, 20.52it/s]


Testing combination: 7 results, mistral.mistral-small-2402-v1:0
Generating answers
Evaluating answers


Evaluating: 100%|██████████| 10/10 [00:00<00:00, 47.25it/s]


Testing combination: 7 results, ai21.jamba-1-5-mini-v1:0
Generating answers
Evaluating answers


Evaluating: 100%|██████████| 10/10 [00:00<00:00, 45.91it/s]


Testing combination: 7 results, cohere.command-r-v1:0
Generating answers
Evaluating answers


Evaluating: 100%|██████████| 10/10 [00:00<00:00, 44.54it/s]


## Analyze evaluation results
### Show overall results

In [58]:
print("\nOptimal Parameters:")
print(f"Number of results: {optimal_params['num_results']}")
print(f"Model ID: {optimal_params['model_id']}")
print(f"Semantic Similarity Score: {optimal_params['semantic_similarity']:.4f}")

# Display full results sorted by semantic similarity
print("\nAll Results (sorted by semantic similarity):")
display(results_df.sort_values('semantic_similarity', ascending=False))


Optimal Parameters:
Number of results: 7
Model ID: cohere.command-r-v1:0
Semantic Similarity Score: 0.8668

All Results (sorted by semantic similarity):


| index | num_results | model_id | semantic_similarity |
|---:|---:|---|---:|
| 8 | 7 | cohere.command-r-v1:0 | 0.866808 |
| 2 | 1 | cohere.command-r-v1:0 | 0.82637 |
| 3 | 3 | mistral.mistral-small-2402-v1:0 | 0.810855 |
| 0 | 1 | mistral.mistral-small-2402-v1:0 | 0.809441 |
| 6 | 7 | mistral.mistral-small-2402-v1:0 | 0.809415 |
| 5 | 3 | cohere.command-r-v1:0 | 0.808495 |
| 7 | 7 | ai21.jamba-1-5-mini-v1:0 | 0.38279 |
| 1 | 1 | ai21.jamba-1-5-mini-v1:0 | 0.343071 |
| 4 | 3 | ai21.jamba-1-5-mini-v1:0 | 0.329163 |
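Averaging over one parameter at a time shows which dimension drives the score. Using the rows from the table above, the per-model means differ by roughly 0.48, while the per-`num_results` means differ by under 0.04, so in this single run the choice of model matters far more than the number of retrieved chunks. A stdlib sketch of that aggregation:

```python
from collections import defaultdict
import statistics

# Rows copied from the results table above: (num_results, model_id, semantic_similarity)
rows = [
    (7, "cohere.command-r-v1:0", 0.866808),
    (1, "cohere.command-r-v1:0", 0.82637),
    (3, "mistral.mistral-small-2402-v1:0", 0.810855),
    (1, "mistral.mistral-small-2402-v1:0", 0.809441),
    (7, "mistral.mistral-small-2402-v1:0", 0.809415),
    (3, "cohere.command-r-v1:0", 0.808495),
    (7, "ai21.jamba-1-5-mini-v1:0", 0.38279),
    (1, "ai21.jamba-1-5-mini-v1:0", 0.343071),
    (3, "ai21.jamba-1-5-mini-v1:0", 0.329163),
]


def mean_by(rows: list, key_index: int) -> dict:
    """Average the score column over all rows sharing the same value at key_index."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[2])
    return {key: statistics.fmean(vals) for key, vals in groups.items()}


by_model = mean_by(rows, 1)
by_num_results = mean_by(rows, 0)
for name, table in (("model", by_model), ("num_results", by_num_results)):
    spread = max(table.values()) - min(table.values())
    print(f"{name}: spread of means = {spread:.3f}")
```

Inside the notebook, the same view is available directly from `results_df.groupby('model_id')['semantic_similarity'].mean()` and the equivalent grouping on `num_results`.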
