## Evaluating Fixed vs. Semantic Chunking in Knowledge Bases for RAG

### Overview

This notebook runs Retrieval-Augmented Generation (RAG) against the fixed and semantic chunking knowledge bases created in Lab 1 and uses a predefined ground truth to compare how effective each chunking strategy is. For every ground-truth question, context is retrieved from the respective knowledge base and the Amazon Nova Lite model on Amazon Bedrock generates a response. FloTorch is then used to evaluate these responses against the ground truth.

### Prerequisites

1. Ensure that all prerequisites outlined in the `4.1 Prerequisites.ipynb` notebook have been completed.

#### Load env variables

In [None]:
import json
with open("variables.json", "r") as f:
    variables = json.load(f)

variables

#### Load prompt.json

The `prompt.json` file includes the following:

* `system_prompt`
* examples for n-shot learning
* `user_prompt`

In [None]:
prompt_file_path = './data/prompt.json'
with open(prompt_file_path, 'r') as f:
    prompt = json.load(f)
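
How the three parts of the prompt file might be combined into one n-shot prompt can be sketched as follows. The `examples` list of `question`/`answer` pairs and the `build_prompt` helper are assumptions made for illustration; the actual `prompt.json` schema and the way FloTorch assembles prompts may differ.

```python
def build_prompt(prompt_cfg: dict, question: str, context: str, n_shot: int) -> str:
    """Join the system prompt, up to n_shot examples, the context and the user prompt.

    Illustrative only: the field names inside prompt_cfg are hypothetical.
    """
    parts = [prompt_cfg["system_prompt"]]
    # Keep only the first n_shot examples.
    for example in prompt_cfg.get("examples", [])[:n_shot]:
        parts.append(
            f"Example question: {example['question']}\n"
            f"Example answer: {example['answer']}"
        )
    parts.append(f"Context:\n{context}")
    parts.append(f"{prompt_cfg['user_prompt']}\nQuestion: {question}")
    return "\n\n".join(parts)


# Made-up prompt configuration with two examples.
sample_prompt = {
    "system_prompt": "Answer using only the provided context.",
    "examples": [
        {"question": "What is RAG?", "answer": "Retrieval-Augmented Generation."},
        {"question": "What is a chunk?", "answer": "A passage of a document."},
    ],
    "user_prompt": "Answer the question below.",
}

text = build_prompt(
    sample_prompt,
    question="What is chunking?",
    context="Chunking splits documents into passages.",
    n_shot=1,
)
print(text)
```

With `n_shot=1`, only the first example is included even though two are available.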

#### Retrieve Bedrock Knowledge Base IDs

This step collects the IDs of the Bedrock knowledge bases created during Lab 1. It assumes that knowledge bases for both fixed and semantic chunking exist. If knowledge bases using hierarchical or custom chunking were also created, they are picked up automatically and included in the evaluation.

In [None]:
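# Chunking strategies for which Lab 1 may have created a knowledge base.
kb_types = ['kbFixedChunk', 'kbSemanticChunk', 'kbHierarchicalChunk', 'kbCustomChunk']
# Map each knowledge base ID to its chunking type; types without an ID in variables.json are skipped.
bedrock_kbs = {variables[kb_type]: kb_type for kb_type in kb_types if variables.get(kb_type)}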

bedrock_kbs
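
To make the resulting mapping concrete, here is a small self-contained sketch of the same comprehension applied to a made-up `variables.json` payload. The knowledge base IDs below are placeholders, not real values.

```python
# Hypothetical contents of variables.json; only two chunking types were created.
sample_variables = {
    "kbFixedChunk": "KBFIXED001",
    "kbSemanticChunk": "KBSEMANTIC002",
    "regionName": "us-east-1",
}

kb_types = ["kbFixedChunk", "kbSemanticChunk", "kbHierarchicalChunk", "kbCustomChunk"]

# Keys are knowledge base IDs, values are chunking types; missing types are dropped.
sample_kbs = {
    sample_variables[kb_type]: kb_type
    for kb_type in kb_types
    if sample_variables.get(kb_type)
}
print(sample_kbs)
```

Keying the dictionary by ID lets the later loop iterate over IDs while still labelling each result with its chunking type.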

### Experiment Config

We will use the following fixed parameters for retrieval and inference:

* **KNN (k-Nearest Neighbors):** 5
* **Rerank Model:** Amazon Rerank
* **N-Shot Prompt:** 1
* **Inference Model:** Amazon Nova Lite (via Bedrock)
* **Temperature:** 0.1


In [None]:
exp_config_data = {
    "temp_retrieval_llm": "0.1",
    "gt_data": variables["s3_ground_truth_path"],
    "rerank_model_id": "amazon.rerank-v1:0",
    "retrieval_service": "bedrock",
    "knn_num": "5",
    "retrieval_model": "us.amazon.nova-lite-v1:0",
    "aws_region": variables['regionName'],
    "n_shot_prompt_guide_obj": prompt,
    "n_shot_prompts": 1
}
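
Several of the settings above are stored as strings and are converted to numbers later. A quick sanity check before a long run could look like the sketch below. The `check_config` helper and its value ranges (for example, a temperature between 0 and 1) are assumptions for illustration and are not part of FloTorch.

```python
def check_config(cfg: dict) -> dict:
    """Convert the numeric settings and reject obviously invalid values (illustrative only)."""
    knn = int(cfg["knn_num"])
    temperature = float(cfg["temp_retrieval_llm"])
    n_shot = int(cfg.get("n_shot_prompts", 0))
    if knn < 1:
        raise ValueError("knn_num must be at least 1")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temp_retrieval_llm must be between 0 and 1")
    if n_shot < 0:
        raise ValueError("n_shot_prompts must not be negative")
    return {"knn_num": knn, "temp_retrieval_llm": temperature, "n_shot_prompts": n_shot}


# Same numeric values as the experiment configuration in this notebook.
sample_config = {"knn_num": "5", "temp_retrieval_llm": "0.1", "n_shot_prompts": 1}
parsed = check_config(sample_config)
print(parsed)
```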

#### Load ground truth data

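We use FloTorch core's `StorageProviderFactory` and `JSONReader` to load the ground truth data for evaluating the RAG pipeline. The factory creates a storage provider from the ground truth path (an S3 path in this lab), and the reader parses each record into a `Question` model.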

In [None]:
from flotorch_core.storage.storage_provider_factory import StorageProviderFactory
from flotorch_core.reader.json_reader import JSONReader
from flotorch_rag_utils import Question

gt_data = exp_config_data['gt_data']
storage = StorageProviderFactory.create_storage_provider(gt_data)
gt_data_path = storage.get_path(gt_data)
json_reader = JSONReader(storage)
questions = json_reader.read_as_model(gt_data_path, Question)
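
Conceptually, reading the ground truth "as a model" means turning each JSON record into a typed object. The sketch below shows that idea with the standard library only. The `SampleQuestion` class and its `question`/`answer` fields are hypothetical stand-ins; the real `Question` model in `flotorch_rag_utils` may define different fields.

```python
import json
from dataclasses import dataclass


@dataclass
class SampleQuestion:
    # Hypothetical fields; the real Question model may differ.
    question: str
    answer: str


# Made-up ground truth records.
raw = """
[
  {"question": "What is fixed chunking?", "answer": "Splitting text into equal-sized pieces."},
  {"question": "What is semantic chunking?", "answer": "Splitting text at topic boundaries."}
]
"""

# Build one typed object per JSON record.
sample_questions = [SampleQuestion(**record) for record in json.loads(raw)]
print(len(sample_questions), sample_questions[0].question)
```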

#### Initialize Vector Storage

The `VectorStorageFactory` serves as a central factory for creating and managing various vector storage implementations. It instantiates the appropriate backend based on the provided configuration parameters.

**Parameters:**

* `knowledge_base` (*bool*): A boolean flag indicating whether vector storage is enabled.
* `use_bedrock_kb` (*bool*): Determines the vector storage implementation to use:
    * `True`: Utilizes Bedrock Knowledge Base.
    * `False`: Utilizes a provisioned OpenSearch cluster.
* `embedding` (*BaseEmbedding*): An instance of the embedding model to be used for generating vector embeddings.
* `opensearch_host` (*Optional[str]*): The hostname or IP address of the OpenSearch server. **Required only when `use_bedrock_kb` is `False`.**
* `opensearch_port` (*Optional[int]*): The port number for connecting to the OpenSearch server. **Required only when `use_bedrock_kb` is `False`.**
* `opensearch_username` (*Optional[str]*): The username for OpenSearch authentication. **Required only when `use_bedrock_kb` is `False`.**
* `opensearch_password` (*Optional[str]*): The password for OpenSearch authentication. **Required only when `use_bedrock_kb` is `False`.**
* `index_id` (*Optional[str]*): The identifier for the OpenSearch index. **Required only when `use_bedrock_kb` is `False`.**
* `knowledge_base_id` (*Optional[str]*): The identifier for the Bedrock Knowledge Base. **Required only when `use_bedrock_kb` is `True`.**
* `aws_region` (*str*): The AWS region for Bedrock Knowledge Base operations. Defaults to `"us-east-1"`. **Required only when `use_bedrock_kb` is `True`.**

**Returns:**

(*VectorStorage*): An instance of either `OpenSearchClient` or `BedrockKnowledgeBaseStorage` based on the value of the `use_bedrock_kb` flag.

In [None]:
from flotorch_core.storage.db.vector.vector_storage_factory import VectorStorageFactory

def vector_storage(bedrock_kb_id: str, exp_config_data: dict):
    """Create a Bedrock Knowledge Base vector storage client for the given knowledge base ID."""
    return VectorStorageFactory.create_vector_storage(
        knowledge_base=True,
        use_bedrock_kb=True,
        embedding=None,  # no embedding model is passed for the Bedrock Knowledge Base backend
        knowledge_base_id=bedrock_kb_id,
        aws_region=exp_config_data.get("aws_region"),
    )
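
The "required only when" rules in the parameter list can be summarized in a few lines. The helper below is a hypothetical pre-flight check written for this walkthrough, not part of `flotorch_core`. It leaves out `aws_region` because that parameter has a default.

```python
def missing_storage_params(use_bedrock_kb: bool, **params) -> list:
    """Return the names of required parameters that were not supplied (illustrative only)."""
    if use_bedrock_kb:
        required = ["knowledge_base_id"]
    else:
        required = [
            "opensearch_host",
            "opensearch_port",
            "opensearch_username",
            "opensearch_password",
            "index_id",
        ]
    # A parameter counts as missing when it is absent, None, or an empty string.
    return [name for name in required if params.get(name) in (None, "")]


# Bedrock Knowledge Base backend: only the knowledge base ID is needed.
bedrock_missing = missing_storage_params(True, knowledge_base_id="KBFIXED001")

# OpenSearch backend with only the host supplied.
opensearch_missing = missing_storage_params(False, opensearch_host="localhost")

print(bedrock_missing)
print(opensearch_missing)
```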

#### Initialize BedrockReranker

The `BedrockReranker` class enables the reranking of documents using reranking models available through Amazon Bedrock. This is beneficial for refining the order of retrieved documents to enhance their relevance to a given query.

**Parameters:**

* `region` (*str*): The AWS region where the Bedrock service is accessible.
* `rerank_model_id` (*str*): The identifier of the specific Bedrock reranking model to be used.
* `bedrock_client` (*Optional[boto3.client]*): An optional, pre-configured Bedrock agent runtime client.



In [None]:
from flotorch_core.rerank.rerank import BedrockReranker

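rerank_model_id = exp_config_data.get("rerank_model_id")

# Skip reranking when no model is configured or the model ID is set to "none".
reranker = (
    BedrockReranker(exp_config_data.get("aws_region"), rerank_model_id)
    if rerank_model_id and rerank_model_id.lower() != "none"
    else None
)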
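
To illustrate what reranking does to a retrieved list, here is a toy example that reorders documents by word overlap with the query. It is a stand-in for intuition only. The Amazon Rerank model uses a learned relevance model, not word overlap, and `toy_rerank` is a hypothetical helper.

```python
def toy_rerank(query: str, documents: list, top_k: int = 5) -> list:
    """Order documents by the number of query words they contain (illustrative only)."""
    query_terms = set(query.lower().split())

    def overlap(doc: str) -> int:
        return len(query_terms & set(doc.lower().split()))

    # sorted() is stable, so documents with equal scores keep their retrieval order.
    return sorted(documents, key=overlap, reverse=True)[:top_k]


retrieved = [
    "Semantic chunking groups related sentences.",
    "Fixed chunking splits text by size.",
    "Bedrock hosts foundation models.",
]

reranked = toy_rerank("how does fixed chunking split text", retrieved, top_k=2)
print(reranked)
```

The document that shares the most words with the query moves to the front, and `top_k` trims the list after reordering.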

#### Initialize Bedrock Inferencer

Creates and returns the appropriate `Inferencer` instance for the configured service and model.

**Parameters:**

* `gateway_enabled` (*bool*): Enables FloTorch LLM gateway-based invocation when set to `True`.
* `gateway_url` (*str*): The URL endpoint of the FloTorch LLM gateway.
* `gateway_api_key` (*str*): The API key for authenticating requests to the FloTorch LLM gateway.
* `retrieval_service` (*str*): The name of the retrieval service (e.g., `bedrock`, `sagemaker`).
* `retrieval_model` (*str*): The model to use for inference (e.g., `anthropic.claude-v2`).
* `aws_region` (*str*): The AWS region for service provisioning (e.g., `us-east-1`).
* `iam_role` (*str*): The IAM role ARN for SageMaker invocation permissions.
* `n_shot_prompts` (*int*): The number of few-shot examples to include in the prompt.
* `temp_retrieval_llm` (*float*): The temperature setting for the language model.
* `n_shot_prompt_guide_obj` (*Any*): The few-shot guide object used for prompt engineering.

**Behavior:**

* If `gateway_enabled` is `True`, connects to the FloTorch LLM gateway using the supplied URL and API key.
* Otherwise, falls back to direct model invocation through a supported service such as Amazon Bedrock or Amazon SageMaker.
* Supports dynamic few-shot prompting and custom temperature configuration.

**Returns:**

(*Inferencer*): A fully configured `Inferencer` object capable of generating answers or completions for queries using the selected language model.

In [None]:
from flotorch_core.inferencer.inferencer_provider_factory import InferencerProviderFactory

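inferencer = InferencerProviderFactory.create_inferencer_provider(
    False,  # gateway_enabled: invoke the model directly instead of through the FloTorch LLM gateway
    "",     # gateway_url: unused when the gateway is disabled
    "",     # gateway_api_key: unused when the gateway is disabled
    exp_config_data.get("retrieval_service"),
    exp_config_data.get("retrieval_model"),
    exp_config_data.get("aws_region"),
    variables['bedrockExecutionRoleArn'],
    int(exp_config_data.get("n_shot_prompts", 0)),
    float(exp_config_data.get("temp_retrieval_llm", 0)),
    exp_config_data.get("n_shot_prompt_guide_obj")
)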
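
The gateway-versus-direct decision described for the inferencer can be pictured as a small dispatch function. This is a hypothetical sketch of that decision logic, not the implementation inside `InferencerProviderFactory`.

```python
def select_invocation_path(gateway_enabled: bool, retrieval_service: str) -> str:
    """Return which invocation path would be used (illustrative only)."""
    if gateway_enabled:
        # The gateway takes precedence over any direct service.
        return "gateway"
    if retrieval_service in ("bedrock", "sagemaker"):
        return retrieval_service
    raise ValueError(f"Unsupported retrieval service: {retrieval_service}")


# This notebook disables the gateway and uses Bedrock directly.
print(select_invocation_path(False, "bedrock"))
print(select_invocation_path(True, "bedrock"))
```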

#### Execute RAG on All Available Bedrock Knowledge Bases

Perform the retrieval, reranking, and inference steps using the `flotorch-core` library.

In [None]:


from flotorch_rag_utils import rag_with_flotorch

rag_response_dict = {}

# Runtime depends on the number of questions and the number of knowledge bases being evaluated.
# Larger evaluations require more time, generally around 5-6 minutes.
for bedrock_kb_id, kb_type in bedrock_kbs.items():
    # Create the storage client for this knowledge base and pass the client itself to the pipeline.
    kb_vector_storage = vector_storage(bedrock_kb_id, exp_config_data)
    responses = rag_with_flotorch(exp_config_data, kb_vector_storage, reranker, inferencer, questions)
    rag_response_dict[kb_type] = responses
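
The retrieve, rerank, and infer sequence can be sketched with plain callables. The internals of `rag_with_flotorch` are not shown in this notebook, so `toy_rag` and the three stub functions below are assumptions that only illustrate the order of the steps.

```python
def toy_rag(question, retrieve, rerank, generate, k=5):
    """Run retrieval, optional reranking, and generation for one question (illustrative only)."""
    chunks = retrieve(question, k)
    if rerank is not None:
        # Reranking is optional, mirroring a reranker that may be None.
        chunks = rerank(question, chunks)
    return {"question": question, "context": chunks, "answer": generate(question, chunks)}


# Stub components standing in for the vector storage, reranker and inferencer.
corpus = ["chunk A", "chunk B", "chunk C"]

def stub_retrieve(question, k):
    return corpus[:k]

def stub_rerank(question, chunks):
    return list(reversed(chunks))

def stub_generate(question, chunks):
    return f"Answer based on {len(chunks)} chunks"


result = toy_rag("What is chunking?", stub_retrieve, stub_rerank, stub_generate, k=2)
print(result)
```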


#### Write the results to a JSON file

In [None]:
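import json
import os

filename = "./results/ragas_evaluation_responses_for_different_kbs.json"

# Create the output directory if it does not exist yet.
os.makedirs(os.path.dirname(filename), exist_ok=True)

# Save to JSON with proper formatting
with open(filename, 'w', encoding='utf-8') as f:
    json.dump(rag_response_dict, f, indent=4, ensure_ascii=False)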
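
A quick way to confirm that the file round-trips is to read it back and count the responses per knowledge base. The sketch below uses a temporary directory and made-up records so that it runs on its own. The `question`/`answer` keys are placeholders for whatever `rag_with_flotorch` actually returns.

```python
import json
import os
import tempfile

# Made-up results keyed by chunking type, as in rag_response_dict.
sample_results = {
    "kbFixedChunk": [{"question": "q1", "answer": "a1"}],
    "kbSemanticChunk": [
        {"question": "q1", "answer": "a1"},
        {"question": "q2", "answer": "a2"},
    ],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "responses.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample_results, f, indent=4, ensure_ascii=False)
    with open(path, "r", encoding="utf-8") as f:
        loaded = json.load(f)

# Number of responses stored for each knowledge base type.
counts = {kb_type: len(responses) for kb_type, responses in loaded.items()}
print(counts)
```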