# Retrieval and Generation using Bedrock Models for Ground Truth

### Overview

This notebook demonstrates Retrieval-Augmented Generation (RAG) with Amazon Bedrock foundation models, using FloTorch for ground truth evaluation. For each ground truth question, it retrieves relevant passages from a Bedrock knowledge base and then generates a response grounded in the retrieved context.

### Load env variables

In [None]:
import json
with open("../Lab 1/variables.json", "r") as f:
    variables = json.load(f)

variables

### Load prompt.json

In [None]:
prompt_file_path = '../data/prompt.json'
with open(prompt_file_path, 'r') as f:
    prompt = json.load(f)

### Fetch all available Bedrock knowledge base IDs

**Important:** This step assumes that your knowledge bases were already created in Lab 1. Complete the knowledge base creation there before proceeding.

In [None]:
# Collect the knowledge base IDs saved in Lab 1, skipping any that are missing or empty
kb_types = ['kbFixedChunk', 'kbSemanticChunk', 'kbHierarchicalChunk', 'kbCustomChunk']
bedrock_kbs = [variables[kb_type] for kb_type in kb_types if variables.get(kb_type)]

bedrock_kbs
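
Because missing entries are skipped silently, an empty list would only show up later as an empty results file. The following is a minimal sketch of a fail-fast check. `collect_kb_ids` and the sample IDs are hypothetical names introduced here for illustration; they are not part of the lab code.

```python
KB_TYPES = ["kbFixedChunk", "kbSemanticChunk", "kbHierarchicalChunk", "kbCustomChunk"]


def collect_kb_ids(variables: dict, kb_types: list[str] = KB_TYPES) -> list[str]:
    """Return the non-empty knowledge base IDs in `variables`, in `kb_types` order."""
    kb_ids = [variables[kb_type] for kb_type in kb_types if variables.get(kb_type)]
    if not kb_ids:
        # Stop early instead of running the pipeline over zero knowledge bases
        raise ValueError(
            "No knowledge base IDs found; complete the knowledge base creation in Lab 1 first."
        )
    return kb_ids


# Hypothetical values for illustration only; real IDs come from variables.json
sample_variables = {
    "kbFixedChunk": "KB_FIXED_EXAMPLE",
    "kbSemanticChunk": "",  # empty entries are skipped
    "kbCustomChunk": "KB_CUSTOM_EXAMPLE",
}
sample_ids = collect_kb_ids(sample_variables)
print(sample_ids)
```

In the notebook, the same check could be applied to `variables` directly before the retrieval loop runs.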

### Experiment Config

For consistent evaluation across different knowledge bases, we will use the following fixed parameters for retrieval and inference:

* **KNN (k-nearest neighbors):** 5 results retrieved per question
* **Rerank Model:** Amazon Rerank (`amazon.rerank-v1:0`)
* **N-Shot Prompt:** 1 example
* **Inference Model:** Amazon Nova Lite (`us.amazon.nova-lite-v1:0`, via Bedrock)
* **Temperature:** 0.1


In [None]:
exp_config_data = {
    "temp_retrieval_llm": "0.1",
    "gt_data": variables["s3_ground_truth_path"],
    "rerank_model_id": "amazon.rerank-v1:0",
    "retrieval_service": "bedrock",
    "knn_num": "5",
    "retrieval_model": "us.amazon.nova-lite-v1:0",
    "aws_region": "us-east-1",
    "n_shot_prompt_guide_obj": prompt,
    "n_shot_prompts": 1
}
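
Several of these values are stored as strings and cast later at the call site with `int(...)` and `float(...)`. As an optional sketch, the casts can be done once up front so that a typo fails early. `normalize_config` is a hypothetical helper, and its range checks are assumptions, not limits documented by FloTorch or Bedrock.

```python
def normalize_config(cfg: dict) -> dict:
    """Return a copy of `cfg` with numeric settings cast and sanity-checked."""
    out = dict(cfg)
    out["knn_num"] = int(cfg["knn_num"])
    out["n_shot_prompts"] = int(cfg.get("n_shot_prompts", 0))
    out["temp_retrieval_llm"] = float(cfg["temp_retrieval_llm"])

    # Assumed sanity checks, chosen for illustration
    if out["knn_num"] < 1:
        raise ValueError("knn_num must be at least 1")
    if out["n_shot_prompts"] < 0:
        raise ValueError("n_shot_prompts must not be negative")
    if out["temp_retrieval_llm"] < 0:
        raise ValueError("temp_retrieval_llm must not be negative")
    return out


# A reduced copy of the config above, limited to the numeric fields
sample_config = {"knn_num": "5", "temp_retrieval_llm": "0.1", "n_shot_prompts": 1}
normalized = normalize_config(sample_config)
print(normalized)
```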

### Load ground truth data

In [None]:
from flotorch_core.storage.storage_provider_factory import StorageProviderFactory
from flotorch_core.reader.json_reader import JSONReader
from flotorch_core.chunking.chunking import Chunk
from pydantic import BaseModel

class Question(BaseModel):
    question: str
    answer: str

    def get_chunk(self) -> Chunk:
        return Chunk(data=self.question)


gt_data = exp_config_data['gt_data']
storage = StorageProviderFactory.create_storage_provider(gt_data)
gt_data_path = storage.get_path(gt_data)
json_reader = JSONReader(storage)
questions = json_reader.read_as_model(gt_data_path, Question)
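
The `Question` model suggests that each ground truth record carries a `question` and an `answer` field. Assuming the file is a JSON array of such objects (an assumption; the actual layout is whatever `JSONReader.read_as_model` accepts), a standard-library sketch of the same validation looks like this. `parse_ground_truth` and the sample record are hypothetical.

```python
import json

REQUIRED_FIELDS = {"question", "answer"}


def parse_ground_truth(raw_json: str) -> list[dict]:
    """Parse a JSON array of ground truth records and check the required fields."""
    records = json.loads(raw_json)
    if not isinstance(records, list):
        raise ValueError("Expected a JSON array of records")
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            raise ValueError(f"Record {index} is not a JSON object")
        missing = REQUIRED_FIELDS - set(record)
        if missing:
            raise ValueError(f"Record {index} is missing fields: {sorted(missing)}")
    return records


# Hypothetical record for illustration only
sample_json = '[{"question": "What is RAG?", "answer": "Retrieval-Augmented Generation."}]'
parsed = parse_ground_truth(sample_json)
print(parsed[0]["question"])
```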

### Initialize Reranker

In [None]:
from flotorch_core.rerank.rerank import BedrockReranker

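# Create a reranker only when a rerank model is configured;
# a missing value or the string "none" disables reranking
rerank_model_id = exp_config_data.get("rerank_model_id")
reranker = (
    BedrockReranker(exp_config_data.get("aws_region"), rerank_model_id)
    if rerank_model_id and rerank_model_id.lower() != "none"
    else None
)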

### Initialize Bedrock Inferencer

Creates and returns the appropriate `Inferencer` instance for the configured service and model.

### Parameters

- `gateway_enabled`: *(bool)* – Enables FloTorch LLM gateway-based invocation if set to `True`.
- `gateway_url`: *(str)* – URL endpoint for the FloTorch LLM Gateway.
- `gateway_api_key`: *(str)* – API key for authenticating requests to the FloTorch LLM gateway.
- `retrieval_service`: *(str)* – Name of the retrieval service (e.g., `bedrock`, `sagemaker`).
- `retrieval_model`: *(str)* – The model to use for inference (e.g., `anthropic.claude-v2`).
- `aws_region`: *(str)* – AWS region for service provisioning (e.g., `us-east-1`).
- `iam_role`: *(str)* – IAM role ARN for SageMaker invocation permissions.
- `n_shot_prompts`: *(int)* – Number of few-shot examples to include in prompt.
- `temp_retrieval_llm`: *(float)* – Temperature setting for the language model.
- `n_shot_prompt_guide_obj`: *(Any)* – Few-shot guide object for prompt engineering.

---

### Behavior

- If `gateway_enabled` is `True`, requests go through the FloTorch LLM gateway at `gateway_url`, authenticated with `gateway_api_key`.
- If it is `False`, the model is invoked directly through a supported service such as Amazon Bedrock.
- Supports dynamic few-shot prompting and custom temperature configuration.

---

### Outcome

Returns a fully configured `Inferencer` object capable of generating answers or completions for queries using the selected language model.
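
The behavior described above can be pictured with a small sketch. This illustrates the decision only and is not FloTorch's implementation. `choose_invocation_path` and `SUPPORTED_SERVICES` are hypothetical names, and the set of services is taken from the examples in the parameter list.

```python
# Services named as examples in the parameter list (assumed to be the supported set)
SUPPORTED_SERVICES = {"bedrock", "sagemaker"}


def choose_invocation_path(
    gateway_enabled: bool,
    gateway_url: str,
    gateway_api_key: str,
    retrieval_service: str,
) -> str:
    """Return which path a request would take: 'gateway' or the service name."""
    if gateway_enabled:
        # The gateway path needs both an endpoint and a key
        if not gateway_url or not gateway_api_key:
            raise ValueError(
                "gateway_url and gateway_api_key are required when the gateway is enabled"
            )
        return "gateway"
    if retrieval_service in SUPPORTED_SERVICES:
        # Gateway disabled: invoke the model directly through the service
        return retrieval_service
    raise ValueError(f"Unsupported retrieval service: {retrieval_service!r}")


# Mirrors this notebook's setup: gateway disabled, Bedrock as the service
path = choose_invocation_path(False, "", "", "bedrock")
print(path)
```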

In [None]:
from flotorch_core.inferencer.inferencer_provider_factory import InferencerProviderFactory

inferencer = InferencerProviderFactory.create_inferencer_provider(
    False,  # gateway_enabled: invoke the model directly
    "",     # gateway_url: unused while the gateway is disabled
    "",     # gateway_api_key: unused while the gateway is disabled
    exp_config_data.get("retrieval_service"),
    exp_config_data.get("retrieval_model"),
    exp_config_data.get("aws_region"),
    variables['bedrockExecutionRoleArn'],  # iam_role
    int(exp_config_data.get("n_shot_prompts", 0)),
    float(exp_config_data.get("temp_retrieval_llm", 0)),
    exp_config_data.get("n_shot_prompt_guide_obj")
)

### RAG with ground truth

In [None]:
def rag_with_flotorch(vector_storage, questions: list[Question]):
    """
    Process a list of questions through the RAG pipeline.

    Args:
        vector_storage: Initialized vector storage
        questions: List of Question objects to process

    Returns:
        List[Dict[str, Any]]: List of responses containing metadata and answers

    Note:
        Exceptions are caught per question and recorded in that response's
        metadata, so a single failure does not stop the run.
    """
    responses_list = []
    for question in questions:
        try:
            inference_response = {}
            # fetch documents from bedrock knowledge bases
            question_chunk = question.get_chunk()
            response = vector_storage.search(question_chunk, int(exp_config_data.get("knn_num")))
            vector_response = response.to_json()['result']
            
            # rerank if selected
            if reranker:
                vector_response = reranker.rerank_documents(question_chunk.data, vector_response)
            
            # send for inferencing
            metadata, answer = inferencer.generate_text(question.question, vector_response)

            inference_response["metadata"] = metadata
            inference_response["answer"] = answer
            inference_response["question"] = question.question
            inference_response["retrieved_contexts"] = vector_response
            responses_list.append(inference_response)
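        except Exception as e:
            # Log the error and continue with the next question
            print(f"Error processing question: {question.question}. Error: {e}")
            # Keep the same keys as a successful response so later steps can rely on them
            responses_list.append({
                "metadata": {"error": str(e)},
                "answer": "Failed to process question",
                "question": question.question,
                "retrieved_contexts": []
            })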

    return responses_list

In [None]:
from flotorch_core.storage.db.vector.vector_storage_factory import VectorStorageFactory

rag_response_dict = {}
for bedrock_kb in bedrock_kbs:
    vector_storage = VectorStorageFactory.create_vector_storage(
        knowledge_base=True,
        use_bedrock_kb=True,
        embedding=None,
        knowledge_base_id=bedrock_kb,
        aws_region=exp_config_data.get("aws_region")
    )

    responses = rag_with_flotorch(vector_storage, questions)
    rag_response_dict[bedrock_kb] = responses
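
Since failed questions are recorded in the output and do not raise, it can help to count them per knowledge base before moving on. The following is a minimal sketch. `summarize_responses` and the sample data are hypothetical, and the check relies only on the `{"error": ...}` metadata written by the `except` branch of `rag_with_flotorch`.

```python
def summarize_responses(rag_response_dict: dict) -> dict:
    """Count total and failed responses for each knowledge base ID."""
    summary = {}
    for kb_id, responses in rag_response_dict.items():
        failed = sum(
            1
            for response in responses
            # Failed questions carry an "error" key in their metadata
            if isinstance(response.get("metadata"), dict) and "error" in response["metadata"]
        )
        summary[kb_id] = {"total": len(responses), "failed": failed}
    return summary


# Hypothetical responses for illustration only
sample_responses = {
    "KB_EXAMPLE": [
        {"metadata": {}, "answer": "a1"},
        {"metadata": {"error": "timeout"}, "answer": "Failed to process question"},
    ]
}
summary = summarize_responses(sample_responses)
print(summary)
```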

### Write the results to a JSON file

In [None]:
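import json
import os

filename = "../results/rag_evaluation_responses.json"
# Create the results directory if it does not exist yet
os.makedirs(os.path.dirname(filename), exist_ok=True)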

# Save to JSON with proper formatting
with open(filename, 'w', encoding='utf-8') as f:
    json.dump(rag_response_dict, f, indent=4, ensure_ascii=False)
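
`json.dump` raises a `TypeError` if a response contains a value that is not JSON-serializable, so a quick round trip is a cheap way to confirm the file is usable. The sketch below writes to a temporary directory to stay self-contained. `save_and_reload` and the sample data are hypothetical.

```python
import json
import os
import tempfile


def save_and_reload(results: dict, path: str) -> dict:
    """Write `results` as JSON with the same options as above, then read it back."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(results, f, indent=4, ensure_ascii=False)
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Hypothetical results for illustration only
sample_results = {"KB_EXAMPLE": [{"question": "q1", "answer": "a1", "metadata": {}}]}

with tempfile.TemporaryDirectory() as tmp_dir:
    reloaded = save_and_reload(sample_results, os.path.join(tmp_dir, "responses.json"))

print(reloaded == sample_results)
```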