# Notebook Overview

The previous labs guided you through building knowledge bases, constructing RAG pipelines with various configurations, posing questions, and generating answers. Two questions remain: which pipeline configuration is best, and how do we evaluate each pipeline's performance objectively? The Lab 4 notebooks address both by walking you through the evaluation process.

This notebook sets up the prerequisites for evaluating Retrieval-Augmented Generation (RAG) pipelines using FloTorch. It installs the FloTorch core library, loads the variables saved in Lab 1, uploads the ground truth dataset to Amazon S3, creates a results directory for storing output files, and exports the updated variables for subsequent notebooks.

# Introduction to FloTorch

#### What is FloTorch?

FloTorch is an open-source tool designed to streamline the optimization of Generative AI workloads on AWS. With just a few clicks, it enables rapid evaluation of Retrieval-Augmented Generation (RAG) pipelines, focusing on key metrics like accuracy, cost, and latency.

#### FloTorch Demo
[![FloTorch Demo](https://img.youtube.com/vi/00000000000/0.jpg)](https://flotorch-public.s3.us-east-1.amazonaws.com/media/FloTorch-Demo.mp4)

#### Getting Started with FloTorch

There are two primary ways to experience FloTorch:

1.  **AWS Marketplace or Git Repository (Full Application):** For a comprehensive experience with a user-friendly interface, you can install FloTorch either through the AWS Marketplace: [https://aws.amazon.com/marketplace/pp/prodview-z5zcvloh7l3ky](https://aws.amazon.com/marketplace/pp/prodview-z5zcvloh7l3ky) or by cloning the git repository: [https://github.com/FissionAI/FloTorch](https://github.com/FissionAI/FloTorch). This method sets up the complete FloTorch stack within your AWS account, providing an intuitive UI for fine-tuning RAG hyperparameters and running experiments at scale.

2.  **Python Package (Programmatic Access):** If you prefer a programmatic approach, we offer the `flotorch-core` Python package. This package includes a range of functionalities:
    1.  Utilities for reading PDF files.
    2.  Tools for chunking, embedding, and indexing data into VectorStorage.
    3.  Functions for performing inference using Amazon Bedrock and SageMaker.
    4.  Capabilities for evaluation leveraging the RAGAS framework.

# Prerequisites

* **Install `flotorch-core`:** Ensure the FloTorch core Python package is installed in your environment.
* **Ground Truth Data:** A JSON file containing questions and their corresponding ground truth answers, located in the `data/ground_truth_data` folder. This data is used to evaluate the accuracy of your RAG pipelines.
* **Prompt File:** The `prompt.json` file, which defines the prompts used for querying your knowledge bases, located at `data/prompt.json`.
* **Results Folder:** Create a `results` folder in your working directory to store the output files generated during the evaluation process.

**Important Note:** This lab (Lab 4) builds upon the Knowledge Bases created in Lab 1. Specifically, it expects that you have run the following notebooks from Lab 1:

* `1.1 Prerequisites.ipynb`
* `1.2 Knowledge Base with Fixed Chunking.ipynb`
* `1.3 Knowledge base with Semantic Chunking.ipynb`
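
Before running the remaining cells, it can help to confirm that the files and folders listed above are actually present. The following is a minimal sketch, not part of FloTorch: the helper name `find_missing_paths` and the `REQUIRED_PATHS` list are illustrative, and the paths are assumed to be relative to the folder this notebook runs from.

```python
from pathlib import Path

# Paths named in the prerequisites above, assumed relative to this notebook's folder.
REQUIRED_PATHS = [
    "data/ground_truth_data",
    "data/prompt.json",
    "../Lab 1/variables.json",
]

def find_missing_paths(paths, base_dir="."):
    """Return the entries of `paths` that do not exist under `base_dir`."""
    base = Path(base_dir)
    return [p for p in paths if not (base / p).exists()]

# Report anything that is missing instead of failing later in the notebook.
missing = find_missing_paths(REQUIRED_PATHS)
if missing:
    print("Missing prerequisites:", missing)
else:
    print("All prerequisite files and folders were found.")
```

If `../Lab 1/variables.json` is reported as missing, the Lab 1 notebooks listed above have most likely not been run yet.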

# Setup and Installation

In [None]:
# Suppress Warnings
import warnings
warnings.filterwarnings("ignore")

In [None]:
# Install FloTorch Core Package
print("Installing flotorch-core...")
%pip install flotorch-core
print("Install command finished. Check the pip output above for errors.")

In [None]:
# Load Variables from Previous Lab
import json
with open("../Lab 1/variables.json", "r") as f:
    variables = json.load(f)

# Display Loaded Variables
print(variables)
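
The cells that follow read `variables["regionName"]` and `variables["s3Bucket"]`, so a quick check that both keys are present can surface a stale or incomplete `variables.json` early. This is a small sketch under that assumption; `missing_keys` and `REQUIRED_KEYS` are illustrative names, and the example dictionary holds placeholder values rather than real resources.

```python
# Keys that later cells in this notebook read from `variables`.
REQUIRED_KEYS = ("regionName", "s3Bucket")

def missing_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in `config`."""
    return [key for key in required if not config.get(key)]

# Example with a stand-in dictionary (placeholder values, not real resources).
example = {"regionName": "us-east-1", "s3Bucket": ""}
print(missing_keys(example))  # prints ['s3Bucket']
```

In the notebook itself, calling `missing_keys(variables)` should return an empty list if Lab 1 completed successfully.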

#### Upload Ground Truth Dataset

In [None]:
# Import Necessary Libraries
import os
import boto3

# Initialize S3 Client
s3 = boto3.client("s3", region_name=variables["regionName"])

# Define Function to Upload Directory to S3
def upload_directory(path, bucket_name, data_s3_prefix):
    """Uploads all files from a local directory to a specified S3 bucket and prefix."""
    for root, dirs, files in os.walk(path):
        for file in files:
            local_path = os.path.join(root, file)
            relative_path = os.path.relpath(local_path, path).replace(os.sep, "/")  # S3 keys use "/" on every OS
            s3_key = f"{data_s3_prefix}/{relative_path}"  # Construct the full S3 object key
            try:
                s3.upload_file(local_path, bucket_name, s3_key)
                print(f"Uploaded {local_path} to s3://{bucket_name}/{s3_key}")
            except Exception as e:
                print(f"Error uploading {local_path}: {e}")
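
To see how a local file maps to an S3 object key without touching AWS, the key construction can be isolated in a pure function. The sketch below is an illustration, not the notebook's upload code: `build_s3_key` is a hypothetical helper that mirrors the prefix-plus-relative-path scheme used by `upload_directory` and always joins with forward slashes.

```python
import posixpath
from pathlib import PurePath

def build_s3_key(local_path, base_dir, prefix):
    """Map a local file path to an S3 key under `prefix`, always using '/'."""
    # Path of the file relative to the directory being uploaded.
    relative = PurePath(local_path).relative_to(base_dir)
    # Join the parts with '/' regardless of the local OS separator.
    return posixpath.join(prefix.strip("/"), *relative.parts)

print(build_s3_key(
    "./data/ground_truth_data/ground_truth.json",
    "./data/ground_truth_data",
    "ground_truth_data",
))  # prints ground_truth_data/ground_truth.json
```

Keeping this logic separate from the upload call makes it easy to check the resulting keys before any file is transferred.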

In [None]:
# Define Paths and Upload Ground Truth Data
ground_truth_data_path = "./data/ground_truth_data"
s3_key_prefix = "ground_truth_data"
s3_bucket_name = variables["s3Bucket"]

# Upload the entire ground_truth_data directory
upload_directory(ground_truth_data_path, s3_bucket_name, s3_key_prefix)

# Construct the S3 path to the ground truth file
ground_truth_path = f"s3://{s3_bucket_name}/{s3_key_prefix}/ground_truth.json"

# Store the S3 path in the variables dictionary
variables["s3_ground_truth_path"] = ground_truth_path

# Print the S3 path to confirm
print(f"Ground truth data uploaded to: {ground_truth_path}")
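
Later notebooks receive the ground truth location as a single `s3://` string, while boto3 calls take the bucket and key separately. The following sketch shows one way to split such a URI using only the standard library; `split_s3_uri` is a hypothetical helper and the bucket name in the example is a placeholder.

```python
from urllib.parse import urlparse

def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key); raise ValueError otherwise."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")

# Placeholder bucket name, used only for illustration.
bucket, key = split_s3_uri("s3://example-bucket/ground_truth_data/ground_truth.json")
print(bucket, key)  # prints example-bucket ground_truth_data/ground_truth.json
```

With the bucket and key in hand, a call such as `s3.head_object(Bucket=bucket, Key=key)` could be used to confirm that the object exists, assuming the notebook's role has permission to read it.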

#### Create Results Folder (if it doesn't exist)

In [None]:
# Create Results Directory
import os

# Define the path to the results folder
results_dir = "./results"

# Check if the directory exists, and create it if it doesn't
if not os.path.exists(results_dir):
    os.makedirs(results_dir)
    print(f"Created directory: {results_dir}")
else:
    print(f"Directory already exists: {results_dir}")

#### Export Variables for the Next Lab

> **Note**: We are saving all the important configuration variables to a JSON file. This allows easy access to these variables in subsequent notebooks, ensuring consistency and avoiding the need to recreate resources in each lab of the workshop.

In [None]:
# Export Variables to JSON File
import json

# Define the output file name
output_file = "variables.json"

# Write the variables dictionary to a JSON file with indentation for readability
with open(output_file, "w") as f:
    json.dump(variables, f, indent=4)

# Print a confirmation message
print(f"Variables saved to {output_file}")
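
As an optional sanity check, the exported file can be read back and compared with the in-memory dictionary. This is a minimal sketch; `export_matches` is an illustrative helper, and it assumes the dictionary contains only JSON-native values (strings, numbers, lists, and dictionaries with string keys), since other types do not survive a JSON round trip unchanged.

```python
import json

def export_matches(path, expected):
    """Return True if the JSON file at `path` decodes to a value equal to `expected`."""
    with open(path, "r") as f:
        return json.load(f) == expected

# In this notebook the check would be:
# export_matches(output_file, variables)
```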