<h1 align="center">
    <img 
        src="./img/Microsoft-Logo.png" 
        width="400"/>
</h1>
<h1 align="center">
    <b>Practical Guide</b>
</h1>
<h4 align="center">
    for the creation of an AI Solution using an accelerator from the <a href="https://www.ds-toolkit.com/">Data Science Toolkit</a>
</h4>

# What to expect

* **Challenge 1:** *Create your own AI solution*
* **Challenge 2:** *Evaluate the quality of the AI solution*
* **Challenge 3:** *Create explanations to have insights on how to improve the quality of the AI solution*

# Challenge 1: *Create your own AI solution*

Here we are going to create a RAG-based (retrieval-augmented generation) copilot that answers questions about six sustainability reports from Microsoft, Apple, Amazon, Google, Meta and Netflix, published in 2021 or 2022. The documents are part of the [Mini Esg Bench Dataset](https://llamahub.ai/l/llama_datasets/Mini%20ESG%20Bench%20Dataset?from=llama_datasets).

## Challenge 1 - Step 1: *Let's first install the required packages and libraries*

> This process takes around **1 minute** to complete. It runs in quiet mode, so output is shown only if an error occurs. To see what will be installed, look at the [requirements.txt](./requirements.txt) file.

In summary, two main tools used in this notebook will be installed:

* **Llama Index**, which is used to download the dataset and to create the semantic index.
  > It is also possible to use **Azure AI Search** to create the semantic index. Since the index here is small, we use Llama Index to build it in memory and keep the example simple.
* **RAGAS**, which is used to calculate the quality metrics for the copilot that we are going to create.

In [1]:
%%time
!pip install -q -q -r requirements.txt

CPU times: user 47.9 ms, sys: 15 ms, total: 62.9 ms
Wall time: 3.14 s


## Challenge 1 - Step 2: *Let's import the libraries to be used in this notebook*

In [2]:
import os
import pandas as pd

# To create the RAG based copilot
from llama_index.core.llama_dataset import download_llama_dataset, LabelledRagDataset
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
from llama_index.core import VectorStoreIndex, Settings
from ragas.metrics import (
    Faithfulness,
    ContextPrecision,
    ContextRecall
)

# To calculate the Generative AI quality metrics
from ragas.llms import LlamaIndexLLMWrapper
from ragas.embeddings import LlamaIndexEmbeddingsWrapper
from ragas.dataset_schema import SingleTurnSample, EvaluationDataset
from ragas.evaluation import evaluate
from ragas.run_config import RunConfig

# To create a simple PDF visualization tool
import pymupdf
import matplotlib.pyplot as plt
import ipywidgets as widgets
from IPython.display import display, Markdown, clear_output


## Challenge 1 - Step 3: *Let's download the documents and test questions*

> This step takes around **1 minute** to complete depending on the internet connection.

In [3]:
rag_dataset, documents = download_llama_dataset(
    llama_dataset_class="MiniEsgBenchDataset", 
    download_dir="./data",
    show_progress=True
)

100%|██████████| 6/6 [00:04<00:00,  1.34it/s]
Loading files: 100%|██████████| 6/6 [00:48<00:00,  8.16s/file]


### A small tool to view the reports just downloaded

The following cell creates a simple tool to quickly view the PDF files just downloaded. If you would rather browse the downloaded reports manually, navigate to the `data/source_files` folder.

In [4]:
data_source_path = "./data/source_files"

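# List of PDF files just downloaded (non-PDF files are skipped, sorted for a stable order)
pdf_files = sorted(
    os.path.join(data_source_path, file_name)
    for file_name in os.listdir(data_source_path)
    if file_name.lower().endswith(".pdf")
)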

# Function to render a specific page of a PDF
def render_pdf_page(pdf_path, page_number=0):
    # Open the PDF file
    pdf_document = pymupdf.open(pdf_path)
    
    # Ensure the page number is valid
    if page_number < 0 or page_number >= len(pdf_document):
        raise ValueError("Invalid page number.")
    
    # Get the page and render it as an image
    page = pdf_document[page_number]
    pix = page.get_pixmap()
    pdf_document.close()
    
    # Display the image using Matplotlib
    plt.figure(figsize=(10, 8))
    plt.imshow(pix.pil_image())
    plt.axis("off")
    plt.show()

# Function to update the displayed page
def update_page(step):
    global current_page
    pdf_document = pymupdf.open(dropdown.value)
    total_pages = len(pdf_document)
    pdf_document.close()
    
    # Update the current page index
    current_page += step
    if current_page < 0:
        current_page = 0
    elif current_page >= total_pages:
        current_page = total_pages - 1
    
    with output:
        output.clear_output()
        render_pdf_page(dropdown.value, current_page)

# Function to reset the viewer when a new PDF is selected
def reset_viewer(change):
    global current_page
    current_page = 0  # Reset to the first page
    with output:
        output.clear_output()
        render_pdf_page(dropdown.value, current_page)

# Create widgets
dropdown = widgets.Dropdown(
    options=pdf_files,
    description="Select PDF:",
    style={"description_width": "initial"}
)

prev_button = widgets.Button(description="Previous Page")
next_button = widgets.Button(description="Next Page")
output = widgets.Output()

# Attach event listeners
prev_button.on_click(lambda _: update_page(-1))
next_button.on_click(lambda _: update_page(1))
dropdown.observe(reset_viewer, names="value")

# Initial display
reset_viewer(None)

# Display widgets and output
display(widgets.VBox([dropdown, widgets.HBox([prev_button, next_button]), output]))

## Challenge 1 - Step 4: *Let's create the semantic index*

> This process can take up to **3 minutes** to complete
> 
> **TODO:** Explain how the Semantic index works
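
The cell below reads three environment variables (`OPENAI_API_KEY`, `OPENAI_API_VERSION` and `AZURE_OPENAI_ENDPOINT`) and fails with a `KeyError` if any of them is missing. As an optional pre-flight check, a minimal sketch like the following lists the ones that are not set. The helper name `missing_env_vars` is hypothetical and not part of any library used in this guide.

```python
import os

# Names taken from the configuration cells in this guide
REQUIRED_ENV_VARS = ("OPENAI_API_KEY", "OPENAI_API_VERSION", "AZURE_OPENAI_ENDPOINT")

def missing_env_vars(names=REQUIRED_ENV_VARS, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

missing = missing_env_vars()
if missing:
    print("Set these environment variables before continuing: " + ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.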

In [5]:
embed_model = AzureOpenAIEmbedding(
    model='text-embedding-3-small', # Update with the embeddings deployment name
    api_key=os.environ['OPENAI_API_KEY'],
    api_version=os.environ['OPENAI_API_VERSION'],
    azure_endpoint=os.environ['AZURE_OPENAI_ENDPOINT']
)

Settings.embed_model = embed_model

index = VectorStoreIndex.from_documents(
    documents=documents,
    show_progress=True
)


Parsing nodes:   0%|          | 0/455 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/512 [00:00<?, ?it/s]
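
To give an intuition of what the semantic index does, here is a toy sketch that runs offline. It is not how `VectorStoreIndex` is implemented: the real index stores embeddings produced by the `text-embedding-3-small` deployment, while this sketch assumes a simple bag-of-words vector as a stand-in. The idea it illustrates is the same, though: every chunk is turned into a vector, and a query is answered by returning the chunks whose vectors are most similar to the query vector. The class `ToySemanticIndex` and the example chunks are made up for illustration.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class ToySemanticIndex:
    """Stores (chunk, vector) pairs in memory and ranks them against a query."""

    def __init__(self, chunks):
        self.entries = [(chunk, embed(chunk)) for chunk in chunks]

    def retrieve(self, query, top_k=2):
        query_vector = embed(query)
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine_similarity(query_vector, entry[1]),
            reverse=True,
        )
        return [chunk for chunk, _ in ranked[:top_k]]

# Made-up chunks, for illustration only (not taken from the reports)
toy_chunks = [
    "Renewable energy purchases reduced emissions.",
    "Office waste was diverted from landfill.",
    "Employees received training on inclusion.",
]
toy_index = ToySemanticIndex(toy_chunks)
print(toy_index.retrieve("How much waste was diverted from landfill?", top_k=1))
```

A real embedding model also places texts with similar meaning but different wording close together, which word counts cannot do.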

## Challenge 1 - Step 5: *Let's create the copilot*

> **TODO:** Explain how the copilot is created 

In [6]:
llm = AzureOpenAI(
    engine="gpt-4o", # Update with the language model deployment name 
    model="gpt-4o", # Update with the language model name
    temperature=0.0,
    api_key=os.environ['OPENAI_API_KEY'],
    api_version=os.environ['OPENAI_API_VERSION'],
    azure_endpoint=os.environ['AZURE_OPENAI_ENDPOINT']
)

Settings.llm = llm

query_engine = index.as_query_engine()
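
Conceptually, a query engine built on a semantic index follows a retrieve-then-generate pattern: it fetches the most relevant chunks for a question, places them in a prompt, and asks the LLM to answer from that context. The sketch below illustrates the pattern with stand-in functions so it runs offline. The prompt wording and the names `build_prompt`, `answer`, `fake_retrieve` and `fake_generate` are assumptions made for illustration. Llama Index uses its own prompt templates and response synthesis logic.

```python
def build_prompt(question, contexts):
    """Combine retrieved chunks and the user question into one prompt string."""
    context_block = "\n---\n".join(contexts)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

def answer(question, retrieve, generate, top_k=2):
    """Retrieve-then-generate: `retrieve` and `generate` are caller-supplied callables."""
    contexts = retrieve(question, top_k)
    return generate(build_prompt(question, contexts)), contexts

# Stand-ins for the retriever and the LLM so the sketch runs offline
def fake_retrieve(question, top_k):
    return ["Made-up context chunk."][:top_k]

def fake_generate(prompt):
    return f"(model reply to a prompt of {len(prompt)} characters)"

response, used_contexts = answer("What does the context say?", fake_retrieve, fake_generate)
print(response)
print(used_contexts)
```

Returning the contexts together with the response matters later in this guide, because the quality metrics need both.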

### Let's play with the copilot just created

In the test dataset, we have downloaded not only the source data (PDFs) but also 50 example questions to ask the copilot. The following are some examples of these questions:

In [7]:
num_samples = 3
samples = rag_dataset.to_pandas().loc[:num_samples-1, 'query'].values

display(Markdown("\n".join([ "* " + sample for sample in samples])))

* Can you provide for me the three highlights for the GHG emissions section of the Advancing Carbon-Free Energy Performance Highlights?
* What percentage of waste from Google's offices globally were diverted away from landfills in 2021?
* Can you present me with the performance highlights for Empowering Users With Technology?

> The following is a simple tool that lets you ask the new copilot questions

In [10]:
# Function to get a response
def get_response(user_input):
    return query_engine.query(user_input).response

# Interactive chatbot function
def chatbot():
    output = widgets.Output()
    text_box = widgets.Text(
        placeholder="Type your message here",
        description="You:",
        style={'description_width': 'initial'}
    )
    submit_button = widgets.Button(description="Send")
    
    # Function to handle submission
    def on_submit(_):
        with output:
            clear_output(wait=True)
            user_message = text_box.value
            if user_message.strip():  # Process non-empty input
                bot_response = get_response(user_message)
                print(f"You: {user_message}")
                print(f"ESG Bot: {bot_response}")
            text_box.value = ""  # Clear the text box after submission
    
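    # Link submission to pressing Enter in the text box: with continuous_update
    # disabled, the value is only synced on Enter or focus loss, so a non-empty
    # change means the user submitted a message
    text_box.continuous_update = False

    def on_enter(change):
        if change["new"].strip():
            on_submit(None)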
    
    # Attach event handlers
    text_box.observe(on_enter, names="value")
    submit_button.on_click(on_submit)
    
    display(widgets.VBox([text_box, submit_button, output]))

# Run the chatbot
chatbot()

# Challenge 2: *Evaluation of the quality of our new Copilot*

**TODO:** Add description of how to measure the quality of a GenAI solution 

## Challenge 2 - Step 1: *Let's take a look at the test dataset*

First, let's look at one example instance from the test dataset that we downloaded together with the PDF files.

In [11]:
instance_idx = 0

def create_instance_md(k, v):
    md_str = f"**{k}:**\n\n"
    if k == "reference_contexts":
        return md_str + "\n".join([f"* {c}\n" for c in v])
    return md_str + f"{v}"

display(Markdown("\n\n".join([create_instance_md(k, v) for k,v in rag_dataset.to_pandas().iloc[instance_idx].items()])))

**query:**

Can you provide for me the three highlights for the GHG emissions section of the Advancing Carbon-Free Energy Performance Highlights?

**reference_contexts:**

* GHG emissions
65%
cumulative GHG
emissions reduction
From 2011 to 2021, our renewable energy purchasing resulted in a cumulative 65% reduction in our Scope 1 and Scope 2 emissions, as compared with a business-as-usual scenario in which we didn’t procure renewable energy via PPAs.
81%
decrease
in carbon intensity
From 2011 to 2021, our carbon intensity per unit of revenue
decreased by 81%.
15 years
of carbon neutrality
Google has been carbon neutral
for our operations since 2007. Because of our purchases
of renewable energy and
procurement of high-quality
carbon credits, we have compensated for all our
operational GHG emissions.


**reference_answer:**

Sure, they are: 
1. 65% cumulative GHG emissions reduction
2. 81% decrease in carbon intensity
3. 15 years of carbon neutrality

**reference_answer_by:**

human

**query_by:**

human

### Full dataset
In total, the test dataset has 50 instances like the one detailed above. Let's take a look at the full dataset:

In [30]:
rag_dataset.to_pandas()

|  | query | reference_contexts | reference_answer | reference_answer_by | query_by |
| --- | --- | --- | --- | --- | --- |
| 0 | Can you provide for me the three highlights fo... | [GHG emissions\n65%\ncumulative GHG\nemissions... | Sure, they are: \n1. 65% cumulative GHG emissi... | human | human |
| 1 | What percentage of waste from Google's offices... | [64%\nlandfill diversion\nIn 2021, we reached ... | Sixty-four percent. | human | human |
| 2 | Can you present me with the performance highli... | [EMPOWERING USERS WITH TECHNOLOGY\nProducts\nT... | Sure! The Performance Highlights for Empowerin... | human | human |
| 3 | What was the listed key achievement regarding ... | [We’ve been a leader on sustainability and cli... | In 2017, Google became the first major company... | human | human |
| 4 | Did Google reach its intended Waste target und... | [Target: Achieve UL 2799 Zero Waste to Landfil... | No, this target has not been met in 2021. Howe... | human | human |
| 5 | How many EV charging locations were there on G... | [200,000\nEV charging locations\non Google Map... | 200000 | human | human |
| 6 | On what page of the report can I find the perf... | [EMPOWERING USERS WITH TECHNOLOGY\nProducts\nT... | The performance highlights for Empowering User... | human | human |
| 7 | Can you please provide for me the glossary of ... | [Glossary\nCFE: carbon-free energyCO2e: carbon... | Sure, here is the glossary:\nGlossary\nCFE: ca... | human | human |
| 8 | On what page can I find details about Amazons ... | [Contents\nIntroduction\n2 About Amazon\n3 Ope... | You can find information on driving climate so... | human | human |
| 9 | For the listed Renewable Energy goals, by when... | [Renewable Energy\nGoal: Power our operations ... | Amazon set the goal of becoming powered by 100... | human | human |


## Challenge 2 - Step 2: *Let's calculate the responses*

Now, let's calculate the **responses** of our new copilot for each element in the test dataset. To be able to calculate the quality metrics, we also need to keep the **retrieved contexts**. The retrieved contexts are the chunks of data retrieved from the semantic index and used to answer the question.

> This process could take hours, depending on the throughput of the LLM used. To keep this practical guide to a reasonable duration, let's calculate the **responses** and **retrieved contexts** for a small sample of questions. The results for the full dataset have already been pre-calculated.

In [12]:
sample_size = 5
sub_dataset = LabelledRagDataset(examples=rag_dataset.examples[:sample_size])
sub_dataset.to_pandas()

|  | query | reference_contexts | reference_answer | reference_answer_by | query_by |
| --- | --- | --- | --- | --- | --- |
| 0 | Can you provide for me the three highlights fo... | [GHG emissions\n65%\ncumulative GHG\nemissions... | Sure, they are: \n1. 65% cumulative GHG emissi... | human | human |
| 1 | What percentage of waste from Google's offices... | [64%\nlandfill diversion\nIn 2021, we reached ... | Sixty-four percent. | human | human |
| 2 | Can you present me with the performance highli... | [EMPOWERING USERS WITH TECHNOLOGY\nProducts\nT... | Sure! The Performance Highlights for Empowerin... | human | human |
| 3 | What was the listed key achievement regarding ... | [We’ve been a leader on sustainability and cli... | In 2017, Google became the first major company... | human | human |
| 4 | Did Google reach its intended Waste target und... | [Target: Achieve UL 2799 Zero Waste to Landfil... | No, this target has not been met in 2021. Howe... | human | human |


Now, let's calculate the responses for this **sub-dataset**:

In [13]:
%%time
predictions = sub_dataset.make_predictions_with(
    predictor = query_engine,
    show_progress = True
)

100%|██████████| 5/5 [01:01<00:00, 12.31s/it]

CPU times: user 336 ms, sys: 8.07 ms, total: 344 ms
Wall time: 1min 1s





Let's look at the results:

> Since we are already preparing the data for the next challenge, the column names have changed:
> * `user_input` is the **query** from the previous format.
> * `retrieved_contexts` is a list of the chunks of data used by our copilot to answer the question.
> * `response` is the answer to the question generated by our new copilot.
> * `reference` is the **reference answer** from the previous format, that is, the expected answer written by a human.

In [14]:
list_of_samples = []

# Pair each test example with the prediction made for it
for example, prediction in zip(sub_dataset.examples, predictions.predictions):
    list_of_samples.append(
        SingleTurnSample(
            user_input=example.query,
            reference=example.reference_answer,
            response=prediction.response,
            retrieved_contexts=prediction.contexts
        )
    )

ragas_evaluation_dataset = EvaluationDataset(list_of_samples)
ragas_evaluation_dataset.to_pandas()

|  | user_input | retrieved_contexts | response | reference |
| --- | --- | --- | --- | --- |
| 0 | Can you provide for me the three highlights fo... | [31. In 2018, to align with industry best prac... | The three highlights for the GHG emissions sec... | Sure, they are: \n1. 65% cumulative GHG emissi... |
| 1 | What percentage of waste from Google's offices... | [Performance highlights\nThe following section... | In 2021, 78% of waste from Google's global dat... | Sixty-four percent. |
| 2 | Can you present me with the performance highli... | [Education\nFor more than 40 years, we’ve work... | The performance highlights for empowering user... | Sure! The Performance Highlights for Empowerin... |
| 3 | What was the listed key achievement regarding ... | [Our approach\nWe believe that every business ... | There is no listed key achievement for Google ... | In 2017, Google became the first major company... |
| 4 | Did Google reach its intended Waste target und... | [BUILDING BETTER DEVICES AND SERVICES\nTarget ... | Yes, in 2021, Google achieved the UL 2799 Zero... | No, this target has not been met in 2021. Howe... |


# Challenge 3: *Evaluate the quality of our new copilot*

**TODO:** Explain how to evaluate the quality of a GenAI solution with a summary of the most used metrics

## Challenge 3 - Step 1: *Initialize LLM and Embeddings models*

The first step is to initialize the LLM and Embeddings models to be used to calculate the GenAI metrics:

In [15]:
evaluator_llm = LlamaIndexLLMWrapper(llm)
evaluator_embeddings = LlamaIndexEmbeddingsWrapper(embed_model)

## Challenge 3 - Step 2: *Calculate the GenAI metrics*

Calculating the GenAI metrics for all the questions in the test dataset could take hours, depending on the throughput of the LLM and embeddings model used. To keep this guide short, we are going to calculate only 3 metrics for the sub-dataset created before.

> This process can take up to **5 minutes** to complete

In [16]:
%%time

metrics = [
    Faithfulness(llm=evaluator_llm),
    ContextPrecision(llm=evaluator_llm),
    ContextRecall(llm=evaluator_llm)
]
ragas_evaluation_result = evaluate(
    dataset=ragas_evaluation_dataset,
    metrics=metrics,
    llm=evaluator_llm,
    embeddings=evaluator_embeddings,
    run_config=RunConfig(timeout=1800, max_wait=180, max_retries=20),
    show_progress=True,
    batch_size=5
)

Evaluating:   0%|          | 0/15 [00:00<?, ?it/s]

Batch 1/3:   0%|          | 0/5 [00:00<?, ?it/s]

CPU times: user 544 ms, sys: 48 ms, total: 592 ms
Wall time: 5min 12s


The following are the results of the GenAI metrics calculation:

In [17]:
df_ragas_result = ragas_evaluation_result.to_pandas()
df_ragas_result

|  | user_input | retrieved_contexts | response | reference | faithfulness | context_precision | context_recall |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Can you provide for me the three highlights fo... | [31. In 2018, to align with industry best prac... | The three highlights for the GHG emissions sec... | Sure, they are: \n1. 65% cumulative GHG emissi... | 1.0 | 0.0 | 0.0 |
| 1 | What percentage of waste from Google's offices... | [Performance highlights\nThe following section... | In 2021, 78% of waste from Google's global dat... | Sixty-four percent. | 1.0 | 0.0 | 0.0 |
| 2 | Can you present me with the performance highli... | [Education\nFor more than 40 years, we’ve work... | The performance highlights for empowering user... | Sure! The Performance Highlights for Empowerin... | 1.0 | 0.0 | 0.0 |
| 3 | What was the listed key achievement regarding ... | [Our approach\nWe believe that every business ... | There is no listed key achievement for Google ... | In 2017, Google became the first major company... | 1.0 | 1.0 | 1.0 |
| 4 | Did Google reach its intended Waste target und... | [BUILDING BETTER DEVICES AND SERVICES\nTarget ... | Yes, in 2021, Google achieved the UL 2799 Zero... | No, this target has not been met in 2021. Howe... | 0.666667 | 1.0 | 0.5 |
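
To help read these numbers, the sketch below shows how the three scores can be computed once per-claim and per-chunk verdicts are available. It is a simplified reading of the Ragas metric definitions, not the library's implementation: in Ragas the verdicts come from the evaluator LLM, while here they are hypothetical boolean lists supplied by hand. All function names are made up for illustration.

```python
def faithfulness_score(claim_supported):
    """Fraction of claims in the response that the retrieved contexts support."""
    return sum(claim_supported) / len(claim_supported) if claim_supported else 0.0

def context_recall_score(reference_claim_attributed):
    """Fraction of reference-answer claims attributable to the retrieved contexts."""
    if not reference_claim_attributed:
        return 0.0
    return sum(reference_claim_attributed) / len(reference_claim_attributed)

def context_precision_score(chunk_relevant):
    """Mean of precision@k over the ranks k that hold a relevant chunk."""
    hits, total = 0, 0.0
    for rank, relevant in enumerate(chunk_relevant, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

# Hypothetical verdicts, for illustration only
print(round(faithfulness_score([True, True, False]), 6))  # 0.666667
print(context_recall_score([True, False]))                # 0.5
print(context_precision_score([False, True]))             # 0.5
```

Under this reading, a faithfulness of 0.666667 would be consistent with two out of three claims in the response being supported by the retrieved contexts. Note that faithfulness only compares the response with the retrieved contexts, so a response can be fully faithful and still disagree with the reference answer.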


### Full dataset results

Let's take a look at the full table with all the responses, retrieved contexts and metrics already calculated:

In [19]:
df_test_dataset = pd.read_json('./test-dataset.json', orient='records')
df_test_dataset

|  | user_input | retrieved_contexts | response | reference | faithfulness | context_precision | context_recall |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Can you provide for me the three highlights fo... | [31. In 2018, to align with industry best prac... | The three highlights for the GHG emissions sec... | Sure, they are: \n1. 65% cumulative GHG emissi... | 1.0 | 0.0 | 0.0 |
| 1 | What percentage of waste from Google's offices... | [Performance highlights\nThe following section... | In 2021, 78% of waste from Google's global dat... | Sixty-four percent. | 1.0 | 0.0 | 0.0 |
| 2 | Can you present me with the performance highli... | [Education\nFor more than 40 years, we’ve work... | The performance highlights for empowering user... | Sure! The Performance Highlights for Empowerin... | 1.0 | 0.0 | 0.0 |
| 3 | What was the listed key achievement regarding ... | [Our approach\nWe believe that every business ... | There is no listed key achievement for Google ... | In 2017, Google became the first major company... | 1.0 | 1.0 | 1.0 |
| 4 | Did Google reach its intended Waste target und... | [BUILDING BETTER DEVICES AND SERVICES\nTarget ... | Yes, in 2021, Google achieved the UL 2799 Zero... | No, this target has not been met in 2021. Howe... | 0.666667 | 1.0 | 1.0 |
| 5 | How many EV charging locations were there on G... | [This guidance does not recognize existing ren... | The provided context does not specify the numb... | 200000 | 1.0 | 0.0 | 0.0 |
| 6 | On what page of the report can I find the perf... | [Employee Recruitment, Inclusion and Performan... | The performance highlights for the Empowering ... | The performance highlights for Empowering User... | 0.0 | 0.0 | 0.0 |
| 7 | Can you please provide for me the glossary of ... | [GRI INDEX\nGRI 304 - Biodiversity\nGRI 103 Ma... | I'm unable to provide the glossary of the docu... | Sure, here is the glossary:\nGlossary\nCFE: ca... | 0.5 | 0.0 | 0.0 |
| 8 | On what page can I find details about Amazons ... | [IntroductionSustainability\nDriving Climate S... | You can find details about Amazon's climate so... | You can find information on driving climate so... | 0.0 | 0.0 | 0.0 |
| 9 | For the listed Renewable Energy goals, by when... | [IntroductionSustainability\nDriving Climate S... | Amazon intends to have all operations powered ... | Amazon set the goal of becoming powered by 100... | 1.0 | 1.0 | 0.0 |
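
A per-row table is hard to judge at a glance, so a common next step is to average each metric. The sketch below does this with the standard library for the ten rows displayed above only, with the values copied by hand. It is not a summary of the full 50-question dataset.

```python
from statistics import mean

# The ten rows displayed above, as (faithfulness, context_precision, context_recall)
displayed_rows = [
    (1.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (1.0, 1.0, 1.0),
    (0.666667, 1.0, 1.0),
    (1.0, 0.0, 0.0),
    (0.0, 0.0, 0.0),
    (0.5, 0.0, 0.0),
    (0.0, 0.0, 0.0),
    (1.0, 1.0, 0.0),
]
metric_names = ("faithfulness", "context_precision", "context_recall")

# Transpose rows into columns and average each column
averages = {
    name: round(mean(column), 3)
    for name, column in zip(metric_names, zip(*displayed_rows))
}
print(averages)
```

For these ten rows the averages come out at roughly 0.72 for faithfulness, 0.30 for context precision and 0.20 for context recall. With the DataFrame loaded in the notebook, the equivalent for all rows would be `df_test_dataset[["faithfulness", "context_precision", "context_recall"]].mean()`.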
