* 2.73 million LLM tokens
* 3.19K LLM requests
* 746.86K embeddings-model tokens
* 212 embeddings requests
* ~10 minutes to build the index
* ~60 minutes to run the RAG pipeline and collect the bot answers
* ~4 hours to compute the RAGAS metrics
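
To put these totals in perspective, the per-request averages and the total wall-clock time can be derived from the figures above. This is only a rough sketch: the variable names are illustrative, and the inputs are the rounded values from the list, so the results are approximate.

```python
# Rounded totals copied from the list above.
llm_tokens = 2_730_000
llm_requests = 3_190
embedding_tokens = 746_860
embedding_requests = 212

# Approximate durations in minutes: index build, RAG answers, RAGAS metrics.
stage_minutes = {"index": 10, "rag_answers": 60, "ragas_metrics": 4 * 60}

avg_llm_tokens = llm_tokens / llm_requests
avg_embedding_tokens = embedding_tokens / embedding_requests
total_minutes = sum(stage_minutes.values())

print(f"~{avg_llm_tokens:.0f} tokens per LLM request")
print(f"~{avg_embedding_tokens:.0f} tokens per embeddings request")
print(f"~{total_minutes / 60:.1f} hours end to end")
```

The metric calculation dominates the run time, so it is the first place to look when trimming cost.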

In [35]:
import os

from dotenv import load_dotenv
from llama_index.core import Settings, VectorStoreIndex
from llama_index.core.llama_dataset import download_llama_dataset
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
from llama_index.llms.azure_openai import AzureOpenAI
from ragas.dataset_schema import EvaluationDataset, SingleTurnSample
from ragas.embeddings import LlamaIndexEmbeddingsWrapper
from ragas.evaluation import evaluate
from ragas.llms import LlamaIndexLLMWrapper
from ragas.metrics import (
    AnswerRelevancy,
    ContextPrecision,
    ContextRecall,
    Faithfulness,
)
from ragas.run_config import RunConfig

In [11]:
load_dotenv()

True

In [20]:
rag_dataset, documents = download_llama_dataset(
    llama_dataset_class="DocugamiKgRagSec10Q", 
    download_dir="./data",
    show_progress=True
)

100%|██████████| 20/20 [00:46<00:00,  2.31s/it]
Loading files: 100%|██████████| 20/20 [00:59<00:00,  2.98s/file]


In [21]:
rag_dataset.to_pandas().sample(5)

Unnamed: 0,query,reference_contexts,reference_answer,reference_answer_by,query_by
81,How did Apple's inventory levels in the Q3 202...,,The Q3 2022 report does not provide specific c...,ai (gpt-4-turbo (with human review)),human
7,How does Apple's R&D expenditure in the most r...,,"In the most recent quarter ended July 1, 2023,...",ai (gpt-4-turbo (with human review)),human
49,What legal actions or potential liabilities ar...,,The quarterly reports reveal the following leg...,ai (gpt-4-turbo (with human review)),human
71,"For the latest quarter, what was the total rev...",,The total revenue generated from Apple's iPhon...,ai (gpt-4-turbo (with human review)),human
26,How has Intel's total net sales fluctuated ove...,,Intel's total net sales have fluctuated as fol...,ai (gpt-4-turbo (with human review)),human


In [22]:
embed_model = AzureOpenAIEmbedding(
    model='text-embedding-3-small',
    api_key=os.environ['OPENAI_API_KEY'],
    api_version=os.environ['OPENAI_API_VERSION'],
    azure_endpoint=os.environ['AZURE_OPENAI_ENDPOINT']
)

llm = AzureOpenAI(
    engine="gpt-4o", 
    model="gpt-4o", 
    temperature=0.0,
    api_key=os.environ['OPENAI_API_KEY'],
    api_version=os.environ['OPENAI_API_VERSION'],
    azure_endpoint=os.environ['AZURE_OPENAI_ENDPOINT']
)
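
Both clients above read the same three environment variables, and a missing one surfaces as a bare `KeyError`. A small pre-flight check along these lines could give a clearer message. The helper name `missing_env_vars` is hypothetical and not part of either library; the variable names are taken from the cell above.

```python
import os

# Variable names taken from the client configuration above.
REQUIRED_ENV_VARS = ("OPENAI_API_KEY", "OPENAI_API_VERSION", "AZURE_OPENAI_ENDPOINT")


def missing_env_vars(names=REQUIRED_ENV_VARS, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Calling it right after `load_dotenv()` would catch an incomplete `.env` file before any request is made.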

In [23]:
Settings.embed_model = embed_model
Settings.llm = llm

In [24]:
index = VectorStoreIndex.from_documents(
    documents=documents,
    show_progress=True
)
query_engine = index.as_query_engine()

Parsing nodes:   0%|          | 0/1037 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1158 [00:00<?, ?it/s]

In [25]:
predictions = rag_dataset.make_predictions_with(
    predictor=query_engine,
    show_progress=True,
    batch_size=20
)

100%|██████████| 20/20 [05:10<00:00, 15.52s/it]
100%|██████████| 20/20 [06:13<00:00, 18.69s/it]
100%|██████████| 20/20 [06:06<00:00, 18.34s/it]
100%|██████████| 20/20 [06:01<00:00, 18.05s/it]
100%|███████████████████
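
The visible batch timings can be used as a rough cross-check of the ~60 minutes quoted for the RAG run. The sketch below rests on two assumptions: the dataset holds 195 examples (inferred from the 585 evaluation jobs reported later for 3 metrics), and the batches cut off in the output above take about as long as the four that are shown.

```python
import math

# Elapsed times (mm:ss) of the four complete batches shown above.
batch_times = ["05:10", "06:13", "06:06", "06:01"]


def to_seconds(mm_ss):
    """Convert a tqdm-style 'mm:ss' string to seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


n_examples = 195  # assumption: 585 evaluation jobs / 3 metrics
batch_size = 20   # batch_size passed to make_predictions_with

mean_seconds = sum(to_seconds(t) for t in batch_times) / len(batch_times)
n_batches = math.ceil(n_examples / batch_size)
estimated_minutes = mean_seconds * n_batches / 60

print(f"{n_batches} batches, ~{estimated_minutes:.0f} minutes in total")
```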

In [41]:
# Pair each dataset example with its prediction to build the RAGAS samples.
list_of_samples = [
    SingleTurnSample(
        user_input=example.query,
        reference=example.reference_answer,
        response=prediction.response,
        retrieved_contexts=prediction.contexts,
    )
    for example, prediction in zip(rag_dataset.examples, predictions.predictions)
]

ragas_evaluation_dataset = EvaluationDataset(list_of_samples)
ragas_evaluation_dataset.to_pandas().sample(5)

Unnamed: 0,user_input,retrieved_contexts,response,reference
0,How has Apple's total net sales changed over t...,[Products and Services Performance\nThe follow...,Apple's total net sales experienced a decrease...,"Based on the provided documents, Apple's total..."
1,What are the major factors contributing to the...,[Gross Margin\nProducts and Services gross mar...,The major factors contributing to the change i...,In the most recent 10-Q for the quarter ended ...
2,Has there been any significant change in Apple...,[Operating Expenses\nOperating expenses for th...,"Yes, there has been a significant change in Ap...","Yes, there has been a change in Apple's operat..."
3,How has Apple's revenue from iPhone sales fluc...,[Products and Services Performance\nThe follow...,Apple's iPhone sales revenue showed a slight i...,The revenue from iPhone sales for Apple has fl...
4,Can any trends be identified in Apple's Servic...,[Note 2 – Revenue\nNet sales disaggregated by ...,"Yes, a trend can be identified in Apple's Serv...","Based on the provided documents, there is a tr..."
...,...,...,...,...
190,"For Amazon's Q1 2023 10-Q, align the details o...",[Table of Contents\npayments of short-term deb...,"In Amazon's Q1 2023 10-Q, the financial statem...","In Amazon's Q1 2023 10-Q, the details of debt ..."
191,Analyze how Amazon's effective tax rate report...,[Table of Contents\nNote 7 — INCOME TAXES\nOur...,Amazon's effective tax rate for the most recen...,The effective tax rate for Amazon as reported ...
192,"From Amazon's Q3 2023 10-Q, how does the opera...",[Table of Contents\nOperating Expenses\nInform...,The operational expenses section provides deta...,The operational expenses section in Amazon's Q...
193,"In the latest 10-Q, how does the revenue from ...","[See Item 7 of Part II, “Management’s Discussi...",The provided context does not include specific...,The latest 10-Q does not provide specific info...


In [29]:
evaluator_llm = LlamaIndexLLMWrapper(llm)
evaluator_embeddings = LlamaIndexEmbeddingsWrapper(embed_model)

In [42]:
metrics = [
    Faithfulness(llm=evaluator_llm),
    ContextPrecision(llm=evaluator_llm),
    ContextRecall(llm=evaluator_llm)
]
ragas_evaluation_result = evaluate(
    dataset=ragas_evaluation_dataset,
    metrics=metrics,
    llm=evaluator_llm,
    embeddings=evaluator_embeddings,
    run_config=RunConfig(timeout=1800, max_wait=180, max_retries=20),
    show_progress=True,
    batch_size=20
)

Evaluating:   0%|          | 0/585 [00:00<?, ?it/s]

Batch 1/30:   0%|          | 0/20 [00:00<?, ?it/s]
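
The two progress bars above are consistent with each other, which the following arithmetic sketch makes explicit. It assumes that RAGAS schedules one job per (sample, metric) pair; the variable names are illustrative.

```python
import math

total_jobs = 585  # from the "Evaluating" progress bar
batch_size = 20   # batch_size passed to evaluate()
n_metrics = 3     # Faithfulness, ContextPrecision, ContextRecall

# Assumption: one job per (sample, metric) pair.
n_samples = total_jobs // n_metrics
n_batches = math.ceil(total_jobs / batch_size)

print(f"{n_samples} samples, {n_batches} batches")
```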

In [45]:
ragas_evaluation_result.to_pandas().sample(5)

Unnamed: 0,user_input,retrieved_contexts,response,reference,faithfulness,context_precision,context_recall
100,What were the main contributors to NVIDIA's ne...,[Second Quarter of Fiscal Year 2024 Summary\nT...,"In the second quarter of 2023, the main contri...",The main contributors to NVIDIA's net income i...,0.777778,1.0,0.285714
119,What were the cash flow from operations figure...,[Table of Contents\nA Quarter in Review\nTotal...,The cash flow from operations for Intel as per...,The cash flow from operations figure for Intel...,1.0,0.5,0.5
16,Outline the risk elements associated with Micr...,[PART IIItem 1A\n \nBesides software developme...,The risk elements associated with Microsoft's ...,The risk elements associated with Microsoft's ...,1.0,0.0,0.0
78,What changes in debt structure or interest exp...,[Note 5 – Income Taxes\nEuropean Commission St...,The context does not provide specific details ...,Apple disclosed in its Q1 2023 10-Q that as of...,1.0,0.0,0.0
152,"In Microsoft's Q1 2023 10-Q, what relationship...",[PART IItem 2\n \n \nOperating income increase...,"In Microsoft's Q1 2023 10-Q, the research and ...",In the provided context of Microsoft's Q1 2023...,0.833333,0.0,1.0


In [46]:
ragas_evaluation_result.to_pandas().to_json('./test-dataset.json', orient='records', indent=4)
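
Once the results are on disk, they can be summarized without rerunning the four-hour evaluation. The sketch below uses only the standard library; `load_records` and `mean_scores` are hypothetical helper names. It assumes the file is a JSON list of row objects, as `orient='records'` produces, and that missing scores appear as `null`.

```python
import json
import math


def load_records(path):
    """Read a JSON file written with orient='records' (a list of row objects)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def mean_scores(records, metric_names):
    """Average each metric over the records, skipping missing (null/NaN) scores."""
    means = {}
    for name in metric_names:
        values = [
            row[name]
            for row in records
            if isinstance(row.get(name), (int, float)) and not math.isnan(row[name])
        ]
        means[name] = sum(values) / len(values) if values else None
    return means


# Example usage against the file written above:
# records = load_records("./test-dataset.json")
# print(mean_scores(records, ["faithfulness", "context_precision", "context_recall"]))
```

Missing scores are skipped instead of being counted as zero, so a failed metric call does not drag the average down.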