# TODO

* Document this notebook
* Create another example with a different dataset

Observed resource usage and timings:

* 2.73 million LLM tokens
* 3.19K LLM requests
* 746.86K embedding-model tokens
* 212 embedding requests
* ~10 minutes to create the index
* ~60 minutes to run the RAG pipeline and collect the bot's answers
* ~4 hours to compute the RAGAS metrics
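
As a rough sanity check, the figures above can be turned into per-request averages and a total wall-clock time. This is a minimal sketch that uses only the numbers listed; the variable names are illustrative.

```python
# Figures copied from the list above.
llm_tokens = 2.73e6
llm_requests = 3.19e3
embedding_tokens = 746.86e3
embedding_requests = 212

# Average tokens consumed per request.
llm_tokens_per_request = llm_tokens / llm_requests
embedding_tokens_per_request = embedding_tokens / embedding_requests

# Approximate wall-clock time in minutes: index + RAG answers + RAGAS metrics.
total_minutes = 10 + 60 + 4 * 60

print(f"LLM: ~{llm_tokens_per_request:.0f} tokens per request")
print(f"Embeddings: ~{embedding_tokens_per_request:.0f} tokens per request")
print(f"Total: ~{total_minutes} minutes (~{total_minutes / 60:.1f} hours)")
```

The metric computation accounts for most of the time, so it is the first place to look when trimming a run.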

In [1]:
from llama_index.core.llama_dataset import download_llama_dataset
from dotenv import load_dotenv
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
import os
from llama_index.core import VectorStoreIndex, Settings
from ragas.metrics import (
    Faithfulness,
    AnswerRelevancy,
    ContextPrecision,
    ContextRecall
)
from ragas.llms import LlamaIndexLLMWrapper
from ragas.embeddings import LlamaIndexEmbeddingsWrapper
from ragas.dataset_schema import SingleTurnSample, EvaluationDataset
from ragas.evaluation import evaluate
from ragas.run_config import RunConfig

In [2]:
load_dotenv()

True
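
A `True` result here does not by itself guarantee that every variable the notebook reads from `os.environ` is defined. A minimal sketch of an explicit check follows; the helper `missing_env_vars` is hypothetical, and the variable names are the ones this notebook uses when configuring the Azure clients.

```python
import os

# Variable names taken from the client configuration in this notebook.
REQUIRED_VARS = ("OPENAI_API_KEY", "OPENAI_API_VERSION", "AZURE_OPENAI_ENDPOINT")


def missing_env_vars(required=REQUIRED_VARS, env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Running the check before any client is built gives a readable message instead of a `KeyError` further down.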

In [3]:
rag_dataset, documents = download_llama_dataset(
    llama_dataset_class="PaulGrahamEssayDataset", 
    download_dir="./data",
    show_progress=True
)

100%|██████████| 1/1 [00:00<00:00,  1.87it/s]
Loading files: 100%|██████████| 1/1 [00:00<00:00,  5.45file/s]


In [8]:
rag_dataset.to_pandas()

index,query,reference_contexts,reference_answer,reference_answer_by,query_by
0,"In the essay, the author mentions his early ex...",[What I Worked On\n\nFebruary 2021\n\nBefore c...,The first computer the author used for program...,ai (gpt-4),ai (gpt-4)
1,The author switched his major from philosophy ...,[What I Worked On\n\nFebruary 2021\n\nBefore c...,The two specific influences that led the autho...,ai (gpt-4),ai (gpt-4)
2,"In the essay, the author discusses his initial...",[I couldn't have put this into words when I wa...,The two main influences that initially drew th...,ai (gpt-4),ai (gpt-4)
3,The author mentions his shift of interest towa...,[I couldn't have put this into words when I wa...,The author shifted his interest towards Lisp a...,ai (gpt-4),ai (gpt-4)
4,"In the essay, the author mentions his interest...",[So I looked around to see what I could salvag...,"The author in the essay is Paul Graham, who wa...",ai (gpt-4),ai (gpt-4)
5,The author discusses his decision to write a b...,[So I looked around to see what I could salvag...,The author decided to write a book on Lisp hac...,ai (gpt-4),ai (gpt-4)
6,"In the essay, the author mentions a quick deci...","[I didn't want to drop out of grad school, but...",The author decided to attempt writing his diss...,ai (gpt-4),ai (gpt-4)
7,The author describes the atmosphere and practi...,"[I didn't want to drop out of grad school, but...","According to the author's account, the student...",ai (gpt-4),ai (gpt-4)
8,"In the essay, the author discusses his experie...","[We actually had one of those little stoves, f...","In the essay, the author explains that paintin...",ai (gpt-4),ai (gpt-4)
9,The author shares his work experience at a com...,"[We actually had one of those little stoves, f...","Interleaf, the company where the author worked...",ai (gpt-4),ai (gpt-4)


In [5]:
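# Embedding model served through Azure OpenAI; credentials come from the environment.
embed_model = AzureOpenAIEmbedding(
    model='text-embedding-3-small',
    api_key=os.environ['OPENAI_API_KEY'],
    api_version=os.environ['OPENAI_API_VERSION'],
    azure_endpoint=os.environ['AZURE_OPENAI_ENDPOINT']
)

# Chat model used to answer the queries and, later, to judge the answers.
llm = AzureOpenAI(
    engine="gpt-4o",  # Azure deployment name
    model="gpt-4o",   # underlying model name
    temperature=0.0,  # keep outputs as deterministic as possible
    api_key=os.environ['OPENAI_API_KEY'],
    api_version=os.environ['OPENAI_API_VERSION'],
    azure_endpoint=os.environ['AZURE_OPENAI_ENDPOINT']
)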

In [6]:
Settings.embed_model = embed_model
Settings.llm = llm

In [7]:
index = VectorStoreIndex.from_documents(
    documents=documents,
    show_progress=True
)
query_engine = index.as_query_engine()

Parsing nodes:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/22 [00:00<?, ?it/s]

In [9]:
predictions = rag_dataset.make_predictions_with(
    predictor=query_engine,
    show_progress=True,
    batch_size=20
)

100%|██████████| 20/20 [08:24<00:00, 25.24s/it]
100%|██████████| 20/20 [5:01:21<00:00, 904.08s/it]
100%|██████████| 4/4 [43:48<00:00, 657.25s/it]
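
The three progress bars can be read as one bar per batch. The sketch below assumes that reading: it reconstructs the batch sizes for `batch_size=20` and adds up the elapsed times shown. The helpers `batch_sizes` and `to_seconds` are illustrative and not part of any library.

```python
def batch_sizes(n_items, batch_size):
    """Sizes of consecutive batches when n_items are processed batch_size at a time."""
    full, remainder = divmod(n_items, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])


def to_seconds(elapsed):
    """Convert a tqdm-style 'MM:SS' or 'H:MM:SS' string to seconds."""
    seconds = 0
    for part in elapsed.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


# 20 + 20 + 4 items across the three bars implies 44 examples.
print(batch_sizes(44, 20))

# Elapsed times read off the three bars above.
total = sum(to_seconds(t) for t in ["08:24", "5:01:21", "43:48"])
hours, rest = divmod(total, 3600)
minutes, seconds = divmod(rest, 60)
print(f"{hours}:{minutes:02d}:{seconds:02d}")
```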


In [10]:
# Pair each dataset example with its prediction (assumes matching order and length).
list_of_samples = [
    SingleTurnSample(
        user_input=example.query,
        reference=example.reference_answer,
        response=prediction.response,
        retrieved_contexts=prediction.contexts,
    )
    for example, prediction in zip(rag_dataset.examples, predictions.predictions)
]

ragas_evaluation_dataset = EvaluationDataset(list_of_samples)
ragas_evaluation_dataset.to_pandas().sample(5)

index,user_input,retrieved_contexts,response,reference
0,"In the essay, the author mentions his early ex...",[What I Worked On\n\nFebruary 2021\n\nBefore c...,The first computer the author used for program...,The first computer the author used for program...
42,"In the essay by Paul Graham, he discusses the ...",[[17] Another problem with HN was a bizarre ed...,"In the essay, Paul Graham uses the example of ...","In the essay, Paul Graham uses the concept of ..."
18,"In the context of the essay, how did the autho...",[I kept the code tight and didn't have to inte...,The author's lack of business knowledge inadve...,The author's lack of business knowledge inadve...
41,Paul Graham uses the example of advanced alien...,[[17] Another problem with HN was a bizarre ed...,Paul Graham uses the analogy of advanced alien...,Paul Graham uses the analogy of advanced alien...
26,"In the essay by Paul Graham, he discusses his ...",[One of the most conspicuous patterns I've not...,"In the essay, several key events and individua...","In Paul Graham's essay, several key events and..."


In [11]:
evaluator_llm = LlamaIndexLLMWrapper(llm)
evaluator_embeddings = LlamaIndexEmbeddingsWrapper(embed_model)

In [12]:
# LLM-judged metrics to compute for every sample.
metrics = [
    Faithfulness(llm=evaluator_llm),
    ContextPrecision(llm=evaluator_llm),
    ContextRecall(llm=evaluator_llm)
]
ragas_evaluation_result = evaluate(
    dataset=ragas_evaluation_dataset,
    metrics=metrics,
    llm=evaluator_llm,
    embeddings=evaluator_embeddings,
    # Generous timeout (seconds) and retry budget to tolerate slow or failed calls.
    run_config=RunConfig(timeout=1800, max_wait=180, max_retries=20),
    show_progress=True,
    batch_size=20
)

Evaluating:   0%|          | 0/132 [00:00<?, ?it/s]

Batch 1/7:   0%|          | 0/20 [00:00<?, ?it/s]

Exception raised in Job[101]: AttributeError('StringIO' object has no attribute 'classifications')
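
The totals in the progress output are consistent with one job per (sample, metric) pair, grouped into batches of 20. That is an assumption about how the evaluation is scheduled, not something the output states; the sketch below only checks the arithmetic.

```python
import math

n_samples = 44   # assumed from the 20 + 20 + 4 prediction batches
n_metrics = 3    # Faithfulness, ContextPrecision, ContextRecall
batch_size = 20

n_jobs = n_samples * n_metrics
n_batches = math.ceil(n_jobs / batch_size)
print(n_jobs, n_batches)
```

Under this reading, a single failed job such as `Job[101]` costs one metric value for one sample rather than the whole run.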


In [22]:
df_ragas_result = ragas_evaluation_result.to_pandas()

# Drop rows where any metric is missing (e.g. because an evaluation job failed)
df_ragas_result = df_ragas_result.dropna(
    subset=['faithfulness', 'context_precision', 'context_recall']
).reset_index(drop=True)

df_ragas_result

index,user_input,retrieved_contexts,response,reference,faithfulness,context_precision,context_recall
0,"In the essay, the author mentions his early ex...",[What I Worked On\n\nFebruary 2021\n\nBefore c...,The first computer the author used for program...,The first computer the author used for program...,1.0,1.0,1.0
1,The author switched his major from philosophy ...,[What I Worked On\n\nFebruary 2021\n\nBefore c...,The author developed an interest in AI due to ...,The two specific influences that led the autho...,1.0,1.0,1.0
2,"In the essay, the author discusses his initial...",[All that seemed left for philosophy were edge...,The author's initial interest in AI was influe...,The two main influences that initially drew th...,1.0,1.0,1.0
3,The author mentions his shift of interest towa...,[All that seemed left for philosophy were edge...,The author shifted his interest towards Lisp b...,The author shifted his interest towards Lisp a...,1.0,1.0,1.0
4,"In the essay, the author mentions his interest...","[Its brokenness did, as so often happens, gene...","During his time in grad school, the author att...","The author in the essay is Paul Graham, who wa...",0.642857,1.0,0.866667
5,The author discusses his decision to write a b...,[All that seemed left for philosophy were edge...,The author decided to write a book on Lisp hac...,The author decided to write a book on Lisp hac...,0.615385,1.0,1.0
6,"In the essay, the author mentions a quick deci...","[Its brokenness did, as so often happens, gene...",The author made a quick decision to attempt to...,The author decided to attempt writing his diss...,1.0,0.5,1.0
7,The author describes the atmosphere and practi...,[If he even knew about the strange classes I w...,The author describes the atmosphere at the Acc...,"According to the author's account, the student...",1.0,1.0,1.0
8,"In the essay, the author discusses his experie...",[The students and faculty in the painting depa...,The author describes painting still lives as d...,"In the essay, the author explains that paintin...",0.916667,1.0,1.0
9,The author shares his work experience at a com...,[The students and faculty in the painting depa...,Interleaf had added a unique feature to their ...,"Interleaf, the company where the author worked...",0.5,0.0,0.571429
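
To summarise the scores, each metric can be averaged while skipping missing values. The sketch below uses only the ten rows displayed above, so the averages describe that excerpt and not the full dataset; `mean_ignoring_nan` is an illustrative helper.

```python
import math
from statistics import mean

# Scores copied from the ten rows displayed above (not the full dataset).
scores = {
    "faithfulness": [1.0, 1.0, 1.0, 1.0, 0.642857, 0.615385, 1.0, 1.0, 0.916667, 0.5],
    "context_precision": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.5, 1.0, 1.0, 0.0],
    "context_recall": [1.0, 1.0, 1.0, 1.0, 0.866667, 1.0, 1.0, 1.0, 1.0, 0.571429],
}


def mean_ignoring_nan(values):
    """Average the values, skipping NaN entries such as those left by failed jobs."""
    valid = [v for v in values if not math.isnan(v)]
    return mean(valid) if valid else float("nan")


summary = {name: mean_ignoring_nan(values) for name, values in scores.items()}
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

On the DataFrame itself, `df_ragas_result[['faithfulness', 'context_precision', 'context_recall']].mean()` gives the same kind of summary over all rows.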


In [23]:
df_ragas_result.to_json('./test-dataset.json', orient='records', indent=4)
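
With `orient='records'`, the file is a JSON array holding one object per row, so it can be read back without pandas. The sketch below writes a small stand-in file to a temporary directory and filters it. The two records are placeholders that reuse the column names, not real scores from the run.

```python
import json
import tempfile
from pathlib import Path

# Two illustrative records with the same column names as the result table;
# the values are placeholders, not real scores from the run.
records = [
    {"user_input": "question A", "faithfulness": 1.0, "context_precision": 1.0, "context_recall": 1.0},
    {"user_input": "question B", "faithfulness": 0.5, "context_precision": 0.0, "context_recall": 0.5},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "example.json"
    # Same shape as to_json(orient="records", indent=4): a JSON array of row objects.
    path.write_text(json.dumps(records, indent=4), encoding="utf-8")
    loaded = json.loads(path.read_text(encoding="utf-8"))

# Pick out the questions whose retrieved context scored poorly.
low_precision = [row["user_input"] for row in loaded if row["context_precision"] < 0.5]
print(low_precision)
```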