In [2]:
import os
from dotenv import load_dotenv

In [3]:
load_dotenv()

True

In [4]:
# Load the ragas_eval configuration of the FiQA dataset from the Hugging Face Hub
from datasets import load_dataset

fiqa_eval = load_dataset("explodinggradients/fiqa", "ragas_eval")
fiqa_eval

DatasetDict({
    baseline: Dataset({
        features: ['question', 'ground_truths', 'answer', 'contexts'],
        num_rows: 30
    })
})

# Metrics
Ragas provides metrics for evaluating different aspects of a RAG system:

1. Retrieval metrics: context_precision and context_recall measure the performance of your retrieval system.
2. Generation metrics: faithfulness measures hallucinations, and answer_relevancy measures how directly the answer addresses the question.

The harmonic mean of these four aspects gives the ragas score, a single measure of the performance of your QA system across all the important aspects.
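
To make the aggregation concrete, here is a minimal sketch of a harmonic mean over four metric scores. The helper name `harmonic_mean_score` and the example values are made up for illustration, and this is not Ragas's own implementation; it only shows why a harmonic mean penalizes a single weak metric more than an arithmetic mean does.

```python
def harmonic_mean_score(scores):
    """Harmonic mean of non-negative scores; returns 0.0 if any score is 0."""
    values = list(scores)
    if not values:
        raise ValueError("at least one score is required")
    if any(v < 0 for v in values):
        raise ValueError("scores must be non-negative")
    if any(v == 0 for v in values):
        # A zero on any metric drives the combined score to zero.
        return 0.0
    return len(values) / sum(1.0 / v for v in values)


# Hypothetical per-metric averages, not taken from a real run.
example_scores = {
    "context_precision": 0.5,
    "context_recall": 1.0,
    "faithfulness": 1.0,
    "answer_relevancy": 0.5,
}

combined = harmonic_mean_score(example_scores.values())
arithmetic = sum(example_scores.values()) / len(example_scores)
print(f"harmonic mean:   {combined:.3f}")
print(f"arithmetic mean: {arithmetic:.3f}")
```

With these made-up values the harmonic mean comes out lower than the arithmetic mean, so a strong generator cannot fully offset a weak retriever (or the reverse).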

Now let's import these metrics and look at what each one denotes.

In [6]:
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
)
from ragas.metrics.critique import harmfulness

Here we import four metrics plus the harmfulness critique, but what do they represent?

context_precision - a measure of how relevant the retrieved context is to the question. It conveys the quality of the retrieval pipeline. \
answer_relevancy - a measure of how relevant the answer is to the question. \
faithfulness - the factual consistency of the answer with the context, based on the question. \
context_recall - a measure of the retriever's ability to retrieve all the information needed to answer the question. \
harmfulness (AspectCritique) - in general, AspectCritique is a metric that can be used to quantify various aspects of the answer.

Aspects like harmfulness, maliciousness, coherence, correctness, and conciseness are available by default, but you can easily define your own.
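
As rough intuition for faithfulness and context_recall, both can be read as the share of statements that the retrieved contexts support. The sketch below is a toy, not how Ragas computes the scores: Ragas relies on an LLM to extract the statements and judge them, whereas here the verdicts are hand-labelled booleans, and the function name `supported_fraction` is hypothetical.

```python
def supported_fraction(verdicts):
    """Share of statements judged as supported (True) by the contexts."""
    verdicts = list(verdicts)
    if not verdicts:
        raise ValueError("need at least one statement verdict")
    return sum(1 for v in verdicts if v) / len(verdicts)


# Hand-labelled, hypothetical verdicts; in Ragas an LLM produces these.
# faithfulness: is each statement in the *answer* backed by the contexts?
answer_statement_verdicts = [True, True, True, True, True, False]
# context_recall: can each *ground-truth* statement be found in the contexts?
ground_truth_statement_verdicts = [True, False, True]

faithfulness_like = supported_fraction(answer_statement_verdicts)
recall_like = supported_fraction(ground_truth_statement_verdicts)
print(round(faithfulness_like, 6), round(recall_like, 6))
```

The two scores differ only in which text is split into statements: the generated answer for faithfulness, the ground truth for context_recall.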

In [7]:
from ragas import evaluate

result = evaluate(
    fiqa_eval["baseline"],
    metrics=[
        context_precision,
        faithfulness,
        answer_relevancy,
        context_recall,
        harmfulness,
    ],
)

evaluating with [context_precision]
100%|██████████| 2/2 [00:20<00:00, 10.12s/it]

evaluating with [faithfulness]
100%|██████████| 2/2 [01:31<00:00, 45.83s/it]

evaluating with [answer_relevancy]
100%|██████████| 2/2 [00:41<00:00, 20.73s/it]

evaluating with [context_recall]
100%|██████████| 2/2 [01:24<00:00, 42.24s/it]

evaluating with [harmfulness]
100%|██████████| 2/2 [00:14<00:00,  7.31s/it]

In [9]:
df = result.to_pandas()
df.head(10)

| | question | ground_truths | answer | contexts | context_precision | faithfulness | answer_relevancy | context_recall | harmfulness |
|---|---|---|---|---|---|---|---|---|---|
| 0 | How to deposit a cheque issued to an associate... | [Have the check reissued to the proper payee.J... | \nThe best way to deposit a cheque issued to a... | [Just have the associate sign the back and the... | 1.0 | 1.0 | 0.982504 | 0.857143 | 0 |
| 1 | Can I send a money order from USPS as a business? | [Sure you can. You can fill in whatever you w... | \nYes, you can send a money order from USPS as... | [Sure you can. You can fill in whatever you w... | 0.0 | 0.833333 | 0.995249 | 1.0 | 0 |
| 2 | 1 EIN doing business under multiple business n... | [You're confusing a lot of things here. Compan... | \nYes, it is possible to have one EIN doing bu... | [You're confusing a lot of things here. Compan... | 0.0 | 1.0 | 0.94892 | 1.0 | 0 |
| 3 | Applying for and receiving business credit | ["I'm afraid the great myth of limited liabili... | \nApplying for and receiving business credit c... | [Set up a meeting with the bank that handles y... | 0.0 | 1.0 | 0.912943 | 1.0 | 0 |
| 4 | 401k Transfer After Business Closure | [You should probably consult an attorney. Howe... | \nIf your employer has closed and you need to ... | [The time horizon for your 401K/IRA is essenti... | 0.0 | 1.0 | 0.894799 | 0.666667 | 0 |
| 5 | What are the ins/outs of writing equipment pur... | [Most items used in business have to be deprec... | \nWriting equipment purchases off as business ... | [You would report it as business income on Sch... | 0.0 | 1.0 | 0.94319 | 0.941176 | 0 |
| 6 | Can a entrepreneur hire a self-employed busine... | [Yes. I can by all means start my own company ... | \nYes, an entrepreneur can hire a self-employe... | [Yes. I can by all means start my own company ... | 0.0 | 1.0 | 0.999324 | 0.333333 | 0 |
| 7 | Intentions of Deductible Amount for Small Busi... | ["If your sole proprietorship losses exceed al... | \nThe intention of deductible amounts for smal... | ["Short answer, yes. But this is not done thro... | 0.0 | 0.833333 | 0.949886 | 0.0 | 0 |
| 8 | How can I deposit a check made out to my busin... | [You should have a separate business account. ... | \nYou can deposit a check made out to your bus... | ["I have checked with Bank of America, and the... | 0.0 | 0.5 | 0.97683 | 0.25 | 0 |
| 9 | Filing personal with 1099s versus business s-c... | [Depends whom the 1099 was issued to. If it wa... | \nFiling personal taxes with 1099s versus fili... | [Depends whom the 1099 was issued to. If it wa... | 0.0 | 1.0 | 0.942206 | 1.0 | 0 |
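
One way to use these per-row scores is to flag the questions that score poorly on a given metric. The sketch below does this in plain Python on the faithfulness and context_recall values copied from the ten rows above; the thresholds (0.9 and 0.5) and the helper name `rows_below` are arbitrary choices for illustration.

```python
# (row index, faithfulness, context_recall) copied from the output above.
rows = [
    (0, 1.0, 0.857143),
    (1, 0.833333, 1.0),
    (2, 1.0, 1.0),
    (3, 1.0, 1.0),
    (4, 1.0, 0.666667),
    (5, 1.0, 0.941176),
    (6, 1.0, 0.333333),
    (7, 0.833333, 0.0),
    (8, 0.5, 0.25),
    (9, 1.0, 1.0),
]


def rows_below(rows, column, threshold):
    """Return the indices of rows whose score in `column` is below `threshold`."""
    return [row[0] for row in rows if row[column] < threshold]


low_faithfulness = rows_below(rows, column=1, threshold=0.9)
low_recall = rows_below(rows, column=2, threshold=0.5)
print("low faithfulness:", low_faithfulness)  # [1, 7, 8]
print("low context_recall:", low_recall)      # [6, 7, 8]
```

With the DataFrame itself, the equivalent filter would be a boolean mask such as `df[df["context_recall"] < 0.5]`.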
