# Bring your own LLMs

Ragas uses langchain under the hood to connect to the LLMs that some metrics require. This means you can swap out the default LLM (`gpt-3.5-turbo-16k`) for any of the hundreds of LLM APIs that langchain supports out of the box.

- [Completion LLMs Supported](https://api.python.langchain.com/en/latest/api_reference.html#module-langchain.llms)
- [Chat based LLMs Supported](https://api.python.langchain.com/en/latest/api_reference.html#module-langchain.chat_models)

This guide shows you how to use a different LLM or LLM API for evaluation.

## Evaluating with GPT-3.5-turbo-instruct

In [1]:
%pip show ragas

Name: ragas
Version: 0.0.15.dev2+gd590b10.d20230924
Summary: 
Home-page: 
Author: 
Author-email: 
License: 
Location: /Users/inflaton/miniconda3/lib/python3.10/site-packages
Requires: datasets, langchain, numpy, openai, pydantic, pysbd, sentence-transformers, transformers
Required-by: 
Note: you may need to restart the kernel to use updated packages.


In [2]:
import os
from dotenv import load_dotenv
load_dotenv()

True
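`load_dotenv()` copies the variables from a local `.env` file into `os.environ`. The OpenAI-backed models used below read their key from the `OPENAI_API_KEY` environment variable. Checking for it up front gives a clearer error than a failed API call later in the run. Below is a minimal sketch; the helper name `require_env` is hypothetical and not part of ragas or python-dotenv:

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, failing fast if it is missing or empty (hypothetical helper)."""
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or export it")
    return value


# Hypothetical usage right after load_dotenv():
# require_env("OPENAI_API_KEY")
```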

In [53]:
# data
from datasets import load_dataset

fiqa_eval = load_dataset("explodinggradients/fiqa", "ragas_eval")
fiqa_eval

DatasetDict({
    baseline: Dataset({
        features: ['question', 'ground_truths', 'answer', 'contexts'],
        num_rows: 30
    })
})

In [54]:
pruned_index = [1,  2,  3,  7,  9, 10, 12, 13, 14, 15, 18, 19, 21, 22, 23, 24, 25, 26, 27, 28]
pruned_ds = fiqa_eval["baseline"].select(pruned_index)
pruned_ds.to_pandas()

|  | question | ground_truths | answer | contexts |
|---|---|---|---|---|
| 0 | Can I send a money order from USPS as a business? | [Sure you can. You can fill in whatever you w... | \nYes, you can send a money order from USPS as... | [Sure you can. You can fill in whatever you w... |
| 1 | 1 EIN doing business under multiple business n... | [You're confusing a lot of things here. Compan... | \nYes, it is possible to have one EIN doing bu... | [You're confusing a lot of things here. Compan... |
| 2 | Applying for and receiving business credit | ["I'm afraid the great myth of limited liabili... | \nApplying for and receiving business credit c... | [Set up a meeting with the bank that handles y... |
| 3 | Intentions of Deductible Amount for Small Busi... | ["If your sole proprietorship losses exceed al... | \nThe intention of deductible amounts for smal... | ["Short answer, yes. But this is not done thro... |
| 4 | Filing personal with 1099s versus business s-c... | [Depends whom the 1099 was issued to. If it wa... | \nFiling personal taxes with 1099s versus fili... | [Depends whom the 1099 was issued to. If it wa... |
| 5 | Using credit card points to pay for tax deduct... | ["For simplicity, let's start by just consider... | \nUsing credit card points to pay for tax dedu... | ["For simplicity, let's start by just consider... |
| 6 | Investing/business with other people's money: ... | ["Basically, you either borrow money, or get o... | \nInvesting/business with other people's money... | ["Basically, you either borrow money, or get o... |
| 7 | What approaches are there for pricing a small ... | [I don't have any experience in this, but this... | \nThere are several approaches for pricing a s... | [I don't have any experience in this, but this... |
| 8 | How to account for money earned and spent prio... | [Funds earned and spent before opening a dedic... | \nMoney earned and spent prior to establishing... | [Funds earned and spent before opening a dedic... |
| 9 | Do I need a new EIN since I am hiring employee... | [I called the IRS (click here for IRS contact ... | \nNo, you do not need a new EIN since you are ... | [You don't need to notify the IRS of new membe... |
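`Dataset.select` keeps the rows at the listed positions, in the order given, and the resulting dataset is indexed from 0 again. So row 0 of `pruned_ds` is row 1 of the original 30-row baseline split. The plain-Python sketch below (no `datasets` dependency) shows that mapping, which helps when cross-referencing scores back to the full split:

```python
pruned_index = [1, 2, 3, 7, 9, 10, 12, 13, 14, 15, 18, 19, 21, 22, 23, 24, 25, 26, 27, 28]

# Position in the pruned dataset -> row number in the original split
position_to_original = dict(enumerate(pruned_index))
# Reverse lookup: original row number -> position in the pruned dataset
original_to_position = {orig: pos for pos, orig in position_to_original.items()}

print(position_to_original[0])    # 1
print(position_to_original[9])    # 15
print(original_to_position[28])   # 19
print(0 in original_to_position)  # False: original row 0 was dropped
```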


In [55]:
pruned_ds.to_csv("pruned-ds.csv", sep='\t')


100872

In [19]:
"""
Official evaluation script for QAConv, modified from SQuAD 2.0.

 * Copyright (c) 2021, salesforce.com, inc.
 * All rights reserved.
 * SPDX-License-Identifier: BSD-3-Clause
 * For full license text, see the LICENSE file in the repo root or https://opensource.org/licenses/BSD-3-Clause

"""

import collections
import re
import string


def normalize_answer(s):
    """Lower text and remove punctuation, articles and extra whitespace."""

    def remove_articles(text):
        regex = re.compile(r"\b(a|an|the)\b", re.UNICODE)
        return re.sub(regex, " ", text)

    def white_space_fix(text):
        return " ".join(text.split())

    def remove_punc(text):
        exclude = set(string.punctuation)
        return "".join(ch for ch in text if ch not in exclude)

    def lower(text):
        return text.lower()

    return white_space_fix(remove_articles(remove_punc(lower(s))))


def get_tokens(s):
    if not s:
        return []
    return normalize_answer(s).split()


def compute_exact(a_gold, a_pred):
    return int(normalize_answer(a_gold) == normalize_answer(a_pred))


def compute_f1(a_gold, a_pred):
    gold_toks = get_tokens(a_gold)
    pred_toks = get_tokens(a_pred)
    common = collections.Counter(gold_toks) & collections.Counter(pred_toks)
    num_same = sum(common.values())
    if len(gold_toks) == 0 or len(pred_toks) == 0:
        # If either is no-answer, then F1 is 1 if they agree, 0 otherwise
        return int(gold_toks == pred_toks)
    if num_same == 0:
        return 0
    precision = 1.0 * num_same / len(pred_toks)
    recall = 1.0 * num_same / len(gold_toks)
    f1 = (2 * precision * recall) / (precision + recall)
    return f1
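`compute_f1` is the SQuAD-style token-overlap F1. Both strings go through `normalize_answer`, which lowercases them, strips punctuation and the articles *a*, *an*, *the*, and collapses whitespace, and are then split into tokens. Let $G$ and $P$ be the resulting multisets of gold and predicted tokens, and $G \cap P$ their multiset intersection (what `Counter(gold_toks) & Counter(pred_toks)` computes). As a worked example with toy strings (not from the dataset), the gold answer `"Sure, you can send it."` normalizes to `sure you can send it`. The prediction `"You can send a money order."` normalizes to `you can send money order`. They share 3 of 5 tokens each:

```latex
\begin{aligned}
\mathrm{precision} &= \frac{|G \cap P|}{|P|}, \qquad
\mathrm{recall} = \frac{|G \cap P|}{|G|}, \qquad
F_1 = \frac{2\,\mathrm{precision}\cdot\mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \\[4pt]
\text{example: } \mathrm{precision} &= \mathrm{recall} = \tfrac{3}{5} = 0.6
\;\Rightarrow\; F_1 = \frac{2 \cdot 0.6 \cdot 0.6}{0.6 + 0.6} = 0.6
\end{aligned}
```

For this pair, `compute_exact` returns 0. If either token list is empty, `compute_f1` skips the formula and returns 1 only when both lists are empty.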


In [58]:
cut_off_at = 5  # rows with index < cut_off_at use the ground truth as their answer; set to pruned_ds.num_rows to replace every answer


def replace_answer(record, idx):
    """Swap in the ground truth for the first `cut_off_at` rows and score the resulting answer with EM/F1."""
    gold = record["ground_truths"][0]
    answer = gold if idx < cut_off_at else record["answer"]
    return {
        "answer": answer,
        "EM": compute_exact(gold, answer),
        "F1": compute_f1(gold, answer),
    }


new_ds = pruned_ds.map(replace_answer, batched=False, with_indices=True)
new_ds


Dataset({
    features: ['question', 'ground_truths', 'answer', 'contexts', 'EM', 'F1'],
    num_rows: 20
})
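With `batched=False, with_indices=True`, `Dataset.map` calls the function once per row as `fn(record, idx)` and merges the returned dict into that row. Here `answer` is overwritten and `EM`/`F1` are added as new columns. The plain-Python sketch below shows those semantics; `map_with_indices`, the `replaced` column and the two toy rows are hypothetical, for illustration only:

```python
def map_with_indices(rows, fn):
    """Hypothetical stand-in for Dataset.map(fn, batched=False, with_indices=True).

    Calls fn(record, idx) for every row and merges the returned dict into a copy
    of the row, so returned keys overwrite existing columns or add new ones.
    """
    return [{**row, **fn(row, idx)} for idx, row in enumerate(rows)]


cut_off_at = 1  # only row 0 gets the ground truth as its answer

# Toy rows (not from the dataset)
rows = [
    {"ground_truths": ["Sure you can."], "answer": "Yes, you can."},
    {"ground_truths": ["It depends."], "answer": "Probably not."},
]

new_rows = map_with_indices(
    rows,
    lambda record, idx: {
        "answer": record["ground_truths"][0] if idx < cut_off_at else record["answer"],
        "replaced": idx < cut_off_at,  # a new column added by the mapped function
    },
)
print([(r["answer"], r["replaced"]) for r in new_rows])
# [('Sure you can.', True), ('Probably not.', False)]
```

The original rows are left untouched. Similarly, `Dataset.map` returns a new dataset rather than modifying `pruned_ds` in place.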

In [59]:
new_ds.to_pandas()

|  | question | ground_truths | answer | contexts | EM | F1 |
|---|---|---|---|---|---|---|
| 0 | Can I send a money order from USPS as a business? | [Sure you can. You can fill in whatever you w... | Sure you can. You can fill in whatever you wa... | [Sure you can. You can fill in whatever you w... | 1 | 1.0 |
| 1 | 1 EIN doing business under multiple business n... | [You're confusing a lot of things here. Compan... | You're confusing a lot of things here. Company... | [You're confusing a lot of things here. Compan... | 1 | 1.0 |
| 2 | Applying for and receiving business credit | ["I'm afraid the great myth of limited liabili... | "I'm afraid the great myth of limited liabilit... | [Set up a meeting with the bank that handles y... | 1 | 1.0 |
| 3 | Intentions of Deductible Amount for Small Busi... | ["If your sole proprietorship losses exceed al... | "If your sole proprietorship losses exceed all... | ["Short answer, yes. But this is not done thro... | 1 | 1.0 |
| 4 | Filing personal with 1099s versus business s-c... | [Depends whom the 1099 was issued to. If it wa... | Depends whom the 1099 was issued to. If it was... | [Depends whom the 1099 was issued to. If it wa... | 1 | 1.0 |
| 5 | Using credit card points to pay for tax deduct... | ["For simplicity, let's start by just consider... | \nUsing credit card points to pay for tax dedu... | ["For simplicity, let's start by just consider... | 0 | 0.199367 |
| 6 | Investing/business with other people's money: ... | ["Basically, you either borrow money, or get o... | \nInvesting/business with other people's money... | ["Basically, you either borrow money, or get o... | 0 | 0.356436 |
| 7 | What approaches are there for pricing a small ... | [I don't have any experience in this, but this... | \nThere are several approaches for pricing a s... | [I don't have any experience in this, but this... | 0 | 0.383495 |
| 8 | How to account for money earned and spent prio... | [Funds earned and spent before opening a dedic... | \nMoney earned and spent prior to establishing... | [Funds earned and spent before opening a dedic... | 0 | 0.671642 |
| 9 | Do I need a new EIN since I am hiring employee... | [I called the IRS (click here for IRS contact ... | \nNo, you do not need a new EIN since you are ... | [You don't need to notify the IRS of new membe... | 0 | 0.331288 |


In [8]:
%%time
# evaluate
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
)

result = evaluate(
    new_ds,
    metrics=[
        faithfulness,
        answer_relevancy,
    ],
)

result

using model: gpt-3.5-turbo-instruct
evaluating with [faithfulness]


100%|██████████| 2/2 [00:13<00:00,  6.73s/it]


evaluating with [answer_relevancy]


100%|██████████| 2/2 [00:40<00:00, 20.13s/it]


CPU times: user 4.18 s, sys: 1.02 s, total: 5.2 s
Wall time: 1min 14s


{'ragas_score': 0.8096, 'faithfulness': 0.7795, 'answer_relevancy': 0.8422}
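The aggregate `ragas_score` reported here is consistent with the harmonic mean of the two metric scores, 2·f·a / (f + a). For example, 2 × 0.7795 × 0.8422 / (0.7795 + 0.8422) ≈ 0.8096. The same formula is applied per row further down when building `result_all`. The sketch below shows that computation; `harmonic_mean` is a hypothetical helper, not a ragas function:

```python
def harmonic_mean(*scores: float) -> float:
    """Harmonic mean of metric scores in [0, 1]; returns 0.0 if any score is 0 (hypothetical helper)."""
    if not scores:
        raise ValueError("need at least one score")
    if any(s == 0 for s in scores):
        return 0.0
    # For two scores this equals 2 * f * a / (f + a)
    return len(scores) / sum(1.0 / s for s in scores)


# Aggregate scores from the gpt-3.5-turbo-instruct run above
print(round(harmonic_mean(0.7795, 0.8422), 4))  # 0.8096
```

Returning 0.0 when a score is 0 gives the same result as the per-row formula in this notebook: GPT-4's row 3 below has faithfulness 0.0 and ragas_score 0.0. It also avoids dividing by zero when both scores are 0.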

In [9]:
df = result.to_pandas()
df

|  | question | contexts | answer | ground_truths | faithfulness | answer_relevancy |
|---|---|---|---|---|---|---|
| 0 | How to deposit a cheque issued to an associate... | [Just have the associate sign the back and the... | Have the check reissued to the proper payee.Ju... | [Have the check reissued to the proper payee.J... | 0.714286 | 0.86771 |
| 1 | Can I send a money order from USPS as a business? | [Sure you can. You can fill in whatever you w... | Sure you can. You can fill in whatever you wa... | [Sure you can. You can fill in whatever you w... | 1.0 | 0.843422 |
| 2 | 1 EIN doing business under multiple business n... | [You're confusing a lot of things here. Compan... | You're confusing a lot of things here. Company... | [You're confusing a lot of things here. Compan... | 1.0 | 0.798575 |
| 3 | Applying for and receiving business credit | [Set up a meeting with the bank that handles y... | "I'm afraid the great myth of limited liabilit... | ["I'm afraid the great myth of limited liabili... | 1.0 | 0.824579 |
| 4 | 401k Transfer After Business Closure | [The time horizon for your 401K/IRA is essenti... | You should probably consult an attorney. Howev... | [You should probably consult an attorney. Howe... | 0.0 | 0.847144 |
| 5 | What are the ins/outs of writing equipment pur... | [You would report it as business income on Sch... | Most items used in business have to be depreci... | [Most items used in business have to be deprec... | 0.6 | 0.844843 |
| 6 | Can a entrepreneur hire a self-employed busine... | [Yes. I can by all means start my own company ... | Yes. I can by all means start my own company a... | [Yes. I can by all means start my own company ... | 0.5 | 0.860921 |
| 7 | Intentions of Deductible Amount for Small Busi... | ["Short answer, yes. But this is not done thro... | "If your sole proprietorship losses exceed all... | ["If your sole proprietorship losses exceed al... | 1.0 | 0.766336 |
| 8 | How can I deposit a check made out to my busin... | ["I have checked with Bank of America, and the... | You should have a separate business account. M... | [You should have a separate business account. ... | 0.733333 | 0.87224 |
| 9 | Filing personal with 1099s versus business s-c... | [Depends whom the 1099 was issued to. If it wa... | Depends whom the 1099 was issued to. If it was... | [Depends whom the 1099 was issued to. If it wa... | 1.0 | 0.855224 |


In [12]:
pruned = df[df.faithfulness > 0.6]
pruned

|  | question | contexts | answer | ground_truths | faithfulness | answer_relevancy |
|---|---|---|---|---|---|---|
| 0 | How to deposit a cheque issued to an associate... | [Just have the associate sign the back and the... | Have the check reissued to the proper payee.Ju... | [Have the check reissued to the proper payee.J... | 0.714286 | 0.86771 |
| 1 | Can I send a money order from USPS as a business? | [Sure you can. You can fill in whatever you w... | Sure you can. You can fill in whatever you wa... | [Sure you can. You can fill in whatever you w... | 1.0 | 0.843422 |
| 2 | 1 EIN doing business under multiple business n... | [You're confusing a lot of things here. Compan... | You're confusing a lot of things here. Company... | [You're confusing a lot of things here. Compan... | 1.0 | 0.798575 |
| 3 | Applying for and receiving business credit | [Set up a meeting with the bank that handles y... | "I'm afraid the great myth of limited liabilit... | ["I'm afraid the great myth of limited liabili... | 1.0 | 0.824579 |
| 7 | Intentions of Deductible Amount for Small Busi... | ["Short answer, yes. But this is not done thro... | "If your sole proprietorship losses exceed all... | ["If your sole proprietorship losses exceed al... | 1.0 | 0.766336 |
| 8 | How can I deposit a check made out to my busin... | ["I have checked with Bank of America, and the... | You should have a separate business account. M... | [You should have a separate business account. ... | 0.733333 | 0.87224 |
| 9 | Filing personal with 1099s versus business s-c... | [Depends whom the 1099 was issued to. If it wa... | Depends whom the 1099 was issued to. If it was... | [Depends whom the 1099 was issued to. If it wa... | 1.0 | 0.855224 |
| 10 | Using credit card points to pay for tax deduct... | ["For simplicity, let's start by just consider... | "For simplicity, let's start by just consideri... | ["For simplicity, let's start by just consider... | 1.0 | 0.861972 |
| 12 | Investing/business with other people's money: ... | ["Basically, you either borrow money, or get o... | "Basically, you either borrow money, or get ot... | ["Basically, you either borrow money, or get o... | 1.0 | 0.875814 |
| 13 | What approaches are there for pricing a small ... | [I don't have any experience in this, but this... | I don't have any experience in this, but this ... | [I don't have any experience in this, but this... | 1.0 | 0.838752 |


In [13]:
pruned.index

Int64Index([ 0,  1,  2,  3,  7,  8,  9, 10, 12, 13, 14, 15, 18, 19, 21, 22, 23,
            24, 25, 26, 27, 28, 29],
           dtype='int64')
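`df[df.faithfulness > 0.6]` keeps only rows whose faithfulness is strictly greater than 0.6. Among the first ten rows of `df`, that drops rows 4 and 6 and also row 5, which scores exactly 0.6. The plain-Python sketch below applies the same filter to those ten rows, with values copied from the `df` table above, and contrasts it with an inclusive threshold:

```python
# faithfulness of the first ten rows of df, keyed by DataFrame index
faithfulness = {0: 0.714286, 1: 1.0, 2: 1.0, 3: 1.0, 4: 0.0,
                5: 0.6, 6: 0.5, 7: 1.0, 8: 0.733333, 9: 1.0}

threshold = 0.6
kept = [idx for idx, score in faithfulness.items() if score > threshold]  # strict, like `>` in pandas
kept_inclusive = [idx for idx, score in faithfulness.items() if score >= threshold]

print(kept)            # [0, 1, 2, 3, 7, 8, 9]
print(kept_inclusive)  # [0, 1, 2, 3, 5, 7, 8, 9]
```

The strict result matches the leading entries of `pruned.index` above.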

In [23]:
pruned_index = [1,  2,  3,  7,  9, 10, 12, 13, 14, 15, 18, 19, 21, 22, 23, 24, 25, 26, 27, 28]
pruned_ds = new_ds.select(pruned_index)
pruned_ds.to_pandas()

|  | question | ground_truths | answer | contexts | EM | F1 |
|---|---|---|---|---|---|---|
| 0 | Can I send a money order from USPS as a business? | [Sure you can. You can fill in whatever you w... | Sure you can. You can fill in whatever you wa... | [Sure you can. You can fill in whatever you w... | 1 | 1.0 |
| 1 | 1 EIN doing business under multiple business n... | [You're confusing a lot of things here. Compan... | You're confusing a lot of things here. Company... | [You're confusing a lot of things here. Compan... | 1 | 1.0 |
| 2 | Applying for and receiving business credit | ["I'm afraid the great myth of limited liabili... | "I'm afraid the great myth of limited liabilit... | [Set up a meeting with the bank that handles y... | 1 | 1.0 |
| 3 | Intentions of Deductible Amount for Small Busi... | ["If your sole proprietorship losses exceed al... | "If your sole proprietorship losses exceed all... | ["Short answer, yes. But this is not done thro... | 1 | 1.0 |
| 4 | Filing personal with 1099s versus business s-c... | [Depends whom the 1099 was issued to. If it wa... | Depends whom the 1099 was issued to. If it was... | [Depends whom the 1099 was issued to. If it wa... | 1 | 1.0 |
| 5 | Using credit card points to pay for tax deduct... | ["For simplicity, let's start by just consider... | "For simplicity, let's start by just consideri... | ["For simplicity, let's start by just consider... | 1 | 1.0 |
| 6 | Investing/business with other people's money: ... | ["Basically, you either borrow money, or get o... | "Basically, you either borrow money, or get ot... | ["Basically, you either borrow money, or get o... | 1 | 1.0 |
| 7 | What approaches are there for pricing a small ... | [I don't have any experience in this, but this... | I don't have any experience in this, but this ... | [I don't have any experience in this, but this... | 1 | 1.0 |
| 8 | How to account for money earned and spent prio... | [Funds earned and spent before opening a dedic... | Funds earned and spent before opening a dedica... | [Funds earned and spent before opening a dedic... | 1 | 1.0 |
| 9 | Do I need a new EIN since I am hiring employee... | [I called the IRS (click here for IRS contact ... | I called the IRS (click here for IRS contact i... | [You don't need to notify the IRS of new membe... | 1 | 1.0 |


In [24]:
%%time
# evaluate
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
)

pruned_result = evaluate(
    pruned_ds,
    metrics=[
        faithfulness,
        answer_relevancy,
    ],
)

pruned_result

evaluating with [faithfulness]


100%|██████████| 2/2 [00:11<00:00,  5.52s/it]


evaluating with [answer_relevancy]


100%|██████████| 2/2 [00:23<00:00, 11.92s/it]


CPU times: user 329 ms, sys: 68 ms, total: 397 ms
Wall time: 35.5 s


{'ragas_score': 0.8849, 'faithfulness': 0.9295, 'answer_relevancy': 0.8443}

In [30]:
df = pruned_result.to_pandas()
df

|  | question | contexts | answer | ground_truths | faithfulness | answer_relevancy |
|---|---|---|---|---|---|---|
| 0 | Can I send a money order from USPS as a business? | [Sure you can. You can fill in whatever you w... | Sure you can. You can fill in whatever you wa... | [Sure you can. You can fill in whatever you w... | 1.0 | 0.830856 |
| 1 | 1 EIN doing business under multiple business n... | [You're confusing a lot of things here. Compan... | You're confusing a lot of things here. Company... | [You're confusing a lot of things here. Compan... | 1.0 | 0.782628 |
| 2 | Applying for and receiving business credit | [Set up a meeting with the bank that handles y... | "I'm afraid the great myth of limited liabilit... | ["I'm afraid the great myth of limited liabili... | 1.0 | 0.829392 |
| 3 | Intentions of Deductible Amount for Small Busi... | ["Short answer, yes. But this is not done thro... | "If your sole proprietorship losses exceed all... | ["If your sole proprietorship losses exceed al... | 1.0 | 0.779775 |
| 4 | Filing personal with 1099s versus business s-c... | [Depends whom the 1099 was issued to. If it wa... | Depends whom the 1099 was issued to. If it was... | [Depends whom the 1099 was issued to. If it wa... | 1.0 | 0.840258 |
| 5 | Using credit card points to pay for tax deduct... | ["For simplicity, let's start by just consider... | "For simplicity, let's start by just consideri... | ["For simplicity, let's start by just consider... | 1.0 | 0.861889 |
| 6 | Investing/business with other people's money: ... | ["Basically, you either borrow money, or get o... | "Basically, you either borrow money, or get ot... | ["Basically, you either borrow money, or get o... | 0.833333 | 0.87717 |
| 7 | What approaches are there for pricing a small ... | [I don't have any experience in this, but this... | I don't have any experience in this, but this ... | [I don't have any experience in this, but this... | 1.0 | 0.846074 |
| 8 | How to account for money earned and spent prio... | [Funds earned and spent before opening a dedic... | Funds earned and spent before opening a dedica... | [Funds earned and spent before opening a dedic... | 1.0 | 0.91294 |
| 9 | Do I need a new EIN since I am hiring employee... | [You don't need to notify the IRS of new membe... | I called the IRS (click here for IRS contact i... | [I called the IRS (click here for IRS contact ... | 1.0 | 0.822008 |


In [33]:
result_all = pruned_ds.map(
    lambda record, idx: {
        "faithfulness (gpt-3.5-turbo-instruct)": df["faithfulness"][idx],
        "answer_relevancy (gpt-3.5-turbo-instruct)": df["answer_relevancy"][idx],
        # per-row ragas_score: harmonic mean of faithfulness and answer_relevancy
        "ragas_score (gpt-3.5-turbo-instruct)": 2 * df["faithfulness"][idx] * df["answer_relevancy"][idx] / (df["faithfulness"][idx] + df["answer_relevancy"][idx])
    },
    batched=False,
    with_indices=True,
    # drop the original text columns (question, ground_truths, answer, contexts), keeping EM/F1 and the new scores
    remove_columns=fiqa_eval["baseline"].column_names,
)
result_all


Dataset({
    features: ['EM', 'F1', 'faithfulness (gpt-3.5-turbo-instruct)', 'answer_relevancy (gpt-3.5-turbo-instruct)', 'ragas_score (gpt-3.5-turbo-instruct)'],
    num_rows: 20
})
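The same three score columns are added once per judge model: here for `gpt-3.5-turbo-instruct`, and again for `gpt-3.5-turbo` and `gpt-4` below. The sketch below shows a hypothetical helper that builds those column names and the per-row harmonic mean for any model label, which could replace the repeated lambdas. It is plain Python, so it can be checked without `datasets`:

```python
def score_columns(model: str, faithfulness: float, answer_relevancy: float) -> dict:
    """Build the per-row score columns for one judge model (hypothetical helper).

    ragas_score is the harmonic mean of the two scores; it is 0.0 when either score is 0,
    which also avoids dividing by zero when both are 0.
    """
    if faithfulness == 0 or answer_relevancy == 0:
        ragas = 0.0
    else:
        ragas = 2 * faithfulness * answer_relevancy / (faithfulness + answer_relevancy)
    return {
        f"faithfulness ({model})": faithfulness,
        f"answer_relevancy ({model})": answer_relevancy,
        f"ragas_score ({model})": ragas,
    }


row = score_columns("gpt-3.5-turbo-instruct", 1.0, 0.830856)
print(round(row["ragas_score (gpt-3.5-turbo-instruct)"], 6))  # 0.907615
```

In the notebook, it could be passed to `Dataset.map` as `lambda record, idx: score_columns("gpt-4", df["faithfulness"][idx], df["answer_relevancy"][idx])` with `with_indices=True`.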

In [34]:
result_all.to_pandas()

|  | EM | F1 | faithfulness (gpt-3.5-turbo-instruct) | answer_relevancy (gpt-3.5-turbo-instruct) | ragas_score (gpt-3.5-turbo-instruct) |
|---|---|---|---|---|---|
| 0 | 1 | 1.0 | 1.0 | 0.830856 | 0.907615 |
| 1 | 1 | 1.0 | 1.0 | 0.782628 | 0.878061 |
| 2 | 1 | 1.0 | 1.0 | 0.829392 | 0.906741 |
| 3 | 1 | 1.0 | 1.0 | 0.779775 | 0.876262 |
| 4 | 1 | 1.0 | 1.0 | 0.840258 | 0.913196 |
| 5 | 1 | 1.0 | 1.0 | 0.861889 | 0.925822 |
| 6 | 1 | 1.0 | 0.833333 | 0.87717 | 0.85469 |
| 7 | 1 | 1.0 | 1.0 | 0.846074 | 0.91662 |
| 8 | 1 | 1.0 | 1.0 | 0.91294 | 0.954489 |
| 9 | 1 | 1.0 | 1.0 | 0.822008 | 0.90231 |


In [28]:
from langchain.chat_models import ChatOpenAI
from ragas.metrics import Faithfulness, AnswerRelevancy

gpt3 = ChatOpenAI(model_name="gpt-3.5-turbo")
faithfulness_gpt3 = Faithfulness(name="faithfulness", llm=gpt3)
answer_relevancy_gpt3 = AnswerRelevancy(name="answer_relevancy", llm=gpt3)

gpt4 = ChatOpenAI(model_name="gpt-4")
faithfulness_gpt4 = Faithfulness(name="faithfulness", llm=gpt4)
answer_relevancy_gpt4 = AnswerRelevancy(name="answer_relevancy", llm=gpt4)

In [29]:
%%time
# evaluate

result_gpt3 = evaluate(
    pruned_ds,
    metrics=[
        faithfulness_gpt3,
        answer_relevancy_gpt3,
    ],
)

result_gpt3

evaluating with [faithfulness]


100%|██████████| 2/2 [06:16<00:00, 188.28s/it]


evaluating with [answer_relevancy]


100%|██████████| 2/2 [01:01<00:00, 30.58s/it]


CPU times: user 599 ms, sys: 132 ms, total: 730 ms
Wall time: 7min 18s


{'ragas_score': 0.8163, 'faithfulness': 0.7840, 'answer_relevancy': 0.8513}

In [35]:
df = result_gpt3.to_pandas()
df

|  | question | contexts | answer | ground_truths | faithfulness | answer_relevancy |
|---|---|---|---|---|---|---|
| 0 | Can I send a money order from USPS as a business? | [Sure you can. You can fill in whatever you w... | Sure you can. You can fill in whatever you wa... | [Sure you can. You can fill in whatever you w... | 1.0 | 0.840292 |
| 1 | 1 EIN doing business under multiple business n... | [You're confusing a lot of things here. Compan... | You're confusing a lot of things here. Company... | [You're confusing a lot of things here. Compan... | 0.6 | 0.776698 |
| 2 | Applying for and receiving business credit | [Set up a meeting with the bank that handles y... | "I'm afraid the great myth of limited liabilit... | ["I'm afraid the great myth of limited liabili... | 1.0 | 0.813242 |
| 3 | Intentions of Deductible Amount for Small Busi... | ["Short answer, yes. But this is not done thro... | "If your sole proprietorship losses exceed all... | ["If your sole proprietorship losses exceed al... | 0.5 | 0.788614 |
| 4 | Filing personal with 1099s versus business s-c... | [Depends whom the 1099 was issued to. If it wa... | Depends whom the 1099 was issued to. If it was... | [Depends whom the 1099 was issued to. If it wa... | 1.0 | 0.838918 |
| 5 | Using credit card points to pay for tax deduct... | ["For simplicity, let's start by just consider... | "For simplicity, let's start by just consideri... | ["For simplicity, let's start by just consider... | 0.714286 | 0.859645 |
| 6 | Investing/business with other people's money: ... | ["Basically, you either borrow money, or get o... | "Basically, you either borrow money, or get ot... | ["Basically, you either borrow money, or get o... | 1.0 | 0.876517 |
| 7 | What approaches are there for pricing a small ... | [I don't have any experience in this, but this... | I don't have any experience in this, but this ... | [I don't have any experience in this, but this... | 1.0 | 0.857932 |
| 8 | How to account for money earned and spent prio... | [Funds earned and spent before opening a dedic... | Funds earned and spent before opening a dedica... | [Funds earned and spent before opening a dedic... | 1.0 | 0.912479 |
| 9 | Do I need a new EIN since I am hiring employee... | [You don't need to notify the IRS of new membe... | I called the IRS (click here for IRS contact i... | [I called the IRS (click here for IRS contact ... | 1.0 | 0.898253 |


In [37]:
result_all = result_all.map(
    lambda record, idx: {
        "faithfulness (gpt-3.5-turbo)": df["faithfulness"][idx], 
        "answer_relevancy (gpt-3.5-turbo)": df["answer_relevancy"][idx], 
        "ragas_score (gpt-3.5-turbo)": 2 * df["faithfulness"][idx] * df["answer_relevancy"][idx] / (df["faithfulness"][idx] + df["answer_relevancy"][idx])
    },
    batched=False,
    with_indices=True
)
result_all


Dataset({
    features: ['EM', 'F1', 'faithfulness (gpt-3.5-turbo-instruct)', 'answer_relevancy (gpt-3.5-turbo-instruct)', 'ragas_score (gpt-3.5-turbo-instruct)', 'faithfulness (gpt-3.5-turbo)', 'answer_relevancy (gpt-3.5-turbo)', 'ragas_score (gpt-3.5-turbo)'],
    num_rows: 20
})

In [38]:
result_all.to_pandas()

|  | EM | F1 | faithfulness (gpt-3.5-turbo-instruct) | answer_relevancy (gpt-3.5-turbo-instruct) | ragas_score (gpt-3.5-turbo-instruct) | faithfulness (gpt-3.5-turbo) | answer_relevancy (gpt-3.5-turbo) | ragas_score (gpt-3.5-turbo) |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1.0 | 1.0 | 0.830856 | 0.907615 | 1.0 | 0.840292 | 0.913216 |
| 1 | 1 | 1.0 | 1.0 | 0.782628 | 0.878061 | 0.6 | 0.776698 | 0.67701 |
| 2 | 1 | 1.0 | 1.0 | 0.829392 | 0.906741 | 1.0 | 0.813242 | 0.897003 |
| 3 | 1 | 1.0 | 1.0 | 0.779775 | 0.876262 | 0.5 | 0.788614 | 0.611986 |
| 4 | 1 | 1.0 | 1.0 | 0.840258 | 0.913196 | 1.0 | 0.838918 | 0.912404 |
| 5 | 1 | 1.0 | 1.0 | 0.861889 | 0.925822 | 0.714286 | 0.859645 | 0.780253 |
| 6 | 1 | 1.0 | 0.833333 | 0.87717 | 0.85469 | 1.0 | 0.876517 | 0.934196 |
| 7 | 1 | 1.0 | 1.0 | 0.846074 | 0.91662 | 1.0 | 0.857932 | 0.923534 |
| 8 | 1 | 1.0 | 1.0 | 0.91294 | 0.954489 | 1.0 | 0.912479 | 0.954237 |
| 9 | 1 | 1.0 | 1.0 | 0.822008 | 0.90231 | 1.0 | 0.898253 | 0.9464 |


In [41]:
%%time
# evaluate

result_gpt4 = evaluate(
    pruned_ds,
    metrics=[
        faithfulness_gpt4,
        answer_relevancy_gpt4,
    ],
)

result_gpt4

evaluating with [faithfulness]


  0%|          | 0/2 [00:00<?, ?it/s]Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised Timeout: Request timed out: HTTPSConnectionPool(host='api.openai.com', port=443): Read timed out. (read timeout=600).
100%|██████████| 2/2 [25:47<00:00, 773.91s/it] 


evaluating with [answer_relevancy]


100%|██████████| 2/2 [01:09<00:00, 34.88s/it]


CPU times: user 670 ms, sys: 428 ms, total: 1.1 s
Wall time: 26min 58s


{'ragas_score': 0.8200, 'faithfulness': 0.7850, 'answer_relevancy': 0.8583}

In [42]:
df = result_gpt4.to_pandas()
df

|  | question | contexts | answer | ground_truths | faithfulness | answer_relevancy |
|---|---|---|---|---|---|---|
| 0 | Can I send a money order from USPS as a business? | [Sure you can. You can fill in whatever you w... | Sure you can. You can fill in whatever you wa... | [Sure you can. You can fill in whatever you w... | 0.857143 | 0.879709 |
| 1 | 1 EIN doing business under multiple business n... | [You're confusing a lot of things here. Compan... | You're confusing a lot of things here. Company... | [You're confusing a lot of things here. Compan... | 0.909091 | 0.775887 |
| 2 | Applying for and receiving business credit | [Set up a meeting with the bank that handles y... | "I'm afraid the great myth of limited liabilit... | ["I'm afraid the great myth of limited liabili... | 1.0 | 0.84573 |
| 3 | Intentions of Deductible Amount for Small Busi... | ["Short answer, yes. But this is not done thro... | "If your sole proprietorship losses exceed all... | ["If your sole proprietorship losses exceed al... | 0.0 | 0.77795 |
| 4 | Filing personal with 1099s versus business s-c... | [Depends whom the 1099 was issued to. If it wa... | Depends whom the 1099 was issued to. If it was... | [Depends whom the 1099 was issued to. If it wa... | 1.0 | 0.878502 |
| 5 | Using credit card points to pay for tax deduct... | ["For simplicity, let's start by just consider... | "For simplicity, let's start by just consideri... | ["For simplicity, let's start by just consider... | 1.0 | 0.876391 |
| 6 | Investing/business with other people's money: ... | ["Basically, you either borrow money, or get o... | "Basically, you either borrow money, or get ot... | ["Basically, you either borrow money, or get o... | 0.4 | 0.870155 |
| 7 | What approaches are there for pricing a small ... | [I don't have any experience in this, but this... | I don't have any experience in this, but this ... | [I don't have any experience in this, but this... | 1.0 | 0.847384 |
| 8 | How to account for money earned and spent prio... | [Funds earned and spent before opening a dedic... | Funds earned and spent before opening a dedica... | [Funds earned and spent before opening a dedic... | 1.0 | 0.919039 |
| 9 | Do I need a new EIN since I am hiring employee... | [You don't need to notify the IRS of new membe... | I called the IRS (click here for IRS contact i... | [I called the IRS (click here for IRS contact ... | 0.666667 | 0.846412 |


In [43]:
result_all = result_all.map(
    lambda record, idx: {
        "faithfulness (gpt-4)": df["faithfulness"][idx], 
        "answer_relevancy (gpt-4)": df["answer_relevancy"][idx], 
        "ragas_score (gpt-4)": 2 * df["faithfulness"][idx] * df["answer_relevancy"][idx] / (df["faithfulness"][idx] + df["answer_relevancy"][idx])
    },
    batched=False,
    with_indices=True
)
result_all


Dataset({
    features: ['EM', 'F1', 'faithfulness (gpt-3.5-turbo-instruct)', 'answer_relevancy (gpt-3.5-turbo-instruct)', 'ragas_score (gpt-3.5-turbo-instruct)', 'faithfulness (gpt-3.5-turbo)', 'answer_relevancy (gpt-3.5-turbo)', 'ragas_score (gpt-3.5-turbo)', 'faithfulness (gpt-4)', 'answer_relevancy (gpt-4)', 'ragas_score (gpt-4)'],
    num_rows: 20
})

In [44]:
result_all.to_pandas()

|  | EM | F1 | faithfulness (gpt-3.5-turbo-instruct) | answer_relevancy (gpt-3.5-turbo-instruct) | ragas_score (gpt-3.5-turbo-instruct) | faithfulness (gpt-3.5-turbo) | answer_relevancy (gpt-3.5-turbo) | ragas_score (gpt-3.5-turbo) | faithfulness (gpt-4) | answer_relevancy (gpt-4) | ragas_score (gpt-4) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1.0 | 1.0 | 0.830856 | 0.907615 | 1.0 | 0.840292 | 0.913216 | 0.857143 | 0.879709 | 0.868279 |
| 1 | 1 | 1.0 | 1.0 | 0.782628 | 0.878061 | 0.6 | 0.776698 | 0.67701 | 0.909091 | 0.775887 | 0.837224 |
| 2 | 1 | 1.0 | 1.0 | 0.829392 | 0.906741 | 1.0 | 0.813242 | 0.897003 | 1.0 | 0.84573 | 0.916418 |
| 3 | 1 | 1.0 | 1.0 | 0.779775 | 0.876262 | 0.5 | 0.788614 | 0.611986 | 0.0 | 0.77795 | 0.0 |
| 4 | 1 | 1.0 | 1.0 | 0.840258 | 0.913196 | 1.0 | 0.838918 | 0.912404 | 1.0 | 0.878502 | 0.935322 |
| 5 | 1 | 1.0 | 1.0 | 0.861889 | 0.925822 | 0.714286 | 0.859645 | 0.780253 | 1.0 | 0.876391 | 0.934124 |
| 6 | 1 | 1.0 | 0.833333 | 0.87717 | 0.85469 | 1.0 | 0.876517 | 0.934196 | 0.4 | 0.870155 | 0.548062 |
| 7 | 1 | 1.0 | 1.0 | 0.846074 | 0.91662 | 1.0 | 0.857932 | 0.923534 | 1.0 | 0.847384 | 0.917388 |
| 8 | 1 | 1.0 | 1.0 | 0.91294 | 0.954489 | 1.0 | 0.912479 | 0.954237 | 1.0 | 0.919039 | 0.957812 |
| 9 | 1 | 1.0 | 1.0 | 0.822008 | 0.90231 | 1.0 | 0.898253 | 0.9464 | 0.666667 | 0.846412 | 0.745863 |


In [47]:
pruned_ds.to_csv("pruned-ds.csv", sep='\t')
result_all.to_csv("pruned-result.csv", sep='\t')


3009