<img src="https://www.rp.edu.sg/images/default-source/default-album/rp-logo.png" width="200" alt="Republic Polytechnic"/>

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/koayst-rplesson/C3669C-2025-01/blob/main/L17/L17.ipynb)

# Setup and Installation

You can run this Jupyter notebook either on your local machine or on Google Colab.

* On a local machine, it is recommended to install Anaconda and create a new development environment called `c3669c`.
* Install the libraries listed below with pip or conda when necessary.
---

# <font color='red'>ATTENTION</font>

## Google Colab
- If you are running this code in Google Colab, **DO NOT** store the API Key in a text file and load the key later from Google Drive. This is insecure and will expose the key.
- **DO NOT** hard code the API Key directly in the Python code, even though it might seem convenient for quick development.
- Enter the API key at the `getpass.getpass()` prompt when asked.

## Local Environment/Laptop
- If you are running this code locally on your laptop, you can create an `env.txt` file and store the API key there.
- Make sure `env.txt` is in the same directory as this Jupyter notebook.
- You need to install `python-dotenv` and run the Python code below to load the API key.

---
```
%pip install python-dotenv

import os

from dotenv import load_dotenv

load_dotenv('env.txt')
openai_api_key = os.getenv('OPENAI_API_KEY')
```
---

## GitHub/GitLab
- **DO NOT** `commit` or `push` the API key to services like GitHub or GitLab.

# Lesson 17

- RAGAS is designed to evaluate RAG applications, which combine retrieval (fetching relevant information from knowledge bases) and generation (LLM synthesizing answers). RAGAS provides a clear methodology to evaluate and improve RAG workflows.
- RAGAS evaluates pipelines on multiple dimensions, such as retrieval accuracy, response quality, and semantic alignment between input and output.
- Hallucinations are common in LLMs, where generated outputs include fabricated or inaccurate information. RAGAS helps assess the factual alignment of responses with retrieved knowledge.

In [1]:
%%capture --no-stderr
%pip install --quiet -U langchain
%pip install --quiet -U langchain-openai
%pip install --quiet -U ragas
%pip install --quiet -U datasets

In [None]:
# langchain        0.3.13
# langchain-core   0.3.27
# langchain-openai 0.2.14
# openai           1.58.1
# ragas            0.2.10
# datasets         3.1.0

In [2]:
import getpass
import os

# setup the OpenAI API Key

# get the OpenAI API key ready and enter it when asked
os.environ["OPENAI_API_KEY"] = getpass.getpass()


## A Simple RAG Evaluation Using RAGAS

The example code below demonstrates how to use RAGAS to evaluate a simple RAG pipeline. Additional comments are provided to aid in understanding the code.

- [Reference 1](https://docs.ragas.io/en/stable/)
- [Reference 2](https://docs.ragas.io/en/stable/getstarted/rag_eval/)

In [3]:
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings

from ragas import EvaluationDataset, evaluate
from ragas.llms import LangchainLLMWrapper

from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    answer_correctness,
    answer_similarity,
    context_precision,
    context_recall,
)

import pandas as pd

# set the max columns to none
pd.set_option('display.max_columns', None)

import numpy as np

In [4]:
# Python class to abstract a simple RAG. It contains functions to load the documents,
# get the most relevant documents and generate an answer based on the context passed in
class RAG:
    def __init__(self, model="gpt-4o-mini"):
        self.llm = ChatOpenAI(model=model)
        self.embeddings = OpenAIEmbeddings()
        self.doc_embeddings = None
        self.docs = None

    def load_documents(self, documents):
        """Load documents and compute their embeddings."""
        self.docs = documents
        self.doc_embeddings = self.embeddings.embed_documents(documents)

    def get_most_relevant_docs(self, query):
        """Find the most relevant document for a given query."""
        if not self.docs or not self.doc_embeddings:
            raise ValueError("Documents and their embeddings are not loaded.")

        query_embedding = self.embeddings.embed_query(query)
        similarities = [
            np.dot(query_embedding, doc_emb)
            / (np.linalg.norm(query_embedding) * np.linalg.norm(doc_emb))
            for doc_emb in self.doc_embeddings
        ]
        most_relevant_doc_index = np.argmax(similarities)
        return [self.docs[most_relevant_doc_index]]

    def generate_answer(self, query, relevant_doc):
        """Generate an answer for a given query based on the most relevant document."""
        prompt = f"question: {query}\n\nDocuments: {relevant_doc}"
        messages = [
            ("system", "You are a helpful assistant that answers questions based on given documents only."),
            ("human", prompt),
        ]
        ai_msg = self.llm.invoke(messages)
        return ai_msg.content
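
The retrieval step in `get_most_relevant_docs` ranks documents by cosine similarity and keeps only the best match. The sketch below shows the same idea without any API calls, extended to return the top `k` documents. The 3-dimensional toy vectors and the `top_k_docs` helper are assumptions made for illustration. Real embeddings come from `OpenAIEmbeddings` and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_docs(query_emb, doc_embs, docs, k=2):
    """Return the k documents most similar to the query (hypothetical helper)."""
    ranked = sorted(
        zip(docs, doc_embs),
        key=lambda pair: cosine_similarity(query_emb, pair[1]),
        reverse=True,
    )
    return [doc for doc, _ in ranked[:k]]


# Toy documents and embeddings, made up for illustration only
toy_docs = ["doc about physics", "doc about biology", "doc about computing"]
toy_embs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.2, 0.9]]
toy_query = [1.0, 0.2, 0.0]

top_docs = top_k_docs(toy_query, toy_embs, toy_docs, k=2)
print(top_docs)
```

Returning more than one document changes what `retrieved_contexts` contains, which in turn affects the context metrics evaluated later.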

In [5]:
# document samples

sample_docs = [
    "Albert Einstein proposed the theory of relativity, which transformed our understanding of time, space, and gravity.",
    "Marie Curie was a physicist and chemist who conducted pioneering research on radioactivity and won two Nobel Prizes.",
    "Isaac Newton formulated the laws of motion and universal gravitation, laying the foundation for classical mechanics.",
    "Charles Darwin introduced the theory of evolution by natural selection in his book 'On the Origin of Species'.",
    "Ada Lovelace is regarded as the first computer programmer for her work on Charles Babbage's early mechanical computer, the Analytical Engine."
]

In [6]:
# Initialize RAG instance
rag = RAG()

# Load documents
rag.load_documents(sample_docs)

In [7]:
# Query and retrieve the most relevant document
query = "Who introduced the theory of relativity?"
relevant_doc = rag.get_most_relevant_docs(query)

In [8]:
# Generate an answer
answer = rag.generate_answer(query, relevant_doc)

print(f"Query: {query}")
print(f"Relevant Document: {relevant_doc}")
print(f"Answer: {answer}")

Query: Who introduced the theory of relativity?
Relevant Document: ['Albert Einstein proposed the theory of relativity, which transformed our understanding of time, space, and gravity.']
Answer: The theory of relativity was introduced by Albert Einstein.


In [9]:
# query samples
sample_queries = [
    "Who introduced the theory of relativity?",
    "Who was the first computer programmer?",
    "What did Isaac Newton contribute to science?",
    "Who won two Nobel Prizes for research on radioactivity?",
    "What is the theory of evolution by natural selection?"
]

# these are the golden (reference) responses
expected_responses = [
    "Albert Einstein proposed the theory of relativity, which transformed our understanding of time, space, and gravity.",
    "Ada Lovelace is regarded as the first computer programmer for her work on Charles Babbage's early mechanical computer, the Analytical Engine.",
    "Isaac Newton formulated the laws of motion and universal gravitation, laying the foundation for classical mechanics.",
    "Marie Curie was a physicist and chemist who conducted pioneering research on radioactivity and won two Nobel Prizes.",
    "Charles Darwin introduced the theory of evolution by natural selection in his book 'On the Origin of Species'."
]

In [10]:
dataset = []

# setup the dataset for evaluation task later
for query, reference in zip(sample_queries, expected_responses):
    relevant_docs = rag.get_most_relevant_docs(query)
    response = rag.generate_answer(query, relevant_docs)
    dataset.append(
        {
            "user_input" : query,
            "retrieved_contexts" : relevant_docs,
            "response" : response,
            "reference" : reference
        }
    )

In [11]:
# observe the content of the first evaluation data
dataset[0]

{'user_input': 'Who introduced the theory of relativity?',
 'retrieved_contexts': ['Albert Einstein proposed the theory of relativity, which transformed our understanding of time, space, and gravity.'],
 'response': 'The theory of relativity was introduced by Albert Einstein.',
 'reference': 'Albert Einstein proposed the theory of relativity, which transformed our understanding of time, space, and gravity.'}

In [12]:
# create the evaluation dataset from a list
evaluation_dataset = EvaluationDataset.from_list(dataset)

In [13]:
# the metrics we are interested in

metrics = [
    faithfulness,
    answer_relevancy,
    answer_correctness,
    answer_similarity,
    context_precision,
    context_recall,
]

In [14]:
# let's evaluate using gpt-4o as the judge

llm_model_gpt_4o = ChatOpenAI(
    model="gpt-4o"
)

evaluator_llm = LangchainLLMWrapper(llm_model_gpt_4o)

results = evaluate(dataset=evaluation_dataset,
                  metrics=metrics,
                  llm=evaluator_llm
)

In [15]:
# print average scores

for key, value in results._repr_dict.items():
    print(f"{key} = {value:.4f}")

faithfulness = 0.8000
answer_relevancy = 0.9682
answer_correctness = 0.7505
semantic_similarity = 0.9620
context_precision = 1.0000
context_recall = 1.0000


In [16]:
# show scores in a table

results.to_pandas()

| | user_input | retrieved_contexts | response | reference | faithfulness | answer_relevancy | answer_correctness | semantic_similarity | context_precision | context_recall |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Who introduced the theory of relativity? | [Albert Einstein proposed the theory of relati... | The theory of relativity was introduced by Alb... | Albert Einstein proposed the theory of relativ... | 1.0 | 1.0 | 0.532551 | 0.930206 | 1.0 | 1.0 |
| 1 | Who was the first computer programmer? | [Ada Lovelace is regarded as the first compute... | The first computer programmer is Ada Lovelace,... | Ada Lovelace is regarded as the first computer... | 1.0 | 0.986161 | 0.993265 | 0.973062 | 1.0 | 1.0 |
| 2 | What did Isaac Newton contribute to science? | [Isaac Newton formulated the laws of motion an... | Isaac Newton contributed to science by formula... | Isaac Newton formulated the laws of motion and... | 1.0 | 0.989277 | 0.995009 | 0.980036 | 1.0 | 1.0 |
| 3 | Who won two Nobel Prizes for research on radio... | [Marie Curie was a physicist and chemist who c... | Marie Curie won two Nobel Prizes for her resea... | Marie Curie was a physicist and chemist who co... | 0.5 | 0.910074 | 0.492999 | 0.971996 | 1.0 | 1.0 |
| 4 | What is the theory of evolution by natural sel... | [Charles Darwin introduced the theory of evolu... | The theory of evolution by natural selection, ... | Charles Darwin introduced the theory of evolut... | 0.5 | 0.955691 | 0.738654 | 0.954618 | 1.0 | 1.0 |


In [17]:
# let's evaluate using gpt-3.5-turbo as judge

llm_model_gpt_35_turbo = ChatOpenAI(
    model="gpt-3.5-turbo"
)

evaluator_llm = LangchainLLMWrapper(llm_model_gpt_35_turbo)

results = evaluate(dataset=evaluation_dataset,
                  metrics=metrics,
                  llm=evaluator_llm
)

In [18]:
# print average scores

for key, value in results._repr_dict.items():
    print(f"{key} = {value:.4f}")

faithfulness = 0.7500
answer_relevancy = 0.9525
answer_correctness = 0.6779
semantic_similarity = 0.9620
context_precision = 1.0000
context_recall = 1.0000


In [19]:
# show scores in a table

results.to_pandas()

| | user_input | retrieved_contexts | response | reference | faithfulness | answer_relevancy | answer_correctness | semantic_similarity | context_precision | context_recall |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Who introduced the theory of relativity? | [Albert Einstein proposed the theory of relati... | The theory of relativity was introduced by Alb... | Albert Einstein proposed the theory of relativ... | 1.0 | 1.0 | 0.532551 | 0.930206 | 1.0 | 1.0 |
| 1 | Who was the first computer programmer? | [Ada Lovelace is regarded as the first compute... | The first computer programmer is Ada Lovelace,... | Ada Lovelace is regarded as the first computer... | 1.0 | 0.980095 | 0.993265 | 0.973062 | 1.0 | 1.0 |
| 2 | What did Isaac Newton contribute to science? | [Isaac Newton formulated the laws of motion an... | Isaac Newton contributed to science by formula... | Isaac Newton formulated the laws of motion and... | 0.75 | 1.0 | 0.578342 | 0.980036 | 1.0 | 1.0 |
| 3 | Who won two Nobel Prizes for research on radio... | [Marie Curie was a physicist and chemist who c... | Marie Curie won two Nobel Prizes for her resea... | Marie Curie was a physicist and chemist who co... | 0.5 | 0.865491 | 0.67157 | 0.971996 | 1.0 | 1.0 |
| 4 | What is the theory of evolution by natural sel... | [Charles Darwin introduced the theory of evolu... | The theory of evolution by natural selection, ... | Charles Darwin introduced the theory of evolut... | 0.5 | 0.916836 | 0.613654 | 0.954618 | 1.0 | 1.0 |
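
The two runs use the same dataset and metrics, so the judge model accounts for the difference between them. The sketch below copies the average scores printed above into plain dictionaries and prints the per-metric difference. The `deltas` name is chosen here for illustration. Because LLM outputs are not deterministic, a rerun may give somewhat different numbers.

```python
# Average scores copied from the two evaluation runs above
gpt_4o_scores = {
    "faithfulness": 0.8000,
    "answer_relevancy": 0.9682,
    "answer_correctness": 0.7505,
    "semantic_similarity": 0.9620,
    "context_precision": 1.0000,
    "context_recall": 1.0000,
}
gpt_35_scores = {
    "faithfulness": 0.7500,
    "answer_relevancy": 0.9525,
    "answer_correctness": 0.6779,
    "semantic_similarity": 0.9620,
    "context_precision": 1.0000,
    "context_recall": 1.0000,
}

# Difference per metric (gpt-4o minus gpt-3.5-turbo)
deltas = {
    metric: round(gpt_4o_scores[metric] - gpt_35_scores[metric], 4)
    for metric in gpt_4o_scores
}

for metric, delta in deltas.items():
    print(f"{metric:<20} {delta:+.4f}")
```

`semantic_similarity` is identical in both runs because it is computed from embeddings rather than by the judge LLM, while `faithfulness` and `answer_correctness` shift with the judge.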


---

## RAGAs Metrics

Let's practice the RAGAS metrics using a dataset tailored for the financial domain. The `FiQA` dataset is designed for financial question-answering and recommendation tasks, making it suitable for RAG evaluations.
- FiQA contains financial questions and answers, a domain that often requires precise retrieval and factual consistency.
- It has annotated data that can help you test retrieval and generation accuracy.
- Financial data typically requires contextual understanding, making it a good test for RAG models.
- A subset called `ragas_eval` (part of the FiQA dataset) contains a `contexts` field. This field is necessary for computing metrics such as context recall and context precision.

In [20]:
from ragas import evaluate
from ragas.llms import LangchainLLMWrapper

from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    answer_correctness,
    answer_similarity,
    context_precision,
    context_recall,
)

from datasets import Dataset, load_dataset

from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings

import pandas as pd

# set the max columns to none
pd.set_option('display.max_columns', None)

In [21]:
# there are 3 subsets in FiQA: corpus, main and ragas_eval
# only ragas_eval has an answer and contexts
# https://huggingface.co/datasets/explodinggradients/fiqa

fiqa_dataset = load_dataset("explodinggradients/fiqa", "ragas_eval")
fiqa_dataset

DatasetDict({
    baseline: Dataset({
        features: ['question', 'ground_truths', 'answer', 'contexts'],
        num_rows: 30
    })
})

In [22]:
# convert to dataframe

df = fiqa_dataset['baseline'].to_pandas()
df

| | question | ground_truths | answer | contexts |
|---|---|---|---|---|
| 0 | How to deposit a cheque issued to an associate... | [Have the check reissued to the proper payee.J... | \nThe best way to deposit a cheque issued to a... | [Just have the associate sign the back and the... |
| 1 | Can I send a money order from USPS as a business? | [Sure you can. You can fill in whatever you w... | \nYes, you can send a money order from USPS as... | [Sure you can. You can fill in whatever you w... |
| 2 | 1 EIN doing business under multiple business n... | [You're confusing a lot of things here. Compan... | \nYes, it is possible to have one EIN doing bu... | [You're confusing a lot of things here. Compan... |
| 3 | Applying for and receiving business credit | ["I'm afraid the great myth of limited liabili... | \nApplying for and receiving business credit c... | [Set up a meeting with the bank that handles y... |
| 4 | 401k Transfer After Business Closure | [You should probably consult an attorney. Howe... | \nIf your employer has closed and you need to ... | [The time horizon for your 401K/IRA is essenti... |
| 5 | What are the ins/outs of writing equipment pur... | [Most items used in business have to be deprec... | \nWriting equipment purchases off as business ... | [You would report it as business income on Sch... |
| 6 | Can a entrepreneur hire a self-employed busine... | [Yes. I can by all means start my own company ... | \nYes, an entrepreneur can hire a self-employe... | [Yes. I can by all means start my own company ... |
| 7 | Intentions of Deductible Amount for Small Busi... | ["If your sole proprietorship losses exceed al... | \nThe intention of deductible amounts for smal... | ["Short answer, yes. But this is not done thro... |
| 8 | How can I deposit a check made out to my busin... | [You should have a separate business account. ... | \nYou can deposit a check made out to your bus... | ["I have checked with Bank of America, and the... |
| 9 | Filing personal with 1099s versus business s-c... | [Depends whom the 1099 was issued to. If it wa... | \nFiling personal taxes with 1099s versus fili... | [Depends whom the 1099 was issued to. If it wa... |


In [23]:
data_samples = {
    'user_input' : list(df['question']),
    'response' : list(df['answer']),
    'reference' : df['ground_truths'],
    'retrieved_contexts' : df['contexts']
}

In [24]:
dataset = Dataset.from_dict(data_samples)
rag_df = pd.DataFrame(dataset)

In [25]:
# why is the conversion needed?
# because RAGAS expects "reference" to be a string, not a list

rag_df['reference'] = rag_df['reference'].apply(lambda x: x[0])

In [26]:
rag_eval_dataset = Dataset.from_pandas(rag_df)
rag_eval_dataset.to_pandas()

| | user_input | response | reference | retrieved_contexts |
|---|---|---|---|---|
| 0 | How to deposit a cheque issued to an associate... | \nThe best way to deposit a cheque issued to a... | Have the check reissued to the proper payee.Ju... | [Just have the associate sign the back and the... |
| 1 | Can I send a money order from USPS as a business? | \nYes, you can send a money order from USPS as... | Sure you can. You can fill in whatever you wa... | [Sure you can. You can fill in whatever you w... |
| 2 | 1 EIN doing business under multiple business n... | \nYes, it is possible to have one EIN doing bu... | You're confusing a lot of things here. Company... | [You're confusing a lot of things here. Compan... |
| 3 | Applying for and receiving business credit | \nApplying for and receiving business credit c... | "I'm afraid the great myth of limited liabilit... | [Set up a meeting with the bank that handles y... |
| 4 | 401k Transfer After Business Closure | \nIf your employer has closed and you need to ... | You should probably consult an attorney. Howev... | [The time horizon for your 401K/IRA is essenti... |
| 5 | What are the ins/outs of writing equipment pur... | \nWriting equipment purchases off as business ... | Most items used in business have to be depreci... | [You would report it as business income on Sch... |
| 6 | Can a entrepreneur hire a self-employed busine... | \nYes, an entrepreneur can hire a self-employe... | Yes. I can by all means start my own company a... | [Yes. I can by all means start my own company ... |
| 7 | Intentions of Deductible Amount for Small Busi... | \nThe intention of deductible amounts for smal... | "If your sole proprietorship losses exceed all... | ["Short answer, yes. But this is not done thro... |
| 8 | How can I deposit a check made out to my busin... | \nYou can deposit a check made out to your bus... | You should have a separate business account. M... | ["I have checked with Bank of America, and the... |
| 9 | Filing personal with 1099s versus business s-c... | \nFiling personal taxes with 1099s versus fili... | Depends whom the 1099 was issued to. If it was... | [Depends whom the 1099 was issued to. If it wa... |


In [27]:
# metrics to track

metrics = [
    faithfulness,
    answer_relevancy,
    answer_correctness,
    answer_similarity,
    context_precision,
    context_recall,
]

In [28]:
# set up the LLM and embedding models

embeddings = OpenAIEmbeddings()

llm_model_gpt_35_turbo = ChatOpenAI(
    model="gpt-3.5-turbo"
)

In [29]:
# evaluate the rag_eval_dataset

evaluator_llm = LangchainLLMWrapper(llm_model_gpt_35_turbo)

results = evaluate(
    dataset=rag_eval_dataset,
    metrics=metrics,
    llm=evaluator_llm,
    embeddings=embeddings
)

In [30]:
# print average scores

for key, value in results._repr_dict.items():
    print(f"{key} = {value:.4f}")

faithfulness = 0.6719
answer_relevancy = 0.8094
answer_correctness = 0.4038
semantic_similarity = 0.8842
context_precision = 0.9000
context_recall = 0.6925


In [31]:
results_df = results.to_pandas()

results_df

| | user_input | retrieved_contexts | response | reference | faithfulness | answer_relevancy | answer_correctness | semantic_similarity | context_precision | context_recall |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | How to deposit a cheque issued to an associate... | [Just have the associate sign the back and the... | \nThe best way to deposit a cheque issued to a... | Have the check reissued to the proper payee.Ju... | 0.6 | 0.98252 | 0.227103 | 0.908412 | 1.0 | 0.875 |
| 1 | Can I send a money order from USPS as a business? | [Sure you can. You can fill in whatever you w... | \nYes, you can send a money order from USPS as... | Sure you can. You can fill in whatever you wa... | 0.833333 | 0.973122 | 0.557659 | 0.944922 | 1.0 | 1.0 |
| 2 | 1 EIN doing business under multiple business n... | [You're confusing a lot of things here. Compan... | \nYes, it is possible to have one EIN doing bu... | You're confusing a lot of things here. Company... | 0.8 | 0.948733 | 0.209942 | 0.839766 | 1.0 | 1.0 |
| 3 | Applying for and receiving business credit | [Set up a meeting with the bank that handles y... | \nApplying for and receiving business credit c... | "I'm afraid the great myth of limited liabilit... | 1.0 | 0.904752 | 0.402882 | 0.944862 | 1.0 | 1.0 |
| 4 | 401k Transfer After Business Closure | [The time horizon for your 401K/IRA is essenti... | \nIf your employer has closed and you need to ... | You should probably consult an attorney. Howev... | 0.444444 | 0.894578 | 0.199907 | 0.799629 | 0.0 | 0.0 |
| 5 | What are the ins/outs of writing equipment pur... | [You would report it as business income on Sch... | \nWriting equipment purchases off as business ... | Most items used in business have to be depreci... | 0.923077 | 0.953613 | 0.452152 | 0.919719 | 1.0 | 0.866667 |
| 6 | Can a entrepreneur hire a self-employed busine... | [Yes. I can by all means start my own company ... | \nYes, an entrepreneur can hire a self-employe... | Yes. I can by all means start my own company a... | 0.833333 | 0.999347 | 0.360164 | 0.840654 | 1.0 | 1.0 |
| 7 | Intentions of Deductible Amount for Small Busi... | ["Short answer, yes. But this is not done thro... | \nThe intention of deductible amounts for smal... | "If your sole proprietorship losses exceed all... | 0.666667 | 0.872493 | 0.200744 | 0.802975 | 1.0 | 0.0 |
| 8 | How can I deposit a check made out to my busin... | ["I have checked with Bank of America, and the... | \nYou can deposit a check made out to your bus... | You should have a separate business account. M... | 0.5 | 0.936057 | 0.392478 | 0.903246 | 1.0 | 0.4 |
| 9 | Filing personal with 1099s versus business s-c... | [Depends whom the 1099 was issued to. If it wa... | \nFiling personal taxes with 1099s versus fili... | Depends whom the 1099 was issued to. If it was... | 0.428571 | 0.0 | 0.488569 | 0.895451 | 1.0 | 1.0 |


## Faithfulness

**Input**: generated answer and context.

Measures the factual consistency of the generated answer against the given context.
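
A minimal sketch of how such a score can be computed, assuming the RAGAS definition: the number of claims in the answer that the context supports, divided by the total number of claims in the answer. In RAGAS the judge LLM extracts the claims and decides each verdict. Here the verdicts are hard-coded and hypothetical, and `faithfulness_score` is an illustrative helper, not a library function.

```python
def faithfulness_score(claim_verdicts):
    """Fraction of the answer's claims that the retrieved context supports.

    claim_verdicts: list of booleans, one per claim extracted from the answer
    (True means the claim can be inferred from the context).
    """
    if not claim_verdicts:
        # Assumption for this sketch: the score is undefined without claims
        return float("nan")
    return sum(claim_verdicts) / len(claim_verdicts)


# Hypothetical verdicts for an answer split into 5 claims, 3 of them supported
verdicts = [True, True, True, False, False]
score = faithfulness_score(verdicts)
print(f"Faithfulness: {score:.4f}")
```

This ratio form explains why the table shows values such as 0.5 or 0.833333: they correspond to small claim counts, for example 1 of 2 or 5 of 6 claims supported.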

In [32]:
# you may need to adjust the index to show a low score

print(f"Input:\n{results_df.iloc[20]['user_input']}")
print('-'*60)
print(f"Ground Truth:\n{results_df.iloc[20]['reference']}")
print('-'*60)
print(f"Answer: {results_df.iloc[20]['response']}")
print('-'*60)
print(f"Faithfulness:\n{results_df.iloc[20]['faithfulness']:.4f}")

Input:
What percentage of my company should I have if I only put money?
------------------------------------------------------------
Ground Truth:
There is no universal answer here; it depends on how much risk each person is taking, how you want to define the value of the business now and in the future, how much each person's contribution is essential to creating and sustaining the business, how hard it would be to get those resources elsewhere and what they would cost... What is fair is whatever you folks agree is fair. Just make sure to get it nailed down in writing and signed by all the parties, so you don't risk someone changing their minds later.Question (which you need to ask yourself): How well are your friends paid for their work? What would happen if you just took your money and bought a garage, and hired two car mechanics? How would that be different from what you are doing?  The money that you put into the company, is that paid in capital, or is it a loan to the company that

---

In [33]:
# you may need to adjust the index to show a high score

print(f"Input:\n{results_df.iloc[24]['user_input']}")
print('-'*60)
print(f"Ground Truth:\n{results_df.iloc[24]['reference']}")
print('-'*60)
print(f"Answer: {results_df.iloc[24]['response']}")
print('-'*60)
print(f"Faithfulness:\n{results_df.iloc[24]['faithfulness']:.4f}")

Input:
Following an investment guru a good idea?
------------------------------------------------------------
Ground Truth:
"The best answer here is ""maybe, but probably not"". A few quick reasons: Its not a bad idea to watch other investors especially those who can move markets but do your own research on an investment first. Your sole reason for investing should not be ""Warren did it""."I think following the professional money managers is a strategy worth considering. The buys from your favorite investors can be taken as strong signals. But you should never buy any stock blindly just because someone else bought it. Be sure do your due diligence before the purchase. The most important question is not what they bought, but why they bought it and how much. To add/comment on Freiheit's points:
------------------------------------------------------------
Answer: 
No, following an investment guru is not necessarily a good idea. It is important to do your own research and due diligence be

## Answer Relevancy

**Input**: question and generated answer.

Measures how relevant the generated answer is to the input (prompt).
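
A rough sketch of the idea, assuming the approach described in the RAGAS documentation: the judge LLM generates a few questions from the answer, and the score is the mean cosine similarity between the embeddings of those questions and the embedding of the original question. The 2-dimensional vectors and the `answer_relevancy_score` helper below are made up for illustration. RAGAS additionally appears to zero the score for answers judged noncommittal, which is left out of this sketch.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def answer_relevancy_score(question_emb, generated_question_embs):
    """Mean similarity between the original and the regenerated questions."""
    sims = [cosine_similarity(question_emb, g) for g in generated_question_embs]
    return sum(sims) / len(sims)


# Toy embeddings (hypothetical): one original question, three questions
# that a judge LLM might have generated back from the answer
original_question = [1.0, 0.0]
regenerated_questions = [[1.0, 0.0], [0.8, 0.6], [0.6, 0.8]]

relevancy = answer_relevancy_score(original_question, regenerated_questions)
print(f"Answer Relevancy: {relevancy:.4f}")
```

An answer that drifts off topic leads to regenerated questions that differ from the original, which pulls the mean similarity down.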

In [34]:
# you may need to adjust the index to show a high score

print(f"Input:\n{results_df.iloc[6]['user_input']}")
print('-'*60)
print(f"Ground Truth:\n{results_df.iloc[6]['reference']}")
print('-'*60)
print(f"Answer: {results_df.iloc[6]['response']}")
print('-'*60)
print(f"Answer Relevancy:\n{results_df.iloc[6]['answer_relevancy']:.4f}")

Input:
Can a entrepreneur hire a self-employed business owner?
------------------------------------------------------------
Ground Truth:
Yes. I can by all means start my own company and name myself CEO. If Bill Gates wanted to hire me, I'll take the offer and still be CEO of my own company. Now, whether or not my company makes money and survives is another question.  This is the basis of self-employed individuals who contract out their services.
------------------------------------------------------------
Answer: 
Yes, an entrepreneur can hire a self-employed business owner. However, the self-employed business owner must be careful to ensure that their payments are accounted for as self-employment income and not as directors' remuneration, which would be subject to PAYE and NIC. Additionally, the entrepreneur should ensure that the self-employed business owner is not providing services as an employee or office holder, but as a self-employed contractor.
--------------------------------

## Answer Correctness

**Input**: generated answer and ground truth.

An assessment of answer correctness, considering both factual accuracy and overall similarity.
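
The two parts can be combined as a weighted sum. The sketch below assumes the RAGAS default weights of 0.75 for factual accuracy (an F1 score over statements) and 0.25 for semantic similarity, which the scores in this notebook are consistent with. The helper names and the F1 values of 1.0 and 0.0 are assumptions for illustration. Only the `semantic_similarity` values are taken from the result tables.

```python
def factual_f1(tp, fp, fn):
    """F1 over statements: TP / (TP + 0.5 * (FP + FN))."""
    denom = tp + 0.5 * (fp + fn)
    return tp / denom if denom else 0.0


def answer_correctness_score(f1, semantic_similarity, weights=(0.75, 0.25)):
    """Weighted sum of factual F1 and semantic similarity."""
    w_factual, w_similarity = weights
    return w_factual * f1 + w_similarity * semantic_similarity


# Simple RAG, row 1 (gpt-4o judge): semantic_similarity = 0.973062.
# Assuming every statement matched the reference (F1 = 1.0):
high = answer_correctness_score(1.0, 0.973062)

# FiQA row 4: semantic_similarity = 0.799629.
# Assuming no statement matched the reference (F1 = 0.0):
low = answer_correctness_score(0.0, 0.799629)

print(f"high = {high:.4f}, low = {low:.4f}")
print(f"F1 for tp=2, fp=1, fn=1: {factual_f1(2, 1, 1):.4f}")
```

Under these assumptions the results land very close to the reported `answer_correctness` values of 0.993265 and 0.199907, which shows that a response can score about 0.2 on similarity alone even when none of its statements match the ground truth.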

In [35]:
# show the row which has the maximum score for answer correctness

print(f"Answer Correctness = {results_df['answer_correctness'].max():.4f}")
print('-'*60)
idx_max = results_df['answer_correctness'].idxmax()
print(results_df.iloc[idx_max])

Answer Correctness = 0.7969
------------------------------------------------------------
user_input             Employer options when setting up 401k for empl...
retrieved_contexts     [Pre-Enron many companies forced the 401K matc...
response               \nWhen setting up a 401k plan for employees, e...
reference              If you were looking to maximize your ability t...
faithfulness                                                    0.444444
answer_relevancy                                                0.961961
answer_correctness                                              0.796854
semantic_similarity                                             0.839589
context_precision                                                    1.0
context_recall                                                       0.0
Name: 23, dtype: object


---

In [36]:
# high score

print(f"Input:\n{results_df.iloc[idx_max]['user_input']}")
print('-'*60)
print(f"Ground Truth:\n{results_df.iloc[idx_max]['reference']}")
print('-'*60)
print(f"Answer: {results_df.iloc[idx_max]['response']}")
print('-'*60)

Input:
Employer options when setting up 401k for employees
------------------------------------------------------------
Ground Truth:
If you were looking to maximize your ability to save in a qualified plan, why not setup a 401K plan in Company A and keep the SEP in B? Setup the 401K in A such that any employee can contribute 100% of their salary.  Then take a salary for around 19K/year (assuming under age 50), so you can contribute and have enough to cover SS taxes.   Then continue to move dividends to Company A, and continue the SEP in B.  This way if you are below age 50, you can contribute 54K (SEP limit) + 18K (IRA limit) + 5500 (ROTH income dependent) to a qualified plan.
------------------------------------------------------------
Answer: 
When setting up a 401k plan for employees, employers have a variety of options. They can choose the type of investments that will be available to employees, such as stocks, bonds, mutual funds, and ETFs. They can also decide how much of the em

---

In [37]:
# low score

print(f"Answer Correctness = {results_df['answer_correctness'].min():.4f}")
print('-'*60)
idx_min = results_df['answer_correctness'].idxmin()
print(results_df.iloc[idx_min])

Answer Correctness = 0.1999
------------------------------------------------------------
user_input                          401k Transfer After Business Closure
retrieved_contexts     [The time horizon for your 401K/IRA is essenti...
response               \nIf your employer has closed and you need to ...
reference              You should probably consult an attorney. Howev...
faithfulness                                                    0.444444
answer_relevancy                                                0.894578
answer_correctness                                              0.199907
semantic_similarity                                             0.799629
context_precision                                                    0.0
context_recall                                                       0.0
Name: 4, dtype: object


In [38]:
print(f"Input:\n{results_df.iloc[idx_min]['user_input']}")
print('-'*60)
print(f"Ground Truth:\n{results_df.iloc[idx_min]['reference']}")
print('-'*60)
print(f"Answer: {results_df.iloc[idx_min]['response']}")
print('-'*60)

Input:
401k Transfer After Business Closure
------------------------------------------------------------
Ground Truth:
You should probably consult an attorney. However, if the owner was a corporation/LLC and it has been officially dissolved, you can provide an evidence of that from your State's department of State/Corporations to show that their request is unfeasible. If the owner was a sole-proprietor, then that may be harder as you'll need to track the person down and have him close the plan.
------------------------------------------------------------
Answer: 
If your employer has closed and you need to transfer your 401k funds, you should contact the HR department of your former employer to get the necessary paperwork and instructions for the transfer. Depending on the plan, you may be able to transfer the funds to another 401k plan, such as the 401k plan of your new employer, or you may need to transfer the funds to an IRA. If you transfer the funds to an IRA, you will have more i

## Answer Similarity (Semantic Similarity)

**Input**: generated answer and ground truth.

Scores how semantically similar the generated answer is to the ground truth. A higher score means the two texts are closer in meaning.
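Scores of this kind are typically computed as the cosine similarity between embedding vectors of the two texts. The sketch below is a minimal illustration of that idea, not the RAGAS implementation: the three vectors are hypothetical, hand-made values standing in for what an embedding model would return.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional embeddings, made up for illustration only.
reference_vec = [0.2, 0.7, 0.1]    # stands in for the ground truth
response_vec = [0.25, 0.65, 0.05]  # stands in for a close answer
unrelated_vec = [0.9, -0.1, 0.0]   # stands in for an off-topic answer

close_score = cosine_similarity(reference_vec, response_vec)
far_score = cosine_similarity(reference_vec, unrelated_vec)
print(f"similar pair:   {close_score:.4f}")
print(f"unrelated pair: {far_score:.4f}")
```

The similar pair scores much higher than the unrelated pair, which is the behaviour to expect from the `semantic_similarity` column.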

In [39]:
# you may need to adjust the index to show a high score

print(f"Input:\n{results_df.iloc[3]['user_input']}")
print('-'*60)
print(f"Ground Truth:\n{results_df.iloc[3]['reference']}")
print('-'*60)
print(f"Answer: {results_df.iloc[3]['response']}")
print('-'*60)
print(f"Answer Similarity:\n{results_df.iloc[3]['semantic_similarity']:.4f}")

Input:
Applying for and receiving business credit
------------------------------------------------------------
Ground Truth:
"I'm afraid the great myth of limited liability companies is that all such vehicles have instant access to credit.  Limited liability on a company with few physical assets to underwrite the loan, or with insufficient revenue, will usually mean that the owners (or others) will be asked to stand surety on any credit. However, there is a particular form of ""credit"" available to businesses on terms with their clients.  It is called factoring. Factoring is a financial transaction   whereby a business sells its accounts   receivable (i.e., invoices) to a third   party (called a factor) at a discount   in exchange for immediate money with   which to finance continued business.   Factoring differs from a bank loan in   three main ways. First, the emphasis   is on the value of the receivables   (essentially a financial asset), not   the firm’s credit worthiness.   Secon

## Context Precision

**Input**: ground truth and context

Measures whether the chunks in the retrieved contexts that are relevant to the ground truth are ranked near the top. Ideally, all the relevant chunks appear at the top ranks.

**Check**: see https://github.com/explodinggradients/ragas/issues/365 for a discrepancy in the definition.
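As a rough illustration of the ranking idea, the sketch below averages precision@k over the ranks that hold a relevant chunk. It assumes the relevance of each chunk is already known as a 0/1 flag (in RAGAS this judgement comes from an LLM), and the function name `context_precision_at_k` is hypothetical, not a library call.

```python
def context_precision_at_k(relevance):
    """Mean of precision@k over the ranks k that hold a relevant chunk.

    relevance: list of 0/1 flags, one per retrieved chunk, in rank order.
    """
    hits = 0
    weighted_sum = 0.0
    for k, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            weighted_sum += hits / k  # precision@k at this rank
    return weighted_sum / hits if hits else 0.0


# Hypothetical relevance flags for three retrieved chunks.
print(context_precision_at_k([1, 1, 0]))  # relevant chunks ranked first
print(context_precision_at_k([0, 1, 1]))  # relevant chunks ranked last
print(context_precision_at_k([0, 0, 0]))  # nothing relevant retrieved
```

The same two relevant chunks score lower when they sit below an irrelevant one, so the metric rewards retrievers that rank useful chunks first.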

In [40]:
# you may need to adjust the index to show a high score

print(f"Input:\n{results_df.iloc[15]['user_input']}")
print('-'*60)
print(f"Ground Truth:\n{results_df.iloc[15]['reference']}")
print('-'*60)
print(f"Answer: {results_df.iloc[15]['response']}")
print('-'*60)
print(f"Context Precision:\n{results_df.iloc[15]['context_precision']:.4f}")

Input:
Do I need a new EIN since I am hiring employees for my LLC?
------------------------------------------------------------
Ground Truth:
I called the IRS (click here for IRS contact info) and they said I do not need to get a new EIN.  I could have just filed the appropriate employer federal tax return (940/941) and then the filing requirements would have been updated.  But while I was on the phone, they just updated the filing requirements for my LLC so I am all good now (I still need to file the correct form and make the correct payments, etc. but I can use this same EIN going forward). Disclaimer: Don't trust me (or this answer) for tax advice (your situation may be different).  The IRS person on the phone was very helpful so I recommend calling them if you are in a similar situation.  FYI, I have found calling the IRS to always be very helpful.
------------------------------------------------------------
Answer: 
No, you do not need a new EIN since you are hiring employees for 

## Context Recall

**Input**: ground truth and context

Measures how many of the statements in the ground truth can be found in (attributed to) the retrieved context.
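The idea can be sketched as a simple ratio: the number of ground-truth statements supported by the retrieved context divided by the total number of ground-truth statements. In the sketch below the per-statement flags are hypothetical inputs (RAGAS obtains them with an LLM), and `context_recall` here is an illustrative helper, not the library function.

```python
def context_recall(attributed):
    """Fraction of ground-truth statements supported by the retrieved context.

    attributed: list of booleans, one per ground-truth statement.
    """
    if not attributed:
        return 0.0
    return sum(1 for flag in attributed if flag) / len(attributed)


# Hypothetical flags: True means the statement was found in the context.
print(context_recall([True, True]))                # every statement supported
print(context_recall([True, False, False, True]))  # half of them supported
```

A score of 1.0 therefore means every ground-truth statement was covered by the retrieved context, and 0.0 means none were.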

In [41]:
# you may need to adjust the index to show a high score

print(f"Input:\n{results_df.iloc[27]['user_input']}")
print('-'*60)
print(f"Ground Truth:\n{results_df.iloc[27]['reference']}")
print('-'*60)
print(f"Answer: {results_df.iloc[27]['response']}")
print('-'*60)
print(f"Context Recall:\n{results_df.iloc[27]['context_recall']:.4f}")

Input:
Will one’s education loan application be rejected if one doesn't have a payslip providing collateral?
------------------------------------------------------------
Ground Truth:
A bank can reject a loan if they feel you do not meet the eligibility criteria.  You can talk to few banks and find out.
------------------------------------------------------------
Answer: 
It is possible that one's education loan application could be rejected if one does not have a payslip providing collateral. Banks may require proof of income or other forms of collateral in order to approve a loan. It is important to check with the bank to find out what their requirements are.
------------------------------------------------------------
Context Recall:
1.0000
