# RAG Metrics

## Metrics definition
* **Answer Correctness**: compares the generated `answer` against the `ground truth` and returns 0 (incorrect) or 1 (correct).
* **Context Precision**: evaluates how useful the retrieved `contexts` are for answering the `question`.
* **Context Recall**: evaluates how well the retrieved `contexts` cover the `ground truth` answer.
* **Answer Faithfulness**: evaluates whether the generated `answer` is supported by any of the retrieved `contexts`.

In [1]:
import huggingface_hub
from langchain_community.chat_models.openai import ChatOpenAI

llm = ChatOpenAI(
    model_name="NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO",
    openai_api_key=huggingface_hub.get_token(),
    openai_api_base="https://api-inference.huggingface.co/v1/",
    max_tokens=1024,
    temperature=0.1,
)


In [2]:
fake = {
    "question": "What is the capital of France?",
    "ground_truth": "Paris",
    "answer": "Paris",
    "contexts": [
        "Paris is the capital of France.",
        "Charlie likes to eat chocolate.",
        "1776 Paris was the center of a war.",
    ],
}

## Context Recall

Context Recall measures how much of the ground truth is supported by the retrieved contexts. It is computed from the ground truth and the retrieved contexts. Scores range from 0 to 1, and higher values indicate better retrieval.
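As a rough illustration of how such a score could be aggregated, here is a minimal sketch. It assumes an LLM judge has already labelled each ground-truth sentence as attributable to the contexts (1) or not (0). The function name `context_recall_score` and the verdict lists are hypothetical and are not part of `easyrag`.

```python
from typing import Sequence


def context_recall_score(attributed: Sequence[int]) -> float:
    """Fraction of ground-truth sentences judged attributable to the contexts.

    `attributed` holds one binary verdict per ground-truth sentence
    (1 = supported by the contexts, 0 = not supported).
    """
    if not attributed:
        raise ValueError("need at least one verdict")
    return sum(attributed) / len(attributed)


# Hypothetical verdicts, not real judge output.
print(context_recall_score([1]))           # 1.0
print(context_recall_score([1, 0, 1, 1]))  # 0.75
```

A single-sentence ground truth such as "Paris" can only score 0.0 or 1.0 under this scheme, while longer ground truths can land in between.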


In [3]:
from easyrag.metrics import ContextRecall

context_recall = ContextRecall(llm=llm, verbose=True)

In [4]:
context_recall.compute(ground_truth=fake["ground_truth"], contexts=fake["contexts"])

INFO:easyrag.metrics.context_recall:LLM prompt: [SystemMessage(content="Given a context, and an answer, analyze each sentence in the answer and classify if the sentence can be attributed to the given context or not. Use only 'Yes' (1) or 'No' (0) as a binary classification. Output json with reason. Output in only valid JSON format."), HumanMessage(content='context: "Albert Einstein (14 March 1879 – 18 April 1955) was a German-born theoretical physicist, widely held to be one of the greatest and most influential scientists of all time. Best known for developing the theory of relativity, he also made important contributions to quantum mechanics, and was thus a central figure in the revolutionary reshaping of the scientific understanding of nature that modern physics accomplished in the first decades of the twentieth century. His mass–energy equivalence formula E = mc2, which arises from relativity theory, has been called \'the world\'s most famous equation\'. He received the 1921 Nobel P

1.0

## Context Precision

Context Precision evaluates whether the retrieved contexts that are relevant to the question are ranked at the top. Ideally, every relevant chunk appears above the irrelevant ones. The metric is computed from the question, the answer, and the contexts. Scores range from 0 to 1, and higher values indicate better precision.
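To make the difference between "how many contexts are useful" and "are the useful ones ranked first" concrete, here is a minimal sketch of two possible aggregations over per-context binary verdicts. Both functions and the verdict lists are hypothetical. Which aggregation `easyrag` actually uses is not stated here, so treat this as an illustration rather than the library's implementation.

```python
from typing import Sequence


def fraction_useful(verdicts: Sequence[int]) -> float:
    """Rank-agnostic: share of retrieved contexts judged useful."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    return sum(verdicts) / len(verdicts)


def average_precision(verdicts: Sequence[int]) -> float:
    """Rank-aware: mean of precision@k taken at each useful position k."""
    hits = 0
    total = 0.0
    for rank, verdict in enumerate(verdicts, start=1):
        if verdict:
            hits += 1
            total += hits / rank  # precision among the top `rank` contexts
    return total / hits if hits else 0.0


# Hypothetical verdicts for three contexts: useful, not useful, useful.
verdicts = [1, 0, 1]
print(round(fraction_useful(verdicts), 4))    # 0.6667
print(round(average_precision(verdicts), 4))  # 0.8333
```

The rank-aware variant rewards moving useful contexts to the front: `[1, 1, 0]` scores 1.0, while `[1, 0, 1]` scores lower even though both lists contain two useful contexts.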

In [5]:
from easyrag.metrics import ContextPrecision

context_precision = ContextPrecision(llm=llm, verbose=True)

In [6]:
context_precision.compute(quesiton=fake["question"], answer=fake["answer"], contexts=fake["contexts"])

INFO:easyrag.metrics.context_precision:LLM prompt: [SystemMessage(content='Given question, answer and context verify if the context was useful in arriving at the given answer. Give score as "1" if useful and "0" if not with json output.'), HumanMessage(content='question: "What can you tell me about albert Albert Einstein?"\ncontext: "Albert Einstein (14 March 1879 – 18 April 1955) was a German-born theoretical physicist, widely held to be one of the greatest and most influential scientists of all time. Best known for developing the theory of relativity, he also made important contributions to quantum mechanics, and was thus a central figure in the revolutionary reshaping of the scientific understanding of nature that modern physics accomplished in the first decades of the twentieth century. His mass–energy equivalence formula E = mc2, which arises from relativity theory, has been called "the world\'s most famous equation". He received the 1921 Nobel Prize in Physics "for his services t

0.6666666666666666

## Answer Correctness

Answer Correctness evaluates whether the generated answer is correct. It is computed from the ground truth and the generated answer and returns a binary score: 0 (incorrect) or 1 (correct).
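Since the judge is asked for a binary classification in JSON, the reply has to be turned into a numeric score at some point. Here is a minimal sketch of that step. The helper `parse_binary_verdict` and the `verdict` key are assumptions made for illustration; the actual JSON schema used by `easyrag` is not shown in this document.

```python
import json


def parse_binary_verdict(raw_reply: str, key: str = "verdict") -> float:
    """Turn a JSON reply such as {"reason": "...", "verdict": 1} into 0.0 or 1.0.

    The key name is hypothetical; adapt it to the schema your judge returns.
    """
    data = json.loads(raw_reply)
    verdict = int(data[key])
    if verdict not in (0, 1):
        raise ValueError(f"expected a binary verdict, got {verdict!r}")
    return float(verdict)


# Hypothetical judge replies, not real model output.
print(parse_binary_verdict('{"reason": "Matches the ground truth.", "verdict": 1}'))  # 1.0
print(parse_binary_verdict('{"reason": "Contradicts the ground truth.", "verdict": 0}'))  # 0.0
```

Rejecting anything other than 0 or 1 keeps a malformed reply from silently skewing an averaged score.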

In [7]:
from easyrag.metrics import AnswerCorrectness
answer_correctness = AnswerCorrectness(llm=llm, verbose=True)

In [8]:
answer_correctness.compute(quesiton=fake["question"], answer=fake["answer"], ground_truth=fake["ground_truth"])

INFO:easyrag.metrics.answer_correctness:LLM prompt: [SystemMessage(content="You are an expert evaluation system for a question answering chatbot.\n\nYou are given the following information:\n- a question, \n- a ground_truth\n- a answer.\n\nYour job is to score the correctness of the answer. Use only 'Correct' (1) or 'Incorrect' (0) as a binary classification. Output json with reason. Output in only valid JSON format."), HumanMessage(content='question: "What powers the sun and what is its primary function?"\nground_truth: "The sun is actually powered by nuclear fusion, not fission. In its core, hydrogen atoms fuse to form helium, releasing a tremendous amount of energy. This energy is what lights up the sun and provides heat and light, essential for life on Earth. The sun\'s light also plays a critical role in Earth\'s climate system and helps to drive the weather and ocean currents."\nanswer: "The sun is powered by nuclear fission, similar to nuclear reactors on Earth, and its primary 

1.0

## Answer Faithfulness

Answer Faithfulness evaluates whether the generated answer is supported by any of the retrieved contexts. It is computed from the retrieved contexts and the generated answer and returns a binary score: 0 (not faithful) or 1 (faithful).
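The examples in this section pass the context both as a list of passages and as a single string. Here is a minimal sketch of how a judge input could be assembled from either form, following the `context: "..."` / `answer: "..."` layout visible in the logged prompts. The helper `build_faithfulness_message` and the newline separator are assumptions, not `easyrag` internals.

```python
from typing import Sequence, Union


def build_faithfulness_message(answer: str, context: Union[str, Sequence[str]]) -> str:
    """Format the judge input from an answer and one or more context passages."""
    if isinstance(context, str):
        joined = context
    else:
        # Assumption: passages are concatenated with newlines.
        joined = "\n".join(context)
    return f'context: "{joined}"\nanswer: "{answer}"'


print(build_faithfulness_message("Paris", "Today is a sunny day."))
print(build_faithfulness_message("Paris", ["Paris is the capital of France.", "Charlie likes to eat chocolate."]))
```

Because the judge is told to answer 1 when any part of the context supports the answer, unrelated passages in the list do not lower the score on their own.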

In [9]:
from easyrag.metrics import AnswerFaithfulness
answer_faithfulness = AnswerFaithfulness(llm=llm, verbose=True)

In [11]:
answer_faithfulness.compute(answer=fake["answer"], context=fake["contexts"])

INFO:easyrag.metrics.answer_faithfulness:LLM prompt: [SystemMessage(content="You are an expert evaluation system for a question answering chatbot.\n\nYou are given the following information:\n- a context including information, \n- a generated answer.\n\nYour job is to classify if the answer is supported by the context. Use only 'Faithfull' (1) if the any of the contexts supports the answer, even if most of the context is unrelated. Use 'Unfaithfull' (0) if the context doesn't provide any support for the answer. Output json with reason. Output in only valid JSON format."), HumanMessage(content='context: "John is a student at XYZ University. He is pursuing a degree in Computer Science. He is enrolled in several courses this semester, including Data Structures, Algorithms, and Database Management. John is a diligent student and spends a significant amount of time studying and completing assignments. He often stays late in the library to work on his projects."\nanswer: "John is majoring in

1.0

In [12]:
answer_faithfulness.compute(answer=fake["answer"], context="Today is a sunny day.")

INFO:easyrag.metrics.answer_faithfulness:LLM prompt: [SystemMessage(content="You are an expert evaluation system for a question answering chatbot.\n\nYou are given the following information:\n- a context including information, \n- a generated answer.\n\nYour job is to classify if the answer is supported by the context. Use only 'Faithfull' (1) if the any of the contexts supports the answer, even if most of the context is unrelated. Use 'Unfaithfull' (0) if the context doesn't provide any support for the answer. Output json with reason. Output in only valid JSON format."), HumanMessage(content='context: "John is a student at XYZ University. He is pursuing a degree in Computer Science. He is enrolled in several courses this semester, including Data Structures, Algorithms, and Database Management. John is a diligent student and spends a significant amount of time studying and completing assignments. He often stays late in the library to work on his projects."\nanswer: "John is majoring in

0.0