
Faithfulness always returns NaN with a local LLM #1274

@cpolcino

Description


[x] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug
Faithfulness does not work: it always returns NaN, and sometimes context precision does too.

Python version: 3.11.9
Ragas version: 0.1.15

Code to Reproduce

data_samples_2 = {
'question': ["What are the dimensions of twist?"],
'answer': ["The limits for the mechanical characteristics of twist shall be ≤ 1.5% for board thickness < 1.6 mm and ≤ 1.1% for board thickness ≥ 1.6 mm."],
'contexts': [
["Twist shall be measured in conformance with the test method 2.4.22c from IPC-TM-650.",
"The PCB shall be placed on a horizontal surface so that it rests on three corners.",
"The distance between the horizontal surface and the fourth corner of the PCB shall be measured as specified in Figure 9-3.",
"The length of the diagonal of the PCB shall be measured.",
"The twist shall be expressed in percentage terms.",
"Twist shall be calculated as follows: Twist [%] = max distance [mm] / (2 x length of PCB diagonal [mm]) x 100",
"The maximum twist shall be ≤ 1,5 %.",
"The procurement authority may specify a more stringent requirement for twist in the PCB definition dossier."]
],
'ground_truth': ["""Based on the information provided in the document, the key dimensions related to twist in PCBs are:

  1. The twist is measured as the distance between the horizontal surface and the fourth corner of the PCB when it rests on three corners.
  2. The length of the diagonal of the PCB is measured.
  3. The twist is expressed as a percentage and calculated using the following formula:
    Twist [%] = max distance [mm] / (2 x length of PCB diagonal [mm]) x 100
  4. The maximum allowable twist is specified as ≤ 1.5%.
  5. The document mentions that the procurement authority may specify a more stringent requirement for twist in the PCB definition dossier, with a typical stringent twist requirement being ≤ 0.75%.
    So the key dimensions are:
  • The maximum distance between the fourth corner and the horizontal surface
  • The length of the PCB diagonal
  • The calculated percentage of twist
  • The maximum allowable percentage (1.5% or potentially lower)
    The document also includes a diagram illustrating how twist is measured on a PCB."""]
    }
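The twist formula quoted in the contexts (and repeated in the ground truth) can be checked with a small worked example. This only illustrates the arithmetic; `twist_percent` is a hypothetical helper, not part of Ragas, and the 3 mm / 100 mm inputs are made-up illustrative values.

```python
def twist_percent(max_distance_mm: float, diagonal_mm: float) -> float:
    """Twist [%] = max distance [mm] / (2 x length of PCB diagonal [mm]) x 100."""
    if diagonal_mm <= 0:
        raise ValueError("diagonal length must be positive")
    # Same formula as the context, rearranged as (d * 100) / (2 * L).
    return max_distance_mm * 100 / (2 * diagonal_mm)


# Illustrative values: 3 mm lift at the fourth corner, 100 mm PCB diagonal.
twist = twist_percent(3.0, 100.0)
print(f"twist = {twist:.2f} %")                  # twist = 1.50 %
print("within <= 1.5 % limit:", twist <= 1.5)    # within <= 1.5 % limit: True
```

A board at exactly the limit gives 1.5 %, so the `≤ 1,5 %` requirement in the context is met with no margin.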

from langchain_community.vectorstores import FAISS
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
from langchain.chains import RetrievalQA
import os
import time
from langchain.llms import Ollama
from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.embeddings import HuggingFaceEmbeddings
from typing import Any, List, Optional

os.environ["CUDA_VISIBLE_DEVICES"] = "6"

# Initialize Ollama LLM

llm = Ollama(model="llama3.1:latest")

# Initialize Sentence Transformers embedding model

embedding_model = HuggingFaceEmbeddings(
model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
model_kwargs={'device': 'cpu'},
encode_kwargs={'normalize_embeddings': True} # set True to compute cosine similarity
)
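The `normalize_embeddings=True` setting matters because, for unit-length vectors, the plain dot product equals cosine similarity. A minimal pure-Python sketch of that identity follows; the two-dimensional vectors are made-up illustrative values, not real embeddings, and the helper names are hypothetical.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity computed from the raw (unnormalized) vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [4.0, 3.0]
# After normalization, a plain dot product gives the cosine similarity.
print(round(cosine(a, b), 6))                     # 0.96
print(round(dot(normalize(a), normalize(b)), 6))  # 0.96
```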

from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper

llm = LangchainLLMWrapper(llm)
embedding_model = LangchainEmbeddingsWrapper(embedding_model)

from datasets import Dataset
from ragas.metrics import faithfulness, answer_relevancy, context_precision, context_recall
from ragas import evaluate

dataset = Dataset.from_dict(data_samples_2)

score = evaluate(dataset, metrics=[faithfulness, answer_relevancy, context_precision, context_recall], llm=llm, embeddings=embedding_model)
df = score.to_pandas()  # per-sample scores as a DataFrame
print(score)
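`Dataset.from_dict` expects every column to have the same number of rows, and the `contexts` column here is a list of lists of strings (one list per question). A quick pre-check can rule out malformed input before blaming the metric. This is a sketch under those assumptions: `check_ragas_columns` is a hypothetical helper, the required column names are simply the ones used in this report, and the sample below is an abbreviated stand-in for `data_samples_2`.

```python
REQUIRED_COLUMNS = ("question", "answer", "contexts", "ground_truth")


def check_ragas_columns(samples: dict) -> list[str]:
    """Return a list of problems found in a column-oriented sample dict."""
    problems = []
    missing = [c for c in REQUIRED_COLUMNS if c not in samples]
    if missing:
        problems.append(f"missing columns: {missing}")
    # All columns must have the same number of rows.
    lengths = {name: len(col) for name, col in samples.items()}
    if len(set(lengths.values())) > 1:
        problems.append(f"column lengths differ: {lengths}")
    # Each row of `contexts` should be a non-empty list of strings.
    for i, ctx in enumerate(samples.get("contexts", [])):
        if not isinstance(ctx, list) or not all(isinstance(c, str) for c in ctx):
            problems.append(f"contexts[{i}] is not a list of strings")
        elif not ctx:
            problems.append(f"contexts[{i}] is empty")
    return problems


sample = {
    "question": ["What are the dimensions of twist?"],
    "answer": ["The maximum twist shall be <= 1.5 %."],
    "contexts": [["The maximum twist shall be ≤ 1,5 %."]],
    "ground_truth": ["The maximum allowable twist is 1.5 %."],
}
print(check_ragas_columns(sample))  # []
```

In the notebook, the same check would be called as `check_ragas_columns(data_samples_2)`.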

Error trace
import sys
import ragas
print(f"Python version: {sys.version}")

# Ragas version
print(f"Ragas version: {ragas.__version__}")

Python version: 3.11.9 (main, Apr 24 2024, 09:31:52) [GCC 14.0.1 20240411 (Red Hat 14.0.1-0)]
Ragas version: 0.1.15

# Data samples from ECSS-Q-ST-70-60-C; the cell re-runs the reproduction code shown above

/home/user/.pyenv/versions/3.11.9/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(

Evaluating: 100% 4/4 [01:29<00:00, 20.72s/it]

Failed to parse output. Returning None.
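The warning above suggests that the reply from `llama3.1` to the faithfulness prompt could not be parsed into the structured (JSON) output the metric expects, so that step falls back to None and the score ends up NaN. Local models often wrap JSON in prose or Markdown fences. The sketch below is a hypothetical, standalone helper (`extract_first_json`) showing how tolerant extraction could recover such output; it is not Ragas's own parser and is not wired into Ragas.

```python
import json


def extract_first_json(text: str):
    """Return the first JSON object or array that decodes cleanly from `text`, else None."""
    decoder = json.JSONDecoder()
    for i, ch in enumerate(text):
        if ch in "{[":
            try:
                # raw_decode parses one JSON value starting at index i
                # and ignores any trailing text after it.
                obj, _ = decoder.raw_decode(text, i)
                return obj
            except json.JSONDecodeError:
                continue
    return None


reply = 'Sure! Here is the result:\n```json\n{"statements": ["Twist is <= 1.5%."]}\n```'
print(extract_first_json(reply))                        # {'statements': ['Twist is <= 1.5%.']}
print(extract_first_json("no structured output here"))  # None
```

The design trades strictness for robustness: a stray bracket in the prose (for example `[1]`) would be returned first, so a real parser would also need to validate the expected keys.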

{'faithfulness': nan, 'answer_relevancy': 0.8362, 'context_precision': 1.0000, 'context_recall': 0.4000}
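Because `nan != nan`, a NaN score is easy to miss in comparisons. A small sketch that flags which metrics came back as NaN, using the scores printed above copied into a plain dict (the result object returned by `evaluate` is not used here; `nan_metrics` is a hypothetical helper):

```python
import math

scores = {
    "faithfulness": float("nan"),
    "answer_relevancy": 0.8362,
    "context_precision": 1.0000,
    "context_recall": 0.4000,
}


def nan_metrics(result: dict[str, float]) -> list[str]:
    """Return the names of metrics whose score is NaN."""
    return [name for name, value in result.items() if math.isnan(value)]


print(nan_metrics(scores))  # ['faithfulness']
```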

Expected behavior
I expect faithfulness to return a numeric score instead of NaN.
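For reference, Ragas describes faithfulness as the fraction of statements extracted from the answer that are supported by the retrieved contexts. The rough sketch below (a hypothetical helper, not Ragas's implementation) shows why a failed parse can surface as NaN rather than as a low score: if no verdicts are recovered, there is nothing to divide by. The verdicts in the example are hypothetical.

```python
import math


def faithfulness_ratio(verdicts: list[bool]) -> float:
    """Fraction of answer statements judged as supported by the context.

    Returns NaN when no verdicts are available (e.g. the LLM output could not be parsed).
    """
    if not verdicts:
        return math.nan
    return sum(verdicts) / len(verdicts)


# Hypothetical verdicts: the 1.5% limit is supported by the context,
# the 1.1% limit for thicker boards is not.
print(faithfulness_ratio([True, False]))  # 0.5
print(faithfulness_ratio([]))             # nan
```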


Labels: bug (Something isn't working)
