# 1. Faithfulness

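Faithfulness measures whether the answer is supported by the retrieved context. The answer is split into statements, and an LLM judges each one: 1 if it can be inferred from the context, 0 otherwise.

The score is the number of statements marked 1 divided by the total number of statements.

It ranges from 0 to 1.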
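Written as a formula, with $S$ the set of statements extracted from the answer and $v(s) \in \{0, 1\}$ the verdict on statement $s$ (both symbols are introduced here only for illustration):

```latex
\text{faithfulness} = \frac{\left|\{\, s \in S : v(s) = 1 \,\}\right|}{|S|}
```

For example, if the answer yields 4 statements and 3 of them are judged 1, the score is $3/4 = 0.75$.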

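Parameters:

- name: str, the metric name; defaults to "faithfulness".
- statement_generator_prompt: the statement generator prompt, used to break the answer into individual statements.
- nli_prompt: the NLI (Natural Language Inference) prompt.
- max_retries: the maximum number of retries; defaults to 1.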


*******************************************************************
- _required_columns: the columns every dataset row must provide, ["user_input", "retrieved_contexts", "response"]
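A minimal sketch of what one dataset row could look like, assuming the JSONL file holds one JSON object per line with these three columns. The sample values and the `missing_columns` helper are made up for illustration and are not part of ragas.

```python
import json

REQUIRED_COLUMNS = ["user_input", "retrieved_contexts", "response"]

# Illustrative row; the values are invented for this example.
row = {
    "user_input": "Who was Albert Einstein?",
    "retrieved_contexts": [
        "Albert Einstein was a German-born theoretical physicist.",
    ],
    "response": "He was a German-born theoretical physicist.",
}


def missing_columns(sample: dict) -> list[str]:
    """Return the required columns that are absent from a sample."""
    return [col for col in REQUIRED_COLUMNS if col not in sample]


# One line of a JSONL file is simply the row serialized as JSON.
jsonl_line = json.dumps(row, ensure_ascii=False)
print(jsonl_line)
print(missing_columns(row))
```

Note that `retrieved_contexts` is a list of strings, while the other two columns are plain strings.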


## 1.1 statement_generator_prompt - statement generator prompt


Prompt content:

Given a question and an answer, analyze the complexity of each sentence in the answer. Break down each sentence into one or more fully understandable statements. Ensure that no pronouns are used in any statement. Format the outputs in JSON.

In short, the LLM is asked to split every sentence of the answer into self-contained statements that use no pronouns, and to return them as JSON.

Example:

StatementGeneratorInput:

- question: Who was Albert Einstein and what is he best known for?
- answer: He was a German-born theoretical physicist, widely acknowledged to be one of the greatest and most influential physicists of all time. He was best known for developing the theory of relativity, he also made important contributions to the development of the theory of quantum mechanics.

StatementGeneratorOutput:

- "Albert Einstein was a German-born theoretical physicist.",
- "Albert Einstein is recognized as one of the greatest and most influential physicists of all time.",
- "Albert Einstein was best known for developing the theory of relativity.",
- "Albert Einstein also made important contributions to the development of the theory of quantum mechanics."

Each of the two sentences in the answer yields two statements, and the pronoun "He" is replaced by "Albert Einstein" in every statement.
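The example can also be written down as plain data, which makes the "no pronouns" rule easy to spot-check. This is only a sketch: the `PRONOUNS` set and the `has_pronoun` helper are hypothetical, and a word list like this is far cruder than what the LLM does.

```python
import re

# Hypothetical, deliberately small list of English pronouns.
PRONOUNS = {"he", "she", "it", "they", "him", "her", "his", "them", "their", "its"}

answer = (
    "He was a German-born theoretical physicist, widely acknowledged to be one "
    "of the greatest and most influential physicists of all time. He was best "
    "known for developing the theory of relativity, he also made important "
    "contributions to the development of the theory of quantum mechanics."
)

statements = [
    "Albert Einstein was a German-born theoretical physicist.",
    "Albert Einstein is recognized as one of the greatest and most "
    "influential physicists of all time.",
    "Albert Einstein was best known for developing the theory of relativity.",
    "Albert Einstein also made important contributions to the development "
    "of the theory of quantum mechanics.",
]


def has_pronoun(text: str) -> bool:
    """Return True if any whole word in the text is a listed pronoun."""
    words = re.findall(r"[a-z]+", text.lower())
    return any(word in PRONOUNS for word in words)


print(has_pronoun(answer))
print([has_pronoun(s) for s in statements])
```

The original answer trips the check because of "He", while none of the four generated statements do.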


## 1.2 nli_prompt - NLI prompt

Prompt content:

Your task is to judge the faithfulness of a series of statements based on a given context. For each statement you must return verdict as 1 if the statement can be directly inferred based on the context or 0 if the statement can not be directly inferred based on the context.

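Here the context is the sample's retrieved_contexts, and the statements are the ones produced by statement_generator_prompt. A verdict of 1 means the statement can be directly inferred from the context; a verdict of 0 means it cannot.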
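A sketch of turning the judge's reply into a list of verdicts. It assumes the reply is JSON of the shape `{"statements": [{"statement": ..., "reason": ..., "verdict": 0 or 1}]}`, optionally preceded by a `<think>...</think>` block as some reasoning models emit. `parse_verdicts` is a hypothetical helper, not a ragas function.

```python
import json
import re


def parse_verdicts(reply: str) -> list[int]:
    """Extract the 0/1 verdicts from an NLI reply."""
    # Drop a reasoning block if the model produced one.
    cleaned = re.sub(r"<think>.*?</think>", "", reply, flags=re.DOTALL).strip()
    data = json.loads(cleaned)
    return [int(item["verdict"]) for item in data["statements"]]


reply = (
    "<think>checking each statement</think> "
    '{"statements": ['
    '{"statement": "A", "reason": "stated in the context", "verdict": 1}, '
    '{"statement": "B", "reason": "not mentioned", "verdict": 0}]}'
)
print(parse_verdicts(reply))  # [1, 0]
```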

## 1.3 Score calculation

The score is the number of statements judged 1 divided by the total number of statements judged.
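The calculation can be sketched in a few lines. `faithfulness_score` is a hypothetical helper, and returning NaN for an empty list is an assumption made here, not something stated above.

```python
import math


def faithfulness_score(verdicts: list[int]) -> float:
    """Fraction of statements whose verdict is 1."""
    if not verdicts:
        # Assumption: with no statements there is nothing to score.
        return math.nan
    return sum(1 for v in verdicts if v == 1) / len(verdicts)


print(faithfulness_score([1, 1, 1, 0]))  # 0.75
```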

In [1]:
from langchain_ollama import ChatOllama
from ragas import EvaluationDataset, RunConfig, evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import Faithfulness

# Load the evaluation samples (one JSON object per line).
dataset = EvaluationDataset.from_jsonl("questions_answers_01.txt")

# reasoning=False turns off qwen3's thinking output. With thinking on, a reply looks like
#   <think>xxxxxxx...</think> {"xxx": "xxx"}
# instead of just the JSON.
llm = ChatOllama(model="qwen3:8b", reasoning=False)
metric = Faithfulness()

result = evaluate(
    dataset=dataset,
    metrics=[metric],
    llm=LangchainLLMWrapper(llm),
    run_config=RunConfig(
        timeout=600,
        max_retries=3,
        max_workers=4
    ),
)
result

Evaluating: 100%|████████████████████████████| 67/67 [1:26:56<00:00, 77.86s/it]


{'faithfulness': 0.9090}

In [None]:
# {"statements": [{"statement": "xxx", "reason": "xxxx", "verdict": 1}]}
{"text": { "statements": [{"statement": "xxx", "reason": "xxxx", "verdict": 1}] }}

# 4. Comparison

| Aspect | FactualCorrectness | Faithfulness |
|-|-|-|
| Compared items | response (the answer) and reference (the reference answer) | response (the answer) and retrieved_contexts (the retrieved contexts) |
| Decomposition prompt | claim_decomposition_prompt | statement_generator_prompt |
| Judging prompt | nli_prompt | nli_prompt |
| Score | precision / recall / F1 score | faithfulness score |

NLI: Natural Language Inference.
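To make the score row concrete: Faithfulness is a single ratio, whereas FactualCorrectness reports precision, recall or F1. The sketch below assumes the usual definitions over true positives (TP), false positives (FP) and false negatives (FN); how FactualCorrectness derives those counts is not covered here, and `precision_recall_f1` is a hypothetical helper.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard precision, recall and F1 from TP/FP/FN counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative counts, not taken from any real evaluation run.
p, r, f1 = precision_recall_f1(tp=3, fp=1, fn=2)
print(round(p, 4), round(r, 4), round(f1, 4))
```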
