# RAGAS Basic Example - Faithfulness
## Author: AISchool ( http://aischool.ai/%ec%98%a8%eb%9d%bc%ec%9d%b8-%ea%b0%95%ec%9d%98-%ec%b9%b4%ed%85%8c%ea%b3%a0%eb%a6%ac/ )
## Reference : https://github.com/explodinggradients/ragas

In [None]:
!pip install ragas langsmith nest_asyncio

Collecting ragas
  Downloading ragas-0.1.10-py3-none-any.whl (91 kB)
Collecting langsmith
  Downloading langsmith-0.1.83-py3-none-any.whl (127 kB)
Collecting datasets (from ragas)
  Downloading datasets-2.20.0-py3-none-any.whl (547 kB)
Collecting tiktoken (from ragas)
  Downloading tiktoken-0.7.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)
Collecting langchain (from ragas)
  Downloading langchain-0.2.6-py3-none-any.whl (975 kB)

In [None]:
import nest_asyncio

In [None]:
# Apply nest_asyncio to allow nested event loops
nest_asyncio.apply()

# API Key Setup

In [None]:
import os
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"

In [None]:
from uuid import uuid4

unique_id = uuid4().hex[0:8]

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = f"RAGAS Example - {unique_id}"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_API_KEY"] = "YOUR_LANGSMITH_API_KEY"
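
Hardcoding keys, as in the cells above, leaves them in the saved notebook. A minimal alternative sketch, assuming the keys are either already exported in the environment or typed in interactively; `ensure_env` is a hypothetical helper, not part of RAGAS or LangSmith:

```python
import os
from getpass import getpass


def ensure_env(var_name: str) -> str:
    """Return the value of var_name, prompting for it only when it is unset.

    getpass() hides what is typed, so the secret never ends up in the
    notebook source or its saved output.
    """
    value = os.environ.get(var_name)
    if not value:
        value = getpass(f"{var_name}: ")
        os.environ[var_name] = value
    return value


# Usage (prompts only for keys that are not already exported):
# for name in ("OPENAI_API_KEY", "LANGCHAIN_API_KEY"):
#     ensure_env(name)
```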

In [None]:
unique_id

'42dafdd6'

# 1. Measuring Faithfulness
## Reference : https://docs.ragas.io/en/stable/concepts/metrics/faithfulness.html#
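
Per the RAGAS documentation linked above, faithfulness is the share of the answer's claims that can be inferred from the retrieved context:

```latex
\text{Faithfulness} = \frac{\left|\{\text{claims in the answer that can be inferred from the given context}\}\right|}{\left|\{\text{claims in the answer}\}\right|}
```

The score lies in [0, 1]: 1 means every claim is supported by the context, 0 means none is.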

In [None]:
from ragas.metrics import context_precision, answer_relevancy, faithfulness
from ragas import evaluate

In [None]:
from datasets import Dataset

data_samples = {
    'question': ['Where and when was Einstein born?', 'Where and when was Einstein born?',
                 'Who won the most super bowls?'],
    # Sample 2's birth date (20th March) contradicts the context (14 March)
    'answer': ['Einstein was born in Germany on 14th March 1879.',
               'Einstein was born in Germany on 20th March 1879.',
               'The most super bowls have been won by The New England Patriots'],
    # contexts: one list of retrieved passages (strings) per row
    'contexts': [['Albert Einstein (born 14 March 1879) was a German-born theoretical physicist, widely held to be one of the greatest and most influential scientists of all time'],
                 ['Albert Einstein (born 14 March 1879) was a German-born theoretical physicist, widely held to be one of the greatest and most influential scientists of all time'],
                 ['The Green Bay Packers...Green Bay, Wisconsin.', 'The Packers compete...Football Conference']],
}
dataset = Dataset.from_dict(data_samples)
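
A malformed `contexts` column (for example a bare string instead of a list of strings) is an easy mistake with this layout. A plain-Python sanity check to run before `Dataset.from_dict`, assuming the faithfulness metric reads the `question`, `answer` and `contexts` columns used above; `validate_faithfulness_samples` is a hypothetical helper and its rules are assumptions, not RAGAS's own validation:

```python
def validate_faithfulness_samples(samples: dict) -> int:
    """Sanity-check the column layout before building a Dataset.

    Assumption: the faithfulness metric reads 'question', 'answer' and
    'contexts', where 'contexts' holds one list of strings per row.
    Returns the number of rows.
    """
    required = ("question", "answer", "contexts")
    missing = [col for col in required if col not in samples]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    # Every column must have the same number of rows
    n_rows = len(samples["question"])
    for col in required:
        if len(samples[col]) != n_rows:
            raise ValueError(f"{col!r} has {len(samples[col])} rows, expected {n_rows}")

    # Each row's contexts must be a list of strings, not a bare string
    for i, ctx in enumerate(samples["contexts"]):
        if not isinstance(ctx, list) or not all(isinstance(c, str) for c in ctx):
            raise TypeError(f"contexts[{i}] must be a list of strings")
    return n_rows


example = {
    "question": ["Where and when was Einstein born?"],
    "answer": ["Einstein was born in Germany on 14th March 1879."],
    "contexts": [["Albert Einstein (born 14 March 1879) was a German-born theoretical physicist, widely held to be one of the greatest and most influential scientists of all time"]],
}
print(validate_faithfulness_samples(example))  # 1
```

Called on `data_samples` above, it would return 3.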

In [None]:
dataset

Dataset({
    features: ['question', 'answer', 'contexts'],
    num_rows: 3
})

In [None]:
result = evaluate(
    dataset,
    metrics=[faithfulness],
)

result

{'faithfulness': 0.5000}
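
`evaluate` reports a single number for the whole dataset. Assuming it is a plain mean of per-sample scores, one breakdown consistent with 0.5000 is sketched below. The per-sample values are hypothetical (the notebook does not print them): they assume sample 1 has 2 of 2 supported claims, sample 2 has 1 of 2 (its 20th March date is not in the context), and sample 3 has a single unsupported claim. In ragas 0.1.x the actual per-row scores can be inspected with `result.to_pandas()`.

```python
from statistics import mean

# Hypothetical per-sample faithfulness scores (not printed by the notebook):
# supported claims / total claims, under the assumptions stated above
per_sample_scores = [
    2 / 2,  # sample 1: birthplace and 14th March date both supported
    1 / 2,  # sample 2: birthplace supported, 20th March date not supported
    0 / 1,  # sample 3: assumed to contain one unsupported claim
]

overall = mean(per_sample_scores)
print(f"faithfulness: {overall:.4f}")  # faithfulness: 0.5000
```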

# Examining the Faithfulness Prompts

In [None]:
# Sample 1 from the dataset above:
# question: Where and when was Einstein born?
# answer: Einstein was born in Germany on 14th March 1879.
# contexts:
#   - Albert Einstein (born 14 March 1879) was a German-born theoretical physicist,
# widely held to be one of the greatest and most influential scientists of all time

In [None]:
# faithfulness score for this sample: 1

## Step 1 - Breaking Each Sentence into Claims

In [None]:
# Given a question, an answer, and sentences from the answer analyze the complexity of each sentence given under 'sentences' and break down each sentence into one or more fully understandable statements while also ensuring no pronouns are used in each statement. Format the outputs in JSON.

# The output should be a well-formatted JSON instance that conforms to the JSON schema below.

# As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
# the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

# Here is the output JSON schema:
# ```
# {"type": "array", "items": {"$ref": "#/definitions/Statements"}, "definitions": {"Statements": {"title": "Statements", "type": "object", "properties": {"sentence_index": {"title": "Sentence Index", "description": "Index of the sentence from the statement list", "type": "integer"}, "simpler_statements": {"title": "Simpler Statements", "description": "the simpler statements", "type": "array", "items": {"type": "string"}}}, "required": ["sentence_index", "simpler_statements"]}}}
# ```

# Do not return any preamble or explanations, return only a pure JSON string surrounded by triple backticks (```).

# Examples:

# question: "Who was Albert Einstein and what is he best known for?"
# answer: "He was a German-born theoretical physicist, widely acknowledged to be one of the greatest and most influential physicists of all time. He was best known for developing the theory of relativity, he also made important contributions to the development of the theory of quantum mechanics."
# sentences: "\n        0:He was a German-born theoretical physicist, widely acknowledged to be one of the greatest and most influential physicists of all time. \n        1:He was best known for developing the theory of relativity, he also made important contributions to the development of the theory of quantum mechanics.\n        "
# analysis: ```[{"sentence_index": 0, "simpler_statements": ["Albert Einstein was a German-born theoretical physicist.", "Albert Einstein is recognized as one of the greatest and most influential physicists of all time."]}, {"sentence_index": 1, "simpler_statements": ["Albert Einstein was best known for developing the theory of relativity.", "Albert Einstein also made important contributions to the development of the theory of quantum mechanics."]}]```

# Your actual task:

# question: Where and when was Einstein born?
# answer: Einstein was born in Germany on 14th March 1879.
# sentences: 0:Einstein was born in Germany on 14th March 1879.
# analysis:
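
The prompt receives the answer pre-split into numbered sentences (`0:...`). A minimal sketch of producing that field, assuming a naive regex split after sentence-ending punctuation; RAGAS itself may segment sentences differently, and `number_sentences` is a hypothetical helper:

```python
import re


def number_sentences(answer: str) -> str:
    """Return the answer as '<index>:<sentence>' lines.

    Assumption: a naive split after '.', '!' or '?' followed by whitespace;
    abbreviations such as 'Dr.' would be split incorrectly.
    """
    parts = re.split(r"(?<=[.!?])\s+", answer.strip())
    sentences = [p.strip() for p in parts if p.strip()]
    return "\n".join(f"{i}:{s}" for i, s in enumerate(sentences))


print(number_sentences("Einstein was born in Germany on 14th March 1879."))
# 0:Einstein was born in Germany on 14th March 1879.
```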


In [None]:
# LLM response:
# ```[{"sentence_index": 0,
# "simpler_statements": ["Einstein was born in Germany.", "Einstein was born on 14th March 1879."]}]```
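
The reply is a JSON array in triple backticks that follows the `Statements` schema from the prompt. A minimal sketch of parsing it into the flat list of claims that step 2 judges, assuming the model followed the requested format; `parse_statements` is a hypothetical helper, not a RAGAS function:

```python
import json
import re


def parse_statements(llm_output: str) -> list[str]:
    """Extract the JSON array (inside triple backticks if present) and
    flatten every 'simpler_statements' list into one list of claims."""
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", llm_output, re.DOTALL)
    payload = match.group(1) if match else llm_output
    claims = []
    for item in json.loads(payload):
        # Both keys are marked 'required' in the output schema of the prompt
        if "sentence_index" not in item or "simpler_statements" not in item:
            raise ValueError(f"item does not match the schema: {item}")
        claims.extend(item["simpler_statements"])
    return claims


raw = '```[{"sentence_index": 0, "simpler_statements": ["Einstein was born in Germany.", "Einstein was born on 14th March 1879."]}]```'
print(parse_statements(raw))
# ['Einstein was born in Germany.', 'Einstein was born on 14th March 1879.']
```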

## Step 2 - Measuring the Faithfulness of Each Claim

In [None]:
# Your task is to judge the faithfulness of a series of statements based on a given context. For each statement you must return verdict as 1 if the statement can be directly inferred based on the context or 0 if the statement can not be directly inferred based on the context.

# The output should be a well-formatted JSON instance that conforms to the JSON schema below.

# As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
# the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

# Here is the output JSON schema:
# ```
# {"type": "array", "items": {"$ref": "#/definitions/StatementFaithfulnessAnswer"}, "definitions": {"StatementFaithfulnessAnswer": {"title": "StatementFaithfulnessAnswer", "type": "object", "properties": {"statement": {"title": "Statement", "description": "the original statement, word-by-word", "type": "string"}, "reason": {"title": "Reason", "description": "the reason of the verdict", "type": "string"}, "verdict": {"title": "Verdict", "description": "the verdict(0/1) of the faithfulness.", "type": "integer"}}, "required": ["statement", "reason", "verdict"]}}}
# ```

# Do not return any preamble or explanations, return only a pure JSON string surrounded by triple backticks (```).

# Examples:

# context: "John is a student at XYZ University. He is pursuing a degree in Computer Science. He is enrolled in several courses this semester, including Data Structures, Algorithms, and Database Management. John is a diligent student and spends a significant amount of time studying and completing assignments. He often stays late in the library to work on his projects."
# statements: ```["John is majoring in Biology.", "John is taking a course on Artificial Intelligence.", "John is a dedicated student.", "John has a part-time job."]```
# answer: ```[{"statement": "John is majoring in Biology.", "reason": "John's major is explicitly mentioned as Computer Science. There is no information suggesting he is majoring in Biology.", "verdict": 0}, {"statement": "John is taking a course on Artificial Intelligence.", "reason": "The context mentions the courses John is currently enrolled in, and Artificial Intelligence is not mentioned. Therefore, it cannot be deduced that John is taking a course on AI.", "verdict": 0}, {"statement": "John is a dedicated student.", "reason": "The context states that he spends a significant amount of time studying and completing assignments. Additionally, it mentions that he often stays late in the library to work on his projects, which implies dedication.", "verdict": 1}, {"statement": "John has a part-time job.", "reason": "There is no information given in the context about John having a part-time job.", "verdict": 0}]```

# context: "Photosynthesis is a process used by plants, algae, and certain bacteria to convert light energy into chemical energy."
# statements: ```["Albert Einstein was a genius."]```
# answer: ```[{"statement": "Albert Einstein was a genius.", "reason": "The context and statement are unrelated", "verdict": 0}]```

# Your actual task:

# context: Albert Einstein (born 14 March 1879) was a German-born theoretical physicist, widely held to be one of the greatest and most influential scientists of all time
# statements: ["Einstein was born in Germany.", "Einstein was born on 14th March 1879."]
# answer:
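
The `statements:` line of the task is the claim list from step 1 serialized as a JSON array. A sketch of rendering that task section, assuming the layout shown above; `build_verdict_task` is hypothetical, and RAGAS builds its real prompt from its own templates:

```python
import json


def build_verdict_task(context: str, statements: list[str]) -> str:
    """Render the 'Your actual task' section of the step-2 prompt.

    json.dumps gives the '["...", "..."]' list format used above;
    ensure_ascii=False keeps non-English statements readable.
    """
    return (
        f"context: {context}\n"
        f"statements: {json.dumps(statements, ensure_ascii=False)}\n"
        "answer:"
    )


claims = ["Einstein was born in Germany.", "Einstein was born on 14th March 1879."]
print(build_verdict_task(
    "Albert Einstein (born 14 March 1879) was a German-born theoretical physicist, "
    "widely held to be one of the greatest and most influential scientists of all time",
    claims,
))
```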


In [None]:
# LLM response:
# ```
# [{"statement": "Einstein was born in Germany.",
# "reason": "The context explicitly states that Einstein was German-born, confirming that he was indeed born in Germany.", "verdict": 1},
# {"statement": "Einstein was born on 14th March 1879.",
# "reason": "The context provides the exact date of Einstein's birth as 14th March 1879, confirming the statement.", "verdict": 1}]
# ```
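
The verdicts turn into the per-sample score with the formula from the RAGAS documentation: statements with verdict 1 divided by all statements. A minimal sketch, assuming the reply is a JSON array (optionally wrapped in triple backticks) as the prompt requests; `faithfulness_from_verdicts` is hypothetical, and returning NaN when there are no statements is an assumption made here. Both verdicts above are 1, giving 2/2 = 1, which matches the score of 1 noted for this sample earlier.

```python
import json
import re


def faithfulness_from_verdicts(llm_output: str) -> float:
    """Share of statements the judge marked as inferable (verdict == 1)."""
    # Take the JSON between triple backticks if present, else the whole reply
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", llm_output, re.DOTALL)
    verdicts = json.loads(match.group(1) if match else llm_output)
    if not verdicts:
        return float("nan")  # assumption: undefined when there are no statements
    supported = sum(1 for v in verdicts if int(v["verdict"]) == 1)
    return supported / len(verdicts)


# The response above, with the "reason" texts shortened
raw = """```
[{"statement": "Einstein was born in Germany.", "reason": "The context states Einstein was German-born.", "verdict": 1},
 {"statement": "Einstein was born on 14th March 1879.", "reason": "The context gives 14 March 1879.", "verdict": 1}]
```"""
print(faithfulness_from_verdicts(raw))  # 1.0
```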
