# Automated LLM Evaluation with DeepEval - Hallucination

In [1]:
import os
from dotenv import load_dotenv

from langchain_openai import ChatOpenAI
from deepeval.test_case import LLMTestCase
from deepeval.metrics import HallucinationMetric

from IPython.display import display, Markdown



In [2]:
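# Load variables from a local .env file into the environment.
load_dotenv()
# Kept as a quick check that the key was loaded; the OpenAI clients used below
# read OPENAI_API_KEY from the environment themselves.
openai_api_key = os.getenv("OPENAI_API_KEY")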

## 1. Hallucination

The hallucination metric checks that the model does not introduce false or unsupported information, so that the generated answers stay accurate and reliable.

In other words, it verifies whether the model's response is consistent with the information in the supplied context, flagging anything that is invented or not supported by that content.
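As a rough illustration of what that check amounts to, the sketch below assumes the score is the fraction of supplied context passages that the answer contradicts, which is how DeepEval's documentation describes the metric. In DeepEval an LLM judge produces the verdicts; here they are hand-written booleans, and the function name `hallucination_score` is hypothetical.

```python
from typing import Sequence


def hallucination_score(contradicted: Sequence[bool]) -> float:
    """Fraction of context passages contradicted by the answer.

    `contradicted[i]` is True when the answer contradicts passage i.
    These verdicts are assumed inputs; in DeepEval an LLM judge produces them.
    """
    if not contradicted:
        raise ValueError("at least one context passage is required")
    return sum(contradicted) / len(contradicted)


# One passage, no contradiction: lowest (best) score.
print(hallucination_score([False]))
# One passage, contradicted: highest (worst) score.
print(hallucination_score([True]))
# Three passages, one contradicted.
print(hallucination_score([False, True, False]))
```

Under this assumption, an evaluation that passes a single context passage can only score 0.0 or 1.0; splitting the source text into several passages would give a finer-grained score.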

**Alice’s Adventures in Wonderland** Chapter 1

Book reference: https://www.gutenberg.org/files/11/11-0.txt

In [3]:
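# Load the first chapter from a local text file.
with open("./short_story.txt", "r", encoding="utf-8") as arquivo:
    livro_cp1 = arquivo.read()

# Preview the first and last ~500 characters, then print the total length.
display(Markdown(f'{livro_cp1[:492]}...\n\n...\n\n...{livro_cp1[-500:]}'))
print(len(livro_cp1))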

CHAPTER I.
Down the Rabbit-Hole

Alice was beginning to get very tired of sitting by her sister on the
bank, and of having nothing to do: once or twice she had peeped into
the book her sister was reading, but it had no pictures or
conversations in it, “and what is the use of a book,” thought Alice
“without pictures or conversations?”

So she was considering in her own mind (as well as she could, for the
hot day made her feel very sleepy and stupid), whether the pleasure of
making a daisy...

...

...ate a little bit, and said anxiously to herself, “Which way? Which
way?”, holding her hand on the top of her head to feel which way it was
growing, and she was quite surprised to find that she remained the same
size: to be sure, this generally happens when one eats cake, but Alice
had got so much into the way of expecting nothing but out-of-the-way
things to happen, that it seemed quite dull and stupid for life to go
on in the common way.

So she set to work, and very soon finished off the cake.

11285


In [4]:
PROMPT = f"""
"Read the following excerpt from the book carefully. Pay attention to the details and try to imagine the scene as described. Once you’ve thoroughly understood the passage, consider any characters Alice might envision interacting with as she experiences this moment. After reading, answer the question thoughtfully and accurately based on the text.

Here is the text:

{{TEXT}}

Question: Alice isn't surprised to overhear the White Rabbit talking to itself. Which of the White Rabbit's actions does surprise Alice, and why?
"""

# Fill the {TEXT} placeholder with the chapter text.
prompt = PROMPT.format(TEXT=livro_cp1)
display(Markdown(f'{prompt[:1000]}...'))


"Read the following excerpt from the book carefully. Pay attention to the details and try to imagine the scene as described. Once you’ve thoroughly understood the passage, consider any characters Alice might envision interacting with as she experiences this moment. After reading, answer the question thoughtfully and accurately based on the text.

Here is the text:

CHAPTER I.
Down the Rabbit-Hole

Alice was beginning to get very tired of sitting by her sister on the
bank, and of having nothing to do: once or twice she had peeped into
the book her sister was reading, but it had no pictures or
conversations in it, “and what is the use of a book,” thought Alice
“without pictures or conversations?”

So she was considering in her own mind (as well as she could, for the
hot day made her feel very sleepy and stupid), whether the pleasure of
making a daisy-chain would be worth the trouble of getting up and
picking the daisies, when suddenly a White Rabbit with pink eyes ran
close by her.

The...
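The prompt cell relies on a small quirk: inside an f-string, doubled braces are an escape, so `{{TEXT}}` becomes the literal placeholder `{TEXT}`, which `str.format` fills in afterwards. A minimal sketch of that behavior, using a shortened, hypothetical template and text:

```python
# Doubled braces in an f-string collapse to single literal braces.
template = f"""Here is the text:

{{TEXT}}

Question: what happens next?"""

assert "{TEXT}" in template

# str.format then substitutes the placeholder with the real text.
filled = template.format(TEXT="Alice was beginning to get very tired.")
print(filled)
```

Since nothing is interpolated at definition time, a plain (non-f) string with a single-brace `{TEXT}` placeholder would behave the same way.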

In [5]:
chat = ChatOpenAI(model="gpt-4o-mini")
response = chat.invoke(prompt)

In [6]:
display(Markdown(f'{response.content}'))

Alice is surprised when the White Rabbit actually takes a watch out of its waistcoat-pocket and looks at it. This action amazes her because she has never seen a rabbit with a waistcoat-pocket or a watch before. Up to that point, she finds the Rabbit's behavior somewhat natural, but the moment it produces a watch from its pocket highlights a level of anthropomorphism that she finds extraordinary. This unexpected and fantastical behavior ignites her curiosity and prompts her to chase after the Rabbit, leading her down the rabbit-hole into her adventurous journey.

### Evaluating the response

Reference: https://www.coursehero.com/lit/Alice-in-Wonderland/discussion-questions/page-1/

In [7]:
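actual_output = response.content

# HallucinationMetric compares the answer against `context` (the source text),
# so the whole chapter is passed as a single context entry.
retrieval_context = [livro_cp1]

# An LLM judge (gpt-4 here) scores the answer; include_reason also returns an explanation.
metric = HallucinationMetric(
    model="gpt-4",
    include_reason=True
)

test_case = LLMTestCase(
    input="Alice isn't surprised to overhear the White Rabbit talking to itself. Which of the White Rabbit's actions does surprise Alice, and why?",
    actual_output=actual_output,
    context=retrieval_context,
)

metric.measure(test_case)
print(metric.score)   # lower is better: 0.0 means no contradiction was found
print(metric.reason)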

0.0
The score is 0.00 because the actual output perfectly aligns with the provided context and there are no contradictions.


In [8]:
# Hand-written hallucinated answer: adds an invented compass alongside the watch
resposta_alucinada = """Alice is surprised when she sees the White Rabbit take a
watch out of its waistcoat pocket and look at it. This action astonishes 
her because she has never encountered a rabbit that not only wears a waistcoat 
but also possesses a watch to check the time. Additionally, the rabbit pulls 
out a small compass to check its direction, an action that confuses 
Alice even further, as she has never seen a rabbit with navigational tools. 
The sight of a rabbit engaging in such human-like behavior—carrying a watch, a 
compass, and expressing concern about being late—sparks Alice's curiosity and 
prompts her to follow the Rabbit down the rabbit-hole. This moment signifies the 
start of her adventures in a curious and fantastical world, highlighting her initial 
sense of wonder and intrigue about the unusual events unfolding around her."""

retrieval_context = [livro_cp1]

metric = HallucinationMetric(
    model="gpt-4o-mini",
    include_reason=True
)

test_case = LLMTestCase(
    input="Alice isn't surprised to overhear the White Rabbit talking to itself. Which of the White Rabbit's actions does surprise Alice, and why?",
    actual_output=resposta_alucinada,
    context=retrieval_context,
)

metric.measure(test_case)
print(metric.score)
print(metric.reason)

1.0
The score is 1.00 because the actual output introduces a fact (the Rabbit taking out a compass) that is not supported by the context, leading to a complete contradiction.
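
For this metric a lower score is better, so a pass/fail decision compares the score against a maximum threshold. The sketch below applies that rule to the two scores printed in this notebook. The 0.5 threshold is an assumption (believed to match DeepEval's default, but worth checking against the installed version), and `passes` is a hypothetical helper, not part of the library.

```python
def passes(score: float, threshold: float = 0.5) -> bool:
    """Return True when a hallucination score is acceptable.

    Lower is better for this metric, so the score must not exceed the
    threshold. The default of 0.5 is an assumption for illustration.
    """
    return score <= threshold


# Scores reported by the two evaluations in this notebook.
results = {"model answer": 0.0, "answer with invented compass": 1.0}

for label, score in results.items():
    status = "pass" if passes(score) else "fail"
    print(f"{label}: score={score:.2f} -> {status}")
```

Keeping the decision rule in one place makes it easy to tighten the threshold later without touching the evaluation cells.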
