# Hallucinations

We will ask the model a series of questions about the 2020 Olympics, an event it was not trained on.




In [1]:
# Testing hallucinations.
# Install the libraries we need

# %pip install "openai<1"  # the cells below use the legacy openai.Completion API (pre-1.0 releases)
%pip install transformers

Note: you may need to restart the kernel to use updated packages.


In [2]:
# Import the modules and the API client used in this notebook

import openai
import pandas as pd
import numpy as np
import os
import pickle
from transformers import GPT2TokenizerFast
from typing import Dict, Tuple, List

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.


In [3]:
# Read the API key from the environment and choose the completions model

openai.api_key = os.getenv("OPENAI_API_KEY")
COMPLETIONS_MODEL = "text-davinci-003"


Now we ask a question about something the model was not trained on, so that it makes up an answer.

Marcelo Chierighini is a Brazilian Olympian, but he is a swimmer, not a high jumper, and he finished in 8th place at the Olympics.

In [4]:
prompt = "Who won the 2020 Summer Olympics men's high jump?"
# The same question in Spanish:
# prompt = "Quién ganó las olimpiadas de verano 2020 en la categoría de salto de altura de hombres?"

openai.Completion.create(
    prompt=prompt,
    temperature=0,
    max_tokens=300,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
    model=COMPLETIONS_MODEL
)["choices"][0]["text"].strip(" \n")

"Marcelo Chierighini of Brazil won the gold medal in the men's high jump at the 2020 Summer Olympics."

# Preventing hallucinations

We will cover a series of methods for preventing hallucinations, in this order:
- Getting the model to admit that it does not know the answer (in this notebook)
- Improving the query, that is, the prompt (in this notebook)
- Using embeddings.
- Fine-tuning the model.

In [6]:
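# Make the model admit when it does not know the answer.
# The prompt is in Spanish; it reads: "Answer the question as truthfully as possible,
# and if you are not sure of the answer, reply 'Sorry, I don't know'."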

prompt = """Responde a la pregunta lo más verídicamente posible, y si no estás seguro de la respuesta responde 'Lo siento, no lo sé'. 

Q: Quién ganó las olimpiadas de verano 2020 en la categoría de salto de altura de hombres? 
A:"""

openai.Completion.create(
    prompt=prompt,
    temperature=0,
    max_tokens=300,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
    model=COMPLETIONS_MODEL
)["choices"][0]["text"].strip(" \n")

'Lo siento, no lo sé.'
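
The guard instruction used in the cell above can be factored into a small helper so that every question receives the same fallback wording. This is a minimal sketch: `build_guarded_prompt`, `is_fallback`, and `FALLBACK` are hypothetical names introduced here, not part of the notebook or of the `openai` library, and the helpers only handle strings without calling the API.

```python
# Hypothetical helpers (not part of the original notebook): wrap any question
# in the "admit you don't know" instruction used above.
FALLBACK = "Lo siento, no lo sé"


def build_guarded_prompt(question: str, fallback: str = FALLBACK) -> str:
    """Return a prompt that tells the model to answer with `fallback` when unsure."""
    instruction = (
        "Responde a la pregunta lo más verídicamente posible, y si no estás "
        f"seguro de la respuesta responde '{fallback}'."
    )
    return f"{instruction}\n\nQ: {question.strip()}\nA:"


def is_fallback(answer: str, fallback: str = FALLBACK) -> bool:
    """True when the reply is the fallback sentence (ignoring case and trailing periods)."""
    return answer.strip().rstrip(".").lower() == fallback.lower()
```

Checking replies with `is_fallback` would make it possible to count how often the model declines to answer instead of inventing one.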

In [7]:
# Improve the prompt by supplying context that contains the answer, in addition to the instruction above.
# Note that when the context is small enough, we can include it directly in the prompt.

prompt = """Answer the question as truthfully as possible using the provided text, and if the answer is not contained within the text below, say "I don't know"

Context:
The men's high jump event at the 2020 Summer Olympics took place between 30 July and 1 August 2021 at the Olympic Stadium.
33 athletes from 24 nations competed; the total possible number depended on how many nations would use universality places 
to enter athletes in addition to the 32 qualifying through mark or ranking (no universality places were used in 2021).
Italian athlete Gianmarco Tamberi along with Qatari athlete Mutaz Essa Barshim emerged as joint winners of the event following
a tie between both of them as they cleared 2.37m. Both Tamberi and Barshim agreed to share the gold medal in a rare instance
where the athletes of different nations had agreed to share the same medal in the history of Olympics. 
Barshim in particular was heard to ask a competition official "Can we have two golds?" in response to being offered a 
'jump off'. Maksim Nedasekau of Belarus took bronze. The medals were the first ever in the men's high jump for Italy and 
Belarus, the first gold in the men's high jump for Italy and Qatar, and the third consecutive medal in the men's high jump
for Qatar (all by Barshim). Barshim became only the second man to earn three medals in high jump, joining Patrik Sjöberg
of Sweden (1984 to 1992).

Q: Who won the 2020 Summer Olympics men's high jump?
A:"""

openai.Completion.create(
    prompt=prompt,
    temperature=0,
    max_tokens=300,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
    model=COMPLETIONS_MODEL
)["choices"][0]["text"].strip(" \n")



'Gianmarco Tamberi and Mutaz Essa Barshim emerged as joint winners of the event.'
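
The same pattern applies to any context passage. The sketch below assembles the prompt from a context string and a question; `build_context_prompt` and `HEADER` are hypothetical names introduced for illustration, and the function only formats text, so it makes no API call.

```python
# Hypothetical helper (not from the original notebook): builds the
# context-grounded prompt used above for an arbitrary context and question.
HEADER = (
    "Answer the question as truthfully as possible using the provided text, "
    'and if the answer is not contained within the text below, say "I don\'t know"'
)


def build_context_prompt(context: str, question: str) -> str:
    """Place the instruction first, then the context, then the question."""
    return f"{HEADER}\n\nContext:\n{context.strip()}\n\nQ: {question.strip()}\nA:"


example = build_context_prompt(
    "Gianmarco Tamberi and Mutaz Essa Barshim agreed to share the gold medal.",
    "Who won the 2020 Summer Olympics men's high jump?",
)
print(example)
```

The resulting string can be passed as the `prompt` argument in the same way as the hand-written prompt above.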

With a bit of prompt engineering we got the model to give a correct answer, but at what cost? This approach is impractical because it does not scale: we cannot send the full context with every query, let alone the entire dataset when the queries are unpredictable.
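
To make the scaling concern concrete, the sketch below estimates how many prompt tokens are sent when the full context accompanies every query. It assumes roughly 4 characters per token, a common rule of thumb for English text rather than an exact tokenizer count, and the names `estimate_tokens` and `total_prompt_tokens` are hypothetical.

```python
# Rough illustration of why resending the context does not scale.
# ASSUMPTION: about 4 characters per token; a real tokenizer would give exact counts.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate token count, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def total_prompt_tokens(context: str, questions: list) -> int:
    """Tokens sent when the full context accompanies every question."""
    return sum(estimate_tokens(context) + estimate_tokens(q) for q in questions)


context = "x" * 4000            # placeholder standing in for a 4,000-character passage
questions = ["Who won?"] * 100  # 100 short queries

print(total_prompt_tokens(context, questions))  # 100200
print(total_prompt_tokens("", questions))       # 200
```

Under these assumptions, almost all of the tokens sent are the repeated context rather than the questions themselves.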

This is where embeddings come in.
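
As a sketch of the idea, the snippet below picks the passage whose vector is closest to the question's vector by cosine similarity, so that only that passage would need to go into the prompt. The three-dimensional vectors are made-up toy values, not the output of a real embedding model, and `cosine_similarity` and `most_relevant` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_relevant(query_vec, passages):
    """Return the title of the passage whose vector is most similar to the query."""
    return max(passages, key=lambda title: cosine_similarity(query_vec, passages[title]))


# Toy vectors, invented for illustration only.
passages = {
    "men's high jump": [0.9, 0.1, 0.0],
    "men's 100 m freestyle": [0.1, 0.9, 0.2],
}
query_vec = [0.8, 0.2, 0.1]  # stands in for the embedded question

print(most_relevant(query_vec, passages))  # men's high jump
```

With real embeddings, the vectors would come from an embedding model and have many more dimensions, but the selection step would follow the same shape.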





In [None]:
# Simulating hallucinations with numeric values 

import collections
import os

import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

# Define the prompts and correct answers
prompts = [
    "Find the result of 3 plus 6.",
    "Find the result of 3 plus 6.",
    "Find the result of 3 plus 6.",
    "Find the result of 3 plus 6.",
    "Find the result of 3 plus 6."
]
correct_answer = "9"

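# Number of queries sent per list entry. Because the five prompts are identical,
# they share a single dictionary key below, so 5 * 25 = 125 responses accumulate under it.
num_queries = 25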

# Initialize responses dictionary
responses = {prompt: [] for prompt in prompts}

# Send queries and accumulate responses
for prompt in prompts:
    for _ in range(num_queries):
        response = openai.Completion.create(
            engine="text-davinci-002",
            prompt=prompt,
            max_tokens=50,
        )
        response_text = response.choices[0].text.strip()
        responses[prompt].append(response_text)

# Calculate statistics for each prompt
for prompt, answers in responses.items():
    answer_distribution = collections.Counter(answers)
    # Exact-match count: a reply such as "3 plus 6 is 9." is not counted as correct
    correct_count = answer_distribution[correct_answer]
    total_responses = len(answers)
    
    print(f"Prompt: '{prompt}'")
    print(f"Correct Answer: '{correct_answer}'")
    print(f"Correct Count: {correct_count} out of {total_responses} responses")
    print(f"Accuracy: {correct_count / total_responses:.2%}\n")
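
Exact string matching, as used in the cell above, undercounts correct replies such as "3 plus 6 is 9.". One possible refinement, sketched below, extracts the last integer from each reply before comparing. The sample replies are invented for illustration and are not real model output; `extract_number` and `accuracy` are hypothetical helper names.

```python
import collections
import re


def extract_number(reply: str):
    """Return the last integer in the reply as a string, or None if there is none."""
    numbers = re.findall(r"\d+", reply)
    return numbers[-1] if numbers else None


def accuracy(replies, correct_answer: str) -> float:
    """Fraction of replies whose extracted number equals the correct answer."""
    if not replies:
        return 0.0
    counts = collections.Counter(extract_number(r) for r in replies)
    return counts[correct_answer] / len(replies)


# Invented sample replies, for illustration only.
sample = ["9", "3 plus 6 is 9.", "The result is 9", "10", "I am not sure."]
print(f"Accuracy: {accuracy(sample, '9'):.2%}")  # Accuracy: 60.00%
```

Taking the last integer is a heuristic: it handles replies that restate the operands first, but it would misread a reply that ends with an unrelated number.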