# GenAI Workshop
## Lesson 5: RAG Pitfalls

This lesson shows how different RAG configurations affect the output quality of a Retrieval-Augmented Generation (RAG) system.

In this lesson you will learn how to:

- evaluate the quality of a RAG system using simple quality metrics
- manipulate the *chunk size* to improve the quality metrics
- manipulate *top k* to improve the quality metrics

### Set up the environment

In [1]:
import os
import google.generativeai as genai

if os.getenv("COLAB_RELEASE_TAG"):
   COLAB = True
   print("Running on COLAB environment.")
else:
   COLAB = False
   print("WARNING: Running on LOCAL environment.")


Running on COLAB environment.


In [2]:
# Clone the data repository into colab
!git clone https://github.com/openknowledge/workshop-genai-data.git
PROCESSED_DATA_PATH = "/content/workshop-genai-data/processed/gutenberg/"
EVALUATION_DATA_PATH = "/content/workshop-genai-data/evaluation/"

Cloning into 'workshop-genai-data'...
remote: Enumerating objects: 41, done.
remote: Counting objects: 100% (41/41), done.
remote: Compressing objects: 100% (33/33), done.
remote: Total 41 (delta 13), reused 25 (delta 2), pack-reused 0 (from 0)
Receiving objects: 100% (41/41), 438.10 KiB | 2.37 MiB/s, done.
Resolving deltas: 100% (13/13), done.


In [3]:
# import colab specific lib to read user data (aka colab managed secrets)
from google.colab import userdata

In [4]:
# Initialize Google GenAI Client API with GOOGLE_API_KEY to be able to call the model.
# Note: GEMINI_API_KEY must be set as COLAB userdata before!
GOOGLE_API_KEY=userdata.get('GEMINI_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)

In [5]:
%%capture
# Install additional libraries (a cell magic such as %%capture must be the first line of the cell)
!pip install -qU langchain-text-splitters
!pip install chromadb

In [6]:
# Import additional libraries
from langchain_text_splitters import RecursiveCharacterTextSplitter
from chromadb import EphemeralClient
import requests
import re
import uuid
import json
import typing_extensions as typing
from google.generativeai.types import HarmCategory, HarmBlockThreshold
import pandas as pd
from pathlib import Path
import time
import numpy as np


In [7]:
# Configure pandas display options
pd.set_option("max_colwidth", None)

In [8]:
# Set default values for model, model parameters and prompt
DEFAULT_MODEL = "gemini-1.5-flash"
DEFAULT_CONFIG_TEMPERATURE = 0.0
DEFAULT_CONFIG_TOP_K = 1
DEFAULT_CONFIG_MAX_OUTPUT_TOKENS = 200
DEFAULT_SYSTEM_PROMPT = "You are a friendly assistant"
DEFAULT_USER_PROMPT = " "

# Set defaults for retrieval
DEFAULT_TOP_K = 3
DEFAULT_CHUNK_OVERLAP = 100
DEFAULT_CHUNK_SIZE = 2000

In [9]:
# This will be the chromadb collection we use as a knowledge base. The in-memory EphemeralClient is sufficient, since we do not need persistence.
chromadb_client = EphemeralClient()
chromadb_collection = chromadb_client.create_collection(name="default")

In [None]:
def call_genai_model_for_completion(
    model_name: str = DEFAULT_MODEL,
    config_temperature:float = DEFAULT_CONFIG_TEMPERATURE,
    config_top_k: int = DEFAULT_CONFIG_TOP_K,
    config_max_output_tokens: int = DEFAULT_CONFIG_MAX_OUTPUT_TOKENS,
    system_prompt : str = DEFAULT_SYSTEM_PROMPT,
    user_prompt : str = DEFAULT_USER_PROMPT,
    verbose: bool = False
    ):
    """ Calls a Gemini model with a given set of parameters and returns the completions

    Parameters
    ----------
    model_name : str, optional [default: DEFAULT_MODEL]
        The name of the model to use for the completion
    config_temperature : float, optional [default: DEFAULT_CONFIG_TEMPERATURE]
        The sampling temperature of the model
    config_top_k : int, optional [default: DEFAULT_CONFIG_TOP_K]
        The number of highest-probability tokens the model samples from at each step
    config_max_output_tokens : int, optional [default: DEFAULT_CONFIG_MAX_OUTPUT_TOKENS]
        The maximum number of output tokens to return
    system_prompt : str, optional [default: DEFAULT_SYSTEM_PROMPT]
        The system prompt to use for the completion
    user_prompt : str, optional [default: DEFAULT_USER_PROMPT]
        The user prompt to use for the completion
    verbose : bool, optional [default: False]
        Whether to print details of the completion process or not
    Returns
    -------
    completions :
        a GenerateContentResponse instance representing the genAI model answer(s)
    """

    if verbose:
        # print out summary of input values / parameters
        print(f'Generating answer for following config:')
        print(f'  - SYSTEM PROMPT used:\n {system_prompt}')
        print(f'  - USER PROMPT used:\n {user_prompt}')
        print(f'  - MODEL used:\n {model_name} (temperature = {config_temperature}, top_k = {config_top_k}, max_output_tokens = {config_max_output_tokens})')

    # create generation config
    model_config = genai.GenerationConfig(
        max_output_tokens=config_max_output_tokens,
        temperature=config_temperature,
        top_k=config_top_k
    )

    # create genai model with generation config
    genai_model = genai.GenerativeModel(
        model_name= model_name,
        generation_config= model_config
    )

    # Attention: We manipulated the safety settings
    response = genai_model.generate_content(
        contents=[system_prompt, user_prompt], safety_settings={
        HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
    })
    return response

In [None]:
# RAG building blocks

# Get content of books. The content will already be cleansed.
def load_file_content(file_name: str) -> str:
  """ Loads a file from the local file system by a given file name and returns its content

    Parameters
    ----------
    file_name : str
        The name of the file to be loaded
    Returns
    -------
    file_content : str
        the content of the file as a string
  """
  with open(f"{PROCESSED_DATA_PATH}{file_name}", "r") as f:
    return f.read()

# Building Block "Chunking": Split the content into smaller chunks
def do_chunk(text: str, chunk_size: int = DEFAULT_CHUNK_SIZE, chunk_overlap: int = DEFAULT_CHUNK_OVERLAP) -> list[str]:
  """ Chunks a given text by a given chunk size and chunk overlap and returns a list of chunks

  Parameters
  ----------
  text : str
      The text to be chunked
  chunk_size : int, optional [default: DEFAULT_CHUNK_SIZE]
        The desired chunk size
  chunk_overlap : int, optional [default: DEFAULT_CHUNK_OVERLAP]
        The desired chunk overlap
  Returns
  -------
  chunks: [str]
      The created chunks
  """
  text_splitter = RecursiveCharacterTextSplitter(
      chunk_size=chunk_size,
      chunk_overlap=chunk_overlap,
      length_function=len,
  )
  return text_splitter.split_text(text=text)

# Building Block "Embedding": Create multi dimensional embeddings for a given chunk.
def do_embed(chunk: str) -> list[float]:
  """ Embeds a given chunk and returns the embedding

  Parameters
  ----------
  chunk : str
      The chunk to be embedded
  Returns
  -------
  embedding: [float]
      The created embedding
  """
  return genai.embed_content(model=EMBEDDING_MODEL, content=chunk).get("embedding")

def do_batch_embed(chunks: list[str]) -> list[list[float]]:
  """ Embeds a list of chunks and returns the embeddings

  Parameters
  ----------
  chunks : [str]
      The chunks to be embedded
  Returns
  -------
  embeddings: [list[float]]
      The created embeddings
  """
  return genai.embed_content(model=EMBEDDING_MODEL, content=chunks).get("embedding")

# Building Block "Knowledgebase": Store embeddings and the corresponding content in a vectorstore
def persist_embeddings(chunks: list[str], embeddings: list[list[float]]) -> None:
  """ Persists the embeddings and the chunks in the knowledgebase

  Parameters
  ----------
  chunks : [str]
      The chunks to be persisted
  embeddings: [list[float]]
      The corresponding embeddings to be persisted
  """
  chromadb_collection = chromadb_client.get_or_create_collection(name="default")
  # Persist the embeddings and the chunks in the knowledgebase
  ids = [str(uuid.uuid4()) for _ in chunks]
  chromadb_collection.add(ids=ids, documents=chunks, embeddings=embeddings)

# Building Block "Augmentation": Create an updated prompt by merging the original user input with the provided context
# Attention: We manipulated the augmented prompt in order to see the guardrails in action
def augment(user_input: str, context: list[str]) -> str:
  """ Augments a given user input by merging it with the provided context and returns the augmented prompt

  Parameters
  ----------
  user_input : str
      The user input to be augmented
  context : [str]
      The context to be merged with the user input
  Returns
  -------
  augmented_prompt: str
      The created augmented prompt
  """
  prepared_context = "\n".join(context)
  augmented_prompt = f"""
    Answer the question as detailed as possible from the provided context. If the answer is not in
    provided context just say, 'answer is not available in the context', don't provide the wrong answer.
    Respond short and concisely.
    Context:\n{prepared_context}?\n
    Question: \n{user_input}\n

    Answer:
  """
  return augmented_prompt

# Building Block "Top-k Fetching": Get the k semantically closest chunks to the user input from the knowledgebase
def do_top_k_fetching(user_input_embedding: list[float], top_k: int) -> tuple[list[str],list[float]]:
  """ Fetches the k semantically closest chunks to the user input from the knowledgebase

  Parameters
  ----------
  user_input_embedding : [float]
      The embedding of the user input
  top_k : int
      The number of semantically closest chunks to be fetched

  Returns
  -------
  context: [str]
      The fetched chunks
  distances: [float]
      The corresponding distances to the user_input_embedding
  """
  # The Chroma API allows querying for multiple embeddings simultaneously.
  # Since we always fetch for a single user_input, we wrap its embedding in a list
  # and return only the first result list (the chunks and distances for that one query)
  chromadb_collection = chromadb_client.get_or_create_collection(name="default")
  results = chromadb_collection.query(query_embeddings=[user_input_embedding], n_results=top_k)
  return (results["documents"][0], results["distances"][0]) # Return the distances to get better insights

# Building Block "Generation": Use the generation model to create a response
# We return the total token count for this exercise
def generate_response(prompt: str) -> tuple[str, int]:
  """ Generates a response for a given prompt

  Parameters
  ----------
  prompt : str
      The prompt to be used for the generation
  Returns
  -------
  response: str
      The generated response
  total_token_count: int
      The total token count for the generation
  """
  completion_result = call_genai_model_for_completion(
      model_name=GENERATION_MODEL,
      user_prompt=prompt,
  )
  total_token_count = completion_result.usage_metadata.total_token_count
  return (completion_result.text, total_token_count)

In [None]:
def do_ingestion(file_names: list[str], chunk_size: int = DEFAULT_CHUNK_SIZE, clear_knowledgebase: bool = False, verbose: bool = False) -> None:
  """ Ingests a list of files: each file is loaded, chunked, embedded, and persisted in the knowledgebase

  Parameters
  ----------
  file_names : [str]
      The names of the files to be ingested
  chunk_size : int, optional [default: DEFAULT_CHUNK_SIZE]
      The desired chunk size
  clear_knowledgebase : bool, optional [default: False]
      Whether to clear the knowledgebase before ingesting the new files
  verbose : bool, optional [default: False]
      Whether to print details of the ingestion process or not. Defaults to False
  """
  if clear_knowledgebase:
    chromadb_client.delete_collection(name="default")
  # Ingest file by file
  for file_name in file_names:
    # Load prepared book content
    file_content = load_file_content(file_name)

    # Chunk the content into smaller chunks
    chunks = do_chunk(file_content, chunk_size=chunk_size)
    if verbose:
      print(f'Loaded {len(chunks)} chunks from {file_name}')

    # Embed the chunks
    embeddings = do_batch_embed(chunks)

    # Persist the embeddings and the chunks in the knowledgebase
    persist_embeddings(chunks, embeddings)

  if verbose:
    chunks_count = chromadb_client.get_collection(name="default").count()
    print(f'Added {chunks_count} chunks to the knowledgebase')

In [None]:
def do_rag(user_input: str, top_k: int = DEFAULT_TOP_K, verbose: bool = False) -> tuple[str, list[str], list[float], int]:
  """ Runs the RAG pipeline with a given user input

  Parameters
  ----------
  user_input : str
      The user input to be used for the RAG pipeline
  top_k : int, optional [default: DEFAULT_TOP_K]
      The number of semantically closest chunks to be fetched
  verbose : bool, optional [default: False]
      Whether to print details of the RAG process or not. Defaults to False
  Returns
  -------
  response: str
      The generated response
  context: [str]
      The fetched chunks
  distances: [float]
      The corresponding distances to the user_input_embedding
  total_token_count: int
      The total token count for the generation
  """
  # Embed the user input
  user_input_embedding = do_embed(chunk=user_input)

  # "R" like "Retrieval": Get the k semantically closest chunks to the user input from the knowledgebase
  retrieval_start_time = time.time()
  (context, distances) = do_top_k_fetching(user_input_embedding=user_input_embedding, top_k=top_k)
  if verbose:
    retrieval_end_time = time.time()
    retrieval_time = round(retrieval_end_time - retrieval_start_time, 2)

  # "A" like "Augmented": Create the augmented prompt
  augmented_prompt = augment(user_input=user_input, context=context)

  # "G" like "Generation": Generate a response
  generation_start_time = time.time()
  (response, total_token_count) = generate_response(prompt=augmented_prompt)
  if verbose:
    generation_end_time = time.time()
    generation_time = round(generation_end_time - generation_start_time, 2)
    print(f'Retrieval took {retrieval_time}s. Generation took {generation_time}s.')

  return (response, context, distances, total_token_count)


In [14]:
# Define a custom exception
class FactCheckingValidationError(Exception):
  """ Exception raised for errors in the fact checking validation. """
  pass

# Define a response format
class FactCheckingValidationAnswer(typing.TypedDict):
  """ Response format for the fact checking validation. """
  is_grounded: bool


def guard_fact_checking(bot_response: str, context: list[str]) -> str:
  """ Uses fact checking validation for a given bot response and context

  Parameters
  ----------
  bot_response : str
      The bot response to be guarded
  context : [str]
      The context to be used for the fact checking validation
  Returns
  -------
  bot_response: str
      The guarded bot response
  """
  # Prepare the context to be used in the guard prompt
  context = "\n".join(context)

  # Define the prompt for the guardrail
  guard_prompt = f"""
    You are given a task to identify if the answer is grounded and entailed to the context.
    You will only use the contents of the context and not rely on external knowledge.
    'context': {context} 'answer': {bot_response}
    """

  # Call the guardrail model with the desired output format
  model = genai.GenerativeModel(GUARDING_MODEL)
  result = model.generate_content(
      guard_prompt,
      generation_config=genai.GenerationConfig(
          response_mime_type="application/json", response_schema=FactCheckingValidationAnswer
      ),
  )

  # Evaluate the validation
  fact_checking_validation = json.loads(result.text)
  if not fact_checking_validation["is_grounded"]:
    error_msg = f"The bot answer '{bot_response}' is not grounded in the context '{context}'"
    raise FactCheckingValidationError(error_msg)
  return bot_response
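
The call `json.loads(result.text)` above assumes that the guard model always returns valid JSON containing an `is_grounded` key. A more defensive variant could treat every malformed verdict as "not grounded". The following is only a sketch: the helper name `parse_is_grounded` and the fail-closed policy are assumptions for illustration and not part of the workshop code.

```python
import json

def parse_is_grounded(raw_text: str) -> bool:
    """Hypothetical helper: returns True only for a well-formed, positive verdict."""
    try:
        payload = json.loads(raw_text)
    except (json.JSONDecodeError, TypeError):
        # Malformed or missing output counts as "not grounded" (fail closed)
        return False
    if not isinstance(payload, dict):
        return False
    # Only the JSON literal true is accepted, not truthy strings such as "yes"
    return payload.get("is_grounded") is True

print(parse_is_grounded('{"is_grounded": true}'))
print(parse_is_grounded('{"is_grounded": false}'))
print(parse_is_grounded("not json at all"))
```

Failing closed means a broken guard response shows up as a detected hallucination rather than as an unhandled exception, which keeps the evaluation loop running.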

In [15]:
def do_fact_checked_rag(user_input: str, top_k: int = DEFAULT_TOP_K, verbose: bool = False) -> tuple[str, list[str], list[float], int]:
  """ Runs the RAG pipeline with a given user input and fact checking validation

  Parameters
  ----------
  user_input : str
      The user input to be used for the RAG pipeline
  top_k : int, optional [default: DEFAULT_TOP_K]
      The number of semantically closest chunks to be fetched
  verbose : bool, optional [default: False]
      Whether to print details of the RAG process or not. Defaults to False
  Returns
  -------
  response: str
      The generated response
  context: [str]
      The fetched chunks
  distances: [float]
      The corresponding distances to the user_input
  total_token_count: int
      The total token count for the generation
  """
  (answer, context, distances, total_token_count) = do_rag(user_input=user_input, top_k=top_k, verbose=verbose)
  try:
    guarded_output = guard_fact_checking(bot_response=answer, context=context)
  except FactCheckingValidationError:
    # Return a predefined response if the fact checking validation fails
    guarded_output = "Possible hallucination detected."
  return (guarded_output, context, distances, total_token_count)

In [16]:
def print_insights(dataframe: pd.DataFrame):
  """ Prints the insights for a given evaluation dataframe

  Parameters
  ----------
  dataframe : pd.DataFrame
      The dataframe to be analyzed
  """
  median_response_time = round(dataframe['response_time'].median(), 2)
  median_min_distance = round(dataframe['min_context_distance'].median(), 2)
  median_median_distance = round(dataframe['median_context_distance'].median(), 2)
  median_total_token_count = round(dataframe['total_token_count'].median(), 2)
  n_hallucinations = dataframe["rag_response"].value_counts().get("Possible hallucination detected.", 0)

  print(f'Number of detected hallucinations: {n_hallucinations}')
  print(f'Median response time: {median_response_time} seconds')
  print(f'Median of minimum distance: {median_min_distance}')
  print(f'Median of median distance: {median_median_distance}')
  print(f'Median total token count: {median_total_token_count}')

In [17]:
# The function gives us the desired outputs to gather some insights
def generate_rag_answers(dataframe: pd.DataFrame, top_k: int = DEFAULT_TOP_K, verbose: bool = False):
  """ Generates the RAG answers for a given dataframe

  Parameters
  ----------
  dataframe : pd.DataFrame
      The dataframe, which includes the questions to be answered
  top_k : int, optional [default: DEFAULT_TOP_K]
      The number of semantically closest chunks to be fetched
  verbose : bool, optional [default: False]
      Whether to print details of the RAG process or not. Defaults to False
  Returns
  -------
  dataframe: pd.DataFrame
      The dataframe with the RAG answers
  """
  def generate_rag_response_with_insights(question: str) -> pd.Series:
        # Define start time for calculating the response time
        start_time = time.time()

        # Generate the RAG response
        output = do_fact_checked_rag(question, top_k=top_k, verbose=verbose)
        response = output[0]  # Extract the response from the output
        distances = output[2]  # Extract the distances from the output
        total_token_count = output[3]  # Extract the total token count from the output
        min_distance = round(min(distances), 2)  # Find the minimum distance
        median_distance = np.nanmedian(distances)  # Calculate the median distance

        # Define end time for calculating the response time
        end_time = time.time()

        # Calculate response time
        response_time = round(end_time - start_time, 2)

        # Return response and insights
        return pd.Series([response, response_time, min_distance, median_distance, total_token_count])

  # Apply the insights function to each row and store results in new columns
  dataframe[['rag_response', 'response_time', 'min_context_distance', 'median_context_distance', 'total_token_count']] = dataframe['question'].apply(generate_rag_response_with_insights)
  return dataframe


### Configure the genAI models

In [None]:
GENERATION_MODEL = "gemini-1.5-flash"
EMBEDDING_MODEL = "models/text-embedding-004"
GUARDING_MODEL = "gemini-1.5-flash-8b"

### Prepare the knowledgebase

In [19]:
KNOWLEDGEBASE_CONTENT = set(['study_in_scarlett.txt'])

do_ingestion(file_names=KNOWLEDGEBASE_CONTENT)

Loaded 126 chunks from study_in_scarlett.txt
Added 126 chunks to the knowledgebase


### Manually evaluate responses

In [20]:
# Read csv from local files
evaluation_dataframe = pd.read_csv(EVALUATION_DATA_PATH + 'simple_evaluation_dataset.csv')
evaluation_dataframe.head()

Unnamed: 0,story_name,question,ground_truth_answer
0,A Study in Scarlet,What year does Dr. Watson complete his medical degree?,1878
1,A Study in Scarlet,Where do Sherlock Holmes and Dr. Watson decide to live together?,221B Baker Street
2,A Study in Scarlet,What word does Sherlock Holmes find written in blood on the wall at the crime scene?,RACHE
3,A Study in Scarlet,What profession does Sherlock Holmes describe himself as having?,A consulting detective
4,A Study in Scarlet,What clue suggests to Holmes that the murderer might have smoked a particular kind of cigar?,Ashes from a Trichinopoly cigar found at the crime scene


In [21]:
# Generate responses and insights. We use a wrapper function, which collects the insights and stores them in a dataframe
evaluation_dataframe = generate_rag_answers(evaluation_dataframe)
evaluation_dataframe

Unnamed: 0,story_name,question,ground_truth_answer,rag_response,response_time,min_context_distance,median_context_distance,total_token_count
0,A Study in Scarlet,What year does Dr. Watson complete his medical degree?,1878,1878 \n,2.31,0.83,1.015261,1507
1,A Study in Scarlet,Where do Sherlock Holmes and Dr. Watson decide to live together?,221B Baker Street,They decide to live together at 221B Baker Street. \n,1.98,0.84,0.844118,1442
2,A Study in Scarlet,What word does Sherlock Holmes find written in blood on the wall at the crime scene?,RACHE,RACHE \n,2.0,0.63,0.692538,1481
3,A Study in Scarlet,What profession does Sherlock Holmes describe himself as having?,A consulting detective,Sherlock Holmes describes himself as a **consulting detective**. \n,1.86,0.64,0.751798,1457
4,A Study in Scarlet,What clue suggests to Holmes that the murderer might have smoked a particular kind of cigar?,Ashes from a Trichinopoly cigar found at the crime scene,"Holmes found scattered ash on the floor that was dark in color and flakey, which is characteristic of a Trichinopoly cigar. \n",1.98,0.69,0.740873,1536
5,A Scandal in Bohemia,Who is the woman that Sherlock Holmes admires and refers to as 'the woman'?,Irene Adler,Possible hallucination detected.,2.14,0.86,0.874383,1432
6,The Red-Headed League,What unique characteristic qualified Mr. Jabez Wilson to join the Red-Headed League?,His bright red hair,Possible hallucination detected.,1.83,1.05,1.04848,1476
7,A Case of Identity,Who is Miss Mary Sutherland engaged to marry?,Hosmer Angel,Possible hallucination detected.,8.07,1.06,1.073309,1487
8,The Boscombe Valley Mystery,Who is initially suspected of killing Charles McCarthy?,"His son, James McCarthy",Possible hallucination detected.,2.04,1.11,1.131647,1545
9,The Five Orange Pips,Who is ultimately responsible for the threats and deaths in the Openshaw family?,The Ku Klux Klan,Possible hallucination detected.,2.01,1.19,1.191253,1445


In [22]:
print_insights(evaluation_dataframe)

Number of detected hallucinations: 10
Median response time: 1.98 seconds
Median of minimum distance: 0.86
Median of median distance: 0.94
Median total token count: 1476.0
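
Besides the insights printed above, the evaluation dataset also carries a `ground_truth_answer` column that can be turned into a simple quality metric. The sketch below computes the share of flagged responses and a crude "ground truth is contained in the response" rate. The function names are assumptions for illustration, and the substring check is a deliberately simple proxy, not an established metric.

```python
HALLUCINATION_MARKER = "Possible hallucination detected."

def hallucination_rate(responses) -> float:
    """Share of responses that were replaced by the guardrail's fallback text."""
    responses = list(responses)
    if not responses:
        return 0.0
    flagged = sum(1 for r in responses if r.strip() == HALLUCINATION_MARKER)
    return flagged / len(responses)

def contains_ground_truth(response: str, ground_truth: str) -> bool:
    """Case-insensitive substring check; a crude proxy for correctness."""
    return ground_truth.strip().lower() in response.lower()

def ground_truth_hit_rate(responses, ground_truths) -> float:
    """Share of responses that contain their ground truth answer."""
    pairs = list(zip(responses, ground_truths))
    if not pairs:
        return 0.0
    hits = sum(1 for r, g in pairs if contains_ground_truth(r, g))
    return hits / len(pairs)

# Three rows copied from the evaluation output above
responses = [
    "1878 \n",
    "They decide to live together at 221B Baker Street. \n",
    "Possible hallucination detected.",
]
ground_truths = ["1878", "221B Baker Street", "Irene Adler"]

print(hallucination_rate(responses))
print(ground_truth_hit_rate(responses, ground_truths))
```

Both functions accept any iterable of strings, so the `rag_response` and `ground_truth_answer` columns of `evaluation_dataframe` could be passed in directly. Note the limits of the substring check: a response such as "a **consulting detective**" does not match the ground truth "A consulting detective", because the markdown emphasis interrupts the phrase.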


### Ingest remaining stories

In [23]:
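# Ingest the second book, which contains the other stories.
# Note: the set still contains the first book and the knowledgebase is not cleared,
# so study_in_scarlett.txt is ingested a second time (126 + 304 + 126 = 556 chunks).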
KNOWLEDGEBASE_CONTENT.add('adventures_of_sherlock_holmes.txt')
do_ingestion(file_names=KNOWLEDGEBASE_CONTENT, clear_knowledgebase=False)

Loaded 304 chunks from adventures_of_sherlock_holmes.txt
Loaded 126 chunks from study_in_scarlett.txt
Added 556 chunks to the knowledgebase
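
The count of 556 chunks equals 126 + 304 + 126: the first book ends up in the knowledgebase twice, because the collection was not cleared and `persist_embeddings` assigns a fresh random `uuid4` to every chunk. One way to make re-ingestion idempotent is to derive the ID from the content. The sketch below illustrates the idea with a plain dictionary standing in for the vector store; the helper names `chunk_id` and `ingest` and the file names are hypothetical.

```python
import hashlib

def chunk_id(file_name: str, chunk: str) -> str:
    """Hypothetical helper: derive a stable ID from the source file and the chunk text."""
    digest = hashlib.sha256(f"{file_name}\n{chunk}".encode("utf-8")).hexdigest()
    return digest[:32]

def ingest(store: dict, file_name: str, chunks: list[str]) -> None:
    """Stand-in for a vector store: writing an existing ID overwrites instead of duplicating."""
    for chunk in chunks:
        store[chunk_id(file_name, chunk)] = chunk

store: dict[str, str] = {}
ingest(store, "book_a.txt", ["first chunk", "second chunk"])
ingest(store, "book_b.txt", ["third chunk"])
ingest(store, "book_a.txt", ["first chunk", "second chunk"])  # re-ingestion adds nothing

print(len(store))  # 3
```

With Chroma, the same idea would presumably combine such IDs with the collection's `upsert` method instead of `add`; treat that as an assumption to check against the Chroma documentation. A side effect of content-derived IDs is that identical chunks within one file collapse into a single entry.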


In [24]:
# Generate responses and insights
evaluation_dataframe = generate_rag_answers(evaluation_dataframe)
evaluation_dataframe

Unnamed: 0,story_name,question,ground_truth_answer,rag_response,response_time,min_context_distance,median_context_distance,total_token_count
0,A Study in Scarlet,What year does Dr. Watson complete his medical degree?,1878,1878 \n,3.3,0.83,0.825438,1555
1,A Study in Scarlet,Where do Sherlock Holmes and Dr. Watson decide to live together?,221B Baker Street,Possible hallucination detected.,1.99,0.83,0.841487,1446
2,A Study in Scarlet,What word does Sherlock Holmes find written in blood on the wall at the crime scene?,RACHE,RACHE \n,1.83,0.63,0.631067,1470
3,A Study in Scarlet,What profession does Sherlock Holmes describe himself as having?,A consulting detective,"Sherlock Holmes describes himself as a ""consulting detective"". \n",2.03,0.64,0.643664,1481
4,A Study in Scarlet,What clue suggests to Holmes that the murderer might have smoked a particular kind of cigar?,Ashes from a Trichinopoly cigar found at the crime scene,"Holmes found the ash of an Indian cigar, which he identified based on his knowledge of tobacco ashes. \n",1.95,0.59,0.693573,1521
5,A Scandal in Bohemia,Who is the woman that Sherlock Holmes admires and refers to as 'the woman'?,Irene Adler,Irene Adler \n,1.8,0.78,0.784401,1466
6,The Red-Headed League,What unique characteristic qualified Mr. Jabez Wilson to join the Red-Headed League?,His bright red hair,Mr. Jabez Wilson's red hair qualified him to join the Red-Headed League. \n,1.87,0.55,0.605021,1584
7,A Case of Identity,Who is Miss Mary Sutherland engaged to marry?,Hosmer Angel,Miss Mary Sutherland is engaged to marry Mr. Hosmer Angel. \n,1.84,0.82,0.866177,1559
8,The Boscombe Valley Mystery,Who is initially suspected of killing Charles McCarthy?,"His son, James McCarthy","James McCarthy, the son of the deceased, is initially suspected of killing Charles McCarthy. \n",2.26,0.87,0.902041,1431
9,The Five Orange Pips,Who is ultimately responsible for the threats and deaths in the Openshaw family?,The Ku Klux Klan,The Ku Klux Klan is ultimately responsible for the threats and deaths in the Openshaw family. \n,1.9,0.83,0.924548,1425


In [25]:
print_insights(evaluation_dataframe)

Number of detected hallucinations: 5
Median response time: 1.9 seconds
Median of minimum distance: 0.76
Median of median distance: 0.78
Median total token count: 1487.0


### Manually adjust *chunk size*

In [26]:
# Define new chunk size. Be careful: the smaller the chunk_size, the longer the ingestion takes. E.g. adjusted_chunk_size = 500 takes 1-2 minutes
adjusted_chunk_size = 1000

In [27]:
# We need to clear the knowledgebase for this
do_ingestion(file_names=KNOWLEDGEBASE_CONTENT, chunk_size=adjusted_chunk_size, clear_knowledgebase=True, verbose=True)

Loaded 640 chunks from adventures_of_sherlock_holmes.txt
Loaded 266 chunks from study_in_scarlett.txt
Added 906 chunks to the knowledgebase
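
Halving the chunk size from 2000 to 1000 roughly doubled the number of chunks per book (304 to 640 and 126 to 266). The relation between chunk size, overlap, and chunk count can be illustrated with a fixed-window chunker. This is a simplified sketch and not how `RecursiveCharacterTextSplitter` works internally, since that splitter also tries to break at separators such as paragraphs; the function name `fixed_window_chunks` is an assumption.

```python
def fixed_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks

text = "x" * 10_000  # placeholder text of 10,000 characters
for size in (2000, 1000, 500):
    n_chunks = len(fixed_window_chunks(text, chunk_size=size, chunk_overlap=100))
    print(f"chunk_size={size}: {n_chunks} chunks")
```

Because the overlap stays constant at 100 characters, it takes up a growing share of each chunk as the chunk size shrinks, so the chunk count grows slightly faster than the chunk size falls.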


In [28]:
# Generate responses and insights
evaluation_dataframe = generate_rag_answers(evaluation_dataframe, verbose=True)
evaluation_dataframe

Retrieval took 0.01s. Generation took 0.59s.
Retrieval took 0.01s. Generation took 0.69s.
Retrieval took 0.01s. Generation took 0.71s.
Retrieval took 0.01s. Generation took 0.61s.
Retrieval took 0.01s. Generation took 0.66s.
Retrieval took 0.01s. Generation took 0.61s.
Retrieval took 0.01s. Generation took 0.89s.
Retrieval took 0.01s. Generation took 0.64s.
Retrieval took 0.01s. Generation took 0.69s.
Retrieval took 0.01s. Generation took 0.59s.
Retrieval took 0.01s. Generation took 0.61s.
Retrieval took 0.01s. Generation took 0.69s.
Retrieval took 0.01s. Generation took 0.66s.
Retrieval took 0.01s. Generation took 0.66s.
Retrieval took 0.01s. Generation took 0.64s.


Unnamed: 0,story_name,question,ground_truth_answer,rag_response,response_time,min_context_distance,median_context_distance,total_token_count
0,A Study in Scarlet,What year does Dr. Watson complete his medical degree?,1878,Possible hallucination detected.,2.16,0.95,0.973122,766
1,A Study in Scarlet,Where do Sherlock Holmes and Dr. Watson decide to live together?,221B Baker Street,221B Baker Street \n,1.93,0.79,0.799011,760
2,A Study in Scarlet,What word does Sherlock Holmes find written in blood on the wall at the crime scene?,RACHE,"The word is ""RACHE"". \n",1.83,0.69,0.730881,789
3,A Study in Scarlet,What profession does Sherlock Holmes describe himself as having?,A consulting detective,Sherlock Holmes describes himself as a **consulting detective**. \n,1.83,0.6,0.709403,795
4,A Study in Scarlet,What clue suggests to Holmes that the murderer might have smoked a particular kind of cigar?,Ashes from a Trichinopoly cigar found at the crime scene,"Holmes found the ash of a cigar, which his special knowledge of tobacco ashes enabled him to pronounce as an Indian cigar. \n",1.91,0.6,0.634511,838
5,A Scandal in Bohemia,Who is the woman that Sherlock Holmes admires and refers to as 'the woman'?,Irene Adler,Possible hallucination detected.,1.93,0.66,0.673151,889
6,The Red-Headed League,What unique characteristic qualified Mr. Jabez Wilson to join the Red-Headed League?,His bright red hair,Possible hallucination detected.,2.19,0.44,0.616427,812
7,A Case of Identity,Who is Miss Mary Sutherland engaged to marry?,Hosmer Angel,Miss Mary Sutherland is engaged to marry Hosmer Angel. \n,2.11,0.83,0.87601,792
8,The Boscombe Valley Mystery,Who is initially suspected of killing Charles McCarthy?,"His son, James McCarthy","The initial suspect is Charles McCarthy's son, James McCarthy. \n",1.8,0.93,0.943639,775
9,The Five Orange Pips,Who is ultimately responsible for the threats and deaths in the Openshaw family?,The Ku Klux Klan,Possible hallucination detected.,1.72,0.81,0.930894,798


In [29]:
print_insights(evaluation_dataframe)

Number of detected hallucinations: 6
Median response time: 1.93 seconds
Median of minimum distance: 0.72
Median of median distance: 0.73
Median total token count: 798.0


### Manually adjust *top_k*
We do not need to ingest again, since this adjustment only affects the retrieval step. Keep in mind that the knowledgebase still holds the chunks from the previous section (chunk size 1000), because we did not re-ingest.
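
Why raising *top_k* cannot change the minimum distance but tends to raise the median can be seen from a toy example. The retrieval returns the k closest chunks, so a larger k keeps the closest chunk and only adds chunks that are farther away. The distances below are made up for illustration, and `top_k_distances` is a hypothetical stand-in for the vector store query.

```python
from statistics import median

def top_k_distances(all_distances: list[float], k: int) -> list[float]:
    """Return the k smallest distances, mimicking a nearest-neighbour query."""
    return sorted(all_distances)[:k]

# Made-up distances between one question and every chunk of a toy knowledgebase
distances = [0.44, 0.61, 0.62, 0.70, 0.75, 0.76, 0.81, 0.88, 0.90, 0.93, 1.02, 1.10]

for k in (3, 10):
    fetched = top_k_distances(distances, k)
    print(f"top_k={k}: min={min(fetched)}, median={round(median(fetched), 3)}")
```

The minimum is identical for both values of k, while the median moves toward the less relevant chunks that were added.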


In [30]:
adjusted_top_k = 10

In [32]:
# Generate responses and insights
evaluation_dataframe = generate_rag_answers(evaluation_dataframe, top_k=adjusted_top_k, verbose=True)
evaluation_dataframe

Retrieval took 0.01s. Generation took 0.76s.
Retrieval took 0.01s. Generation took 0.74s.
Retrieval took 0.01s. Generation took 0.71s.
Retrieval took 0.01s. Generation took 0.74s.
Retrieval took 0.01s. Generation took 0.87s.
Retrieval took 0.01s. Generation took 0.74s.
Retrieval took 0.01s. Generation took 0.79s.
Retrieval took 0.01s. Generation took 0.69s.
Retrieval took 0.01s. Generation took 0.99s.
Retrieval took 0.01s. Generation took 0.84s.
Retrieval took 0.01s. Generation took 0.71s.
Retrieval took 0.01s. Generation took 2.03s.
Retrieval took 0.01s. Generation took 0.81s.
Retrieval took 0.01s. Generation took 0.71s.
Retrieval took 0.01s. Generation took 0.77s.


Unnamed: 0,story_name,question,ground_truth_answer,rag_response,response_time,min_context_distance,median_context_distance,total_token_count
0,A Study in Scarlet,What year does Dr. Watson complete his medical degree?,1878,Possible hallucination detected.,2.89,0.95,1.008003,2424
1,A Study in Scarlet,Where do Sherlock Holmes and Dr. Watson decide to live together?,221B Baker Street,221B Baker Street \n,1.98,0.79,0.84494,2435
2,A Study in Scarlet,What word does Sherlock Holmes find written in blood on the wall at the crime scene?,RACHE,"The word is ""RACHE"". \n",1.93,0.69,0.777222,2347
3,A Study in Scarlet,What profession does Sherlock Holmes describe himself as having?,A consulting detective,Sherlock Holmes describes himself as a consulting detective. \n,2.03,0.6,0.752171,2462
4,A Study in Scarlet,What clue suggests to Holmes that the murderer might have smoked a particular kind of cigar?,Ashes from a Trichinopoly cigar found at the crime scene,"Holmes found the ash of a cigar, which he recognized as an Indian cigar. \n",2.29,0.6,0.701482,2468
5,A Scandal in Bohemia,Who is the woman that Sherlock Holmes admires and refers to as 'the woman'?,Irene Adler,Irene Adler \n,2.03,0.66,0.758444,2548
6,The Red-Headed League,What unique characteristic qualified Mr. Jabez Wilson to join the Red-Headed League?,His bright red hair,Mr. Jabez Wilson's unique characteristic that qualified him to join the Red-Headed League was his **red hair**. \n,2.04,0.44,0.757083,2549
7,A Case of Identity,Who is Miss Mary Sutherland engaged to marry?,Hosmer Angel,Possible hallucination detected.,1.93,0.83,0.915457,2450
8,The Boscombe Valley Mystery,Who is initially suspected of killing Charles McCarthy?,"His son, James McCarthy","James McCarthy, the son of the deceased, is initially suspected of killing Charles McCarthy. \n",2.38,0.93,0.977571,2422
9,The Five Orange Pips,Who is ultimately responsible for the threats and deaths in the Openshaw family?,The Ku Klux Klan,Possible hallucination detected.,2.27,0.81,0.94921,2422


In [33]:
print_insights(evaluation_dataframe)

Number of detected hallucinations: 4
Median response time: 2.05 seconds
Median of minimum distance: 0.72
Median of median distance: 0.8
Median total token count: 2435.0


### Lessons learned
- If we ingest more files, the hallucinations will be reduced
- If we increase the *chunk_size*:
  - hallucinations decrease, probably due to more context
  - distances go up, so the context becomes less relevant
  - token usage increases, which raises costs
  - response time seems to increase, probably due to more tokens
- If we increase *top_k*:
  - hallucinations decrease, due to more context
  - the median distance goes up, while the minimum distance stays the same, so we seem to add less relevant context
  - token usage increases, which raises costs
  - response time seems to increase
- Generally:
  - increasing *chunk_size* and *top_k* leads to fewer hallucinations but increases costs
  - this tuning should be automated by combining optimization techniques with RAG metrics

Keep in mind: There are many more parameters that can be adjusted, such as prompts, generation/embedding/guardrail models, chunk_overlap, types of splitters, types of retrievers, distance functions, distance thresholds, model parameters, [...]
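
The cost effect of *chunk_size* and *top_k* can be roughly approximated before running anything. The sketch below assumes that *chunk_size* is measured in characters and uses the common rule of thumb of about four characters per token for English text. Both the helper `estimate_context_tokens` and the ratio are assumptions, not measured values from this notebook, and the prompt and the answer add further tokens on top.

```python
def estimate_context_tokens(chunk_size, top_k, chars_per_token=4.0):
    """Roughly estimate how many tokens the retrieved context adds to a prompt.

    Assumes every retrieved chunk is close to chunk_size characters long and
    that one token covers about chars_per_token characters (a heuristic).
    """
    return int(chunk_size * top_k / chars_per_token)

# Compare a few hypothetical configurations.
for chunk_size, top_k in [(1000, 5), (2000, 5), (2000, 10)]:
    tokens = estimate_context_tokens(chunk_size, top_k)
    print(f"chunk_size={chunk_size}, top_k={top_k}: ~{tokens} context tokens")
```

For *chunk_size* = 2000 and *top_k* = 5, this estimate gives about 2500 tokens, which is in the same range as the median total token count of 2454.5 reported for that configuration in the exercise run.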

### Exercise 01: Find the best parameter values for *chunk_size* and *top_k*
- **Main goal**: Produce as few hallucinations as possible
- **Second goal**: Keep the total count of tokens as small as possible to reduce costs

To run the following code, you first need to execute **Step 1** and **Step 2** of this notebook.

In [19]:
# We use a smaller set of evaluation questions to reduce response time
evaluation_dataframe = pd.read_csv(EVALUATION_DATA_PATH + 'simple_evaluation_dataset_shortened.csv')
evaluation_dataframe.head()

Unnamed: 0,story_name,question,ground_truth_answer
0,A Scandal in Bohemia,What item does the King of Bohemia wish to recover from Irene Adler?,A compromising photograph of himself with Irene Adler
1,The Red-Headed League,What was Mr. Wilson asked to do for the Red-Headed League?,Copy out the Encyclopædia Britannica
2,The Boscombe Valley Mystery,What does Holmes deduce is the actual murder weapon?,A stone found near the scene of the crime
3,The Five Orange Pips,What mysterious items do members of the Openshaw family receive before their deaths?,Five orange pips
4,A Study in Scarlet,What war does Dr. Watson serve in as an army doctor?,The Second Afghan War


In [20]:
# The knowledgebase consists of shortened versions of the books to reduce ingestion time. Do not change this
KNOWLEDGEBASE_CONTENT = ['study_in_scarlett_shortened.txt', 'adventures_of_sherlock_holmes_shortened.txt']

In [21]:
# TODO: Adjust the following parameters
chunk_size = 2000
top_k = 5

In [22]:
# This needs to be executed if you changed chunk_size
do_ingestion(file_names=KNOWLEDGEBASE_CONTENT, chunk_size=chunk_size)

Loaded 58 chunks from study_in_scarlett_shortened.txt
Loaded 120 chunks from adventures_of_sherlock_holmes_shortened.txt
Added 178 chunks to the knowledgebase


In [23]:
# This always needs to be executed
evaluation_dataframe = generate_rag_answers(evaluation_dataframe, top_k=top_k)
evaluation_dataframe

Unnamed: 0,story_name,question,ground_truth_answer,rag_response,response_time,min_context_distance,median_context_distance,total_token_count
0,A Scandal in Bohemia,What item does the King of Bohemia wish to recover from Irene Adler?,A compromising photograph of himself with Irene Adler,The King of Bohemia wishes to recover a compromising photograph of himself with Irene Adler. \n,3.07,0.65,0.734073,2561
1,The Red-Headed League,What was Mr. Wilson asked to do for the Red-Headed League?,Copy out the Encyclopædia Britannica,"Mr. Wilson was asked to write about various topics, starting with the letter ""A"", for a salary of £4 a week. \n",2.81,0.61,0.689553,2600
2,The Boscombe Valley Mystery,What does Holmes deduce is the actual murder weapon?,A stone found near the scene of the crime,The murder weapon is a jagged stone. \n,2.46,0.73,0.768303,2446
3,The Five Orange Pips,What mysterious items do members of the Openshaw family receive before their deaths?,Five orange pips,The Openshaw family members receive letters containing dried orange pips before their deaths. \n,2.46,0.77,0.832415,2365
4,A Study in Scarlet,What war does Dr. Watson serve in as an army doctor?,The Second Afghan War,The Second Afghan War. \n,2.63,0.78,0.964393,2423
5,A Study in Scarlet,Who introduces Dr. Watson to Sherlock Holmes?,A mutual acquaintance named Stamford,Stamford introduces Dr. Watson to Sherlock Holmes. \n,2.47,0.88,0.887538,2463


In [24]:
# TODO: Analyze the insights and iteratively find the best configuration for chunk_size and top_k
print_insights(evaluation_dataframe)

Number of detected hallucinations: 0
Median response time: 2.55 seconds
Median of minimum distance: 0.75
Median of median distance: 0.8
Median total token count: 2454.5
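
The manual iteration in this exercise could be automated with a simple grid search. The following is only a sketch: `grid_search` and `fake_evaluate` are hypothetical names, and `fake_evaluate` returns invented numbers so that the block runs on its own. In the notebook, the evaluation callable would instead wrap `do_ingestion`, `generate_rag_answers`, and the metrics reported by `print_insights`. Configurations are ranked first by the number of hallucinations (main goal) and then by the median token count (second goal).

```python
from itertools import product

def grid_search(evaluate, chunk_sizes, top_ks):
    """Evaluate every (chunk_size, top_k) pair and return the best one.

    evaluate(chunk_size, top_k) must return a tuple
    (hallucinations, median_token_count); lower is better for both.
    """
    results = []
    for chunk_size, top_k in product(chunk_sizes, top_ks):
        hallucinations, tokens = evaluate(chunk_size, top_k)
        results.append((hallucinations, tokens, chunk_size, top_k))
    # Tuples compare element by element: hallucinations first, then tokens.
    return min(results)

def fake_evaluate(chunk_size, top_k):
    """Stand-in with invented behavior; replace with a real RAG evaluation."""
    context_chars = chunk_size * top_k
    hallucinations = max(0, 3 - context_chars // 3000)
    median_tokens = context_chars / 4
    return hallucinations, median_tokens

best = grid_search(fake_evaluate, chunk_sizes=[500, 1000, 2000], top_ks=[3, 5, 10])
print(best)  # (hallucinations, median_token_count, chunk_size, top_k)
```

Because `product` varies *top_k* in the inner loop, a real evaluation callable could cache the ingestion and only re-ingest when *chunk_size* changes.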
