# Retriever Evaluations 

We will use Precision, Recall & F1 Scores to evaluate the retrieval part

In [1]:
import pandas as pd

# read dataset (20 documents)
df = pd.read_csv("../../evaluation/archive/documents.csv", index_col=0)
df

Unnamed: 0_level_0,source_url,text
index,Unnamed: 1_level_1,Unnamed: 2_level_1
0,https://enterthegungeon.fandom.com/wiki/Bullet...,Bullet Kin\nBullet Kin are one of the most com...
1,https://www.dropbox.com/scl/fi/ljtdg6eaucrbf1a...,---The Paths through the Underground/Underdark...
2,https://bytes-and-nibbles.web.app/bytes/stici-...,Semantic and Textual Inference Chatbot Interfa...
3,https://github.com/llmware-ai/llmware,llmware\n\nBuilding Enterprise RAG Pipelines w...
4,https://docs.marimo.io/recipes.html,Recipes\nThis page includes code snippets or “...
5,https://towardsdatascience.com/how-to-maximize...,How to Maximize Your Impact as a Data Scientis...
6,https://ec.europa.eu/commission/presscorner/de...,Why do we need to regulate the use of Artifici...
7,https://bg3.wiki/wiki/The_Emperor,The Emperor is a mind flayer who appears in Ba...
8,https://whattocook.substack.com/p/so-into-nort...,so into northern spain!\nour magical urban-plu...
9,https://dmtalkies.com/the-zone-of-interest-end...,‘The Zone Of Interest’ Ending Explained & Film...


In [2]:
field_mapping = {"index": "doc_id", "source_url": "path", "text": "content"}

# to fix the index
df_reset = df.reset_index()
df_mapped = df_reset.rename(columns=field_mapping)
docs = df_mapped.to_dict("records", index=True)
docs

[{'doc_id': 0,
  'path': 'https://enterthegungeon.fandom.com/wiki/Bullet_Kin',
  'content': 'Bullet Kin\nBullet Kin are one of the most common enemies. They slowly walk towards the player, occasionally firing a single bullet. They can flip tables and use them as cover. They will also deal contact damage if the player touches them.\n\nOccasionally, Bullet Kin will have assault rifles, in which case they will rapidly fire 8 bullets towards the player before reloading. When an assault rifle wielding bullet kin appears, there will often be more in the same room.\n\nOn some occasions the player will also encounter incapacitated Bullet Kin lying on the floor. These Bullet Kin are props and disintegrate upon touch. They can be found in mass quantity in Oubliette.\n\nIn the Black Powder Mine, they can also ride Minecarts. In fact, if there are any unoccupied Minecarts within the room, they will take priority by walking towards them to ride in.\n\nTrivia\nBullet Kin wield Magnums. Assault-rifle

split the chunks of each document

In [3]:
import sys

sys.path.insert(0, "/home/strange/projects/rag/src")

In [4]:
from indexing.chunking.recursive_chunking import RecursiveTextSplitter

splitter = RecursiveTextSplitter(max_length=1200, overlap=100)
for doc in docs:
    chunks = splitter.split_text(doc["content"])
    doc["chunks"] = chunks

docs

[{'doc_id': 0,
  'path': 'https://enterthegungeon.fandom.com/wiki/Bullet_Kin',
  'content': 'Bullet Kin\nBullet Kin are one of the most common enemies. They slowly walk towards the player, occasionally firing a single bullet. They can flip tables and use them as cover. They will also deal contact damage if the player touches them.\n\nOccasionally, Bullet Kin will have assault rifles, in which case they will rapidly fire 8 bullets towards the player before reloading. When an assault rifle wielding bullet kin appears, there will often be more in the same room.\n\nOn some occasions the player will also encounter incapacitated Bullet Kin lying on the floor. These Bullet Kin are props and disintegrate upon touch. They can be found in mass quantity in Oubliette.\n\nIn the Black Powder Mine, they can also ride Minecarts. In fact, if there are any unoccupied Minecarts within the room, they will take priority by walking towards them to ride in.\n\nTrivia\nBullet Kin wield Magnums. Assault-rifle

In [5]:
from indexing.db.vectordb import VectorDB

db = VectorDB("retrieval_eval")
db.load_data(docs)

INFO:indexing.db.vectordb:Loading vector database from disk.


In [6]:
from evaluation.metrics.retrieval import PrecisionRecallF1

testset = pd.read_csv(
    "../../evaluation/archive/single_passage_answer_questions.csv", index_col=0
)

for doc_id, question, answer in testset.itertuples():
    print("-----------------")
    print(f"doc id: {doc_id}")
    print(f"question: {question}")
    print(f"answer: {answer}")

    results = db.search(
        "What kind of gun does the bandana bullet kin use?",
        top_k=10,
        top_r=5,
        alpha=0.6,
        rerank=True,
        debug=True,
    )

    actual: str = " ".join([result["text"] for result in results])
    expected: str = df.iloc[doc_id]["text"]

    metric = PrecisionRecallF1()
    eval_result = metric(actual=actual, reference=expected)

    print(f"Precision: {eval_result['precision']}")
    print(f"Recall: {eval_result['recall']}")
    print(f"F1: {eval_result['f1']}")

INFO:utils.context:Loading embedding model...


-----------------
doc id: 0
question: What do keybullet kin drop?
answer: Keybullet kin drop a key upon death.


INFO:FlagEmbedding.BGE_M3.modeling:The parameters of colbert_linear and sparse linear is new initialize. Make sure the model is loaded for training, not inferencing
INFO:utils.context:Unloading embedding model...
INFO:utils.context:Loading reranker model...
INFO:utils.context:Unloading reranker model...


Top Results:
Hybrid: 0.511 (Dense: 0.499, Sparse: 0.530) Reranker Score: 0.785
Content: Bullet Kin
Bullet Kin are one of the most common enemies. They slowly walk towards the player, occasionally firing a single bullet. They can flip tables and use them as cover. They will also deal contact damage if the player touches them. Occasionally, Bullet Kin will have assault rifles, in which c...

Hybrid: 0.679 (Dense: 0.588, Sparse: 0.817) Reranker Score: 0.751
Content: In the Portuguese translation of the game, they are known as "Balùnculo", a portmanteau of the words "bala" (bullet) and "homúnculo" (homunculus). Bullet Kin makes a playable appearance in the platform fighting games Indie Pogo and Indie Game Battle. Bullet Kin is also a crossover skin in the game R...

Hybrid: 0.550 (Dense: 0.463, Sparse: 0.681) Reranker Score: 0.557
Content: this quote "Balle au bois dormant" is also a wordplay between the fairytale "La belle au bois dormant" (Sleeping Beauty) and "Balle" (Bullet) Like its n

INFO:utils.context:Loading reranker model...


Precision: 0.11290322580645161
Recall: 0.175
F1: 0.1372549019607843
-----------------
doc id: 0
question: What kind of gun does the bandana bullet kin use?
answer: The bandana bullet kin wields a machine pistol.


INFO:utils.context:Unloading reranker model...
INFO:utils.context:Loading reranker model...


Top Results:
Hybrid: 0.511 (Dense: 0.499, Sparse: 0.530) Reranker Score: 0.785
Content: Bullet Kin
Bullet Kin are one of the most common enemies. They slowly walk towards the player, occasionally firing a single bullet. They can flip tables and use them as cover. They will also deal contact damage if the player touches them. Occasionally, Bullet Kin will have assault rifles, in which c...

Hybrid: 0.679 (Dense: 0.588, Sparse: 0.817) Reranker Score: 0.751
Content: In the Portuguese translation of the game, they are known as "Balùnculo", a portmanteau of the words "bala" (bullet) and "homúnculo" (homunculus). Bullet Kin makes a playable appearance in the platform fighting games Indie Pogo and Indie Game Battle. Bullet Kin is also a crossover skin in the game R...

Hybrid: 0.550 (Dense: 0.463, Sparse: 0.681) Reranker Score: 0.557
Content: this quote "Balle au bois dormant" is also a wordplay between the fairytale "La belle au bois dormant" (Sleeping Beauty) and "Balle" (Bullet) Like its n

INFO:utils.context:Unloading reranker model...
INFO:utils.context:Loading reranker model...


Top Results:
Hybrid: 0.511 (Dense: 0.499, Sparse: 0.530) Reranker Score: 0.785
Content: Bullet Kin
Bullet Kin are one of the most common enemies. They slowly walk towards the player, occasionally firing a single bullet. They can flip tables and use them as cover. They will also deal contact damage if the player touches them. Occasionally, Bullet Kin will have assault rifles, in which c...

Hybrid: 0.679 (Dense: 0.588, Sparse: 0.817) Reranker Score: 0.751
Content: In the Portuguese translation of the game, they are known as "Balùnculo", a portmanteau of the words "bala" (bullet) and "homúnculo" (homunculus). Bullet Kin makes a playable appearance in the platform fighting games Indie Pogo and Indie Game Battle. Bullet Kin is also a crossover skin in the game R...

Hybrid: 0.550 (Dense: 0.463, Sparse: 0.681) Reranker Score: 0.557
Content: this quote "Balle au bois dormant" is also a wordplay between the fairytale "La belle au bois dormant" (Sleeping Beauty) and "Balle" (Bullet) Like its n

INFO:utils.context:Unloading reranker model...
INFO:utils.context:Loading reranker model...


Top Results:
Hybrid: 0.511 (Dense: 0.499, Sparse: 0.530) Reranker Score: 0.785
Content: Bullet Kin
Bullet Kin are one of the most common enemies. They slowly walk towards the player, occasionally firing a single bullet. They can flip tables and use them as cover. They will also deal contact damage if the player touches them. Occasionally, Bullet Kin will have assault rifles, in which c...

Hybrid: 0.679 (Dense: 0.588, Sparse: 0.817) Reranker Score: 0.751
Content: In the Portuguese translation of the game, they are known as "Balùnculo", a portmanteau of the words "bala" (bullet) and "homúnculo" (homunculus). Bullet Kin makes a playable appearance in the platform fighting games Indie Pogo and Indie Game Battle. Bullet Kin is also a crossover skin in the game R...

Hybrid: 0.550 (Dense: 0.463, Sparse: 0.681) Reranker Score: 0.557
Content: this quote "Balle au bois dormant" is also a wordplay between the fairytale "La belle au bois dormant" (Sleeping Beauty) and "Balle" (Bullet) Like its n

INFO:utils.context:Unloading reranker model...
INFO:utils.context:Loading reranker model...


Top Results:
Hybrid: 0.511 (Dense: 0.499, Sparse: 0.530) Reranker Score: 0.785
Content: Bullet Kin
Bullet Kin are one of the most common enemies. They slowly walk towards the player, occasionally firing a single bullet. They can flip tables and use them as cover. They will also deal contact damage if the player touches them. Occasionally, Bullet Kin will have assault rifles, in which c...

Hybrid: 0.679 (Dense: 0.588, Sparse: 0.817) Reranker Score: 0.751
Content: In the Portuguese translation of the game, they are known as "Balùnculo", a portmanteau of the words "bala" (bullet) and "homúnculo" (homunculus). Bullet Kin makes a playable appearance in the platform fighting games Indie Pogo and Indie Game Battle. Bullet Kin is also a crossover skin in the game R...

Hybrid: 0.550 (Dense: 0.463, Sparse: 0.681) Reranker Score: 0.557
Content: this quote "Balle au bois dormant" is also a wordplay between the fairytale "La belle au bois dormant" (Sleeping Beauty) and "Balle" (Bullet) Like its n

INFO:utils.context:Unloading reranker model...
INFO:utils.context:Loading reranker model...


Top Results:
Hybrid: 0.511 (Dense: 0.499, Sparse: 0.530) Reranker Score: 0.785
Content: Bullet Kin
Bullet Kin are one of the most common enemies. They slowly walk towards the player, occasionally firing a single bullet. They can flip tables and use them as cover. They will also deal contact damage if the player touches them. Occasionally, Bullet Kin will have assault rifles, in which c...

Hybrid: 0.679 (Dense: 0.588, Sparse: 0.817) Reranker Score: 0.751
Content: In the Portuguese translation of the game, they are known as "Balùnculo", a portmanteau of the words "bala" (bullet) and "homúnculo" (homunculus). Bullet Kin makes a playable appearance in the platform fighting games Indie Pogo and Indie Game Battle. Bullet Kin is also a crossover skin in the game R...

Hybrid: 0.550 (Dense: 0.463, Sparse: 0.681) Reranker Score: 0.557
Content: this quote "Balle au bois dormant" is also a wordplay between the fairytale "La belle au bois dormant" (Sleeping Beauty) and "Balle" (Bullet) Like its n

INFO:utils.context:Unloading reranker model...
INFO:utils.context:Loading reranker model...


Top Results:
Hybrid: 0.511 (Dense: 0.499, Sparse: 0.530) Reranker Score: 0.785
Content: Bullet Kin
Bullet Kin are one of the most common enemies. They slowly walk towards the player, occasionally firing a single bullet. They can flip tables and use them as cover. They will also deal contact damage if the player touches them. Occasionally, Bullet Kin will have assault rifles, in which c...

Hybrid: 0.679 (Dense: 0.588, Sparse: 0.817) Reranker Score: 0.751
Content: In the Portuguese translation of the game, they are known as "Balùnculo", a portmanteau of the words "bala" (bullet) and "homúnculo" (homunculus). Bullet Kin makes a playable appearance in the platform fighting games Indie Pogo and Indie Game Battle. Bullet Kin is also a crossover skin in the game R...

Hybrid: 0.550 (Dense: 0.463, Sparse: 0.681) Reranker Score: 0.557
Content: this quote "Balle au bois dormant" is also a wordplay between the fairytale "La belle au bois dormant" (Sleeping Beauty) and "Balle" (Bullet) Like its n

INFO:utils.context:Unloading reranker model...
INFO:utils.context:Loading reranker model...


Top Results:
Hybrid: 0.511 (Dense: 0.499, Sparse: 0.530) Reranker Score: 0.785
Content: Bullet Kin
Bullet Kin are one of the most common enemies. They slowly walk towards the player, occasionally firing a single bullet. They can flip tables and use them as cover. They will also deal contact damage if the player touches them. Occasionally, Bullet Kin will have assault rifles, in which c...

Hybrid: 0.679 (Dense: 0.588, Sparse: 0.817) Reranker Score: 0.751
Content: In the Portuguese translation of the game, they are known as "Balùnculo", a portmanteau of the words "bala" (bullet) and "homúnculo" (homunculus). Bullet Kin makes a playable appearance in the platform fighting games Indie Pogo and Indie Game Battle. Bullet Kin is also a crossover skin in the game R...

Hybrid: 0.550 (Dense: 0.463, Sparse: 0.681) Reranker Score: 0.557
Content: this quote "Balle au bois dormant" is also a wordplay between the fairytale "La belle au bois dormant" (Sleeping Beauty) and "Balle" (Bullet) Like its n

INFO:utils.context:Unloading reranker model...
INFO:utils.context:Loading reranker model...


Top Results:
Hybrid: 0.511 (Dense: 0.499, Sparse: 0.530) Reranker Score: 0.785
Content: Bullet Kin
Bullet Kin are one of the most common enemies. They slowly walk towards the player, occasionally firing a single bullet. They can flip tables and use them as cover. They will also deal contact damage if the player touches them. Occasionally, Bullet Kin will have assault rifles, in which c...

Hybrid: 0.679 (Dense: 0.588, Sparse: 0.817) Reranker Score: 0.751
Content: In the Portuguese translation of the game, they are known as "Balùnculo", a portmanteau of the words "bala" (bullet) and "homúnculo" (homunculus). Bullet Kin makes a playable appearance in the platform fighting games Indie Pogo and Indie Game Battle. Bullet Kin is also a crossover skin in the game R...

Hybrid: 0.550 (Dense: 0.463, Sparse: 0.681) Reranker Score: 0.557
Content: this quote "Balle au bois dormant" is also a wordplay between the fairytale "La belle au bois dormant" (Sleeping Beauty) and "Balle" (Bullet) Like its n

INFO:utils.context:Unloading reranker model...


Top Results:
Hybrid: 0.511 (Dense: 0.499, Sparse: 0.530) Reranker Score: 0.785
Content: Bullet Kin
Bullet Kin are one of the most common enemies. They slowly walk towards the player, occasionally firing a single bullet. They can flip tables and use them as cover. They will also deal contact damage if the player touches them. Occasionally, Bullet Kin will have assault rifles, in which c...

Hybrid: 0.679 (Dense: 0.588, Sparse: 0.817) Reranker Score: 0.751
Content: In the Portuguese translation of the game, they are known as "Balùnculo", a portmanteau of the words "bala" (bullet) and "homúnculo" (homunculus). Bullet Kin makes a playable appearance in the platform fighting games Indie Pogo and Indie Game Battle. Bullet Kin is also a crossover skin in the game R...

Hybrid: 0.550 (Dense: 0.463, Sparse: 0.681) Reranker Score: 0.557
Content: this quote "Balle au bois dormant" is also a wordplay between the fairytale "La belle au bois dormant" (Sleeping Beauty) and "Balle" (Bullet) Like its n

KeyboardInterrupt: 