# Retrieval Evaluation using Haystack

## Statistical Evaluation
In this first part, we run a statistical evaluation on the results returned by the retriever.

### Pipeline Definition
First we define the pipeline. Here we build a retrieval pipeline that fetches data from MongoDB Atlas.

In [1]:
from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack_integrations.components.retrievers.mongodb_atlas import MongoDBAtlasEmbeddingRetriever
from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore


In [2]:
%env MONGO_CONNECTION_STRING=mongodb+srv://user_dibimbing:gasterus@cluster0.zse9okn.mongodb.net/?retryWrites=true&w=majority&appName=Cluster0

env: MONGO_CONNECTION_STRING=mongodb+srv://user_dibimbing:gasterus@cluster0.zse9okn.mongodb.net/?retryWrites=true&w=majority&appName=Cluster0


In [3]:
document_store = MongoDBAtlasDocumentStore(
    database_name="dibimbing",
    collection_name="context_qa",
    vector_search_index="vector_index",
)

In [4]:
pipeline = Pipeline()
# Embed the query text, then use the embedding to search the vector index
pipeline.add_component("embedder", SentenceTransformersTextEmbedder())
pipeline.add_component(
    "retriever",
    MongoDBAtlasEmbeddingRetriever(document_store=document_store, top_k=10),
)
pipeline.connect("embedder", "retriever")

<haystack.core.pipeline.pipeline.Pipeline object at 0x000001B3BAD526D0>
🚅 Components
  - embedder: SentenceTransformersTextEmbedder
  - retriever: MongoDBAtlasEmbeddingRetriever
🛤️ Connections
  - embedder.embedding -> retriever.query_embedding (List[float])

### Load Dataset
Next we load the dataset used for evaluation. Here we use the Stanford Question Answering Dataset (SQuAD), a dataset of questions, contexts, and answers built from Wikipedia articles.  
Source: https://rajpurkar.github.io/SQuAD-explorer/
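As a rough illustration of the layout described above, the sketch below builds a tiny, made-up record with the same nesting as a SQuAD file (`data` → `paragraphs` → `qas` → `answers`) and walks it. The sample text and the helper name `iter_first_qa` are hypothetical and are not part of the dataset or of Haystack.

```python
# Made-up record that mirrors the nesting of a SQuAD JSON file
sample = {
    "data": [
        {
            "title": "Example_Article",
            "paragraphs": [
                {
                    "context": "Haystack is an open-source framework.",
                    "qas": [
                        {
                            "id": "q1",
                            "question": "What is Haystack?",
                            "answers": [
                                {"text": "an open-source framework", "answer_start": 12}
                            ],
                        }
                    ],
                }
            ],
        }
    ]
}


def iter_first_qa(dataset):
    """Yield (context, question, answer) for the first question of each paragraph."""
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            if not paragraph["qas"]:
                continue  # skip paragraphs without questions
            qa = paragraph["qas"][0]
            # Some SQuAD variants contain questions with no answers
            answer = qa["answers"][0]["text"] if qa["answers"] else None
            yield paragraph["context"], qa["question"], answer


rows = list(iter_first_qa(sample))
print(rows[0][1])
```

Taking only the first question per paragraph keeps one question per context, so the question and context lists stay aligned by index.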

In [5]:
import json
with open("datasets/qa.json","r") as f:
    dataset = json.load(f)

In [6]:
from haystack import Document

questions = []
answers = []
contexts = []
for data in dataset["data"]:
    for p in data["paragraphs"]:
        # One Document per paragraph; it serves as the ground-truth context
        doc = Document(content=p["context"])
        contexts.append(doc)
        for qa in p["qas"]:
            questions.append(qa["question"])
            answers.append(qa["answers"][0]["text"])
            # Keep only the first question so questions[i] matches contexts[i]
            break

In [7]:
selected_questions = questions[200:250]
selected_answers = answers[200:250]
selected_contexts = contexts[200:250]

### Retrieve Data

In [8]:
def get_result(questions):
    results = []
    for q in questions:
        result = pipeline.run({"embedder":{"text":q}})
        results.append(result["retriever"]["documents"])
    return results

In [9]:
results = get_result(selected_questions)


### Recall Evaluations

In [10]:
from haystack.components.evaluators import DocumentRecallEvaluator

In [11]:
recall_evaluator = DocumentRecallEvaluator()

In [12]:
recall_result = recall_evaluator.run(
    ground_truth_documents=[[s] for s in selected_contexts],
    retrieved_documents=results
)

In [13]:
print(f"Score: {recall_result['score']}")

Score: 0.98
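To make the number above easier to read, here is a plain-Python sketch of how a recall score of this kind can be computed. It is an approximation under stated assumptions, not the library's implementation: documents are compared by their text, and the two helper names are hypothetical. With a single ground-truth context per question, both variants return either 0 or 1 for each question.

```python
def single_hit_recall(ground_truth, retrieved):
    """Return 1.0 if at least one ground-truth text was retrieved, else 0.0."""
    return float(any(doc in retrieved for doc in ground_truth))


def multi_hit_recall(ground_truth, retrieved):
    """Return the fraction of distinct ground-truth texts that were retrieved."""
    truth = set(ground_truth)
    return len(truth & set(retrieved)) / len(truth)


# Hypothetical toy data: one ground-truth context per question
truths = [["ctx A"], ["ctx B"]]
hits = [["ctx X", "ctx A"], ["ctx Y", "ctx Z"]]

per_query = [single_hit_recall(t, r) for t, r in zip(truths, hits)]
mean_recall = sum(per_query) / len(per_query)
print(per_query, mean_recall)
```

Under this reading, a score of 0.98 over 50 questions corresponds to 49 questions whose ground-truth context appears among the top 10 retrieved documents, and one that does not.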


In [14]:
import numpy as np
individual_scores = np.array(recall_result["individual_scores"])

In [15]:
np.argwhere(individual_scores==0)

array([[47]], dtype=int64)

In [16]:
results_data_check = [d.content for d in results[47]]

In [17]:
results_data_check

['The next major step occurred when James Watt developed (1763–1775) an improved version of Newcomen\'s engine, with a separate condenser. Boulton and Watt\'s early engines used half as much coal as John Smeaton\'s improved version of Newcomen\'s. Newcomen\'s and Watt\'s early engines were "atmospheric". They were powered by air pressure pushing a piston into the partial vacuum generated by condensing steam, instead of the pressure of expanding steam. The engine cylinders had to be large because the only usable force acting on them was due to atmospheric pressure.',
 "In 1781 James Watt patented a steam engine that produced continuous rotary motion. Watt's ten-horsepower engines enabled a wide range of manufacturing machinery to be powered. The engines could be sited anywhere that water and coal or wood fuel could be obtained. By 1883, engines that could provide 10,000 hp had become feasible. The stationary steam engine was a key component of the Industrial Revolution, allowing factori

In [18]:
selected_contexts[47].content

'The centrifugal governor was adopted by James Watt for use on a steam engine in 1788 after Watt’s partner Boulton saw one at a flour mill Boulton & Watt were building. The governor could not actually hold a set speed, because it would assume a new constant speed in response to load changes. The governor was able to handle smaller variations such as those caused by fluctuating heat load to the boiler. Also, there was a tendency for oscillation whenever there was a speed change. As a consequence, engines equipped only with this governor were not suitable for operations requiring constant speed, such as cotton spinning. The governor was improved over time and coupled with variable steam cut off, good speed control in response to changes in load was attainable near the end of the 19th century.'

### MRR Evaluations

In [19]:
from haystack.components.evaluators import DocumentMRREvaluator
MRR_evaluator = DocumentMRREvaluator()
MRR_result = MRR_evaluator.run(
    ground_truth_documents=[[s] for s in selected_contexts],
    retrieved_documents=results
)

In [20]:
print(f"MRR Score: {MRR_result['score']}")

MRR Score: 0.7011111111111111
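The MRR score is the mean of per-question reciprocal ranks: 1 divided by the position of the first relevant document, or 0 if none was retrieved. The sketch below shows that calculation on toy data. The function name and the sample strings are hypothetical, and documents are compared by text for simplicity.

```python
def reciprocal_rank(ground_truth, retrieved):
    """Return 1/rank of the first retrieved item found in ground_truth, else 0.0."""
    truth = set(ground_truth)
    for rank, doc in enumerate(retrieved, start=1):
        if doc in truth:
            return 1.0 / rank
    return 0.0


# Hypothetical toy data: hit at rank 1, hit at rank 2, and a miss
truths = [["ctx A"], ["ctx B"], ["ctx C"]]
ranked = [["ctx A", "ctx X"], ["ctx X", "ctx B"], ["ctx X", "ctx Y"]]

rr_scores = [reciprocal_rank(t, r) for t, r in zip(truths, ranked)]
mrr = sum(rr_scores) / len(rr_scores)
print(rr_scores, mrr)
```

Read this way, an individual score of 0.5 means the ground-truth context was ranked second, and 0.1 means it was ranked tenth.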


In [21]:
MRR_individual_scores = np.array(MRR_result["individual_scores"])

In [22]:
MRR_individual_scores

array([0.5       , 1.        , 0.5       , 0.25      , 1.        ,
       1.        , 0.33333333, 1.        , 1.        , 1.        ,
       1.        , 1.        , 0.33333333, 0.25      , 0.11111111,
       0.1       , 0.5       , 0.5       , 1.        , 1.        ,
       1.        , 1.        , 1.        , 0.5       , 0.2       ,
       1.        , 1.        , 0.11111111, 1.        , 1.        ,
       1.        , 1.        , 1.        , 0.5       , 1.        ,
       0.5       , 1.        , 0.5       , 0.5       , 0.5       ,
       0.16666667, 0.5       , 1.        , 0.5       , 1.        ,
       1.        , 1.        , 0.        , 1.        , 0.2       ])

In [23]:
np.argwhere(MRR_individual_scores==0.5)

array([[ 0],
       [ 2],
       [16],
       [17],
       [23],
       [33],
       [35],
       [37],
       [38],
       [39],
       [41],
       [43]], dtype=int64)

In [24]:
results_data_check = [d.content for d in results[16]]
results_data_check

['Around 1685, Huguenot refugees found a safe haven in the Lutheran and Reformed states in Germany and Scandinavia. Nearly 50,000 Huguenots established themselves in Germany, 20,000 of whom were welcomed in Brandenburg-Prussia, where they were granted special privileges (Edict of Potsdam) and churches in which to worship (such as the Church of St. Peter and St. Paul, Angermünde) by Frederick William, Elector of Brandenburg and Duke of Prussia. The Huguenots furnished two new regiments of his army: the Altpreußische Infantry Regiments No. 13 (Regiment on foot Varenne) and 15 (Regiment on foot Wylich). Another 4,000 Huguenots settled in the German territories of Baden, Franconia (Principality of Bayreuth, Principality of Ansbach), Landgraviate of Hesse-Kassel, Duchy of Württemberg, in the Wetterau Association of Imperial Counts, in the Palatinate and Palatinate-Zweibrücken, in the Rhine-Main-Area (Frankfurt), in modern-day Saarland; and 1,500 found refuge in Hamburg, Bremen and Lower Sax

In [25]:
selected_contexts[16].content

'Frederick William, Elector of Brandenburg, invited Huguenots to settle in his realms, and a number of their descendants rose to positions of prominence in Prussia. Several prominent German military, cultural, and political figures were ethnic Huguenot, including poet Theodor Fontane, General Hermann von François, the hero of the First World War Battle of Tannenberg, Luftwaffe General and fighter ace Adolf Galland, Luftwaffe flying ace Hans-Joachim Marseille, and famed U-boat captain Lothar von Arnauld de la Perière. The last Prime Minister of the (East) German Democratic Republic, Lothar de Maizière, is also a descendant of a Huguenot family, as is the German Federal Minister of the Interior, Thomas de Maizière.'

### MAP Evaluations

In [26]:
from haystack.components.evaluators import DocumentMAPEvaluator
MAP_evaluator = DocumentMAPEvaluator()
MAP_result = MAP_evaluator.run(
    ground_truth_documents=[[s] for s in selected_contexts],
    retrieved_documents=results
)

In [27]:
print(f"MAP Score: {MAP_result['score']}")

MAP Score: 0.7011111111111111


In [28]:
MAP_individual_scores = np.array(MAP_result["individual_scores"])

In [29]:
MAP_individual_scores

array([0.5       , 1.        , 0.5       , 0.25      , 1.        ,
       1.        , 0.33333333, 1.        , 1.        , 1.        ,
       1.        , 1.        , 0.33333333, 0.25      , 0.11111111,
       0.1       , 0.5       , 0.5       , 1.        , 1.        ,
       1.        , 1.        , 1.        , 0.5       , 0.2       ,
       1.        , 1.        , 0.11111111, 1.        , 1.        ,
       1.        , 1.        , 1.        , 0.5       , 1.        ,
       0.5       , 1.        , 0.5       , 0.5       , 0.5       ,
       0.16666667, 0.5       , 1.        , 0.5       , 1.        ,
       1.        , 1.        , 0.        , 1.        , 0.2       ])
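The MAP scores are identical to the MRR scores, which is expected when every question has exactly one ground-truth document. The sketch below uses one common definition of average precision (the mean of precision@k over the ranks where a relevant document appears) to illustrate why. The function name and sample strings are hypothetical, and the evaluator's exact implementation may differ in details.

```python
def average_precision(ground_truth, retrieved):
    """Mean of precision@k over the ranks k where a relevant item appears."""
    truth = set(ground_truth)
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(retrieved, start=1):
        if doc in truth:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# One relevant document at rank 2: AP = (1/2) / 1, the same as the reciprocal rank
single = average_precision(["ctx A"], ["ctx X", "ctx A", "ctx Y"])

# Two relevant documents at ranks 1 and 3: AP = (1/1 + 2/3) / 2
double = average_precision(["ctx A", "ctx B"], ["ctx A", "ctx X", "ctx B"])
print(single, double)
```

The two metrics only diverge when a question has several relevant documents, which is not the case in this setup.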

## Model-based Evaluation

### Context Relevance Evaluator

In [None]:
%env OPENAI_API_KEY=insert_your_token

In [31]:
from haystack.components.evaluators import ContextRelevanceEvaluator
CR_evaluator = ContextRelevanceEvaluator()
CR_result = CR_evaluator.run(
    questions=selected_questions,  # one question string per query
    contexts=[[doc.content for doc in docs] for docs in results],
)

100%|██████████████████████████████████████████████████████████████████████████████████| 50/50 [03:10<00:00,  3.81s/it]


In [32]:
CR_result

{'results': [{'statements': ['In 1562, naval officer Jean Ribault led an expedition that explored Florida and the present-day Southeastern U.S., and founded the outpost of Charlesfort on Parris Island, South Carolina.',
    'In the early years, many Huguenots also settled in the area of present-day Charleston, South Carolina. In 1685, Rev. Elie Prioleau from the town of Pons in France, was among the first to settle there. He became pastor of the first Huguenot church in North America in that city.',
    'In 1565 the Spanish decided to enforce their claim to La Florida, and sent Pedro Menéndez de Avilés, who established the settlement of St. Augustine near Fort Caroline.',
    'In 1564 a group of Norman Huguenots under the leadership of Jean Ribault established the small colony of Fort Caroline on the banks of the St. Johns River in what is today Jacksonville, Florida.',
    'The first Huguenots to leave France sought freedom from persecution in Switzerland and the Netherlands.'],
   's

In [33]:
print(f"Context Relevance Score: {CR_result['score']}")

Context Relevance Score: 0.8811666666666667


In [34]:
CR_result['individual_scores']

[0.8,
 1.0,
 0.6666666666666666,
 1.0,
 1.0,
 0.8333333333333334,
 1.0,
 1.0,
 1.0,
 0.6666666666666666,
 0.5,
 1.0,
 1.0,
 0.875,
 1.0,
 1.0,
 1.0,
 0.6666666666666666,
 0.5,
 1.0,
 1.0,
 0.6666666666666666,
 1.0,
 1.0,
 1.0,
 0.9,
 1.0,
 1.0,
 1.0,
 1.0,
 0.3333333333333333,
 1.0,
 0.6,
 0.5,
 1.0,
 1.0,
 1.0,
 0.75,
 1.0,
 0.5,
 1.0,
 0.75,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 0.75,
 0.8,
 1.0]
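One way to follow up on these numbers is to list the questions whose retrieved context scored lowest, so they can be inspected by hand. The sketch below does this for the first ten individual scores shown above. The `threshold` value is a hypothetical cut-off chosen for illustration only.

```python
# First ten individual scores copied from the output above
scores = [
    0.8, 1.0, 0.6666666666666666, 1.0, 1.0,
    0.8333333333333334, 1.0, 1.0, 1.0, 0.6666666666666666,
]

threshold = 0.7  # hypothetical cut-off for "needs a closer look"

# Collect (index, score) pairs below the cut-off, lowest score first
low = [(i, s) for i, s in enumerate(scores) if s < threshold]
low.sort(key=lambda pair: pair[1])

for index, score in low:
    print(f"question {index}: {score:.2f}")
```

The resulting indices can then be used to look up the matching question and retrieved contexts, as was done for the recall and MRR checks earlier.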