# Retrieval Evaluation

We'd like to compare several retrieval techniques to find the best for our use case.

The options are:
- lexical search
- semantic search
- a hybrid of lexical and semantic search

We'll use ground truth data obtained by having an LLM generate questions for each article of the constitution.

We'll run these questions through the retrieval process and use two metrics for evaluating the results:
- hit rate
- mean reciprocal rank


As a baseline, we'll use the tiny [minsearch][0] library, which implements simple lexical search using [TF-IDF][1].

[0]: https://github.com/alexeygrigorev/minsearch/blob/main/minsearch.py
[1]: https://en.wikipedia.org/wiki/Tf%E2%80%93idf
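
To make the idea behind TF-IDF ranking concrete, here is a minimal pure-Python sketch. It is not how minsearch is implemented internally; the toy corpus, the smoothed IDF variant, and the helper names (`tokenize`, `idf`, `tfidf_score`, `rank`) are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Toy corpus: illustrative strings, not taken from the constitution dataset.
docs = [
    "sovereign power belongs to the people",
    "the constitution is the supreme law",
    "parliament enacts law for the people",
]

def tokenize(text):
    # Naive whitespace tokenizer; real libraries also handle punctuation and stop words.
    return text.lower().split()

tokenized = [tokenize(d) for d in docs]
n_docs = len(tokenized)

# Document frequency: the number of documents that contain each term.
doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

def idf(term):
    # Smoothed inverse document frequency (one of several common variants).
    return math.log((1 + n_docs) / (1 + doc_freq[term])) + 1

def tfidf_score(query, tokens):
    counts = Counter(tokens)
    # Sum tf * idf over the query terms; terms absent from the document add 0.
    return sum((counts[t] / len(tokens)) * idf(t) for t in tokenize(query))

def rank(query):
    # Return document indices ordered by descending score, dropping zero scores.
    scored = [(tfidf_score(query, tokens), i) for i, tokens in enumerate(tokenized)]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in ordered if score > 0]

print(rank("sovereign power"))
print(rank("law people"))
```

Documents that share more of the query's terms score higher, and terms that appear in fewer documents contribute more per occurrence.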

## Data preparation

In [12]:
pip install -q pandas tqdm

Note: you may need to restart the kernel to use updated packages.


In [2]:
import pandas as pd

Load previously generated ground truth data:

In [3]:
df_ground_truth = pd.read_csv("ground_truth_data.csv")
df_ground_truth

,question,article_number
0,Who holds all sovereign power in Kenya accordi...,1
1,How can the people of Kenya exercise their sov...,1
2,To which state organs is sovereign power deleg...,1
3,At which levels is the sovereign power of the ...,1
4,What makes this Constitution the ultimate auth...,2
...,...,...
1312,What was the previous constitution that was in...,264
1313,When did the previous constitution cease to be...,264
1314,What happened to the previous constitution on ...,264
1315,What is the Sixth Schedule in relation to the ...,264


It has two columns: the question and the associated article number. There are several questions for each article.

Next, we load the articles:

In [4]:
![ ! -f constitution.json ] && wget https://raw.githubusercontent.com/programmer-ke/constitution_kenya/refs/heads/master/json/ConstitutionKenya2010.json -O constitution.json

In [5]:
import json

with open('constitution.json', 'rt') as f:
    articles = json.load(f)

documents = []
for article in articles:
    
    article_text = "".join(article['lines'])
    article_title = f"Article {article['number']}: {article['title']}"
    chapter_number, chapter_title = article['chapter']
    chapter_text = f"Chapter {chapter_number}: {chapter_title}"
    part_text = ""
    
    if article['part']:
        part_num, part_title = article['part']
        part_text = f'Part {part_num}: {part_title}'
        
    documents.append({
        "title": article_title,
        "clauses": article_text,
        "chapter": chapter_text,
        "part": part_text,
        "number": article['number']
    })

In [6]:
documents[:2]

[{'title': 'Article 1: Sovereignty of the people.',
  'clauses': '(1)  All sovereign power belongs to the people of Kenya and shall be exercised\nonly in accordance with this Constitution.\n(2)  The people may exercise their sovereign power either directly or through their\ndemocratically elected representatives.\n(3)  Sovereign power under this Constitution is delegated to the following State\norgans, which shall perform their functions in accordance with this Constitution—\n(a) Parliament and the legislative assemblies in the county governments;\n(b) the national executive and the executive structures in the county\ngovernments; and\n(c) the Judiciary and independent tribunals.\n(4)  The sovereign power of the people is exercised at—\n(a) the national level; and\n(b) the county level.\n',
  'chapter': 'Chapter 1: SOVEREIGNTY OF THE PEOPLE AND SUPREMACY OF THIS CONSTITUTION',
  'part': '',
  'number': 1},
 {'title': 'Article 2: Supremacy of this Constitution.',
  'clauses': '(1)  This

In [7]:
ground_truth = df_ground_truth.to_dict(orient='records')
ground_truth[:3]

[{'question': 'Who holds all sovereign power in Kenya according to this Constitution?',
  'article_number': 1},
 {'question': 'How can the people of Kenya exercise their sovereign power?',
  'article_number': 1},
 {'question': 'To which state organs is sovereign power delegated?',
  'article_number': 1}]

## Baseline with minsearch

Get minsearch:

In [8]:
![ ! -f minsearch.py ] && wget https://raw.githubusercontent.com/alexeygrigorev/minsearch/refs/heads/main/minsearch.py -O minsearch.py

Next, index the documents for search:

In [9]:
import minsearch

index = minsearch.Index(text_fields=['title', 'clauses', 'chapter', 'part'], keyword_fields=[])
index.fit(documents)

<minsearch.Index at 0x7f67225b58b0>

In [10]:
def search(query):
    return index.search(query, num_results=5)

Test search with the first question:

In [11]:
search(ground_truth[0]['question'])

[{'title': 'Article 2: Supremacy of this Constitution.',
  'clauses': '(1)  This Constitution is the supreme law of the Republic and binds all persons\nand all State organs at both levels of government.\n(2)  No person may claim or exercise State authority except as authorised under\nthis Constitution.\n(3)  The validity or legality of this Constitution is not subject to challenge by or\nbefore any court or other State organ.\n(4)  Any law, including customary law, that is inconsistent with this Constitution\nis void to the extent of the inconsistency, and any act or omission in contravention\nof this Constitution is invalid.\n(5)  The general rules of international law shall form part of the law of Kenya.\n(6)  Any treaty or convention ratified by Kenya shall form part of the law of Kenya\nunder this Constitution.\n',
  'chapter': 'Chapter 1: SOVEREIGNTY OF THE PEOPLE AND SUPREMACY OF THIS CONSTITUTION',
  'part': '',
  'number': 2},
 {'title': 'Article 255: Amendment of this Constitu

We'll record the relevance of each result set by marking which of the results, if any, matches the article given in the ground truth:

In [15]:
from tqdm import tqdm
relevance = []

for q in tqdm(ground_truth):
    results = search(q['question'])
    hits = [result['number'] == q['article_number'] for result in results]
    relevance.append(hits)

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1317/1317 [00:05<00:00, 261.35it/s]


The relevance lists for the first few questions show where, if at all, the expected article appears among the five results:

In [17]:
relevance[:7]

[[False, False, False, True, False],
 [True, False, False, False, False],
 [False, False, True, False, False],
 [True, False, False, False, False],
 [True, False, False, False, False],
 [False, False, False, False, False],
 [True, False, False, False, False]]

### Hit Rate

The hit rate is the fraction of questions for which the correct article appears anywhere in the first five results.

In [18]:
def hit_rate(relevance):
    target_found = [any(results) for results in relevance]
    hit_count = sum(target_found)
    return hit_count / len(relevance)

In [19]:
hit_rate(relevance[:7])

0.8571428571428571

That is a hit rate of about 86% for the first 7 questions (6 of the 7 have the correct article in the top five).
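
As a cross-check of that figure, the seven relevance rows shown earlier can be run through the same calculation. The sketch below also truncates each row to estimate hit rate at smaller cutoffs; `hit_rate_at_k` is a hypothetical helper and is not part of the notebook.

```python
# The seven relevance rows displayed earlier in this notebook.
sample_relevance = [
    [False, False, False, True, False],
    [True, False, False, False, False],
    [False, False, True, False, False],
    [True, False, False, False, False],
    [True, False, False, False, False],
    [False, False, False, False, False],
    [True, False, False, False, False],
]

def hit_rate_at_k(relevance, k):
    # A question counts as a hit if the correct article is in its top-k results.
    hits = sum(any(row[:k]) for row in relevance)
    return hits / len(relevance)

for k in (1, 3, 5):
    print(k, hit_rate_at_k(sample_relevance, k))
```

With `k=5` this reproduces the 6-out-of-7 figure; smaller cutoffs show how many of those hits sit near the top of the list.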

### Mean Reciprocal Rank

MRR shows how highly ranked the expected response is. The higher it is ranked in the set of results, the greater the score.

The reciprocal rank is calculated from the position of the correct response in the set of results, as 1/N where N is the position. First position gives 1, second 1/2, third 1/3, and so on. If the correct response is not in the results at all, the reciprocal rank is 0. The scores are then averaged over all the questions.

In [22]:
def rr(question_results):
    # Reciprocal rank: 1 / (1-based position of the first hit), or 0 if no hit.
    for i, hit in enumerate(question_results):
        if hit:
            return 1/(i + 1)
    return 0

def mrr(relevance):
    scores = [rr(hits) for hits in relevance]
    total = sum(scores)
    return total / len(relevance)

Calculating MRR for the first few results:

In [24]:
mrr(relevance[:7])

0.6547619047619048
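
The same value can be worked out by hand from the seven relevance rows shown earlier. The sketch below does the arithmetic with exact fractions; `first_hit_positions` is an illustrative name, with the positions read off those rows.

```python
from fractions import Fraction

# 1-based position of the correct article in each of the seven rows;
# None means it was not among the five results.
first_hit_positions = [4, 1, 3, 1, 1, None, 1]

# Reciprocal rank per question: 1/position, or 0 when there is no hit.
reciprocal_ranks = [
    Fraction(1, pos) if pos is not None else Fraction(0)
    for pos in first_hit_positions
]

# (1/4 + 1 + 1/3 + 1 + 1 + 0 + 1) / 7
mrr_exact = sum(reciprocal_ranks) / len(reciprocal_ranks)
print(mrr_exact, float(mrr_exact))
```

The exact result is 55/84, which agrees with the floating-point value above.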

Calculating the two metrics over all questions:

In [25]:
def score(relevance):
    return {'hit_rate': hit_rate(relevance), 'mrr': mrr(relevance)}

In [26]:
score(relevance)

{'hit_rate': 0.5535307517084282, 'mrr': 0.41580612503163755}

## Lexical Search with Whoosh

[Whoosh][5] is a pure-Python, embeddable search library with many useful features. It strikes a balance between the barebones minsearch library used previously and heavyweight search solutions like Elasticsearch.

[5]: https://github.com/mchaput/whoosh