# Offline RAG Evaluation

We'd like to evaluate how well our RAG answers questions on the Kenyan Constitution. We already have synthetic evaluation data generated by an LLM. For each article of the constitution, we generated questions for which that article is the answer. To evaluate the RAG, we'll use two measures:

- The cosine similarity between the expected answer (relevant article) and the RAG response
- LLM as judge: We'll provide the LLM a question, the expected answer and the generated response, and ask it whether the generated response is relevant.
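
To make the first measure concrete, here is a minimal sketch of cosine similarity on plain Python lists. It assumes both texts have already been turned into embedding vectors by some model. The toy vectors and the `cosine_similarity` helper name are illustrative and not part of the original notebook.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)

# Toy vectors standing in for embeddings (assumption: real ones come from a model)
expected = [1.0, 0.0, 1.0]
generated = [1.0, 0.0, 1.0]
unrelated = [0.0, 1.0, 0.0]

print(cosine_similarity(expected, generated))  # close to 1: same direction
print(cosine_similarity(expected, unrelated))  # 0: orthogonal vectors
```

Values near 1 suggest the generated response is close in meaning to the expected article, while values near 0 suggest it is unrelated.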

## Data preparation

In [1]:
pip install -q pandas tqdm

Note: you may need to restart the kernel to use updated packages.


In [2]:
import pandas as pd

Load previously generated synthetic evaluation data:

In [3]:
![ -f rag_evaluation_data.csv ] || wget https://raw.githubusercontent.com/programmer-ke/katiba-chat/refs/heads/master/notebooks/rag_evaluation_data.csv

In [4]:
df_eval_data = pd.read_csv("rag_evaluation_data.csv")
df_eval_data

Unnamed: 0,question,article_number
0,Who holds all sovereign power in Kenya accordi...,1
1,How can the people of Kenya exercise their sov...,1
2,To which state organs is sovereign power deleg...,1
3,At which levels is the sovereign power of the ...,1
4,What makes this Constitution the ultimate auth...,2
...,...,...
1312,What was the previous constitution that was in...,264
1313,When did the previous constitution cease to be...,264
1314,What happened to the previous constitution on ...,264
1315,What is the Sixth Schedule in relation to the ...,264


It has two columns, a question and the associated article number that indicates the expected response. Each article has a number of associated questions.
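
As a quick sanity check on that structure, the number of questions per article can be counted. The sketch below uses toy records in the same shape as the evaluation data; the values are made up for illustration.

```python
from collections import Counter

# Toy records in the same shape as the evaluation data (values are illustrative)
records = [
    {"question": "q1", "article_number": 1},
    {"question": "q2", "article_number": 1},
    {"question": "q3", "article_number": 2},
]

# Map each article number to the number of questions that target it
questions_per_article = Counter(r["article_number"] for r in records)
print(questions_per_article)
```

On the actual DataFrame, `df_eval_data['article_number'].value_counts()` should give the same kind of summary.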

Next, we load the articles:

In [5]:
![ ! -f constitution.json ] && wget https://raw.githubusercontent.com/programmer-ke/constitution_kenya/refs/heads/master/json/ConstitutionKenya2010.json -O constitution.json

In [6]:
import json

with open('constitution.json', 'rt') as f:
    articles = json.load(f)

documents = []
for article in articles:
    
    article_text = "".join(article['lines'])
    article_title = f"Article {article['number']}: {article['title']}"
    chapter_number, chapter_title = article['chapter']
    chapter_text = f"Chapter {chapter_number}: {chapter_title}"
    part_text = ""
    
    if article['part']:
        part_num, part_title = article['part']
        part_text = f'Part {part_num}: {part_title}'
        
    documents.append({
        "title": article_title,
        "clauses": article_text,
        "chapter": chapter_text,
        "part": part_text,
        "number": article['number']
    })

In [7]:
documents[:2]

[{'title': 'Article 1: Sovereignty of the people.',
  'clauses': '(1)  All sovereign power belongs to the people of Kenya and shall be exercised\nonly in accordance with this Constitution.\n(2)  The people may exercise their sovereign power either directly or through their\ndemocratically elected representatives.\n(3)  Sovereign power under this Constitution is delegated to the following State\norgans, which shall perform their functions in accordance with this Constitution—\n(a) Parliament and the legislative assemblies in the county governments;\n(b) the national executive and the executive structures in the county\ngovernments; and\n(c) the Judiciary and independent tribunals.\n(4)  The sovereign power of the people is exercised at—\n(a) the national level; and\n(b) the county level.\n',
  'chapter': 'Chapter 1: SOVEREIGNTY OF THE PEOPLE AND SUPREMACY OF THIS CONSTITUTION',
  'part': '',
  'number': 1},
 {'title': 'Article 2: Supremacy of this Constitution.',
  'clauses': '(1)  This

In [8]:
evaluation_data = df_eval_data.to_dict(orient='records')
evaluation_data[:3]

[{'question': 'Who holds all sovereign power in Kenya according to this Constitution?',
  'article_number': 1},
 {'question': 'How can the people of Kenya exercise their sovereign power?',
  'article_number': 1},
 {'question': 'To which state organs is sovereign power delegated?',
  'article_number': 1}]

## Retrieval

We'll use hybrid retrieval for this evaluation. This is a combination of lexical and semantic search. For more on retrieval techniques, check out [this notebook][3].

[3]: https://github.com/programmer-ke/coding-katas/blob/master/ipynb/search_swahili.ipynb

### Lexical Search

[Whoosh][5] is a pure-Python, embeddable lexical search library.

Install it:

[5]: https://github.com/Sygil-Dev/whoosh-reloaded

In [9]:
!pip install -q whoosh-reloaded

Create an index:

In [10]:
from pathlib import Path

from tqdm import tqdm
from whoosh import fields as F
from whoosh import index
from whoosh import qparser

schema = F.Schema(
    title=F.TEXT(stored=True),
    clauses=F.TEXT(stored=True),
    chapter=F.TEXT(stored=True),
    part=F.TEXT(stored=True),
    number=F.STORED,
)

index_dirname = "whoosh_index"
p = Path(index_dirname)
p.mkdir(exist_ok=True)  # create the index directory if it is missing

doc_index = index.create_in(index_dirname, schema)
writer = doc_index.writer()
for doc in tqdm(documents):
    writer.add_document(**doc)
writer.commit()

100%|██████████████████████████████████████████████████████████████| 264/264 [00:00<00:00, 2079.37it/s]


Create a search function:

In [11]:
def whoosh_search(question):
    with doc_index.searcher() as searcher:
        parser = qparser.MultifieldParser(['title', 'clauses', 'chapter', 'part'], schema=schema, group=qparser.OrGroup)
        query = parser.parse(question)
        results = searcher.search(query, limit=5)
        results = [dict(r) for r in results]
    return results

In [12]:
sample_question = evaluation_data[0]['question']
sample_question

'Who holds all sovereign power in Kenya according to this Constitution?'

In [13]:
whoosh_search(sample_question)

[{'chapter': 'Chapter 1: SOVEREIGNTY OF THE PEOPLE AND SUPREMACY OF THIS CONSTITUTION',
  'clauses': '(1)  All sovereign power belongs to the people of Kenya and shall be exercised\nonly in accordance with this Constitution.\n(2)  The people may exercise their sovereign power either directly or through their\ndemocratically elected representatives.\n(3)  Sovereign power under this Constitution is delegated to the following State\norgans, which shall perform their functions in accordance with this Constitution—\n(a) Parliament and the legislative assemblies in the county governments;\n(b) the national executive and the executive structures in the county\ngovernments; and\n(c) the Judiciary and independent tribunals.\n(4)  The sovereign power of the people is exercised at—\n(a) the national level; and\n(b) the county level.\n',
  'number': 1,
  'part': '',
  'title': 'Article 1: Sovereignty of the people.'},
 {'chapter': 'Chapter 14: NATIONAL SECURITY',
  'clauses': '(1)  There are estab

### Semantic Search

Vector DB is a lightweight embeddings-based vector search DB that we'll use for semantic search.

Let's install it:

In [14]:
!pip install -q vectordb2==0.1.2 --extra-index-url https://download.pytorch.org/whl/cpu
!pip install -q spacy
!python -c "import spacy; spacy.load('en_core_web_sm')" || python -m spacy download en_core_web_sm

And next, transform the documents into embeddings.

[7]: https://huggingface.co/BAAI/bge-small-en-v1.5

In [15]:
texts = [
    """
    chapter: {chapter}
    part: {part}
    title: {title}
    clauses: {clauses}
    """.strip().format(**doc) for doc in documents
]

In [16]:
print(texts[20])

chapter: Chapter 4: THE BILL OF RIGHTS
    part: Part 1: GENERAL PROVISIONS TO THE BILL OF RIGHTS
    title: Article 21: Implementation of rights and fundamental freedoms.
    clauses: (1)  It is a fundamental duty of the State and every State organ to observe,
respect, protect, promote and fulfil the rights and fundamental freedoms in the Bill
of Rights.
(2)  The State shall take legislative, policy and other measures, including the
setting of standards, to achieve the progressive realisation of the rights guaranteed
under Article 43.
(3)  All State organs and all public officers have the duty to address the needs
of vulnerable groups within society, including women, older members of society,
persons with disabilities, children, youth, members of minority or marginalised
communities, and members of particular ethnic, religious or cultural communities.
(4)  The State shall enact and implement legislation to fulfil its international
obligations in respect of human rights and fundamental

In [17]:
from vectordb import Memory
from pathlib import Path

path = Path('vectordb_memfile')
if path.exists():
    path.unlink()

memory = Memory(memory_file=str(path))
memory.save(texts, documents)


Then create and test the search function:

In [18]:
def vector_search(query):
    results = memory.search(query)
    matching_docs = [r['metadata'] for r in results]
    return matching_docs

In [19]:
q = evaluation_data[0]['question']; q

'Who holds all sovereign power in Kenya according to this Constitution?'

In [20]:
vector_search(q)

[{'title': 'Article 1: Sovereignty of the people.',
  'clauses': '(1)  All sovereign power belongs to the people of Kenya and shall be exercised\nonly in accordance with this Constitution.\n(2)  The people may exercise their sovereign power either directly or through their\ndemocratically elected representatives.\n(3)  Sovereign power under this Constitution is delegated to the following State\norgans, which shall perform their functions in accordance with this Constitution—\n(a) Parliament and the legislative assemblies in the county governments;\n(b) the national executive and the executive structures in the county\ngovernments; and\n(c) the Judiciary and independent tribunals.\n(4)  The sovereign power of the people is exercised at—\n(a) the national level; and\n(b) the county level.\n',
  'chapter': 'Chapter 1: SOVEREIGNTY OF THE PEOPLE AND SUPREMACY OF THIS CONSTITUTION',
  'part': '',
  'number': 1},
 {'title': 'Article 2: Supremacy of this Constitution.',
  'clauses': '(1)  This

### Hybrid Retrieval

To implement hybrid retrieval, we'll use a combination of lexical and semantic search and re-rank the results based on a combined score of both approaches.

The technique is known as **Reciprocal Rank Fusion (RRF)**. For each document in the results, we calculate a score based on the position at which it ranks for a given query, then add up its scores from the two retrieval approaches.

The score for a particular document in a specific retrieval is calculated by the formula: _1/(k + r(d))_, where _r(d)_ is the rank of the document in the retrieval results. The higher it ranks, the larger the score. _k_ is a constant that is typically set to 60. Some properties of this formula are explained [here][9].

[9]: https://medium.com/@devalshah1619/mathematical-intuition-behind-reciprocal-rank-fusion-rrf-explained-in-2-mins-002df0cc5e2a
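
Written out as a single expression, the combined score of a document _d_ is the sum of its per-retrieval scores. The symbol _R_ for the set of rankings being fused is introduced here for illustration and does not appear in the original text:

```latex
\mathrm{RRF}(d) = \sum_{r \in R} \frac{1}{k + r(d)}, \qquad k = 60
```

For example, a document ranked 1st by lexical search and 3rd by semantic search scores 1/61 + 1/63 ≈ 0.0323. A document missing from one of the rankings simply gets no contribution from it.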

Implementing the RRF score:

In [21]:
def rrf_score(rank, k=60):
    return 1 / (k + rank)

The score decreases with the rank:

In [22]:
[rrf_score(i) for i in range(1, 6)]

[0.01639344262295082,
 0.016129032258064516,
 0.015873015873015872,
 0.015625,
 0.015384615384615385]

Combining the scores:

In [23]:
def rrf(ranked_results, scores):
    for i, result in enumerate(ranked_results):
        doc_id = result['number']
        scores[doc_id] = rrf_score(i + 1) + scores.get(doc_id, 0)
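
To see the effect of summing scores, here is a small self-contained sketch that fuses two toy rankings. The `fuse` helper and the article numbers are hypothetical; it repeats `rrf_score` so that it runs on its own.

```python
def rrf_score(rank, k=60):
    return 1 / (k + rank)

def fuse(rankings, k=60):
    """Sum RRF scores across rankings; each ranking lists doc ids, best first."""
    scores = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0) + rrf_score(position, k)
    # Highest combined score first
    return sorted(scores, key=scores.get, reverse=True)

lexical = [7, 1, 3]   # hypothetical article numbers, best match first
semantic = [1, 2, 7]

print(fuse([lexical, semantic]))  # [1, 7, 2, 3]
```

Article 1 comes first because it ranks near the top of both lists, while articles 2 and 3 each appear in only one list and fall to the bottom.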

We implement a combined search and rerank the documents using RRF:

In [24]:
def hybrid_search(query):
    lexical_search_results = whoosh_search(query)
    semantic_search_results = vector_search(query)

    combined_scores = {}
    rrf(lexical_search_results, combined_scores)
    rrf(semantic_search_results, combined_scores)

    # re-rank the results based on score
    reranked_scores = sorted(combined_scores.items(), key=lambda x: x[1], reverse=True)
    final_results = []
    for doc_id, score in reranked_scores[:5]:
        # doc id is article number, we can access article by index
        final_results.append(documents[doc_id - 1])
    return final_results

Test hybrid search:

In [25]:
q = evaluation_data[0]['question']; q

'Who holds all sovereign power in Kenya according to this Constitution?'

In [26]:
hybrid_search(q)

[{'title': 'Article 1: Sovereignty of the people.',
  'clauses': '(1)  All sovereign power belongs to the people of Kenya and shall be exercised\nonly in accordance with this Constitution.\n(2)  The people may exercise their sovereign power either directly or through their\ndemocratically elected representatives.\n(3)  Sovereign power under this Constitution is delegated to the following State\norgans, which shall perform their functions in accordance with this Constitution—\n(a) Parliament and the legislative assemblies in the county governments;\n(b) the national executive and the executive structures in the county\ngovernments; and\n(c) the Judiciary and independent tribunals.\n(4)  The sovereign power of the people is exercised at—\n(a) the national level; and\n(b) the county level.\n',
  'chapter': 'Chapter 1: SOVEREIGNTY OF THE PEOPLE AND SUPREMACY OF THIS CONSTITUTION',
  'part': '',
  'number': 1},
 {'title': 'Article 4: Declaration of the Republic.',
  'clauses': '(1)  Kenya i

## Generation

For simplicity, we'll use Mistral AI here via the OpenAI python client.

At the time of writing, Mistral AI offers some free credit on signing up.

In [27]:
from openai import OpenAI
from ipython_secrets import get_secret

In [28]:
chat_endpoint = "https://api.mistral.ai/v1"  # for ollama point to the host/port e.g. http://localhost:11434/v1/
mistral_api_key = get_secret('MISTRAL_API_KEY')

client = OpenAI(base_url=chat_endpoint, api_key=mistral_api_key)

Confirm we can send queries to the LLM:

In [29]:
model_name = "open-mistral-nemo"
prompt = "Hello, world"

response = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": prompt}],
)

response = response.choices[0].message.content
response

"Hello! How can I assist you today? Let's chat about anything you'd like. 😊"

Prompt template for RAG:

In [30]:
prompt_template = """
You are an legal professional and an expert in kenyan constitutional affairs. Answer the `QUESTION` based on the provided `CONTEXT`.
Use only facts from the `CONTEXT` when answering the `QUESTION`. The `CONTEXT` contains the relevant
articles from the Kenya 2010 constitution.

# QUESTION
{question}

# CONTEXT
{context}
"""

Build the prompt from the template:

In [43]:
def build_prompt(query, search_results):
    context = [
        f"""
chapter: {r['chapter']}
part: {r['part']}
title: {r['title']}
clauses: {r['clauses']}
    """
        for r in search_results
    ]
    context = "".join(context)
    prompt = prompt_template.format(context=context, question=query)
    return prompt.strip()

Sample prompt:

In [44]:
results = hybrid_search(q)
print(build_prompt(q, results))

You are an legal professional and an expert in kenyan constitutional affairs. Answer the `QUESTION` based on the provided `CONTEXT`.
Use only facts from the `CONTEXT` when answering the `QUESTION`. The `CONTEXT` contains the relevant
articles from the Kenya 2010 constitution.

# QUESTION
Who holds all sovereign power in Kenya according to this Constitution?

# CONTEXT

chapter: Chapter 1: SOVEREIGNTY OF THE PEOPLE AND SUPREMACY OF THIS CONSTITUTION
part: 
title: Article 1: Sovereignty of the people.
clauses: (1)  All sovereign power belongs to the people of Kenya and shall be exercised
only in accordance with this Constitution.
(2)  The people may exercise their sovereign power either directly or through their
democratically elected representatives.
(3)  Sovereign power under this Constitution is delegated to the following State
organs, which shall perform their functions in accordance with this Constitution—
(a) Parliament and the legislative assemblies in the county governments;
(b) 

Wrap our LLM client I/O:

In [45]:
def llm(prompt):
    # send a single-turn chat request and return the text of the reply
    response = client.chat.completions.create(
        model=model_name,  # "open-mistral-nemo", defined above
        messages=[
            {'role': 'user', 'content': prompt}
        ]
    )
    return response.choices[0].message.content


Finally, the entire RAG flow:

In [46]:
def rag(query):
    search_results = hybrid_search(query)
    prompt = build_prompt(query, search_results)
    response = llm(prompt)
    return response

Test it:

In [47]:
q

'Who holds all sovereign power in Kenya according to this Constitution?'

In [48]:
rag(q)

'According to the provided Constitution, all sovereign power belongs to the people of Kenya. This is stated in Article 1(1), which says, "All sovereign power belongs to the people of Kenya and shall be exercised only in accordance with this Constitution." Therefore, the people of Kenya hold all sovereign power in Kenya according to this Constitution.'

## Collect RAG Responses

Next, we collect a generated response for each question in our evaluation data. If we are throttled, we pause for 30 seconds and then resume from the first unanswered question:

In [49]:
import time
import openai

In [67]:
rag_responses = {}

In [68]:
def get_rag_responses():
    for i, record in enumerate(tqdm(evaluation_data)):
        article_num = record['article_number']
        # skip questions answered in an earlier (interrupted) pass
        if i in rag_responses:
            continue
        llm_answer = rag(record['question'])
        # article numbers are 1-based, list indices are 0-based
        original_answer = documents[article_num - 1]
        assert article_num == original_answer['number']

        rag_responses[i] = {
            'question': record['question'],
            'llm_answer': llm_answer,
            'original_answer': original_answer,
        }
        time.sleep(0.2)  # brief pause between requests

# keep retrying until every question has a response
while len(rag_responses) < len(evaluation_data):
    try:
        get_rag_responses()
    except openai.RateLimitError:
        time.sleep(30)  # throttled: wait before resuming
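
The loop above waits a fixed 30 seconds whenever it is throttled. A different technique, not used in this notebook, is exponential backoff, where the pause doubles after each failed attempt. The sketch below is illustrative only: `call_with_backoff`, `FakeRateLimitError` and `flaky_call` are hypothetical names, and the fake error stands in for `openai.RateLimitError` so that the example runs without network access.

```python
import time

def call_with_backoff(fn, retry_on, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on `retry_on` with exponentially growing pauses."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            sleep(base_delay * 2 ** attempt)

class FakeRateLimitError(Exception):
    """Stand-in for a throttling error raised by an API client."""

attempts = {"count": 0}

def flaky_call():
    # Fails on the first two calls, then succeeds
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise FakeRateLimitError()
    return "ok"

delays = []  # record the pauses instead of actually sleeping
result = call_with_backoff(flaky_call, FakeRateLimitError, sleep=delays.append)
print(result, delays)
```

In the notebook's setting, `fn` could be something like `lambda: rag(record['question'])` with `retry_on=openai.RateLimitError`.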

100%|██████████████████████████████████████████████████████████████| 1317/1317 [56:50<00:00,  2.59s/it]


In [75]:
list(rag_responses.values())[:2]

[{'question': 'Who holds all sovereign power in Kenya according to this Constitution?',
  'llm_answer': 'Based on the provided context, specifically Article 1 of the Kenya 2010 Constitution, all sovereign power belongs to the people of Kenya. Therefore, the people of Kenya hold all sovereign power according to this Constitution.',
  'original_answer': {'title': 'Article 1: Sovereignty of the people.',
   'clauses': '(1)  All sovereign power belongs to the people of Kenya and shall be exercised\nonly in accordance with this Constitution.\n(2)  The people may exercise their sovereign power either directly or through their\ndemocratically elected representatives.\n(3)  Sovereign power under this Constitution is delegated to the following State\norgans, which shall perform their functions in accordance with this Constitution—\n(a) Parliament and the legislative assemblies in the county governments;\n(b) the national executive and the executive structures in the county\ngovernments; and\n(c

Save the responses to disk:

In [78]:
import json

with open('rag_evaluation_responses.json', 'wt') as f:
    json.dump(rag_responses, f)
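
One thing to keep in mind when reading this file back: JSON object keys are always strings, so the integer keys of `rag_responses` come back as `"0"`, `"1"`, and so on. A minimal sketch with toy data, showing the round trip and one way to restore the integer keys:

```python
import json

# Toy data with integer keys, mirroring the shape of rag_responses
responses = {0: {"question": "q0"}, 1: {"question": "q1"}}

serialized = json.dumps(responses)
loaded = json.loads(serialized)
print(list(loaded))  # ['0', '1']

# Convert the keys back to integers after loading
restored = {int(key): value for key, value in loaded.items()}
print(restored == responses)  # True
```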

In [81]:
!ls

basic_rag.ipynb		      rag_evaluation_data_generation.ipynb
constitution.json	      rag_evaluation_responses.json
minsearch.py		      retrieval_evaluation.ipynb
offline_rag_evaluation.ipynb  vectordb_memfile
__pycache__		      whoosh_index
rag_evaluation_data.csv



In [76]:
def to_str(**kwargs):
    # Plain (non-f) template: the placeholders are filled in by .format() below
    str_fmt = """\
chapter: {chapter}
part: {part}
title: {title}
clauses: {clauses}
    """
    return str_fmt.format(**kwargs).strip()
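
Article text flattened this way could presumably serve as the expected answer in the LLM-as-judge measure described in the introduction. As a rough sketch of how such a judge prompt might be assembled, consider the following. The template wording, the labels and the `build_judge_prompt` name are all assumptions made for illustration, not the notebook's actual prompt.

```python
# Hypothetical judge prompt; the wording and labels are assumptions
judge_template = """
You are evaluating a question-answering system. Given a `QUESTION`, the
`EXPECTED ANSWER` and the `GENERATED ANSWER`, decide whether the generated
answer is relevant. Reply with exactly one word: RELEVANT or NON_RELEVANT.

# QUESTION
{question}

# EXPECTED ANSWER
{expected}

# GENERATED ANSWER
{generated}
""".strip()

def build_judge_prompt(question, expected, generated):
    """Fill the judge template with one evaluation record."""
    return judge_template.format(
        question=question, expected=expected, generated=generated
    )

sample = build_judge_prompt(
    "Who holds all sovereign power in Kenya?",
    "title: Article 1: Sovereignty of the people.",
    "The people of Kenya hold all sovereign power.",
)
print(sample)
```

Constraining the reply to a fixed set of labels makes the judge's output easier to parse and aggregate.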