# Synthetic Data for Evaluation

To find the setup that gives the best results, we need to evaluate several components of a RAG system. We can vary the retrieval technique (lexical, semantic, or hybrid search), the prompt, or the model used for generation.

Typically, to do this, we establish ground truth data and use standardized evaluation criteria to see which setup works best.

## Ground Truth Data

Ground truth data, also known as the _gold standard_, is a set of accurate, verified inputs and outputs against which a process is evaluated. In the case of retrieval, ground truth data could be a set of questions for which the best answer is already known. We evaluate the retrieval process by checking whether the best answer is returned for each question.

For our use case, we'd collect a set of questions about the constitution for which the relevant article is already known. When we run these questions as queries, the correct articles should be returned.
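
As a rough sketch of that check, the snippet below computes a hit rate: the fraction of questions whose known article appears among the top `k` results. The `search` callable, the toy retriever, and the two sample questions are assumptions made for illustration; they are not part of this notebook's pipeline.

```python
def hit_rate(ground_truth, search, k=5):
    """Fraction of questions whose known article is in the top-k results.

    ground_truth: list of (question, article_number) pairs.
    search: hypothetical callable returning a ranked list of article numbers.
    """
    if not ground_truth:
        return 0.0
    hits = 0
    for question, article_number in ground_truth:
        if article_number in search(question)[:k]:
            hits += 1
    return hits / len(ground_truth)


# Toy stand-in for a real retriever (assumption, for illustration only)
def toy_search(question):
    return [1, 2, 3] if "sovereign" in question.lower() else [7, 8, 9]


toy_ground_truth = [
    ("Who holds sovereign power?", 1),
    ("What happens to laws that contradict the constitution?", 2),
]
toy_hit_rate = hit_rate(toy_ground_truth, toy_search, k=3)
```

Here the first question is a hit and the second a miss. Any retriever that returns article numbers in ranked order could be swapped in for `toy_search`.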

Collecting such questions by hand, however, involves quite a bit of manual effort. To make things easier, we'll use an LLM to generate questions for each article and use this synthetic data for evaluation instead.

## Data Preparation

In [1]:
pip install -q pandas scikit-learn openai ipython-secrets tqdm

Note: you may need to restart the kernel to use updated packages.


In [2]:
![ ! -f constitution.json ] && wget https://raw.githubusercontent.com/programmer-ke/constitution_kenya/refs/heads/master/json/ConstitutionKenya2010.json -O constitution.json

In [3]:
import json

with open('constitution.json', 'rt') as f:
    articles = json.load(f)

In [4]:
articles[0]

{'number': 1,
 'title': 'Sovereignty of the people.',
 'lines': ['(1)  All sovereign power belongs to the people of Kenya and shall be exercised\n',
  'only in accordance with this Constitution.\n',
  '(2)  The people may exercise their sovereign power either directly or through their\n',
  'democratically elected representatives.\n',
  '(3)  Sovereign power under this Constitution is delegated to the following State\n',
  'organs, which shall perform their functions in accordance with this Constitution—\n',
  '(a) Parliament and the legislative assemblies in the county governments;\n',
  '(b) the national executive and the executive structures in the county\n',
  'governments; and\n',
  '(c) the Judiciary and independent tribunals.\n',
  '(4)  The sovereign power of the people is exercised at—\n',
  '(a) the national level; and\n',
  '(b) the county level.\n'],
 'part': None,
 'chapter': [1,
  'SOVEREIGNTY OF THE PEOPLE AND SUPREMACY OF THIS CONSTITUTION']}

For each article, we'll create text fields for title, clauses, chapter and part, and use the article's number as a unique identifier:

In [5]:
documents = []
for article in articles:
    
    article_text = "".join(article['lines'])
    article_title = f"Article {article['number']}: {article['title']}"
    chapter_number, chapter_title = article['chapter']
    chapter_text = f"Chapter {chapter_number}: {chapter_title}"
    part_text = ""
    
    if article['part']:
        part_num, part_title = article['part']
        part_text = f'Part {part_num}: {part_title}'
        
    documents.append({
        "title": article_title,
        "clauses": article_text,
        "chapter": chapter_text,
        "part": part_text,
        "number": article['number']
    })

In [6]:
documents[:2]

[{'title': 'Article 1: Sovereignty of the people.',
  'clauses': '(1)  All sovereign power belongs to the people of Kenya and shall be exercised\nonly in accordance with this Constitution.\n(2)  The people may exercise their sovereign power either directly or through their\ndemocratically elected representatives.\n(3)  Sovereign power under this Constitution is delegated to the following State\norgans, which shall perform their functions in accordance with this Constitution—\n(a) Parliament and the legislative assemblies in the county governments;\n(b) the national executive and the executive structures in the county\ngovernments; and\n(c) the Judiciary and independent tribunals.\n(4)  The sovereign power of the people is exercised at—\n(a) the national level; and\n(b) the county level.\n',
  'chapter': 'Chapter 1: SOVEREIGNTY OF THE PEOPLE AND SUPREMACY OF THIS CONSTITUTION',
  'part': '',
  'number': 1},
 {'title': 'Article 2: Supremacy of this Constitution.',
  'clauses': '(1)  This

We create a prompt to be used for generating the questions for each article:

In [11]:
prompt_template = """
You're emulating someone who's interested in learning more about the Kenyan constitution.
Formulate 5 questions this person would ask based on the given `ARTICLE` of the constitution.
The `ARTICLE` should contain the answer to the questions, and the questions should be complete and not too short.
Use the fewest words possible from the `ARTICLE` in the questions.
Do not mention specific clause numbers in the questions.

# ARTICLE
chapter: {chapter}
part: {part}
title: {title}
clauses: {clauses}

Provide the output in a parsable JSON array like the following without using code blocks:

["question1", "question2", ..., "question5"]
""".strip()

We'll use Mistral AI's `open-mistral-nemo` model, accessed with the OpenAI client pointed at Mistral's chat endpoint.

In [7]:
from openai import OpenAI
from ipython_secrets import get_secret

chat_endpoint = "https://api.mistral.ai/v1"  # for ollama point to the host/port e.g. http://localhost:11434/v1/
mistral_api_key = get_secret('MISTRAL_API_KEY')

client = OpenAI(base_url=chat_endpoint, api_key=mistral_api_key)

In [8]:
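def generate_questions(doc):
    """Ask the LLM for questions about one article and return the raw response text."""
    prompt = prompt_template.format(**doc)

    response = client.chat.completions.create(
        model='open-mistral-nemo',
        messages=[{"role": "user", "content": prompt}]
    )

    # The prompt asks for a JSON array, but what we get back is an unparsed string
    return response.choices[0].message.content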
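
A side note on `prompt_template.format(**doc)`: each document also carries a `number` key that the template never mentions. This works because `str.format` ignores unused keyword arguments, whereas a missing key raises `KeyError`. The sketch below shows both cases with a shortened, made-up template:

```python
# Shortened stand-in for the real prompt template (illustration only)
mini_template = "title: {title}\nclauses: {clauses}"

mini_doc = {
    "title": "Article 1: Sovereignty of the people.",
    "clauses": "(1)  All sovereign power belongs to the people of Kenya",
    "number": 1,
}

# The extra "number" key is silently ignored by str.format
filled = mini_template.format(**mini_doc)

# A placeholder with no matching key raises KeyError instead
try:
    mini_template.format(title="Article 1: Sovereignty of the people.")
    missing_key = None
except KeyError as exc:
    missing_key = exc.args[0]
```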

In [9]:
print(documents[19])

{'title': 'Article 20: Application of Bill of Rights.', 'clauses': '(1)  The Bill of Rights applies to all law and binds all State organs and all\npersons.\n(2)  Every person shall enjoy the rights and fundamental freedoms in the Bill of\nRights to the greatest extent consistent with the nature of the right or fundamental\nfreedom.\n(3)  In applying a provision of the Bill of Rights, a court shall—\n(a) develop the law to the extent that it does not give effect to a right or\nfundamental freedom; and\n(b) adopt the interpretation that most favours the enforcement of a right\nor fundamental freedom.\n(4)  In interpreting the Bill of Rights, a court, tribunal or other authority shall\npromote—\n(a) the values that underlie an open and democratic society based on\nhuman dignity, equality, equity and freedom; and\n(b) the spirit, purport and objects of the Bill of Rights.\n(5)  In applying any right under Article 43, if the State claims that it does not have\nthe resources to implement the

Generate sample questions for the article above:

In [12]:
generate_questions(documents[19])

'["Who is bound by the Bill of Rights in Kenya?", "How should courts interpret the Bill of Rights to maximize rights enforcement?", "What principles guide courts when the state claims lack of resources to implement Article 43 rights?", "What values should courts promote when interpreting the Bill of Rights?", "How should the state prioritize resource allocation for rights implementation?"]'

We iterate through all the articles and generate a set of questions for each:

In [30]:
article_questions = {}

In [31]:
from tqdm import tqdm
import time
import openai

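def generate_evaluation_data():
    """Generate questions for every article that doesn't have them yet."""
    for doc in tqdm(documents):
        article_num = doc['number']
        # Skip articles already processed so a re-run resumes where it stopped
        if article_num in article_questions:
            continue
        questions = generate_questions(doc)
        article_questions[article_num] = questions
        time.sleep(0.2)  # brief pause between requests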

The LLM endpoint may throttle our requests. When that happens, we pause for 30 seconds and then resume where we left off:

In [32]:
while len(article_questions) < len(documents):
    try:
        generate_evaluation_data()
    except openai.RateLimitError:
        time.sleep(30)

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 264/264 [06:15<00:00,  1.42s/it]
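
A fixed 30-second pause is simple and worked here. A common alternative is exponential backoff, where the wait doubles after each failed attempt. The helper below is a hypothetical sketch of that idea, not what this notebook ran. `call_with_backoff` and `FakeRateLimitError` are made-up names; in the notebook, `retry_on` would be `openai.RateLimitError` and `func` a call to `generate_questions`.

```python
import time


def call_with_backoff(func, retry_on, max_attempts=5, base_delay=1.0,
                      sleep=time.sleep):
    """Call func(), retrying on `retry_on` exceptions with doubling delays."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            sleep(base_delay * 2 ** attempt)


# Stand-in for a rate-limit error so the example runs without a network call
class FakeRateLimitError(Exception):
    pass


calls = {"count": 0}
delays = []  # records the requested waits instead of actually sleeping


def flaky():
    """Fail on the first two calls, then succeed."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeRateLimitError()
    return "ok"


result = call_with_backoff(flaky, FakeRateLimitError, sleep=delays.append)
```

Passing `sleep=delays.append` makes the waits observable without slowing the example down. In real use the default `time.sleep` would apply.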


In [34]:
article_questions[100]

'["How does the constitution ensure women\'s representation in Parliament?", "What groups should Parliament prioritize for increased representation?", "How can Parliament promote youth representation?", "Which minorities should Parliament focus on for better representation?", "How does the constitution aim to include marginalized communities in Parliament?"]'

We then parse each response string as JSON. Some responses may fail to parse because the LLM formatted them incorrectly; these can be fixed manually:

In [48]:
parsed_questions = {}
for article_num, questions in article_questions.items():
    parsed_questions[article_num] = json.loads(questions)
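
To find every response that needs manual fixing in one pass, rather than stopping at the first `json.loads` error, the failures can be collected separately. The following is one possible sketch. The helper name `parse_question_lists` and the sample responses are made up, and stripping a surrounding code fence is an assumed failure mode, not one observed in the output above.

```python
import json

FENCE = "`" * 3  # a Markdown code fence marker


def parse_question_lists(raw_by_article):
    """Split raw LLM responses into parsed lists and unparsable leftovers."""
    parsed, failed = {}, {}
    for article_num, raw in raw_by_article.items():
        text = raw.strip()
        # Assumed failure mode: the array is wrapped in a code fence
        if text.startswith(FENCE):
            text = text.strip("`")
            if text.startswith("json"):
                text = text[len("json"):]
        try:
            questions = json.loads(text)
        except json.JSONDecodeError:
            failed[article_num] = raw
            continue
        # Accept only a list of strings; anything else needs a manual look
        if isinstance(questions, list) and all(isinstance(q, str) for q in questions):
            parsed[article_num] = questions
        else:
            failed[article_num] = raw
    return parsed, failed


# Made-up responses: one valid, one truncated, one wrapped in a fence
sample_responses = {
    1: '["Who holds sovereign power?", "How is it exercised?"]',
    2: '["Unterminated list", ',
    3: FENCE + 'json\n["What is the supreme law?"]\n' + FENCE,
}
parsed_sample, failed_sample = parse_question_lists(sample_responses)
```

The entries in `failed_sample` keep the original response text, so they can be corrected by hand and parsed again.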

We store the resulting data on disk as a list of questions, each with its associated article number:

In [16]:
question_csv_lines = []
for article_number, questions in parsed_questions.items():
    for question in questions:
        question_csv_lines.append((question, article_number))

In [21]:
import csv

with open('rag_evaluation_data.csv', 'wt', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(('question', 'article_number'))
    for row in question_csv_lines:
        writer.writerow(row)

In [21]:
!head rag_evaluation_data.csv

question,article_number
Who holds all sovereign power in Kenya according to this Constitution?,1
How can the people of Kenya exercise their sovereign power?,1
To which state organs is sovereign power delegated?,1
At which levels is the sovereign power of the people exercised in Kenya?,1
What makes this Constitution the ultimate authority in Kenya?,2
Who can claim or exercise state power and under what conditions?,2
Can the Constitution's validity be challenged in any way?,2
What happens to laws that contradict this Constitution?,2
How do international treaties and conventions become part of Kenyan law?,2
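
As a quick sanity check, we can read the rows back and count the questions per article. The output above, for instance, shows only four questions for Article 1, although the prompt asked for five. The sketch below reads from an in-memory copy of those rows so that it runs on its own. In the notebook, the file would be opened with `open('rag_evaluation_data.csv', newline='')` instead.

```python
import csv
import io
from collections import Counter

# First rows of the file, copied from the `head` output above
sample_file = io.StringIO(
    "question,article_number\n"
    "Who holds all sovereign power in Kenya according to this Constitution?,1\n"
    "How can the people of Kenya exercise their sovereign power?,1\n"
    "To which state organs is sovereign power delegated?,1\n"
    "At which levels is the sovereign power of the people exercised in Kenya?,1\n"
    "What makes this Constitution the ultimate authority in Kenya?,2\n"
    "Who can claim or exercise state power and under what conditions?,2\n"
    "Can the Constitution's validity be challenged in any way?,2\n"
    "What happens to laws that contradict this Constitution?,2\n"
    "How do international treaties and conventions become part of Kenyan law?,2\n"
)

rows = list(csv.DictReader(sample_file))
questions_per_article = Counter(int(row["article_number"]) for row in rows)

# Articles that did not end up with the five questions the prompt asked for
incomplete = sorted(n for n, count in questions_per_article.items() if count != 5)
```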
