# Basic Retrieval Augmented Generation (RAG) with the Kenya 2010 Constitution

The goal is to chat with an LLM about the information contained in the articles of the Kenya 2010 Constitution. This is useful for getting quick answers to specific questions without manually scanning the entire document.

## Dependencies

In [28]:
pip install -q pandas scikit-learn openai ipython-secrets

Note: you may need to restart the kernel to use updated packages.


## Pre-processing

A JSON copy of the constitution's articles, extracted from a PDF version of the constitution, is available in [this][0] GitHub repo. We fetch the JSON:

[0]: https://github.com/programmer-ke/constitution_kenya

In [8]:
![ ! -f constitution.json ] && wget https://raw.githubusercontent.com/programmer-ke/constitution_kenya/refs/heads/master/json/ConstitutionKenya2010.json -O constitution.json

In [2]:
import json

In [3]:
with open('constitution.json', 'rt') as f:
    articles = json.load(f)

In [4]:
articles[64:67]

[{'number': 65,
  'title': 'Landholding by non-citizens.',
  'lines': ['(1)  A person who is not a citizen may hold land on the basis of leasehold\n',
   'tenure only, and any such lease, however granted, shall not exceed ninety-nine years.\n',
   '(2)  If a provision of any agreement, deed, conveyance or document of whatever\n',
   'nature purports to confer on a person who is not a citizen an interest in land greater\n',
   'than a ninety-nine year lease, the provision shall be regarded as conferring on the\n',
   'person a ninety-nine year leasehold interest, and no more.\n',
   '(3)  For purposes of this Article —\n',
   '(a) a body corporate shall be regarded as a citizen only if the body corporate\n',
   'is wholly owned by one or more citizens; and\n',
   '(b) property held in trust shall be regarded as being held by a citizen only\n',
   'if all of the beneficial interest of the trust is held by persons who are\n',
   'citizens.\n',
   '(4)  Parliament may enact legislation to 

It is a list of the articles contained in the constitution. Each entry holds the article number and title, the chapter the article belongs to, the part number and title if the chapter is divided into parts, and the lines that make up the article's clauses.

We do some processing to make searching easier:
- Combine the article lines into a single text field
- Have a single text field for each of chapter, part and article title
- Where an article has no part, use a zero-length string

In [5]:
search_fields = []
for article in articles:
    # Join the clause lines into one searchable text field
    article_text = "".join(article['lines'])
    article_title = f"Article {article['number']}: {article['title']}"

    # 'chapter' unpacks into a (number, title) pair
    chapter_number, chapter_title = article['chapter']
    chapter_text = f"Chapter {chapter_number}: {chapter_title}"

    # 'part' is empty when the chapter is not divided into parts
    part_text = ""
    if article['part']:
        part_num, part_title = article['part']
        part_text = f'Part {part_num}: {part_title}'

    search_fields.append({
        "article_title": article_title,
        "article_text": article_text,
        "chapter": chapter_text,
        "part": part_text
    })

In [6]:
search_fields[:3]

[{'article_title': 'Article 1: Sovereignty of the people.',
  'article_text': '(1)  All sovereign power belongs to the people of Kenya and shall be exercised\nonly in accordance with this Constitution.\n(2)  The people may exercise their sovereign power either directly or through their\ndemocratically elected representatives.\n(3)  Sovereign power under this Constitution is delegated to the following State\norgans, which shall perform their functions in accordance with this Constitution—\n(a) Parliament and the legislative assemblies in the county governments;\n(b) the national executive and the executive structures in the county\ngovernments; and\n(c) the Judiciary and independent tribunals.\n(4)  The sovereign power of the people is exercised at—\n(a) the national level; and\n(b) the county level.\n',
  'chapter': 'Chapter 1: SOVEREIGNTY OF THE PEOPLE AND SUPREMACY OF THIS CONSTITUTION',
  'part': ''},
 {'article_title': 'Article 2: Supremacy of this Constitution.',
  'article_text':

In [7]:
search_fields[64:67]

[{'article_title': 'Article 65: Landholding by non-citizens.',
  'article_text': '(1)  A person who is not a citizen may hold land on the basis of leasehold\ntenure only, and any such lease, however granted, shall not exceed ninety-nine years.\n(2)  If a provision of any agreement, deed, conveyance or document of whatever\nnature purports to confer on a person who is not a citizen an interest in land greater\nthan a ninety-nine year lease, the provision shall be regarded as conferring on the\nperson a ninety-nine year leasehold interest, and no more.\n(3)  For purposes of this Article —\n(a) a body corporate shall be regarded as a citizen only if the body corporate\nis wholly owned by one or more citizens; and\n(b) property held in trust shall be regarded as being held by a citizen only\nif all of the beneficial interest of the trust is held by persons who are\ncitizens.\n(4)  Parliament may enact legislation to make further provision for the operation\nof this Article.\n',
  'chapter'

## Retrieval

We'll use a simple in-memory search module for the initial attempt. It implements search by vectorizing text with [TF-IDF][3] and calculating the cosine similarity between the query and each document in the corpus. In this case, the corpus is the collection of articles of the constitution.

[3]: https://en.wikipedia.org/wiki/Tf%E2%80%93idf
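To make the idea concrete, here is a minimal sketch of TF-IDF ranking with cosine similarity, written with the standard library only. It is not the search module's own code: the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank`), the tokenizer and the smoothed IDF formula are assumptions chosen for illustration. The toy documents are short sentences taken from Articles 33, 65 and 1.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return the IDF table and one sparse {term: weight} vector per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an illustrative choice): rarer terms get larger weights.
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1
           for term, count in df.items()}
    vectors = [{term: count * idf[term] for term, count in Counter(tokens).items()}
               for tokens in tokenized]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return document indices ordered from most to least similar to the query."""
    idf, vectors = tfidf_vectors(docs)
    # Query terms that never occur in the corpus carry no weight.
    q_counts = Counter(t for t in tokenize(query) if t in idf)
    q_vec = {term: count * idf[term] for term, count in q_counts.items()}
    scores = [cosine(q_vec, vec) for vec in vectors]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


docs = [
    "Every person has the right to freedom of expression.",
    "A person who is not a citizen may hold land on the basis of leasehold tenure only.",
    "All sovereign power belongs to the people of Kenya.",
]
ranking = rank("freedom of expression", docs)
print(docs[ranking[0]])
```

The first document ranks highest because it shares the rare terms "freedom" and "expression" with the query, while the other two share only the common word "of".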

We get the module from GitHub and use it to index the documents:

In [10]:
![ ! -f minsearch.py ] && wget https://raw.githubusercontent.com/alexeygrigorev/minsearch/refs/heads/main/minsearch.py -O minsearch.py

In [19]:
import minsearch

index = minsearch.Index(text_fields=['article_title', 'article_text', 'chapter', 'part'], keyword_fields=[])
index.fit(search_fields)

<minsearch.Index at 0x7f7cd55aac60>

We can run a sample query:

In [23]:
sample_query = "what guarantees are there regarding freedom of expression?"
results = index.search(sample_query, num_results=5)

In [24]:
results

[{'article_title': 'Article 33: Freedom of expression.',
  'article_text': '(1)  Every person has the right to freedom of expression, which includes—\n(a) freedom to seek, receive or impart information or ideas;\n(b) freedom of artistic creativity; and\n(c) academic freedom and freedom of scientific research.\n(2)  The right to freedom of expression does not extend to—\n(a) propaganda for war;\n(b) incitement to violence;\n(c) hate speech; or\n(d) advocacy of hatred that—\n(i) constitutes ethnic incitement, vilification of others or\nincitement to cause harm; or\n(ii) is based on any ground of discrimination specified or\ncontemplated in Article 27(4).\n(3)  In the exercise of the right to freedom of expression, every person shall\nrespect the rights and reputation of others.\n',
  'chapter': 'Chapter 4: THE BILL OF RIGHTS',
  'part': 'Part 2: RIGHTS AND FUNDAMENTAL FREEDOMS'},
 {'article_title': 'Article 20: Application of Bill of Rights.',
  'article_text': '(1)  The Bill of Rights a

The results are reasonably relevant: the top hit is Article 33 on freedom of expression. We create a simple wrapper function for search:

In [25]:
def search(query):
    return index.search(query, num_results=5)

## Generation

There are several options for LLM text generation, ranging from locally hosted solutions such as llama.cpp and Ollama to external service providers such as OpenAI and Mistral AI.

For simplicity, we'll use Mistral AI here via the OpenAI Python client. Both Mistral and Ollama implement an OpenAI-compatible API, so the same library works with all three options; only the URL and the API key change.
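As a rough sketch of what switching providers involves, the helper below builds the two arguments that differ between them. The helper name `client_kwargs`, the `ENDPOINTS` table and the environment-variable naming scheme are assumptions made for illustration. The Mistral and Ollama URLs are the ones used in this notebook, and the OpenAI URL is that service's public default.

```python
import os

# Base URLs of OpenAI-compatible chat endpoints.
ENDPOINTS = {
    "mistral": "https://api.mistral.ai/v1",
    "ollama": "http://localhost:11434/v1/",
    "openai": "https://api.openai.com/v1",
}


def client_kwargs(provider, env=os.environ):
    """Build the keyword arguments for OpenAI(...) for the given provider."""
    if provider not in ENDPOINTS:
        raise ValueError(f"unknown provider: {provider!r}")
    if provider == "ollama":
        # Assumption: a local Ollama server does not check the key,
        # but the client still expects a non-empty string.
        api_key = "ollama"
    else:
        # Assumption: keys live in variables such as MISTRAL_API_KEY.
        api_key = env.get(f"{provider.upper()}_API_KEY", "")
    return {"base_url": ENDPOINTS[provider], "api_key": api_key}


print(client_kwargs("ollama"))
```

With the `openai` package installed, the result can be passed straight to the client, for example `OpenAI(**client_kwargs("mistral"))`.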

At the time of writing, Mistral AI offers some free credit when you sign up.

In [30]:
from openai import OpenAI
from ipython_secrets import get_secret

In [31]:
chat_endpoint = "https://api.mistral.ai/v1"  # for ollama point to the host/port e.g. http://localhost:11434/v1/
mistral_api_key = get_secret('MISTRAL_API_KEY')

client = OpenAI(base_url=chat_endpoint, api_key=mistral_api_key)

When you run the cell above, you'll be prompted for the Mistral API key.

Next, we can send a sample message to the LLM. We'll use the [open-mistral-nemo][8] 12B open-source model as our LLM.

[8]: https://mistral.ai/news/mistral-nemo/

In [33]:
model_name = "open-mistral-nemo"
prompt = "Hello, world"

response = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": prompt}],
)

# Keep only the text of the first completion choice
response = response.choices[0].message.content
response

"Hello! How can I assist you today? Let's chat about anything you'd like. 😊"

Now we are connected to the LLM and ready to chat!