# RAG Workshop – Dutch Pension Law  
Author: *Generated on 2025-04-30*  

This notebook accompanies an internal workshop on **Retrieval‑Augmented Generation (RAG)** with Dutch pension‑related law texts from [wetten.overheid.nl](https://wetten.overheid.nl).  
You will need:

* Python 3.10+  
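* `openai`, `azure-identity`, `langchain`, `langchain-community`, `langchain-huggingface`, `tiktoken`, `networkx`, `rdflib`  
* For the embedding and vector-store cells: `sentence-transformers`, `transformers`, `torch`, `scikit-learn`, `faiss-cpu`  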

From an empty source folder in which you will run the notebook, run the following commands:

1. **Clone & install**
   ```bash
   git clone https://github.com/iAMLAB-test/workshop_llm
   cd workshop_llm
   python -m venv .venv && source .venv/bin/activate
   pip install -r requirements.txt
   ```

2. **Configure environment**

   Copy `config_template.yaml` to `config.yaml` and fill in the values (provided separately):
   ```yaml
   API_KEY: [Enter your API key here]
   AZURE_ENDPOINT: [Enter your Azure endpoint here]
   API_VERSION: [Enter your API version here]
   ```

In [None]:
%pip install -r requirements.txt

### Simple LLM call.

In [23]:
#Prompting API example.
import importlib,utils
importlib.reload(utils)
from utils import gpt_4o_mini

query = "Say 'Hello world' in five different languages."
print(gpt_4o_mini(query).choices[0].message.content)

Sure! Here’s "Hello world" in five different languages:

1. Spanish: ¡Hola mundo!
2. French: Bonjour le monde !
3. German: Hallo Welt!
4. Italian: Ciao mondo!
5. Japanese: こんにちは世界 (Konnichiwa sekai) 

Let me know if you need more!


## 1 – Layout of Dutch law texts  
We will inspect local `.txt` dumps in `data/docs/`. Each file contains **one complete law** (some > 60 k tokens).

Below we load all three files and print one randomly chosen article per file to understand the structure.

### Random article from input law texts.

In [70]:
from pathlib import Path
import random
import importlib,utils
importlib.reload(utils)
from utils import *

# Load a law text file.
DOC_PATH = Path("data/docs")
for law_file in sorted(DOC_PATH.glob("*.txt")):
    law_text = law_file.read_text(encoding="utf-8")
    articles = split_articles(law_text)  # drop preamble

    example_idx = random.randrange(len(articles))
    article_key = list(articles.keys())[example_idx]
    print(f"\n--- {law_file.name}, total articles {len(articles)} ---")
    print(article_key)
    print(wrap_at_spaces(articles[article_key],width=100))


--- Pensioenswet.txt, total articles 323 ---
Artikel 28.
Melding door pensioenfonds inzake premieachterstand  1.      Een pensioenfonds informeert elk
kwartaal schriftelijk het verantwoordingsorgaan of het belanghebbendenorgaan en, bij het ontbreken
daarvan, de deelnemers, gewezen deelnemers en pensioengerechtigden wanneer sprake is van een
premieachterstand ter grootte van 5% van de totale door het pensioenfonds te ontvangen jaarpremie.
2.      Gedurende de in het eerste lid bedoelde situatie informeert een pensioenfonds tevens elk
kwartaal de ondernemingsraad van de onderneming die nog premie aan het pensioenfonds verschuldigd
is.  3.      Bij een algemeen pensioenfonds worden de voorgaande twee leden toegepast per
afgescheiden vermogen.

--- Uitvoeringsbesluit loonbelasting 1965.txt, total articles 50 ---
Artikel 10b
1.      Als loonbestanddelen als bedoeld in artikel 18g, tweede lid, onderdeel a, van de wet komen
in aanmerking:         a.      alle loonbestanddelen, met uitzonderi
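
The helper `split_articles` lives in `utils` and is not shown in this notebook. As a rough idea of what such a splitter might do, here is a minimal sketch that assumes article headings are lines such as `Artikel 28.` or `Artikel 10b`, as in the output above. The name `split_articles_sketch` and the regular expression are illustrative assumptions, not the workshop implementation.

```python
import re

# Hypothetical stand-in for utils.split_articles. Assumption: every article
# starts with a heading line of the form "Artikel 28." or "Artikel 10b".
ARTICLE_HEADING = re.compile(r"^Artikel\s+\d+[a-z]*\.?[ \t]*$", re.MULTILINE)

def split_articles_sketch(law_text: str) -> dict[str, str]:
    """Map each article heading to the text up to the next heading."""
    matches = list(ARTICLE_HEADING.finditer(law_text))
    articles = {}
    for current, following in zip(matches, matches[1:] + [None]):
        end = following.start() if following else len(law_text)
        articles[current.group().strip()] = law_text[current.end():end].strip()
    # Text before the first heading (the preamble) is dropped.
    return articles

sample = "Preamble\nArtikel 1.\nText one\nmore\nArtikel 10b\nText two\n"
result = split_articles_sketch(sample)
print(result)
```

Because the result is a dict, a heading that occurs twice in one file would overwrite the earlier entry; a real splitter may need to handle that.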

### Count tokens per file.

In [72]:
# Token count
import importlib, utils
importlib.reload(utils)
from utils import count_tokens_in_docs

print(count_tokens_in_docs())

Pensioenswet.txt: 126338 tokens
Uitvoeringsbesluit loonbelasting 1965.txt: 20766 tokens
Wet op loonbelasting 1964.txt: 64790 tokens

Total tokens across all files: 211894


## 2 – Direct LLM retrieval vs. Article‑wise retrieval  
We compare:

1. **Whole‑law prompt** – send the complete text (~60 k tokens) in one call → costly ✔️ fast ✔️ accurate ❌  
2. **Chunked/article prompts** – one call per article → costly ✔️ fast ❌ accurate ✔️  

In [None]:
import time
import importlib,utils
importlib.reload(utils)
from utils import gpt_4o_mini, SYSTEM,show_metrics,wrap_at_spaces

QUESTION = "Wat is de franchise, welk artikel gebruik je ervoor en wat zijn uitzonderingen op de wettelijke waarde? Hou het antwoord onder de 50 woorden."

# --- Whole law ---
with open("data/docs/Wet op loonbelasting 1964.txt", "r", encoding="utf-8") as f:
    law_text = f.read()
    start = time.time()
    response = gpt_4o_mini(user_message=QUESTION,system_message=SYSTEM.format(law_text = law_text))

    elapsed = time.time()-start
    out_tokens = response.usage.completion_tokens
    in_tokens = response.usage.prompt_tokens
    ans1 = response.choices[0].message.content
    show_metrics("Hele wetstekst ingeladen", in_tokens, out_tokens, elapsed)
    print(wrap_at_spaces(ans1,100))

{
  "scenario": "whole_law_single_call",
  "input tokens": 48514,
  "output tokens": 62,
  "USD cost": 0.014629,
  "elapsed seconds": 3.135,
  "accuracy": null
}
De franchise bedraagt € 18.475 en wordt vastgesteld in artikel 18a, tweede lid. Uitzonderingen op de
wettelijke waarde zijn opgenomen in artikel 18a, eerste lid, en hebben betrekking op specifieke
situaties zoals arbeidsongeschiktheid en premies voor bepaalde pensioenregelingen.


In [None]:
import time
import importlib,utils
importlib.reload(utils)
from utils import gpt_4o_mini, SYSTEM, show_metrics, wrap_at_spaces, split_articles

QUESTION = "Wat is de franchise, en wat zijn uitzonderingen op de wettelijke waarde? "+\
"Indien deze tekst hier geen expliciete informatie over geeft, antwoord met enkel 'None'. Hou het antwoord onder de 50 woorden."

start = time.time()
in_tokens,out_tokens = 0,0
answers = []
with open("data/docs/Wet op loonbelasting 1964.txt", "r", encoding="utf-8") as f:
    law_text = f.read()
    article_dict = split_articles(law_text)
    for key, value in article_dict.items():
        response = gpt_4o_mini(user_message=QUESTION,system_message=SYSTEM.format(law_text = value))
        in_tokens += response.usage.prompt_tokens
        out_tokens += response.usage.completion_tokens
        if "None" in str(response.choices[0].message.content):
            continue
        answers.append((key,response.choices[0].message.content))
    elapsed = time.time()-start
    show_metrics("Zoeken per artikel", in_tokens, out_tokens, elapsed)
    for ans1 in answers:
        print(f"\n--- {ans1[0]} ---")
        print(wrap_at_spaces(ans1[1],100))

{
  "scenario": "whole_law_single_call",
  "input tokens": 59161,
  "output tokens": 454,
  "USD cost": 0.018293,
  "elapsed seconds": 51.567
}

--- Artikel 18a ---
De franchise bedraagt € 18.475, maar kan jaarlijks worden aangepast. Een lager bedrag mag in
aanmerking worden genomen indien een lager percentage per dienstjaar wordt toegepast dan de
wettelijke waarde.

--- Artikel 31a ---
De tekst geeft geen expliciete informatie over de franchise of uitzonderingen op de wettelijke
waarde.


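**Takeaway:** Looping over articles gives a more accurate answer, but took about 16 times longer in this run (51.6 s versus 3.1 s). 
Both methods are costly token-wise: each sends the full law text to the model (roughly 48.5 k and 59.2 k input tokens respectively).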
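
The `USD cost` values in the two metric outputs are consistent with a rate of 0.30 USD per million input tokens and 1.20 USD per million output tokens. These rates are inferred from the printed numbers, not read from `utils`, so treat them as an assumption. A small sketch that reproduces both figures:

```python
# Rates inferred from the two metric outputs above (assumption, not from utils).
USD_PER_M_INPUT = 0.30
USD_PER_M_OUTPUT = 1.20

def usd_cost(in_tokens: int, out_tokens: int) -> float:
    """Cost of one run given its total input and output token counts."""
    return (in_tokens * USD_PER_M_INPUT + out_tokens * USD_PER_M_OUTPUT) / 1_000_000

whole_law = usd_cost(48514, 62)      # single call with the whole law
per_article = usd_cost(59161, 454)   # summed over all per-article calls
print(round(whole_law, 6), round(per_article, 6))
print(f"latency ratio: {51.567 / 3.135:.1f}x")
```

Input tokens dominate the cost in both scenarios, which is why reducing the context sent to the model is the main lever.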

# Function‑Calling Experiments

The cells below contrast asking the model to multiply two multi-digit numbers directly with letting it call a simple calculator tool.
A transformer does not calculate; it predicts the answer that is most likely given its training corpus.

## 1. Asking ChatGPT directly.
The model's answer differs from the exact product because a transformer does not 'calculate'.

The larger the numbers, the less accurate the answers tend to be.

In [51]:
import importlib, utils
importlib.reload(utils)
from utils import gpt_4o_mini

x,y = 1255, 3726

print(gpt_4o_mini(f"What is {x} x {y}? Return only the answer, no separators.").choices[0].message.content)
print(x*y)


4671300
4676130


## 2. Using a tool call that infers its arguments from the prompt.

In [52]:
import importlib, utils
importlib.reload(utils)
from utils import gpt_4o_mini, tool_schema, multiplier
import json

x,y = 1255, 3726

#Call arguments.
response = gpt_4o_mini(f"What is {x} x {y}?", tool_schema=tool_schema)
tool_call = response.choices[0].message.tool_calls[0]
arguments = json.loads(tool_call.function.arguments)

#Calculate answer using the arguments.
print(multiplier(arguments["x"],arguments["y"]))

4676130
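
`tool_schema` and `multiplier` are imported from `utils` and not shown here. The sketch below shows what they might look like, assuming the Chat Completions function-tool format. The description strings are illustrative, and the JSON argument string is hand-written to stand in for what the model returns in `tool_calls[0].function.arguments`.

```python
import json

# Hypothetical equivalents of utils.multiplier and utils.tool_schema.
def multiplier(x: int, y: int) -> int:
    """Exact multiplication done by Python, not by the model."""
    return x * y

tool_schema = [{
    "type": "function",
    "function": {
        "name": "multiplier",
        "description": "Multiply two integers and return the exact product.",
        "parameters": {
            "type": "object",
            "properties": {
                "x": {"type": "integer", "description": "First factor"},
                "y": {"type": "integer", "description": "Second factor"},
            },
            "required": ["x", "y"],
        },
    },
}]

# Hand-written stand-in for the argument string the model would produce.
raw_arguments = '{"x": 1255, "y": 3726}'
arguments = json.loads(raw_arguments)
product = multiplier(**arguments)
print(product)
```

The model only extracts `x` and `y` from the prompt; the arithmetic itself runs in ordinary code, which is why the result is exact.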


# Introduction to RAG  
Large calls are brittle – we instead *retrieve* only the most relevant context with **embeddings** and feed that to the LLM.
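
To make the retrieval idea concrete, here is a minimal sketch of embedding-based lookup in plain Python. The three-dimensional vectors and the labels `Artikel A` to `Artikel C` are made up for illustration; real embeddings have hundreds of dimensions and come from a model such as the ones used below.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return the names of the k documents most similar to the query."""
    scored = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]

# Toy index with hand-made vectors (illustrative only).
toy_index = {
    "Artikel A": [0.9, 0.1, 0.0],
    "Artikel B": [0.1, 0.9, 0.0],
    "Artikel C": [0.7, 0.3, 0.1],
}
query_vec = [1.0, 0.0, 0.0]
best = top_k(query_vec, toy_index, k=2)
print(best)
```

Only the texts behind the returned names would be passed to the LLM, which keeps the prompt small regardless of how long the full law is.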

## 1 – Creating word/sentence embeddings & measuring semantic distance

In [None]:
from sentence_transformers import SentenceTransformer, util

for model_name in ["all-MiniLM-L6-v2"]:
    print(model_name)
    model = SentenceTransformer(model_name)  # load once, reuse for every pair

    pairs = [
        # Baseline: the two words on their own
        ("cat", "kitten"),
        # Shared context reduces the weight of the one word that differs
        ("cat falls from the sky", "kitten falls from the sky"),
        # Even more shared context
        ("cat falls from the sky and lands hard on the ground",
         "kitten falls from the sky and lands hard on the ground"),
    ]
    for left, right in pairs:
        sim = util.cos_sim(model.encode(left), model.encode(right))
        print(sim.item())


In [None]:
import torch
from transformers import GPT2Tokenizer, GPT2Model
from sklearn.metrics.pairwise import cosine_distances

tok_gpt2 = GPT2Tokenizer.from_pretrained("gpt2")
mdl_gpt2 = GPT2Model.from_pretrained("gpt2")
texts = ["De werknemer heeft recht op ouderdomspensioen.",
         "De werknemer ontvangt een bonus in december."]
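def embed_gpt2(text):
    # Mean-pool the last hidden states into a single 1-D vector per text
    with torch.no_grad():
        ids = torch.tensor([tok_gpt2.encode(text)])
        return mdl_gpt2(ids).last_hidden_state.mean(dim=1)[0].numpy()

# A list of 1-D vectors forms the 2-D matrix that cosine_distances expects
embs = [embed_gpt2(t) for t in texts]
dist = cosine_distances(embs)[0,1]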
print("GPT‑2 distance:", dist)

In [None]:
show_metrics("gpt2_encoder", sum(map(num_tokens,texts)), 0, 0, accuracy=None)

In [None]:
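import openai

def embed_openai(text, model="text-embedding-ada-002"):
    # Legacy (openai<1.0) interface; with openai>=1.0 use client.embeddings.create(...) instead.
    resp = openai.Embedding.create(model=model, input=text)
    return resp.data[0].embedding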

embs_ada = [embed_openai(t) for t in texts]
dist_ada = cosine_distances(embs_ada)[0,1]
print("ada‑002 distance:", dist_ada)
show_metrics("ada002_embeddings", sum(map(num_tokens,texts)), 0, 0, accuracy=None)

In [None]:
from langchain_huggingface import HuggingFaceEmbeddings
hf_emb = HuggingFaceEmbeddings(model_name="mixedbread-ai/mxbai-embed-large-v1")
embs_mx = hf_emb.embed_documents(texts)
dist_mx = cosine_distances([embs_mx[0]],[embs_mx[1]])[0,0]
print("MXBAI distance:", dist_mx)
show_metrics("mxbai_embeddings", sum(map(num_tokens,texts)), 0, 0, accuracy=None)

## 2 – Create a per‑article vector store with MXBAI embeddings

In [None]:
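from langchain_core.documents import Document
from langchain_community.vectorstores import FAISS  # needs the faiss-cpu package

# `articles` and `law_file` still refer to the last law loaded in section 1.
# split_articles returns {heading: text}, so index the text and keep the heading as metadata.
docs = [Document(page_content=text, metadata={"law":law_file.name, "article":key, "article_idx":i})
        for i,(key,text) in enumerate(articles.items())]
vectorstore = FAISS.from_documents(docs, hf_emb)
print(f"Indexed {vectorstore.index.ntotal} articles")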

## 3 – RAG query helper with Azure GPT

In [None]:
import time
import openai
from langchain.chat_models import AzureChatOpenAI
from langchain.chains import ConversationalRetrievalChain

# DEPLOYMENT must hold the name of your Azure chat deployment.
llm = AzureChatOpenAI(
    openai_api_base=openai.api_base,
    deployment_name=DEPLOYMENT,
    openai_api_version=openai.api_version,
    openai_api_key=openai.api_key,
    temperature=0
)
qa_chain = ConversationalRetrievalChain.from_llm(
    llm,
    vectorstore.as_retriever(search_kwargs={"k":5}),
    return_source_documents=True,  # required for result["source_documents"] below
)

query = "Welk artikel legt expliciet de relatie uit tussen de franchise en de pensioenpremie?"
start = time.time()
result = qa_chain({"question": query, "chat_history": []})
elapsed = time.time()-start
print(result["answer"])
# naive tokens
in_tokens = num_tokens(query)+sum(num_tokens(d.page_content) for d in result["source_documents"])
out_tokens = num_tokens(result["answer"])
show_metrics("RAG_top5", in_tokens, out_tokens, elapsed, accuracy=None)

## 4 – (Placeholder) Vector‑store over Knowledge Graph context  
The knowledge graph is stored at `data/KG/Law_graph.trig` (RDF TriG).  
A typical pipeline looks like:

1. Load the graph with **RDFlib**.  
2. Derive per‑article contexts by following `skos:related`, `dct:references`, etc.  
3. Serialize each article node + referenced texts into a *document* string.  
4. Embed with the same MXBAI model and (optionally) push to the same FAISS index or a dedicated one.

```python
import rdflib, networkx as nx
g = rdflib.ConjunctiveGraph().parse("data/KG/Law_graph.trig", format="trig")

# build in‑memory NetworkX graph for easier traversal
nxg = nx.DiGraph()
for s,p,o in g:
    if p in (rdflib.SKOS.related, rdflib.DCTERMS.references, rdflib.RDFS.seeAlso):
        nxg.add_edge(str(s), str(o))

# TODO: walk nxg from article URI → text file segment
```
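
The remaining step, walking from an article to the texts it references and serializing them into one document string, could look roughly like the sketch below. It uses a plain dict instead of the NetworkX graph so it runs on its own; the URIs (`ex:art1` and so on), the edges and the texts are all hypothetical placeholders.

```python
from collections import deque

# Hypothetical reference edges and article texts. In the notebook these would
# come from the parsed graph and from the per-article split of the law files.
references = {
    "ex:art1": ["ex:art2", "ex:art3"],
    "ex:art2": ["ex:art4"],
    "ex:art3": [],
    "ex:art4": [],
}
texts = {
    "ex:art1": "Text of article 1.",
    "ex:art2": "Text of article 2.",
    "ex:art3": "Text of article 3.",
    "ex:art4": "Text of article 4.",
}

def build_context(start, max_hops=1):
    """Breadth-first walk over reference edges, at most max_hops from start."""
    seen, queue, order = {start}, deque([(start, 0)]), []
    while queue:
        node, depth = queue.popleft()
        order.append(node)
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for target in references.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return "\n\n".join(f"[{uri}]\n{texts[uri]}" for uri in order if uri in texts)

document = build_context("ex:art1", max_hops=1)
print(document)
```

The hop limit keeps each document bounded; without it, densely cross-referenced laws could pull most of the corpus into a single context.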

> 📚 See RDFlib docs: <https://rdflib.readthedocs.io/>  
> 📚 See LangChain's graph helpers for RDF/SPARQL, e.g. `RdfGraph` and `GraphSparqlQAChain` in `langchain_community`


# Wrap‑up / Key takeaways ✅  

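* **Direct prompting** on entire laws is cost-heavy and can hit context limits.  
* **Per-article prompting** improves accuracy but sacrifices latency (about 52 s versus 3 s in the run above).  
* **RAG** with a high-quality embedding model (MXBAI) sends only the top-k articles to the LLM, so it should give the *best accuracy-per-dollar*; the accuracy fields in this notebook are still placeholders, so this remains to be confirmed by grading.  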
* Integrating domain‑specific knowledge graphs can further boost recall for cross‑article references.  

Feel free to extend the notebook by  
* replacing placeholder accuracies with manual grading,  
* adding caching for embeddings,  
* deploying the FAISS index as an API.  
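
As a starting point for the caching extension, here is a minimal in-memory sketch that memoizes any embedding function by a hash of its input text. `EmbeddingCache` and `fake_embed` are hypothetical names; in the notebook you might pass something like `hf_emb.embed_query` instead of the stand-in embedder.

```python
import hashlib

class EmbeddingCache:
    """Memoize an embedding function by the SHA-256 hash of the input text."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.store = {}
        self.misses = 0  # number of times the underlying embedder was called

    def embed(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self.store:
            self.misses += 1
            self.store[key] = self.embed_fn(text)
        return self.store[key]

# Stand-in embedder for illustration only; it is not a real embedding model.
def fake_embed(text):
    return [float(len(text)), float(text.count(" "))]

cache = EmbeddingCache(fake_embed)
first = cache.embed("De werknemer heeft recht op ouderdomspensioen.")
second = cache.embed("De werknemer heeft recht op ouderdomspensioen.")
print(cache.misses)
```

Persisting `store` to disk (for example as JSON) would let the cache survive kernel restarts, which matters most when re-embedding hundreds of articles.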
