# Retrieval‑Augmented Generation (RAG) Workshop & Notebook Guide  
*A step‑by‑step companion for developers new to GenAI*

## 0  |  Purpose of this Notebook

The goal of the workshop is to **contrast “plain prompting” with Retrieval‑Augmented Generation (RAG)** and let you experience, hands‑on, how adding a retrieval step and function‑calling tools makes GPT‑4o‑mini:

* **more accurate** (grounded answers, fewer hallucinations)  
* **faster & cheaper** on larger corpora  
* **deterministic** when you need it (e.g. function calls)

## 1  |  Getting Ready

| What | Why |
|------|-----|
| **Python ≥ 3.10** | required by LangChain 0.2+ |
| **OpenAI account & `OPENAI_API_KEY`** | to access `gpt‑4o‑mini` |
| **Packages** | `pip install -r requirements.txt` |
| **Workshop data** | 50 short markdown files (≈ 90 kB total) provided in `/data/docs/` |

Clone the repo or pull the zipped folder, then open **`RAG‑workshop.ipynb`** in Jupyter or VS Code.

In [None]:
# Run to install the Python packages.
%pip install -r requirements.txt

### 1.1. LLM call, OpenAI

In [81]:
# Prompting API example.
import importlib, utils
importlib.reload(utils)
from utils import gpt_4o_mini

query = "Say 'Hello world' in five different languages."
print(gpt_4o_mini(query).content)

Sure! Here’s "Hello world" in five different languages:

1. Spanish: ¡Hola mundo!
2. French: Bonjour le monde !
3. German: Hallo Welt!
4. Italian: Ciao mondo!
5. Japanese: こんにちは世界 (Konnichiwa sekai)! 

Let me know if you need anything else!


### 1.2. Baseline data, Pensioenwet


In [82]:
import glob

# Index of file to be printed
file_index = 1

for filepath in glob.glob("data/docs/*.txt")[file_index:file_index+1]:
    with open(filepath, "r", encoding="utf-8") as f:
        print(f"Firm:{filepath} \nText:{f.read()} ")


Firm:data/docs\Pensioenwet.txt 
Text:

Pensioenwet

Artikel 17. Gelijke premie

1.	De door of voor een deelnemer verschuldigde premie voor pensioen op opbouwbasis bedraagt voor alle deelnemers een gelijk percentage van het loon dat voor de pensioenberekening in aanmerking wordt genomen.

2.	De door of voor een deelnemer verschuldigde premie voor pensioen op risicobasis bedraagt bij een pensioenregeling uitgevoerd door een verplichtgesteld bedrijfstakpensioenfonds een gelijk percentage van het loon dat voor de pensioenberekening in aanmerking wordt genomen.

3.	Voor verschillende vormen van pensioen en voor verschillende pensioenregelingen kunnen verschillende premies worden vastgesteld. Voor verschillende pensioenregelingen die worden uitgevoerd door hetzelfde verplichtgestelde bedrijfstakpensioenfonds kunnen geen verschillende premies worden vastgesteld indien die pensioenregelingen dezelfde of nagenoeg dezelfde inhoud hebben.

4.	Het eerste en tweede lid zijn niet van toepassing op d

In [18]:
# Token count
import importlib, utils
importlib.reload(utils)
from utils import count_tokens_in_docs

print(count_tokens_in_docs())

AOW.txt: 300 tokens
Pensioenwet.txt: 840 tokens
Uitvoeringsbesluit loonbelasting 1965.txt: 837 tokens
Wet op loonbelasting 1964.txt: 1153 tokens
Wet Toekomst Pensioenen.txt: 3344 tokens
Wet verplichte beroepspensioenregeling.txt: 519 tokens

Total tokens across all files: 6993


## 2  |  Baselines — No RAG

### 2.1  Naïve “Stuff‑Everything” Prompt

Concatenate all texts into a single prompt. With everything stuffed into one context, whether the model answers the question correctly is hit-or-miss.

In [95]:
import importlib, utils
importlib.reload(utils)
from utils import gpt_4o_mini
import glob
# TODO: Increase size of text chunks to demonstrate this.

# Concatenate all text files into a single string.
text = ""
for filepath in glob.glob("data/docs/*.txt"):
    with open(filepath, "r", encoding="utf-8") as f:
        text += f.read()

SYSTEM = f"""The text between triple backticks contains multiple Dutch law texts, I will ask retrieval-based questions about them: ```{text}```"""
query = "In case a person abides to a premium percentage of 27%, what is the minimal applicable franchise-value? Keep the answer below 50 words."
response = gpt_4o_mini(user_message=query,system_message=SYSTEM)
print(f"Response: {response.content}\n")

Response: The minimal applicable franchise-value, given a premium percentage of 27%, would be €18,475, as specified in Article 18a of the Wet op de loonbelasting 1964.



### 2.2  File looping, asking the question separately per file.

In [97]:
import glob
from pathlib import Path

for filepath in glob.glob("data/docs/*.txt")[3:4]:
    with open(filepath, "r", encoding="utf-8") as f:
        text = f.read()

    # Get the file name without the extension (independent of the OS path separator)
    file_name = Path(filepath).stem.upper()

    # Query
    SYSTEM = f"""The text between triple backticks is one Dutch law text, I will ask retrieval-based questions about it: ```{text}```"""
    query = "In case a person abides to a premium percentage of 27%, what is the minimal applicable franchise-value? Keep the answer below 50 words."
    response = gpt_4o_mini(user_message=query,system_message=SYSTEM)
    print(f"File: {file_name} \nResponse: {response.content}\n")

File: WET OP LOONBELASTING 1964 
Response: The minimal applicable franchise value, given a premium percentage of 27%, is €18,475, as this is the base amount specified. However, lower amounts may be applicable under specific conditions set by regulations, but these details would require further clarification.



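**Expected result:** The stuffed prompt reports €18,475 without noting that this amount may not hold for other percentages. The per-file prompt does flag that lower amounts can apply under specific conditions.

**Takeaway:** In this example, narrowing the input to the relevant text gives a more accurate answer.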

## 3  |  Function‑Calling Experiments

This section registers a simple calculator tool and compares it with asking the model directly.

Multiplying two large numbers is a useful test case. A transformer does not compute the product; it predicts the answer that looks most likely given its training corpus.

### 3.1. Asking ChatGPT directly, incorrect!

In [16]:
import importlib, utils
importlib.reload(utils)
from utils import gpt_4o_mini

# Wrong answer:
print(gpt_4o_mini("What is 12534 x 37245? Return only the answer, no separators.").content)

466240730.


### 3.2. Using a tool call, infers arguments from the prompt.

In [9]:

import importlib, utils
importlib.reload(utils)
from utils import gpt_4o_mini
import json

# Define the tool schema for the multiplier function.
tool_schema = {
    "type": "function",
    "function": {
        "name": "multiplier",
        "description": "Multiply two numbers.",
        "parameters": {
            "type": "object",
            "properties": {
                "x": {
                    "type": "integer",
                    "description": "First integer to multiply.",
                },
                "y": {
                    "type": "integer",
                    "description": "Second integer to multiply.",
                },
            },
            "required": ["x", "y"],
        },
    },
}

# Let the model infer the tool arguments from the prompt.
response = gpt_4o_mini("What is 12534 x 37245?", tool_schema=tool_schema)
arguments = json.loads(response.tool_calls[0].function.arguments)

# Compute the exact answer locally from the inferred arguments.
print(arguments["x"] * arguments["y"])


466828830


**Expected result:** Asking the model directly returns 466240730, which is wrong. The tool call returns the exact product, 466828830.

**Takeaway:** Questions that need an exact answer should be routed through a tool call.
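
Once a tool call comes back, something on the client side still has to run it. The sketch below shows one possible way to dispatch a call by name. The `TOOLS` registry and `run_tool_call` helper are hypothetical names introduced for illustration and are not part of `utils` or the OpenAI SDK; the name and JSON argument string stand in for the fields a tool call carries.

```python
import json


def multiplier(x, y):
    """Multiply two numbers exactly with Python arithmetic."""
    return x * y


# Hypothetical registry that maps tool names to local Python functions.
TOOLS = {"multiplier": multiplier}


def run_tool_call(name, arguments_json):
    """Look up the requested tool and call it with the JSON-encoded arguments."""
    if name not in TOOLS:
        raise ValueError(f"Unknown tool: {name}")
    arguments = json.loads(arguments_json)
    return TOOLS[name](**arguments)


# The name and argument string mimic what a tool call would contain.
result = run_tool_call("multiplier", '{"x": 12534, "y": 37245}')
print(result)  # 466828830
```

A registry like this keeps the model responsible only for choosing the tool and its arguments, while the arithmetic itself stays deterministic.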

## 4  |  Intro to Embeddings

### 4.1  Text Embeddings & Cosine Similarity

In [None]:
from sentence_transformers import SentenceTransformer, util

for model_name in ["all-MiniLM-L6-v2"]:
    print(model_name)
    model = SentenceTransformer(model_name)  # load the model once and reuse it

    # Baseline similarity between two single words
    sim = util.cos_sim(model.encode("cat"), model.encode("kitten"))
    print(sim[0][0])

    # More shared context: the one differing word carries less weight
    sim = util.cos_sim(model.encode("cat falls from the sky"), model.encode("kitten falls from the sky"))
    print(sim[0][0])

    # Even more shared context: the similarity rises further
    sim = util.cos_sim(model.encode("cat falls from the sky and lands hard on the ground"), model.encode("kitten falls from the sky and lands hard on the ground"))
    print(sim[0][0])


all-MiniLM-L6-v2
tensor(0.7882)
tensor(0.8582)
tensor(0.8805)


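Expect similarities of roughly **0.8 – 0.9**: the inputs are semantically close, and the score rises as the shared context grows and the single differing word carries less weight.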
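
For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The following is a minimal plain-Python sketch of that formula on made-up toy vectors; it is not how `sentence_transformers` implements `util.cos_sim`, only an illustration of the quantity being computed.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors (made up): same direction versus perpendicular.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])

print(round(same_direction, 6))  # vectors pointing the same way score 1.0
print(orthogonal)                # perpendicular vectors score 0.0
```

Because the score depends only on direction, scaling a vector does not change its similarity to another vector.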

### 4.2  Text Embeddings: Good vs Bad Models

| Model | Cosine similarity |
|-------|---------------------------------------------|
| **`all-MiniLM-L6-v2`** | **≈ 0.80** (good) |
| **`...`** | **≈ 0.20** (bad) |

Some “general” models optimise for *token prediction*, not semantic retrieval.

## 5  |  Sentence Embeddings and Vector Store Using LangChain

*Tip:* Keep the index serialized to disk so you can reuse it between sessions.
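
The idea behind the tip can be sketched with the standard library alone. The example below writes a tiny, made-up index to a JSON file and reads it back. The `save_index` and `load_index` helpers, the chunk ids, and the vectors are all hypothetical; a real vector store would normally use its own persistence mechanism rather than this format.

```python
import json
import os
import tempfile


def save_index(index, path):
    """Write the index (chunk id -> text and vector) to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(index, f)


def load_index(path):
    """Read a previously saved index back into memory."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Toy index with made-up vectors; real embeddings have hundreds of dimensions.
index = {
    "Pensioenwet#0": {"text": "Artikel 17. Gelijke premie", "vector": [0.1, 0.9, 0.0]},
    "example#0": {"text": "placeholder chunk", "vector": [0.8, 0.1, 0.1]},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    index_path = os.path.join(tmp_dir, "toy_index.json")
    save_index(index, index_path)      # end of one session
    restored = load_index(index_path)  # start of the next session

print(restored == index)  # True
```

Reloading a saved index avoids re-embedding every document at the start of each session.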

## 6  |  LangChain RAG Pipeline (Working Demo)

*Outcome:* GPT first fetches the top‑K relevant chunks, then answers with citations.
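
The retrieve-then-answer flow can be illustrated without any framework. The sketch below ranks toy chunks by cosine similarity, keeps the top K, and builds a prompt that tags each chunk with its source so the model can cite it. The chunk texts, vectors, and the `retrieve_top_k` and `build_prompt` helpers are assumptions made for illustration and do not reproduce LangChain's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def retrieve_top_k(query_vector, chunks, k):
    """Return the k chunks whose vectors are most similar to the query."""
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vector, chunk["vector"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, retrieved):
    """Assemble a prompt in which every chunk is tagged with its source."""
    context = "\n".join(f"[{c['source']}] {c['text']}" for c in retrieved)
    return (
        "Answer using only the context below and cite sources in brackets.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


# Made-up chunks and two-dimensional vectors, purely for illustration.
chunks = [
    {"source": "doc_a", "text": "Text about premiums.", "vector": [0.9, 0.1]},
    {"source": "doc_b", "text": "Text about taxes.", "vector": [0.1, 0.9]},
    {"source": "doc_c", "text": "Text about premium percentages.", "vector": [0.8, 0.3]},
]

top_chunks = retrieve_top_k([1.0, 0.0], chunks, k=2)
prompt = build_prompt("Which premium applies?", top_chunks)
print(prompt)
```

Only the retrieved chunks reach the model, which is what keeps the prompt small and lets the answer point back to a source.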

## 7  |  Scaling Experiments

### 7.1  “Brute‑Force” Paragraph Loop (Again)
Run against all 50 docs → ~30 s, 50 API calls.

### 7.2  RAG: Filter ➜ Generate

1. **Retrieve `k=4`** docs (~0.9 s)  
2. Loop only their paragraphs (~2 s total)

| Method | Latency | Tokens billed | Accuracy |
|--------|---------|---------------|----------|
| Brute loop | 30 s | 20 k | ✅ |
| RAG filter+loop | **2 s** | **5 k** | ✅ |
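
The difference in API calls between the two methods can be sketched as follows. The example uses a stand-in `fake_llm` function and made-up retrieval scores, so it only illustrates the call counts (50 versus `k=4`), not the latencies or token figures in the table.

```python
def fake_llm(question, document_text):
    """Stand-in for a real model call; one call per document is assumed."""
    return f"answer based on {len(document_text)} characters"


def brute_force(question, documents):
    """Ask the question once for every document."""
    return [fake_llm(question, text) for text in documents.values()]


def filter_then_generate(question, documents, scores, k=4):
    """Keep the k best-scoring documents, then ask only about those."""
    top_names = sorted(scores, key=scores.get, reverse=True)[:k]
    return {name: fake_llm(question, documents[name]) for name in top_names}


# 50 placeholder documents with made-up retrieval scores (higher is better).
documents = {f"doc_{i}": f"text of document {i}" for i in range(50)}
scores = {name: 1.0 / (1 + i) for i, name in enumerate(documents)}

all_answers = brute_force("question", documents)
filtered_answers = filter_then_generate("question", documents, scores, k=4)

print(len(all_answers), len(filtered_answers))  # 50 4
```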