# 🧠 Custom Dutch Named Entity Recognition (NER) with spaCy
**By Phui San Louisa Cheong**  
*NLP & Data Science Portfolio – 2025*  

This project trains a custom Dutch Named Entity Recognition model using spaCy to detect:

- `ZORGVERZEKERAAR` (health insurer)
- `POLISNUMMER` (policy number)
- `BEDRIJFSNAAM` (company name)

The training data is synthetically generated with aligned entity spans.

In [None]:
!pip install -U spacy
!python -m spacy download nl_core_news_lg

In [None]:
import spacy
import random
from spacy.training.example import Example
nlp = spacy.load('nl_core_news_lg')

## 📚 Training Data

In [None]:
# Each entity is (start_char, end_char, label); end_char is exclusive,
# so text[start_char:end_char] must equal the entity text exactly.
TRAIN_DATA = [
    ("Mijn verzekeraar is Zilveren Kruis sinds 2020.", {"entities": [(20, 34, "ZORGVERZEKERAAR")]}),
    ("Het polisnummer op uw document is CD123456X.", {"entities": [(34, 43, "POLISNUMMER")]}),
    ("d'Arschot en Gezelschap is een erkend bedrijf in Vlaanderen.", {"entities": [(0, 23, "BEDRIJFSNAAM")]})
]
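Hand-counted character offsets are easy to get wrong, and spaCy ignores entity spans that do not line up with token boundaries during training. A minimal sketch of one way to avoid counting by hand is to derive the offsets from the entity text itself. The helper name `annotate` is hypothetical and not part of spaCy; it assumes each entity string occurs in the sentence and annotates only its first occurrence.

```python
def annotate(text, entities):
    """Build a (text, annotations) training tuple from (surface, label) pairs.

    Hypothetical helper: offsets come from str.find, so only the first
    occurrence of each surface string is annotated.
    """
    spans = []
    for surface, label in entities:
        start = text.find(surface)
        if start == -1:
            raise ValueError(f"{surface!r} not found in {text!r}")
        # End offset is exclusive, matching the (start, end, label) convention.
        spans.append((start, start + len(surface), label))
    return text, {"entities": spans}


text, annotations = annotate(
    "Mijn verzekeraar is Zilveren Kruis sinds 2020.",
    [("Zilveren Kruis", "ZORGVERZEKERAAR")],
)

# Print each label next to the slice it covers as a quick visual check.
for start, end, label in annotations["entities"]:
    print(label, repr(text[start:end]))
```

Tuples built this way have the same shape as the entries in `TRAIN_DATA`, so they can be appended to that list directly.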

## 🏋️ Model Training

In [None]:
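# Register the custom labels on the pretrained NER component.
ner = nlp.get_pipe("ner")
ner.add_label("ZORGVERZEKERAAR")
ner.add_label("POLISNUMMER")
ner.add_label("BEDRIJFSNAAM")

# Update only the NER component; every other pipe is disabled during training.
# select_pipes replaces disable_pipes, which is deprecated in spaCy v3.
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]
with nlp.select_pipes(disable=other_pipes):
    # resume_training keeps the pretrained weights instead of reinitializing them.
    optimizer = nlp.resume_training()
    for i in range(10):
        random.shuffle(TRAIN_DATA)
        losses = {}
        for text, annotations in TRAIN_DATA:
            doc = nlp.make_doc(text)
            example = Example.from_dict(doc, annotations)
            # Pass the optimizer explicitly so the update uses the one created above.
            nlp.update([example], sgd=optimizer, losses=losses)
        print(f"Iteration {i} Losses: {losses}")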

## 🧪 Testing the Model

In [None]:
test_text = "Mijn polisnummer is CD123456X en ik ben verzekerd bij Zilveren Kruis."
doc = nlp(test_text)
for ent in doc.ents:
    print(ent.text, ent.label_)
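Printing the entities is a spot check rather than a measurement. As a rough sketch of how predictions could be scored against hand-labelled spans, the function below computes exact-match precision, recall, and F1 over `(start, end, label)` tuples. The name `span_prf` is hypothetical, and the predicted spans in the example are made up for illustration; they are not output from the trained model.

```python
def span_prf(gold, predicted):
    """Exact-match precision, recall and F1 over (start, end, label) spans."""
    gold, predicted = set(gold), set(predicted)
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


test_text = "Mijn polisnummer is CD123456X en ik ben verzekerd bij Zilveren Kruis."

# Gold spans for the test sentence (end offsets are exclusive).
gold_spans = [(20, 29, "POLISNUMMER"), (54, 68, "ZORGVERZEKERAAR")]

# Made-up predictions for illustration: the insurer span stops one token early,
# so it does not count as an exact match.
predicted_spans = [(20, 29, "POLISNUMMER"), (54, 62, "ZORGVERZEKERAAR")]

precision, recall, f1 = span_prf(gold_spans, predicted_spans)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

With the real pipeline, the predicted spans would come from `[(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]`. A meaningful score needs held-out sentences that were not part of the training data.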

## 🎨 Entity Visualization

In [None]:
from spacy import displacy
displacy.render(doc, style='ent', jupyter=True)

## 💾 Save and Reload the Model

In [None]:
nlp.to_disk("custom_dutch_ner_model")
# Later:
nlp = spacy.load("custom_dutch_ner_model")

## ✅ Conclusion & Next Steps
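This project demonstrated how to fine-tune the NER component of spaCy's `nl_core_news_lg` pipeline on custom Dutch entity labels.
With only three training sentences, and a test sentence that reuses entities seen during training, the result is a proof of concept: it shows that the pipeline runs end to end for insurers, policy numbers, and company names, not how well it generalizes to unseen text.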

**Next steps:**
- Expand dataset with real-world samples
- Fine-tune further with spaCy projects
- Deploy in a Streamlit or FastAPI app for live annotation
