# Multilingual PII Detection & De-identification Guide

This notebook demonstrates **multilingual PII detection and de-identification** in OpenMed v0.5.6+, covering:

1. **Overview** - Supported languages and model catalog
2. **Quick Start** - Using the `lang` parameter
3. **French Clinical Notes** - Extraction & de-identification
4. **German Clinical Notes** - Extraction & de-identification
5. **Italian Clinical Notes** - Extraction & de-identification
6. **Spanish Clinical Notes** - Extraction & de-identification
7. **Cross-Language Comparison** - Detecting PII across languages
8. **Language-Specific Patterns** - NIR, Steuer-ID, Codice Fiscale, DNI/NIE
9. **Accent Normalization** - Handling accented text in Spanish
10. **De-identification Methods** - Mask, replace, hash with multilingual data
11. **Date Handling** - Locale-aware date parsing and shifting
12. **Batch Processing** - Processing multilingual documents
13. **Custom Model Selection** - Choosing models by architecture

---

**Requirements:**
```bash
pip install openmed
```

**Supported Languages:** English (en), French (fr), German (de), Italian (it), Spanish (es)

---

## Setup

In [None]:
from pprint import pprint

from openmed import (
    extract_pii,
    deidentify,
    reidentify,
    PIIEntity,
    DeidentificationResult,
)
from openmed import (
    merge_entities_with_semantic_units,
    find_semantic_units,
    PII_PATTERNS,
    PIIPattern,
)
from openmed import (
    SUPPORTED_LANGUAGES,
    DEFAULT_PII_MODELS,
    LANGUAGE_PII_PATTERNS,
    get_patterns_for_language,
    get_pii_models_by_language,
    get_default_pii_model,
)

print("All imports successful!")
print(f"Supported languages: {sorted(SUPPORTED_LANGUAGES)}")

---
## 1. Overview: Model Catalog

OpenMed provides **176+ multilingual PII models** across 5 languages, each with 35 architecture variants ranging from 33M to 600M parameters.

In [None]:
print("=" * 80)
print("MULTILINGUAL PII MODEL CATALOG")
print("=" * 80)

lang_names = {
    "en": "English", "fr": "French", "de": "German",
    "it": "Italian", "es": "Spanish",
}

for lang in sorted(SUPPORTED_LANGUAGES):
    models = get_pii_models_by_language(lang)
    default = get_default_pii_model(lang)
    print(f"\n{lang_names[lang]} ({lang}):")
    print(f"  Models available: {len(models)}")
    print(f"  Default model:    {default}")

# Total count
from openmed.core.model_registry import OPENMED_MODELS
total = len([k for k in OPENMED_MODELS if k.startswith("pii_")])
print(f"\nTotal PII models registered: {total}")

In [None]:
# List the French PII models with their size category (rows are ordered by model ID)
print("Example: French PII Models (sorted by model ID)")
print("-" * 80)

fr_models = get_pii_models_by_language("fr")
for key, info in sorted(fr_models.items(), key=lambda x: x[1].model_id):
    print(f"  {key:40s} {info.size_category:8s}  {info.model_id}")

---
## 2. Quick Start: The `lang` Parameter

The key addition is the `lang` parameter on `extract_pii()` and `deidentify()`. It automatically selects the correct model and language-specific patterns.

In [None]:
# English (default behavior, unchanged)
en_result = extract_pii(
    "Patient: John Doe, DOB: 03/15/1975, SSN: 123-45-6789",
    confidence_threshold=0.5,
    use_smart_merging=True,
)
print("English:")
for e in en_result.entities:
    print(f"  [{e.label:20s}] '{e.text}' ({e.confidence:.3f})")

print()

# French - just add lang="fr"
fr_result = extract_pii(
    "Patient : Jean Dupont, ne le 15/01/1975, NIR : 1 75 01 78 006 084 47",
    lang="fr",
    confidence_threshold=0.5,
    use_smart_merging=True,
)
print("French (lang='fr'):")
for e in fr_result.entities:
    print(f"  [{e.label:20s}] '{e.text}' ({e.confidence:.3f})")

---
## 3. French Clinical Notes

French PII detection handles DD/MM/YYYY dates, +33 phone numbers, NIR/INSEE national IDs, and French-language context.
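
To make the NIR/INSEE check concrete, here is a minimal standalone sketch of the publicly documented control-key rule (key = 97 minus the first 13 digits modulo 97). It is not OpenMed's implementation; the helper names `nir_key` and `nir_checksum_ok` are hypothetical, and Corsican department codes (2A/2B) are left out. Sample IDs in clinical notes are often placeholders, so a number can match the NIR pattern and still fail this check.

```python
import re


def nir_key(body: str) -> int:
    """Control key for a 13-digit NIR body: 97 minus the body modulo 97."""
    return 97 - (int(body) % 97)


def nir_checksum_ok(nir: str) -> bool:
    """Check a 15-digit NIR (spaces allowed) against its two-digit key."""
    digits = re.sub(r"\s+", "", nir)
    if not re.fullmatch(r"[0-9]{15}", digits):
        # Corsican department codes (2A/2B) are not handled in this sketch.
        return False
    return nir_key(digits[:13]) == int(digits[13:])


print(nir_key("1800375123456"))                  # 92
print(nir_checksum_ok("1 80 03 75 123 456 92"))  # True
```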

In [None]:
french_note = """
COMPTE RENDU DE CONSULTATION
===========================
Nom du patient : Marie Dupont
Date de naissance : 15/03/1980
Numero de securite sociale : 2 80 03 75 123 456 20
Telephone : 06 12 34 56 78
Email : marie.dupont@email.fr
Adresse : 15 Rue de la Paix, 75002 Paris

Date de consultation : 10/01/2025

MOTIF DE CONSULTATION :
Mme Dupont se presente pour un suivi de diabete de type 2.

EXAMEN CLINIQUE :
Tension arterielle : 130/85 mmHg
Poids : 72 kg, Taille : 165 cm
HbA1c : 7.2%

Dr. Pierre Martin
"""

print("=" * 80)
print("FRENCH CLINICAL NOTE - PII EXTRACTION")
print("=" * 80)

fr_result = extract_pii(
    french_note,
    lang="fr",
    confidence_threshold=0.5,
    use_smart_merging=True,
)

print(f"Found {len(fr_result.entities)} PII entities:\n")
for e in fr_result.entities:
    print(f"  [{e.label:20s}] '{e.text}' ({e.confidence:.3f})")

In [None]:
# De-identify the French note
print("=" * 80)
print("FRENCH CLINICAL NOTE - DE-IDENTIFICATION")
print("=" * 80)

fr_deid = deidentify(
    french_note,
    lang="fr",
    method="mask",
    confidence_threshold=0.5,
    use_smart_merging=True,
)

print(fr_deid.deidentified_text)

---
## 4. German Clinical Notes

German PII detection handles DD.MM.YYYY dates, +49 phone numbers, Steuer-ID, and German-language context.
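
As an illustration of what a Steuer-ID checksum involves, the sketch below computes the final check digit with the iterative modulo 11,10 scheme. It is a standalone assumption-laden example, not OpenMed's validator: the function names are hypothetical, and the additional rules on how often a digit may repeat are not checked.

```python
import re


def steuer_id_check_digit(first_10_digits: str) -> int:
    """Compute the 11th digit from the first ten (iterative mod 11,10)."""
    product = 10
    for ch in first_10_digits:
        total = (int(ch) + product) % 10
        if total == 0:
            total = 10
        product = (total * 2) % 11
    check = 11 - product
    return 0 if check == 10 else check


def steuer_id_checksum_ok(value: str) -> bool:
    """Check length, leading digit, and check digit (spaces allowed)."""
    digits = value.replace(" ", "")
    if not re.fullmatch(r"[1-9][0-9]{10}", digits):
        return False
    return steuer_id_check_digit(digits[:10]) == int(digits[10])


print(steuer_id_checksum_ok("36 574 261 809"))
```

A pattern match alone only confirms that a string looks like eleven digits; a check like this one can help separate plausible IDs from arbitrary numbers.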

In [None]:
german_note = """
ARZTBRIEF
=========
Patientenname: Hans Mueller
Geburtsdatum: 22.07.1968
Steuer-ID: 12345678912
Telefon: +49 30 1234567
E-Mail: hans.mueller@email.de
Adresse: Berliner Strasse 42, 10115 Berlin

Aufnahmedatum: 05.01.2025
Entlassungsdatum: 10.01.2025

DIAGNOSE:
Akuter Myokardinfarkt (STEMI)

VERLAUF:
Herr Mueller wurde am 05.01.2025 mit Brustschmerzen
in die Notaufnahme eingeliefert.

Dr. Anna Schmidt
"""

print("=" * 80)
print("GERMAN CLINICAL NOTE - PII EXTRACTION")
print("=" * 80)

de_result = extract_pii(
    german_note,
    lang="de",
    confidence_threshold=0.5,
    use_smart_merging=True,
)

print(f"Found {len(de_result.entities)} PII entities:\n")
for e in de_result.entities:
    print(f"  [{e.label:20s}] '{e.text}' ({e.confidence:.3f})")

In [None]:
# De-identify the German note
print("=" * 80)
print("GERMAN CLINICAL NOTE - DE-IDENTIFICATION")
print("=" * 80)

de_deid = deidentify(
    german_note,
    lang="de",
    method="mask",
    confidence_threshold=0.5,
    use_smart_merging=True,
)

print(de_deid.deidentified_text)

---
## 5. Italian Clinical Notes

Italian PII detection handles DD/MM/YYYY dates, +39 phone numbers, Codice Fiscale, and Italian-language context.
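
The Codice Fiscale ends in a check character derived from the first 15 characters. The sketch below shows that calculation as a standalone example; it is not OpenMed's validator, the names are hypothetical, and the format regex ignores the letter substitutions used for duplicate codes (omocodia).

```python
import re

# Values for characters in odd positions (1st, 3rd, ...), indexed 0-25.
# Digits 0-9 share the values of the letters A-J.
ODD_VALUES = [1, 0, 5, 7, 9, 13, 15, 17, 19, 21, 2, 4, 18,
              20, 11, 3, 6, 8, 12, 14, 16, 10, 22, 25, 24, 23]

CF_FORMAT = re.compile(r"[A-Z]{6}[0-9]{2}[A-Z][0-9]{2}[A-Z][0-9]{3}[A-Z]")


def _char_index(ch: str) -> int:
    """Map '0'-'9' to 0-9 and 'A'-'Z' to 0-25."""
    return int(ch) if ch.isdigit() else ord(ch) - ord("A")


def codice_fiscale_check_char(first_15: str) -> str:
    """Compute the 16th character from the first 15."""
    total = 0
    for position, ch in enumerate(first_15.upper(), start=1):
        idx = _char_index(ch)
        # Odd positions use the lookup table; even positions use the index.
        total += ODD_VALUES[idx] if position % 2 == 1 else idx
    return chr(ord("A") + total % 26)


def codice_fiscale_ok(value: str) -> bool:
    """Check the basic format and the trailing check character."""
    code = value.upper()
    if not CF_FORMAT.fullmatch(code):
        return False
    return codice_fiscale_check_char(code[:15]) == code[15]


print(codice_fiscale_check_char("RSSMRA85T10A562"))  # S
```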

In [None]:
italian_note = """
REFERTO MEDICO
==============
Nome del paziente: Marco Rossi
Data di nascita: 10/05/1972
Codice fiscale: RSSMRC72E10H501Z
Telefono: +39 333 123 4567
Email: marco.rossi@email.it
Indirizzo: Via Roma 25, 00185 Roma

Data della visita: 08/01/2025

DIAGNOSI:
Ipertensione arteriosa essenziale

TERAPIA:
Ramipril 5mg una volta al giorno

Dott. Giulia Bianchi
"""

print("=" * 80)
print("ITALIAN CLINICAL NOTE - PII EXTRACTION")
print("=" * 80)

it_result = extract_pii(
    italian_note,
    lang="it",
    confidence_threshold=0.5,
    use_smart_merging=True,
)

print(f"Found {len(it_result.entities)} PII entities:\n")
for e in it_result.entities:
    print(f"  [{e.label:20s}] '{e.text}' ({e.confidence:.3f})")

In [None]:
# De-identify the Italian note
print("=" * 80)
print("ITALIAN CLINICAL NOTE - DE-IDENTIFICATION")
print("=" * 80)

it_deid = deidentify(
    italian_note,
    lang="it",
    method="mask",
    confidence_threshold=0.5,
    use_smart_merging=True,
)

print(it_deid.deidentified_text)

---
## 6. Spanish Clinical Notes

Spanish PII detection handles DD/MM/YYYY dates, dates with the unique "de" connector (15 de marzo de 2025), +34 phone numbers, DNI/NIE national IDs, and Spanish-language context.
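
To show what the "de" connector means for parsing, here is a small standalone sketch that turns a long-form Spanish date into a `datetime.date`. The regex, the month table, and the name `parse_spanish_long_date` are illustrative assumptions and are not taken from OpenMed's pattern set.

```python
import re
from datetime import date

SPANISH_MONTHS = {
    "enero": 1, "febrero": 2, "marzo": 3, "abril": 4,
    "mayo": 5, "junio": 6, "julio": 7, "agosto": 8,
    "septiembre": 9, "octubre": 10, "noviembre": 11, "diciembre": 12,
}

# day "de" month-name "de" year, e.g. "15 de marzo de 1985"
LONG_DATE = re.compile(r"\b([0-9]{1,2})\s+de\s+([a-z]+)\s+de\s+([0-9]{4})\b",
                       re.IGNORECASE)


def parse_spanish_long_date(text: str):
    """Return the first long-form date found in `text`, or None."""
    match = LONG_DATE.search(text)
    if match is None:
        return None
    day, month_name, year = match.groups()
    month = SPANISH_MONTHS.get(month_name.lower())
    if month is None:
        return None
    try:
        return date(int(year), month, int(day))
    except ValueError:
        # Impossible calendar dates such as "31 de febrero".
        return None


print(parse_spanish_long_date("Fecha de nacimiento: 15 de marzo de 1985"))
```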

**Accent normalization** is enabled by default for Spanish, so models trained on accent-free text can still detect accented names like "María García".

In [None]:
spanish_note = """
INFORME CLÍNICO
===============
Nombre del paciente: María García López
Fecha de nacimiento: 15 de marzo de 1985
DNI: 12345678Z
Teléfono: +34 612 345 678
Email: maria.garcia@email.es
Dirección: Calle Serrano 42, 28001 Madrid

Fecha de ingreso: 03/02/2026

MOTIVO DE CONSULTA:
La paciente María García acude por dolor torácico de 2 horas
de evolución. Antecedentes: hipertensión arterial.

EXPLORACIÓN FÍSICA:
Tensión arterial: 145/90 mmHg
Frecuencia cardíaca: 88 lpm

Dra. Ana Martínez
"""

print("=" * 80)
print("SPANISH CLINICAL NOTE - PII EXTRACTION")
print("=" * 80)

es_result = extract_pii(
    spanish_note,
    lang="es",
    confidence_threshold=0.5,
    use_smart_merging=True,
)

print(f"Found {len(es_result.entities)} PII entities:\n")
for e in es_result.entities:
    print(f"  [{e.label:20s}] '{e.text}' ({e.confidence:.3f})")

In [None]:
# De-identify the Spanish note
print("=" * 80)
print("SPANISH CLINICAL NOTE - DE-IDENTIFICATION")
print("=" * 80)

es_deid = deidentify(
    spanish_note,
    lang="es",
    method="mask",
    confidence_threshold=0.5,
    use_smart_merging=True,
)

print(es_deid.deidentified_text)

---
## 7. Cross-Language Comparison

Compare PII detection across all five languages with equivalent clinical text.

In [None]:
# Equivalent clinical texts in 5 languages
texts = {
    "en": "Patient: John Smith, DOB: 03/15/1975, Email: john@email.com",
    "fr": "Patient : Jean Dupont, né le 15/03/1975, Email : jean@email.fr",
    "de": "Patient: Hans Schmidt, Geburtsdatum: 15.03.1975, E-Mail: hans@email.de",
    "it": "Paziente: Marco Rossi, data di nascita: 15/03/1975, Email: marco@email.it",
    "es": "Paciente: María García, fecha de nacimiento: 15/03/1975, Email: maria@email.es",
}

lang_names = {
    "en": "English", "fr": "French", "de": "German",
    "it": "Italian", "es": "Spanish",
}

print("=" * 80)
print("CROSS-LANGUAGE PII DETECTION COMPARISON")
print("=" * 80)

for lang, text in texts.items():
    result = extract_pii(
        text,
        lang=lang,
        confidence_threshold=0.5,
        use_smart_merging=True,
    )
    print(f"\n{lang_names[lang]} ({lang}): {len(result.entities)} entities")
    for e in result.entities:
        print(f"  [{e.label:20s}] '{e.text}'")

---
## 8. Language-Specific Patterns

Each language has specialized regex patterns for national IDs, phone numbers, dates, and addresses.
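
A regex match only shows that a string has the right shape. For Spanish IDs, the final letter is a checksum: the number modulo 23 indexes into a fixed letter table, and an NIE first replaces its X/Y/Z prefix with 0/1/2. The sketch below is a standalone illustration of that rule with hypothetical names (`dni_ok`, `nie_ok`); it is not the code behind `validate_spanish_dni` or `validate_spanish_nie`.

```python
import re

DNI_LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"  # index = number % 23


def dni_ok(value: str) -> bool:
    """8 digits followed by the matching control letter."""
    match = re.fullmatch(r"([0-9]{8})([A-Z])", value.upper())
    if match is None:
        return False
    number, letter = match.groups()
    return DNI_LETTERS[int(number) % 23] == letter


def nie_ok(value: str) -> bool:
    """X/Y/Z prefix, 7 digits, then the control letter."""
    match = re.fullmatch(r"([XYZ])([0-9]{7})([A-Z])", value.upper())
    if match is None:
        return False
    prefix, digits, letter = match.groups()
    # The prefix letter stands for a leading digit: X=0, Y=1, Z=2.
    number = int(str("XYZ".index(prefix)) + digits)
    return DNI_LETTERS[number % 23] == letter


print(dni_ok("12345678Z"), dni_ok("12345678A"))  # True False
print(nie_ok("X1234567L"), nie_ok("A1234567L"))  # True False
```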

In [None]:
import re
from collections import defaultdict

print("=" * 80)
print("LANGUAGE-SPECIFIC PII PATTERNS")
print("=" * 80)

for lang in ["fr", "de", "it", "es"]:
    patterns = LANGUAGE_PII_PATTERNS[lang]
    print(f"\n{lang_names[lang]} ({lang}): {len(patterns)} language-specific patterns")

    # Group by entity type
    by_type = defaultdict(list)
    for p in patterns:
        by_type[p.entity_type].append(p)

    for etype in sorted(by_type.keys()):
        count = len(by_type[etype])
        print(f"  {etype}: {count} pattern(s)")

In [None]:
# Demonstrate national ID pattern matching
from openmed.core.pii_i18n import (
    validate_french_nir,
    validate_german_steuer_id,
    validate_italian_codice_fiscale,
    validate_spanish_dni,
    validate_spanish_nie,
)

print("=" * 80)
print("NATIONAL ID PATTERNS & VALIDATORS")
print("=" * 80)

# French NIR/INSEE
fr_nir_patterns = [p for p in LANGUAGE_PII_PATTERNS["fr"] if p.entity_type == "national_id"]
nir_text = "1 80 03 75 123 456 20"
for p in fr_nir_patterns:
    match = re.search(p.pattern, nir_text, p.flags)
    if match:
        print(f"French NIR: '{nir_text}' -> matched")

# German Steuer-ID
de_id_patterns = [p for p in LANGUAGE_PII_PATTERNS["de"] if p.entity_type == "national_id"]
steuer_text = "12345678912"
for p in de_id_patterns:
    match = re.search(p.pattern, steuer_text, p.flags)
    if match:
        print(f"German Steuer-ID: '{steuer_text}' -> matched")

# Italian Codice Fiscale
it_id_patterns = [p for p in LANGUAGE_PII_PATTERNS["it"] if p.entity_type == "national_id"]
cf_text = "RSSMRC72E10H501Z"
for p in it_id_patterns:
    match = re.search(p.pattern, cf_text, p.flags)
    if match:
        print(f"Italian Codice Fiscale: '{cf_text}' -> matched")

# Spanish DNI
es_id_patterns = [p for p in LANGUAGE_PII_PATTERNS["es"] if p.entity_type == "national_id"]
dni_text = "12345678Z"
for p in es_id_patterns:
    match = re.search(p.pattern, dni_text, p.flags)
    if match:
        print(f"Spanish DNI: '{dni_text}' -> matched")

# Spanish NIE
nie_text = "X1234567L"
for p in es_id_patterns:
    match = re.search(p.pattern, nie_text, p.flags)
    if match:
        print(f"Spanish NIE: '{nie_text}' -> matched")

# Checksum validators
print("\nChecksum validation:")
print("-" * 50)
print(f"  validate_spanish_dni('12345678Z') = {validate_spanish_dni('12345678Z')}")   # True
print(f"  validate_spanish_dni('12345678A') = {validate_spanish_dni('12345678A')}")   # False
print(f"  validate_spanish_nie('X1234567L') = {validate_spanish_nie('X1234567L')}")   # True
print(f"  validate_spanish_nie('A1234567L') = {validate_spanish_nie('A1234567L')}")   # False

# Show context-aware scoring with semantic units
print("\nContext-aware scoring with semantic units:")
print("-" * 50)

nir_with_context = "Numero de securite sociale : 1 80 03 75 123 456 20"
fr_patterns = get_patterns_for_language("fr")
nir_only = [p for p in fr_patterns if p.entity_type == "national_id"]

units = find_semantic_units(nir_with_context, nir_only)
for start, end, etype, score, pattern in units:
    print(f"  [{etype}] '{nir_with_context[start:end]}' score={score:.3f}")

In [None]:
# Demonstrate date patterns per language
print("=" * 80)
print("DATE PATTERNS BY LANGUAGE")
print("=" * 80)

date_examples = {
    "fr": ["15/01/2025", "15 janvier 2025", "1/3/2025"],
    "de": ["15.01.2025", "15 Januar 2025", "1.3.2025"],
    "it": ["15/01/2025", "15 gennaio 2025", "1/3/2025"],
    "es": ["15/01/2025", "15 de enero de 2025", "1/3/2025"],
}

for lang, examples in date_examples.items():
    date_patterns = [p for p in LANGUAGE_PII_PATTERNS[lang] if p.entity_type == "date"]
    print(f"\n{lang_names[lang]} date patterns:")
    for text in examples:
        matched = any(re.search(p.pattern, text, p.flags) for p in date_patterns)
        status = "matched" if matched else "no match"
        print(f"  '{text}' -> {status}")

---
## 9. Accent Normalization

The Spanish PII models were trained on accent-free text, so accented names such as "María López" may be missed without normalization. OpenMed handles this transparently:

- **Auto-enabled for Spanish** (`lang="es"`) — accents are stripped before model inference
- **Entity positions map back** to the original accented text
- **Controllable** — disable with `normalize_accents=False` if using an accent-aware model
- **Works for any language** — enable with `normalize_accents=True` on any `lang`
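
One way to strip accents while still being able to map entity positions back to the original text is to record, for every character kept, the index it came from. The sketch below uses only the standard library; it is an assumption about how such a mapping could work, not a description of OpenMed's internal `_strip_accents`, and both function names are hypothetical.

```python
import unicodedata


def strip_accents_with_offsets(text: str):
    """Return (stripped_text, offsets); offsets[i] is the index in `text`
    of the character that produced stripped_text[i]."""
    stripped_chars = []
    offsets = []
    for index, ch in enumerate(text):
        for part in unicodedata.normalize("NFD", ch):
            if unicodedata.combining(part):
                continue  # drop the accent mark itself
            stripped_chars.append(part)
            offsets.append(index)
    return "".join(stripped_chars), offsets


def map_span_to_original(start: int, end: int, offsets):
    """Map a non-empty [start, end) span in the stripped text back to `text`."""
    return offsets[start], offsets[end - 1] + 1


text = "Paciente: María García"
stripped, offsets = strip_accents_with_offsets(text)

# Pretend a model found this span in the accent-free text.
start = stripped.index("Maria Garcia")
end = start + len("Maria Garcia")

orig_start, orig_end = map_span_to_original(start, end, offsets)
print(stripped)
print(text[orig_start:orig_end])
```

For precomposed characters the stripped text has the same length as the original, so the mapping is the identity; the offset list matters when the input contains separate combining marks.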

In [None]:
from openmed.core.pii import _strip_accents

# Demonstrate the normalization function
print("=" * 80)
print("ACCENT NORMALIZATION")
print("=" * 80)

examples = [
    "María García López",
    "José Sánchez Fernández",
    "niño, pingüino, teléfono",
    "DNI: 12345678Z — no accents, unchanged",
]

print("\n_strip_accents() examples:")
print("-" * 50)
for text in examples:
    stripped = _strip_accents(text)
    print(f"  '{text}'")
    print(f"  → '{stripped}'  (len: {len(text)} → {len(stripped)})")
    print()

# Show that accent normalization is transparent to the user
print("=" * 80)
print("SPANISH EXTRACTION — WITH vs WITHOUT ACCENT NORMALIZATION")
print("=" * 80)

accented_text = "Paciente: María García López, teléfono: +34 612 345 678"

# Default: normalize_accents=True for Spanish
result_normalized = extract_pii(
    accented_text,
    lang="es",
    confidence_threshold=0.5,
)

print("\nWith normalization (default for es):")
for e in result_normalized.entities:
    print(f"  [{e.label:20s}] '{e.text}' ({e.confidence:.3f})")

# Explicitly disable normalization
result_raw = extract_pii(
    accented_text,
    lang="es",
    normalize_accents=False,
    confidence_threshold=0.5,
)

print("\nWithout normalization (normalize_accents=False):")
if result_raw.entities:
    for e in result_raw.entities:
        print(f"  [{e.label:20s}] '{e.text}' ({e.confidence:.3f})")
else:
    # This branch only runs when nothing at all was detected
    print("  (no entities detected; the model may not match accented text)")

---
## 10. De-identification Methods with Multilingual Data

All de-identification methods used in this notebook (mask, replace, hash, remove, shift_dates) work with multilingual text. The `replace` method uses language-appropriate fake data.
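
To clarify how the simpler methods differ, the sketch below applies mask, hash, and remove to a list of character spans. It is a conceptual illustration only: the span tuples, the `[LABEL]` placeholder format, the truncated SHA-256 token, and the function name `apply_deid_method` are all assumptions, and the actual output format of `deidentify()` may differ.

```python
import hashlib


def apply_deid_method(text: str, spans, method: str = "mask") -> str:
    """Rewrite each (start, end, label) span; spans must not overlap."""
    result = text
    # Work right to left so earlier offsets stay valid after each edit.
    for start, end, label in sorted(spans, reverse=True):
        original = text[start:end]
        if method == "mask":
            replacement = f"[{label}]"
        elif method == "hash":
            digest = hashlib.sha256(original.encode("utf-8")).hexdigest()[:8]
            replacement = f"{label}_{digest}"
        elif method == "remove":
            replacement = ""
        else:
            raise ValueError(f"unsupported method: {method}")
        result = result[:start] + replacement + result[end:]
    return result


text = "Patient : Marie Dupont, email : marie@email.fr"
name_start = text.index("Marie Dupont")
email_start = text.index("marie@email.fr")
spans = [
    (name_start, name_start + len("Marie Dupont"), "NAME"),
    (email_start, email_start + len("marie@email.fr"), "EMAIL"),
]

for method in ["mask", "hash", "remove"]:
    print(f"{method:8s}: {apply_deid_method(text, spans, method)}")
```

Hashing is deterministic, so the same value maps to the same token every time, which keeps records linkable without exposing the original text.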

In [None]:
# Show language-specific fake data
from openmed.core.pii_i18n import LANGUAGE_FAKE_DATA

print("=" * 80)
print("LANGUAGE-SPECIFIC FAKE DATA FOR REPLACEMENT")
print("=" * 80)

for lang in ["en", "fr", "de", "it", "es"]:
    data = LANGUAGE_FAKE_DATA[lang]
    print(f"\n{lang_names[lang]} ({lang}):")
    print(f"  Names: {data['NAME'][:3]}")
    print(f"  Phones: {data['PHONE'][:2]}")
    print(f"  Emails: {data['EMAIL'][:2]}")
    if 'LOCATION' in data:
        print(f"  Locations: {data['LOCATION'][:2]}")

In [None]:
# Compare de-identification methods on a French note
fr_text = "Patient : Marie Dupont, telephone : 06 12 34 56 78, email : marie@email.fr"

print("=" * 80)
print("DE-IDENTIFICATION METHODS (French)")
print("=" * 80)
print(f"Original: {fr_text}\n")

for method in ["mask", "replace", "hash", "remove"]:
    result = deidentify(
        fr_text,
        lang="fr",
        method=method,
        confidence_threshold=0.5,
        use_smart_merging=True,
    )
    print(f"{method:8s}: {result.deidentified_text}")

---
## 11. Date Handling

The `lang` parameter controls date parsing and formatting:
- **English**: MM/DD/YYYY (month-first)
- **French/Italian/Spanish**: DD/MM/YYYY (day-first)
- **German**: DD.MM.YYYY (day-first, dot separator)
- **Spanish (unique)**: "15 de marzo de 2025" (with "de" connector)
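
The effect of these conventions can be sketched with the standard library alone: parse with the format for the language, shift, and write the result back in the same format. The `DATE_FORMATS` table and `shift_date` helper are hypothetical and only cover the numeric formats listed above; they are not OpenMed's date-shifting code.

```python
from datetime import datetime, timedelta

# Numeric date format per language (assumed from the list above).
DATE_FORMATS = {
    "en": "%m/%d/%Y",
    "fr": "%d/%m/%Y",
    "it": "%d/%m/%Y",
    "es": "%d/%m/%Y",
    "de": "%d.%m.%Y",
}


def shift_date(date_text: str, lang: str, days: int) -> str:
    """Parse `date_text` for `lang`, shift it, and keep the same format."""
    fmt = DATE_FORMATS[lang]
    parsed = datetime.strptime(date_text, fmt)
    return (parsed + timedelta(days=days)).strftime(fmt)


print(shift_date("03/15/2025", "en", 30))  # 04/14/2025
print(shift_date("15/03/2025", "fr", 30))  # 14/04/2025
print(shift_date("15.03.2025", "de", 30))  # 14.04.2025

# The same digits mean different days depending on the language:
print(datetime.strptime("03/04/2025", DATE_FORMATS["en"]).date())  # 2025-03-04
print(datetime.strptime("03/04/2025", DATE_FORMATS["fr"]).date())  # 2025-04-03
```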

In [None]:
# Date shifting with locale-aware formatting
date_texts = {
    "en": "Admission: 03/15/2025",
    "fr": "Admission : 15/03/2025",
    "de": "Aufnahme: 15.03.2025",
    "it": "Ricovero: 15/03/2025",
    "es": "Ingreso: 15/03/2025",
}

print("=" * 80)
print("DATE SHIFTING BY LANGUAGE")
print("=" * 80)

for lang, text in date_texts.items():
    result = deidentify(
        text,
        lang=lang,
        method="shift_dates",
        date_shift_days=30,
        confidence_threshold=0.3,
        use_smart_merging=True,
    )
    print(f"  {lang_names[lang]:8s}: {text:30s} -> {result.deidentified_text}")

---
## 12. Batch Processing Across Languages

Process a mixed-language batch by tagging each document with its language code and passing that code as `lang`.
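
If documents should be handled one language at a time, they can be grouped by language code first while remembering their original positions. The helper below is a hypothetical, standard-library sketch; whether grouping improves throughput depends on how models are loaded and cached, which is not covered here.

```python
from collections import defaultdict


def group_by_language(documents):
    """Return {lang: [(original_index, text), ...]} preserving input order."""
    grouped = defaultdict(list)
    for index, doc in enumerate(documents):
        grouped[doc["lang"]].append((index, doc["text"]))
    return dict(grouped)


docs = [
    {"lang": "fr", "text": "Patient : Marie Dupont"},
    {"lang": "es", "text": "Paciente: María García"},
    {"lang": "fr", "text": "Patient : Jean Dupont"},
]

for lang, items in group_by_language(docs).items():
    print(lang, [index for index, _ in items])
```

Keeping the original index next to each text makes it straightforward to put results back in input order after each language group has been processed.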

In [None]:
# Multilingual batch processing
documents = [
    {"lang": "en", "text": "Patient: John Smith, DOB: 03/15/1975, Email: john@email.com"},
    {"lang": "fr", "text": "Patient : Marie Dupont, née le 15/03/1975, Email : marie@email.fr"},
    {"lang": "de", "text": "Patient: Hans Mueller, Geburtsdatum: 15.03.1975, E-Mail: hans@email.de"},
    {"lang": "it", "text": "Paziente: Marco Rossi, data di nascita: 15/03/1975, Email: marco@email.it"},
    {"lang": "es", "text": "Paciente: María García, fecha de nacimiento: 15/03/1975, Email: maria@email.es"},
]

print("=" * 80)
print("MULTILINGUAL BATCH PROCESSING")
print("=" * 80)

for doc in documents:
    deid = deidentify(
        doc["text"],
        lang=doc["lang"],
        method="mask",
        confidence_threshold=0.5,
        use_smart_merging=True,
    )
    print(f"\n[{doc['lang']}] Original:      {doc['text']}")
    print(f"[{doc['lang']}] De-identified: {deid.deidentified_text}")

---
## 13. Custom Model Selection

While `lang` auto-selects the recommended default model (SuperClinical-Small-44M), you can choose any of the 35 architectures per language.

In [None]:
print("=" * 80)
print("AVAILABLE MODEL ARCHITECTURES PER LANGUAGE")
print("=" * 80)

# Show all architectures for Spanish as an example
es_models = get_pii_models_by_language("es")

print(f"\nSpanish models ({len(es_models)} total):")
for key, info in sorted(es_models.items(), key=lambda x: x[1].model_id):
    is_default = info.model_id == get_default_pii_model("es")
    marker = " <- DEFAULT" if is_default else ""
    print(f"  {info.size_category:8s} {info.model_id}{marker}")

print("\nTo use a specific model:")
print('  extract_pii(text, model_name="OpenMed/OpenMed-PII-Spanish-SuperClinical-Large-434M-v1")')
print("  # model_name overrides the lang default")

In [None]:
# Example: use a specific model while keeping lang for patterns
text = "Paciente: María García, DNI: 12345678Z, nacida el 15/03/1975"

# Using lang default (SuperClinical-Small-44M)
result_default = extract_pii(
    text,
    lang="es",
    confidence_threshold=0.5,
    use_smart_merging=True,
)

# Using a custom larger model
result_custom = extract_pii(
    text,
    lang="es",
    model_name="OpenMed/OpenMed-PII-Spanish-SuperClinical-Large-434M-v1",
    confidence_threshold=0.5,
    use_smart_merging=True,
)

print("Default model:")
for e in result_default.entities:
    print(f"  [{e.label:20s}] '{e.text}' ({e.confidence:.3f})")

print("\nLarger model:")
for e in result_custom.entities:
    print(f"  [{e.label:20s}] '{e.text}' ({e.confidence:.3f})")

---
## Summary

### Key Points

| Feature | How to Use |
|---------|------------|
| English PII | `extract_pii(text)` (default) |
| French PII | `extract_pii(text, lang="fr")` |
| German PII | `extract_pii(text, lang="de")` |
| Italian PII | `extract_pii(text, lang="it")` |
| Spanish PII | `extract_pii(text, lang="es")` |
| Custom model | `extract_pii(text, lang="es", model_name="...")` |
| De-identify | `deidentify(text, lang="es", method="mask")` |
| Browse models | `get_pii_models_by_language("es")` |
| Disable accent norm | `extract_pii(text, lang="es", normalize_accents=False)` |

### Architecture

- **176+ models** across 5 languages (35 architectures each)
- **Language-specific patterns** for dates, phones, national IDs, addresses
- **National ID validators** with checksum verification (NIR, Steuer-ID, Codice Fiscale, DNI, NIE)
- **Language-specific fake data** for the `replace` de-identification method
- **Locale-aware date handling** (day-first for European languages, "de" connector for Spanish)
- **Accent normalization** for models trained on accent-free text (auto-enabled for Spanish)
- **Backward compatible** — all existing English code works unchanged

### Resources

- [HuggingFace Collection — Multilingual](https://huggingface.co/collections/OpenMed/multilingual-pii-and-de-identification)
- [HuggingFace Collection — Spanish](https://huggingface.co/collections/OpenMed/spanish-pii-and-de-identification)
- [OpenMed GitHub](https://github.com/maziyarpanahi/openmed)
- [PII Detection Complete Guide](./PII_Detection_Complete_Guide.ipynb) (English-focused)

---

**Version:** OpenMed v0.5.6+

**Last Updated:** 2026-02-18