v0.5.0
Replaces Microsoft Presidio with a self-hosted open-source PII detector.
Changes:
- new in-repo detector (`detector/`, Python/FastAPI, `/analyze`): multilingual GLiNER NER plus a deterministic regex/checksum layer for email, phone, IBAN, credit card, and IP (#100)
- adds `VAT_CODE` (EU VAT, checksum-validated); matches EU member-state prefixes only, case-insensitive and overlap-safe so a label or word can't hide a valid number
- `/api/mask` now detects secrets before PII, so a connection string is no longer partly masked as an email
- language-agnostic detection: no per-language images and no spaCy models to load
- single all-in-one GHCR image (proxy + detector); the per-language `:en`/`:eu` tags are gone
- publishes GHCR images for `latest` and `0.5.0` (linux/amd64, linux/arm64)
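
To illustrate two of the changes above (checksum validation in the deterministic layer, and detecting secrets before PII), here is a minimal sketch. It is not the detector's actual code: the regexes, the labels `SECRET`, `EMAIL`, and `CREDIT_CARD`, and the helper names `luhn_valid`, `find_spans`, and `mask` are all assumptions made for this example.

```python
import re

# Hypothetical patterns for illustration only; the real detector's patterns
# are not shown in these release notes.
SECRET_RE = re.compile(r"\b[a-z][a-z0-9+.-]*://[^\s:/@]+:[^\s@]+@[^\s]+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 13:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_spans(text: str) -> list[tuple[int, int, str]]:
    """Collect (start, end, label) spans, secrets first, never overlapping."""
    spans: list[tuple[int, int, str]] = []

    def overlaps(start: int, end: int) -> bool:
        return any(start < e and s < end for s, e, _ in spans)

    # Secrets are claimed first so later PII matches cannot split them.
    for m in SECRET_RE.finditer(text):
        spans.append((m.start(), m.end(), "SECRET"))
    for m in EMAIL_RE.finditer(text):
        if not overlaps(m.start(), m.end()):
            spans.append((m.start(), m.end(), "EMAIL"))
    # Card candidates count only when the checksum validates.
    for m in CARD_RE.finditer(text):
        if luhn_valid(m.group()) and not overlaps(m.start(), m.end()):
            spans.append((m.start(), m.end(), "CREDIT_CARD"))
    return sorted(spans)


def mask(text: str) -> str:
    """Replace each detected span with a <LABEL> placeholder."""
    # Replace right to left so earlier offsets stay valid.
    for start, end, label in sorted(find_spans(text), reverse=True):
        text = text[:start] + f"<{label}>" + text[end:]
    return text
```

With this sketch, `mask("url postgres://admin:s3cret@db.example.com:5432/app mail bob@example.com")` masks the connection string as a single secret instead of treating `s3cret@db.example.com` as an email, and a card-like number that fails the Luhn check is left untouched.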
Breaking:
- config key `presidio_url` is renamed to `detector_url`; the `languages` list is now only a hint (detection is language-agnostic)
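
For anyone migrating an existing config programmatically, the rename could be handled roughly as follows. This is a sketch that assumes the config has already been loaded into a plain dict; the config file format is not stated in these notes, and the helper name `migrate_config` and the example URL are hypothetical.

```python
def migrate_config(old: dict) -> dict:
    """Return a copy of `old` with `presidio_url` renamed to `detector_url`."""
    new = dict(old)
    if "presidio_url" in new:
        legacy = new.pop("presidio_url")
        # Keep an explicit detector_url if one is already set.
        new.setdefault("detector_url", legacy)
    return new


# Example with a placeholder URL (assumption, not a documented default).
migrated = migrate_config({"presidio_url": "http://localhost:5002", "languages": ["en"]})
```

The `languages` entry is carried over unchanged, since it is still accepted as a hint.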
Validation:
- CI passed on main (proxy: bun test / typecheck / biome; detector: pytest / ruff / pyright)
- release workflow builds and pushes the multi-arch all-in-one image