v0.5.0

@sgasser sgasser released this 22 Jun 10:44
· 20 commits to main since this release
08ddb1d

Replaces Microsoft Presidio with a self-hosted open-source PII detector.

Changes:

  • new in-repo detector (detector/, Python/FastAPI, endpoint /analyze): multilingual GLiNER NER plus a deterministic regex/checksum layer for email, phone, IBAN, credit card, and IP address (#100)
  • adds VAT_CODE (EU VAT numbers, checksum-validated); matching is limited to EU member-state prefixes, is case-insensitive, and is overlap-safe, so a label or word can't hide a valid number
  • /api/mask now detects secrets before PII, so a connection string is no longer partly masked as an email
  • language-agnostic detection: no per-language images and no spaCy models to load
  • single all-in-one GHCR image (proxy + detector); the per-language :en/:eu tags are gone
  • publishes GHCR images for latest and 0.5.0 (linux/amd64, linux/arm64)
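
Two of the changes above can be illustrated with a small sketch: a regex candidate that is only accepted once a checksum confirms it, and secrets being masked before PII. This is not the code from detector/ or the proxy; the patterns, the placeholder strings, and the function names (luhn_ok, mask_secrets, mask_pii, mask) are all assumptions made for illustration.

```python
import re

# Hypothetical patterns and placeholders, chosen for illustration only.
CONN_RE = re.compile(r"\b[a-z][a-z0-9+.-]*://[^\s:/@]+:[^\s@]+@\S+")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")  # 13 to 19 digits


def luhn_ok(candidate: str) -> bool:
    """Return True if the digits in candidate pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def mask_secrets(text: str) -> str:
    """Replace URL-style connection strings that carry credentials."""
    return CONN_RE.sub("[SECRET]", text)


def mask_pii(text: str) -> str:
    """Replace emails, and card-like numbers only if the checksum passes."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return CARD_RE.sub(
        lambda m: "[CREDIT_CARD]" if luhn_ok(m.group()) else m.group(),
        text,
    )


def mask(text: str, secrets_first: bool = True) -> str:
    """Apply both passes; the order decides what the email regex can see."""
    if secrets_first:
        return mask_pii(mask_secrets(text))
    return mask_secrets(mask_pii(text))
```

With a made-up connection string such as postgres://admin:s3cret@db.example.com:5432/app, running the PII pass first lets the email pattern claim s3cret@db.example.com and leaves the rest of the string exposed, while running the secrets pass first replaces the whole string. The checksum step plays a similar role for numbers: a 16-digit sequence that fails Luhn is left untouched rather than reported as a card.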

Breaking:

  • config key presidio_url is renamed to detector_url; the languages list is now only a hint (detection is language-agnostic)
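
For anyone migrating an existing config, the rename can be sketched as a small helper. It assumes the config has already been loaded into plain Python dicts and lists (for example by a YAML or JSON parser); the helper name rename_key is hypothetical, and because these notes do not say where the key sits, the sketch renames it at any nesting depth.

```python
from typing import Any


def rename_key(node: Any, old: str = "presidio_url", new: str = "detector_url") -> Any:
    """Return a copy of node with every mapping key `old` renamed to `new`."""
    if isinstance(node, dict):
        return {
            (new if key == old else key): rename_key(value, old, new)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [rename_key(item, old, new) for item in node]
    # Scalars (strings, numbers, booleans, None) are returned unchanged.
    return node
```

The input is not modified; a new structure is returned. If a mapping already contains both the old and the new key, the two collide after renaming, so such a config is better resolved by hand.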

Validation:

  • CI passed on main (proxy: bun test / typecheck / biome; detector: pytest / ruff / pyright)
  • release workflow builds and pushes the multi-arch all-in-one image