Automatically detects and anonymises PII in any PostgreSQL database. Auto-discovers schemas, detects PII columns by name patterns, and replaces real data with realistic fake data while maintaining referential integrity.
- Auto-discovery — scans all schemas automatically, no config needed
- Smart PII detection — matches 60+ column name patterns (names, addresses, phones, emails, banking, tax IDs, passwords, and more)
- Referential integrity — same original value always maps to the same fake value across all tables
- Safe by default — `--dry-run` to preview, option to duplicate the database before anonymising
- Before/after comparison — `--sample` flag shows what changed
- False positive protection — skips common non-PII columns like `table_name`, `hostname`, `filename`, etc.
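
One way to get the referential-integrity property listed above is to derive each fake value deterministically from the original. The sketch below is an illustration under assumptions, not pgGhost's actual code: `pseudonymise`, `SECRET`, and the small name pool are hypothetical, and a real run would presumably draw values from Faker rather than a hard-coded list.

```python
import hashlib
import hmac

# Hypothetical secret key (an assumption): keyed hashing means the mapping
# cannot be rebuilt just by hashing guessed inputs.
SECRET = b"change-me"

# Tiny stand-in pool; a real tool would generate values with Faker.
FAKE_FIRST_NAMES = ["Alex", "Blair", "Casey", "Devon", "Ellis", "Frankie"]


def pseudonymise(original: str, pool: list[str] = FAKE_FIRST_NAMES) -> str:
    """Map an original value to a fake one, identically on every call."""
    digest = hmac.new(SECRET, original.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(pool)
    return pool[index]


# The same input yields the same output wherever it appears, so a value
# shared by two tables still joins after anonymisation.
customers_name = pseudonymise("Margaret")
orders_name = pseudonymise("Margaret")
print(customers_name == orders_name)  # True
```

With a pool this small, different originals will often collide on the same fake value; a larger generated space reduces that.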
Install the dependencies:

    pip install psycopg2-binary faker

Typical invocations:

    # Preview what would be anonymised
    python pgghost.py --dry-run --verbose

    # Live run with before/after sample
    python pgghost.py --verbose --sample

    # Just run it
    python pgghost.py

| Flag | Description |
|---|---|
| `--dry-run` | Preview changes without modifying data |
| `--verbose` | Detailed progress output |
| `--sample` | Snapshot rows and show before/after comparison |
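
For readers who want to see how three boolean flags like these are typically wired up, here is a minimal `argparse` sketch. It is an assumption about the shape of the interface, not a copy of pgGhost's parser; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the three flags from the table above."""
    parser = argparse.ArgumentParser(prog="pgghost.py")
    parser.add_argument("--dry-run", action="store_true",
                        help="Preview changes without modifying data")
    parser.add_argument("--verbose", action="store_true",
                        help="Detailed progress output")
    parser.add_argument("--sample", action="store_true",
                        help="Snapshot rows and show before/after comparison")
    return parser


# argparse converts --dry-run to the attribute name dry_run.
args = build_parser().parse_args(["--dry-run", "--verbose"])
print(args.dry_run, args.verbose, args.sample)  # True True False
```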
On startup, pgGhost will:
- Ask for host, port, username, password
- List all available databases and let you pick one
- Offer to duplicate the database before anonymising (safe mode)
- Auto-discover all schemas and scan for PII
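
The schema auto-discovery step can be approximated with a query against PostgreSQL's standard `information_schema` views. This is a sketch assuming that approach; pgGhost may discover schemas differently, and `DISCOVERY_SQL` and `discover_columns` are hypothetical names.

```python
from collections import defaultdict

# Lists every column of every ordinary table outside the system schemas.
# Views are excluded because they cannot generally be updated in place.
DISCOVERY_SQL = """
    SELECT c.table_schema, c.table_name, c.column_name
    FROM information_schema.columns AS c
    JOIN information_schema.tables AS t
      ON t.table_schema = c.table_schema
     AND t.table_name = c.table_name
    WHERE t.table_type = 'BASE TABLE'
      AND c.table_schema NOT IN ('pg_catalog', 'information_schema')
    ORDER BY c.table_schema, c.table_name, c.ordinal_position
"""


def discover_columns(cursor):
    """Return {(schema, table): [column, ...]} using any DB-API cursor."""
    cursor.execute(DISCOVERY_SQL)
    columns = defaultdict(list)
    for schema, table, column in cursor.fetchall():
        columns[(schema, table)].append(column)
    return dict(columns)
```

With psycopg2, this would be called as `discover_columns(conn.cursor())` on an open connection; the resulting column names are what a name-based PII scan would then inspect.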
| Category | Example columns |
|---|---|
| Names | first_name, surname, company_name, contact_name |
| Addresses | address1, street, suburb, city, postcode, zip_code |
| Contact | phone, mobile, fax, email, email_address |
| Banking | bank_account, bsb, sort_code, iban, swift |
| Identity | abn, acn, ssn, tfn, passport, drivers_licence |
| Auth | password, api_key, token, username |
| Network | ip_address, url, website |
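
Name-based detection with false-positive protection can be sketched as a set of regular expressions checked after an exclusion list. The patterns below are a hypothetical subset chosen for illustration; they are not pgGhost's real 60+ patterns, and `classify_column` is an assumed name.

```python
import re

# Hypothetical subset of patterns. Order matters: "username" must hit
# Auth before the broad Names pattern sees it.
PII_PATTERNS = {
    "Auth": re.compile(r"password|api_key|token|username"),
    "Contact": re.compile(r"phone|mobile|fax|email"),
    "Banking": re.compile(r"bank_account|bsb|sort_code|iban|swift"),
    "Names": re.compile(r"name$"),
}

# Columns whose names look like PII but are not (from the feature list).
NON_PII_COLUMNS = {"table_name", "hostname", "filename"}


def classify_column(column):
    """Return the PII category for a column name, or None if not PII."""
    name = column.lower()
    if name in NON_PII_COLUMNS:
        return None  # exclusion list wins over any pattern match
    for category, pattern in PII_PATTERNS.items():
        if pattern.search(name):
            return category
    return None


for col in ["email_address", "Surname", "filename", "created_at"]:
    print(col, "->", classify_column(col))
# email_address -> Contact
# Surname -> Names
# filename -> None
# created_at -> None
```

Without the exclusion set, `filename` would match the broad `name$` pattern and be overwritten with a fake person name, which is the kind of false positive the skip list guards against.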
Without the duplicate option, pgGhost overwrites data in place. Run with `--dry-run` first, and consider duplicating the database when prompted.
Licence: MIT