ompster/PGghost

pgGhost 👻

Automatically detects and anonymises PII in any PostgreSQL database. Auto-discovers schemas, detects PII columns by name patterns, and replaces real data with realistic fake data while maintaining referential integrity.

Features

  • Auto-discovery: scans all schemas automatically, no config needed
  • Smart PII detection: matches 60+ column name patterns (names, addresses, phones, emails, banking, tax IDs, passwords, and more)
  • Referential integrity: the same original value always maps to the same fake value across all tables
  • Safe by default: use --dry-run to preview, and optionally duplicate the database before anonymising
  • Before/after comparison: the --sample flag shows what changed
  • False positive protection: skips common non-PII columns like table_name, hostname, filename, etc.
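
The referential integrity feature depends on a deterministic mapping from original to fake values. The following is a minimal sketch of one way to get that property, not pgGhost's actual implementation: it keys an HMAC on the original value and uses the digest to pick a replacement. The names `FIRST_NAMES`, `DOMAINS`, `fake_first_name`, and `fake_email` are hypothetical, and the tiny value pools stand in for a real generator such as Faker.

```python
import hashlib
import hmac

# Hypothetical pools of replacement values (assumption for illustration);
# a real run would draw from a much larger generator such as Faker.
FIRST_NAMES = ["Alex", "Sam", "Jordan", "Taylor", "Morgan", "Casey"]
DOMAINS = ["example.com", "example.org", "example.net"]


def _digest(secret: bytes, kind: str, original: str) -> int:
    """Derive a stable integer from the original value using HMAC-SHA256."""
    message = f"{kind}:{original}".encode("utf-8")
    mac = hmac.new(secret, message, hashlib.sha256)
    return int.from_bytes(mac.digest()[:8], "big")


def fake_first_name(secret: bytes, original: str) -> str:
    """Return the same fake first name every time for the same input."""
    index = _digest(secret, "first_name", original) % len(FIRST_NAMES)
    return FIRST_NAMES[index]


def fake_email(secret: bytes, original: str) -> str:
    """Return the same fake email address every time for the same input."""
    n = _digest(secret, "email", original)
    return f"user{n % 1_000_000:06d}@{DOMAINS[n % len(DOMAINS)]}"
```

Because the output depends only on the secret and the original value, a customer email stored in two different tables is replaced by the same fake address in both, so joins on that column still line up. With pools this small, distinct originals can collide on the same fake value, so a column with a unique constraint would need a larger output space.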

Requirements

pip install psycopg2-binary faker

Usage

# Preview what would be anonymised
python pgghost.py --dry-run --verbose

# Live run with before/after sample
python pgghost.py --verbose --sample

# Just run it
python pgghost.py

Flags

Flag       Description
--dry-run  Preview changes without modifying data
--verbose  Detailed progress output
--sample   Snapshot rows and show before/after comparison
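
For readers who want to see how three boolean flags like these are commonly wired up, here is a small sketch using the standard library's `argparse`. It is an assumption about structure, not a copy of pgghost.py; the function name `build_parser` is hypothetical, and only the flag names and descriptions come from the table above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the three documented flags as booleans."""
    parser = argparse.ArgumentParser(
        prog="pgghost.py",
        description="Detect and anonymise PII in a PostgreSQL database.",
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Preview changes without modifying data",
    )
    parser.add_argument(
        "--verbose",
        action="store_true",
        help="Detailed progress output",
    )
    parser.add_argument(
        "--sample",
        action="store_true",
        help="Snapshot rows and show before/after comparison",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # argparse exposes --dry-run as the attribute args.dry_run.
    print(args.dry_run, args.verbose, args.sample)
```

All three flags default to off, which matches the plain `python pgghost.py` invocation performing a live run with no extra output.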

Interactive Prompts

On startup, pgGhost will:

  1. Ask for host, port, username, password
  2. List all available databases and let you pick one
  3. Offer to duplicate the database before anonymising (safe mode)
  4. Auto-discover all schemas and scan for PII
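
The steps above can be sketched with standard PostgreSQL catalog queries. This is an illustration under stated assumptions rather than pgGhost's own code: the helper names (`list_databases`, `list_columns`, `quote_ident`, `duplicate_database_sql`) are hypothetical, and `cursor` is any DB-API cursor, for example one obtained from a psycopg2 connection.

```python
from typing import Any, List, Tuple

# Non-template databases, as a picker for step 2 might list them.
LIST_DATABASES_SQL = (
    "SELECT datname FROM pg_database WHERE NOT datistemplate ORDER BY datname"
)

# Every column outside the system schemas, for step 4 (includes view columns).
LIST_COLUMNS_SQL = """
SELECT table_schema, table_name, column_name
FROM information_schema.columns
WHERE table_schema NOT IN ('pg_catalog', 'information_schema')
ORDER BY table_schema, table_name, ordinal_position
"""


def list_databases(cursor: Any) -> List[str]:
    """Return the names of all non-template databases on the server."""
    cursor.execute(LIST_DATABASES_SQL)
    return [row[0] for row in cursor.fetchall()]


def list_columns(cursor: Any) -> List[Tuple[str, str, str]]:
    """Return (schema, table, column) for every user-visible column."""
    cursor.execute(LIST_COLUMNS_SQL)
    return [tuple(row) for row in cursor.fetchall()]


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def duplicate_database_sql(source: str, target: str) -> str:
    """Build the statement that copies `source` into a new database `target`."""
    return f"CREATE DATABASE {quote_ident(target)} TEMPLATE {quote_ident(source)}"
```

Two PostgreSQL constraints apply to the duplication step: `CREATE DATABASE` cannot run inside a transaction block, so with psycopg2 the connection needs `autocommit` enabled, and copying from a template fails while other sessions are connected to the source database.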

What it detects

Category   Example columns
Names      first_name, surname, company_name, contact_name
Addresses  address1, street, suburb, city, postcode, zip_code
Contact    phone, mobile, fax, email, email_address
Banking    bank_account, bsb, sort_code, iban, swift
Identity   abn, acn, ssn, tfn, passport, drivers_licence
Auth       password, api_key, token, username
Network    ip_address, url, website
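
As a rough sketch of how name-based detection with a skip list could work, the snippet below classifies a column by matching underscore-delimited keywords. It is not pgGhost's real pattern set: the keywords are taken from the example columns above, the generic `name` keyword is an added assumption so that the skip list has something to guard against, and `classify_column`, `CATEGORY_KEYWORDS`, and `NON_PII_COLUMNS` are hypothetical names.

```python
import re
from typing import Optional

# Columns that look like PII by name but are not (examples from this README).
NON_PII_COLUMNS = {"table_name", "hostname", "filename"}

# Ordered so that email_address and ip_address are claimed by Contact and
# Network before the broader Addresses keywords are tried.
# The bare "name" keyword is an assumption added for illustration.
CATEGORY_KEYWORDS = [
    ("Contact", ["phone", "mobile", "fax", "email"]),
    ("Network", ["ip_address", "url", "website"]),
    ("Names", ["first_name", "surname", "company_name", "contact_name", "name"]),
    ("Addresses", [r"address\d*", "street", "suburb", "city", "postcode", "zip_code"]),
    ("Banking", ["bank_account", "bsb", "sort_code", "iban", "swift"]),
    ("Identity", ["abn", "acn", "ssn", "tfn", "passport", "drivers_licence"]),
    ("Auth", ["password", "api_key", "token", "username"]),
]

# A keyword matches only as a whole underscore-delimited part of the name,
# so "url" matches "profile_url" but not "curl_options".
COMPILED = [
    (category, re.compile(r"(?:^|_)(?:" + "|".join(words) + r")(?:_|$)"))
    for category, words in CATEGORY_KEYWORDS
]


def classify_column(column: str) -> Optional[str]:
    """Return the PII category for a column name, or None if it looks safe."""
    name = column.lower()
    if name in NON_PII_COLUMNS:
        return None
    for category, pattern in COMPILED:
        if pattern.search(name):
            return category
    return None
```

Here `classify_column("table_name")` returns `None` because of the skip list even though it ends in `_name`, while `classify_column("billing_postcode")` is classified as an address column. Name matching cannot see the data itself, so PII stored in a generically named column such as `notes` would go undetected by an approach like this.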

⚠️ Warning

Unless you choose to duplicate the database when prompted, pgGhost overwrites data in place. Run with --dry-run first, and consider accepting the duplicate option.

License

MIT
