This tool creates anonymized copies of PostgreSQL databases while preserving data integrity and referential consistency. It is designed to help organizations comply with data protection regulations while still providing realistic test data.
- Flexible rule-based anonymization configuration using YAML
- Support for multiple masking functions and strategies
- Preserves database structure and relationships
- Docker-based implementation for portability
- Automated cleanup and resource management
- Configurable output formats and locations
- Built-in validation and error handling
- Docker and Docker Compose
- PostgreSQL client tools (psql, pg_dump)
- Bash shell environment
- Node.js and npm (for testing/development)
- lsof command-line utility
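A quick way to confirm the prerequisites are installed is to probe for each command on the PATH. The snippet below is a minimal sketch, not the tool's own dependency check: the function names `missing_deps` and `check_prerequisites` are hypothetical, and the command list simply mirrors the requirements above (Docker Compose and the Node.js tooling are left out and can be added as needed).

```shell
# Print the name of every command in the argument list that is not on PATH.
# missing_deps is a hypothetical helper, not part of anonymizer.sh.
missing_deps() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Report missing runtime prerequisites and return non-zero if any are absent.
check_prerequisites() {
  missing=$(missing_deps docker psql pg_dump lsof)
  if [ -n "$missing" ]; then
    # Unquoted on purpose so the names are joined onto one line.
    echo "Missing prerequisites:" $missing >&2
    return 1
  fi
}
```

Calling `check_prerequisites || exit 1` before running the anonymizer would stop early with a list of what still needs to be installed.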
- Clone the repository:
git clone https://github.com/yourusername/db-anonymizer.git
cd db-anonymizer
- Install Node.js dependencies (for development/testing):
npm install
- Make the scripts executable:
chmod +x *.sh
chmod +x lib/*.sh
.
├── anonymizer.sh # Main script
├── lib/
│ ├── rules_processor.sh # Rules processing logic
│ ├── sql_generator.sh # SQL generation utilities
│ └── utils.sh # Common utility functions
├── config/
│ ├── database/ # Database connection configs
│ └── rules/ # Anonymization rules
├── dumps/ # Output directory for anonymized dumps
├── docker-compose.yml # Development environment setup
└── examples/ # Example implementations and test data
Create a YAML file in config/database/ with your database connection details:
host: your-database-host
port: 5432
database: your-database-name
user: your-username
password: your-password
schemas:
- public
rules:
- rule-set-name
Create YAML files in config/rules/ to define anonymization rules:
table: users
columns:
- email
- first_name
- last_name
- password_hash
mask_functions:
  email: anon.random_email()
  first_name: anon.fake_first_name()
  last_name: anon.fake_last_name()
  password_hash: anon.hash(password_hash)
- Run the anonymizer with a specific database configuration:
./anonymizer.sh -d config/database/your-config.yml
- The anonymized dump will be created in the dumps/ directory with a timestamp:
dumps/20250102_123456_your-database_dump.sql
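The naming pattern in the example above (timestamp, database name, then a `_dump.sql` suffix) can be reproduced with `date`. This is a sketch inferred from that one example file name, not a copy of the tool's code, and `dump_path` is a hypothetical helper.

```shell
# Build a dump path such as dumps/20250102_123456_your-database_dump.sql.
# dump_path is a hypothetical helper; the pattern is inferred from the example above.
# Usage: dump_path DB_NAME [TIMESTAMP]
dump_path() {
  db_name="$1"
  # Default to the current local time as YYYYMMDD_HHMMSS when no timestamp is given.
  timestamp="${2:-$(date +%Y%m%d_%H%M%S)}"
  printf 'dumps/%s_%s_dump.sql\n' "$timestamp" "$db_name"
}

dump_path your-database
```

Because the timestamp comes first, a plain `ls dumps/` lists the dumps in chronological order.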
- anon.random_email(): Generates random email addresses
- anon.fake_first_name(): Generates random first names
- anon.fake_last_name(): Generates random last names
- anon.hash(): Creates consistent hashes
- anon.random_string(): Generates random strings
- anon.random_date(): Generates random dates
- anon.random_int(): Generates random integers
- anon.mask_credit_card(): Masks credit card numbers
- anon.mask_phone(): Masks phone numbers
- anon.mask_address(): Masks addresses
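To illustrate how a mask function might be attached to a column, the sketch below turns a table, column, and function into the `SECURITY LABEL` statement that the PostgreSQL Anonymizer extension uses to declare masking rules. Whether lib/sql_generator.sh emits exactly this SQL is an assumption, and `mask_rule_sql` is a hypothetical helper name.

```shell
# Emit one masking rule in PostgreSQL Anonymizer's SECURITY LABEL syntax.
# mask_rule_sql is a hypothetical helper; it does not escape single quotes,
# so mask functions containing string literals would need extra handling.
# Usage: mask_rule_sql TABLE COLUMN MASK_FUNCTION
mask_rule_sql() {
  printf "SECURITY LABEL FOR anon ON COLUMN %s.%s IS 'MASKED WITH FUNCTION %s';\n" \
    "$1" "$2" "$3"
}

# Rules taken from the users example above.
mask_rule_sql users email 'anon.random_email()'
mask_rule_sql users password_hash 'anon.hash(password_hash)'
```

The generated statements could then be piped into `psql` against the temporary copy of the database, never against the original.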
A development environment with PostgreSQL and pgAdmin is provided:
- Start the environment:
docker-compose up -d
- Access pgAdmin:
  - URL: http://localhost:8080
  - Email: admin@example.com
  - Password: admin
- Generate test data using the example implementation:
cd examples
npm install
node mock_db.js
- All passwords and sensitive data are handled securely
- Temporary files are cleaned up automatically
- Docker containers are isolated and removed after use
- Original database connection details are never exposed
- Masked data maintains referential integrity
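One common way to keep a database password out of command lines and process listings is a libpq password file referenced through `PGPASSFILE`. The sketch below shows that general technique only; it is an assumption that the tool works this way, and `write_pgpass` is a hypothetical helper.

```shell
# Write a libpq password file readable only by its owner.
# write_pgpass is a hypothetical helper. The field order is libpq's
# host:port:database:user:password; colons or backslashes inside a value
# would need backslash escaping, which this sketch does not do.
# Usage: write_pgpass FILE HOST PORT DATABASE USER PASSWORD
write_pgpass() {
  pgpass_file="$1"
  # The subshell keeps the restrictive umask from leaking into the caller.
  ( umask 077 && printf '%s:%s:%s:%s:%s\n' "$2" "$3" "$4" "$5" "$6" > "$pgpass_file" )
}
```

A client call such as `PGPASSFILE=/path/to/file pg_dump -h your-database-host -U your-username your-database-name` would then pick up the password without it appearing in the argument list.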
The tool includes comprehensive error handling:
- Validates all configuration files
- Checks for required dependencies
- Verifies database connections
- Ensures proper cleanup on failure
- Provides detailed error messages
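Cleanup on failure is typically done in shell with an `EXIT` trap. The following is a minimal sketch of that pattern under stated assumptions, not the tool's actual implementation: `run_with_scratch` is a hypothetical name, and it removes a scratch directory whether the wrapped command succeeds or fails.

```shell
# Run a command with a scratch directory that is always removed afterwards.
# The directory path is appended as the command's last argument.
# run_with_scratch is a hypothetical helper; its body runs in a subshell
# so the trap and variables do not affect the calling script.
run_with_scratch() (
  scratch=$(mktemp -d) || exit 1
  # Fires when the subshell exits, on success and on failure alike.
  trap 'rm -rf "$scratch"' EXIT
  "$@" "$scratch"
  # Pass the command's exit status back to the caller.
  exit $?
)

# Example: the directory exists while the command runs and is gone afterwards.
run_with_scratch ls -d
```

The same trap could also stop temporary Docker containers, so a failed run does not leave resources behind.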
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- PostgreSQL Anonymizer project
- Faker.js for test data generation
- Contributors and maintainers
For support, please open an issue on the GitHub repository or contact the maintainers.
Made with 🖤 by kur0