A comprehensive, production-ready MySQL to PostgreSQL migration operator that handles schema translation, initial snapshot migration, continuous CDC replication, and data verification with row-level hashing.
- Schema Discovery & Translation: Automatically discovers MySQL schema and translates to PostgreSQL DDL
- Initial Snapshot: Fast, parallel, consistent snapshot migration with chunking
- Continuous CDC: Real-time replication via MySQL binlog with ROW format
- Row Hashing: Per-row and per-chunk hashing for fast drift detection
- Incremental Repair: Precise repair of only mismatched data without full rescans
- DDL Handling: Safe DDL change handling with pause/approve/resume mechanics
- Resumable: Handles failures gracefully with exact resume points
- Observable: Comprehensive logging and status reporting
The operator consists of several key components:
- Operator: Main orchestration and CLI interface
- MySQL Connection: Schema discovery and binlog reading
- PostgreSQL Connection: Schema application and data writing
- Hash Manager: Row hashing and chunk verification
- Snapshot Manager: Initial bulk migration
- CDC Manager: Continuous change capture and apply
- Verifier: Data consistency verification and repair
Create a config.yaml file:
mysql:
  dsn: "user:password@tcp(localhost:3306)/source_db?charset=utf8mb4&parseTime=true&loc=UTC"
  binlog_format: "ROW"
  binlog_row_image: "FULL"
postgresql:
  dsn: "postgres://user:password@localhost:5432/target_db?sslmode=disable"
performance:
  snapshot_concurrency: 4
  chunk_size: 10000
  batch_size: 1000
web:
  port: 8080
  host: "0.0.0.0"
  static_path: "./web/static"
make build
# Start the web interface
./bin/mysql2pg web
# Open your browser to http://localhost:8080
# Translate schema
./bin/mysql2pg translate
# Apply schema to PostgreSQL
./bin/mysql2pg apply-schema
# Perform initial snapshot
./bin/mysql2pg snapshot
# Start continuous CDC
./bin/mysql2pg start-cdc
# Verify data consistency
./bin/mysql2pg verify
# Check status
./bin/mysql2pg status
- web: Start the web interface for visual management
- translate: Translate MySQL schema to PostgreSQL DDL
- apply-schema: Apply translated schema to PostgreSQL
- snapshot: Perform initial snapshot migration
- start-cdc: Start continuous CDC replication
- verify: Verify data consistency
- repair: Repair data inconsistencies
- ddl-approve: Approve pending DDL changes
- cutover: Perform cutover to PostgreSQL
- status: Show operator status
- dsn: MySQL connection string
- binlog_format: Must be "ROW" for CDC
- binlog_row_image: Must be "FULL" for CDC
- server_id: Unique server ID for replication

- dsn: PostgreSQL connection string

- snapshot_concurrency: Number of parallel table snapshots
- chunk_size: Rows per chunk during snapshot
- batch_size: Rows per batch during CDC apply
- apply_concurrency: Number of parallel table appliers

- stop_on_ddl: Pause on DDL changes
- stop_on_mismatch: Pause on data mismatches
- drift_tolerance: Number of mismatched chunks before pausing

- chunk_buckets: Number of hash buckets per table (default: 4096)
- hash_schema: Schema name for hash tables
- rollup_refresh: How often to refresh chunk rollups
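The constraints in the options above (ROW binlog format, FULL row image) lend themselves to a check that runs before any migration work starts. The following is a minimal sketch, not the operator's actual code: the `Config` struct, its field names, and the rule that concurrency and sizes must be positive are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a hypothetical struct mirroring a subset of the documented
// config.yaml keys; the operator's real types may differ.
type Config struct {
	MySQLDSN            string
	BinlogFormat        string
	BinlogRowImage      string
	PostgresDSN         string
	SnapshotConcurrency int
	ChunkSize           int
	BatchSize           int
}

// Validate reports every violated constraint at once, so the whole file can
// be fixed in one pass. It returns nil when nothing is wrong.
func (c Config) Validate() error {
	var errs []error
	if c.MySQLDSN == "" {
		errs = append(errs, errors.New("mysql.dsn is required"))
	}
	if c.PostgresDSN == "" {
		errs = append(errs, errors.New("postgresql.dsn is required"))
	}
	if c.BinlogFormat != "ROW" {
		errs = append(errs, fmt.Errorf("mysql.binlog_format must be \"ROW\", got %q", c.BinlogFormat))
	}
	if c.BinlogRowImage != "FULL" {
		errs = append(errs, fmt.Errorf("mysql.binlog_row_image must be \"FULL\", got %q", c.BinlogRowImage))
	}
	// Assumption: zero or negative sizes are treated as configuration errors.
	if c.SnapshotConcurrency < 1 {
		errs = append(errs, errors.New("performance.snapshot_concurrency must be at least 1"))
	}
	if c.ChunkSize < 1 {
		errs = append(errs, errors.New("performance.chunk_size must be at least 1"))
	}
	if c.BatchSize < 1 {
		errs = append(errs, errors.New("performance.batch_size must be at least 1"))
	}
	// errors.Join returns nil when errs is empty.
	return errors.Join(errs...)
}

func main() {
	cfg := Config{
		MySQLDSN:            "user:password@tcp(localhost:3306)/source_db",
		BinlogFormat:        "MIXED", // deliberately wrong
		BinlogRowImage:      "FULL",
		PostgresDSN:         "postgres://user:password@localhost:5432/target_db",
		SnapshotConcurrency: 4,
		ChunkSize:           10000,
		BatchSize:           1000,
	}
	if err := cfg.Validate(); err != nil {
		fmt.Println("invalid config:")
		fmt.Println(err)
	}
}
```

Collecting all errors with `errors.Join` (available since Go 1.20) avoids the fix-one-rerun-fix-another loop that a fail-fast check would cause.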
The operator includes a modern web interface for easy management and monitoring:
- Real-time Dashboard: Live status updates and replication monitoring
- Visual Controls: Click-to-execute operations for all migration tasks
- Live Logs: Real-time log streaming with WebSocket updates
- Status Monitoring: Visual representation of table and replication status
- Responsive Design: Works on desktop and mobile devices
# Start the web interface
./bin/mysql2pg web
# Access at http://localhost:8080
The web interface includes a comprehensive configuration panel where you can:
- Set Database Connections:
  - MySQL DSN: user:password@tcp(host:port)/database?charset=utf8mb4&parseTime=true&loc=UTC
  - PostgreSQL DSN: postgres://user:password@host:port/database?sslmode=disable
- Test Connections: Verify database connectivity before saving configuration
- Performance Tuning: Adjust concurrency, chunk sizes, and batch sizes
- Safety Settings: Configure DDL handling and drift tolerance
- Save Configuration: Store settings for future use
- Configuration Management: Set database connections, performance, and safety settings
- Connection Testing: Test MySQL and PostgreSQL connections before saving
- Schema Translation: Translate and apply schema with one click
- Snapshot Migration: Control initial bulk migration
- CDC Management: Start/stop continuous replication
- Data Operations: Verify consistency and repair issues
- Workflow Management: DDL approval and cutover processes
- Real-time Monitoring: Live status updates and logging
- Schema Discovery: Operator discovers MySQL schema and translates to PostgreSQL
- Initial Snapshot: Parallel table migration with chunking and row hashing
- CDC Capture: Continuous binlog reading from snapshot GTID point
- CDC Apply: Transactional application of changes to PostgreSQL
- Verification: Continuous chunk-level verification using hashes
- Repair: Incremental repair of only mismatched data
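The verification and repair steps above come down to comparing per-chunk hashes from both databases and acting only on the chunks that differ. A minimal sketch of that comparison follows. It assumes chunk hashes have already been loaded into maps keyed by chunk ID; the function name and data shapes are hypothetical, not the operator's API.

```go
package main

import (
	"fmt"
	"sort"
)

// MismatchedChunks returns the sorted IDs of chunks whose hashes differ
// between source and target, or that exist on only one side.
func MismatchedChunks(source, target map[int]string) []int {
	var out []int
	for id, srcHash := range source {
		if tgtHash, ok := target[id]; !ok || tgtHash != srcHash {
			out = append(out, id)
		}
	}
	// Chunks present only on the target also count as drift.
	for id := range target {
		if _, ok := source[id]; !ok {
			out = append(out, id)
		}
	}
	sort.Ints(out)
	return out
}

func main() {
	source := map[int]string{0: "a1", 1: "b2", 2: "c3"}
	target := map[int]string{0: "a1", 1: "ff", 3: "d4"}

	// Only these chunks would need a row-level rescan and repair.
	fmt.Println(MismatchedChunks(source, target))
}
```

Because only the returned chunk IDs are rescanned, the cost of a repair grows with the amount of drift rather than with table size.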
The operator maintains a sophisticated hashing system:
- Per-row hashes: SHA-256 of canonicalized row data
- Chunk rollups: Aggregated hashes for efficient verification
- Canonicalization: Stable string representation across databases
- Versioning: Monotonic version clocks for conflict resolution
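To make the hashing scheme more concrete, here is one way per-row hashes, bucket assignment, and chunk rollups could fit together. This is a sketch under stated assumptions rather than the operator's real encoding: the canonical form (sorted column names, length-prefixed values, a distinct NULL marker), the bucket function, and the XOR rollup are all illustrative choices.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sort"
	"strconv"
)

// Row maps a column name to its value rendered as text; nil means SQL NULL.
type Row map[string]*string

func str(v string) *string { return &v }

// HashRow computes SHA-256 over a canonical encoding of the row. Columns are
// sorted by name and every field is length-prefixed, so the result does not
// depend on column order and NULL never collides with the empty string.
func HashRow(r Row) [32]byte {
	cols := make([]string, 0, len(r))
	for c := range r {
		cols = append(cols, c)
	}
	sort.Strings(cols)

	h := sha256.New()
	for _, c := range cols {
		h.Write([]byte(strconv.Itoa(len(c)) + ":" + c))
		if v := r[c]; v == nil {
			h.Write([]byte("N;"))
		} else {
			h.Write([]byte("V" + strconv.Itoa(len(*v)) + ":" + *v + ";"))
		}
	}
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// Bucket assigns a primary key to one of n buckets (n must be non-zero).
func Bucket(primaryKey string, n uint64) uint64 {
	sum := sha256.Sum256([]byte(primaryKey))
	return binary.BigEndian.Uint64(sum[:8]) % n
}

// Rollup combines row hashes with XOR, which is order-independent and lets a
// single row change update the rollup without rehashing the whole chunk.
func Rollup(hashes [][32]byte) [32]byte {
	var acc [32]byte
	for _, h := range hashes {
		for i := range acc {
			acc[i] ^= h[i]
		}
	}
	return acc
}

func main() {
	a := Row{"id": str("1"), "name": str("alice"), "note": nil}
	b := Row{"id": str("2"), "name": str("bob"), "note": str("")}

	fmt.Printf("row a: %x\n", HashRow(a))
	fmt.Printf("bucket for key 1: %d\n", Bucket("1", 4096))
	fmt.Printf("rollup: %x\n", Rollup([][32]byte{HashRow(a), HashRow(b)}))
}
```

XOR keeps incremental updates cheap, but two identical rows in the same chunk cancel each other out; a design that must detect duplicates would need a different combiner, such as modular addition or hashing a sorted list of row hashes.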
- Consistent snapshots: GTID-based consistency points
- Idempotent operations: Safe restart and recovery
- DDL approval: Manual approval for schema changes
- Drift detection: Automatic detection of data inconsistencies
- Rollback capability: Ability to revert to previous states
- Parallel processing: Concurrent table and chunk processing
- Bulk operations: Efficient batch inserts and updates
- Chunked verification: Only verify changed chunks
- Lazy rollup refresh: Debounced hash rollup updates
- Connection pooling: Efficient database connection management
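Parallel chunk processing can be pictured as two steps: split a table's key range into chunks of `chunk_size`, then run a bounded number of workers over them. The following sketch assumes an integer primary key with a known minimum and maximum. `SplitRange` and `ProcessChunks` are hypothetical helpers, not functions from the operator.

```go
package main

import (
	"fmt"
	"sync"
)

// Chunk is an inclusive primary-key range.
type Chunk struct {
	Start, End int64
}

// SplitRange divides [minID, maxID] into chunks covering at most size keys.
// It assumes the range is far from the int64 limits.
func SplitRange(minID, maxID, size int64) []Chunk {
	if size <= 0 {
		return nil
	}
	var chunks []Chunk
	for start := minID; start <= maxID; start += size {
		end := start + size - 1
		if end > maxID {
			end = maxID
		}
		chunks = append(chunks, Chunk{Start: start, End: end})
	}
	return chunks
}

// ProcessChunks calls fn for every chunk with at most concurrency calls in
// flight and returns the first error seen. Remaining chunks still run.
func ProcessChunks(chunks []Chunk, concurrency int, fn func(Chunk) error) error {
	if concurrency < 1 {
		concurrency = 1
	}
	sem := make(chan struct{}, concurrency) // counting semaphore
	var (
		wg       sync.WaitGroup
		once     sync.Once
		firstErr error
	)
	for _, c := range chunks {
		wg.Add(1)
		sem <- struct{}{} // blocks while the limit is reached
		go func(c Chunk) {
			defer wg.Done()
			defer func() { <-sem }()
			if err := fn(c); err != nil {
				once.Do(func() { firstErr = err })
			}
		}(c)
	}
	wg.Wait()
	return firstErr
}

func main() {
	chunks := SplitRange(1, 25000, 10000)
	err := ProcessChunks(chunks, 4, func(c Chunk) error {
		// A real worker would copy rows with keys in [c.Start, c.End].
		fmt.Printf("copying keys %d..%d\n", c.Start, c.End)
		return nil
	})
	fmt.Println("error:", err)
}
```

The order of the "copying" lines is not deterministic, since chunks run concurrently. Key-range chunking is simplest with dense integer keys; sparse or composite keys would need boundaries sampled from the table instead.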
- Structured logging: JSON-formatted logs with context
- Metrics: Performance and consistency metrics
- Status reporting: Real-time operator status
- Event logging: Persistent change event log
- Health checks: Database connection and replication health
- Binlog format: Ensure MySQL uses ROW format
- Permissions: MySQL user needs REPLICATION SLAVE and REPLICATION CLIENT
- Network: Verify connectivity between operator and databases
- Disk space: Ensure sufficient space for event logs and hash tables
- Restart operator: Automatically resumes from last known position
- Verify consistency: Use the verify command to check data integrity
- Repair data: Use the repair command to fix inconsistencies
- Check logs: Review operator and database logs for errors
- Go 1.21+
- MySQL 5.7+ with binlog enabled
- PostgreSQL 12+
make install # Install dependencies
make build # Build binary
make test # Run tests
make lint # Run linting
# Unit tests
go test ./...
# Integration tests (requires test databases)
go test -tags=integration ./...
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
MIT License - see LICENSE file for details.
For issues and questions:
- Create an issue in the repository
- Check the troubleshooting section
- Review the configuration examples
- Enhanced DDL handling
- Performance benchmarking tools
- Kubernetes operator
- Cloud-native deployment
- Advanced monitoring dashboards
- Multi-database support