Skip to content

noops-studio/go-sql-sync

Repository files navigation

MySQL to PostgreSQL Migration Operator

A comprehensive, production-ready MySQL to PostgreSQL migration operator that handles schema translation, initial snapshot migration, continuous CDC replication, and data verification with row-level hashing.

Features

  • Schema Discovery & Translation: Automatically discovers MySQL schema and translates to PostgreSQL DDL
  • Initial Snapshot: Fast, parallel, consistent snapshot migration with chunking
  • Continuous CDC: Real-time replication via MySQL binlog with ROW format
  • Row Hashing: Per-row and per-chunk hashing for fast drift detection
  • Incremental Repair: Precise repair of only mismatched data without full rescans
  • DDL Handling: Safe DDL change handling with pause/approve/resume mechanics
  • Resumable: Handles failures gracefully with exact resume points
  • Observable: Comprehensive logging and status reporting

Architecture

The operator consists of several key components:

  • Operator: Main orchestration and CLI interface
  • MySQL Connection: Schema discovery and binlog reading
  • PostgreSQL Connection: Schema application and data writing
  • Hash Manager: Row hashing and chunk verification
  • Snapshot Manager: Initial bulk migration
  • CDC Manager: Continuous change capture and apply
  • Verifier: Data consistency verification and repair

Quick Start

1. Configuration

Create a config.yaml file:

mysql:
  dsn: "user:password@tcp(localhost:3306)/source_db?charset=utf8mb4&parseTime=true&loc=UTC"
  binlog_format: "ROW"
  binlog_row_image: "FULL"

postgresql:
  dsn: "postgres://user:password@localhost:5432/target_db?sslmode=disable"

performance:
  snapshot_concurrency: 4
  chunk_size: 10000
  batch_size: 1000

web:
  port: 8080
  host: "0.0.0.0"
  static_path: "./web/static"

2. Build

make build

3. Run

Web Interface (Recommended)

# Start the web interface
./bin/mysql2pg web

# Open your browser to http://localhost:8080

Command Line Interface

# Translate schema
./bin/mysql2pg translate

# Apply schema to PostgreSQL
./bin/mysql2pg apply-schema

# Perform initial snapshot
./bin/mysql2pg snapshot

# Start continuous CDC
./bin/mysql2pg start-cdc

# Verify data consistency
./bin/mysql2pg verify

# Check status
./bin/mysql2pg status

Commands

Web Interface

  • web: Start the web interface for visual management

Command Line Interface

  • translate: Translate MySQL schema to PostgreSQL DDL
  • apply-schema: Apply translated schema to PostgreSQL
  • snapshot: Perform initial snapshot migration
  • start-cdc: Start continuous CDC replication
  • verify: Verify data consistency
  • repair: Repair data inconsistencies
  • ddl-approve: Approve pending DDL changes
  • cutover: Perform cutover to PostgreSQL
  • status: Show operator status

Configuration

MySQL Settings

  • dsn: MySQL connection string
  • binlog_format: Must be "ROW" for CDC
  • binlog_row_image: Must be "FULL" for CDC
  • server_id: Unique server ID for replication

PostgreSQL Settings

  • dsn: PostgreSQL connection string

Performance Settings

  • snapshot_concurrency: Number of parallel table snapshots
  • chunk_size: Rows per chunk during snapshot
  • batch_size: Rows per batch during CDC apply
  • apply_concurrency: Number of parallel table appliers

Safety Settings

  • stop_on_ddl: Pause on DDL changes
  • stop_on_mismatch: Pause on data mismatches
  • drift_tolerance: Number of mismatched chunks before pausing

Hashing Settings

  • chunk_buckets: Number of hash buckets per table (default: 4096)
  • hash_schema: Schema name for hash tables
  • rollup_refresh: How often to refresh chunk rollups

Web Interface

The operator includes a modern web interface for easy management and monitoring:

Features

  • Real-time Dashboard: Live status updates and replication monitoring
  • Visual Controls: Click-to-execute operations for all migration tasks
  • Live Logs: Real-time log streaming with WebSocket updates
  • Status Monitoring: Visual representation of table and replication status
  • Responsive Design: Works on desktop and mobile devices

Getting Started

# Start the web interface
./bin/mysql2pg web

# Access at http://localhost:8080

Configuration Interface

The web interface includes a comprehensive configuration panel where you can:

  1. Set Database Connections:

    • MySQL DSN: user:password@tcp(host:port)/database?charset=utf8mb4&parseTime=true&loc=UTC
    • PostgreSQL DSN: postgres://user:password@host:port/database?sslmode=disable
  2. Test Connections: Verify database connectivity before saving configuration

  3. Performance Tuning: Adjust concurrency, chunk sizes, and batch sizes

  4. Safety Settings: Configure DDL handling and drift tolerance

  5. Save Configuration: Store settings for future use

Web Interface Operations

  • Configuration Management: Set database connections, performance, and safety settings
  • Connection Testing: Test MySQL and PostgreSQL connections before saving
  • Schema Translation: Translate and apply schema with one click
  • Snapshot Migration: Control initial bulk migration
  • CDC Management: Start/stop continuous replication
  • Data Operations: Verify consistency and repair issues
  • Workflow Management: DDL approval and cutover processes
  • Real-time Monitoring: Live status updates and logging

Data Flow

  1. Schema Discovery: Operator discovers MySQL schema and translates to PostgreSQL
  2. Initial Snapshot: Parallel table migration with chunking and row hashing
  3. CDC Capture: Continuous binlog reading from snapshot GTID point
  4. CDC Apply: Transactional application of changes to PostgreSQL
  5. Verification: Continuous chunk-level verification using hashes
  6. Repair: Incremental repair of only mismatched data

Hash Strategy

The operator maintains a sophisticated hashing system:

  • Per-row hashes: SHA-256 of canonicalized row data
  • Chunk rollups: Aggregated hashes for efficient verification
  • Canonicalization: Stable string representation across databases
  • Versioning: Monotonic version clocks for conflict resolution

Safety Features

  • Consistent snapshots: GTID-based consistency points
  • Idempotent operations: Safe restart and recovery
  • DDL approval: Manual approval for schema changes
  • Drift detection: Automatic detection of data inconsistencies
  • Rollback capability: Ability to revert to previous states

Performance Optimizations

  • Parallel processing: Concurrent table and chunk processing
  • Bulk operations: Efficient batch inserts and updates
  • Chunked verification: Only verify changed chunks
  • Lazy rollup refresh: Debounced hash rollup updates
  • Connection pooling: Efficient database connection management

Monitoring & Observability

  • Structured logging: JSON-formatted logs with context
  • Metrics: Performance and consistency metrics
  • Status reporting: Real-time operator status
  • Event logging: Persistent change event log
  • Health checks: Database connection and replication health

Troubleshooting

Common Issues

  1. Binlog format: Ensure MySQL uses ROW format
  2. Permissions: MySQL user needs REPLICATION SLAVE and REPLICATION CLIENT
  3. Network: Verify connectivity between operator and databases
  4. Disk space: Ensure sufficient space for event logs and hash tables

Recovery Procedures

  1. Restart operator: Automatically resumes from last known position
  2. Verify consistency: Use verify command to check data integrity
  3. Repair data: Use repair commands to fix inconsistencies
  4. Check logs: Review operator and database logs for errors

Development

Prerequisites

  • Go 1.21+
  • MySQL 5.7+ with binlog enabled
  • PostgreSQL 12+

Building

make install    # Install dependencies
make build      # Build binary
make test       # Run tests
make lint       # Run linting

Testing

# Unit tests
go test ./...

# Integration tests (requires test databases)
go test -tags=integration ./...

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

License

MIT License - see LICENSE file for details.

Support

For issues and questions:

  • Create an issue in the repository
  • Check the troubleshooting section
  • Review the configuration examples

Roadmap

  • Enhanced DDL handling
  • Performance benchmarking tools
  • Kubernetes operator
  • Cloud-native deployment
  • Advanced monitoring dashboards
  • Multi-database support

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published