Binlog-ETL is a system for synchronising data from MySQL to a HIVE data warehouse on Hadoop. Its main feature is that it keeps
the latest snapshot of the MySQL tables in the data warehouse while the MySQL databases are being updated.
#Features
- Create HIVE tables and partitions
- Parse MySQL binary logs and extract updates
- Upload data to HDFS
- Support multiple processes
- Support sharded MySQL databases and tables
- Keep the latest snapshot and remove duplicates
- All of the above features are automatic
#Requirements
- MySQL version 5.6 or newer
- Binary logging enabled in my.cnf
#Data Flow
The Binlog-ETL system sits between the MySQL cluster and the HIVE data warehouse. It actively requests bin-logs from the MySQL cluster and creates snapshots automatically.
- Step 1, request the latest bin-logs and download them to local disk
- Step 2, parse the bin-logs and transform the data into the target format
- Step 3, upload the new data to HDFS. This data contains the latest updates to the MySQL tables
- Step 4, merge the new data with the old snapshot, keeping the latest updates and removing duplicates
- Step 5, create a new snapshot and finish
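
The merge in Step 4 can be illustrated with a small in-memory sketch. This is not the Binlog-ETL implementation, which operates on data in HDFS and HIVE; it only shows the idea of keeping the newest update per row and dropping duplicates. The `Change` record, its fields, and the `merge_snapshot` function are hypothetical names introduced for this example, and it assumes each table has an integer primary key and that bin-log events carry an increasing sequence number.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class Change(NamedTuple):
    """One row-level update extracted from a bin-log (hypothetical shape)."""
    key: int              # primary key of the row (assumed to exist)
    seq: int              # assumed event order in the bin-log; higher is newer
    op: str               # "insert", "update" or "delete"
    row: Optional[dict]   # row contents after the change; None for deletes


def merge_snapshot(snapshot: Dict[int, dict],
                   changes: Iterable[Change]) -> Dict[int, dict]:
    """Return a new snapshot with the latest change per key applied."""
    # Remove duplicates: keep only the newest change for each primary key.
    latest: Dict[int, Change] = {}
    for change in changes:
        seen = latest.get(change.key)
        if seen is None or change.seq > seen.seq:
            latest[change.key] = change

    # Apply the surviving changes on top of a copy of the old snapshot.
    merged = dict(snapshot)
    for key, change in latest.items():
        if change.op == "delete":
            merged.pop(key, None)
        else:
            merged[key] = change.row
    return merged


# Usage: row 1 is updated twice, row 2 is deleted, row 3 is new.
old_snapshot = {1: {"id": 1, "name": "a"}, 2: {"id": 2, "name": "b"}}
updates = [
    Change(1, 10, "update", {"id": 1, "name": "a2"}),
    Change(1, 11, "update", {"id": 1, "name": "a3"}),
    Change(2, 12, "delete", None),
    Change(3, 13, "insert", {"id": 3, "name": "c"}),
]
new_snapshot = merge_snapshot(old_snapshot, updates)
```

Here only the second update to row 1 survives, row 2 is dropped, and row 3 is added, so the new snapshot holds one current version of each remaining row. The old snapshot is left unmodified, which mirrors Step 5 where a new snapshot is created rather than the old one being edited in place.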