Change-Data-Capture

A data engineering project that uses AWS services to migrate data changes from a MySQL instance to a target S3 bucket.

Project Architecture

The mechanism described below is the same for both the full load (initial migration) and ongoing replication.

The data comes from a local MySQL instance. I'm using AWS Database Migration Service (DMS) to migrate the data from the MySQL instance to a source S3 bucket.
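Both the full-load files and the ongoing-replication files land in the same source bucket, so whatever reads them may still want to know which table an object belongs to and whether it is a snapshot or a batch of changes. The sketch below assumes DMS's default S3 target layout of `[<folder>/]<schema>/<table>/<file>`, with full-load files named `LOAD*.csv` and CDC files named by timestamp; the function name is hypothetical and not taken from this repo.

```python
from pathlib import PurePosixPath


def classify_dms_object(key: str) -> dict:
    """Split a DMS output object key into schema, table, file and load type.

    Assumes (not confirmed by this project) the layout
    [<folder>/]<schema>/<table>/<file>, where full-load files start with
    "LOAD" and CDC files are named by timestamp.
    """
    parts = PurePosixPath(key).parts
    if len(parts) < 3:
        raise ValueError(f"unexpected DMS key layout: {key!r}")
    schema, table, filename = parts[-3], parts[-2], parts[-1]
    # Full-load snapshots vs. ongoing-replication change files.
    load_type = "full_load" if filename.startswith("LOAD") else "cdc"
    return {"schema": schema, "table": table, "file": filename, "load_type": load_type}
```

For example, `classify_dms_object("dms-output/mydb/persons/LOAD00000001.csv")` reports schema `mydb`, table `persons` and load type `full_load`.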
Whenever a new object is written to the source S3 bucket, an S3 event notification triggers an AWS Lambda function, which in turn starts an AWS Glue job.
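A minimal sketch of such a Lambda handler, assuming it passes the changed object's bucket and key on to the Glue job. The job name `cdc-glue-job` and the argument names `--source_bucket` / `--source_key` are hypothetical; the repo's own lambda_function.py may differ. The Glue client is injectable so the logic can be exercised without AWS credentials.

```python
import json
from urllib.parse import unquote_plus

GLUE_JOB_NAME = "cdc-glue-job"  # hypothetical Glue job name


def s3_objects_from_event(event):
    """Yield (bucket, key) pairs from an S3 event notification."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys in S3 event notifications are URL-encoded (spaces become '+').
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def lambda_handler(event, context, glue_client=None):
    if glue_client is None:
        import boto3  # bundled with the AWS Lambda Python runtime

        glue_client = boto3.client("glue")
    run_ids = []
    for bucket, key in s3_objects_from_event(event):
        response = glue_client.start_job_run(
            JobName=GLUE_JOB_NAME,
            # Glue job arguments are passed with a "--" prefix; names are hypothetical.
            Arguments={"--source_bucket": bucket, "--source_key": key},
        )
        run_ids.append(response["JobRunId"])
    return {"statusCode": 200, "body": json.dumps({"job_run_ids": run_ids})}
```

This starts one job run per uploaded object. Since a Glue job allows only one concurrent run by default, a burst of DMS files may need a higher concurrency limit on the job or a handler that tolerates `ConcurrentRunsExceededException`.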
This Glue job reads the data from the source S3 bucket, transforms it with PySpark, and writes the result to a target S3 bucket as a single file.
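The README does not spell out the transformation. One common approach for DMS output is to apply the change rows to the full-load snapshot. DMS CDC files carry an `Op` column with `I` (insert), `U` (update) or `D` (delete). The plain-Python sketch below shows that merge logic under two assumptions: a single primary-key column (hypothetically `id`) and change rows supplied in commit order. It is an illustration, not the repo's glue_job.py.

```python
from typing import Dict, Iterable, List

Row = Dict[str, str]


def apply_cdc(snapshot: Iterable[Row], changes: Iterable[Row], key: str = "id") -> List[Row]:
    """Apply DMS-style change rows (Op = I/U/D) to a full-load snapshot.

    Assumes a single primary-key column `key` and changes in commit order.
    """
    current: Dict[str, Row] = {row[key]: dict(row) for row in snapshot}
    for change in changes:
        op = change["Op"]
        record = {k: v for k, v in change.items() if k != "Op"}
        if op in ("I", "U"):
            current[record[key]] = record  # upsert the latest row image
        elif op == "D":
            current.pop(record[key], None)  # drop deleted rows
        else:
            raise ValueError(f"unknown Op value: {op!r}")
    # Sort by key so the output is deterministic.
    return sorted(current.values(), key=lambda r: r[key])
```

In PySpark the same idea is usually expressed with a window over the key that keeps the latest change and filters out deletes. Calling `coalesce(1)` before writing is one way to get a single output part file, at the cost of funnelling the final write through one task.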

The diagram below (cdc_architecture.JPG) shows the project architecture.
