Trigger a Glue crawler and a Glue ETL job every time a file is uploaded to an S3 bucket, with SNS email notifications
Data engineering teams often spend a considerable amount of time on routine, repetitive tasks. This project attempts to remedy that. We set up Glue crawlers that run every time a file is added to a given S3 bucket. The crawler crawls the new file and adds its data to the metadata catalog (the Glue Data Catalog), creating new tables or appending to existing ones, which makes the data available for querying with Athena and Redshift Spectrum. We also run a Glue extract-transform-load (ETL) job, built in Glue Studio, to clean the data before loading it into the Data Catalog tables.
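As an illustration of the upload-triggers-crawler flow described above, here is a minimal sketch of a Lambda handler. It assumes the function is invoked directly by an S3 event notification (if the upload event is routed through EventBridge instead, the payload has a different shape). The crawler name `my-s3-crawler` is hypothetical, and the optional `glue_client` parameter exists only so the handler can be exercised without AWS credentials; neither comes from this project.

```python
import urllib.parse

CRAWLER_NAME = "my-s3-crawler"  # hypothetical name, replace with your crawler


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications (spaces become '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


def lambda_handler(event, context, glue_client=None):
    """Start the crawler when at least one object was uploaded."""
    if glue_client is None:
        import boto3  # bundled with the AWS Lambda Python runtime

        glue_client = boto3.client("glue")

    uploaded = extract_s3_objects(event)
    if not uploaded:
        return {"started": False, "objects": []}

    try:
        glue_client.start_crawler(Name=CRAWLER_NAME)
        started = True
    except glue_client.exceptions.CrawlerRunningException:
        # A crawl is already in progress, so this upload does not start another.
        started = False
    return {"started": started, "objects": uploaded}
```

Catching `CrawlerRunningException` keeps a burst of uploads from failing the function while a crawl is in progress. Files that arrive during that crawl may need a follow-up run to be picked up.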
- S3
- Glue
- Simple Notification Service (SNS)
- EventBridge
- Lambda
- Athena
- Set the S3 path dynamically so that the crawler only goes through the folder containing the new file instead of crawling the entire bucket
- Include the crawler name in the EventBridge rules
- Improve the format of the message sent to SNS from Lambda
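One possible starting point for the three items above is a set of small helper functions, sketched here under stated assumptions. The function names are hypothetical. The event pattern assumes the standard `Glue Crawler State Change` event emitted by Glue to EventBridge, and the `message` field in the event detail is treated as optional.

```python
import json
import posixpath


def folder_target(bucket, key):
    """Return the s3:// path of the folder that contains `key`."""
    prefix = posixpath.dirname(key)
    return f"s3://{bucket}/{prefix}/" if prefix else f"s3://{bucket}/"


def crawler_event_pattern(crawler_name, state="Succeeded"):
    """EventBridge pattern matching state changes of one named crawler."""
    return {
        "source": ["aws.glue"],
        "detail-type": ["Glue Crawler State Change"],
        "detail": {"crawlerName": [crawler_name], "state": [state]},
    }


def format_sns_message(detail):
    """Build a (subject, body) pair from a crawler state-change detail."""
    name = detail.get("crawlerName", "unknown")
    state = detail.get("state", "unknown")
    # SNS email subjects are limited to 100 characters.
    subject = f"Glue crawler {name}: {state}"[:100]
    lines = [f"Crawler: {name}", f"State: {state}"]
    if "message" in detail:  # assumed optional field
        lines.append(f"Message: {detail['message']}")
    return subject, "\n".join(lines)


if __name__ == "__main__":
    print(folder_target("my-bucket", "sales/2024/file.csv"))
    print(json.dumps(crawler_event_pattern("my-s3-crawler"), indent=2))
    print(format_sns_message({"crawlerName": "my-s3-crawler", "state": "Succeeded"}))
```

In a Lambda function, the path from `folder_target` could be passed to boto3's `glue.update_crawler(Name=..., Targets={"S3Targets": [{"Path": path}]})` before calling `start_crawler`. The pattern could be serialized with `json.dumps` for `events.put_rule(..., EventPattern=...)`, and the subject and body handed to `sns.publish(TopicArn=..., Subject=..., Message=...)`. Note that a crawler cannot be updated while it is running, so the update step would need the same guard as the start step.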