This repo collects the AWS data engineering exercises and general data pipeline tutorials I have worked through. It contains only the submodule mappings; the actual content of each exercise lives in its own repo.
aws-sam-cicd is a simple AWS data pipeline that streams, validates, and loads tweets.
twitter-archive is a GitHub Actions workflow that retrieves tweets using a YAML configuration.
terraform-labs is a data engineering schema configuration written in Terraform HCL.
dra-data is an open source data collection built with the Flat GitHub Action and a manifest file.
pyspark-etl-example is a PySpark ETL example that extracts, transforms, and loads dummy data.
yelp-to-xml is a small data collection app/lab for Yelp reviews, which are converted to XML, cleaned, wrangled, and managed.
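
To give a feel for the kind of work in yelp-to-xml, here is a minimal sketch of turning review records into XML with a light cleaning step. It is not the code from that repo: the function name `reviews_to_xml`, the record fields (`id`, `rating`, `text`), and the element names are all assumptions made for illustration, and it uses only the Python standard library.

```python
import xml.etree.ElementTree as ET


def reviews_to_xml(reviews):
    """Convert a list of review dicts into an XML string.

    The field names (id, rating, text) are hypothetical, chosen for this sketch.
    """
    root = ET.Element("reviews")
    for review in reviews:
        # Cleaning step: drop records that have no usable text.
        text = (review.get("text") or "").strip()
        if not text:
            continue
        node = ET.SubElement(root, "review", id=str(review["id"]))
        ET.SubElement(node, "rating").text = str(review["rating"])
        # Wrangling step: collapse runs of whitespace into single spaces.
        ET.SubElement(node, "text").text = " ".join(text.split())
    # encoding="unicode" returns a str instead of bytes.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = [
        {"id": 1, "rating": 5, "text": "  Great   tacos & salsa  "},
        {"id": 2, "rating": 3, "text": ""},  # skipped: empty text
    ]
    print(reviews_to_xml(sample))
```

ElementTree escapes special characters such as `&` on serialization, so the review text does not need to be escaped by hand.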