shiyis/data-labs

data-engineering-things

This repo collects the AWS data engineering exercises and general data pipeline tutorials I have worked through. It holds only the submodule mappings to the repos that contain the actual content of these exercises.
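Since this repo is only a set of submodule mappings, here is a minimal sketch of how those mappings could be listed, assuming they live in a standard `.gitmodules` file at the repo root. The sample content and its `example.com` URLs are placeholders for illustration, not the real remotes, and `list_submodules` is a hypothetical helper name.

```python
import configparser

# Hypothetical .gitmodules content. The submodule names come from this README,
# but the URLs are placeholders, not the actual remotes.
SAMPLE_GITMODULES = """
[submodule "aws-sam-cicd"]
    path = aws-sam-cicd
    url = https://example.com/aws-sam-cicd.git
[submodule "pyspark-etl-example"]
    path = pyspark-etl-example
    url = https://example.com/pyspark-etl-example.git
"""


def list_submodules(text):
    """Return {submodule path: url} parsed from .gitmodules-style text."""
    # Strip indentation so configparser never reads a line as a continuation.
    cleaned = "\n".join(line.strip() for line in text.splitlines())
    # Only "=" separates keys from values, so the ":" in URLs is left alone.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(cleaned)
    return {parser[s]["path"]: parser[s]["url"] for s in parser.sections()}


if __name__ == "__main__":
    for path, url in list_submodules(SAMPLE_GITMODULES).items():
        print(f"{path} -> {url}")
```

After cloning, `git submodule update --init --recursive` fetches the content of each mapped repo.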

aws-sam-cicd is a simple AWS data pipeline that streams, validates, and loads tweets.

twitter-archive is a GitHub Actions workflow that retrieves tweets using a YAML configuration.

terraform-labs is a data engineering schema configuration written in Terraform HCL.

dra-data is an open source data collection that uses the Flat GitHub Action and a manifest file.

pyspark-etl-example is a PySpark ETL example that extracts, transforms, and loads dummy data.

yelp-to-xml is a small data collection app/lab for Yelp reviews, which are converted to XML, cleaned, wrangled, and managed.

About

This repo contains data collection, wrangling, modeling, and engineering practices.
