kingyiusuen/udacity-data-engineering-nanodegree

Udacity Data Engineering with AWS Nanodegree

This repository contains the projects I completed for Udacity's Data Engineering with AWS Nanodegree program.

  1. Data Modeling with Apache Cassandra
    • Built an ETL pipeline that loaded data from CSV files into a non-relational Apache Cassandra database.
  2. Data Warehouses
    • Built an ELT pipeline that loaded data from S3 buckets, staged it in Redshift, and transformed it into a set of dimensional tables.
  3. Spark and Data Lakes
    • Built an ELT pipeline that loaded data from an AWS S3 data lake, processed it into analytics tables using Spark and AWS Glue, and loaded the tables back into a lakehouse architecture.
  4. Data Pipeline with Airflow
    • Built an ETL pipeline with Airflow that moved JSON logs of user activity from S3 and processed them in Redshift.
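
Projects 2 and 4 both stage S3 data in Redshift. As a rough illustration of that staging step, the sketch below builds a Redshift `COPY` statement for JSON files. It is not taken from the project code: the function name `build_copy_statement`, the table name, the bucket path, the IAM role ARN, and the region are all hypothetical placeholders.

```python
def build_copy_statement(
    table: str,
    s3_path: str,
    iam_role_arn: str,
    json_option: str = "auto",
    region: str = "us-west-2",
) -> str:
    """Build a Redshift COPY statement that loads JSON files from S3.

    All arguments are illustrative; substitute your own table, bucket,
    IAM role, and region. json_option can be 'auto' or the S3 path of a
    JSONPaths file.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS JSON '{json_option}'\n"
        f"REGION '{region}';"
    )


if __name__ == "__main__":
    # Hypothetical values, for demonstration only.
    print(
        build_copy_statement(
            table="staging_events",
            s3_path="s3://example-bucket/log_data",
            iam_role_arn="arn:aws:iam::123456789012:role/example-redshift-role",
        )
    )
```

The resulting string could be executed through any Redshift client or from an Airflow task. Once the raw data sits in a staging table, the dimensional tables can be populated with `INSERT INTO ... SELECT` statements.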