This repository stores the projects that I completed in Udacity's Data Engineering with AWS Nanodegree Program.
- Data Modeling with Apache Cassandra
  - Built an ETL pipeline that loaded data from CSV files into a non-relational database built on Apache Cassandra.
- Data Warehouses
  - Built an ELT pipeline that loaded data from S3 buckets, staged it in Redshift, and transformed it into a set of dimensional tables.
- Spark and Data Lakes
  - Built an ELT pipeline that loaded data from an AWS S3 data lake, processed it into analytics tables using Spark and AWS Glue, and loaded those tables back into a lakehouse architecture.
- Data Pipeline with Airflow
  - Built an ETL pipeline with Airflow that moved JSON logs of user activity from S3 and processed them in Redshift.