A repository containing data engineering and data analysis projects.
- Created a PostgreSQL database schema and ETL pipeline to support analysis of song plays.
- The schema is optimized for song play analysis queries.
- Analyzed Super Bowl data to extract insights such as point distribution, viewership, and ad distribution.
- Created an Apache Cassandra database schema as part of an ETL pipeline.
- The purpose of this project is to understand US immigration trends.
- Built a data pipeline using Spark.
- Built an ETL pipeline that extracts data from S3, processes it using Spark, and loads it back into S3 as a set of dimensional tables.
- Built data warehouse ETL pipelines using Apache Airflow.
- Built an ETL pipeline that extracts data from S3, stages it in Redshift, and transforms it into a set of dimensional tables to find insights into what songs users are listening to.
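
The "set of dimensional tables" mentioned in the pipelines above can be illustrated with a minimal sketch. This is plain Python rather than Spark or Redshift, and it is not the code from these projects: the sample records, the field names (`ts`, `user_id`, `user_name`, `song`, `artist`), and the function `build_star_schema` are all hypothetical, chosen only to show how raw song-play events might be split into a fact table and dimension tables.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical raw song-play events; the field names are assumptions
# made for illustration, not the schema used in these projects.
RAW_EVENTS: List[Dict[str, Any]] = [
    {"ts": 1, "user_id": 10, "user_name": "Ada", "song": "Song A", "artist": "Artist X"},
    {"ts": 2, "user_id": 11, "user_name": "Bo", "song": "Song B", "artist": "Artist Y"},
    {"ts": 3, "user_id": 10, "user_name": "Ada", "song": "Song A", "artist": "Artist X"},
]


def build_star_schema(events: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Split raw events into a songplays fact table and users/songs dimensions."""
    users: Dict[int, Dict[str, Any]] = {}      # user_id -> user dimension row
    song_ids: Dict[Tuple[str, str], int] = {}  # (title, artist) -> surrogate song_id
    songplays: List[Dict[str, Any]] = []       # fact rows, one per event

    for event in events:
        # Dimension: one row per distinct user.
        users[event["user_id"]] = {
            "user_id": event["user_id"],
            "name": event["user_name"],
        }

        # Dimension: assign a surrogate key to each distinct (title, artist) pair.
        song_key = (event["song"], event["artist"])
        if song_key not in song_ids:
            song_ids[song_key] = len(song_ids) + 1

        # Fact: keep only the timestamp and foreign keys into the dimensions.
        songplays.append({
            "ts": event["ts"],
            "user_id": event["user_id"],
            "song_id": song_ids[song_key],
        })

    songs = [
        {"song_id": song_id, "title": title, "artist": artist}
        for (title, artist), song_id in song_ids.items()
    ]
    return {"users": list(users.values()), "songs": songs, "songplays": songplays}


tables = build_star_schema(RAW_EVENTS)
```

In a real pipeline the same split would typically be expressed as Spark DataFrame transformations or SQL `INSERT ... SELECT` statements against staging tables; the sketch only shows the shape of the result.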
Updates in progress...