orlevit/Data_Engineering_with_AWS

AWS Data Engineering with Udacity

This repository contains course details and projects from the AWS Data Engineering program on Udacity. This program focuses on designing data models, building data warehouses and data lakes, automating data pipelines, and managing massive datasets using AWS tools.

Table of Contents

Course Overview

  • Data Modeling with Apache Cassandra

    • Model event data for a non-relational database.
    • Build an ETL pipeline for a music streaming app using Apache Cassandra.
    • Define queries and tables.
  • Data Warehouse

    • Act as a data engineer for a streaming music service.
    • Build an ELT pipeline extracting data from S3, staging it in Redshift, and transforming it into dimensional tables.
    • Provide insights into what songs users are listening to.
  • Data Lakehouse with AWS

    • Develop a data lakehouse solution for sensor data.
    • Build an ELT pipeline for lakehouse architecture using Spark and AWS Glue.
    • Process data into analytics tables and load them back into the lakehouse.
  • Automate Data Pipelines

    • Learn data pipeline concepts and their application in data engineering.
    • Use Apache Airflow for creating and managing data pipelines.
    • Cover concepts including data validation, DAGs, data quality, and more.
    • Implement and put data pipelines into production.
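The DAG and data quality concepts listed above can be illustrated without Airflow itself. The following is a minimal sketch using only the Python standard library (`graphlib`, Python 3.9+); it is not Airflow's API, and the task names and the `check_has_rows` helper are hypothetical, chosen only for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each task maps to the set of tasks it depends on.
dependencies = {
    "stage_events": set(),
    "stage_songs": set(),
    "load_fact_table": {"stage_events", "stage_songs"},
    "load_dimension_tables": {"load_fact_table"},
    "run_quality_checks": {"load_dimension_tables"},
}


def execution_order(deps):
    """Return one valid order in which every task runs after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def check_has_rows(table_name, row_count):
    """Basic data quality check: fail loudly when a table is empty."""
    if row_count < 1:
        raise ValueError(f"Data quality check failed: {table_name} has no rows")
    return True


order = execution_order(dependencies)
print(order)
```

Because the graph is acyclic, a valid ordering always exists; `TopologicalSorter` raises `graphlib.CycleError` if a dependency cycle is introduced, which is the property that makes a pipeline a DAG.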

Tools and Technologies

  • AWS Tools: Redshift, S3, Glue, and more.
  • Open-Source Tools: Apache Cassandra, Apache Airflow.
  • Languages and Frameworks: Spark.
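Of the tools listed, Apache Cassandra is the one where table design follows the queries: each table is typically built to answer one query, with a partition key that locates the data and optional clustering columns that order it. The sketch below only builds a CQL `CREATE TABLE` string in plain Python and does not connect to a cluster; the helper `create_table_cql`, the table name, and the column names are assumptions made for illustration.

```python
def create_table_cql(table, columns, partition_key, clustering_columns=()):
    """Build a CREATE TABLE statement for a table designed around one query.

    columns: dict of column name -> CQL type
    partition_key: column names that decide which partition holds a row
    clustering_columns: column names that order rows within a partition
    """
    cols = ", ".join(f"{name} {cql_type}" for name, cql_type in columns.items())
    partition = "(" + ", ".join(partition_key) + ")"
    key = ", ".join([partition, *clustering_columns])
    return f"CREATE TABLE IF NOT EXISTS {table} ({cols}, PRIMARY KEY ({key}))"


# Hypothetical query: "which song was played at a given position in a session?"
statement = create_table_cql(
    table="songs_by_session",
    columns={
        "session_id": "int",
        "item_in_session": "int",
        "artist": "text",
        "song": "text",
    },
    partition_key=["session_id"],
    clustering_columns=["item_in_session"],
)
print(statement)
```

A query against this table would filter on `session_id` (and optionally `item_in_session`), which is why those columns form the primary key.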

Projects

  1. Data Modeling with Apache Cassandra

    • Create a non-relational database for a music streaming app.
    • Define queries and tables using Apache Cassandra.
  2. Data Warehouse

    • Build an ELT pipeline for a streaming music service.
    • Extract data from S3, stage in Redshift, and transform into dimensional tables.
  3. Data Lakehouse with AWS

    • Develop a data lakehouse solution for sensor data.
    • Use Spark and AWS Glue for processing and analytics.
  4. Automate Data Pipelines

    • Create and manage data pipelines using Apache Airflow.
    • Implement data validation, DAGs, and data quality concepts.
    • Extend Airflow with plugins and refactor DAGs for production.
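The stage-then-transform flow of the Data Warehouse project (project 2) can be sketched in miniature. This example uses an in-memory SQLite database from the Python standard library as a stand-in for Redshift, and a small hard-coded list in place of files in S3; the table names, column names, and sample rows are all hypothetical and are not taken from the course projects.

```python
import sqlite3

# Hypothetical raw event rows, standing in for files extracted from S3.
raw_events = [
    (1, "Ada", "Song A", "2018-11-01 10:00:00"),
    (2, "Grace", "Song B", "2018-11-01 10:05:00"),
    (1, "Ada", "Song B", "2018-11-01 10:09:00"),
]

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the warehouse
cur = conn.cursor()

# Load: land the raw rows in a staging table without reshaping them.
cur.execute(
    "CREATE TABLE staging_events "
    "(user_id INTEGER, user_name TEXT, song TEXT, start_time TEXT)"
)
cur.executemany("INSERT INTO staging_events VALUES (?, ?, ?, ?)", raw_events)

# Transform: build a dimension table (one row per user) inside the database.
cur.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT)")
cur.execute("INSERT INTO users SELECT DISTINCT user_id, user_name FROM staging_events")

# Transform: build a fact table (one row per song play).
cur.execute(
    "CREATE TABLE songplays (songplay_id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "user_id INTEGER, song TEXT, start_time TEXT)"
)
cur.execute(
    "INSERT INTO songplays (user_id, song, start_time) "
    "SELECT user_id, song, start_time FROM staging_events"
)
conn.commit()

# Insight query: which songs are played most often?
top_songs = cur.execute(
    "SELECT song, COUNT(*) AS plays FROM songplays "
    "GROUP BY song ORDER BY plays DESC, song"
).fetchall()
user_count = cur.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(top_songs, user_count)
```

The transformation runs as SQL inside the database after the raw data has been loaded, which is what distinguishes ELT from ETL.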

Certificate
