nenalukic/de-zoom-camp

Data Engineering Zoomcamp 2024

Organised by DataTalks.Club. DEZoomcamp GitHub repository.

Syllabus

NYC TLC data.

Module 1: Containerization and Infrastructure as Code

  • Course overview
  • Introduction to GCP
  • Docker and docker-compose
  • Running Postgres locally with Docker
  • Setting up infrastructure on GCP with Terraform
  • Preparing the environment for the course
  • Homework

Module 2: Workflow Orchestration

  • Data Lake
  • Workflow orchestration
  • Workflow orchestration with Mage
  • Homework

Workshop 1: Data Ingestion

Module 3: Data Warehouse

  • Data Warehouse
  • BigQuery
  • Partitioning and clustering
  • BigQuery best practices
  • Internals of BigQuery
  • BigQuery Machine Learning

Module 4: Analytics engineering

  • Basics of analytics engineering
  • dbt (data build tool)
  • BigQuery and dbt
  • Postgres and dbt
  • dbt models
  • Testing and documenting
  • Deployment to the cloud and locally
  • Visualizing the data with Google Data Studio and Metabase

Module 5: Batch processing

  • Batch processing
  • What is Spark
  • Spark Dataframes
  • Spark SQL
  • Internals: GroupBy and joins

Module 6: Streaming

  • Introduction to Kafka
  • Schemas (Avro)
  • Kafka Streams
  • Kafka Connect and KSQL

Workshop 2: Stream Processing with SQL

Project

  • Putting everything we learned into practice

  • Weeks 1 and 2: working on your project

  • Week 3: reviewing your peers

Course UI

Alternatively, you can access this course through the provided UI app, which offers a user-friendly interface for navigating the course material.

Visit the following link: DE Zoomcamp UI
