My study material for the Data Engineering class by DataTalks.Club.
- Course overview
- Introduction to GCP
- Docker and docker-compose
- Running Postgres locally with Docker
- Setting up infrastructure on GCP with Terraform
- Preparing the environment for the course
- Homework
- Reading from APIs
- Building scalable pipelines
- Normalising data
- Incremental loading
- Homework
- Data Warehouse
- BigQuery
- Partitioning and clustering
- BigQuery best practices
- Internals of BigQuery
- BigQuery Machine Learning
- Basics of analytics engineering
- dbt (data build tool)
- BigQuery and dbt
- Postgres and dbt
- dbt models
- Testing and documenting
- Deployment to the cloud and locally
- Visualizing the data with Google Data Studio and Metabase
- Batch processing
- What is Spark
- Spark Dataframes
- Spark SQL
- Internals: GroupBy and joins

- Google Cloud Platform (GCP): Cloud-based auto-scaling platform by Google
- Google Cloud Storage (GCS): Data Lake
- BigQuery: Data Warehouse
- Terraform: Infrastructure-as-Code (IaC)
- Docker: Containerization
- SQL: Data Analysis & Exploration
- Mage: Workflow Orchestration
- dbt: Data Transformation
- Spark: Distributed Processing
- Kafka: Streaming

- Docker and docker-compose
- Python3
- Google Cloud SDK
- Terraform
