This document provides learning resources to support training at an advanced level. The list is not exhaustive; it is intended to help you learn some of the core data engineering concepts expected at that level. The resources range from articles to online courses, to help you progress towards completing these learning objectives. Optional certifications that you can pursue to consolidate your knowledge are listed at the end. For comments, feedback, or reports of missing or broken links, please message the cop-data channel on Slack.
If you enjoyed using these learning paths or have feedback, please use this feedback form
Develops optimal collection and data strategies, applying different capture techniques, determining types of data to optimise processes
Designing Data-Intensive Applications - Martin Kleppmann (book)
Designing Distributed Systems - Brendan Burns (book)
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh (article)
Data Mesh - Zhamak Dehghani (book)
Devises and implements a robust data governance strategy, advising senior stakeholders on managing availability, usability, integrity and security of data in enterprise systems, based on internal data standards and policies.
Introduction to Governance (article)
Data Governance - The Definitive Guide - Evren Eryurek (book)
Multi-Cloud Architecture and Governance - Jeroen Mulder (book)
Data Governance Fundamentals (course)
Designs and implements efficient data transformation processes at scale, using secure data sharing tools (e.g. Snowflake, Databricks, Airflow), and dimensional modelling techniques.
Data Engineering on Azure - Vlad Riscutia (book)
PySpark & AWS: Master Big Data With PySpark and AWS (course)
What is Dimensional Modelling (article)
Definitive Guide to Dimensional Modelling - Ralph Kimball (book)
Displays strong leadership qualities, empowering, supporting, and inspiring the team to strive to deliver data projects in challenging environments
An Elegant Puzzle: Systems of Engineering Management - Will Larson (book)
Thinking, Fast and Slow - Daniel Kahneman (book)
The Five Dysfunctions of a Team - Patrick M. Lencioni (book)
The Manager's Path - Camille Fournier (book)
The Phoenix Project - Gene Kim, Kevin Behr, George Spafford (book)
The Practical Leadership (course)
ML Fundamentals - Regression and Classification (course)
Introduction to Drift (article)
Post Production Monitoring (article)
Compares and evaluates data management solutions, including the required tooling, to ensure proposed solutions are matched to client needs
Pitching to stakeholders (article)
Please refer to data 102 for additional supplementary certifications