Data 103 Learning Path

This document aims to provide learning resources to help you train to an advanced level. The list is not exhaustive; it is simply intended to help you learn some of the core data engineering concepts we expect at that level. We have included a variety of resources, from articles to online courses, to help you progress towards completing these learning objectives. At the end we have also listed optional certifications you can pursue to consolidate your knowledge. For any comments, feedback or reports of missing or broken links, please message the cop-data Slack channel.

If you enjoyed using these learning paths or have feedback, please use this feedback form.

Develops optimal data collection and data strategies, applying different capture techniques and determining which types of data will optimise processes

Designing Data-Intensive Applications - Martin Kleppmann (book)

Designing Distributed Systems - Brendan Burns (book)

How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh (article)

Data Mesh - Zhamak Dehghani (book)

Devises and implements a robust data governance strategy, advising senior stakeholders on managing availability, usability, integrity and security of data in enterprise systems, based on internal data standards and policies.

Introduction to Governance (article)

Data Governance: The Definitive Guide - Evren Eryurek (book)

Multi-Cloud Architecture and Governance - Jeroen Mulder (book)

Data Governance Fundamentals (course)

Designs and implements efficient data transformation processes at scale, using secure data sharing tools (e.g. Snowflake, Databricks, Airflow) and dimensional modelling techniques.

Data Engineering on Azure - Vlad Riscutia (book)

PySpark & AWS: Master Big Data With PySpark and AWS (course)

What is Dimensional Modelling? (article)

The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling - Ralph Kimball (book)

Displays strong leadership qualities, empowering, supporting, and inspiring the team to strive to deliver data projects in challenging environments

An Elegant Puzzle: Systems of Engineering Management - Will Larson (book)

Thinking, Fast and Slow - Daniel Kahneman (book)

The Five Dysfunctions of a Team - Patrick M. Lencioni (book)

The Manager's Path - Camille Fournier (book)

The Phoenix Project - Gene Kim, Kevin Behr, George Spafford (book)

The Practical Leadership (course)

Applies further knowledge of machine learning to identify model requirements and build robust ML pipelines.

ML Fundamentals - Regression and Classification (course)

Case Study Approach (course)

Performance Metrics (article)

Introduction to Drift (article)

Post Production Monitoring (article)

ML on AWS (course)

Machine Learning Design Patterns: Solutions to Common Challenges in Data Preparation, Model Building, and MLOps - Valliappa Lakshmanan (book)

Feature Stores

Compares and evaluates data management solutions, including the required tooling, to ensure proposed solutions are matched to client needs

Pitching to stakeholders (article)

Articulating Design Decisions: Communicate with Stakeholders, Keep Your Sanity, and Deliver the Best User Experience - Tom Greever (book)

Lead Data Engineer Exams

Please refer to the Data 102 learning path for further supplementary certifications.

Azure AI Engineer

Databricks Certified Data Engineer Professional

AWS Certified Data Analytics - Specialty