lfpll/data-engineer-mentorship


Mentoring of Data Engineers

This is a fast-paced, project-based mentorship, mostly for people who want to become data engineers.

A great resource for data engineers: CookBook

Some Python and basic SQL knowledge is expected; the mentor should adapt to the mentee's level of knowledge.

Folder Structure

  • databases: SQL and table concepts, plus a little data modeling
  • datalake: data lake concepts, first ETLs with pandas, and the Parquet and Avro formats
  • ETL: PySpark, streaming, and batch processing
  • orchestrating: Airflow and orchestration of batch jobs
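
To give a feel for the kind of batch ETL step these folders build toward, here is a minimal sketch using only the Python standard library. The sample data, the `events` table, and the function names are hypothetical illustrations and are not taken from the repository, whose material uses pandas, PySpark, and Parquet/Avro instead.

```python
import csv
import io
import sqlite3

# Hypothetical raw data; a real pipeline would read this from a file or an API.
RAW_CSV = """user_id,amount,country
1,10.50,br
2,,us
3,7.25,BR
"""


def extract(raw_text):
    """Parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Drop rows without an amount and normalize types and casing."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["user_id"]), float(row["amount"]), row["country"].upper())
        )
    return cleaned


def load(conn, records):
    """Write the cleaned records into a SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (user_id INTEGER, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline():
    """Run extract, transform, and load against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(RAW_CSV)))
    return conn.execute(
        "SELECT country, SUM(amount) FROM events GROUP BY country"
    ).fetchall()
```

Keeping extract, transform, and load as separate functions makes each stage easy to test on its own, and the same split carries over when the stages are later rewritten with pandas or PySpark.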

How does it work?

  • Weekly meetings of 1 to 2 hours
  • Review of the past week
  • Follow-up exercises, posts to read, and discussion of the topics

What is there at the end?

By the end, the mentee is expected to have developed:

  • Raw Ingestion
  • ETL
  • Airflow Scripts
  • Final table design on BigQuery
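
The deliverables above depend on one another, and an orchestrator such as Airflow runs them in dependency order. As a rough illustration of that idea only, the sketch below orders hypothetical task names with the standard library's `graphlib` (Python 3.9+). It does not use Airflow's API, and the task names and dependencies are assumptions made for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks, each mapped to the set of tasks it depends on.
PIPELINE = {
    "raw_ingestion": set(),
    "etl": {"raw_ingestion"},
    "final_tables": {"etl"},
}


def execution_order(dependencies):
    """Return task names in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


def run(dependencies, actions):
    """Call each task's function in dependency order and return the order used."""
    order = execution_order(dependencies)
    for task in order:
        actions[task]()
    return order
```

In an actual Airflow DAG the same relationships would be declared between operators, and the scheduler would also take care of retries and scheduling, which this sketch leaves out.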

Plus

For more advanced mentees:

  • Real-time processing with Scio and Dataflow

