This is a fast-paced, project-based mentorship, mostly for people who want to become data engineers.
Great resource for data engineers: CookBook
Python and basic SQL knowledge are expected; the mentor should adapt to the mentee's level of knowledge.
- databases: SQL and table concepts, plus a little data modeling
- data lake: data lake concepts, first ETLs with pandas, and the Parquet and Avro formats
- ETL: PySpark, streaming and batch processing
- orchestration: Airflow and orchestrating batch jobs
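
To make the extract, transform, load idea from the topics above concrete, here is a minimal sketch of a batch ETL. It is an illustration under stated assumptions, not the mentorship's actual material: it uses only Python's standard library (`csv` and `sqlite3`) in place of pandas or PySpark, and the sample data, the `orders` table, and the function names are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw data; a real pipeline would read it from a file or an API.
RAW_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,
3,alice,4.25
"""


def extract(raw_text):
    """Parse raw CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Drop rows with a missing amount and cast each field to its type."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer"], float(row["amount"]))
        )
    return cleaned


def load(conn, records):
    """Create the target table if needed and insert the cleaned records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline():
    """Run extract, transform, load, then query the total per customer."""
    conn = sqlite3.connect(":memory:")  # in-memory database, nothing on disk
    load(conn, transform(extract(RAW_CSV)))
    query = (
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(run_pipeline())
```

Keeping each stage as its own function makes it easier to later swap one stage for pandas or PySpark without touching the others.
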
- Weekly meetings of 1 to 2 hours
- Review of the past week
- Follow-up exercises, posts to read, and discussion of the topics
By the end, the mentee is expected to have developed:
- Raw Ingestion
- ETL
- Airflow Scripts
- End table design on BigQuery
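
The deliverables above depend on each other: ingestion has to finish before the ETL runs, and the ETL before the end tables are built. The sketch below shows how an orchestrator can derive a run order from declared dependencies. It is not Airflow code; it uses `graphlib.TopologicalSorter` from Python's standard library (Python 3.9+) as a stand-in, and the task names and the `run_all` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "raw_ingestion": set(),
    "etl": {"raw_ingestion"},
    "end_tables": {"etl"},
}


def execution_order(dependencies):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


def run_all(dependencies, tasks):
    """Call each task function in dependency order and collect the results."""
    return [tasks[name]() for name in execution_order(dependencies)]


if __name__ == "__main__":
    # Placeholder task bodies that only report their own name.
    tasks = {name: (lambda n=name: f"ran {n}") for name in DEPENDENCIES}
    print(run_all(DEPENDENCIES, tasks))
```

An Airflow DAG expresses the same idea declaratively, and adds scheduling, retries, and monitoring on top.
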
For more advanced mentees:
- Real-time processing with Scio and Dataflow