This repository is a 1-week structured refresher in Python, designed for Data Engineers. It revisits the key Python concepts and tools used in real-world data workflows — from scripting and automation to databases and ETL pipelines.
Each day builds on the previous one, moving from Python basics to practical data engineering patterns.
The goal is to rebuild strong Python foundations for data engineering by working through the following (a small combined sketch follows the list):
- Modern Python syntax & clean code principles
- Data manipulation with pandas
- File handling (JSON, CSV, etc.)
- API data extraction and cleaning
- Database integration with PostgreSQL / SQLAlchemy
- Building simple, modular ETL pipelines
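As a taste of how these topics combine, here is a minimal, hypothetical sketch of the extract-clean-save pattern the week builds toward. The file names and column names (`raw_orders.csv`, `order_id`, `customer_id`, `amount`) are made up for illustration and are not part of the repository:

```python
import pandas as pd

# Hypothetical input: a raw CSV with messy column names and missing values.
df = pd.read_csv("raw_orders.csv")

# Clean: normalize column names and drop rows missing an order id.
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
df = df.dropna(subset=["order_id"])

# Transform: a simple aggregation per customer.
summary = df.groupby("customer_id", as_index=False)["amount"].sum()

# Load: persist the cleaned dataset as JSON.
summary.to_json("orders_summary.json", orient="records", indent=2)
```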
| Day | Focus | Key Topics |
|---|---|---|
| Day 1 | Python Refresher | Variables, control flow, functions, exceptions, file I/O |
| Day 2 | Working with Files | JSON, CSV handling, file cleaning utilities |
| Day 3 | Pandas Foundations | DataFrames, cleaning, transformations, aggregations |
| Day 4 | APIs & Data Extraction | Fetching API data, normalizing JSON, saving clean datasets |
| Day 5 | Databases (SQLite) | Loading and querying structured data |
| Day 6 | Automation & Logging | Modular ETL pipeline, logging, error handling |
| Day 7 | PostgreSQL Integration | Real database connection, SQLAlchemy, ETL to PostgreSQL |
Clone the repository and set up a virtual environment:

```bash
git clone https://github.com/nenalukic/python-foundations-refresher.git
cd python-foundations-refresher
python3 -m venv venv
source venv/bin/activate
```

If you already have the `requirements.txt` file:

```bash
pip install -r requirements.txt
```

If not, generate it from your current environment:

```bash
pip freeze > requirements.txt
```

Core requirements:

- Python 3.11+
- pandas — data manipulation
- requests — API access
- SQLAlchemy — database ORM
- psycopg2-binary — PostgreSQL driver
- logging — tracking ETL execution (part of the Python standard library; no install needed)
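To see how SQLAlchemy, psycopg2-binary, and pandas fit together, here is a minimal sketch. The connection string, table name, and DataFrame are placeholders, not part of the repository:

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string: psycopg2 is used as the driver behind the scenes.
engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/etl_project")

# A toy DataFrame standing in for a cleaned dataset.
df = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})

# Write it to PostgreSQL; if_exists="replace" recreates the table on each run.
df.to_sql("demo_table", engine, if_exists="replace", index=False)
```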
Start and connect to PostgreSQL (macOS with Homebrew shown):

```bash
brew services start postgresql
psql postgres
```

Create a new database for ETL testing:

```sql
CREATE DATABASE etl_project;
```

Use two terminals for productivity:

- 🧩 Terminal 1: run `psql` to inspect or query the database
- 🧠 Terminal 2: run Python scripts inside the virtual environment
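From Terminal 2, a quick way to confirm that Python can reach the new database is a one-off connection check. This assumes a local PostgreSQL that accepts your OS user; adjust the credentials to your setup:

```python
import psycopg2

# Assumes a local server and trust/peer auth for your user; add user/password if needed.
conn = psycopg2.connect(dbname="etl_project", host="localhost")
with conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])
conn.close()
```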
- Each day lives in its own folder (`day-1`, `day-2`, …).
- Every folder includes examples, exercises, and short `README.md` notes.
- The pipeline evolves from file manipulation → API calls → databases → ETL automation; a minimal skeleton of that end state is sketched below.
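Here is a minimal, hypothetical skeleton of the modular ETL pattern with logging and error handling that the later days build toward. The function bodies are placeholders:

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
log = logging.getLogger("etl")

def extract() -> list[dict]:
    # Placeholder: fetch rows from a file, API, or database.
    return [{"id": 1, "value": "a"}]

def transform(rows: list[dict]) -> list[dict]:
    # Placeholder: clean and reshape the raw rows.
    return [r for r in rows if r["id"] is not None]

def load(rows: list[dict]) -> None:
    # Placeholder: write rows to their destination.
    log.info("Loaded %d rows", len(rows))

def run() -> None:
    try:
        load(transform(extract()))
        log.info("Pipeline finished successfully")
    except Exception:
        log.exception("Pipeline failed")
        raise

if __name__ == "__main__":
    run()
```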
By the end of the week, you should be able to:

- Confidently write and structure Python scripts
- Handle JSON, CSV, and API data
- Clean and transform datasets with pandas
- Load and query data in PostgreSQL
- Build reproducible, logged ETL pipelines
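As one concrete example of the API skills above, a minimal, hypothetical extraction might look like this. The URL is a placeholder; any JSON API returning a list of records would do:

```python
import pandas as pd
import requests

# Placeholder endpoint; swap in a real JSON API.
url = "https://example.com/api/items"

response = requests.get(url, timeout=10)
response.raise_for_status()

# Flatten nested JSON records into a tabular DataFrame and save it.
df = pd.json_normalize(response.json())
df.to_csv("items_clean.csv", index=False)
```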
**Nevenka Lukic**
Data Engineer — Python • ETL • Data Pipelines