
🐍 Python Foundations Refresher — Data Engineering Focus

📘 Overview

This repository is a one-week structured Python refresher designed for data engineers. It revisits the key Python concepts and tools used in real-world data workflows — from scripting and automation to databases and ETL pipelines.

Each day builds on the previous one, moving from Python basics to practical data engineering patterns.


🎯 Goal

To rebuild strong Python foundations for data engineering by working through:

  • Modern Python syntax & clean code principles
  • Data manipulation with pandas
  • File handling (JSON, CSV, etc.)
  • API data extraction and cleaning
  • Database integration with PostgreSQL / SQLAlchemy
  • Building simple, modular ETL pipelines
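As a taste of the "modern syntax & clean code" theme, here is a small sketch using type hints, a dataclass, and `pathlib` instead of bare dicts and string paths. The names (`Record`, `load_lines`) are illustrative, not code from this repo:

```python
# Illustrative Day-1-style snippet: dataclasses, type hints, pathlib.
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Record:
    user_id: int
    email: str

    def domain(self) -> str:
        """Return the domain part of the email address."""
        return self.email.rsplit("@", 1)[-1]


def load_lines(path: Path) -> list[str]:
    """Read non-empty, stripped lines from a text file."""
    return [line.strip() for line in path.read_text().splitlines() if line.strip()]


rec = Record(user_id=1, email="ana@example.com")
print(rec.domain())  # example.com
```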

🗓️ Weekly Breakdown

| Day | Focus | Key Topics |
|-----|-------|------------|
| Day 1 | Python Refresher | Variables, control flow, functions, exceptions, file I/O |
| Day 2 | Working with Files | JSON, CSV handling, file-cleaning utilities |
| Day 3 | Pandas Foundations | DataFrames, cleaning, transformations, aggregations |
| Day 4 | APIs & Data Extraction | Fetching API data, normalizing JSON, saving clean datasets |
| Day 5 | Databases (SQLite) | Loading and querying structured data |
| Day 6 | Automation & Logging | Modular ETL pipeline, logging, error handling |
| Day 7 | PostgreSQL Integration | Real database connection, SQLAlchemy, ETL to PostgreSQL |

⚙️ Setup Instructions

1️⃣ Clone the Repository

```bash
git clone https://github.com/nenalukic/python-foundations-refresher.git
cd python-foundations-refresher
```

2️⃣ Create and Activate a Virtual Environment

```bash
python3 -m venv venv
source venv/bin/activate
```

3️⃣ Install Dependencies

If you already have the requirements.txt file:

```bash
pip install -r requirements.txt
```

If not, generate it from your current environment:

```bash
pip freeze > requirements.txt
```
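Alternatively, a hand-written requirements.txt covering the libraries used in this refresher could start as simply as the list below (unpinned here; pin versions to taste):

```text
pandas
requests
SQLAlchemy
psycopg2-binary
```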

🧰 Key Tools & Libraries

  • Python 3.11+
  • pandas — data manipulation
  • requests — API access
  • SQLAlchemy — database ORM
  • psycopg2-binary — PostgreSQL driver
  • logging — tracking ETL execution (standard library, no install needed)
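The Day 6 logging setup isn't shown here, but a minimal configuration for ETL scripts typically looks like this (logger name, format string, and `run_step` are illustrative, not the repo's exact code):

```python
# Minimal ETL logging setup -- illustrative, not the repo's actual config.
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("etl")


def run_step(name: str) -> None:
    """Log the start and end of a pipeline step, surfacing failures."""
    logger.info("starting step: %s", name)
    try:
        pass  # ... real extract/transform/load work would go here ...
    except Exception:
        logger.exception("step failed: %s", name)
        raise
    logger.info("finished step: %s", name)


run_step("extract")
```

Using `logger.exception` inside the `except` block records the full traceback before re-raising, so failed runs leave a diagnosable trail.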

🗄️ PostgreSQL (Mac Setup)

Start and connect to PostgreSQL:

```bash
brew services start postgresql   # or postgresql@<version>, matching your installed formula
psql postgres
```

Create a new database for ETL testing:

```sql
CREATE DATABASE etl_project;
```

Use two terminals for productivity:

  • 🧩 Terminal 1: run psql to inspect or query the database
  • 🧠 Terminal 2: run Python scripts inside the virtual environment
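From the Python side, the Day 7 scripts would reach this database through SQLAlchemy. A sketch of building the connection URL — the credentials here are placeholders, not values from this repo:

```python
# Build a SQLAlchemy connection URL for the etl_project database.
# User, password, host, and port are placeholders for your local setup.

def postgres_url(user: str, password: str, db: str,
                 host: str = "localhost", port: int = 5432) -> str:
    """Return a psycopg2-backed SQLAlchemy connection URL."""
    return f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{db}"


url = postgres_url("postgres", "secret", "etl_project")
print(url)  # postgresql+psycopg2://postgres:secret@localhost:5432/etl_project

# With SQLAlchemy installed and the server running, you would then do:
#   from sqlalchemy import create_engine
#   engine = create_engine(url)
#   df.to_sql("my_table", engine, if_exists="replace", index=False)
```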

🧩 Project Workflow

  1. Each day lives in its own folder (day-1, day-2, …).
  2. Every folder includes examples, exercises, and short README.md notes.
  3. The pipeline evolves — from file manipulation → API calls → databases → ETL automation.
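To make that trajectory concrete, here is a toy end-to-end pass in the spirit of the later days — extract raw records, transform them, and load them into SQLite. The table name, fields, and sample data are all illustrative, not taken from the repo:

```python
# Toy ETL: extract -> transform -> load, standard library only.
import json
import sqlite3

# Extract: pretend this JSON came from a file or an API response.
raw = json.loads('[{"name": " Ana ", "score": "10"}, {"name": "Bo", "score": "7"}]')

# Transform: strip whitespace and cast types.
rows = [(r["name"].strip(), int(r["score"])) for r in raw]

# Load: write into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)", rows)

total = conn.execute("SELECT SUM(score) FROM scores").fetchone()[0]
print(total)  # 17
conn.close()
```

Swapping `:memory:` for a file path (Day 5) or a PostgreSQL engine (Day 7) changes only the load step — the extract/transform logic stays the same.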

🧾 Key Learning Outcomes

  • Confidently write and structure Python scripts
  • Handle JSON, CSV, and API data
  • Clean and transform datasets with pandas
  • Load and query data in PostgreSQL
  • Build reproducible, logged ETL pipelines

👩‍💻 Author

Nevenka Lukic
Data Engineer — Python • ETL • Data Pipelines

