
🐍 Python Foundations Refresher — Data Engineering Focus

📘 Overview

This repository is a one-week structured Python refresher designed for data engineers. It revisits the key Python concepts and tools used in real-world data workflows — from scripting and automation to databases and ETL pipelines.

Each day builds on the previous one, moving from Python basics to practical data engineering patterns.


🎯 Goal

To rebuild strong Python foundations for data engineering by working through:

  • Modern Python syntax & clean code principles
  • Data manipulation with pandas
  • File handling (JSON, CSV, etc.)
  • API data extraction and cleaning
  • Database integration with PostgreSQL / SQLAlchemy
  • Building simple, modular ETL pipelines
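As a taste of the "modern syntax & clean code" theme, here is a small sketch using type hints, a dataclass, and `pathlib` instead of bare dicts and string paths. The names (`Record`, `load_lines`) are illustrative, not code from this repo:

```python
# Illustrative Day-1-style snippet: dataclasses, type hints, pathlib.
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Record:
    user_id: int
    email: str

    def domain(self) -> str:
        """Return the domain part of the email address."""
        return self.email.rsplit("@", 1)[-1]


def load_lines(path: Path) -> list[str]:
    """Read non-empty, stripped lines from a text file."""
    return [line.strip() for line in path.read_text().splitlines() if line.strip()]


rec = Record(user_id=1, email="ana@example.com")
print(rec.domain())  # example.com
```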

🗓️ Weekly Breakdown

| Day | Focus | Key Topics |
|-----|-------|------------|
| Day 1 | Python Refresher | Variables, control flow, functions, exceptions, file I/O |
| Day 2 | Working with Files | JSON, CSV handling, file-cleaning utilities |
| Day 3 | Pandas Foundations | DataFrames, cleaning, transformations, aggregations |
| Day 4 | APIs & Data Extraction | Fetching API data, normalizing JSON, saving clean datasets |
| Day 5 | Databases (SQLite) | Loading and querying structured data |
| Day 6 | Automation & Logging | Modular ETL pipeline, logging, error handling |
| Day 7 | PostgreSQL Integration | Real database connection, SQLAlchemy, ETL to PostgreSQL |

⚙️ Setup Instructions

1️⃣ Clone the Repository

```bash
git clone https://github.com/nenalukic/python-foundations-refresher.git
cd python-foundations-refresher
```

2️⃣ Create and Activate a Virtual Environment

```bash
python3 -m venv venv
source venv/bin/activate
```

3️⃣ Install Dependencies

If you already have the requirements.txt file:

```bash
pip install -r requirements.txt
```

If not, generate it from your current environment:

```bash
pip freeze > requirements.txt
```
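Alternatively, a hand-written requirements.txt covering the libraries used in this refresher could start as simply as the list below (unpinned here; pin versions to taste):

```text
pandas
requests
SQLAlchemy
psycopg2-binary
```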

🧰 Key Tools & Libraries

  • Python 3.11+
  • pandas — data manipulation
  • requests — API access
  • SQLAlchemy — database ORM
  • psycopg2-binary — PostgreSQL driver
  • logging — tracking ETL execution (standard library, no install needed)
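The Day 6 logging setup isn't shown here, but a minimal configuration for ETL scripts typically looks like this (logger name, format string, and `run_step` are illustrative, not the repo's exact code):

```python
# Minimal ETL logging setup -- illustrative, not the repo's actual config.
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("etl")


def run_step(name: str) -> None:
    """Log the start and end of a pipeline step, surfacing failures."""
    logger.info("starting step: %s", name)
    try:
        pass  # ... real extract/transform/load work would go here ...
    except Exception:
        logger.exception("step failed: %s", name)
        raise
    logger.info("finished step: %s", name)


run_step("extract")
```

Using `logger.exception` inside the `except` block records the full traceback before re-raising, so failed runs leave a diagnosable trail.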

🗄️ PostgreSQL (Mac Setup)

Start and connect to PostgreSQL:

```bash
brew services start postgresql   # or postgresql@<version>, matching your installed formula
psql postgres
```

Create a new database for ETL testing:

```sql
CREATE DATABASE etl_project;
```

Use two terminals for productivity:

  • 🧩 Terminal 1: run psql to inspect or query the database
  • 🧠 Terminal 2: run Python scripts inside the virtual environment
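From the Python side, the Day 7 scripts would reach this database through SQLAlchemy. A sketch of building the connection URL — the credentials here are placeholders, not values from this repo:

```python
# Build a SQLAlchemy connection URL for the etl_project database.
# User, password, host, and port are placeholders for your local setup.

def postgres_url(user: str, password: str, db: str,
                 host: str = "localhost", port: int = 5432) -> str:
    """Return a psycopg2-backed SQLAlchemy connection URL."""
    return f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{db}"


url = postgres_url("postgres", "secret", "etl_project")
print(url)  # postgresql+psycopg2://postgres:secret@localhost:5432/etl_project

# With SQLAlchemy installed and the server running, you would then do:
#   from sqlalchemy import create_engine
#   engine = create_engine(url)
#   df.to_sql("my_table", engine, if_exists="replace", index=False)
```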

🧩 Project Workflow

  1. Each day lives in its own folder (day-1, day-2, …).
  2. Every folder includes examples, exercises, and short README.md notes.
  3. The pipeline evolves — from file manipulation → API calls → databases → ETL automation.
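To make that trajectory concrete, here is a toy end-to-end pass in the spirit of the later days — extract raw records, transform them, and load them into SQLite. The table name, fields, and sample data are all illustrative, not taken from the repo:

```python
# Toy ETL: extract -> transform -> load, standard library only.
import json
import sqlite3

# Extract: pretend this JSON came from a file or an API response.
raw = json.loads('[{"name": " Ana ", "score": "10"}, {"name": "Bo", "score": "7"}]')

# Transform: strip whitespace and cast types.
rows = [(r["name"].strip(), int(r["score"])) for r in raw]

# Load: write into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)", rows)

total = conn.execute("SELECT SUM(score) FROM scores").fetchone()[0]
print(total)  # 17
conn.close()
```

Swapping `:memory:` for a file path (Day 5) or a PostgreSQL engine (Day 7) changes only the load step — the extract/transform logic stays the same.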

🧾 Key Learning Outcomes

  • Confidently write and structure Python scripts
  • Handle JSON, CSV, and API data
  • Clean and transform datasets with pandas
  • Load and query data in PostgreSQL
  • Build reproducible, logged ETL pipelines

👩‍💻 Author

Nevenka Lukic
Data Engineer — Python • ETL • Data Pipelines

