Designed and implemented an ETL pipeline processing 10k+ sales records using Pandas. Modeled a Star Schema database architecture to optimize analytical queries. Developed SQL queries for profit margin analysis and KPI reporting. Tech Stack: Python, SQL (SQLite), Pandas, SQLAlchemy, Matplotlib.

marccass/Retail-Data-Warehousing-Project-Python-SQL

End-to-End Retail Data Warehouse Pipeline

📌 Project Overview

This project simulates a real-world Business Intelligence environment. The goal is to transform raw sales data into a Data Warehouse optimized for business analytics.

I built a complete ETL (Extract, Transform, Load) pipeline using Python and modeled the data into a Star Schema architecture using SQL.

🛠 Tech Stack

  • Python: Pandas (Data Cleaning), SQLAlchemy (ORM).
  • SQL: SQLite, Window Functions, Joins, Aggregations.
  • Data Modeling: Star Schema (Fact Table & Dimension Tables).
  • Visualization: Matplotlib.

🏗 Architecture

Data is transformed from a flat file (.csv) into a relational model:

  • Fact Table: fact_vendes (Transactions).
  • Dimensions: dim_clients (Customers), dim_productes (Products), dim_llocs (Locations).
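A minimal sketch of how a flat file could be split into these four tables with Pandas and loaded into SQLite. Only the table names come from this project. The column names, the sample rows, the `build_dimension` helper, and the in-memory database are assumptions made for illustration and do not reflect the actual `scripts/etl_pipeline.py`.

```python
import sqlite3

import pandas as pd

# Hypothetical flat extract standing in for the raw .csv (all columns assumed).
flat = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_name": ["Ana", "Ben", "Ana", "Cy"],
    "product_name": ["Laptop", "Desk", "Phone", "Laptop"],
    "category": ["Technology", "Furniture", "Technology", "Technology"],
    "city": ["Madrid", "Paris", "Madrid", "Rome"],
    "sales": [1000.0, 400.0, 600.0, 1100.0],
    "profit": [180.0, 10.0, 100.0, 190.0],
})


def build_dimension(df, columns, key):
    """Return the unique combinations of `columns` with a surrogate key column."""
    dim = df[columns].drop_duplicates().reset_index(drop=True)
    dim.insert(0, key, dim.index + 1)  # surrogate keys start at 1
    return dim


dim_clients = build_dimension(flat, ["customer_name"], "client_id")
dim_productes = build_dimension(flat, ["product_name", "category"], "product_id")
dim_llocs = build_dimension(flat, ["city"], "lloc_id")

# The fact table keeps the measures plus one foreign key per dimension.
fact_vendes = (
    flat.merge(dim_clients, on="customer_name")
        .merge(dim_productes, on=["product_name", "category"])
        .merge(dim_llocs, on="city")
        [["order_id", "client_id", "product_id", "lloc_id", "sales", "profit"]]
)

# Load every table into an in-memory SQLite database (assumed target).
conn = sqlite3.connect(":memory:")
tables = {
    "dim_clients": dim_clients,
    "dim_productes": dim_productes,
    "dim_llocs": dim_llocs,
    "fact_vendes": fact_vendes,
}
for name, table in tables.items():
    table.to_sql(name, conn, index=False, if_exists="replace")
```

Using surrogate integer keys keeps the fact table narrow, so analytical queries join on small integers instead of repeated text values.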

📊 Business Insights (Examples)

SQL queries against the Star Schema showed that:

  1. The Technology category is the most profitable (17.4% margin).
  2. The Furniture category is underperforming badly (2.5% margin), which may point to high logistics costs.
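A margin figure per category can be computed with a join between the fact table and the product dimension. The sketch below is not the project's actual query: the column names (`product_id`, `category`, `sales`, `profit`) and the toy rows are assumptions, so its output does not reproduce the percentages above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy schema and rows; the column names are assumed for illustration.
conn.executescript("""
    CREATE TABLE dim_productes (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_vendes (product_id INTEGER, sales REAL, profit REAL);

    INSERT INTO dim_productes VALUES (1, 'Technology'), (2, 'Furniture');
    INSERT INTO fact_vendes VALUES (1, 200.0, 40.0), (1, 300.0, 50.0), (2, 400.0, 20.0);
""")

# Profit margin = total profit / total sales, expressed as a percentage.
query = """
    SELECT p.category,
           ROUND(100.0 * SUM(f.profit) / SUM(f.sales), 1) AS margin_pct
    FROM fact_vendes AS f
    JOIN dim_productes AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY margin_pct DESC
"""
rows = conn.execute(query).fetchall()

for category, margin_pct in rows:
    print(f"{category}: {margin_pct}%")
```

Aggregating profit and sales before dividing gives the margin of the category as a whole, rather than an average of per-order margins.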

🚀 How to Run

  1. Install dependencies: pip install -r requirements.txt
  2. Run ETL pipeline: python scripts/etl_pipeline.py
  3. Execute analysis: python scripts/executar_sql.py
