This project simulates a real-world Business Intelligence environment. The goal is to transform raw sales data into a Data Warehouse optimized for business analytics.
I built a complete ETL (Extract, Transform, Load) pipeline in Python and modeled the data as a Star Schema in SQL.
- Python: Pandas (Data Cleaning), SQLAlchemy (ORM).
- SQL: SQLite, Window Functions, Joins, Aggregations.
- Data Modeling: Star Schema (Fact Table & Dimension Tables).
- Visualization: Matplotlib.
Data is transformed from a flat file (.csv) into a relational model:
- Fact Table: `fact_vendes` (Transactions).
- Dimensions: `dim_clients` (Customers), `dim_productes` (Products), `dim_llocs` (Locations).
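
As a rough illustration of the flat-file-to-star-schema step, here is a minimal sketch. It uses only the standard library (`csv`, `sqlite3`) so it runs anywhere, whereas the project itself relies on Pandas and SQLAlchemy. The table names come from this README; the sample rows, column names, and the `build_star_schema` helper are assumptions made for the example, not the project's actual schema.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file sample; the real CSV columns may differ.
RAW_CSV = """order_id,customer,product,category,city,sales,profit
1,Alice,Laptop,Technology,Barcelona,1000.0,180.0
2,Bob,Desk,Furniture,Girona,400.0,10.0
3,Alice,Desk,Furniture,Barcelona,400.0,10.0
"""


def build_star_schema(conn, rows):
    """Split flat rows into three dimension tables and one fact table."""
    conn.executescript("""
        CREATE TABLE dim_clients (
            client_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE dim_productes (
            product_id INTEGER PRIMARY KEY, name TEXT UNIQUE, category TEXT);
        CREATE TABLE dim_llocs (
            lloc_id INTEGER PRIMARY KEY, city TEXT UNIQUE);
        CREATE TABLE fact_vendes (
            order_id INTEGER PRIMARY KEY,
            client_id INTEGER REFERENCES dim_clients(client_id),
            product_id INTEGER REFERENCES dim_productes(product_id),
            lloc_id INTEGER REFERENCES dim_llocs(lloc_id),
            sales REAL,
            profit REAL);
    """)
    for r in rows:
        # Each dimension stores a distinct value once; repeats are ignored.
        conn.execute("INSERT OR IGNORE INTO dim_clients (name) VALUES (?)",
                     (r["customer"],))
        conn.execute("INSERT OR IGNORE INTO dim_productes (name, category) "
                     "VALUES (?, ?)", (r["product"], r["category"]))
        conn.execute("INSERT OR IGNORE INTO dim_llocs (city) VALUES (?)",
                     (r["city"],))
        # The fact row keeps only surrogate keys plus the numeric measures.
        conn.execute(
            """INSERT INTO fact_vendes
               SELECT ?, c.client_id, p.product_id, l.lloc_id, ?, ?
               FROM dim_clients c, dim_productes p, dim_llocs l
               WHERE c.name = ? AND p.name = ? AND l.city = ?""",
            (int(r["order_id"]), float(r["sales"]), float(r["profit"]),
             r["customer"], r["product"], r["city"]),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
build_star_schema(conn, list(csv.DictReader(io.StringIO(RAW_CSV))))
```

Descriptive attributes live in the dimensions and the fact table holds only keys and measures, which keeps analytical queries down to simple joins and aggregations.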
The SQL analysis (joins, aggregations, and window functions) revealed that:
- The Technology category is the most profitable (17.4% margin).
- The Furniture category is critically underperforming (only 2.5% margin), suggesting potential issues with logistics costs.
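
A margin-by-category query of the kind behind these findings might look like the sketch below. It assumes margin means total profit divided by total sales, and that `fact_vendes` has `sales` and `profit` columns and `dim_productes` has a `category` column. Those column names and the toy figures are invented for illustration, so the output does not reproduce the percentages above. Window functions require SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy data with assumed column names; not the project's real figures.
conn.executescript("""
    CREATE TABLE dim_productes (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_vendes (product_id INTEGER, sales REAL, profit REAL);
    INSERT INTO dim_productes VALUES (1, 'Technology'), (2, 'Furniture');
    INSERT INTO fact_vendes VALUES
        (1, 1000.0, 200.0), (1, 500.0, 100.0), (2, 800.0, 40.0);
""")

# Join the fact table to its dimension, aggregate per category,
# then rank the categories by margin with a window function.
MARGIN_QUERY = """
    SELECT p.category,
           ROUND(100.0 * SUM(f.profit) / SUM(f.sales), 1) AS margin_pct,
           RANK() OVER (
               ORDER BY SUM(f.profit) / SUM(f.sales) DESC
           ) AS margin_rank
    FROM fact_vendes AS f
    JOIN dim_productes AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY margin_rank
"""

results = conn.execute(MARGIN_QUERY).fetchall()
for category, margin_pct, margin_rank in results:
    print(f"{margin_rank}. {category}: {margin_pct}% margin")
```

Computing the margin from summed profit and summed sales, rather than averaging per-row margins, weights each transaction by its size.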
- Install dependencies: `pip install -r requirements.txt`
- Run ETL pipeline: `python scripts/etl_pipeline.py`
- Execute analysis: `python scripts/executar_sql.py`