Retail executives lack real-time, granular visibility into performance drivers - making it difficult to optimize inventory, allocate marketing spend, and retain high-value customers. Without an integrated analytics pipeline that transforms raw transactional data into actionable insights, decisions are reactive, inefficient, and often based on incomplete information.
This project solves that gap by delivering an end-to-end analytics engine that:
- Tracks revenue trends and growth rates by region and product category
- Identifies top-performing stores and products to guide resource allocation
- Segments customers using RFM and cohort analysis to enable targeted retention
- Measures performance against operational targets (e.g., sales KPIs)
- The result: a data-driven foundation for proactive optimization - turning daily transactions into strategic advantage.
This is an end-to-end analytics project demonstrating how raw transactional data can be transformed into actionable business insights using PostgreSQL, Python ETL, advanced SQL, and Power BI.
This project combines data engineering, analytical SQL, and business intelligence principles to answer key questions about revenue, products, stores, and customer behavior.
- Design a star schema optimized for analytical queries.
- Build an ETL pipeline to transform raw data into a structured data warehouse.
- Implement advanced SQL queries using CTEs, window functions, and aggregations.
- Visualize key metrics in Power BI to enable strategic decision-making.
retail-analytics-sql/
├── data/ # Raw synthetic CSV files generated by ETL scripts
├── docs/ # ER diagram & Power BI dashboard screenshot
├── etl/ # Python scripts for data generation and loading
├── sql/ # SQL schema and business-focused queries
├── requirements.txt # Python dependencies
└── README.md # Project documentation
The project uses a star schema for OLAP queries:
- Fact Table:
sales_fact- Metrics:
quantity,total_amount,profit - Foreign keys:
customer_id,store_id,product_id
- Metrics:
- Dimension Tables:
dim_customer: Customer demographics and signup infodim_store: Store details, region, and managerdim_product: Product attributes, category, and pricing
The schema is defined in sql/schema.sql and enables efficient analytical queries.
The ETL workflow is implemented in Python (etl/generate_data.py and etl/load_data.py):
- Extract: Read raw CSV data from
data/. - Transform:
- Load dimension tables directly.
- Calculate
profitper transaction:profit = total_amount - (quantity * unit_cost)
- Prepare
sales_factdata for loading.
- Load: Insert transformed data into PostgreSQL tables (
sales_factand dimension tables).
The sql/queries.sql file contains queries designed to answer real business questions:
- Sales Overview
- Monthly revenue, growth %, and running totals by region
- Example:
SELECT sale_month, SUM(total_amount) AS monthly_revenue, LAG(SUM(total_amount)) OVER (ORDER BY sale_month) AS prev_month_revenue FROM sales_fact GROUP BY sale_month;
- Product Analytics
- Top products by revenue, category rollups, store-level rankings
- Customer Analytics
- RFM segmentation: Recency, Frequency, Monetary value
- Cohort retention: M0, M1, M2 analysis
- Store & Regional Performance
- Revenue trends, Average Order Value (AOV), and KPI vs target
- KPI & Dashboard Metrics
- Profit margins, growth %, ranking, and regional performance
Techniques used: CTEs, window functions (LAG, RANK, NTILE), ROLLUP for subtotals, and safe division (NULLIF) for production-ready queries.
- Top-selling products and most profitable stores
- Monthly and YoY revenue growth
- Customer segmentation and high-value cohorts
- Sales vs target KPIs by region
- Cohort retention trends for lifecycle management
- PostgreSQL server running locally
- Python 3.8+
- Required libraries:
pandas,sqlalchemy,psycopg2-binary - Power BI Desktop (optional, for visualization)
- CREATE DATABASE retail_analytics;
- pip install -r requirements.txt
- cd etl
- python generate_data.py # Creates CSVs in /data
- python load_data.py # Loads data into PostgreSQL
- Use any PostgreSQL client (psql, pgAdmin, DBeaver) to execute queries in sql/queries.sql.
- Open the Power BI dashboard (docs/powerbi_dashboard.png) to see visual insights.
- Identify top-selling products and most profitable stores
- Track monthly revenue growth and regional performance
- Perform customer segmentation using RFM analysis
- Monitor sales vs target KPIs
- Understand cohort retention trends for customer lifecycle management
- Database: PostgreSQL
- ETL & Data Manipulation: Python, Pandas, SQLAlchemy
- Analytics: Advanced SQL (CTEs, Window Functions, ROLLUP)
- Visualization: Power BI
This project is licensed under the MIT License – see LICENSE for details. Free to use, modify, and learn from. Attribution appreciated.

