This project showcases a complete end-to-end SQL-based data analytics workflow using PostgreSQL. It involves designing a data warehouse schema, loading real-world retail datasets, and conducting detailed data exploration, business metric calculations, and reporting using SQL views.
💡 Inspiration: This project was built by following the YouTube tutorial by Data with Baraa. Full credit for the project idea and flow goes to him.
- Schema Design:
goldschema withdim_customers,dim_products, andfact_salestables. - Data Loading: CSV files loaded using
COPY. - Analysis Techniques Used:
- Exploratory Data Analysis (EDA)
- Aggregation & Window Functions
- Time Series & Trend Analysis
- Ranking & Segmentation
- Customer & Product Reporting
- Created a PostgreSQL schema and 3 core tables.
- Loaded datasets with customer, product, and sales information.
- Identified unique countries, categories, and subcategories.
- Investigated time range of orders and customer age distribution.
- Calculated total sales, quantity sold, average selling price.
- Counted orders, products, and unique customers.
- Customer count by country and gender.
- Product distribution and category-wise revenue.
- Customer-wise total sales and order behavior.
- Top 5 and bottom 5 revenue-generating products.
- Top 10 highest-spending customers.
- Sales performance by year and by month.
- Cumulative sales trends using
SUM() OVER()with partitioning.
- Compared each product's sales to its average and previous year.
- Tagged products as
Above Average,Below Average, orNo Change.
- Category-wise sales contribution to total revenue with percentage breakdown.
- Grouped products into cost ranges.
- Segmented customers into:
VIP(12+ months, >5000 sales),Regular(12+ months, ≤5000 sales),New(<12 months).
- Combines customer details with:
- Age and age group
- Total orders, sales, and product count
- Lifespan and recency
- Average order value and monthly spend
- Customer segment (VIP, Regular, New)
- Summarizes product-level metrics:
- Sales, orders, customer reach
- Cost-based performance tags (High, Mid, Low)
- Recency of sales
- Average revenue per order and per month
- Database: PostgreSQL
- Data Format: CSV
- Language: SQL
- Platform: Local PostgreSQL setup
- Create the
goldschema and load the provided CSV files using theCOPYcommand. - Run the SQL scripts in order: Database setup → EDA → Analysis → Reporting views.
- Query the views
gold.report_customersandgold.report_productsto access business insights.
- Idea & Structure: Inspired by Data with Baraa
- Data Source: Sample retail datasets in CSV format
This project is a practical demonstration of how SQL can be used not just for querying but for deep data exploration and reporting. The entire analysis was done using pure SQL, without any BI or visualization tools.