# Sales Data Analysis - Key KPIs Dashboard

This notebook analyzes sales data to identify key performance indicators (KPIs) and business insights.

## Dataset Overview
- **Source**: Sales transaction data
- **Time Period**: January - March 2019
- **Locations**: Yangon, Naypyitaw, Mandalay (Branches A, B, C)
- **Product Categories**: Health & Beauty, Electronics, Home & Lifestyle, Food & Beverages, Fashion

## Key Metrics We'll Analyze
1. Total Revenue and Sales Volume
2. Branch and City Performance
3. Product Line Analysis
4. Customer Segmentation
5. Payment Method Preferences
6. Temporal Sales Patterns
7. Customer Satisfaction Metrics
8. Profitability Analysis
9. Executive Summary and Benchmarks

## Step 1: Data Ingestion and Setup

First, register the CSV file as a table so the rest of the notebook can query it with SQL.

In [None]:
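-- Register the sales CSV as a table.
-- CREATE OR REPLACE is only supported for Delta (v2) tables, not CSV-backed ones,
-- so drop and recreate instead.
DROP TABLE IF EXISTS sales_data;

CREATE TABLE sales_data (
  Invoice_ID STRING,
  Branch STRING,
  City STRING,
  Customer_Type STRING,
  Gender STRING,
  Product_Line STRING,
  Unit_Price DECIMAL(10,2),
  Quantity INT,
  Tax_5_Percent DECIMAL(10,4),
  Total DECIMAL(10,4),
  Date DATE,
  Time STRING,               -- kept as text (e.g. '13:08'); Spark SQL has traditionally had no TIME type
  Payment STRING,
  COGS DECIMAL(10,2),
  Gross_Margin_Percentage DECIMAL(12,9),
  Gross_Income DECIMAL(10,4),
  Rating DECIMAL(3,1)        -- DECIMAL(2,1) tops out at 9.9 and cannot hold a 10.0 rating
)
USING CSV
-- The explicit column list defines the schema, so inferSchema is not needed.
-- If the file stores dates like 1/5/2019, also add the option dateFormat 'M/d/yyyy'.
OPTIONS (
  path '/FileStore/shared_uploads/sales-less-record.csv',
  header 'true'
)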

In [None]:
-- Data Quality Check: Overview of our dataset
SELECT 
  COUNT(*) as total_records,
  COUNT(DISTINCT Invoice_ID) as unique_invoices,
  COUNT(DISTINCT Branch) as branches,
  COUNT(DISTINCT City) as cities,
  COUNT(DISTINCT Product_Line) as product_lines,
  MIN(Date) as earliest_date,
  MAX(Date) as latest_date
FROM sales_data
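
The same overview can be reproduced outside Spark to spot-check a small extract. A minimal Python sketch, assuming rows are already loaded as dictionaries keyed by the column names in the table definition (the sample rows are hypothetical); a gap between `total_records` and `unique_invoices` points to repeated invoice IDs:

```python
from datetime import date

def quality_overview(rows):
    """Mirror the SQL data-quality query on a list of row dicts."""
    # Note: unlike COUNT(DISTINCT ...), a Python set would also count None as a value.
    dates = [r["Date"] for r in rows]
    return {
        "total_records": len(rows),
        "unique_invoices": len({r["Invoice_ID"] for r in rows}),
        "branches": len({r["Branch"] for r in rows}),
        "cities": len({r["City"] for r in rows}),
        "product_lines": len({r["Product_Line"] for r in rows}),
        "earliest_date": min(dates),
        "latest_date": max(dates),
    }

# Hypothetical sample rows; the last two share an Invoice_ID on purpose.
sample = [
    {"Invoice_ID": "INV-1", "Branch": "A", "City": "Yangon",
     "Product_Line": "Fashion", "Date": date(2019, 1, 5)},
    {"Invoice_ID": "INV-2", "Branch": "B", "City": "Mandalay",
     "Product_Line": "Electronics", "Date": date(2019, 3, 8)},
    {"Invoice_ID": "INV-2", "Branch": "B", "City": "Mandalay",
     "Product_Line": "Electronics", "Date": date(2019, 3, 8)},
]

overview = quality_overview(sample)
# Any difference between these two counts signals duplicate invoice rows.
duplicates = overview["total_records"] - overview["unique_invoices"]
print(overview, duplicates)
```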

## KPI 1: Overall Business Performance

Let's start with the most critical business metrics.

In [None]:
-- Overall Business KPIs
SELECT 
  ROUND(SUM(Total), 2) as total_revenue,
  ROUND(SUM(Gross_Income), 2) as total_gross_income,
  ROUND(SUM(COGS), 2) as total_cogs,
  COUNT(*) as total_transactions,
  SUM(Quantity) as total_items_sold,
  ROUND(AVG(Total), 2) as avg_transaction_value,
  ROUND(AVG(Rating), 2) as avg_customer_rating,
  ROUND(AVG(Gross_Margin_Percentage), 2) as avg_gross_margin_pct
FROM sales_data
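
A quick arithmetic check helps interpret `avg_gross_margin_pct`. The sketch below assumes a layout common to this kind of supermarket sales file, which you should verify against your own data: `COGS = Unit_Price * Quantity`, the 5% tax is added on top to give `Total`, and `Gross_Income` equals that tax. Under those assumptions every row has the same margin, 0.05 / 1.05, or about 4.76%.

```python
from decimal import Decimal

def line_item(unit_price, quantity, tax_rate=Decimal("0.05")):
    """Derive the money columns for one row under the assumed layout."""
    cogs = unit_price * quantity        # Unit_Price * Quantity
    tax = cogs * tax_rate               # Tax_5_Percent
    total = cogs + tax                  # Total = COGS + tax
    gross_income = tax                  # assumption: Gross_Income equals the tax
    margin_pct = gross_income / total * 100
    return cogs, tax, total, margin_pct

# Hypothetical row: 7 items at 74.69 each.
cogs, tax, total, margin_pct = line_item(Decimal("74.69"), 7)
print(cogs, tax, total, round(margin_pct, 4))
```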

## KPI 2: Branch and Location Performance

Understanding which locations drive the most revenue and profit.

In [None]:
-- Performance by Branch and City
SELECT 
  Branch,
  City,
  COUNT(*) as transactions,
  ROUND(SUM(Total), 2) as total_revenue,
  ROUND(SUM(Gross_Income), 2) as total_profit,
  ROUND(AVG(Total), 2) as avg_transaction_value,
  ROUND(AVG(Rating), 2) as avg_rating,
  SUM(Quantity) as total_items_sold
FROM sales_data
GROUP BY Branch, City
ORDER BY total_revenue DESC

In [None]:
-- Revenue contribution by location
SELECT 
  City,
  ROUND(SUM(Total), 2) as revenue,
  ROUND(SUM(Total) * 100.0 / (SELECT SUM(Total) FROM sales_data), 2) as revenue_percentage
FROM sales_data
GROUP BY City
ORDER BY revenue DESC
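
The scalar subquery divides each city's revenue by the grand total. In Spark SQL the same denominator can also be written as a window over the aggregate, `SUM(SUM(Total)) OVER ()`. A minimal Python sketch of the share calculation on hypothetical city totals (not results from the dataset); note that after rounding to two decimals the shares may not add up to exactly 100:

```python
def revenue_share(revenue_by_key):
    """Percentage of total revenue per key, rounded to 2 decimals like the SQL."""
    grand_total = sum(revenue_by_key.values())
    return {k: round(v * 100.0 / grand_total, 2) for k, v in revenue_by_key.items()}

# Hypothetical city totals.
shares = revenue_share({"Yangon": 500.0, "Mandalay": 300.0, "Naypyitaw": 200.0})
print(shares)  # {'Yangon': 50.0, 'Mandalay': 30.0, 'Naypyitaw': 20.0}
```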

## KPI 3: Product Line Analysis

Identifying top-performing product categories and their profitability.

In [None]:
-- Product Line Performance
SELECT 
  Product_Line,
  COUNT(*) as transactions,
  ROUND(SUM(Total), 2) as total_revenue,
  ROUND(SUM(Gross_Income), 2) as total_profit,
  ROUND(AVG(Total), 2) as avg_transaction_value,
  ROUND(AVG(Unit_Price), 2) as avg_unit_price,
  ROUND(AVG(Rating), 2) as avg_rating,
  SUM(Quantity) as total_quantity_sold,
  ROUND(SUM(Total) * 100.0 / (SELECT SUM(Total) FROM sales_data), 2) as revenue_share_pct
FROM sales_data
GROUP BY Product_Line
ORDER BY total_revenue DESC

In [None]:
-- Product Line Performance by City
SELECT 
  City,
  Product_Line,
  COUNT(*) as transactions,
  ROUND(SUM(Total), 2) as revenue,
  ROUND(AVG(Rating), 2) as avg_rating
FROM sales_data
GROUP BY City, Product_Line
ORDER BY City, revenue DESC
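
To keep only the leading product line in each city, the usual SQL pattern is `ROW_NUMBER() OVER (PARTITION BY City ORDER BY revenue DESC)` filtered to rank 1. A Python sketch of the same idea on hypothetical aggregated rows of the form (City, Product_Line, revenue):

```python
from itertools import groupby
from operator import itemgetter

def top_per_city(rows):
    """Keep the highest-revenue row per city, like filtering ROW_NUMBER() = 1."""
    # Sort by city, then revenue descending (PARTITION BY City ORDER BY revenue DESC).
    # sorted() is stable, so ties keep their input order; SQL gives no such guarantee.
    ordered = sorted(rows, key=lambda r: (r[0], -r[2]))
    return {city: next(group) for city, group in groupby(ordered, key=itemgetter(0))}

# Hypothetical aggregated rows: (City, Product_Line, revenue).
rows = [
    ("Yangon", "Fashion", 120.0),
    ("Yangon", "Electronics", 340.0),
    ("Mandalay", "Health & Beauty", 210.0),
    ("Mandalay", "Food & Beverages", 95.5),
]
top = top_per_city(rows)
print(top)
```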

## KPI 4: Customer Segmentation Analysis

Understanding customer behavior patterns and member vs normal customer performance.

In [None]:
-- Customer Type Analysis
SELECT 
  Customer_Type,
  COUNT(*) as transactions,
  ROUND(SUM(Total), 2) as total_revenue,
  ROUND(AVG(Total), 2) as avg_transaction_value,
  ROUND(AVG(Rating), 2) as avg_rating,
  ROUND(SUM(Total) * 100.0 / (SELECT SUM(Total) FROM sales_data), 2) as revenue_share_pct
FROM sales_data
GROUP BY Customer_Type
ORDER BY total_revenue DESC

In [None]:
-- Gender-based Analysis
SELECT 
  Gender,
  Customer_Type,
  COUNT(*) as transactions,
  ROUND(SUM(Total), 2) as total_revenue,
  ROUND(AVG(Total), 2) as avg_transaction_value,
  ROUND(AVG(Rating), 2) as avg_rating
FROM sales_data
GROUP BY Gender, Customer_Type
ORDER BY total_revenue DESC

In [None]:
-- Product preference by customer segment
SELECT 
  Customer_Type,
  Product_Line,
  COUNT(*) as purchases,
  ROUND(SUM(Total), 2) as revenue,
  ROUND(AVG(Rating), 2) as avg_rating
FROM sales_data
GROUP BY Customer_Type, Product_Line
ORDER BY Customer_Type, revenue DESC

## KPI 5: Payment Method Preferences

Understanding customer payment behavior and its impact on business.

In [None]:
-- Payment Method Analysis
SELECT 
  Payment,
  COUNT(*) as transactions,
  ROUND(SUM(Total), 2) as total_revenue,
  ROUND(AVG(Total), 2) as avg_transaction_value,
  ROUND(AVG(Rating), 2) as avg_rating,
  ROUND(COUNT(*) * 100.0 / (SELECT COUNT(*) FROM sales_data), 2) as usage_percentage
FROM sales_data
GROUP BY Payment
ORDER BY transactions DESC

In [None]:
-- Payment preference by customer type and location
SELECT 
  City,
  Customer_Type,
  Payment,
  COUNT(*) as transactions,
  ROUND(SUM(Total), 2) as revenue
FROM sales_data
GROUP BY City, Customer_Type, Payment
ORDER BY City, Customer_Type, transactions DESC

## KPI 6: Time-Based Analysis

Understanding sales patterns over time and identifying peak periods.

In [None]:
-- Monthly Sales Trends
SELECT 
  DATE_FORMAT(Date, 'yyyy-MM') as month,
  COUNT(*) as transactions,
  ROUND(SUM(Total), 2) as total_revenue,
  ROUND(SUM(Gross_Income), 2) as total_profit,
  ROUND(AVG(Total), 2) as avg_transaction_value,
  ROUND(AVG(Rating), 2) as avg_rating
FROM sales_data
GROUP BY DATE_FORMAT(Date, 'yyyy-MM')
ORDER BY month
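
To turn monthly totals into growth rates, Spark SQL's `LAG` window function can fetch the previous month's revenue, for example `LAG(total_revenue) OVER (ORDER BY month)` applied to this query wrapped in a CTE. A Python sketch of the same calculation on hypothetical figures; with only three months of data, treat any trend as indicative at best:

```python
def month_over_month(monthly):
    """Percent change vs. the previous month, like LAG(total_revenue) OVER (ORDER BY month)."""
    months = sorted(monthly)            # 'yyyy-MM' strings sort chronologically
    growth = {months[0]: None}          # first month has no predecessor (LAG returns NULL)
    for prev, cur in zip(months, months[1:]):
        growth[cur] = round((monthly[cur] - monthly[prev]) * 100.0 / monthly[prev], 2)
    return growth

# Hypothetical monthly revenue keyed by the same 'yyyy-MM' strings the query produces.
growth = month_over_month({"2019-01": 1000.0, "2019-02": 900.0, "2019-03": 1080.0})
print(growth)  # {'2019-01': None, '2019-02': -10.0, '2019-03': 20.0}
```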

In [None]:
-- Daily Sales Pattern (Day of Week)
SELECT 
  DAYOFWEEK(Date) as day_of_week,
  CASE 
    WHEN DAYOFWEEK(Date) = 1 THEN 'Sunday'
    WHEN DAYOFWEEK(Date) = 2 THEN 'Monday'
    WHEN DAYOFWEEK(Date) = 3 THEN 'Tuesday'
    WHEN DAYOFWEEK(Date) = 4 THEN 'Wednesday'
    WHEN DAYOFWEEK(Date) = 5 THEN 'Thursday'
    WHEN DAYOFWEEK(Date) = 6 THEN 'Friday'
    WHEN DAYOFWEEK(Date) = 7 THEN 'Saturday'
  END as day_name,
  COUNT(*) as transactions,
  ROUND(SUM(Total), 2) as total_revenue,
  ROUND(AVG(Total), 2) as avg_transaction_value
FROM sales_data
GROUP BY DAYOFWEEK(Date)
ORDER BY day_of_week
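
Spark's `DAYOFWEEK` numbers days from 1 = Sunday to 7 = Saturday, which is why the `CASE` starts at Sunday. Python's `date.isoweekday()` uses 1 = Monday to 7 = Sunday, so cross-checking these results in Python needs a shift. A minimal sketch:

```python
from datetime import date

def spark_dayofweek(d: date) -> int:
    """Map ISO weekday (Mon=1..Sun=7) to Spark's DAYOFWEEK numbering (Sun=1..Sat=7)."""
    return d.isoweekday() % 7 + 1

# 5, 6 and 7 January 2019 fell on a Saturday, Sunday and Monday.
for d in (date(2019, 1, 5), date(2019, 1, 6), date(2019, 1, 7)):
    print(d, spark_dayofweek(d))
```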

In [None]:
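-- Hourly Sales Pattern
-- Take the hour from the text before the first ':' so values such as
-- '13:08' or '13:08:00' work whether Time is stored as text or as a time value.
SELECT 
  CAST(SPLIT(CAST(Time AS STRING), ':')[0] AS INT) as hour_of_day,
  COUNT(*) as transactions,
  ROUND(SUM(Total), 2) as total_revenue,
  ROUND(AVG(Total), 2) as avg_transaction_value
FROM sales_data
GROUP BY CAST(SPLIT(CAST(Time AS STRING), ':')[0] AS INT)
ORDER BY hour_of_day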
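
If `Time` holds text such as `'13:08'` (an assumption about the file), the hour is simply the part before the first colon. A Python sketch that buckets hypothetical (Time, Total) pairs by hour, equivalent to grouping on that extracted hour:

```python
from collections import defaultdict

def hourly_revenue(rows):
    """Sum Total per hour of day, parsing 'HH:mm' or 'HH:mm:ss' strings."""
    buckets = defaultdict(float)
    for time_str, total in rows:
        hour = int(time_str.split(":")[0])  # text before the first ':'
        buckets[hour] += total
    return dict(sorted(buckets.items()))    # ordered by hour, like ORDER BY hour_of_day

# Hypothetical (Time, Total) pairs.
result = hourly_revenue([("10:29", 80.22), ("13:08", 548.97), ("13:55:10", 20.0)])
print(result)
```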

## KPI 7: Customer Satisfaction & Quality Metrics

Analyzing customer ratings and satisfaction across different dimensions.

In [None]:
-- Rating Distribution Analysis
-- Compute the band label and its sort key once, then group and sort on them.
-- (Sorting on a CASE over Rating after GROUP BY fails, because Rating itself
-- is neither grouped nor aggregated.)
WITH rated AS (
  SELECT
    Total,
    CASE 
      WHEN Rating >= 9.0 THEN 'Excellent (9.0+)'
      WHEN Rating >= 8.0 THEN 'Very Good (8.0-8.9)'
      WHEN Rating >= 7.0 THEN 'Good (7.0-7.9)'
      WHEN Rating >= 6.0 THEN 'Fair (6.0-6.9)'
      ELSE 'Poor (<6.0)'
    END as rating_category,
    CASE 
      WHEN Rating >= 9.0 THEN 1
      WHEN Rating >= 8.0 THEN 2
      WHEN Rating >= 7.0 THEN 3
      WHEN Rating >= 6.0 THEN 4
      ELSE 5
    END as sort_order
  FROM sales_data
)
SELECT 
  rating_category,
  COUNT(*) as transactions,
  ROUND(COUNT(*) * 100.0 / (SELECT COUNT(*) FROM sales_data), 2) as percentage,
  ROUND(SUM(Total), 2) as total_revenue
FROM rated
GROUP BY rating_category, sort_order
ORDER BY sort_order
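
A `CASE` expression evaluates its branches in order, so each rating lands in the first band whose lower bound it meets; labels such as "8.0-8.9" rely on ratings having one decimal place. A Python sketch of the same banding, handy for spot-checking boundary values:

```python
from decimal import Decimal

def rating_category(rating: Decimal) -> str:
    """Same thresholds as the CASE expression; the first matching branch wins."""
    if rating >= Decimal("9.0"):
        return "Excellent (9.0+)"
    if rating >= Decimal("8.0"):
        return "Very Good (8.0-8.9)"
    if rating >= Decimal("7.0"):
        return "Good (7.0-7.9)"
    if rating >= Decimal("6.0"):
        return "Fair (6.0-6.9)"
    return "Poor (<6.0)"

for r in ("10.0", "9.0", "8.9", "6.0", "4.0"):
    print(r, rating_category(Decimal(r)))
```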

In [None]:
-- Average rating alongside revenue by city and product line (a side-by-side view, not a correlation coefficient)
SELECT 
  City,
  Product_Line,
  ROUND(AVG(Rating), 2) as avg_rating,
  COUNT(*) as transactions,
  ROUND(SUM(Total), 2) as total_revenue,
  ROUND(AVG(Total), 2) as avg_transaction_value
FROM sales_data
GROUP BY City, Product_Line
ORDER BY avg_rating DESC, total_revenue DESC
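
Ranking groups by rating shows ratings next to revenue but does not measure how strongly they move together. Spark SQL's `CORR(expr1, expr2)` aggregate returns the Pearson coefficient directly, e.g. `SELECT CORR(Rating, Total) FROM sales_data`. A plain-Python sketch of the same statistic on hypothetical values:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient, the statistic CORR() computes."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical ratings and transaction totals.
ratings = [9.1, 7.4, 8.4, 5.3, 6.6]
totals = [548.97, 80.22, 340.53, 489.05, 634.38]
print(round(pearson(ratings, totals), 3))
```

A value near 0 would suggest ratings and spend are largely unrelated; values near +1 or -1 indicate a strong linear relationship.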

## KPI 8: Profitability Analysis

Deep dive into profit margins and cost analysis.

In [None]:
-- Profitability by Product Line
SELECT 
  Product_Line,
  COUNT(*) as transactions,
  ROUND(SUM(Total), 2) as total_revenue,
  ROUND(SUM(COGS), 2) as total_costs,
  ROUND(SUM(Gross_Income), 2) as total_profit,
  ROUND(AVG(Gross_Margin_Percentage), 2) as avg_margin_pct,
  ROUND((SUM(Gross_Income) / SUM(Total)) * 100, 2) as actual_margin_pct
FROM sales_data
GROUP BY Product_Line
ORDER BY total_profit DESC
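
`avg_margin_pct` and `actual_margin_pct` can differ when per-row margins vary: the first averages per-row percentages, so every transaction weighs the same, while the second divides summed profit by summed revenue, so large transactions weigh more. A small sketch with two hypothetical rows:

```python
def margins(rows):
    """rows: (gross_income, total) pairs. Returns (mean of row margins, ratio of sums) in %."""
    avg_of_ratios = sum(g / t for g, t in rows) / len(rows) * 100   # like AVG(Gross_Margin_Percentage)
    ratio_of_sums = sum(g for g, _ in rows) / sum(t for _, t in rows) * 100  # like SUM/SUM
    return round(avg_of_ratios, 2), round(ratio_of_sums, 2)

# Hypothetical rows: a small sale at a 10% margin and a large one at a 2% margin.
print(margins([(10.0, 100.0), (18.0, 900.0)]))  # (6.0, 2.8)
```

When every row has the same margin, the two measures agree.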

In [None]:
-- High-value transaction analysis
SELECT 
  'High Value (>500)' as transaction_type,
  COUNT(*) as transactions,
  ROUND(SUM(Total), 2) as total_revenue,
  ROUND(AVG(Total), 2) as avg_transaction_value,
  ROUND(AVG(Rating), 2) as avg_rating
FROM sales_data
WHERE Total > 500

UNION ALL

SELECT 
  'Medium Value (100-500)' as transaction_type,
  COUNT(*) as transactions,
  ROUND(SUM(Total), 2) as total_revenue,
  ROUND(AVG(Total), 2) as avg_transaction_value,
  ROUND(AVG(Rating), 2) as avg_rating
FROM sales_data
WHERE Total BETWEEN 100 AND 500

UNION ALL

SELECT 
  'Low Value (<100)' as transaction_type,
  COUNT(*) as transactions,
  ROUND(SUM(Total), 2) as total_revenue,
  ROUND(AVG(Total), 2) as avg_transaction_value,
  ROUND(AVG(Rating), 2) as avg_rating
FROM sales_data
WHERE Total < 100

ORDER BY avg_transaction_value DESC
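
The three branches split the data without overlap: `BETWEEN 100 AND 500` includes both endpoints, so totals of exactly 100 or 500 count as Medium, and rows with a NULL `Total` match none of the branches. A Python sketch of the same banding for spot-checking boundary rows (it assumes `Total` is never missing):

```python
def value_band(total: float) -> str:
    """Mirror the three WHERE clauses; BETWEEN is inclusive at both ends."""
    if total > 500:
        return "High Value (>500)"
    if 100 <= total <= 500:
        return "Medium Value (100-500)"
    return "Low Value (<100)"

for t in (500.01, 500.0, 100.0, 99.99):
    print(t, value_band(t))
```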

## KPI 9: Executive Summary & Key Insights

Final summary with actionable business insights.

In [None]:
-- Top performing combinations for strategic focus
SELECT 
  City,
  Product_Line,
  Customer_Type,
  COUNT(*) as transactions,
  ROUND(SUM(Total), 2) as revenue,
  ROUND(AVG(Total), 2) as avg_order_value,
  ROUND(AVG(Rating), 2) as avg_rating,
  ROUND(SUM(Gross_Income), 2) as profit
FROM sales_data
GROUP BY City, Product_Line, Customer_Type
HAVING COUNT(*) >= 5  -- Filter for significant combinations
ORDER BY revenue DESC
LIMIT 15

In [None]:
-- Performance benchmarks and targets
WITH stats AS (
  SELECT 
    AVG(Total) as avg_transaction,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Total) as median_transaction,
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY Total) as p75_transaction,
    PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY Total) as p90_transaction,
    AVG(Rating) as avg_rating,
    MIN(Rating) as min_rating,
    MAX(Rating) as max_rating
  FROM sales_data
)
SELECT 
  'Transaction Value Benchmarks' as metric_type,
  ROUND(avg_transaction, 2) as average,
  ROUND(median_transaction, 2) as median,
  ROUND(p75_transaction, 2) as top_25_percent,
  ROUND(p90_transaction, 2) as top_10_percent,
  NULL as rating_avg,
  NULL as rating_range
FROM stats

UNION ALL

SELECT 
  'Customer Rating Benchmarks' as metric_type,
  NULL as average,
  NULL as median,
  NULL as top_25_percent,
  NULL as top_10_percent,
  ROUND(avg_rating, 2) as rating_avg,
  CONCAT(ROUND(min_rating, 1), ' - ', ROUND(max_rating, 1)) as rating_range
FROM stats
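
`PERCENTILE_CONT` interpolates linearly between the two nearest values in sorted order, so a benchmark such as the median of an even-sized set can be a value that never occurs in the data. A minimal Python sketch of that rule, using position (n - 1) * p in the sorted list, for checking the benchmarks on a small hypothetical sample:

```python
import math

def percentile_cont(values, p):
    """Continuous percentile with linear interpolation (p between 0 and 1, values non-empty)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p            # fractional index into the sorted list
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

totals = [10.0, 20.0, 30.0, 40.0]           # hypothetical transaction totals
print(percentile_cont(totals, 0.5))         # 25.0
print(percentile_cont(totals, 0.9))
```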

## Key Business Insights & Recommendations

Use the query results above to complete the insights below; each item points to the section that answers it:

### 🏆 Top Performers
- **Best-performing city/branch combination**: see the KPI 2 location results
- **Most profitable product line**: see the product line (KPI 3) and profitability (KPI 8) results
- **Highest-value customer segment**: compare Member and Normal customers in KPI 4

### 📊 Key Metrics Summary
1. **Revenue Concentration**: Identify which locations drive the most sales
2. **Product Mix**: Understand which categories perform best
3. **Customer Loyalty**: Member vs Normal customer behavior patterns
4. **Payment Preferences**: Customer payment method adoption
5. **Time Patterns**: Peak selling hours and days
6. **Satisfaction Levels**: Rating distribution and how ratings line up with revenue

### 🎯 Strategic Recommendations
1. **Focus on high-performing locations** for expansion or increased investment
2. **Optimize product mix** based on profitability and customer ratings
3. **Develop member retention programs** if members show higher value
4. **Improve payment infrastructure** for preferred methods
5. **Staff optimization** during peak hours identified in time analysis
6. **Quality improvement** for low-rated product categories

### 📈 Growth Opportunities
- Product lines with high ratings but low volume
- Underperforming locations with improvement potential
- Customer segments with growth opportunity
- Time slots with capacity for increased sales

---
*This analysis provides a comprehensive view of business performance across multiple dimensions. Use these insights to drive data-informed decision making and strategic planning.*