# Silver to Gold: Customer Metrics

This notebook transforms customer data from the silver layer into analytical metrics in the gold layer. It produces one row per customer, combining purchase, engagement, value, and risk metrics for use in business intelligence and customer relationship management.

## Metrics Overview

1. Purchase Behavior
   - Order counts and values
   - Total spend and average order value
   - Items purchased and frequency

2. Customer Timeline
   - First and last order dates
   - Average days between orders
   - Days since first and last order

3. Category Preferences
   - Favorite categories and brands
   - Category diversity score
   - Shopping patterns

4. Lifetime Value
   - Current customer value
   - Predicted 12-month value (historical spend extrapolated to a 365-day rate)
   - Value tier classification

5. Engagement Metrics
   - Review activity
   - Web engagement
   - Purchase frequency

6. Risk & Segmentation
   - Churn risk scoring
   - Customer segmentation
   - Discount sensitivity
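
As a rough illustration of two of these metrics, the sketch below applies the churn-risk bands (30, 90, and 180 days since the last order) and the 12-month value projection from this notebook's main query to a few made-up rows. The `sample_customers` CTE and its values are hypothetical; only the thresholds and the formula are taken from the notebook.

```sql
-- Hypothetical rows: (customer_id, total_spent, days_since_first_order, days_since_last_order)
WITH sample_customers AS (
    SELECT * FROM VALUES
        (1, 600.00, 200, 10),
        (2, 150.00, 400, 45),
        (3, 900.00, 300, 120),
        (4,  80.00, 500, 200)
    AS t(customer_id, total_spent, days_since_first_order, days_since_last_order)
)
SELECT
    customer_id,
    -- Run-rate projection: historical spend scaled to a 365-day window
    ROUND(total_spent * (365.0 / GREATEST(days_since_first_order, 1)), 2) AS predicted_ltv_12m,
    -- Same recency bands as the main query
    CASE
        WHEN days_since_last_order > 180 THEN 'High Risk'
        WHEN days_since_last_order > 90 THEN 'Medium Risk'
        WHEN days_since_last_order > 30 THEN 'Low Risk'
        ELSE 'Active'
    END AS churn_risk_category
FROM sample_customers
ORDER BY customer_id;
```

For customer 1, the projection is 600 × 365 / 200 = 1095 and the category is Active. Because the projection is a simple run rate, customers with a very recent first order can receive large values.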

## Dependencies
- Silver layer tables: customers_clean, orders_clean, order_items_clean, reviews_clean, web_events_clean
- Gold layer output: customer_metrics
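
Before running the notebook, it can help to confirm that every silver input exists and is populated. The query below is a minimal sketch of such a check; it assumes the tables live in `apjtechup.silver`, as referenced elsewhere in this notebook.

```sql
-- Row counts for each silver input; a missing table fails the query, an empty one shows 0
SELECT 'customers_clean' AS table_name, COUNT(*) AS row_count FROM apjtechup.silver.customers_clean
UNION ALL
SELECT 'orders_clean', COUNT(*) FROM apjtechup.silver.orders_clean
UNION ALL
SELECT 'order_items_clean', COUNT(*) FROM apjtechup.silver.order_items_clean
UNION ALL
SELECT 'reviews_clean', COUNT(*) FROM apjtechup.silver.reviews_clean
UNION ALL
SELECT 'web_events_clean', COUNT(*) FROM apjtechup.silver.web_events_clean;
```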


In [0]:
-- Set up the environment
USE CATALOG apjtechup;
USE DATABASE gold;


## Create Customer Metrics Table

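The following query creates the customer_metrics table. It aggregates orders, order items, reviews, and web events per customer in separate CTEs, then left-joins each aggregate to customers_clean so that customers with no activity still appear, with zero or NULL metrics.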
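
Because these metrics mix order-level columns (such as `total_amount`) with item-level columns (such as `quantity`), join granularity matters. Joining orders directly to line items repeats each order once per item, which inflates order-level sums. The sketch below uses made-up inline rows (all CTE names and values are hypothetical) to show the effect and one way to avoid it: aggregate items to one row per order before joining.

```sql
WITH demo_orders AS (
    SELECT * FROM VALUES
        (1, 101, 50.00),
        (2, 101, 30.00)
    AS t(order_id, customer_id, total_amount)
),
demo_items AS (
    -- Order 1 has two line items, order 2 has one
    SELECT * FROM VALUES
        (1, 2),
        (1, 1),
        (2, 4)
    AS t(order_id, quantity)
),
items_per_order AS (
    -- One row per order, so joining it cannot duplicate order rows
    SELECT order_id, SUM(quantity) AS items_in_order
    FROM demo_items
    GROUP BY order_id
),
naive AS (
    -- Direct join: order 1 is counted twice
    SELECT o.customer_id, SUM(o.total_amount) AS naive_total_spent
    FROM demo_orders o
    LEFT JOIN demo_items i ON o.order_id = i.order_id
    GROUP BY o.customer_id
),
safe AS (
    SELECT
        o.customer_id,
        SUM(o.total_amount) AS total_spent,
        SUM(p.items_in_order) AS total_items
    FROM demo_orders o
    LEFT JOIN items_per_order p ON o.order_id = p.order_id
    GROUP BY o.customer_id
)
SELECT n.customer_id, n.naive_total_spent, s.total_spent, s.total_items
FROM naive n
JOIN safe s ON n.customer_id = s.customer_id;
```

With these rows the direct join reports 130 for customer 101 (50 + 50 + 30), while the pre-aggregated version reports the true total of 80 across 7 items.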


In [0]:
-- Create customer metrics table
CREATE OR REPLACE TABLE customer_metrics USING ICEBERG AS
WITH order_item_totals AS (
    -- Aggregate line items to one row per order first. Joining raw items to orders
    -- would repeat each order once per item and inflate the order-level sums,
    -- averages, and discount counts below.
    SELECT
        order_id,
        SUM(quantity) as items_in_order
    FROM apjtechup.silver.order_items_clean
    GROUP BY order_id
),
customer_orders AS (
    -- Assumes orders_clean holds one row per order_id
    SELECT 
        o.customer_id,
        COUNT(DISTINCT o.order_id) as total_orders,
        SUM(o.total_amount) as total_spent,
        AVG(o.total_amount) as avg_order_value,
        SUM(oit.items_in_order) as total_items_purchased,
        AVG(oit.items_in_order) as avg_items_per_order,
        MIN(o.order_date_only) as first_order_date,
        MAX(o.order_date_only) as last_order_date,
        SUM(CASE WHEN o.has_discount THEN o.discount_amount ELSE 0 END) as total_discount_amount,
        AVG(CASE WHEN o.has_discount THEN o.discount_percentage ELSE 0 END) as avg_discount_percentage,
        COUNT(CASE WHEN o.has_discount THEN 1 END) as orders_with_discount
    FROM apjtechup.silver.orders_clean o
    LEFT JOIN order_item_totals oit ON o.order_id = oit.order_id
    GROUP BY all
),
customer_reviews AS (
    SELECT 
        customer_id,
        COUNT(*) as total_reviews,
        AVG(rating) as avg_rating_given
    FROM apjtechup.silver.reviews_clean
    GROUP BY all
),
customer_web_activity AS (
    SELECT 
        customer_id,
        COUNT(*) as total_web_events,
        COUNT(CASE WHEN event_category = 'Purchase' THEN 1 END) as purchase_events,
        COUNT(CASE WHEN event_category = 'Cart Activity' THEN 1 END) as cart_events,
        COUNT(CASE WHEN event_category = 'Browsing' THEN 1 END) as browsing_events
    FROM apjtechup.silver.web_events_clean
    WHERE customer_id IS NOT NULL
    GROUP BY all
),
customer_preferences AS (
    WITH category_brand_counts AS (
        SELECT
            o.customer_id,
            oi.category_name,
            oi.brand,
            COUNT(*) AS cnt
        FROM apjtechup.silver.orders_clean o
        JOIN apjtechup.silver.order_items_clean oi
            ON o.order_id = oi.order_id
        GROUP BY
            o.customer_id,
            oi.category_name,
            oi.brand
    ),
    ranked AS (
        SELECT
            customer_id,
            category_name,
            brand,
            cnt,
            -- Break ties on name so the favorite is deterministic across runs
            ROW_NUMBER() OVER (
                PARTITION BY customer_id
                ORDER BY cnt DESC, category_name, brand
            ) AS rn
        FROM category_brand_counts
    )
    SELECT
        c.customer_id,
        c.category_name AS favorite_category,
        c.brand AS favorite_brand,
        (
            SELECT COUNT(DISTINCT category_name)
            FROM category_brand_counts cb
            WHERE cb.customer_id = c.customer_id
        ) AS category_count
    FROM ranked c
    WHERE c.rn = 1
)
SELECT 
    c.customer_id,
    c.full_name as customer_name,
    c.email,
    c.registration_date,
    c.age,
    c.age_group,
    c.customer_tier,
    
    -- Purchase Behavior Metrics
    COALESCE(co.total_orders, 0) as total_orders,
    COALESCE(co.total_spent, 0) as total_spent,
    ROUND(COALESCE(co.avg_order_value, 0), 2) as average_order_value,
    COALESCE(co.total_items_purchased, 0) as total_items_purchased,
    ROUND(COALESCE(co.avg_items_per_order, 0), 2) as average_items_per_order,
    
    -- Timing Metrics
    co.first_order_date,
    co.last_order_date,
    CASE 
        WHEN co.first_order_date IS NOT NULL 
        THEN DATEDIFF(CURRENT_DATE(), co.first_order_date)
        ELSE NULL
    END as days_since_first_order,
    CASE 
        WHEN co.last_order_date IS NOT NULL 
        THEN DATEDIFF(CURRENT_DATE(), co.last_order_date)
        ELSE NULL
    END as days_since_last_order,
    CASE 
        WHEN co.total_orders > 1 AND co.first_order_date IS NOT NULL AND co.last_order_date IS NOT NULL
        THEN ROUND(DATEDIFF(co.last_order_date, co.first_order_date) / GREATEST(co.total_orders - 1, 1), 2)
        ELSE NULL
    END as average_days_between_orders,
    
    -- Category Preferences
    cp.favorite_category,
    cp.favorite_brand,
    ROUND(COALESCE(cp.category_count, 0) / GREATEST(COALESCE(co.total_orders, 1), 1), 2) as category_diversity_score,
    
    -- Lifetime Value Metrics
    ROUND(COALESCE(co.total_spent, 0), 2) as customer_lifetime_value,
    ROUND(
        CASE 
            WHEN co.total_orders > 0 AND co.first_order_date IS NOT NULL
            THEN co.total_spent * (365.0 / GREATEST(DATEDIFF(CURRENT_DATE(), co.first_order_date), 1))
            ELSE 0
        END, 2
    ) as predicted_ltv_12m,
    CASE 
        WHEN COALESCE(co.total_spent, 0) >= 1000 THEN 'High Value'
        WHEN COALESCE(co.total_spent, 0) >= 500 THEN 'Medium Value'
        WHEN COALESCE(co.total_spent, 0) >= 100 THEN 'Low Value'
        ELSE 'New/Inactive'
    END as ltv_tier,
    
    -- Engagement Metrics
    COALESCE(cr.total_reviews, 0) as total_reviews,
    ROUND(COALESCE(cr.avg_rating_given, 0), 2) as average_rating_given,
    COALESCE(cwa.total_web_events, 0) as total_web_events,
    ROUND(
        CASE 
            WHEN COALESCE(cwa.total_web_events, 0) > 0
            THEN (COALESCE(cwa.purchase_events, 0) * 3 + COALESCE(cwa.cart_events, 0) * 2 + COALESCE(cwa.browsing_events, 0)) / 
                 GREATEST(COALESCE(cwa.total_web_events, 1), 1)
            ELSE 0
        END, 2
    ) as web_engagement_score,
    
    -- Risk & Segmentation
    CASE 
        WHEN co.last_order_date IS NULL THEN 1.0
        WHEN DATEDIFF(CURRENT_DATE(), co.last_order_date) > 180 THEN 0.9
        WHEN DATEDIFF(CURRENT_DATE(), co.last_order_date) > 90 THEN 0.6
        WHEN DATEDIFF(CURRENT_DATE(), co.last_order_date) > 30 THEN 0.3
        ELSE 0.1
    END as churn_risk_score,
    CASE 
        WHEN co.last_order_date IS NULL THEN 'High Risk'
        WHEN DATEDIFF(CURRENT_DATE(), co.last_order_date) > 180 THEN 'High Risk'
        WHEN DATEDIFF(CURRENT_DATE(), co.last_order_date) > 90 THEN 'Medium Risk'
        WHEN DATEDIFF(CURRENT_DATE(), co.last_order_date) > 30 THEN 'Low Risk'
        ELSE 'Active'
    END as churn_risk_category,
    CASE 
        WHEN COALESCE(co.total_orders, 0) >= 10 AND COALESCE(co.total_spent, 0) >= 1000 THEN 'VIP'
        WHEN COALESCE(co.total_orders, 0) >= 5 AND COALESCE(co.total_spent, 0) >= 500 THEN 'Loyal'
        WHEN COALESCE(co.total_orders, 0) >= 2 THEN 'Regular'
        WHEN COALESCE(co.total_orders, 0) = 1 THEN 'One-time'
        ELSE 'Prospect'
    END as customer_segment,
    
    -- Discount Behavior
    ROUND(COALESCE(co.total_discount_amount, 0), 2) as total_discount_amount,
    ROUND(COALESCE(co.avg_discount_percentage, 0), 2) as discount_percentage_avg,
    CASE 
        WHEN COALESCE(co.orders_with_discount, 0) > COALESCE(co.total_orders, 0) * 0.5 THEN TRUE
        ELSE FALSE
    END as is_discount_sensitive,
    
    -- Metadata
    CURRENT_TIMESTAMP() as created_at,
    CURRENT_TIMESTAMP() as updated_at
FROM apjtechup.silver.customers_clean c
LEFT JOIN customer_orders co ON c.customer_id = co.customer_id
LEFT JOIN customer_reviews cr ON c.customer_id = cr.customer_id
LEFT JOIN customer_web_activity cwa ON c.customer_id = cwa.customer_id
LEFT JOIN customer_preferences cp ON c.customer_id = cp.customer_id;


## Generate Summary Statistics

This cell creates a temporary view, customer_metrics_summary, that rolls the customer_metrics table up into headline counts and averages, then displays it.
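
The summary relies on conditional counts written as `COUNT(CASE WHEN ... THEN 1 END)`. Spark SQL also provides `COUNT_IF`, which expresses the same idea more compactly. The sketch below compares the two forms on made-up rows; the `demo_metrics` CTE and its values are hypothetical.

```sql
WITH demo_metrics AS (
    SELECT * FROM VALUES
        (1, 'High Value', 3),
        (2, 'Low Value', 0),
        (3, 'High Value', 1)
    AS t(customer_id, ltv_tier, total_orders)
)
SELECT
    -- Both expressions count rows where the condition is true
    COUNT(CASE WHEN ltv_tier = 'High Value' THEN 1 END) AS high_value_case,
    COUNT_IF(ltv_tier = 'High Value') AS high_value_count_if,
    COUNT_IF(total_orders > 0) AS customers_with_orders
FROM demo_metrics;
```

Either form works; the choice is mostly about readability and portability to engines that lack `COUNT_IF`.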


In [0]:
-- Create summary statistics
CREATE OR REPLACE TEMPORARY VIEW customer_metrics_summary AS
SELECT 
    COUNT(*) as total_customers,
    COUNT(CASE WHEN total_orders > 0 THEN 1 END) as customers_with_orders,
    ROUND(AVG(customer_lifetime_value), 2) as avg_customer_ltv,
    ROUND(AVG(total_orders), 2) as avg_orders_per_customer,
    ROUND(AVG(average_order_value), 2) as avg_order_value,
    COUNT(CASE WHEN ltv_tier = 'High Value' THEN 1 END) as high_value_customers,
    COUNT(CASE WHEN ltv_tier = 'Medium Value' THEN 1 END) as medium_value_customers,
    COUNT(CASE WHEN ltv_tier = 'Low Value' THEN 1 END) as low_value_customers,
    COUNT(CASE WHEN customer_segment = 'VIP' THEN 1 END) as vip_customers,
    COUNT(CASE WHEN customer_segment = 'Loyal' THEN 1 END) as loyal_customers,
    COUNT(CASE WHEN churn_risk_category = 'High Risk' THEN 1 END) as high_churn_risk,
    COUNT(CASE WHEN churn_risk_category = 'Medium Risk' THEN 1 END) as medium_churn_risk,
    COUNT(CASE WHEN is_discount_sensitive = TRUE THEN 1 END) as discount_sensitive_customers
FROM customer_metrics;

-- Display summary
SELECT 'Customer Metrics Summary' as report_type;
SELECT * FROM customer_metrics_summary;


## Top Customer Analysis

This query lists the ten customers with the highest lifetime value, along with their tier, segment, order history, favorite category, and churn risk.


In [0]:
-- Top customers by LTV
SELECT 'Top 10 Customers by LTV' as report_type;
SELECT 
    customer_name,
    customer_tier,
    customer_segment,
    total_orders,
    customer_lifetime_value,
    average_order_value,
    favorite_category,
    churn_risk_category
FROM customer_metrics 
WHERE total_orders > 0
ORDER BY customer_lifetime_value DESC 
LIMIT 10;


## Customer Segment Analysis

This query shows how customers are distributed across segments and compares average lifetime value, order count, and order value for each segment.
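
A possible complement to the distribution query in this section is each segment's share of total lifetime value, computed with a window function over the grouped result. This is a sketch that assumes the customer_metrics table has already been created by this notebook; the output column names `segment_ltv` and `pct_of_total_ltv` are illustrative.

```sql
SELECT
    customer_segment,
    COUNT(*) AS customer_count,
    ROUND(SUM(customer_lifetime_value), 2) AS segment_ltv,
    -- SUM(...) OVER () totals the per-segment sums; NULLIF guards against a zero total
    ROUND(
        100.0 * SUM(customer_lifetime_value)
            / NULLIF(SUM(SUM(customer_lifetime_value)) OVER (), 0),
        2
    ) AS pct_of_total_ltv
FROM customer_metrics
GROUP BY customer_segment
ORDER BY segment_ltv DESC;
```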


In [0]:
-- Customer segment distribution
SELECT 'Customer Segment Distribution' as report_type;
SELECT 
    customer_segment,
    COUNT(*) as customer_count,
    ROUND(AVG(customer_lifetime_value), 2) as avg_ltv,
    ROUND(AVG(total_orders), 2) as avg_orders,
    ROUND(AVG(average_order_value), 2) as avg_order_value
FROM customer_metrics
GROUP BY customer_segment
ORDER BY avg_ltv DESC;


## Optimize Table

The final step runs OPTIMIZE on the customer metrics table, which compacts its data files to improve query performance.


In [0]:
-- Optimize table
OPTIMIZE customer_metrics;
