<a target="_blank" href="https://colab.research.google.com/github/lukebarousse/Int_SQL_Data_Analytics_Course/blob/main/8_Project/1.1_Customer_Segementation.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# 1️⃣ Customer Segmentation (Who Are Our Most Valuable Customers?)

In [2]:
import sys
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Shift libraries from ipython-sql to jupysql
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the sql extension for SQL magic
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

# Enable automatic conversion of SQL results to pandas DataFrames
%config SqlMagic.autopandas = True

# Disable named parameters for SQL magic
%config SqlMagic.named_parameters = "disabled"

# Display pandas floats with two decimal places
pd.options.display.float_format = '{:.2f}'.format



## Background

You're a **data analyst at an e-commerce company**. Your stakeholders on the marketing and finance teams need insights to improve customer retention and maximize revenue. They have three key questions:

1️⃣ **Who are our most valuable customers?** (Customer Segmentation)

2️⃣ **How do different customer groups generate long-term revenue?** (Cohort-Based LTV) 

3️⃣ **Which customers haven’t purchased recently?** (Retention Analysis)

Your job is to create a structured analysis using SQL that answers these questions and provides actionable insights for the business.

## Analysis

#### Overview
- Categorize customers based on their total lifetime value (LTV).
- Assign customers to **High, Mid, and Low-value** groups using CASE WHEN.

💼 **Example Use Cases:** Enables targeted marketing and personalized experiences
- Provide VIP benefits to high-value customers (early access, premium service)
- Create targeted upgrade paths for mid-value customers through personalized promotions
- Design re-engagement campaigns for low-value customers to increase purchase frequency
- Optimize marketing spend based on customer segment potential

#### Query Steps

1. Get each customer's lifetime value (LTV).

In [7]:
%%sql 

SELECT
    customerkey,
    SUM(total_net_revenue) AS total_ltv
FROM cohort_analysis
GROUP BY customerkey

| | customerkey | total_ltv |
|---|---|---|
| 0 | 15 | 2217.41 |
| 1 | 180 | 2510.22 |
| 2 | 185 | 1395.52 |
| 3 | 243 | 287.67 |
| 4 | 387 | 4655.84 |
| ... | ... | ... |
| 49482 | 2099619 | 6709.94 |
| 49483 | 2099656 | 10404.68 |
| 49484 | 2099697 | 38.20 |
| 49485 | 2099711 | 6008.67 |
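
For readers who think more easily in Python, here is a minimal sketch of the same aggregation: summing revenue per customer, as `SUM(total_net_revenue) ... GROUP BY customerkey` does. The sample rows and the helper name `customer_ltv` are hypothetical and only illustrate the idea; they are not taken from the `cohort_analysis` view.

```python
from collections import defaultdict

# Hypothetical sample rows: (customerkey, total_net_revenue), possibly several per customer.
rows = [
    (15, 1200.00),
    (15, 1017.41),
    (180, 2510.22),
    (243, 287.67),
]


def customer_ltv(rows):
    """Sum revenue per customer, like SUM(...) with GROUP BY customerkey."""
    totals = defaultdict(float)
    for customerkey, revenue in rows:
        totals[customerkey] += revenue
    return dict(totals)


ltv = customer_ltv(rows)
for key in sorted(ltv):
    print(key, round(ltv[key], 2))
```

Each customer ends up with exactly one row, which is what the percentile step needs as input.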


2. Get the 25th and 75th percentiles of the LTV. This will help us segment the customers (similar to the notebook [3_Advanced_Segmentation.ipynb](../1_Pivot_With_Case_Statements/3_Advanced_Segmentation.ipynb)).
    - High-Value: Customers in the top 25% (above the 75th percentile)
    - Mid-Value: Customers in the middle 50% (25th to 75th percentile, inclusive)
    - Low-Value: Customers in the bottom 25% (below the 25th percentile)

In [10]:
%%sql 

-- Put previous main query into a CTE
WITH customer_ltv AS (
    SELECT
        customerkey,
        SUM(total_net_revenue) AS total_ltv
    FROM cohort_analysis
    GROUP BY customerkey
)

SELECT
    PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY total_ltv) AS percentile_25th,
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY total_ltv) AS percentile_75th
FROM customer_ltv;

| | percentile_25th | percentile_75th |
|---|---|---|
| 0 | 843.59 | 5584.04 |
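
As a rough illustration of what `PERCENTILE_CONT` computes, the sketch below reimplements a linear-interpolation percentile in plain Python and cross-checks it against the standard library. The function name `percentile_cont` and the sample values are assumptions made for this example; they are not part of the course database.

```python
import math
import statistics


def percentile_cont(values, fraction):
    """Percentile with linear interpolation between adjacent sorted values."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    position = fraction * (len(ordered) - 1)  # zero-based fractional row number
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


# Hypothetical LTV values, for illustration only.
sample_ltv = [100.0, 200.0, 300.0, 400.0, 500.0, 600.0]

p25 = percentile_cont(sample_ltv, 0.25)
p75 = percentile_cont(sample_ltv, 0.75)
print(p25, p75)

# The standard library's "inclusive" quartiles use the same interpolation rule.
quartiles = statistics.quantiles(sample_ltv, n=4, method="inclusive")
print(quartiles[0], quartiles[2])
```

Because the result is interpolated, a threshold does not have to equal any customer's actual LTV.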


3. Using the 25th and 75th percentiles, we can now segment the customers into High, Mid, and Low-value segments.

In [11]:
%%sql

WITH customer_ltv AS (
    SELECT
        customerkey,
        SUM(total_net_revenue) AS total_ltv
    FROM cohort_analysis
    GROUP BY customerkey
),

-- Put previous main query into a CTE
customer_segments AS (
    SELECT
        PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY total_ltv) AS percentile_25th,
        PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY total_ltv) AS percentile_75th
    FROM customer_ltv
)

-- Add the segments to the main query
SELECT
    c.customerkey,
    c.total_ltv,
    CASE
        WHEN c.total_ltv < cs.percentile_25th THEN 'Low-Value'
        WHEN c.total_ltv BETWEEN cs.percentile_25th AND cs.percentile_75th THEN 'Mid-Value'
        ELSE 'High-Value'
    END AS customer_segment
FROM customer_ltv c
-- customer_segments has exactly one row, so every customer row gets both thresholds
CROSS JOIN customer_segments cs;

| | customerkey | total_ltv | customer_segment |
|---|---|---|---|
| 0 | 15 | 2217.41 | Mid-Value |
| 1 | 180 | 2510.22 | Mid-Value |
| 2 | 185 | 1395.52 | Mid-Value |
| 3 | 243 | 287.67 | Low-Value |
| 4 | 387 | 4655.84 | Mid-Value |
| ... | ... | ... | ... |
| 49482 | 2099619 | 6709.94 | High-Value |
| 49483 | 2099656 | 10404.68 | High-Value |
| 49484 | 2099697 | 38.20 | Low-Value |
| 49485 | 2099711 | 6008.67 | High-Value |
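
The `CASE` logic can be sketched as a small Python function, which makes the boundary behaviour easy to check: `BETWEEN` is inclusive, so a customer sitting exactly on either threshold lands in Mid-Value. The function name `assign_segment` is hypothetical, and the thresholds are the rounded values displayed in step 2, so results for customers very close to a boundary may differ from the database.

```python
def assign_segment(total_ltv, percentile_25th, percentile_75th):
    """Mirror the CASE expression: Low below p25, Mid within [p25, p75], High above."""
    if total_ltv < percentile_25th:
        return "Low-Value"
    if percentile_25th <= total_ltv <= percentile_75th:
        return "Mid-Value"
    return "High-Value"


# Rounded thresholds as displayed in the step 2 output.
P25, P75 = 843.59, 5584.04

# A few (customerkey, total_ltv) pairs copied from the step 1 output.
examples = {15: 2217.41, 243: 287.67, 2099619: 6709.94}

segments = {key: assign_segment(ltv, P25, P75) for key, ltv in examples.items()}
for key, segment in segments.items():
    print(key, segment)
```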


4. Get the total revenue for each customer segment.

In [14]:
%%sql

WITH customer_ltv AS (
    SELECT
        customerkey,
        SUM(total_net_revenue) AS total_ltv
    FROM cohort_analysis
    GROUP BY customerkey
),

customer_segments AS (
    SELECT
        PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY total_ltv) AS percentile_25th,
        PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY total_ltv) AS percentile_75th
    FROM customer_ltv
),

-- Put previous main query into a CTE
segment_values AS (
    SELECT
        c.customerkey,
        c.total_ltv,
        CASE
            WHEN c.total_ltv > cs.percentile_75th THEN 'High-Value'
            WHEN c.total_ltv BETWEEN cs.percentile_25th AND cs.percentile_75th THEN 'Mid-Value'
            ELSE 'Low-Value'
        END AS customer_segment
    FROM customer_ltv c
    CROSS JOIN customer_segments cs
)

SELECT
    customer_segment,
    SUM(total_ltv) AS total_ltv,
    COUNT(customerkey) AS customer_count
FROM segment_values
GROUP BY customer_segment
ORDER BY total_ltv DESC
;

| | customer_segment | total_ltv | customer_count |
|---|---|---|---|
| 0 | High-Value | 135429277.27 | 12372 |
| 1 | Mid-Value | 66636451.79 | 24743 |
| 2 | Low-Value | 4341809.53 | 12372 |
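
The final aggregation can also be sketched in Python: group the segmented rows, sum LTV, count customers, and sort by total descending. The helper name `summarize` is hypothetical, and the input is only a handful of rows copied from the step 3 output, so the totals here are not the real segment totals.

```python
from collections import defaultdict

# A few (customerkey, total_ltv, customer_segment) rows copied from the step 3 output.
rows = [
    (15, 2217.41, "Mid-Value"),
    (180, 2510.22, "Mid-Value"),
    (243, 287.67, "Low-Value"),
    (2099619, 6709.94, "High-Value"),
    (2099656, 10404.68, "High-Value"),
]


def summarize(rows):
    """SUM(total_ltv) and COUNT(customerkey) per segment, ordered by total descending."""
    totals = defaultdict(lambda: {"total_ltv": 0.0, "customer_count": 0})
    for _customerkey, total_ltv, segment in rows:
        totals[segment]["total_ltv"] += total_ltv
        totals[segment]["customer_count"] += 1
    return sorted(totals.items(), key=lambda item: item[1]["total_ltv"], reverse=True)


summary = summarize(rows)
for segment, stats in summary:
    print(segment, round(stats["total_ltv"], 2), stats["customer_count"])
```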


#### 📊 Key Findings

- High-value segment (25% of customers) drives 66% of revenue ($135.4M)
    - 12,372 customers (25% of 49,487 total customers)
    - $135.4M / $206.4M total revenue = 66%
- Mid-value segment (50% of customers) generates 32% of revenue ($66.6M)
    - 24,743 customers (50% of 49,487 total customers)
    - $66.6M / $206.4M total revenue = 32%
- Low-value segment (25% of customers) accounts for 2% of revenue ($4.3M)
    - 12,372 customers (25% of 49,487 total customers)
    - $4.3M / $206.4M total revenue = 2%
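
The percentages above can be re-derived from the step 4 output with a few lines of arithmetic. This is only a sanity-check sketch; the dictionary layout and variable names are illustrative.

```python
# (total_ltv, customer_count) per segment, copied from the step 4 output.
segments = {
    "High-Value": (135429277.27, 12372),
    "Mid-Value": (66636451.79, 24743),
    "Low-Value": (4341809.53, 12372),
}

total_revenue = sum(revenue for revenue, _ in segments.values())
total_customers = sum(count for _, count in segments.values())

# Whole-number percentage shares of revenue and of customers for each segment.
shares = {
    name: (round(100 * revenue / total_revenue), round(100 * count / total_customers))
    for name, (revenue, count) in segments.items()
}

print(f"Total revenue: ${total_revenue / 1e6:.1f}M across {total_customers} customers")
for name, (revenue_pct, customer_pct) in shares.items():
    print(f"{name}: {revenue_pct}% of revenue from {customer_pct}% of customers")
```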

#### 💡 Business Insights

- High-Value (66% revenue):
    - Offer premium membership program to 12,372 VIP customers
    - Provide early access to new products and dedicated support
    - Focus on retention as losing one customer impacts revenue significantly
- Mid-Value (32% revenue):
    - Create upgrade paths for 24,743 customers through personalized promotions
    - Target with "next best product" recommendations based on high-value patterns
    - Growth opportunity: the segment generates $66.6M today, versus $135.4M from a high-value segment with half as many customers
- Low-Value (2% revenue):
    - Design re-engagement campaigns for 12,372 customers to increase purchase frequency
    - Test price-sensitive promotions to encourage more frequent purchases
    - Focus on converting $4.3M segment to mid-value through targeted offers
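
To put the segment totals on a per-customer basis, the sketch below divides each segment's revenue by its customer count, using the figures from the step 4 output. It is simple arithmetic that assumes those totals are current; the variable names are illustrative.

```python
# (total_ltv, customer_count) per segment, copied from the step 4 output.
segments = {
    "High-Value": (135429277.27, 12372),
    "Mid-Value": (66636451.79, 24743),
    "Low-Value": (4341809.53, 12372),
}

# Average lifetime value per customer in each segment.
average_ltv = {name: revenue / count for name, (revenue, count) in segments.items()}

for name, value in average_ltv.items():
    print(f"{name}: ${value:,.2f} average LTV per customer")

# How many times larger the average high-value customer is than the average mid-value one.
high_to_mid_ratio = average_ltv["High-Value"] / average_ltv["Mid-Value"]
print(f"High-value vs mid-value average: {high_to_mid_ratio:.1f}x")
```

Comparing averages gives a sense of how far a typical mid-value customer is from a typical high-value one, which may help size upgrade and re-engagement targets more realistically than comparing segment totals alone.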

<img src="../8_Project/2.1_customer_segementation.png" alt="Customer Segmentation by LTV" style="width: 70%; height: auto;">