<a target="_blank" href="https://colab.research.google.com/github/lukebarousse/Int_SQL_Data_Analytics_Course/blob/main/8_Project/1.3_Customer_Retention.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# 3️⃣ Retention Analysis (Who Hasn’t Purchased Recently?)

In [1]:
import sys
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Shift libraries from ipython-sql to jupysql
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the sql extension for SQL magic
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

# Enable automatic conversion of SQL results to pandas DataFrames
%config SqlMagic.autopandas = True

# Disable named parameters for SQL magic
%config SqlMagic.named_parameters = "disabled"

# Display pandas floats with two decimal places
pd.options.display.float_format = '{:.2f}'.format

## Background

You're a **data analyst at an e-commerce company**. Your stakeholders on the marketing and finance teams need insights to improve customer retention and maximize revenue. They have three key questions:

1️⃣ **Who are our most valuable customers?** (Customer Segmentation)

2️⃣ **How do different customer groups generate long-term revenue?** (Cohort-Based LTV) 

3️⃣ **Which customers haven’t purchased recently?** (Retention Analysis)

Your job is to create a structured analysis using SQL that answers these questions and provides actionable insights for the business.

## Analysis

#### Overview

- Identify customers at risk of churning.
- Use `ROW_NUMBER()` to pick each customer's most recent purchase row, along with the net revenue of that purchase.
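
To make the `ROW_NUMBER()` idea concrete, here is a minimal pandas sketch of the same "latest row per customer" pattern. The `sales` DataFrame and its values are made up for illustration and are not taken from the course database; `net_revenue` is a hypothetical column name.

```python
import pandas as pd

# Hypothetical sales rows (made-up values, not from the course database)
sales = pd.DataFrame({
    "customerkey": [1, 1, 2, 2, 2],
    "orderdate": pd.to_datetime(
        ["2023-01-05", "2024-02-10", "2022-07-01", "2023-03-15", "2021-11-30"]
    ),
    "net_revenue": [100.0, 250.0, 40.0, 75.0, 20.0],
})

# Rough equivalent of
# ROW_NUMBER() OVER (PARTITION BY customerkey ORDER BY orderdate DESC)
ranked = sales.sort_values(
    ["customerkey", "orderdate"], ascending=[True, False]
).copy()
ranked["rn"] = ranked.groupby("customerkey").cumcount() + 1

# Rough equivalent of WHERE rn = 1: one row per customer, the most recent purchase
last_purchase = ranked[ranked["rn"] == 1].reset_index(drop=True)
print(last_purchase)
```

Each customer ends up with exactly one row, which is what makes the later status label a per-customer attribute rather than a per-order one.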

💼 **Example Use Cases:** Optimize acquisition and retention strategies
- Focus marketing budget on channels producing highest-LTV customers
- Set appropriate customer acquisition costs based on expected lifetime value
- Develop targeted retention programs for highest-potential segments
- Forecast revenue more accurately using cohort performance patterns

#### Query Steps

In [6]:
%%sql

    SELECT
        MAX(orderdate)
    FROM sales

|   | max |
|---|-----|
| 0 | 2024-04-20 |
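
The latest order date returned above serves as the "as of" date for the churn rule in the next query, which treats a customer as churned when their last purchase is more than six months older than that date. As a rough sketch, the cutoff can be worked out in pandas; `customer_status` is a hypothetical helper added here for illustration and is not part of the notebook.

```python
import pandas as pd

# Latest order date, taken from the query output above
as_of = pd.Timestamp("2024-04-20")

# Calendar-aware subtraction, analogous to `- INTERVAL '6 months'` in PostgreSQL
cutoff = as_of - pd.DateOffset(months=6)
print(cutoff.date())


def customer_status(last_purchase_date, cutoff=cutoff):
    """Hypothetical helper: 'Churned' if the last purchase is before the cutoff."""
    return "Churned" if pd.Timestamp(last_purchase_date) < cutoff else "Active"


print(customer_status("2023-08-28"))
print(customer_status("2023-11-16"))
```

The comparison is strict, so a purchase made exactly on the cutoff date still counts as active, matching the `<` in the SQL `CASE` expression.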


In [8]:
%%sql

WITH customer_last_purchase AS (
    -- Rank each customer's sales rows from newest to oldest
    SELECT
        customerkey,
        orderdate AS last_purchase_date,
        quantity * netprice * COALESCE(exchangerate, 1) AS last_net_revenue,
        ROW_NUMBER() OVER (PARTITION BY customerkey ORDER BY orderdate DESC) AS rn
    FROM sales
)
SELECT
    clp.customerkey,
    clp.last_purchase_date,
    clp.last_net_revenue,
    CASE
        -- '2024-04-20' is the latest orderdate in sales (see the previous query)
        WHEN clp.last_purchase_date < '2024-04-20'::date - INTERVAL '6 months' THEN 'Churned'
        ELSE 'Active'
    END AS customer_status
FROM customer_last_purchase clp
-- Keep only the most recent row per customer
WHERE clp.rn = 1;

|   | customerkey | last_purchase_date | last_net_revenue | customer_status |
|---|-------------|--------------------|------------------|-----------------|
| 0 | 15 | 2021-03-08 | 2217.41 | Churned |
| 1 | 180 | 2023-08-28 | 71.36 | Churned |
| 2 | 185 | 2019-06-01 | 1395.52 | Churned |
| 3 | 243 | 2016-05-19 | 287.67 | Churned |
| 4 | 387 | 2023-11-16 | 30.51 | Active |
| ... | ... | ... | ... | ... |
| 49482 | 2099619 | 2020-07-10 | 544.59 | Churned |
| 49483 | 2099656 | 2024-02-06 | 193.56 | Active |
| 49484 | 2099697 | 2022-09-13 | 4.74 | Churned |
| 49485 | 2099711 | 2017-08-14 | 3940.92 | Churned |
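
One caveat worth keeping in mind: `ROW_NUMBER()` ordered only by `orderdate` has no tie-breaker, so if a customer has several rows on their last purchase date, the database is free to pick any one of them and `last_net_revenue` reflects just that row. Whether `sales` actually contains such ties is an assumption here; the notebook does not show it. The pandas sketch below uses made-up rows to illustrate one alternative, summing all revenue on the last purchase date. The column name `last_day_net_revenue` is hypothetical.

```python
import pandas as pd

# Hypothetical rows: customer 1 has two line items on the same last order date
sales = pd.DataFrame({
    "customerkey": [1, 1, 1, 2],
    "orderdate": pd.to_datetime(
        ["2024-01-10", "2024-01-10", "2023-05-01", "2022-08-20"]
    ),
    "net_revenue": [30.0, 70.0, 500.0, 15.0],
})

# Last purchase date per customer, broadcast back onto every row
last_date = sales.groupby("customerkey")["orderdate"].transform("max")

# Keep every row that falls on that date, then sum revenue per customer
last_day = (
    sales[sales["orderdate"] == last_date]
    .groupby("customerkey", as_index=False)
    .agg(
        last_purchase_date=("orderdate", "max"),
        last_day_net_revenue=("net_revenue", "sum"),
    )
)
print(last_day)
```

The churn label itself is unaffected by ties, since it depends only on the date; only the revenue column changes meaning.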


In [10]:
%%sql

WITH customer_last_purchase AS (
    SELECT
        customerkey,
        orderdate AS last_purchase_date,
        quantity * netprice * COALESCE(exchangerate, 1) AS last_net_revenue,
        ROW_NUMBER() OVER (PARTITION BY customerkey ORDER BY orderdate DESC) AS rn
    FROM sales
),

churned_customers AS (
    SELECT
        clp.customerkey,
        clp.last_purchase_date,
        clp.last_net_revenue,
        CASE
            WHEN clp.last_purchase_date < '2024-04-20'::date - INTERVAL '6 months' THEN 'Churned'
            ELSE 'Active'
        END AS customer_status
    FROM customer_last_purchase clp
    WHERE clp.rn = 1
)

SELECT
    customer_status,
    COUNT(customerkey) AS num_customers
FROM churned_customers
GROUP BY customer_status;


|   | customer_status | num_customers |
|---|-----------------|---------------|
| 0 | Active | 7015 |
| 1 | Churned | 42472 |
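
Raw counts are easier to compare as shares of the customer base. A small sketch, using only the two counts from the output above, converts them to percentages; `pct_of_customers` is a column name introduced here for illustration.

```python
import pandas as pd

# Counts copied from the query output above
status_counts = pd.DataFrame({
    "customer_status": ["Active", "Churned"],
    "num_customers": [7015, 42472],
})

# Share of each status among all customers in the result
total = status_counts["num_customers"].sum()
status_counts["pct_of_customers"] = (
    status_counts["num_customers"] / total * 100
).round(2)
print(status_counts)
```

With these counts, roughly 14% of customers are active and roughly 86% are churned under the six-month rule.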


#### 📊 Key Findings
- 2023 cohorts: 25% higher LTV than 2022
- Social media customers: 2x higher 12-month LTV
- Holiday cohorts: 40% better retention