# Scoping

In [1]:
import os
from io import BytesIO, StringIO
from pathlib import Path

import boto3
import botocore.exceptions
import pandas as pd
from dotenv import load_dotenv

In [2]:
PROJ_ROOT = Path.cwd().parent

In [3]:
assert load_dotenv(dotenv_path=PROJ_ROOT.parent / '.env')

## About

Project scoping using the full random sample of data for ~10,000 customers from McMaster Bank.

## User Inputs

In [4]:
# R2 data bucket details
bucket_name = 'cc-churn-splits'

# columns to load for project scoping tasks
columns = [
    'clientnum',
    'card_category',
    'total_revolv_bal',
    'total_trans_amt',
    'is_churned',
]

# costs
# # revenue from transactions (bank earns 2% of transaction volume)
interchange_rate = 0.02
# # revenue from revolving balance (18% APR)
apr = 0.18
# # fee revenue from credit card exposure (modeled from card type)
card_fees = {"Blue": 0, "Silver": 50, "Gold": 100, "Platinum": 200}
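# # assumed customer tenure, in years, used to calculate CLV
tenure_years = 3
# # annual discount factor applied to future revenue when calculating CLV
discount = 0.9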
# # percentage of churners who can be convinced to stay (i.e. success rate
# # of saving a churning customer)
success_rate = 0.40
# # cost of intervention to get a single customer to not churn (discounts,
# # call center time, retention offers, etc.)
intervention_cost = 50
# # cost of acquiring a new customer (Customer Acquisition Cost, CAC)
replacement_cost = 200

In [5]:
account_id = os.getenv('ACCOUNT_ID')
access_key_id = os.getenv('ACCESS_KEY_ID')
secret_access_key = os.getenv('SECRET_ACCESS_KEY')

s3_client = boto3.client(
    's3',
    endpoint_url=f'https://{account_id}.r2.cloudflarestorage.com',
    aws_access_key_id=access_key_id,
    aws_secret_access_key=secret_access_key,
    region_name='auto'
)

# CLV multiplier: sum of discount**t for t = 0, ..., tenure_years - 1
multiplier = (1 - discount**tenure_years) / (1 - discount)
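The `multiplier` above is the closed form of a finite geometric series: it adds up the discount factor for each year of the assumed tenure, with the first year undiscounted. A minimal sketch, reusing the values from the User Inputs cell, that checks the closed form against the explicit sum:

```python
# values copied from the User Inputs cell
discount = 0.9
tenure_years = 3

# closed form used in the notebook
multiplier = (1 - discount**tenure_years) / (1 - discount)

# explicit sum of the per-year discount factors: 1 + 0.9 + 0.81
explicit = sum(discount**t for t in range(tenure_years))

print(round(multiplier, 4), round(explicit, 4))
```

With these inputs both expressions evaluate to approximately 2.71, so three years of discounted revenue are worth about 2.71 times one year of revenue.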

## Load Data

In [None]:
%%time
prefix = ''
dfs = []
# paginate so that buckets with more than 1,000 keys are listed in full
paginator = s3_client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
    for obj in page.get('Contents', []):
        key = obj['Key']
        # keep only top-level data splits (skip folders and nested keys)
        if (
            not key.endswith('/')
            and '/' not in key[len(prefix):]
            and key.endswith('_data.parquet.gzip')
        ):
            s3_obj = s3_client.get_object(Bucket=bucket_name, Key=key)
            df = pd.read_parquet(
                BytesIO(s3_obj['Body'].read()), columns=columns
            )
            dfs.append(df)
df = pd.concat(dfs, ignore_index=True).astype({"is_churned": 'bool[pyarrow]'})
df
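The key filter in the loading loop above can be isolated and checked without touching the bucket. The sketch below is illustrative only: the helper name `is_top_level_split` and the example keys are hypothetical and are not part of the notebook.

```python
def is_top_level_split(key: str, prefix: str = "") -> bool:
    """Return True for data-split files stored directly under `prefix`."""
    remainder = key[len(prefix):]
    return (
        not key.endswith("/")        # skip folder placeholders
        and "/" not in remainder     # skip keys nested in sub-folders
        and key.endswith("_data.parquet.gzip")
    )


# hypothetical keys, for illustration only
keys = [
    "train_data.parquet.gzip",
    "archive/train_data.parquet.gzip",
    "archive/",
    "train_labels.csv",
]
selected = [k for k in keys if is_top_level_split(k)]
print(selected)
```

Only the first key passes all three conditions; the nested copy, the folder placeholder and the non-parquet file are skipped.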

## Calculate Impact of Churn

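Calculate customer lifetime value (CLV) as the annual revenue per customer (interchange, interest and card fees) multiplied by the discounted tenure multiplier.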

In [None]:
%%time
df = df.assign(
    interchange_rev=lambda df: df["total_trans_amt"] * interchange_rate,
    interest_rev=lambda df: df["total_revolv_bal"] * apr,
    fee_rev=lambda df: (
        df["card_category"].map(card_fees).astype('int16[pyarrow]')
    ),
    annual_rev=lambda df: (
        df["interchange_rev"] + df["interest_rev"] + df["fee_rev"]
    ),
    clv=lambda df: df["annual_rev"] * multiplier,
    intervention_cost=intervention_cost,
)
df
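To make the CLV calculation concrete, the sketch below applies the same steps to a single customer in plain Python. The rates and fees are copied from the User Inputs cell; the function name `customer_clv` and the customer's figures are hypothetical and chosen only for illustration.

```python
# inputs copied from the User Inputs cell
interchange_rate = 0.02
apr = 0.18
card_fees = {"Blue": 0, "Silver": 50, "Gold": 100, "Platinum": 200}
tenure_years = 3
discount = 0.9
multiplier = (1 - discount**tenure_years) / (1 - discount)


def customer_clv(total_trans_amt, total_revolv_bal, card_category):
    """CLV for one customer, mirroring the columns assigned above."""
    interchange_rev = total_trans_amt * interchange_rate
    interest_rev = total_revolv_bal * apr
    fee_rev = card_fees[card_category]
    annual_rev = interchange_rev + interest_rev + fee_rev
    return annual_rev * multiplier


# hypothetical customer: 4,000 in transactions, 1,000 revolving, Silver card
clv = customer_clv(4000, 1000, "Silver")
print(round(clv, 2))
```

For this hypothetical customer the annual revenue is 80 + 180 + 50 = 310, giving a CLV of approximately 840.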

The class imbalance between churned and retained customers is shown below.

In [8]:
df['is_churned'].value_counts(normalize=True).mul(100).reset_index()

| | is_churned | proportion |
|---|---|---|
| 0 | False | 83.934038 |
| 1 | True | 16.065962 |


**Observations**

1. Credit card churn is observed in approximately 16% of customers. This is
   - **just below** the [industry average churn rate for the financial sector of 19%](https://billingplatform.com/blog/average-churn-rate-by-industry)
   - below the [standard for high credit card churn of 25-30%](https://uxpressia.com/blog/how-to-approach-customer-churn-measurement-in-banking)

The total number of customers and the total Customer Lifetime Value (CLV) are shown below for churned and existing (retained) customers.

In [9]:
%%time
df_attrition_stats = (
    df
    .groupby('is_churned')
    .agg({'clv': ['sum', 'count']})
    .set_axis(['clv_total', 'num_customers'], axis=1)
    .reset_index()
    .assign(
        clf_fraction=lambda df: (
            (df['clv_total']/df['clv_total'].sum()).mul(100)
        ),
        clv_per_customer=lambda df: df['clv_total']/df['num_customers'],
    )
)
df_attrition_stats

CPU times: user 6.42 ms, sys: 40 μs, total: 6.46 ms
Wall time: 5.54 ms


| | is_churned | clv_total | num_customers | clf_fraction | clv_per_customer |
|---|---|---|---|---|---|
| 0 | False | 7452624.118 | 8500 | 90.01784 | 876.779308 |
| 1 | True | 826428.2668 | 1627 | 9.98216 | 507.946077 |


One common way to estimate the impact of customer churn is to take the CLV lost per churned customer and add the cost of acquiring a replacement customer (CAC). The average CAC for a credit card customer is approximately [167](https://firstpagesage.com/seo-blog/average-customer-acquisition-cost-cac-in-banking/) USD, or ~200 CAD. The total impact of churn is estimated below for the sample of data provided by McMaster Bank.

In [10]:
(
    df_attrition_stats
    .reset_index(drop=True)
    .assign(
        impact_per_customer=lambda df: df['clv_per_customer']+replacement_cost,
        impact_total=lambda df: (
            df['clv_total']+df['num_customers'].mul(replacement_cost)
        ),
    )
)

| | is_churned | clv_total | num_customers | clf_fraction | clv_per_customer | impact_per_customer | impact_total |
|---|---|---|---|---|---|---|---|
| 0 | False | 7452624.118 | 8500 | 90.01784 | 876.779308 | 1076.779308 | 9152624.118 |
| 1 | True | 826428.2668 | 1627 | 9.98216 | 507.946077 | 707.946077 | 1151828.2668 |


**Observations**

1. Approximately 10% of total customer lifetime value is lost to the 16% churn rate observed over the past 12 months in the random sample of customer data. For the client (i.e. the bank's credit card division), this is a loss of approximately 508 dollars of CLV per churned customer. Once the cost of acquiring a replacement customer is included, the loss rises to 708 dollars per churned customer, or 1,151,828 dollars overall. Only the `is_churned == True` row of the table is relevant here, since the impact columns for retained customers do not represent a loss.
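The figures quoted in this observation can be reproduced from the totals in the tables above. A minimal sketch in plain Python, with the totals typed in by hand rather than read from `df_attrition_stats`:

```python
# figures copied from the tables above
clv_total_churned = 826_428.2668
clv_total_retained = 7_452_624.118
num_churned = 1_627
replacement_cost = 200  # CAC, from the User Inputs cell

# share of total CLV held by churned customers (percent)
clv_lost_share = (
    100 * clv_total_churned / (clv_total_churned + clv_total_retained)
)
# CLV lost per churned customer, then with the replacement cost added
clv_per_churned = clv_total_churned / num_churned
impact_per_customer = clv_per_churned + replacement_cost
# total impact across all churned customers
impact_total = clv_total_churned + num_churned * replacement_cost

print(round(clv_lost_share, 2), round(impact_per_customer), round(impact_total))
```

These values round to roughly 10%, 708 dollars per churned customer and 1,151,828 dollars in total, matching the `is_churned == True` row.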

The CLV of retained customers is shown below.

In [11]:
df_attrition_stats.query("is_churned == False").reset_index(drop=True)

| | is_churned | clv_total | num_customers | clf_fraction | clv_per_customer |
|---|---|---|---|---|---|
| 0 | False | 7452624.118 | 8500 | 90.01784 | 876.779308 |


**Observations**

1. The CLV per retained customer (approximately $877) is **just above** the [industry standard for credit card customers of $808](https://focus-digital.co/average-customer-lifetime-value-for-financial-services/), by roughly 8.5%.