# Project 4: Churn Risk Prioritization & Retention Strategy

## Objective
Identify high-risk, high-value customers and prioritize retention actions by
combining churn probability, customer lifetime value (CLTV), and customer behavior.

## Business Question
Given limited retention resources, which customers should be contacted first
to maximize revenue impact?

In [1]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

In [2]:
df = pd.read_excel("data/raw/Telco_customer_churn.xlsx")
df.shape

(7043, 33)

## Step 1: Prepare Features and Target Variable

In [3]:
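# Target variable: 1 = churned, 0 = retained
y = df["Churn Label"].map({"Yes": 1, "No": 0})

# Drop the identifier, the target label, and the free-text churn reason.
# Caution: any other outcome-derived column left in X (for example a numeric
# churn flag or a precomputed churn score) leaks the target into the features.
drop_cols = ["CustomerID", "Churn Label", "Churn Reason"]
X = df.drop(columns=drop_cols)

# One-hot encode categorical features. A text column with many distinct values
# produces one dummy column per value, which inflates the column count.
X_encoded = pd.get_dummies(X, drop_first=True)

X_encoded.shape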

(7043, 9345)
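
A jump from 33 raw columns to 9,345 encoded columns usually means identifiers, numbers stored as text, or other high-cardinality fields were one-hot encoded. The sketch below shows one possible way to trim the feature set before encoding. The column names `Churn Value`, `Churn Score`, and `Total Charges`, the helper `build_features`, and the `max_levels` cutoff are all assumptions for illustration and should be checked against `df.columns`; this is not the notebook's actual pipeline.

```python
import pandas as pd

# Hypothetical column groups; confirm each name against df.columns before use.
OUTCOME_COLS = ["Churn Label", "Churn Value", "Churn Score", "Churn Reason"]
ID_COLS = ["CustomerID"]


def build_features(frame: pd.DataFrame, max_levels: int = 20) -> pd.DataFrame:
    """One-hot encode `frame` after removing outcome, ID, and very wide text columns."""
    X = frame.drop(columns=OUTCOME_COLS + ID_COLS, errors="ignore").copy()

    # Numbers stored as text would otherwise get one dummy column per distinct value.
    # Filling unparseable entries with 0 is an assumption, not a rule from the dataset.
    if "Total Charges" in X.columns:
        X["Total Charges"] = pd.to_numeric(X["Total Charges"], errors="coerce").fillna(0.0)

    # Drop remaining non-numeric columns with more distinct values than max_levels.
    text_cols = X.select_dtypes(exclude=["number", "bool"]).columns
    too_wide = [col for col in text_cols if X[col].nunique() > max_levels]
    return pd.get_dummies(X.drop(columns=too_wide), drop_first=True)


# Toy frame standing in for the real data
toy = pd.DataFrame({
    "CustomerID": ["a", "b", "c", "d"],
    "Contract": ["Month-to-month", "Two year", "Month-to-month", "One year"],
    "Total Charges": ["10.5", " ", "300", "42"],
    "Churn Label": ["Yes", "No", "Yes", "No"],
    "Churn Value": [1, 0, 1, 0],
    "CLTV": [3000, 5000, 2500, 4000],
})
X_toy = build_features(toy, max_levels=3)
print(X_toy.columns.tolist())
```

Comparing `X_encoded.shape` before and after a step like this is a quick way to see how much of the width came from columns that carry no generalizable signal.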

In [4]:
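# Hold out 20% of customers, stratified so both splits keep the same churn rate
X_train, X_test, y_train, y_test = train_test_split(
    X_encoded, y, test_size=0.2, random_state=42, stratify=y
)

# Fit a logistic regression; max_iter is raised well above the default
model = LogisticRegression(max_iter=3000, solver="liblinear")
model.fit(X_train, y_train)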
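
The split creates `X_test` and `y_test`, but the cells shown here never score them. A minimal sketch of a holdout check follows, using synthetic data from `make_classification` as a stand-in for `X_encoded` and `y` (the names `X_demo`, `y_demo`, and `clf` are illustrative only). A large gap between train and test AUC, or a test AUC very close to 1.0, would be a reason to look for overfitting or target leakage.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the encoded Telco features (not real data)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=10, weights=[0.73, 0.27], random_state=42
)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

clf = LogisticRegression(max_iter=3000, solver="liblinear")
clf.fit(X_tr, y_tr)

# Compare ranking quality on rows the model has and has not seen
train_auc = roc_auc_score(y_tr, clf.predict_proba(X_tr)[:, 1])
test_auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"train AUC = {train_auc:.3f}, test AUC = {test_auc:.3f}")
```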

## Step 2: Generate Churn Risk Scores

In [5]:
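# Score every customer. About 80% of these rows were used to fit the model,
# so their probabilities are in-sample and likely optimistic.
df["churn_probability"] = model.predict_proba(X_encoded)[:, 1]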

df[["CustomerID", "churn_probability"]].head()

   CustomerID  churn_probability
0  3668-QPYBK           0.969117
1  9237-HQITU           0.611317
2  9305-CDSKC           0.989892
3  7892-POOKP           0.963655
4  0280-XJGEX           0.953838


In [6]:
df["churn_probability"].describe()

count    7.043000e+03
mean     2.672212e-01
std      3.643602e-01
min      8.820890e-08
25%      6.740592e-04
50%      2.916203e-02
75%      5.343618e-01
max      9.992020e-01
Name: churn_probability, dtype: float64
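
Because the scores above cover rows the model was trained on, one alternative is to give every customer an out-of-fold probability, so that each score comes from a model that never saw that customer. The sketch below uses scikit-learn's `cross_val_predict` on synthetic stand-in data; `X_demo`, `y_demo`, and `oof_proba` are illustrative names, and the five-fold setup is an assumption rather than something the notebook specifies.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic stand-in for X_encoded / y (not the Telco data)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=10, weights=[0.73, 0.27], random_state=42
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Each row is scored by the fold model that did not train on it
oof_proba = cross_val_predict(
    LogisticRegression(max_iter=3000, solver="liblinear"),
    X_demo,
    y_demo,
    cv=cv,
    method="predict_proba",
)[:, 1]

print(oof_proba[:5].round(3))
```

With the real data, `oof_proba` could be assigned to `df["churn_probability"]` in place of the in-sample scores.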

## Step 3: Compute Retention Priority

In [7]:
# Expected value at risk: probability of churn weighted by customer lifetime value
df["retention_priority"] = df["churn_probability"] * df["CLTV"]

df[["CustomerID", "churn_probability", "CLTV", "retention_priority"]].head()

   CustomerID  churn_probability  CLTV  retention_priority
0  3668-QPYBK           0.969117  3239         3138.968501
1  9237-HQITU           0.611317  2701         1651.167329
2  9305-CDSKC           0.989892  5372         5317.700188
3  7892-POOKP           0.963655  5003         4821.164528
4  0280-XJGEX           0.953838  5340         5093.492999
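
Multiplying probability by CLTV means a moderately risky but valuable customer can outrank a very risky, low-value one. The toy example below, with three made-up customers, illustrates how the ordering differs from sorting on churn probability alone.

```python
import pandas as pd

# Three hypothetical customers (values invented for illustration)
toy = pd.DataFrame({
    "customer": ["A", "B", "C"],
    "churn_probability": [0.90, 0.50, 0.10],
    "CLTV": [2000, 6000, 6000],
})

toy["retention_priority"] = toy["churn_probability"] * toy["CLTV"]

by_risk = toy.sort_values("churn_probability", ascending=False)
by_priority = toy.sort_values("retention_priority", ascending=False)

print(by_risk["customer"].tolist())      # ['A', 'B', 'C']
print(by_priority["customer"].tolist())  # ['B', 'A', 'C']
```

Customer B carries about 3,000 in expected value at risk versus about 1,800 for A, so B moves to the top even though A is more likely to churn.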


In [8]:
top_risk_customers = (
    df[["CustomerID", "churn_probability", "CLTV", "retention_priority"]]
    .sort_values("retention_priority", ascending=False)
)

top_risk_customers.head(10)

      CustomerID  churn_probability  CLTV  retention_priority
488   4143-HHPMK           0.977798  6402         6259.860147
1407  0877-SDMBN           0.987192  6117         6038.653067
27    5299-RULOA           0.989884  5998         5937.324745
709   0431-APWVY           0.992716  5958         5914.604688
1484  4391-RESHN           0.991531  5963         5912.498535
997   1725-IQNIY           0.940644  6274         5901.598061
1566  6496-SLWHQ           0.989848  5960         5899.496649
1318  6653-CBBOM           0.987661  5963         5889.425088
323   5565-FILXA             0.9944  5913         5879.890112
409   0637-YLETY            0.99377  5915          5878.14779
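
The business question assumes limited retention resources, so a ranked list is most useful when cut at a contact capacity. The sketch below is one way to do that: the helper `select_for_outreach`, the `capacity` parameter, and the `cumulative_share` column are hypothetical names, and the data is randomly generated rather than taken from the Telco file.

```python
import numpy as np
import pandas as pd


def select_for_outreach(scored: pd.DataFrame, capacity: int) -> pd.DataFrame:
    """Return the `capacity` highest-priority customers with their cumulative share."""
    ranked = scored.sort_values("retention_priority", ascending=False).copy()
    total = ranked["retention_priority"].sum()
    # Fraction of all expected value at risk covered down to each row
    ranked["cumulative_share"] = ranked["retention_priority"].cumsum() / total
    return ranked.head(capacity)


# Randomly generated stand-in for the scored customer table
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "CustomerID": [f"C{i:04d}" for i in range(1000)],
    "churn_probability": rng.uniform(0, 1, 1000),
    "CLTV": rng.integers(2000, 6500, 1000),
})
demo["retention_priority"] = demo["churn_probability"] * demo["CLTV"]

contact_list = select_for_outreach(demo, capacity=100)
print(contact_list[["CustomerID", "retention_priority", "cumulative_share"]].tail(1))
```

The last row's `cumulative_share` indicates how much of the total expected value at risk a team reaches by contacting only the first `capacity` customers.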


## Step 4: Assign Retention Tiers

In [9]:
# Quantile-based tiers: bottom 50% = Low, next 30% = Medium,
# next 15% = High, top 5% = Critical
df["retention_tier"] = pd.qcut(
    df["retention_priority"],
    q=[0, 0.5, 0.8, 0.95, 1.0],
    labels=["Low", "Medium", "High", "Critical"]
)

df["retention_tier"].value_counts()

retention_tier
Low         3522
Medium      2112
High        1056
Critical     353
Name: count, dtype: int64
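
Tier counts alone do not show how much value sits in each tier. A possible follow-up is to profile the tiers with a grouped summary, sketched below on randomly generated data. The aggregate column names (`customers`, `avg_churn_probability`, `avg_cltv`, `total_priority`, `share_of_priority`) are illustrative choices, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Randomly generated stand-in for the scored customer table
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "churn_probability": rng.uniform(0, 1, 1000),
    "CLTV": rng.integers(2000, 6500, 1000),
})
demo["retention_priority"] = demo["churn_probability"] * demo["CLTV"]

# Same quantile edges and labels as the tiering cell
demo["retention_tier"] = pd.qcut(
    demo["retention_priority"],
    q=[0, 0.5, 0.8, 0.95, 1.0],
    labels=["Low", "Medium", "High", "Critical"],
)

# One row per tier: size, average risk, average value, and total value at risk
tier_summary = demo.groupby("retention_tier", observed=True).agg(
    customers=("retention_priority", "size"),
    avg_churn_probability=("churn_probability", "mean"),
    avg_cltv=("CLTV", "mean"),
    total_priority=("retention_priority", "sum"),
)
tier_summary["share_of_priority"] = (
    tier_summary["total_priority"] / tier_summary["total_priority"].sum()
)
print(tier_summary.round(3))
```

A summary like this makes it easier to judge whether the Critical and High tiers together fit within the available outreach capacity.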