# Telco Customer Churn

https://www.kaggle.com/code/janiobachmann/bank-marketing-campaign-opening-a-term-deposit

https://www.kaggle.com/code/bandiatindra/telecom-churn-prediction/notebook

In [2]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [3]:
df_raw = pd.read_csv('./DATA/Telco-Customer-Churn.csv')
df_raw.head()

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,...,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,...,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes


In [4]:
df_raw.shape

(7043, 21)

In [None]:
df_raw.columns

In [None]:
## dtypes, missing values of all fields
## remove missing values
## get dummies for categorical
## binarise target
## correlation
## class distribution
## pie chart distribution of object fields by churn yes vs churn no
## kde plot for numeric / int fields by churn yes vs no

In [None]:
df_raw.dtypes

In [None]:
df_stg = df_raw.copy()

In [None]:
df_stg.TotalCharges = pd.to_numeric(df_stg.TotalCharges, errors='coerce')

In [None]:
df_stg.isnull().sum()

In [None]:
df_stg = df_stg.dropna()

In [None]:
df_stg = df_stg[['gender', 'SeniorCitizen', 'Partner', 'Dependents',
       'tenure', 'PhoneService', 'MultipleLines', 'InternetService',
       'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',
       'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling',
       'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn']]

In [None]:
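# Work on a copy so df_stg keeps the original 'Yes'/'No' labels for the plots below
df_stg_bin = df_stg.copy()
# Binarise the target before one-hot encoding so 'Churn' stays a single 0/1 column
df_stg_bin['Churn'] = df_stg_bin['Churn'].map({'Yes': 1, 'No': 0})
# One-hot encode the remaining categorical (object) columns
df_stg_bin = pd.get_dummies(df_stg_bin)
df_stg_bin.head()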

In [None]:
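# churn counts, indexed by class label ('No' / 'Yes')
churn_counts = df_stg["Churn"].value_counts()
plt.suptitle('Class balance', fontsize=20)

# define Seaborn color palette to use
colors = sns.color_palette('pastel')[0:5]

# create pie chart; labels come from the index, so the code does not depend on
# the column names produced by reset_index(), which differ across pandas versions
plt.pie(churn_counts.values, labels=churn_counts.index, colors=colors, autopct='%.0f%%')
plt.show()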

In [None]:
sns.set(rc = {'figure.figsize':(15,8)})
df_corrs_x_y = df_stg_bin.corr()['Churn'].sort_values(ascending = False).to_frame().reset_index().rename(columns={"index": "Factor", "Churn": "Corr with Churn"})
ax = sns.barplot(x="Corr with Churn", y="Factor", data=df_corrs_x_y)

Customers who churn tend to be on month-to-month contracts and to lack online security and tech support.

Retained customers tend to have longer tenures and to be on two-year contracts.
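The correlation reading above can be cross-checked by computing the churn rate within each level of a categorical field. The following is a minimal sketch on a handful of made-up rows (not taken from the Telco data); `churn_rate_by` is a hypothetical helper name, not something defined earlier in this notebook.

```python
import pandas as pd

# Made-up illustrative rows, NOT taken from the Telco dataset.
toy = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "Month-to-month",
                 "One year", "Two year", "Two year"],
    "Churn": ["Yes", "Yes", "No", "No", "No", "No"],
})

def churn_rate_by(df, column):
    """Share of churned customers within each level of `column` (hypothetical helper)."""
    churned = df["Churn"].eq("Yes")  # boolean target: True where the customer churned
    # The mean of a boolean series is the fraction of True values in each group
    return churned.groupby(df[column]).mean().sort_values(ascending=False)

rates = churn_rate_by(toy, "Contract")
print(rates)
```

On the notebook's own frame the same call would be `churn_rate_by(df_stg, "Contract")`, and likewise for `"OnlineSecurity"` or `"TechSupport"`.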

In [None]:
df_stg[['tenure','MonthlyCharges','TotalCharges']].describe()

The numeric features have very different ranges and will need to be scaled.
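One common way to do this is z-score standardisation, which rescales each column to mean 0 and unit variance. The sketch below uses plain pandas on made-up values (not taken from the Telco data). The notebook does not say which scaling method is intended, so both the choice of standardisation and the helper name `standardise` are assumptions.

```python
import pandas as pd

# Made-up illustrative values, NOT taken from the Telco dataset.
toy = pd.DataFrame({
    "tenure": [1, 12, 24, 60],
    "MonthlyCharges": [20.0, 55.0, 70.0, 105.0],
    "TotalCharges": [20.0, 660.0, 1680.0, 6300.0],
})

def standardise(df, columns):
    """Return a copy of df with `columns` rescaled to mean 0, unit variance (hypothetical helper)."""
    values = df[columns].astype(float)
    # ddof=0 uses the population standard deviation
    scaled = (values - values.mean()) / values.std(ddof=0)
    # assign() returns a new frame, leaving the input untouched
    return df.assign(**{col: scaled[col] for col in columns})

scaled = standardise(toy, ["tenure", "MonthlyCharges", "TotalCharges"])
print(scaled.round(2))
```

If the data is later split into training and test sets, the means and standard deviations should be computed on the training split only and then reused on the test split.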

In [None]:
sns.kdeplot(data=df_stg, x="tenure", hue="Churn")

In [None]:
sns.kdeplot(data=df_stg, x="MonthlyCharges", hue="Churn")

In [None]:
sns.kdeplot(data=df_stg, x="TotalCharges", hue="Churn")

From the KDE plots above, customers who do not churn tend to have longer tenures, while customers who churn are more concentrated at higher monthly charges.
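A per-group summary statistic puts numbers on what the density plots suggest. This is a minimal sketch using group medians on made-up rows (not taken from the Telco data); the median is an illustrative choice here, not something the notebook prescribes.

```python
import pandas as pd

# Made-up illustrative rows, NOT taken from the Telco dataset.
toy = pd.DataFrame({
    "tenure": [2, 5, 40, 60, 72, 3],
    "MonthlyCharges": [85.0, 95.0, 30.0, 60.0, 25.0, 75.0],
    "Churn": ["Yes", "Yes", "No", "No", "No", "Yes"],
})

# Median of each numeric field within the churned and retained groups
summary = toy.groupby("Churn")[["tenure", "MonthlyCharges"]].median()
print(summary)
```

Replacing `toy` with `df_stg` (and adding `"TotalCharges"` to the column list) would give the corresponding figures for the real data.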

In [None]:
df_stg.dtypes

In [None]:
f, ax = plt.subplots(1,2, figsize=(16,8))

colors = ["#FA5858", "#64FE2E"]
labels = "Not senior" ,"Senior"

plt.suptitle('Churn vs Senior Citizens', fontsize=20)

# sort_index() fixes the slice order to (0, 1) so the hard-coded labels always match
df_stg[df_stg["Churn"]=='Yes']["SeniorCitizen"].value_counts().sort_index().plot.pie(explode=[0,0.25], autopct='%1.2f%%', ax=ax[0], shadow=True, colors=colors,
                                             labels=labels, fontsize=12, startangle=25)
ax[0].set_ylabel('Ratio of Senior Citizens for Churn = Yes', fontsize=14)

df_stg[df_stg["Churn"]=='No']["SeniorCitizen"].value_counts().sort_index().plot.pie(explode=[0,0.25], autopct='%1.2f%%', ax=ax[1], shadow=True, colors=colors,
                                             labels=labels, fontsize=12, startangle=25)
ax[1].set_ylabel('Ratio of Senior Citizens for Churn = No', fontsize=14)

The proportion of senior citizens in the sample of churned customers is about double that of retained customers.
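The comparison can also be read off a row-normalised cross-tabulation rather than from the pie charts. This is a minimal sketch on made-up rows, chosen only to show the shape of the calculation and not taken from the Telco data.

```python
import pandas as pd

# Made-up illustrative rows, NOT taken from the Telco dataset.
toy = pd.DataFrame({
    "Churn": ["Yes"] * 4 + ["No"] * 8,
    "SeniorCitizen": [1, 0, 0, 0,
                      1, 0, 0, 0, 0, 0, 0, 0],
})

# normalize="index" turns each row (churn group) into proportions that sum to 1
shares = pd.crosstab(toy["Churn"], toy["SeniorCitizen"], normalize="index")
senior_share = shares[1]  # column for SeniorCitizen == 1

# How many times larger the senior share is among churned customers
ratio = senior_share["Yes"] / senior_share["No"]
print(shares)
print(ratio)
```

With `df_stg` in place of `toy`, `ratio` gives the factor by which the share of senior citizens among churned customers exceeds that among retained customers.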

In [None]:
df_stg["Dependents"].value_counts()

In [None]:
f, ax = plt.subplots(1,2, figsize=(16,8))

colors = ["#FA5858", "#64FE2E"]
labels = "No partner" ,"Has partner"

plt.suptitle('Churn vs Partner', fontsize=20)

# sort_index() fixes the slice order to ('No', 'Yes'); value_counts() alone orders by
# frequency, which would mislabel a group where 'Yes' is the more common value
df_stg[df_stg["Churn"]=='Yes']["Partner"].value_counts().sort_index().plot.pie(explode=[0,0.25], autopct='%1.2f%%', ax=ax[0], shadow=True, colors=colors,
                                             labels=labels, fontsize=12, startangle=25)
ax[0].set_ylabel('Ratio of Partner for Churn = Yes', fontsize=14)

df_stg[df_stg["Churn"]=='No']["Partner"].value_counts().sort_index().plot.pie(explode=[0,0.25], autopct='%1.2f%%', ax=ax[1], shadow=True, colors=colors,
                                             labels=labels, fontsize=12, startangle=25)
ax[1].set_ylabel('Ratio of Partner for Churn = No', fontsize=14)

In [None]:
f, ax = plt.subplots(1,2, figsize=(16,8))

colors = ["#FA5858", "#64FE2E"]
labels = "No Dependents" ,"Has Dependents"

plt.suptitle('Churn vs Dependents', fontsize=20)

# sort_index() fixes the slice order to ('No', 'Yes') so the hard-coded labels always match
df_stg[df_stg["Churn"]=='Yes']["Dependents"].value_counts().sort_index().plot.pie(explode=[0,0.25], autopct='%1.2f%%', ax=ax[0], shadow=True, colors=colors,
                                             labels=labels, fontsize=12, startangle=85)
ax[0].set_ylabel('Ratio of Dependents for Churn = Yes', fontsize=14)

df_stg[df_stg["Churn"]=='No']["Dependents"].value_counts().sort_index().plot.pie(explode=[0,0.25], autopct='%1.2f%%', ax=ax[1], shadow=True, colors=colors,
                                             labels=labels, fontsize=12, startangle=25)
ax[1].set_ylabel('Ratio of Dependents for Churn = No', fontsize=14)