# Telecom Churn Prediction - Modelling

Because the telecom market is highly competitive, there is always a risk of losing existing customers. Every customer who churns is lost revenue, and replacing that customer costs even more, because the company has to spend a significant amount on acquisition before a new customer becomes profitable.

Preventing customer churn is therefore a necessity. The aim is to correctly identify customers who are likely to churn and target them with relevant offers. The model needs high recall, because each false negative is a customer who was about to leave and was never targeted. It also needs high precision, because every targeted customer costs money and the company wants to spend as little as possible on false positives.

Since the goal is to keep both precision and recall as high as possible, we mainly use the F1 score to compare the models.
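
As a reminder of how the three metrics relate, here is a minimal sketch that computes precision, recall and F1 from raw counts. The helper name `precision_recall_f1` and the counts passed to it are hypothetical and chosen only for illustration; they are not results from this notebook.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) for the positive class from raw counts."""
    # Precision: share of predicted positives that are truly positive
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that the model found
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: 60 churners caught, 20 false alarms, 40 churners missed
precision, recall, f1 = precision_recall_f1(tp=60, fp=20, fn=40)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# precision=0.75 recall=0.60 f1=0.67
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which is why it suits a setting where both kinds of error are costly.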

In [1]:
#load python packages
import os
import pandas as pd
import datetime
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from pathlib import Path

%matplotlib inline

In [2]:
data = Path('data/processed_data.csv')
# load the data file into a Data Frame
df = pd.read_csv(data)
# inspect the dataframe
df.head()

,Unnamed: 0,tenure,MonthlyCharges,TotalCharges,gender_Female,SeniorCitizen_No,Partner_No,Dependents_No,PhoneService_No,MultipleLines_No,...,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaperlessBilling_No,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,0,1,29.85,29.85,1,1,0,1,1,0,...,0,1,0,0,0,0,0,1,0,No
1,1,34,56.95,1889.5,0,1,1,1,0,1,...,0,0,1,0,1,0,0,0,1,No
2,2,2,53.85,108.15,0,1,1,1,0,1,...,0,1,0,0,0,0,0,0,1,Yes
3,3,45,42.3,1840.75,0,1,1,1,1,0,...,0,0,1,0,1,1,0,0,0,No
4,4,2,70.7,151.65,1,1,1,1,0,1,...,0,1,0,0,0,0,0,1,0,Yes


In [3]:
df.drop(columns=['Unnamed: 0'], inplace=True)

In [4]:
df.head()

,tenure,MonthlyCharges,TotalCharges,gender_Female,SeniorCitizen_No,Partner_No,Dependents_No,PhoneService_No,MultipleLines_No,MultipleLines_Yes,...,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaperlessBilling_No,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1,29.85,29.85,1,1,0,1,1,0,0,...,0,1,0,0,0,0,0,1,0,No
1,34,56.95,1889.5,0,1,1,1,0,1,0,...,0,0,1,0,1,0,0,0,1,No
2,2,53.85,108.15,0,1,1,1,0,1,0,...,0,1,0,0,0,0,0,0,1,Yes
3,45,42.3,1840.75,0,1,1,1,1,0,0,...,0,0,1,0,1,1,0,0,0,No
4,2,70.7,151.65,1,1,1,1,0,1,0,...,0,1,0,0,0,0,0,1,0,Yes


In [5]:
# first we import the preprocessing package from the sklearn library
from sklearn import preprocessing

# Explanatory variables: every column except the target
X = df.drop(columns=['Churn'])

# Response variable: the Churn column ('Yes' / 'No')
y = df['Churn']

# Fit a StandardScaler on X so that each feature has zero mean and unit variance
scaler = preprocessing.StandardScaler().fit(X)

# Apply the fitted scaler to obtain the scaled feature matrix
X_scaled = scaler.transform(X)

In [6]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.3, random_state=0)
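
In the cells above the scaler is fitted on the full `X` before the split, so the test rows contribute to the scaling statistics. One possible refinement, sketched here under the assumption that synthetic data from `make_classification` is an acceptable stand-in for the churn table, is to split first and let a `Pipeline` fit the scaler on the training rows only. The names `X_demo`, `y_demo` and `pipe` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the churn data (assumption, not the real file)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=8, weights=[0.74, 0.26], random_state=0
)

# Split first; stratify keeps the class ratio the same in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=0
)

# The pipeline fits the scaler on the training rows only, then the model
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])
pipe.fit(X_tr, y_tr)

# At prediction time the test rows are scaled with the training statistics
test_accuracy = pipe.score(X_te, y_te)
print(f"test accuracy: {test_accuracy:.3f}")
```

With a dataset of this size the effect on the reported scores is usually small, but the pipeline form also carries over unchanged to cross-validation and hyperparameter search.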

In [7]:
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression(random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)

from sklearn.metrics import classification_report
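# classification_report lists classes in sorted label order ('No' before 'Yes'),
# so target_names must follow the same order
target_names = ['No', 'Yes']
print(classification_report(y_test, y_pred, target_names=target_names))

              precision    recall  f1-score   support

          No       0.84      0.89      0.87      1555
         Yes       0.64      0.54      0.58       555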

    accuracy                           0.80      2110
   macro avg       0.74      0.71      0.73      2110
weighted avg       0.79      0.80      0.79      2110



In [8]:
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test, y_pred)

array([[1391,  164],
       [ 258,  297]], dtype=int64)

Here the number of false negatives is high: 258 of the 555 customers who actually churned were predicted as not churning. We will try other models to see whether we can reduce it.
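
To make the reading of the matrix explicit, the sketch below unpacks the confusion matrix printed above. scikit-learn puts actual classes in rows and predicted classes in columns, both in sorted label order, so with string labels `'No'` comes before `'Yes'`. The variable names are illustrative.

```python
import numpy as np

# Confusion matrix from the logistic regression cell above
# rows = actual class, columns = predicted class, order = ['No', 'Yes']
cm = np.array([[1391, 164],
               [258, 297]])

# ravel() flattens row by row: true negatives, false positives,
# false negatives, true positives (with 'Yes' = churn as the positive class)
tn, fp, fn, tp = cm.ravel()

churn_recall = tp / (tp + fn)       # share of actual churners that were caught
churn_precision = tp / (tp + fp)    # share of predicted churners that really churned

print(f"missed churners (false negatives): {fn}")
print(f"churn recall: {churn_recall:.2f}, churn precision: {churn_precision:.2f}")
```

The recall and precision computed this way match the row with support 555 in the classification report, which confirms that this row describes the churning customers.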

In [9]:
from sklearn import tree
clf = tree.DecisionTreeClassifier().fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(confusion_matrix(y_test, y_pred))
# Class names in sorted label order, as classification_report expects
target_names = ['No', 'Yes']
print(classification_report(y_test, y_pred, target_names=target_names))

[[1236  319]
 [ 308  247]]
              precision    recall  f1-score   support

          No       0.80      0.79      0.80      1555
         Yes       0.44      0.45      0.44       555

    accuracy                           0.70      2110
   macro avg       0.62      0.62      0.62      2110
weighted avg       0.70      0.70      0.70      2110



With its default hyperparameters, the decision tree classifier does not improve on logistic regression: accuracy drops from 0.80 to 0.70 and the number of false negatives rises from 258 to 308.
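
One likely reason is that a tree with default settings keeps splitting until every leaf is pure, which tends to memorise the training set. The sketch below illustrates this on synthetic data (an assumption; it does not use the churn file) by comparing an unrestricted tree with a constrained one. The values `max_depth=4` and `min_samples_leaf=10` are arbitrary illustrative choices, not tuned settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy, imbalanced data standing in for the churn table
X_demo, y_demo = make_classification(
    n_samples=600, n_features=10, n_informative=4,
    weights=[0.74, 0.26], flip_y=0.05, random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=0
)

# Default tree: no depth limit, grows until all leaves are pure
full_tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Constrained tree: limited depth and a minimum leaf size
shallow_tree = DecisionTreeClassifier(
    max_depth=4, min_samples_leaf=10, random_state=0
).fit(X_tr, y_tr)

for name, model in [("default", full_tree), ("constrained", shallow_tree)]:
    print(
        f"{name:12s} depth={model.get_depth():2d} "
        f"train acc={model.score(X_tr, y_tr):.3f} "
        f"test acc={model.score(X_te, y_te):.3f}"
    )
```

A large gap between training and test accuracy for the default tree is the usual sign of overfitting; whether a constrained tree actually helps on the churn data would have to be checked on the real split.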

In [11]:
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier().fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(confusion_matrix(y_test, y_pred))

# Class names in sorted label order, as classification_report expects
target_names = ['No', 'Yes']
print(classification_report(y_test, y_pred, target_names=target_names))

[[1396  159]
 [ 288  267]]
              precision    recall  f1-score   support

          No       0.83      0.90      0.86      1555
         Yes       0.63      0.48      0.54       555

    accuracy                           0.79      2110
   macro avg       0.73      0.69      0.70      2110
weighted avg       0.78      0.79      0.78      2110



The default random forest still misses many churners (288 false negatives), so logistic regression remains the best-performing model so far. Let's run a randomized search over the random forest hyperparameters to see whether tuning improves it.

In [11]:
from sklearn.model_selection import RandomizedSearchCV
# Number of trees in random forest
n_estimators = [int(x) for x in np.linspace(start = 200, stop = 2000, num = 10)]
# Number of features to consider at every split
# ('auto' behaves like 'sqrt' for classifiers and was removed in scikit-learn 1.3)
max_features = ['auto', 'sqrt']
# Maximum number of levels in tree
max_depth = [int(x) for x in np.linspace(10, 110, num = 11)]
max_depth.append(None)
# Minimum number of samples required to split a node
min_samples_split = [2, 5, 10]
# Minimum number of samples required at each leaf node
min_samples_leaf = [1, 2, 4]
# Method of selecting samples for training each tree
bootstrap = [True, False]
# Create the random grid
random_grid = {'n_estimators': n_estimators,
               'max_features': max_features,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf,
               'bootstrap': bootstrap}
print(random_grid)

{'n_estimators': [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000], 'max_features': ['auto', 'sqrt'], 'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, None], 'min_samples_split': [2, 5, 10], 'min_samples_leaf': [1, 2, 4], 'bootstrap': [True, False]}


In [12]:
# Use the random grid to search for best hyperparameters
# First create the base model to tune
rf = RandomForestClassifier()
# Random search of parameters, using 3 fold cross validation, 
# search across 100 different combinations, and use all available cores
rf_random = RandomizedSearchCV(estimator = rf, param_distributions = random_grid, n_iter = 100, cv = 3, verbose=2, random_state=42, n_jobs = -1)
# Fit the random search model
rf_random.fit(X_train, y_train)

Fitting 3 folds for each of 100 candidates, totalling 300 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:  1.3min
[Parallel(n_jobs=-1)]: Done 154 tasks      | elapsed:  5.8min
[Parallel(n_jobs=-1)]: Done 300 out of 300 | elapsed: 11.5min finished


RandomizedSearchCV(cv=3, estimator=RandomForestClassifier(), n_iter=100,
                   n_jobs=-1,
                   param_distributions={'bootstrap': [True, False],
                                        'max_depth': [10, 20, 30, 40, 50, 60,
                                                      70, 80, 90, 100, 110,
                                                      None],
                                        'max_features': ['auto', 'sqrt'],
                                        'min_samples_leaf': [1, 2, 4],
                                        'min_samples_split': [2, 5, 10],
                                        'n_estimators': [200, 400, 600, 800,
                                                         1000, 1200, 1400, 1600,
                                                         1800, 2000]},
                   random_state=42, verbose=2)

In [13]:
rf_random.best_params_

{'n_estimators': 1800,
 'min_samples_split': 2,
 'min_samples_leaf': 4,
 'max_features': 'auto',
 'max_depth': 10,
 'bootstrap': False}
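
Note that `RandomizedSearchCV` ranks candidates with the estimator's default score when no `scoring` argument is given, which for a classifier is accuracy. Since the models here are compared on F1, one option is to pass an F1 scorer for the churn class. The following is a minimal sketch on synthetic data with a deliberately small grid so that it runs quickly; the grid values and the names `churn_f1`, `param_distributions` and `search` are illustrative assumptions, not the settings used above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the churn data, with 'Yes'/'No' string labels
X_num, y_num = make_classification(
    n_samples=400, n_features=8, weights=[0.74, 0.26], random_state=0
)
y_demo = np.where(y_num == 1, "Yes", "No")

# F1 for the churn class; pos_label is needed because the labels are strings
churn_f1 = make_scorer(f1_score, pos_label="Yes")

# Small illustrative grid (assumption), far smaller than the one in the notebook
param_distributions = {
    "n_estimators": [20, 50],
    "max_depth": [3, 6, None],
    "min_samples_leaf": [1, 2, 4],
}

search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,
    scoring=churn_f1,   # rank candidates by churn-class F1 instead of accuracy
    cv=3,
    random_state=42,
)
search.fit(X_num, y_demo)

print(search.best_params_)
print(f"best cross-validated churn F1: {search.best_score_:.3f}")
```

With this change `best_params_` is the combination with the highest mean cross-validated F1 for the `'Yes'` class, which matches the comparison criterion stated at the top of the notebook.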

In [14]:
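best_grid = rf_random.best_estimator_
# best_estimator_ is already refit on the whole training set (refit=True by default),
# so this extra fit only retrains the same configuration
best_grid.fit(X_train, y_train)
y_pred = best_grid.predict(X_test)
# Class names in sorted label order ('No' before 'Yes'), as classification_report expects
print(classification_report(y_test, y_pred, target_names=['No', 'Yes']))

              precision    recall  f1-score   support

          No       0.84      0.91      0.87      1555
         Yes       0.65      0.50      0.57       555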

    accuracy                           0.80      2110
   macro avg       0.74      0.70      0.72      2110
weighted avg       0.79      0.80      0.79      2110



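Tuning improves the random forest slightly: for the churn class (the row with support 555), recall rises from 0.48 to 0.50 and F1 from 0.54 to 0.57. Logistic regression, with a recall of 0.54 and an F1 of 0.58 on the same class, is still ahead.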

In [13]:
from sklearn.svm import SVC

model_svm = SVC(kernel='linear') 
model_svm.fit(X_train,y_train)
y_pred = model_svm.predict(X_test)


print(confusion_matrix(y_test, y_pred))

# Class names in sorted label order, as classification_report expects
target_names = ['No', 'Yes']
print(classification_report(y_test, y_pred, target_names=target_names))

[[1398  157]
 [ 260  295]]
              precision    recall  f1-score   support

          No       0.84      0.90      0.87      1555
         Yes       0.65      0.53      0.59       555

    accuracy                           0.80      2110
   macro avg       0.75      0.72      0.73      2110
weighted avg       0.79      0.80      0.80      2110

