# Churn Modeling using Neural Networks and Hyperparameter Tuning

## Introduction

Business problem: A bank has a problem with customer turnover (churn).

Dataset: We have a log of 10,000 customers with attributes such as credit score, gender, age, and balance, plus a yes/no flag indicating whether each customer exited the bank.

Solution: Use a neural network and hyperparameter tuning to build a churn model that predicts whether a customer will leave the bank.

## Data Visualization

In [156]:
# Importing packages
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Importing the dataset
dataset = pd.read_csv('Churn_Modelling.csv')

In [157]:
#Visualizing first few rows
dataset.head()

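| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 3 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 4 | 15701354 | Boni | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |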


In [158]:
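# Getting relevant features for correlation map
# (numeric columns only; Geography and Gender are text and cannot be correlated directly)
corr_map = dataset.iloc[:, 3:].select_dtypes(include='number').corr()
corr_map[['Exited']].sort_values(by='Exited', ascending=False)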

| | Exited |
|---|---|
| Exited | 1.0 |
| Age | 0.285323 |
| Balance | 0.118533 |
| EstimatedSalary | 0.012097 |
| HasCrCard | -0.007138 |
| Tenure | -0.014001 |
| CreditScore | -0.027094 |
| NumOfProducts | -0.04782 |
| IsActiveMember | -0.156128 |


## Data Preprocessing

First, we import our 10,000-row CSV file into a 'dataset' variable. We then split it into the feature variables, stored in 'X', and the label indicating whether each customer exited, stored in 'y'. We skip the first 3 columns when building 'X' because row number, customer ID, and surname are not relevant for this study.

In [160]:
# Importing the dataset
dataset = pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values

Here, we check on the first row of our 'X' dataset.

In [161]:
# Visualizing 'X'
X[0]

array([619, 'France', 'Female', 42, 2, 0.0, 1, 1, 1, 101348.88], dtype=object)

We have 2 categorical variables (Geography and Gender) that we need to convert to numbers. We use scikit-learn's encoders to do this.

In [162]:
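# Encoding categorical data
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
# Gender (column 2) has two values, so a 0/1 label encoding is enough
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])
# Geography (column 1) is one-hot encoded. OneHotEncoder(categorical_features = ...)
# was removed in scikit-learn 0.22, so the column is selected with ColumnTransformer.
# The one-hot columns come first, followed by the remaining columns in their original order.
ct = ColumnTransformer([('geography', OneHotEncoder(), [1])],
                       remainder = 'passthrough', sparse_threshold = 0)
X = ct.fit_transform(X).astype(float)
# Dropping the first dummy column to avoid the dummy variable trap
X = X[:, 1:]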

In [163]:
# After categorical data encoding
X[0]

array([  0.00000000e+00,   0.00000000e+00,   6.19000000e+02,
         0.00000000e+00,   4.20000000e+01,   2.00000000e+00,
         0.00000000e+00,   1.00000000e+00,   1.00000000e+00,
         1.00000000e+00,   1.01348880e+05])
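
To make the encoded row above easier to read, here is a minimal hand-rolled sketch of the same encoding for a single row. It is for illustration only and is not what scikit-learn runs internally. The three Geography levels and the alphabetical ordering of the codes are assumptions inferred from the output, and `encode_row` is a hypothetical helper name.

```python
# Hand-rolled version of the categorical encoding, for illustration only.
GEOGRAPHY_LEVELS = ["France", "Germany", "Spain"]  # assumed levels, alphabetical order
GENDER_CODES = {"Female": 0, "Male": 1}            # assumed codes, alphabetical order


def encode_row(row):
    """Encode one raw row laid out as [CreditScore, Geography, Gender, Age, ...]."""
    credit_score, geography, gender, *rest = row
    # One-hot encode Geography, then drop the first level so that
    # France is represented by two zeros.
    one_hot = [1.0 if geography == level else 0.0 for level in GEOGRAPHY_LEVELS]
    dummies = one_hot[1:]
    # Dummy columns come first, then the remaining columns in their original order.
    return dummies + [float(credit_score), float(GENDER_CODES[gender])] + [float(v) for v in rest]


first_row = [619, "France", "Female", 42, 2, 0.0, 1, 1, 1, 101348.88]
encoded = encode_row(first_row)
print(encoded)
```

The first two values are the Geography dummies, followed by credit score, the encoded gender, and the remaining numeric columns.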

Now we split the dataset into a training set and a test set: 80% of our data will be used to train our model and 20% will be used to test it.

In [164]:
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
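
As a rough sketch of what an 80/20 split does, the snippet below shuffles the row indices and cuts them into two disjoint groups. It is a simplified stand-in, not scikit-learn's implementation, so the rows it selects will differ from those chosen by `train_test_split` even with the same seed. `split_indices` is a hypothetical helper name.

```python
import numpy as np


def split_indices(n_samples, test_size, seed):
    """Return (train_idx, test_idx) as two disjoint, shuffled index arrays."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_size))
    # The first n_test shuffled indices form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


train_idx, test_idx = split_indices(n_samples=10_000, test_size=0.2, seed=0)
print(len(train_idx), len(test_idx))
```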

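Since training a neural network is computationally intensive, we standardize our features so that they are all on a comparable scale, which makes the training stage easier. The scaler is fit on the training set only and then applied to the test set.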

In [165]:
# First row of training set before feature scaling
X_train[0]

array([  0.00000000e+00,   1.00000000e+00,   6.67000000e+02,
         0.00000000e+00,   3.40000000e+01,   5.00000000e+00,
         0.00000000e+00,   2.00000000e+00,   1.00000000e+00,
         0.00000000e+00,   1.63830640e+05])

In [166]:
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

In [167]:
# First row of training set after feature scaling
X_train[0]

array([-0.5698444 ,  1.74309049,  0.16958176, -1.09168714, -0.46460796,
        0.00666099, -1.21571749,  0.8095029 ,  0.64259497, -1.03227043,
        1.10643166])
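
The scaling step can be sketched in plain NumPy as follows. Each column is shifted by the training mean and divided by the training standard deviation, and the same training statistics are reused for the test set. The toy numbers and the helper names `fit_standardizer` and `standardize` are made up for illustration.

```python
import numpy as np


def fit_standardizer(train):
    """Compute per-column mean and standard deviation from the training data."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)  # population standard deviation (ddof=0)
    return mean, std


def standardize(data, mean, std):
    """Apply z = (x - mean) / std column by column."""
    return (data - mean) / std


# Toy data: two features on very different scales (made-up values).
train = np.array([[600.0, 30.0], [700.0, 40.0], [800.0, 50.0]])
test = np.array([[650.0, 35.0]])

mean, std = fit_standardizer(train)
train_scaled = standardize(train, mean, std)
test_scaled = standardize(test, mean, std)  # reuses the training statistics
print(train_scaled)
print(test_scaled)
```

Reusing the training statistics for the test set keeps information about the test data out of the preprocessing step.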

## Developing the Neural Network

We start development of our neural network with TensorFlow and Keras. The Sequential class will be used to initialize our neural network, and the Dense class will be used to add its layers.

In [177]:
# Importing TensorFlow and the Keras libraries with packages
import tensorflow
import keras
from keras.models import Sequential
from keras.layers import Dense

Time to initialize our neural network!

In [178]:
# Initialising the Neural Network
classifier = Sequential()

We add our first hidden layer, which takes 11 inputs and has 6 neurons. As a rule of thumb, we pick a neuron count equal to the average of the input (11) and output (1) counts: (11 + 1) / 2 = 6. We use a rectifier (ReLU) activation function for our hidden layers and a sigmoid function for our final output, so that the output can be read as a probability.

In [179]:
# Adding the input layer and the first hidden layer
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11))

# Adding the second hidden layer
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))

# Adding the output layer
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
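
To illustrate what this 11-6-6-1 architecture computes, here is a minimal NumPy sketch of its forward pass. It is not Keras code: the weights are random, the biases are zero, and the initialization range of ±0.05 is an assumption chosen only for illustration.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


layer_sizes = [11, 6, 6, 1]  # inputs, two hidden layers, one output
rng = np.random.default_rng(0)

# Small uniform random weights and zero biases (illustrative values only).
weights = [rng.uniform(-0.05, 0.05, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]


def forward(x):
    """Run a batch through the hidden ReLU layers and the sigmoid output layer."""
    a = x
    for w, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ w + b)
    return sigmoid(a @ weights[-1] + biases[-1])


n_params = sum(w.size + b.size for w, b in zip(weights, biases))
batch = rng.normal(size=(4, 11))  # four fake, already-scaled customers
probs = forward(batch)
print(n_params)     # (11*6 + 6) + (6*6 + 6) + (6*1 + 1) = 121
print(probs.shape)  # (4, 1)
```

The network is small, with 121 trainable parameters, and the sigmoid output always lies strictly between 0 and 1.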

To compile our neural network, we use the 'Adam' optimizer, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. We use a log loss function, binary cross-entropy, since it works well with our sigmoid output. Our metric will be accuracy, since that is what we are most concerned about for our model.

In [180]:
# Compiling the Neural Network
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
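
For reference, binary cross-entropy can be written in a few lines of NumPy. This is a sketch of the formula rather than Keras's internal implementation, and the clipping constant `eps` is an assumption added to avoid taking the log of zero.

```python
import numpy as np


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so the logarithms stay finite.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(losses))


confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])
print(confident_right, confident_wrong)
```

Confident correct predictions give a loss close to zero, while confident wrong predictions are penalized heavily.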

To fit our neural network to our training set, we use a batch size of 10 and 100 epochs.

In [None]:
# Fitting the Neural Network to the Training set
classifier.fit(X_train, y_train, batch_size = 10, epochs = 100)
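
To get a feel for what these settings mean, the short calculation below counts the weight updates performed during training, assuming the training set holds 8,000 rows (80% of 10,000).

```python
import math

n_train = 8000   # assumed: 80% of the 10,000 customers
batch_size = 10
epochs = 100

# One weight update per batch; the last batch may be smaller than batch_size.
steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, total_updates)  # 800 80000
```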

## Making Predictions and Evaluating the Model

Since the network outputs a probability of churn, we treat any prediction above 50% as 'True'. A different threshold may be more appropriate for more sensitive studies, such as medical ones. Afterwards, we print a confusion matrix to see our results.

In [182]:
# Predicting the Test set results
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)

[[1529   66]
 [ 248  157]]


In [183]:
# Unpacking the confusion matrix
# scikit-learn puts true labels on rows and predicted labels on columns,
# so for labels (0, 1) the layout is [[TN, FP], [FN, TP]]
tn, fp, fn, tp = cm.ravel()
print('True Positives:', tp)
print('True Negatives:', tn)
print('False Positives:', fp)
print('False Negatives:', fn)

True Positives: 157
True Negatives: 1529
False Positives: 66
False Negatives: 248


In [184]:
# Accuracy rate of confusion matrix
print('Accuracy:', (tp+tn)/(tp+tn+fp+fn))

Accuracy: 0.843
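
Accuracy alone can hide how well the churners themselves are detected, so as a supplementary sketch we can also derive precision and recall from the confusion matrix printed above. The layout assumed here is scikit-learn's, with true labels on rows and predicted labels on columns, and with 'exited' (1) as the positive class.

```python
# Confusion matrix copied from the printed output: rows = true, columns = predicted.
cm = [[1529, 66],
      [248, 157]]
(tn, fp), (fn, tp) = cm

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # of the customers flagged as leaving, how many left
recall = tp / (tp + fn)     # of the customers who left, how many were flagged

print(f"Accuracy:  {accuracy:.3f}")   # 0.843
print(f"Precision: {precision:.3f}")  # 0.704
print(f"Recall:    {recall:.3f}")     # 0.388
```

With these numbers, only about 39% of the customers who actually left were flagged, which is worth keeping in mind when reading the accuracy figures.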


### Test Neural Network Model

Here we test our neural network with the following data:
* Geography: France
* Credit score: 650
* Gender: Female
* Age: 20 years old
* Tenure: 2 years
* Balance: 27000 dollars
* Number of products: 2
* Credit card: Yes
* Active member: Yes
* Estimated salary: 50000 dollars

In [20]:
# Making prediction for given values
new_prediction = classifier.predict(
    sc.transform(np.array(
        [[0.0, 0, 650, 0, 20, 2, 27000, 2, 1, 1, 50000]])))
new_prediction_tf = (new_prediction > 0.5)
print('Customer will leave:', new_prediction_tf, new_prediction)

Customer will leave: [[False]] [[ 0.07463048]]


## Evaluating and Tuning the Neural Network

### Evaluating the Neural Network

What is new now is that we use k-fold cross-validation to get a more reliable estimate of our model's accuracy than a single train/test split provides. To do this, we use scikit-learn's cross_val_score. We also use keras.wrappers to wrap our Keras model so that scikit-learn can treat it as an estimator.

In [172]:
# Importing packages and building new classifier with cross validation
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
from keras.models import Sequential
from keras.layers import Dense
def build_classifier():
    classifier = Sequential()
    classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11))
    classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))
    classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
    classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
    return classifier

In [None]:
# Fitting Neural Network to Training set and getting accuracies
classifier = KerasClassifier(build_fn = build_classifier, batch_size = 10, epochs = 100)
accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10)

Our cross-validated average accuracy, printed below, is within about one percentage point of the single test-set result. Time to tune our neural network further.

In [174]:
# Mean and standard deviation of the 10 fold accuracies
mean = accuracies.mean()
std_dev = accuracies.std()
print('Average: ', ('%.4f' % mean))
print('Standard Deviation: ', ('%.4f' % std_dev))

Average:  0.8350
Standard Deviation:  0.0119
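
The idea behind 10-fold cross-validation can be sketched with plain NumPy index arithmetic. This is a simplified, unshuffled version for illustration. scikit-learn's own splitter may differ, for example by stratifying on the class label, and `k_fold_indices` is a hypothetical helper name.

```python
import numpy as np


def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, val_idx) pairs; each fold is the validation set once."""
    folds = np.array_split(np.arange(n_samples), n_folds)
    for i in range(n_folds):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train_idx, val_idx


# Assumed: 8,000 training rows, split into 10 folds.
splits = list(k_fold_indices(n_samples=8000, n_folds=10))
for train_idx, val_idx in splits[:2]:
    print(len(train_idx), len(val_idx))
```

Each model is trained on 7,200 rows and scored on the remaining 800, and the ten scores are then averaged.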


### Tuning the Neural Network

To finally tune our neural network, we use grid search over hyperparameters such as batch size, number of epochs, and optimizer.

In [146]:
# Importing packages and building new classifier with grid search
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV
from keras.models import Sequential
from keras.layers import Dense

In [147]:
# Building our Neural Network with the optimizer left as a parameter (batch size and epochs are set by the grid search)
def build_classifier(optimizer):
    classifier = Sequential()
    classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11))
    classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))
    classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
    classifier.compile(optimizer = optimizer, loss = 'binary_crossentropy', metrics = ['accuracy'])
    return classifier
classifier = KerasClassifier(build_fn = build_classifier)

In [152]:
# Parameters for batch size, epochs, and optimizer functions
parameters = {'batch_size': [16, 32],
              'epochs': [100, 500],
              'optimizer': ['adam', 'rmsprop']}

In [None]:
# Setting up Grid Search
grid_search = GridSearchCV(estimator = classifier,
                           param_grid = parameters,
                           scoring = 'accuracy',
                           cv = 10)
grid_search = grid_search.fit(X_train, y_train)
best_parameters = grid_search.best_params_
best_accuracy = grid_search.best_score_
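
To see why this search is expensive, the sketch below enumerates the parameter grid by hand and counts the model fits it implies. It only counts work and trains nothing.

```python
from itertools import product

parameters = {'batch_size': [16, 32],
              'epochs': [100, 500],
              'optimizer': ['adam', 'rmsprop']}
cv_folds = 10

# Every combination of the listed values is tried.
names = list(parameters)
combinations = [dict(zip(names, values)) for values in product(*parameters.values())]

n_fits = len(combinations) * cv_folds
epoch_passes = sum(c['epochs'] for c in combinations) * cv_folds
print(len(combinations), n_fits, epoch_passes)  # 8 80 24000
```

That is 80 separate training runs totalling 24,000 epochs, before GridSearchCV refits the best combination on the full training set (its default behavior).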

In [154]:
# Getting our best parameters
best_parameters

{'batch_size': 32, 'epochs': 500, 'optimizer': 'rmsprop'}

In [175]:
# Getting our best average
print('Best average: ', ('%.4f' % best_accuracy))

Best average:  0.8511


## Conclusion

Our original model scored 84.3% on the test set, and its 10-fold cross-validated average was 83.50%. With hyperparameter tuning, the best cross-validated average rose to 85.11%.

Possible improvements include increasing the batch size to 64 and the number of epochs to 1000 to see whether accuracy improves further. However, this would call for a GPU: this notebook alone took 10 hours to run on an i5-6400 CPU.