# Artificial Neural Network for Customer’s Churn Prediction


Pooja Umathe


Churn prediction can be a great asset to a business's retention strategy, because it lets the business act before customers leave. We will build a working model in Python, applied to a banking dataset, that tells us whether a customer is likely to leave the bank. Churn rate (sometimes called attrition rate), in its broadest sense, is a measure of the number of individuals or items moving out of a collective group over a specific period.

# What are neural networks?
Neural networks, more precisely Artificial Neural Networks (ANNs), are machine learning (ML) models loosely inspired by how the human brain works. ANNs are not a solution for every problem, but they often give good results, alone or combined with other techniques, on a variety of ML tasks. They are most commonly used for classification and clustering, and they can be used for regression tasks as well.

# Let's begin
In our dataset we are dealing with churn modeling, i.e. we will write an Artificial Neural Network to find out which customers are leaving the bank and how that depends on their attributes. This is a binary (0-1) classification problem: 1 if the customer leaves and 0 if the customer stays.

In [42]:
# Let's import our Python libraries and our dataset, creating X for our independent variables and y for our dependent variable
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd


In [43]:
# Importing the dataset
dataset = pd.read_csv('Bank_Churn.csv')
dataset.head(5)

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2,0.0,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1,0.0,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0


In [44]:
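# Looking at the features, we can see that the row number, customer ID and surname have no relation to whether a customer leaves the bank.
# We therefore keep columns 3 to 12 as features (X) and column 13, Exited, as the target (y).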
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values

In [45]:
X

array([[619, 'France', 'Female', ..., 1, 1, 101348.88],
       [608, 'Spain', 'Female', ..., 0, 1, 112542.58],
       [502, 'France', 'Female', ..., 1, 0, 113931.57],
       ...,
       [709, 'France', 'Female', ..., 0, 1, 42085.58],
       [772, 'Germany', 'Male', ..., 1, 0, 92888.52],
       [792, 'France', 'Female', ..., 1, 0, 38190.78]], dtype=object)

Once we have quality data and have selected the right target, we prepare the data for the model. In general we do not need big data to create a model: if our variables are significant, modelling with a few hundred observations could work. We always need to test our models to check that everything works correctly. In our example we work with a dataset of 10,000 rows, which we split into 2 parts. The bigger part, 80% of the data, will be used to train the model, while the remaining 20% will be used to test the model and measure its accuracy.

But our model needs numerical input, so we need to encode the categorical data as numbers. In this case we have Geography (France, Spain and Germany) and Gender (Male and Female). LabelEncoder assigns codes in alphabetical order, so Geography becomes 0, 1, 2 for France, Germany and Spain, and Gender becomes 0, 1 for Female and Male.

In [46]:
# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X_1 = LabelEncoder()
X[:, 1] = labelencoder_X_1.fit_transform(X[:, 1])
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])

In [47]:
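# Now creating dummy variables for Geography (column 1)
# OneHotEncoder(categorical_features = [1]) was removed in scikit-learn 0.22, so we select the column with ColumnTransformer
from sklearn.compose import ColumnTransformer
ct = ColumnTransformer([('geography', OneHotEncoder(), [1])], remainder = 'passthrough', sparse_threshold = 0)
X = ct.fit_transform(X).astype(float)
# Drop the first dummy column to avoid the dummy variable trap
X = X[:, 1:]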
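
To make the encoding concrete, here is a minimal pure-Python sketch of what label encoding followed by one-hot encoding does to a Geography column, and why one dummy column can be dropped. The helper names `label_encode` and `one_hot` are hypothetical and written only for illustration; they are not scikit-learn functions, and the four-row column is a toy sample.

```python
def label_encode(values):
    """Map each category to an integer code, in sorted (alphabetical) order."""
    classes = sorted(set(values))
    codes = {category: index for index, category in enumerate(classes)}
    return [codes[value] for value in values], classes

def one_hot(codes, n_classes):
    """Turn each integer code into a row of 0/1 dummy variables."""
    return [[1.0 if code == k else 0.0 for k in range(n_classes)] for code in codes]

# Toy Geography column (assumption: a hand-picked sample, not the full dataset)
geography = ['France', 'Spain', 'France', 'Germany']

codes, classes = label_encode(geography)   # classes are sorted alphabetically
dummies = one_hot(codes, len(classes))     # one column per country

# Drop the first dummy column (France). A row of all zeros now means France,
# so no information is lost and the remaining columns are not redundant.
reduced = [row[1:] for row in dummies]

for country, row in zip(geography, reduced):
    print(country, row)
```

The two remaining columns stand for Germany and Spain, which is why the first two entries of each row of X above are both 0 for a French customer.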

In [48]:
X

array([[0.0000000e+00, 0.0000000e+00, 6.1900000e+02, ..., 1.0000000e+00,
        1.0000000e+00, 1.0134888e+05],
       [0.0000000e+00, 1.0000000e+00, 6.0800000e+02, ..., 0.0000000e+00,
        1.0000000e+00, 1.1254258e+05],
       [0.0000000e+00, 0.0000000e+00, 5.0200000e+02, ..., 1.0000000e+00,
        0.0000000e+00, 1.1393157e+05],
       ...,
       [0.0000000e+00, 0.0000000e+00, 7.0900000e+02, ..., 0.0000000e+00,
        1.0000000e+00, 4.2085580e+04],
       [1.0000000e+00, 0.0000000e+00, 7.7200000e+02, ..., 1.0000000e+00,
        0.0000000e+00, 9.2888520e+04],
       [0.0000000e+00, 0.0000000e+00, 7.9200000e+02, ..., 1.0000000e+00,
        0.0000000e+00, 3.8190780e+04]])

In [49]:
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)


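Before training, it is important to apply feature scaling to the data. Standardization puts all features on the same scale, so that no feature dominates just because of its units, and it also makes the computation faster. The scaler is fitted on the training set only and then applied unchanged to the test set.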

In [50]:
#Feature scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
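
As a rough illustration of what standardization computes, the sketch below scales a single feature by hand: subtract the training mean and divide by the training standard deviation (the population standard deviation, which is what StandardScaler uses). The function names `fit_scaler` and `transform` are hypothetical, and the values are a toy sample of ages, not the real training split.

```python
import statistics

def fit_scaler(column):
    """Learn the mean and population standard deviation from training data."""
    return statistics.mean(column), statistics.pstdev(column)

def transform(column, mean, std):
    """Standardize values with statistics learned elsewhere."""
    return [(value - mean) / std for value in column]

# Toy Age values (assumption: illustrative only)
train_ages = [42.0, 41.0, 42.0, 39.0]
test_ages = [43.0]

# Fit on the training data only...
mean, std = fit_scaler(train_ages)

# ...then apply the same mean and std to both sets, so the test set
# does not leak any information into the preprocessing.
train_scaled = transform(train_ages, mean, std)
test_scaled = transform(test_ages, mean, std)

print(train_scaled)
print(test_scaled)
```

After scaling, the training values have mean 0 and standard deviation 1; the test values generally do not, because they are scaled with the training statistics.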

Now we can start to create our Artificial Neural Network. To do this, we need to import the Keras library.

Steps involved in training the ANN with Stochastic Gradient Descent:

1) Randomly initialize the weights to small numbers close to 0 (but not 0).
2) Input the first observation of your dataset in the input layer, each feature in one input node.
3) Forward propagation: from left to right, the neurons are activated in a way that the impact of each neuron's activation is limited by the weights. Propagate the activations until getting the predicted result y.
4) Compare the predicted result with the actual result. Measure the generated error.
5) Back-propagation: from right to left, the error is back-propagated. Update the weights according to how much they are responsible for the error. The learning rate decides by how much we update the weights.
6) Repeat steps 2 to 5 and update the weights after each observation (online, or stochastic, learning). Or: repeat steps 2 to 5 but update the weights only after a batch of observations (batch learning).
7) When the whole training set has passed through the ANN, that makes an epoch. Redo more epochs.
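
The steps above can be sketched in a few lines of plain Python. This is a simplified illustration, not the Keras implementation: it assumes a single sigmoid neuron instead of a multi-layer network, a toy dataset (the logical AND of two inputs), and a hand-picked learning rate and number of epochs. The names `train_sgd` and `predict` are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_sgd(X, y, learning_rate=0.5, epochs=500, seed=0):
    rng = random.Random(seed)
    # Step 1: small random weights close to 0
    weights = [rng.uniform(-0.05, 0.05) for _ in range(len(X[0]))]
    bias = 0.0
    for _ in range(epochs):                        # Step 7: one pass = one epoch
        for features, target in zip(X, y):         # Step 2: one observation at a time
            # Step 3: forward propagation
            z = sum(w * x for w, x in zip(weights, features)) + bias
            predicted = sigmoid(z)
            # Step 4: compare prediction and target; with a sigmoid output and
            # binary cross-entropy loss, the gradient w.r.t. z is (predicted - target)
            error = predicted - target
            # Steps 5 and 6: update each weight in proportion to its share of
            # the error, right after this observation
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias

def predict(weights, bias, features):
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if sigmoid(z) > 0.5 else 0

# Toy data (assumption): output is 1 only when both inputs are 1
X_toy = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y_toy = [0, 0, 0, 1]

weights, bias = train_sgd(X_toy, y_toy)
predictions = [predict(weights, bias, row) for row in X_toy]
print(predictions)
```

In a real network, step 5 applies the same idea layer by layer through the chain rule, and a batch size larger than 1 averages the gradients of several observations before each update.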

In [51]:
# Importing the Keras libraries and packages
import keras
from keras.models import Sequential
from keras.layers import Dense

We now need to define the number of hidden layers and the number of nodes in each hidden layer. Our input layer will have as many nodes as we have independent variables (in our example, 11), and our output layer will have 1 node for y, our single dependent variable.

In [52]:
# Initialising the ANN
classifier = Sequential()

We will use the Dense class to add fully connected layers. A rule of thumb for the number of nodes in a hidden layer, based on experiments, is to take the average of the number of nodes in the input and output layers. In this case that gives (11 + 1) / 2 = 6 nodes.

In [53]:
# Adding the input layer and the first hidden layer
# (units and kernel_initializer are the Keras 2 names for the older output_dim and init arguments)
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11))

# Adding the second hidden layer
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))

# Adding the output layer: one sigmoid node that outputs the probability of churn
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))

# Compiling the ANN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
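
As a quick sanity check on this architecture, the following sketch counts the trainable parameters of the 11-6-6-1 network by hand. It assumes each fully connected layer has one weight per input-unit pair plus one bias per unit; the helper name `dense_parameter_count` is hypothetical.

```python
# Layer widths: 11 inputs, two hidden layers of 6 nodes, 1 output node
layer_sizes = [11, 6, 6, 1]

def dense_parameter_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Rule of thumb from the text: average of input and output layer sizes
hidden_units = (layer_sizes[0] + layer_sizes[-1]) // 2

counts = [dense_parameter_count(n_in, n_out)
          for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total = sum(counts)

print(hidden_units)  # 6
print(counts)        # [72, 42, 7]
print(total)         # 121
```

With only 121 parameters and 8,000 training rows, this is a very small network, which is one reason it trains quickly.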


In [54]:
# Fitting the ANN to the Training set
classifier.fit(X_train, y_train, batch_size = 10, epochs = 100)

  


Epoch 1/100
...

<keras.callbacks.History at 0xa0e8d42ba8>

In [55]:
# Making the predictions and evaluating the model
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)


In [56]:
# Creating the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

In [57]:
cm

array([[1547,   48],
       [ 261,  144]], dtype=int64)

In [58]:
Accuracy = (1547+144)/(48+261+1547+144)
Accuracy

0.8455

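The accuracy of the Neural Network model on the test set is about 84.5%, which is decent: always predicting that a customer stays would give 1595/2000 = 79.75%. Note, however, that the model identifies only 144 of the 405 customers who actually left.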
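
Accuracy is only one way to read the confusion matrix. The sketch below derives a few more figures from the same four numbers, assuming scikit-learn's layout (rows are actual classes, columns are predicted classes, class 0 first). The variable names are chosen for illustration.

```python
# Confusion matrix from the test set, copied from the output above
cm = [[1547, 48],
      [261, 144]]

(tn, fp), (fn, tp) = cm
total = tn + fp + fn + tp

accuracy = (tn + tp) / total    # share of all predictions that are correct
baseline = (tn + fp) / total    # accuracy of always predicting "stays"
precision = tp / (tp + fp)      # share of predicted churners who really left
recall = tp / (tp + fn)         # share of actual churners the model caught

print(f"accuracy:  {accuracy:.4f}")
print(f"baseline:  {baseline:.4f}")
print(f"precision: {precision:.4f}")
print(f"recall:    {recall:.4f}")
```

For a retention campaign, recall is often the figure to watch, since every missed churner is a customer the bank had no chance to win back.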