## Predicting Bank Customer Churn with an Artificial Neural Network

In this project, we will build an Artificial Neural Network to predict which bank customers are likely to leave the bank.

In [1]:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [2]:
# Importing the dataset
dataset = pd.read_csv('Churn_Modelling.csv')

### Data Prep

In [3]:
dataset.head(10)

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2,0.0,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1,0.0,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0
5,6,15574012,Chu,645,Spain,Male,44,8,113755.78,2,1,0,149756.71,1
6,7,15592531,Bartlett,822,France,Male,50,7,0.0,2,1,1,10062.8,0
7,8,15656148,Obinna,376,Germany,Female,29,4,115046.74,4,1,0,119346.88,1
8,9,15792365,He,501,France,Male,44,4,142051.07,2,0,1,74940.5,0
9,10,15592389,H?,684,France,Male,27,2,134603.88,1,1,1,71725.73,0


In [4]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
RowNumber          10000 non-null int64
CustomerId         10000 non-null int64
Surname            10000 non-null object
CreditScore        10000 non-null int64
Geography          10000 non-null object
Gender             10000 non-null object
Age                10000 non-null int64
Tenure             10000 non-null int64
Balance            10000 non-null float64
NumOfProducts      10000 non-null int64
HasCrCard          10000 non-null int64
IsActiveMember     10000 non-null int64
EstimatedSalary    10000 non-null float64
Exited             10000 non-null int64
dtypes: float64(2), int64(9), object(3)
memory usage: 1.1+ MB


In [5]:
dataset.describe()

Unnamed: 0,RowNumber,CustomerId,CreditScore,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
count,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0
mean,5000.5,15690940.0,650.5288,38.9218,5.0128,76485.889288,1.5302,0.7055,0.5151,100090.239881,0.2037
std,2886.89568,71936.19,96.653299,10.487806,2.892174,62397.405202,0.581654,0.45584,0.499797,57510.492818,0.402769
min,1.0,15565700.0,350.0,18.0,0.0,0.0,1.0,0.0,0.0,11.58,0.0
25%,2500.75,15628530.0,584.0,32.0,3.0,0.0,1.0,0.0,0.0,51002.11,0.0
50%,5000.5,15690740.0,652.0,37.0,5.0,97198.54,1.0,1.0,1.0,100193.915,0.0
75%,7500.25,15753230.0,718.0,44.0,7.0,127644.24,2.0,1.0,1.0,149388.2475,0.0
max,10000.0,15815690.0,850.0,92.0,10.0,250898.09,4.0,1.0,1.0,199992.48,1.0


Okay... we have a dataset with 10,000 bank customers and we are trying to predict which customers are at risk of leaving the bank given historical information. Our dataset has 14 different features, so let's check them out:
 - **RowNumber** - Index of the dataset
 - **CustomerID** - Unique customer ID number
 - **Surname** - Customer's surname
 - **CreditScore** - Customer's credit score
 - **Geography** - Where is the customer located?
 - **Gender** - What is the customer's gender?
 - **Age** - Customer's age
 - **Tenure** - How long the customer has been with the bank
 - **Balance** - How much money does the customer have in their bank account?
 - **NumOfProducts** - How many bank products has the customer signed up for?
 - **HasCrCard** - Whether or not the customer has a credit card
 - **IsActiveMember** - If the customer is active or not.
 - **EstimatedSalary** - Estimation of customer's salary
 - **Exited** - Whether or not the customer is still with the bank.

In [8]:
# To predict, we will not use RowNumber, CustomerID, or Surname, or Exited (predicting this feature)
X = dataset.iloc[:,3:13].values

In [9]:
# Set our prediction feature
y = dataset['Exited']

In [10]:
X

array([[619, 'France', 'Female', ..., 1, 1, 101348.88],
       [608, 'Spain', 'Female', ..., 0, 1, 112542.58],
       [502, 'France', 'Female', ..., 1, 0, 113931.57],
       ..., 
       [709, 'France', 'Female', ..., 0, 1, 42085.58],
       [772, 'Germany', 'Male', ..., 1, 0, 92888.52],
       [792, 'France', 'Female', ..., 1, 0, 38190.78]], dtype=object)

Now we need to convert our categorical features into numerical values. The only features we need to change are **Geography** and **Gender**. Geography only consists of *France, Spain*, and *Germany*, and Gender only consists of *Male* and *Female*.

In [11]:
# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

In [12]:
labelencoder_X_1 = LabelEncoder()

In [14]:
# Index at 1 is Geography
# Call .fit_transform of our Geography category
X[:,1] = labelencoder_X_1.fit_transform(X[:,1])

In [15]:
# And do the same for the Gender feature
labelencoder_X_2 = LabelEncoder()
X[:,2] = labelencoder_X_2.fit_transform(X[:,2])

In [17]:
X

array([[619, 0, 0, ..., 1, 1, 101348.88],
       [608, 2, 0, ..., 0, 1, 112542.58],
       [502, 0, 0, ..., 1, 0, 113931.57],
       ..., 
       [709, 0, 0, ..., 0, 1, 42085.58],
       [772, 1, 1, ..., 1, 0, 92888.52],
       [792, 0, 0, ..., 1, 0, 38190.78]], dtype=object)

In [18]:
# Now we will create dummy variables for our Geography feature, so we are only working with 1s and 0s
# We also want to refrain from the dummy variable trap, so we must only use 2 of the 3 dummy variable columns
onehotencoder = OneHotEncoder(categorical_features = [1])
X = onehotencoder.fit_transform(X).toarray()
X = X[:, 1:]

In [19]:
X

array([[  0.00000000e+00,   0.00000000e+00,   6.19000000e+02, ...,
          1.00000000e+00,   1.00000000e+00,   1.01348880e+05],
       [  0.00000000e+00,   1.00000000e+00,   6.08000000e+02, ...,
          0.00000000e+00,   1.00000000e+00,   1.12542580e+05],
       [  0.00000000e+00,   0.00000000e+00,   5.02000000e+02, ...,
          1.00000000e+00,   0.00000000e+00,   1.13931570e+05],
       ..., 
       [  0.00000000e+00,   0.00000000e+00,   7.09000000e+02, ...,
          0.00000000e+00,   1.00000000e+00,   4.20855800e+04],
       [  1.00000000e+00,   0.00000000e+00,   7.72000000e+02, ...,
          1.00000000e+00,   0.00000000e+00,   9.28885200e+04],
       [  0.00000000e+00,   0.00000000e+00,   7.92000000e+02, ...,
          1.00000000e+00,   0.00000000e+00,   3.81907800e+04]])

### Training the Data

Now that our data is properly formatted for analysis, let's train our model.

In [20]:
from sklearn.model_selection import train_test_split

In [21]:
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2, random_state=0)

To prevent one independent variable from dominating another one, we must scale our features so all of the values fall on a normal distribution straddling 0.

In [22]:
from sklearn.preprocessing import StandardScaler

In [23]:
# create a standard scaler instance
sc = StandardScaler()

In [24]:
X_train = sc.fit_transform(X_train)
X_test = sc.fit_transform(X_test)

In [25]:
X_train

array([[-0.5698444 ,  1.74309049,  0.16958176, ...,  0.64259497,
        -1.03227043,  1.10643166],
       [ 1.75486502, -0.57369368, -2.30455945, ...,  0.64259497,
         0.9687384 , -0.74866447],
       [-0.5698444 , -0.57369368, -1.19119591, ...,  0.64259497,
        -1.03227043,  1.48533467],
       ..., 
       [-0.5698444 , -0.57369368,  0.9015152 , ...,  0.64259497,
        -1.03227043,  1.41231994],
       [-0.5698444 ,  1.74309049, -0.62420521, ...,  0.64259497,
         0.9687384 ,  0.84432121],
       [ 1.75486502, -0.57369368, -0.28401079, ...,  0.64259497,
        -1.03227043,  0.32472465]])

### Build the Artificial Neural Network

Steps for implementing an ANN.

<img src="ANN_steps.png">

In [26]:
# import Keras modules
import keras

Using TensorFlow backend.


In [27]:
from keras.models import Sequential

In [28]:
from keras.layers import Dense

In [29]:
# Initializing the ANN
classifier = Sequential()

For the activation function for the hidden layers, we will use the Rectifier Function, and we will use the Sigmoid Function for the output layer. This will provide us with a % likelihood of a customer who is likely to churn.

In [30]:
# Adding the input layer and the first hidden layer
# Use the sequential.add() method and call Dense
# Units is the number of nodes to add to the hidden layer. Rule of thumb says use the average between input layer & output layer
# kernel_initializer='uniform' sets our inital weights to uniform random values
# activation = 'relu' means Rectifier function for activation function of this layer
# input_dim = 11 means we have 11 independent variables in our input layer
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11))

In [31]:
# Adding a 2nd hidden layer
# same as above, except we don't need to specify input_dim
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))

In [32]:
# Adding the output layer
# units = 1 because we are only outputting one value
# change our activation function to 'sigmoid'
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))

In [33]:
# Compiling the ANN
# optimizer is the algorithm we wants to use to find the optimal set of weights. 'adam' = stochastic gradient descent
# loss = 'binary_crossentropy' means our loss function is logarithmic w/ 2 categories
# metrics = ['accuracy'] means we are using accuracy to evaluate our model
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

In [34]:
# Fitting the ANN to the Training set
# batch_size is the number of observations before we update weights
# epochs is the number of trials to fit model and update weights
classifier.fit(X_train, y_train, batch_size = 10, epochs = 100)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

Epoch 88/100
Epoch 89/100
Epoch 90/100
Epoch 91/100
Epoch 92/100
Epoch 93/100
Epoch 94/100
Epoch 95/100
Epoch 96/100
Epoch 97/100
Epoch 98/100
Epoch 99/100
Epoch 100/100


<keras.callbacks.History at 0x117fc8750>

### Viewing Results

In [35]:
# Predicting the Test set results
predictions = classifier.predict(X_test)
# set threshold to 50%
predictions = (predictions > 0.5)

In [37]:
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix, classification_report
print(confusion_matrix(y_test,predictions))
print(classification_report(y_test,predictions))

[[1535   60]
 [ 212  193]]
             precision    recall  f1-score   support

          0       0.88      0.96      0.92      1595
          1       0.76      0.48      0.59       405

avg / total       0.86      0.86      0.85      2000



Overall, not bad. We converged to an accuracy of around 86%. This means that our model is 86% accurate at predicting which customers are likely to leave the bank given certain information about them.

 - We accurately predicted 1535 that did not churn.
 - For 60 people who did churn, we falsely predicted that they did not.
 - For 212 people who did not churn, we falsely predicted they did.
 - We accurately predicted 193 people who did churn.

### Single Customer Prediction

Let's predict whether the following person is likely to churn or not.

 - Geography: France
 - Credit Score: 600
 - Gender: Male
 - Age: 40 years old
 - Tenure: 3 years
 - Balance: 60000
 - Number of Products: 2
 - Does this customer have a credit card ? Yes
 - Is this customer an Active Member: Yes
 - Estimated Salary: 50000

In [38]:
new_customer = classifier.predict(sc.transform(np.array([[0.0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]])))

In [41]:
new_customer# = (new_customer > 0.5)

array([[ 0.0182485]], dtype=float32)

In [42]:
new_customer = (new_customer > 0.5)

In [44]:
print new_customer

[[False]]


New Customer is not predicted to churn.