# Information
Predict whether a customer will leave the bank:  
Churn rate of a bank: the annual percentage rate at which customers stop subscribing to a service.  

The columns in order:   
Row Number  
Customer ID  
Lastname  
Credit Score  
Geography  
Gender  
Age  
Tenure  
Balance  
Number of Products  
Has Credit Card  
Is Active Member  
Estimated Salary  
Exited: whether the customer left the bank, recorded after a 6-month observation period.  

You can apply this model to any dataset with many variables and a binary outcome (0 or 1).

This is a classification problem.


In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Part 1: Data Preprocessing

In [2]:
# 1) IMPORT THE DATASET
dataset = pd.read_csv('Churn_Modelling.csv')

In [3]:
X = dataset.iloc[:, 3:13].values # features: we exclude the row number, customer ID, and surname columns, since they have no influence on whether a customer leaves or stays
y = dataset.iloc[:, 13].values # the target: the actual outcome (Exited)
# .values returns only the values as a NumPy array, without the column headings
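
The positional slices above can be checked against the column list given in the introduction. This is a small sketch in plain Python; the names are the ones listed in this notebook, which may differ slightly from the header row of the CSV file.

```python
# Column names in the order listed at the top of this notebook
# (assumed labels; the CSV header may spell them differently).
columns = [
    "Row Number", "Customer ID", "Lastname", "Credit Score", "Geography",
    "Gender", "Age", "Tenure", "Balance", "Number of Products",
    "Has Credit Card", "Is Active Member", "Estimated Salary", "Exited",
]

# A slice 3:13 includes index 3 and excludes index 13, like iloc[:, 3:13].
feature_names = columns[3:13]
target_name = columns[13]

print(feature_names)  # the 10 feature columns, Credit Score to Estimated Salary
print(target_name)    # Exited
```

The first three columns are identifiers, so they are left out of the features.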

Since this is a classification problem, we can use the classification template:

In [4]:
# Classification template

# According to the template, after importing the dataset and creating the dependent and independent variables,
# we would split the dataset into training and test sets. For this dataset, however,
# we first need to encode the categorical variables:

# 2) ENCODING CATEGORICAL DATA: CATEGORICAL TEMPLATE
# The categorical template encodes any categorical data that you have in your dataset.

# Encoding the independent variables: we have two categorical variables,
# country and gender

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])
# Gender is column 2. Labels are sorted alphabetically, so Female = 0 and Male = 1.
# We create dummy variables for the country column (column 1),
# because integer codes 0, 1, 2 would imply an ordering between countries that does not exist.
# With categories sorted alphabetically: France = 1,0,0; Germany = 0,1,0; Spain = 0,0,1.
# The encoder creates three dummy columns and places them before the credit score column.
# OneHotEncoder(categorical_features=[1]) was removed in scikit-learn 0.22;
# ColumnTransformer replaces it and encodes the country strings directly.
ct = ColumnTransformer([('country', OneHotEncoder(), [1])], remainder='passthrough', sparse_threshold=0)
X = ct.fit_transform(X).astype(float)
# We don't need three variables to represent three countries, so
# we delete one dummy variable to avoid falling into the dummy variable trap.
# No country information is lost: France = 0,0; Germany = 1,0; Spain = 0,1
X = X[:, 1:] # keep all the rows in X (inputs) and the columns from 1 to the end
# CHANGED: just the column numbers, plus deleting one dummy variable

# 3) SPLITTING THE DATASET INTO TRAINING AND TEST:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
#CHANGED: just the test size from 0.25 to 0.2 since we have 10,000 observations
#to avoid warning: we changed from sklearn.cross_validation to model_selection

# 4) FEATURE SCALING: 
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
#CHANGED: No change
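
The encoding and scaling steps above can be sketched by hand in NumPy to see what they produce. The toy values are made up for illustration. The sketch assumes the categories are ordered alphabetically (France, Germany, Spain), as scikit-learn's encoders do, and that scaling uses the population standard deviation, as `StandardScaler` does.

```python
import numpy as np

# Toy country column (illustrative values, not taken from the dataset).
countries = np.array(["France", "Spain", "Germany", "France"])

# np.unique returns the sorted categories: France, Germany, Spain.
categories = np.unique(countries)

# One-hot encode: one column per category, 1.0 where the row matches.
one_hot = (countries[:, None] == categories[None, :]).astype(float)

# Drop the first dummy column (France) to avoid the dummy variable trap.
dummies = one_hot[:, 1:]
print(dummies)  # France -> [0, 0], Spain -> [0, 1], Germany -> [1, 0]

# Feature scaling: compute mean and std on the training data only,
# then reuse those same statistics to transform the test data.
train = np.array([[600.0], [700.0], [800.0]])
test = np.array([[700.0], [900.0]])

mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0)

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std
```

Reusing the training statistics on the test set is why the notebook calls `fit_transform` on `X_train` but only `transform` on `X_test`.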

# Part 2: Start building the model

In [5]:
# we're going to import the Keras libraries and packages for the model
#This part is not included in the classification template

#1) IMPORT LIBRARIES
import keras
from keras.models import Sequential #required to initialize the ANN
from keras.layers import Dense #required to build layers
from keras.layers import Dropout #improving the ANN
# 2) INITIALIZE THE ANN
#we need to initialize the ANN, that is defining it as a sequence of layers
classifier = Sequential()
#this object we created is nothing else than the model itself
#since this model is going to be a classifier model, we will just name it classifier
#so this classifier object is nothing else but the future model that we're going to create

# 3) ADD THE LAYERS:
# adding the input layer and the first hidden layer, followed by dropout:
classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu', input_dim=11))
classifier.add(Dropout(rate=0.1))
# Dropout argument:
# rate = fraction of neurons you want to disable (called p in Keras 1).
# With 10 neurons and rate=0.1, on average 1 neuron is disabled in each training iteration.
# Try a higher rate if you still have overfitting.
# Dense arguments:
# units (formerly output_dim): the number of nodes/neurons you want to add in this hidden layer.
# As a rule of thumb, we use the average of the number of nodes in the input layer and the output layer.
# In this dataset we have 11 columns = 11 nodes in the input layer, and 1 node in the output layer,
# so the average is (11 + 1) / 2 = 6 nodes in the hidden layer.
# kernel_initializer (formerly init): how the weights are randomly initialized
# activation = activation function
# input_dim = number of independent variables = columns

# adding the second hidden layer:
classifier.add(Dense(units = 6, kernel_initializer='uniform', activation = 'relu'))
classifier.add(Dropout(rate=0.1))
# adding the output layer
classifier.add(Dense(units = 1, kernel_initializer='uniform', activation = 'sigmoid'))
# If the dependent variable (output) has more than two categories,
# for example 3 categories, you will need to change two things:
# units = 3 (because the output will be one-hot encoded)
# activation = 'softmax'

#4) COMPILE THE ANN
classifier.compile(optimizer = 'adam', loss= 'binary_crossentropy', metrics = ['accuracy'])
#compile arguments:
# optimizer: the weights are only initialized so far; this algorithm updates them efficiently during training
# loss: since we have a binary outcome, we use binary_crossentropy
# if you have more than 2 outcomes, then loss = categorical_crossentropy
# metrics: the criterion that you choose to evaluate your model

#5) FITTING THE ANN TO THE TRAINING SET
classifier.fit(X_train, y_train, batch_size=10, epochs= 100)



Using TensorFlow backend.


Epoch 1/100
...
Epoch 100/100


<keras.callbacks.History at 0x10dd5c710>
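
To make the architecture and loss described in Part 2 more concrete, here is a rough NumPy sketch of one forward pass through an 11-6-6-1 network with ReLU hidden layers and a sigmoid output, followed by the binary cross-entropy loss. It is an illustration only, not how Keras implements training. The helper names, the random batch, and the small uniform weight range are assumptions made for this example, and dropout is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, params):
    """Forward pass: two ReLU hidden layers, then a sigmoid output."""
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = relu(X @ W1 + b1)
    h2 = relu(h1 @ W2 + b2)
    return sigmoid(h2 @ W3 + b3)

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    losses = y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob)
    return float(-np.mean(losses))

# Layer sizes: 11 inputs, two hidden layers of 6 units, 1 output.
sizes = [11, 6, 6, 1]
# Small uniform random weights and zero biases (assumed range for illustration).
params = [
    (rng.uniform(-0.05, 0.05, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]

# A made-up batch of 10 scaled customers and random 0/1 labels.
X_batch = rng.normal(size=(10, 11))
y_batch = rng.integers(0, 2, size=(10, 1)).astype(float)

probs = forward(X_batch, params)
loss = binary_crossentropy(y_batch, probs)
n_params = sum(W.size + b.size for W, b in params)

print(probs.shape, loss, n_params)
```

Before any training, the outputs sit close to 0.5, so the loss starts near ln 2. The optimizer then adjusts the weights to reduce it.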

# Part 3:Making the predictions and evaluating the model:

In [11]:
# classification template continue


# 1) PREDICTING THE TEST SET RESULTS
y_pred = classifier.predict(X_test)
# y_pred holds the predicted probabilities of leaving the bank for the 2000 customers in the test set
# e.g. if the first row of y_pred is 0.23, the first customer has a 23% probability of leaving
# the bank can use this information: sort y_pred in descending order and find the 10% of customers most likely to leave,
# then try to understand why these customers are most likely to leave

# 2) MAKING THE CONFUSION MATRIX
# the confusion matrix needs y_pred as True/False values, not probabilities,
# so we convert the probabilities into predicted results:
# we need to choose a threshold that decides when the predicted result is 1 and when it is 0
# the natural threshold is 0.5; for more sensitive decisions, for example whether a tumor is malignant, we may need a higher threshold
y_pred = (y_pred > 0.5) # True if the probability is > 0.5, otherwise False
# 0.5 or lower: predicted to stay; higher than 0.5: predicted to leave the bank

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
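
The thresholding, confusion matrix, and "top 10% most likely to leave" ideas above can be sketched on a handful of made-up probabilities. The arrays are illustrative, not model output. The matrix follows the scikit-learn layout, with rows for the true class and columns for the predicted class.

```python
import numpy as np

# Made-up true labels and predicted probabilities for 10 customers.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_prob = np.array([0.10, 0.60, 0.80, 0.30, 0.20, 0.90, 0.05, 0.40, 0.55, 0.15])

# Apply the 0.5 threshold to turn probabilities into 0/1 predictions.
y_hat = (y_prob > 0.5).astype(int)

# Build the 2x2 confusion matrix by counting (true, predicted) pairs.
cm = np.zeros((2, 2), dtype=int)
for t, p in zip(y_true, y_hat):
    cm[t, p] += 1
print(cm)  # [[TN, FP], [FN, TP]] -> [[5 1] [1 3]]

# Rank customers by predicted probability, highest first,
# and keep the top 10% (at least one customer).
k = max(1, len(y_prob) // 10)
top_idx = np.argsort(-y_prob)[:k]
print(top_idx)  # [5], the customer with probability 0.90
```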

In [7]:
#let's look at the confusion matrix
cm
#out of 2000 new observations, the run shown below gives 1562 + 117 = 1679 correct predictions and 33 + 288 = 321 incorrect predictions
#these numbers will change with each run of the model

array([[1562,   33],
       [ 288,  117]])

In [8]:
test_accuracy = ((cm[0,0] + cm[1,1]) / 2000) *100
test_accuracy


83.950000000000003
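
Accuracy alone can hide how the model behaves on the customers who actually leave. As a sketch, the same confusion matrix shown above also gives precision and recall for the "leaves" class. The variable names are chosen for this example.

```python
import numpy as np

# Confusion matrix from the run shown above (rows: true, columns: predicted).
cm = np.array([[1562, 33],
               [288, 117]])

tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()  # 1679 / 2000 = 0.8395
precision = tp / (tp + fp)       # 117 / 150 = 0.78
recall = tp / (tp + fn)          # 117 / 405, about 0.289

print(accuracy, precision, recall)
```

In this run the model is right about 84% of the time overall, but it identifies only about 29% of the customers who actually left.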

In [9]:
#let's try to predict if this customer will leave the bank or not, using our model:
#Geography: France
#Credit Score: 600
#Gender: Male
#Age: 40 years old
#Tenure: 3 years
#Balance: $60000
#Number of Products: 2
#Does this customer have a credit card ? Yes
#Is this customer an Active Member: Yes
#Estimated Salary: $50000
new_prediction = classifier.predict(sc.transform(np.array([[0.0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]])))
new_prediction = (new_prediction>0.5)

In [10]:
new_prediction
#False means the customer is predicted not to leave the bank

array([[False]], dtype=bool)
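
Building the input row by hand is error-prone because the column order has to match the preprocessing exactly. The sketch below uses a hypothetical helper, `encode_customer`, which is not part of the notebook. It assumes the encoding used above: the country dummies come first with the France column dropped (France = 0,0; Germany = 1,0; Spain = 0,1), and gender is encoded as Female = 0, Male = 1.

```python
def encode_customer(geography, credit_score, gender, age, tenure, balance,
                    num_products, has_card, is_active, salary):
    """Return one unscaled input row in the 11-column order the model expects.

    Hypothetical helper for illustration only.
    """
    # Two country dummies (Germany, Spain); France is the dropped baseline.
    geo_dummies = {
        "France": (0.0, 0.0),
        "Germany": (1.0, 0.0),
        "Spain": (0.0, 1.0),
    }[geography]
    gender_code = {"Female": 0.0, "Male": 1.0}[gender]
    return [[
        *geo_dummies, credit_score, gender_code, age, tenure, balance,
        num_products, float(has_card), float(is_active), salary,
    ]]

# The customer from the cell above.
row = encode_customer("France", 600, "Male", 40, 3, 60000, 2, True, True, 50000)
print(row)
```

The resulting row can then be passed through `sc.transform` and `classifier.predict` in the same way as the hand-written array above.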