### Churn Modelling Dataset
This dataset contains the following columns: Row Number, Customer Id, Surname, Credit Score, Geography, Gender, Age, Tenure, Balance, Num of Products (the number of bank products the customer uses, such as a credit card or loan), Has Credit Card (1 means yes, 0 means no), Is Active Member (whether the customer is actively using the bank), and Estimated Salary.

All of these are the independent variables of the Churn Modelling dataset. The last column, Exited, is the dependent variable: whether the customer exits the bank in the future (1 means the customer will exit the bank and 0 means the customer will stay).

The bank recorded these independent variables, observed its customers for 6 months to see whether they left or stayed, and built this dataset from those observations.

Now the bank has to create a predictive model from this dataset. For any new customer, the model has to predict whether the customer will stay with the bank or leave, so that the bank can make a special offer to the customers the model predicts will leave.

### Data PreProcessing

In [1]:
# import the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [2]:
# load the dataset
df = pd.read_csv('churn.csv')
df.head()

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2,0.0,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1,0.0,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0


In this dataset, there are 13 independent variables and 1 dependent variable. The first three independent variables (Row Number, Customer Id, and Surname) carry no information that helps the prediction, so we drop them in the next step. We also split the data into the independent variables X and the dependent variable y.

In [3]:
# split the dataset into X and y
X = pd.DataFrame(df.iloc[:, 3:13].values)
y = df.iloc[:, 13].values
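
The positional slices above depend on the column order of the CSV. As a minimal sketch, assuming the column names shown in the `df.head()` output, the same split can also be written by name. The two-row frame `demo` and the names `id_columns`, `features`, and `target` are illustrative stand-ins, not part of the original notebook.

```python
import pandas as pd

# Two rows copied from the df.head() output; `demo` is a stand-in for the real file
demo = pd.DataFrame({
    "RowNumber": [1, 2],
    "CustomerId": [15634602, 15647311],
    "Surname": ["Hargrave", "Hill"],
    "CreditScore": [619, 608],
    "Geography": ["France", "Spain"],
    "Gender": ["Female", "Female"],
    "Age": [42, 41],
    "Tenure": [2, 1],
    "Balance": [0.0, 83807.86],
    "NumOfProducts": [1, 1],
    "HasCrCard": [1, 0],
    "IsActiveMember": [1, 1],
    "EstimatedSalary": [101348.88, 112542.58],
    "Exited": [1, 0],
})

# identifier columns that are dropped before modelling
id_columns = ["RowNumber", "CustomerId", "Surname"]

features = demo.drop(columns=id_columns + ["Exited"])  # independent variables
target = demo["Exited"].values                          # dependent variable

# the name-based selection picks the same columns as the positional slice 3:13
assert list(features.columns) == list(demo.columns[3:13])
print(features.shape, target.shape)
```

Selecting by name keeps working if the columns are reordered, at the cost of hard-coding the names.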

In [4]:
X.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,619,France,Female,42,2,0.0,1,1,1,101348.88
1,608,Spain,Female,41,1,83807.86,1,0,1,112542.58
2,502,France,Female,42,8,159660.8,3,1,0,113931.57
3,699,France,Female,39,1,0.0,2,0,0,93826.63
4,850,Spain,Female,43,2,125510.82,1,1,1,79084.1


In [5]:
X.shape

(10000, 10)

In [6]:
print(y)

[1 0 1 ... 1 1 0]


In [7]:
y.shape

(10000,)

### Label Encoding
The categorical variables Geography and Gender have to be encoded as numbers so that the model can process them. Gender has two values and is label-encoded as 0 and 1. Geography has three values, so it is label-encoded and then one-hot encoded.

In [8]:
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer

# label encoding for gender variable
labelencoder_X_2 = LabelEncoder()
X.loc[:, 2] = labelencoder_X_2.fit_transform(X.iloc[:, 2])

In [9]:
X.head() # Gender gets converted to numeric values

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,619,France,0,42,2,0.0,1,1,1,101348.88
1,608,Spain,0,41,1,83807.86,1,0,1,112542.58
2,502,France,0,42,8,159660.8,3,1,0,113931.57
3,699,France,0,39,1,0.0,2,0,0,93826.63
4,850,Spain,0,43,2,125510.82,1,1,1,79084.1


In [10]:
# label encoding for geography variable
labelencoder_X_1 = LabelEncoder()
X.loc[:, 1] = labelencoder_X_1.fit_transform(X.iloc[:, 1])

In [11]:
X.head() # geography gets converted to numeric values

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,619,0,0,42,2,0.0,1,1,1,101348.88
1,608,2,0,41,1,83807.86,1,0,1,112542.58
2,502,0,0,42,8,159660.8,3,1,0,113931.57
3,699,0,0,39,1,0.0,2,0,0,93826.63
4,850,2,0,43,2,125510.82,1,1,1,79084.1


In [12]:
# one-hot encoding for the geography variable (already label-encoded in the previous step)
ct = ColumnTransformer([("Geography", OneHotEncoder(), [1])], remainder = 'passthrough')
X = ct.fit_transform(X)
# drop the first dummy column (France) to avoid the dummy variable trap
X = X[:,1:]

Here [0 0] means France,
[0 1] means Spain and
[1 0] means Germany.
So, the first two columns represent the Geography variable.
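
The mapping above can be reproduced on a toy example. This is a minimal NumPy sketch rather than the scikit-learn pipeline itself: it relies on `np.unique` sorting the labels alphabetically, which is the same ordering `LabelEncoder` produces. The names `countries`, `one_hot`, and `dummies` are illustrative.

```python
import numpy as np

countries = np.array(["France", "Spain", "Germany", "France"])

# np.unique returns the sorted labels and, for each row, the index of its label
labels, codes = np.unique(countries, return_inverse=True)

one_hot = np.eye(len(labels))[codes]  # one column per country: France, Germany, Spain
dummies = one_hot[:, 1:]              # drop the first (France) column

for name, row in zip(countries, dummies):
    print(name, row)
```

One column can be dropped because the three dummy columns always sum to 1, so the third is implied by the other two.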

In [13]:
print(X)

[[0.0 0.0 619 ... 1 1 101348.88]
 [0.0 1.0 608 ... 0 1 112542.58]
 [0.0 0.0 502 ... 1 0 113931.57]
 ...
 [0.0 0.0 709 ... 0 1 42085.58]
 [1.0 0.0 772 ... 1 0 92888.52]
 [0.0 0.0 792 ... 1 0 38190.78]]


### Splitting the dataset into training and testing sets

In [14]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

### Feature Scaling
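In this dataset, the features are not on the same scale; Balance and EstimatedSalary in particular take much larger values than the other columns. Features on very different scales can dominate the weight updates and slow down training, so feature scaling is used.
StandardScaler standardizes each feature to zero mean and unit variance. The scaler is fitted on the training set only and then applied to the test set.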

In [15]:
# standardize the data
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
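
What the scaler computes can be sketched by hand. The following is a minimal NumPy illustration on made-up numbers (`train` and `test` are not taken from the dataset): the mean and standard deviation come from the training rows only and are reused for the test rows.

```python
import numpy as np

# toy data with two features on large scales (illustrative values only)
train = np.array([[0.0, 100000.0],
                  [50000.0, 80000.0],
                  [150000.0, 120000.0]])
test = np.array([[75000.0, 90000.0]])

mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0), as StandardScaler uses

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuse the training statistics, do not refit

print(train_scaled.mean(axis=0))  # approximately zero for each feature
print(train_scaled.std(axis=0))   # approximately one for each feature
```

Reusing the training statistics keeps information from the test set out of the preprocessing step.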

In [23]:
X_train

array([[-0.5698444 ,  1.74309049,  0.16958176, ...,  0.64259497,
        -1.03227043,  1.10643166],
       [ 1.75486502, -0.57369368, -2.30455945, ...,  0.64259497,
         0.9687384 , -0.74866447],
       [-0.5698444 , -0.57369368, -1.19119591, ...,  0.64259497,
        -1.03227043,  1.48533467],
       ...,
       [-0.5698444 , -0.57369368,  0.9015152 , ...,  0.64259497,
        -1.03227043,  1.41231994],
       [-0.5698444 ,  1.74309049, -0.62420521, ...,  0.64259497,
         0.9687384 ,  0.84432121],
       [ 1.75486502, -0.57369368, -0.28401079, ...,  0.64259497,
        -1.03227043,  0.32472465]])

In [24]:
X_test

array([[ 1.75486502, -0.57369368, -0.55204276, ...,  0.64259497,
         0.9687384 ,  1.61085707],
       [-0.5698444 , -0.57369368, -1.31490297, ...,  0.64259497,
        -1.03227043,  0.49587037],
       [-0.5698444 ,  1.74309049,  0.57162971, ...,  0.64259497,
         0.9687384 , -0.42478674],
       ...,
       [-0.5698444 ,  1.74309049, -0.74791227, ...,  0.64259497,
        -1.03227043,  0.71888467],
       [ 1.75486502, -0.57369368, -0.00566991, ...,  0.64259497,
         0.9687384 , -1.54507805],
       [ 1.75486502, -0.57369368, -0.79945688, ...,  0.64259497,
        -1.03227043,  1.61255917]])

### Importing keras libraries and packages

In [16]:
import keras
from keras.models import Sequential
from keras.layers import Dense

### Initialize artificial neural networks
The Sequential class builds an ANN as a sequence of layers. Here the ANN is built from fully connected layers. After initializing the ANN, the first hidden layer is added together with the size of the input layer. Dense is the Keras class used to add a fully connected layer to the ANN.

“add” is a method of the Sequential class. The first argument of Dense (units, called output_dim in older Keras versions) is the number of neurons in the layer.

In [17]:
classifier = Sequential()
# first hidden layer (input_dim sets the size of the input layer)
classifier.add(Dense(6, kernel_initializer="uniform", activation = 'relu', input_dim = 11))

The rectifier is a common choice of activation function for the hidden layers of a fully connected network, which is why ‘relu’ (Rectified Linear Unit) is used. Here the input layer has 11 neurons because there are 11 independent variables (including the 2 Geography columns). The next hidden layer is built in the same way as the first one.

In [18]:
# second hidden layer
classifier.add(Dense(6, kernel_initializer='uniform', activation = 'relu'))

In the output layer, 1 neuron is required because the dataset has a single dependent variable in binary form. The prediction has to be 0 or 1, so one neuron in the output layer is enough.

The output layer uses a Sigmoid activation function because it maps the output to a value between 0 and 1, which can be read as the probability that the customer leaves the bank. After building the basic structure of the ANN, the next step is to train it.

In [19]:
# output layer
classifier.add(Dense(1, kernel_initializer='uniform', activation = 'sigmoid'))
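
To make the layer structure concrete, here is a minimal NumPy sketch of the forward pass through an 11 → 6 → 6 → 1 network with ReLU hidden layers and a sigmoid output. It is not what Keras runs internally: the weights are random, the initialization range of ±0.05 is an assumption, and the names `sizes`, `forward`, and `batch` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# layer sizes from the notebook: 11 inputs, two hidden layers of 6, 1 output
sizes = [11, 6, 6, 1]

# small uniform random weights (range is an assumption) and zero biases
weights = [rng.uniform(-0.05, 0.05, size=(n_in, n_out))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

def forward(x):
    h = relu(x @ weights[0] + biases[0])        # first hidden layer
    h = relu(h @ weights[1] + biases[1])        # second hidden layer
    return sigmoid(h @ weights[2] + biases[2])  # output layer: value in (0, 1)

batch = rng.normal(size=(4, 11))  # four made-up, already scaled customers
probs = forward(batch)

# trainable parameters: (11*6 + 6) + (6*6 + 6) + (6*1 + 1) = 121
n_params = sum(w.size + b.size for w, b in zip(weights, biases))
print(probs.shape, n_params)
```

The parameter count should match what `classifier.summary()` reports for the model built above.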

### Train the ANN
compile is a method of the Keras model. ‘adam’ is the optimizer, a variant of stochastic gradient descent with adaptive learning rates. The optimizer updates the weights during training to reduce the loss. For binary prediction, the loss function is typically binary_crossentropy. The accuracy metric is used to evaluate the ANN model.

In [20]:
# compiling the ANN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
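
The loss being minimized can be written out directly. The following is a minimal NumPy sketch of binary cross-entropy averaged over a batch; the clipping constant `eps` and the example arrays are assumptions made for illustration, not values from the notebook.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

y_true = np.array([1, 0, 1, 0])
confident = np.array([0.9, 0.1, 0.8, 0.2])  # mostly right and fairly sure
unsure = np.array([0.5, 0.5, 0.5, 0.5])     # always predicts 50 %

loss_confident = binary_crossentropy(y_true, confident)
loss_unsure = binary_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

Predicting 0.5 for every customer gives a loss of ln 2 (about 0.693), so a training loss below that value means the network has learned something beyond a coin flip.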

In [21]:
# fit ANN to training set
classifier.fit(X_train, y_train, batch_size = 10, epochs = 100)

Epoch 1/100
Epoch 2/100
...
Epoch 77/100
Epoch 78

<keras.callbacks.History at 0x23979472f40>

### Predicting the test results
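The network outputs a probability for each customer. The comparison y_pred > 0.5 turns that probability into a class label: if the probability is at most 0.5, the new y_pred becomes False (0, the customer stays), and if it is larger than 0.5, the new y_pred becomes True (1, the customer leaves).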

In [25]:
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)
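
The effect of the threshold can be seen on a few hand-picked probabilities. This is a small NumPy sketch; the array `probs` and the alternative cut-off of 0.3 are made up for illustration.

```python
import numpy as np

# made-up predicted probabilities, shaped (n, 1) like the output of predict
probs = np.array([[0.12], [0.50], [0.51], [0.97]])

labels = probs > 0.5            # strict comparison: exactly 0.5 maps to False
print(labels.ravel())           # [False False  True  True]

# a lower cut-off flags more customers as likely to leave
labels_low = probs > 0.3
print(int(labels.sum()), int(labels_low.sum()))  # 2 3
```

The cut-off of 0.5 is a convention rather than a requirement; a bank that prefers to contact too many customers rather than too few could choose a lower value.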



In [26]:
y_pred

array([[False],
       [False],
       [False],
       ...,
       [False],
       [False],
       [False]])

### Computing the confusion matrix

In [27]:
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test,y_pred)

# final accuracy = 84.55%

[[1540   55]
 [ 254  151]]


0.8455
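
The accuracy above can be recomputed from the confusion matrix, together with precision and recall for the churn class. The sketch below uses the matrix printed above and scikit-learn's layout, in which rows are the true classes and columns are the predicted classes.

```python
import numpy as np

# confusion matrix printed above: rows = true class, columns = predicted class
cm = np.array([[1540, 55],
               [254, 151]])

tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()  # share of all customers classified correctly
precision = tp / (tp + fp)       # share of predicted leavers who actually left
recall = tp / (tp + fn)          # share of actual leavers the model caught

print(f"accuracy  = {accuracy:.4f}")   # 0.8455
print(f"precision = {precision:.4f}")  # 0.7330
print(f"recall    = {recall:.4f}")     # 0.3728
```

Although the accuracy is 84.55%, the recall shows that the model identifies only about 37% of the customers who actually left, which matters if the goal is to reach those customers with an offer.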