# Deep Learning A-Z: Building an ANN

This notebook is my answer to the first homework of the course *Deep Learning A-Z™: Hands-On Artificial Neural Networks*, available here: https://www.udemy.com/deeplearning/

In this notebook, we are going to build an ANN using Keras by following the instructions given in the course. This neural network will predict whether a customer of a bank is going to leave the bank or not. We are going to train our ANN on a dataset describing 10,000 clients, which also includes a response column telling us whether each client stayed with the bank or left.

### Imports

In [1]:
import os
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

import keras
from keras.models import Sequential
from keras.layers import Dense

# Ignore warnings
import warnings
warnings.filterwarnings('ignore')

Using TensorFlow backend.


### 1. Data preprocessing

In [2]:
path_train = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath('__file__'))), 'ressources/Artificial_Neural_Networks/Churn_Modelling.csv')
dataset = pd.read_csv(path_train)
dataset.head()

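| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 3 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 4 | 15701354 | Boni | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |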


In [3]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
RowNumber          10000 non-null int64
CustomerId         10000 non-null int64
Surname            10000 non-null object
CreditScore        10000 non-null int64
Geography          10000 non-null object
Gender             10000 non-null object
Age                10000 non-null int64
Tenure             10000 non-null int64
Balance            10000 non-null float64
NumOfProducts      10000 non-null int64
HasCrCard          10000 non-null int64
IsActiveMember     10000 non-null int64
EstimatedSalary    10000 non-null float64
Exited             10000 non-null int64
dtypes: float64(2), int64(9), object(3)
memory usage: 1.1+ MB


"Exited" is our response column; it is the 14th column. We need this information to adapt the template given by the course to this case. The first three columns (RowNumber, CustomerId and Surname) have no impact on the response, so we will not include them in our training table.

As shown in the course, we are going to preprocess our data using the course template:

In [4]:
X = dataset.iloc[:, 3:13].values # features: columns 3 to 12 (CreditScore to EstimatedSalary), indexes chosen from the info() output above
y = dataset.iloc[:, 13].values # response: column 13 (Exited)
X[1]

array([608, 'Spain', 'Female', 41, 1, 83807.86, 1, 0, 1, 112542.58],
      dtype=object)

Now we have to encode our categorical variables, again using the course template. Here we have two categorical columns (Geography and Gender), so we have to create two encoders:

In [5]:
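# Encoding categorical data
# Label-encode Geography (column 1) and Gender (column 2) as integers
labelencoder_X_geo = LabelEncoder()
X[:, 1] = labelencoder_X_geo.fit_transform(X[:, 1])
labelencoder_X_gender = LabelEncoder()
X[:, 2] = labelencoder_X_gender.fit_transform(X[:, 2])
# One-hot encode Geography; the resulting dummy columns come first in X.
# Note: `categorical_features` was removed in scikit-learn 0.22; newer versions need ColumnTransformer instead.
onehotencoder = OneHotEncoder(categorical_features = [1])
X = onehotencoder.fit_transform(X).toarray()
# Drop the first dummy column to avoid the dummy variable trap
X = X[:, 1:]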
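
To make the encoding step more concrete, here is a minimal NumPy-only sketch of what label encoding, one-hot encoding and dropping the first dummy column do to a Geography-like column. It is an illustration, not the scikit-learn implementation. The sample values are hypothetical, and the third country ('Germany') is an assumption: the shape (8000, 11) obtained later only tells us that Geography has three categories (9 other features + 3 dummies - 1 dropped = 11).

```python
import numpy as np

# Hypothetical mini-sample of the Geography column ('Germany' is an assumed third category)
geography = np.array(['France', 'Spain', 'France', 'Germany'])

# Label encoding: each category gets an integer code (categories are sorted alphabetically)
categories, codes = np.unique(geography, return_inverse=True)

# One-hot encoding: one column per category, with a 1 in the column of the row's category
one_hot = np.eye(len(categories))[codes]

# Drop the first dummy column: it is redundant, since a row of zeros already means 'France'
dummies = one_hot[:, 1:]

print(categories)
print(dummies)
```

With three categories, two dummy columns are enough: the dropped category is represented by both remaining columns being zero.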

In [6]:
# Splitting the dataset into the Training set and Test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Feature Scaling
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
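
The scaler is fitted on the training set only and then reused on the test set, so that no information from the test set leaks into the preprocessing. A minimal NumPy sketch of the same idea, on made-up toy data (the array sizes and values are assumptions chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(loc=50.0, scale=10.0, size=(8, 2))  # toy training features
test = rng.normal(loc=50.0, scale=10.0, size=(2, 2))   # toy test features

# "fit": learn the per-column mean and standard deviation from the training set only
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0), as StandardScaler uses

# "transform": apply the training statistics to both sets
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
```

The scaled training columns have mean 0 and standard deviation 1; the test columns are only approximately standardized, which is expected.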

In [7]:
X_train.shape

(8000, 11)

Our data is now preprocessed! We can start building our model:

### 2. Let's build our ANN

In [8]:
# Create your classifier here
classifier = Sequential() # Initializing our ANN

As a reminder, here are all the steps we must follow to train an ANN with the stochastic gradient descent method. The Dense function will be used for step 1. From step 2, we know that each feature is assigned to one input node, so we need 11 input nodes in our input layer.
We also have to choose an activation function (step 3). As we saw in the course, for our hidden layers we will use the one that, according to the course, works best in practice: the rectifier function. The sigmoid function is a very good option for our output layer because it gives us the probability that a customer leaves the bank.
After that, the learning rate determines how much the weights are updated, and we will also have to decide how many epochs to run. Let's go!

![title](images/steps.png)
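
As a quick illustration of the two activation functions mentioned above, here is a minimal NumPy sketch (the helper names `relu` and `sigmoid` and the sample inputs are chosen for this example; Keras provides its own implementations):

```python
import numpy as np

def relu(z):
    # Rectifier: keeps positive values and sets negative values to 0
    return np.maximum(0.0, z)

def sigmoid(z):
    # Squashes any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 3.0])
hidden = relu(z)    # what a hidden node would output for these inputs
prob = sigmoid(z)   # what the output node would output: values readable as probabilities

print(hidden)
print(prob)
```

The sigmoid of 0 is exactly 0.5, which is why 0.5 is a natural threshold when turning the output into a stay/leave decision.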

In [9]:
# Adding the input layer and the first hidden layer of our ANN
classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu', input_shape = (11,)))

- The *units* parameter is the number of nodes in the layer:

**tip:** choose the number of nodes in the hidden layers as the average of the number of nodes in the input layer and the number of nodes in the output layer.
Here we have 11 nodes in the input layer and 1 node in the output layer (because the output is binary), so we choose (11 + 1) / 2 = 6 nodes in the hidden layers.

- The *kernel_initializer* parameter is the way we initialize our weights:

As we saw during the course, weights must be initialized to small numbers close to zero. The random uniform function allows us to initialize our weights in this way.

- The *activation* parameter is the activation function of the layer, here a hidden layer:

We choose 'relu' for the rectifier function.

- *input_shape* is the shape of one input sample, here (11,) because the input layer has 11 nodes

Now, let’s create our second hidden layer, which will be the same as the first one. Here, we don’t have to specify the input shape because Keras can infer it from the output of the previous layer. For the first hidden layer, no layer had been created yet, so we had to specify it.

In [10]:
# Adding the second hidden layer
classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu'))

Finally, we have to add our output layer. We change the activation function because, as we said, the activation function of our output layer is the sigmoid function. We also want a single binary output, so units (the output dimension) is set to 1.

In [11]:
# Adding the output layer
classifier.add(Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))
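
With the three Dense layers in place, the size of the network can be checked by hand. The sketch below, in plain Python, applies the averaging tip for the hidden layer size and counts the trainable parameters of each layer (weights plus one bias per node); `dense_params` is a helper name introduced for this example only.

```python
n_inputs = 11   # number of features after preprocessing
n_outputs = 1   # single binary output

# Tip from the course: average of the input and output layer sizes
hidden_units = (n_inputs + n_outputs) // 2

def dense_params(n_in, n_out):
    # A fully connected layer has n_in * n_out weights and n_out biases
    return n_in * n_out + n_out

layer_sizes = [n_inputs, hidden_units, hidden_units, n_outputs]
params = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(params)

print(hidden_units, params, total)
```

This gives 72, 42 and 7 parameters for the three layers, 121 in total, which should match what `classifier.summary()` reports for this architecture.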

Now we compile our model, which configures how stochastic gradient descent will be applied to the whole neural network. We have built our ANN, but its weights are only initialized, so we still need to find the weights that make our NN as accurate as possible.

In [12]:
# Compiling the ANN
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

- *optimizer* is the algorithm used to find the optimal set of weights in the ANN. As we saw in the course, we use the stochastic gradient descent method. There are several variants of this method; a very efficient one is called 'adam', which is the one we use.

- *loss* is the loss function minimized by the SGD method. We use the logarithmic loss here because the activation function of our output layer is the sigmoid function (this is the same loss as for a logistic regression model).
**tip**: for more than two categories, the logarithmic loss function is called "categorical_crossentropy"

- *metrics* is the list of criteria used to evaluate our model. Here we choose 'accuracy'.
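
To see what the logarithmic loss measures, here is a minimal NumPy sketch of binary cross-entropy. The function name, the clipping constant `eps` and the sample values are assumptions made for this illustration; it is not the Keras implementation.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Clip probabilities so that log(0) never occurs
    y_prob = np.clip(y_prob, eps, 1 - eps)
    # Mean of -[y * log(p) + (1 - y) * log(1 - p)] over all observations
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

y_true = np.array([1, 0, 1, 0])
good = np.array([0.9, 0.1, 0.8, 0.2])  # confident and correct predictions
bad = np.array([0.1, 0.9, 0.2, 0.8])   # confident and wrong predictions

print(binary_crossentropy(y_true, good))
print(binary_crossentropy(y_true, bad))
```

Confident correct predictions give a small loss and confident wrong ones a large loss, which is what the optimizer exploits when updating the weights.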

Now we are going to train our NN with the fit method:

In [13]:
classifier.fit(X_train, y_train, batch_size=10, epochs=100)

Epoch 1/100
Epoch 2/100
...
Epoch 78

<keras.callbacks.History at 0x102e54ef0>

- *batch_size* is the number of observations after which the weights are updated.

- *epochs* is the number of epochs; an epoch is one iteration over the entire x and y data provided. We need to repeat steps one to six several times (once per epoch) on our dataset to train the model.

For these two arguments there is no fixed rule; experimentation is the best way to find good values.

After running this cell, we can see how stochastic gradient descent is performing on our dataset and how accuracy improves as the number of completed epochs increases.
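
To get a feel for what these two arguments imply, the number of weight updates can be worked out from the values used above (8000 training rows, batch_size=10, epochs=100). A small sketch in plain Python:

```python
import math

n_train = 8000     # rows in X_train
batch_size = 10
epochs = 100

# One weight update per batch; the last batch may be smaller, hence the ceiling
updates_per_epoch = math.ceil(n_train / batch_size)
total_updates = updates_per_epoch * epochs

print(updates_per_epoch, total_updates)
```

So each epoch performs 800 weight updates, and the full training run performs 80,000. A larger batch size would mean fewer, smoother updates per epoch.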

In [None]:
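# Predicting the Test set results
y_pred = classifier.predict(X_test)  # predicted probabilities of leaving the bank
y_pred = (y_pred > 0.5)  # convert probabilities to binary labels using a 0.5 threshold

# Making the Confusion Matrix (it expects class labels, not probabilities)
cm = confusion_matrix(y_test, y_pred)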
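
A confusion matrix compares class labels, whereas a sigmoid output layer returns probabilities, so the predictions have to be thresholded first. The following NumPy sketch shows that step and the counting behind a 2x2 confusion matrix on made-up values (the labels, the probabilities and the 0.5 threshold are assumptions chosen for illustration):

```python
import numpy as np

y_true = np.array([0, 1, 0, 1, 0])
# Made-up probabilities with shape (n_samples, 1), like the output of a sigmoid layer
y_prob = np.array([[0.2], [0.7], [0.6], [0.4], [0.1]])

# Threshold at 0.5 and flatten to a 1-D array of 0/1 labels
y_label = (y_prob > 0.5).astype(int).ravel()

# Rows are the true classes, columns the predicted classes
cm = np.zeros((2, 2), dtype=int)
for t, p in zip(y_true, y_label):
    cm[t, p] += 1

# Correct predictions sit on the diagonal
accuracy = np.trace(cm) / cm.sum()

print(cm)
print(accuracy)
```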

### 3. Making predictions