# Deep Learning A-Z: Building an ANN

This notebook is my answer to the first homework of the course *Deep Learning A-Z™: Hands-On Artificial Neural Networks*, available here: https://www.udemy.com/deeplearning/

In this notebook, we are going to build an ANN with Keras by following the instructions given in the course. This neural network will predict whether a given customer of a bank is going to leave the bank or not. We are going to train our ANN on a dataset of 10,000 clients, which also includes a response column telling us whether each client left the bank.

### Imports

In [9]:
import os
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

import keras
from keras.models import Sequential
from keras.layers import Dense

# Ignore warnings
import warnings
warnings.filterwarnings('ignore')

### 1. Data preprocessing

In [2]:
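# '__file__' is passed as a string on purpose: a notebook does not define __file__,
# so this resolves to the parent of the current working directory
path_train = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath('__file__'))), 'ressources/Artificial_Neural_Networks/Churn_Modelling.csv')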
dataset = pd.read_csv(path_train)
dataset.head()

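| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 3 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 4 | 15701354 | Boni | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |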


In [4]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
RowNumber          10000 non-null int64
CustomerId         10000 non-null int64
Surname            10000 non-null object
CreditScore        10000 non-null int64
Geography          10000 non-null object
Gender             10000 non-null object
Age                10000 non-null int64
Tenure             10000 non-null int64
Balance            10000 non-null float64
NumOfProducts      10000 non-null int64
HasCrCard          10000 non-null int64
IsActiveMember     10000 non-null int64
EstimatedSalary    10000 non-null float64
Exited             10000 non-null int64
dtypes: float64(2), int64(9), object(3)
memory usage: 1.1+ MB


"Exited" is our response column. It is the 14th column (index 13), and we need this information to adapt the template given by the course to this case. The first three columns (RowNumber, CustomerId, Surname) have no impact on the response, so we will not include them in our training table.

As shown in the course, we are going to preprocess our data using the course template:

In [5]:
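# We adapt the indexes according to what we saw with the info() method of the dataset.
# NOTE: 3:12 selects CreditScore through IsActiveMember (9 columns) and leaves out
# EstimatedSalary (index 12). That gives 10 features after encoding; slicing with 3:13
# would include it and give the 11 features that input_shape=(11,) expects later.
X = dataset.iloc[:, 3:12].values
y = dataset.iloc[:, 13].values # the response column, "Exited"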
X[1]

array([608, 'Spain', 'Female', 41, 1, 83807.86, 1, 0, 1], dtype=object)
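To check which columns a given slice selects, here is a minimal sketch on a toy frame. The frame `toy` and its values are placeholders made up for illustration; only the column names come from the `dataset.info()` output above. It shows that `iloc` slices are half-open, so the stop index is excluded.

```python
import pandas as pd

# Toy one-row frame with the 14 column names listed by dataset.info();
# the values are placeholders, not real data.
columns = ['RowNumber', 'CustomerId', 'Surname', 'CreditScore', 'Geography',
           'Gender', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard',
           'IsActiveMember', 'EstimatedSalary', 'Exited']
toy = pd.DataFrame([list(range(14))], columns=columns)

# iloc slices are half-open: the stop index is not included.
cols_3_12 = list(toy.iloc[:, 3:12].columns)
cols_3_13 = list(toy.iloc[:, 3:13].columns)
target = toy.columns[13]

print(len(cols_3_12), cols_3_12[-1])  # last column reached by 3:12
print(len(cols_3_13), cols_3_13[-1])  # last column reached by 3:13
print(target)                         # column at index 13
```

Running the same check on the real `dataset` is a cheap way to confirm the feature count before choosing the size of the input layer.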

Now we have to encode our categorical variables, again using the template of the course. We have two categorical columns (Geography and Gender), so we have to create two encoders:

In [6]:
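# Encoding categorical data
# Label-encode Geography (column 1) and Gender (column 2) as integers
labelencoder_X_geo = LabelEncoder()
X[:, 1] = labelencoder_X_geo.fit_transform(X[:, 1])
labelencoder_X_gender = LabelEncoder()
X[:, 2] = labelencoder_X_gender.fit_transform(X[:, 2])
# One-hot encode Geography. Note: categorical_features was removed in
# scikit-learn 0.22, so this call only works with older versions.
onehotencoder = OneHotEncoder(categorical_features = [1])
X = onehotencoder.fit_transform(X).toarray()
# Drop the first dummy column to avoid the dummy variable trap
X = X[:, 1:]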
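On recent scikit-learn releases the `categorical_features` argument is not available. One possible equivalent, sketched here on a toy array, is to wrap `OneHotEncoder` in a `ColumnTransformer`. The names `X_demo` and `ct` are hypothetical and the rows are illustrative; `drop='first'` (which assumes scikit-learn 0.21 or later) plays the role of `X = X[:, 1:]`.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Toy rows shaped like X: [CreditScore, Geography, Gender, Age] (illustrative values)
X_demo = np.array([
    [619, 'France', 'Female', 42],
    [608, 'Spain', 'Female', 41],
    [502, 'Germany', 'Male', 42],
], dtype=object)

# Gender has two categories, so a label encoding (Female=0, Male=1) is enough
X_demo[:, 2] = LabelEncoder().fit_transform(X_demo[:, 2])

# One-hot encode Geography (column 1), drop its first dummy column,
# and pass the remaining columns through unchanged.
ct = ColumnTransformer(
    transformers=[('geo', OneHotEncoder(drop='first'), [1])],
    remainder='passthrough',
    sparse_threshold=0,  # always return a dense array
)
X_demo_encoded = ct.fit_transform(X_demo).astype(float)

# Column order: [Germany, Spain, CreditScore, Gender, Age]
print(X_demo_encoded)
```

As with the older call, the encoded columns come first and the untouched columns follow in their original order.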

In [7]:
# Splitting the dataset into the Training set and Test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

# Feature Scaling
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
X_train

array([[-0.5698444 ,  1.74309049,  0.16958176, ...,  0.8095029 ,
         0.64259497, -1.03227043],
       [ 1.75486502, -0.57369368, -2.30455945, ..., -0.92159124,
         0.64259497,  0.9687384 ],
       [-0.5698444 , -0.57369368, -1.19119591, ..., -0.92159124,
         0.64259497, -1.03227043],
       ...,
       [-0.5698444 , -0.57369368,  0.9015152 , ...,  0.8095029 ,
         0.64259497, -1.03227043],
       [-0.5698444 ,  1.74309049, -0.62420521, ...,  0.8095029 ,
         0.64259497,  0.9687384 ],
       [ 1.75486502, -0.57369368, -0.28401079, ..., -0.92159124,
         0.64259497, -1.03227043]])
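To make the scaling step concrete, here is a small NumPy sketch of what standardization does, on made-up numbers. The arrays `train` and `test` are hypothetical. The point it illustrates is that the mean and standard deviation are computed on the training set only and then reused for the test set, which is why the cell above calls `fit_transform` on `X_train` but only `transform` on `X_test`.

```python
import numpy as np

# Hypothetical training and test features (2 columns)
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
test = np.array([[2.0, 40.0]])

# Statistics are estimated on the training set only
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0)

# The same statistics are applied to both sets
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled)
print(test_scaled)
```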

Our data is now preprocessed! We can start building our model:

### 2. Let's build our ANN

In [14]:
# Create your classifier here
classifier = Sequential() # Initializing our ANN

As a reminder, here are all the steps we must follow to train an ANN with the stochastic gradient descent method. The Dense layer will be used for step 1. From step 2, we know that each feature is assigned to one node, so we have to create 11 input nodes in our input layer.
We also have to choose an activation function (step 3). As we saw in the course, we will use the one that works best for hidden layers according to experiments: the rectifier function. The sigmoid function is a very good option for our output layer because it gives us the probability that the customer leaves the bank.
After that, the learning rate determines how much the weights are updated at each step, and we also have to decide how many epochs to run. Let's go!

![title](images/steps.png)
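As a rough illustration of the two activation functions and of the role of the learning rate, here is a minimal NumPy sketch. The helper names `relu` and `sigmoid` and all the numbers are chosen for this example only. The last lines show a single gradient descent update on one weight with a made-up gradient, not the full training procedure.

```python
import numpy as np

def relu(z):
    # Rectifier: keeps positive values, replaces negative values with 0
    return np.maximum(0.0, z)

def sigmoid(z):
    # Squashes any real number into (0, 1), usable as a probability
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 3.0])
relu_out = relu(z)
sigmoid_out = sigmoid(z)
print(relu_out)
print(sigmoid_out)

# One gradient descent step on a single weight (hypothetical values):
# the learning rate scales how far the weight moves against the gradient.
w = 0.5
gradient = 2.0
learning_rate = 0.1
w_new = w - learning_rate * gradient
print(w_new)
```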

In [16]:
# Adding the input layer and the first hidden layer of our ANN

classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu', input_shape = (11,)))

- *units* param corresponds to the number of nodes in the hidden layer:
**tip:** choose the number of nodes in the hidden layer as the average of the number of nodes in the input layer and the number of nodes in the output layer.
Here we have 11 nodes in the input layer and 1 node in the output layer (because the output is binary), so we choose (11 + 1) / 2 = 6 nodes in the hidden layer.

- *kernel_initializer* param corresponds to the way we initialize our weights:
As we saw during the course, weights must be initialized to small numbers close to zero. The 'uniform' initializer does this by drawing them from a uniform distribution over a small range around zero.

- *activation* param corresponds to the activation function of the hidden layer:
We choose 'relu' for the rectifier function.

- *input_shape* param is the shape of one input sample, here (11,), i.e. the number of nodes in the input layer.
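To see what these parameters amount to numerically, here is a NumPy sketch of the computation a dense layer with a rectifier activation performs. It is not Keras code. The range (-0.05, 0.05) is an assumption standing in for "small numbers close to zero", and the input `x` is random fake data.

```python
import numpy as np

n_inputs, n_outputs = 11, 1
# Rule of thumb from the tip above: average of input and output node counts
units = (n_inputs + n_outputs) // 2

rng = np.random.default_rng(0)
# Assumed small uniform range around zero for the initial weights
W = rng.uniform(-0.05, 0.05, size=(n_inputs, units))
b = np.zeros(units)

# 4 fake samples with 11 standardized features each
x = rng.normal(size=(4, n_inputs))

# Dense layer followed by the rectifier: relu(x @ W + b)
hidden = np.maximum(0.0, x @ W + b)
print(units, hidden.shape)
```

Each of the 4 samples is mapped to 6 non-negative activations, one per hidden node.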

In [None]:
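# Predicting the Test set results
# (the output layer must be added and the model compiled and trained before this step)
y_pred = classifier.predict(X_test)  # sigmoid output: probabilities between 0 and 1
y_pred = (y_pred > 0.5)  # convert probabilities to True/False class labels

# Making the Confusion Matrix
cm = confusion_matrix(y_test, y_pred)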
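A confusion matrix needs class labels, not probabilities, so the predicted probabilities have to be thresholded first. Here is a small NumPy sketch with made-up labels and probabilities. The names `y_true`, `y_prob` and `cm_demo` are hypothetical, and the 0.5 threshold is a common default rather than a requirement.

```python
import numpy as np

# Hypothetical true labels and predicted probabilities
y_true = np.array([0, 0, 1, 1, 0, 1])
y_prob = np.array([0.1, 0.7, 0.8, 0.3, 0.4, 0.9])

# Threshold the probabilities to obtain class labels
y_hat = (y_prob > 0.5).astype(int)

# Rows are true classes, columns are predicted classes
cm_demo = np.zeros((2, 2), dtype=int)
for t, p in zip(y_true, y_hat):
    cm_demo[t, p] += 1

# Accuracy is the share of predictions on the diagonal
accuracy = np.trace(cm_demo) / cm_demo.sum()
print(cm_demo)
print(accuracy)
```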

### 3. Making predictions