
### Artificial Neural Networks - Churn Modelling

- Fully connected neural network: every neuron in a layer is connected to every neuron in the next layer
- The input is a vector of different features and the predicted outcome is a binary variable, i.e. this is a ***classification*** problem

##### Understanding the [Dataset](../dataFiles/Churn_Modelling.csv)

- Dataset of a bank with information about customers
- The last column, Exited, is the dependent variable: a yes/no value telling whether the customer exited the bank or stayed
- The bank observed its customers over a period of time and recorded the outcome in the dependent variable, so that the relationship between the independent features and the outcome can be learned
- They intend to deploy the model on new customers, so that customers who are likely to leave can be encouraged to stay with the bank
- The given dataset is used to train the model; the trained model is then applied to future customers to predict the probability that a given customer leaves the bank

In [18]:
# Install tensorflow
!pip install tensorflow

Defaulting to user installation because normal site-packages is not writeable


#### Importing the Libraries

In [19]:
import pandas as pd
import numpy as np
import tensorflow as tf

In [20]:
tf.__version__

'2.13.0'

### Part 1 - Data preprocessing

#### Importing the dataset
- Exclude the columns that have no impact on the outcome, i.e. stay or leave (RowNumber, CustomerId, Surname)

In [21]:
dataset = pd.read_csv('../dataFiles/Churn_Modelling.csv')
x = dataset.iloc[:, 3:-1].values
y = dataset.iloc[:, -1].values

In [22]:
print(x)

[[619 'France' 'Female' ... 1 1 101348.88]
 [608 'Spain' 'Female' ... 0 1 112542.58]
 [502 'France' 'Female' ... 1 0 113931.57]
 ...
 [709 'France' 'Female' ... 0 1 42085.58]
 [772 'Germany' 'Male' ... 1 0 92888.52]
 [792 'France' 'Female' ... 1 0 38190.78]]


In [23]:
print(y)

[1 0 1 ... 1 1 0]


#### Encoding categorical data

- Encode the Gender column using label encoding, i.e. 0 and 1; with only two categories a single 0/1 column is enough
- Encode the 'Geography' column using one-hot encoding, as there is no order relationship between its categories
- There is no missing data, so no handling of missing values is needed
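
The difference between the two encodings can be sketched in plain Python, without scikit-learn. This is only an illustration: the helper names `label_encode` and `one_hot_encode` and the toy values are assumptions, not part of the notebook. Like the scikit-learn encoders, the sketch orders the categories alphabetically.

```python
def label_encode(values):
    """Map each category to an integer index (categories sorted alphabetically)."""
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    return [index[v] for v in values], classes


def one_hot_encode(values):
    """Map each category to a 0/1 vector with one position per category."""
    classes = sorted(set(values))
    rows = [[1.0 if v == c else 0.0 for c in classes] for v in values]
    return rows, classes


# Toy values for illustration only (hypothetical, not read from the CSV)
genders = ['Female', 'Male', 'Female']
countries = ['France', 'Spain', 'Germany', 'France']

gender_codes, gender_classes = label_encode(genders)
country_rows, country_classes = one_hot_encode(countries)

print(gender_classes, gender_codes)
print(country_classes)
for row in country_rows:
    print(row)
```

With one-hot encoding no country is numerically "larger" than another, which is why it suits a column without any order between categories.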

 ***Label encoder of the Gender column***

In [24]:


#label encoding for gender column
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
x[:, 2] = le.fit_transform(x[:, 2]) # [:, 2] all rows and column 2


In [25]:
print(x)

[[619 'France' 0 ... 1 1 101348.88]
 [608 'Spain' 0 ... 0 1 112542.58]
 [502 'France' 0 ... 1 0 113931.57]
 ...
 [709 'France' 0 ... 0 1 42085.58]
 [772 'Germany' 1 ... 1 0 92888.52]
 [792 'France' 0 ... 1 0 38190.78]]


***OneHotEncoding the 'Geography' Column***

In [26]:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])], remainder='passthrough')
x = np.array(ct.fit_transform(x))

In [27]:
print(x)

[[1.0 0.0 0.0 ... 1 1 101348.88]
 [0.0 0.0 1.0 ... 0 1 112542.58]
 [1.0 0.0 0.0 ... 1 0 113931.57]
 ...
 [1.0 0.0 0.0 ... 0 1 42085.58]
 [0.0 1.0 0.0 ... 1 0 92888.52]
 [1.0 0.0 0.0 ... 1 0 38190.78]]


***Splitting the dataset into the Training set and Test set***


In [28]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x,y, test_size=0.2, random_state=0)

##### Feature Scaling - essential in practice for deep learning

In [29]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
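
What the scaling cell does can be sketched by hand for a single feature. The numbers below are made up for illustration, and the sketch assumes standardisation with the population standard deviation. The key point is that the mean and standard deviation are computed on the training values only and then reused for the test values, which mirrors `fit_transform` on the training set and `transform` on the test set.

```python
from statistics import mean, pstdev

# Hypothetical values of one feature, for illustration only
train = [619.0, 608.0, 502.0, 699.0]
test = [850.0, 645.0]

# "fit": learn the statistics from the training values only
mu = mean(train)
sigma = pstdev(train)  # population standard deviation

# "transform": apply the same statistics to both sets
train_scaled = [(v - mu) / sigma for v in train]
test_scaled = [(v - mu) / sigma for v in test]

print(train_scaled)
print(test_scaled)
```

The scaled training values have mean 0 and standard deviation 1; the test values generally do not, because they are scaled with the training statistics so that no information from the test set leaks into training.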

### Part 2 - Building the ANN

- Initialise the ANN as a sequence of layers
- Add the input layer and the first hidden layer, composed of a certain number of neurons
- Add a second hidden layer to build a deep learning model instead of a shallow learning model
- Finally, add the output layer, which will predict the outcome
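
To make the layer structure concrete, here is a minimal sketch of one forward pass through such a network in plain Python. It is not how Keras implements it: the helper names, the random weights, and the input size of 4 features are all assumptions chosen for illustration. Only the 6-6-1 layer sizes and the activations follow the notebook.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: every neuron receives every input."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def random_layer(n_in, n_out, rng):
    """Untrained (random) weights and zero biases, for illustration only."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


rng = random.Random(0)
n_inputs = 4  # hypothetical feature count, not the real dataset width

w1, b1 = random_layer(n_inputs, 6, rng)  # first hidden layer, 6 neurons
w2, b2 = random_layer(6, 6, rng)         # second hidden layer, 6 neurons
w3, b3 = random_layer(6, 1, rng)         # output layer, 1 neuron

features = [0.5, -1.2, 0.3, 2.0]         # one already-scaled observation
h1 = dense(features, w1, b1, relu)
h2 = dense(h1, w2, b2, relu)
output = dense(h2, w3, b3, sigmoid)

print(len(h1), len(h2), output)
```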


***Initialising the ANN***

In [30]:
# create a variable for an ANN network using the sequential class for a sequence of layers
ann = tf.keras.models.Sequential()


***Adding the input layer and the first hidden layer***

In [31]:
# add the layer as an object of the Dense class
# units = number of neurons; there is no fixed rule, the value (e.g. 6) comes from experimenting with different hyperparameters
# activation = the rectifier function, code name 'relu', used here for the hidden layers
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

***Adding the second hidden layer***
- similar to the above

In [33]:
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

***Adding the output layer***
- The number of units depends on the output dimension; in this example we only need a single binary value, hence 1 neuron
- The activation function should be sigmoid: it maps the output to a value between 0 and 1, which can be read as the probability that the binary outcome is 1, i.e. the probability that the customer leaves the bank
- For non-binary classification the activation for the output layer should be **softmax** instead of **sigmoid**

In [35]:

ann.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))
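
The two output activations can be compared with a small plain-Python sketch. The function names and the 0.5 decision threshold are assumptions for illustration; the notebook itself does not fix a threshold.

```python
import math


def sigmoid(z):
    """Squash a single score into (0, 1): a probability for the positive class."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Turn one score per class into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def predict_exit(z, threshold=0.5):
    """Return (probability, decision); the 0.5 threshold is an assumed default."""
    p = sigmoid(z)
    return p, p > threshold


for z in (-2.0, 0.0, 2.0):
    print(z, predict_exit(z))

print(softmax([1.0, 2.0, 3.0]))
```

A single sigmoid neuron is enough for two classes because the probability of the other class is simply one minus the output; with more classes, softmax needs one output neuron per class.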

### Part 3 - Training the ANN

***Compiling the ANN***
- Compile the ANN with an optimiser, a loss function and an evaluation metric (accuracy, because we are performing classification)
- Train the ANN on the training set over a certain number of epochs

- Stochastic gradient descent updates the weights in order to reduce the error (loss) between the predictions and the real results
- The optimiser should be one that performs stochastic gradient descent; Adam is a widely used and effective choice
- The loss function computes the difference between the predictions and the real results. For binary classification use **'binary_crossentropy'**; for classification with more than two classes use **'categorical_crossentropy'**
- Accuracy is the evaluation metric used here

In [36]:
#parameters: optimizer, loss function and evaluation metric

ann.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
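
As a rough illustration of what the binary cross-entropy loss measures, the sketch below computes it by hand in plain Python. The function is a hypothetical helper, not the Keras implementation, and the small `eps` clipping value is an assumption added only to avoid taking the log of zero.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all observations."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # keep p away from exactly 0 or 1
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0, 1, 0]

good = binary_crossentropy(labels, [0.9, 0.1, 0.8, 0.2])  # confident and right
bad = binary_crossentropy(labels, [0.1, 0.9, 0.2, 0.8])   # confident and wrong

print(good, bad)
```

The loss is small when the predicted probabilities agree with the labels and grows quickly when the model is confidently wrong, which is the signal the optimiser uses to adjust the weights.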

***Training the ANN on the Training Dataset***
- Use batch learning, as it is a lot more efficient when training a model; the classic batch size is 32
- The performance of the model improves over the epochs

In [None]:
ann.fit(x_train, y_train, batch_size=32, epochs=100)
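
How batch size and epochs translate into weight updates can be worked out with simple arithmetic. The training-set size of 8000 rows below is an assumption for illustration; the real value is `len(x_train)`.

```python
import math

n_train = 8000   # assumed number of training rows; use len(x_train) in practice
batch_size = 32
epochs = 100

# One weight update per batch; the last batch of an epoch may be smaller
updates_per_epoch = math.ceil(n_train / batch_size)
total_updates = updates_per_epoch * epochs

print(updates_per_epoch, total_updates)
```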

### Part 4 - Making the predictions and evaluating the model

#### Predicting the result of a single observation

#### Predicting Test set results

#### Making the Confusion Matrix