# Artificial Neural Networks (ANNs)

<a id='9'></a>
# 1. Implementation in Keras

A brief overview of the neural network libraries:

Theano is an open-source numerical computation library with a NumPy-like syntax. It can run not only on the CPU (Central Processing Unit) but also on the GPU (Graphics Processing Unit), the processor found on a graphics card. For this kind of workload a GPU is much faster than a CPU because it has many more cores and can run more floating-point calculations per second. GPUs are highly specialised for heavy, parallel computations, which is exactly what the neural networks we are about to see require.

How does parallel computation come into play in NNs? During forward propagation, the activations of the different neurons in a layer can be computed at the same time, and the same holds when we backpropagate the error, so these calculations run much faster on parallel hardware. Theano was developed at the University of Montreal.
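
To make the parallelism concrete, here is a minimal NumPy sketch of one dense layer. The sizes (4 samples, 3 features, 5 neurons) and the random weights are made up for illustration; the point is only that every neuron's activation is an independent dot product, so a single matrix product can compute them all at once.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a batch of 4 samples, 3 input features, 5 neurons.
X = rng.normal(size=(4, 3))   # one row per sample
W = rng.normal(size=(3, 5))   # one column of weights per neuron
b = np.zeros(5)               # one bias per neuron


def relu(z):
    """ReLU activation: negative values become 0."""
    return np.maximum(z, 0.0)


# Vectorised: one matrix product covers every sample and every neuron.
A_vectorised = relu(X @ W + b)

# Same result, one neuron and one sample at a time.
# No iteration depends on another, which is why the work parallelises well.
A_loop = np.empty((4, 5))
for i in range(4):
    for j in range(5):
        A_loop[i, j] = relu(X[i] @ W[:, j] + b[j])

print(np.allclose(A_vectorised, A_loop))
```

On a GPU the independent dot products are spread over many cores, which is the speed-up described above.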

TensorFlow is similar to Theano. It was developed by Google.

However, the point to consider is that these two libraries lean towards the research and development side of neural networks. If we wanted to create a model from scratch, improve it or experiment with it, these two would be great, but for now we will start with Keras until we step up. Keras wraps these libraries for us and provides small, easy-to-use building blocks.

In [None]:
#!pip install tensorflow 
import numpy as np
import pandas as pd
import os
import tensorflow
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

<a id='10'></a>
# 2. Business Problem and EDA

The business problem I have chosen for this tutorial is a classification problem: we have a dataset with details of a bank's customers, and the target variable is binary, indicating whether the customer left the bank (closed their account) or continues to be a customer.

In [None]:
# Importing the dataset
churn_data = pd.read_csv('Churn_Modelling.csv', index_col='RowNumber')

In [None]:
churn_data.info()

In [None]:
churn_data.describe()

In [None]:
churn_data.head()

In [None]:
churn_data['Geography'].value_counts()
# some columns are totally unproductive so let's remove them
#churn_data.drop(['CustomerId','Surname'],axis=1,inplace=True)

In [None]:
churn_data.head()

In [None]:
# some columns have text data so let's one hot encode them
#  for more on one hot encoding click this link below
# https://www.kaggle.com/shrutimechlearn/types-of-regression-and-stats-in-depth

from sklearn.preprocessing import LabelEncoder,OneHotEncoder
from sklearn.compose import ColumnTransformer

# encode Gender column to 0 and 1
churn_data_encoded = churn_data.copy()
label_encoder = LabelEncoder()
churn_data_encoded['Gender'] = label_encoder.fit_transform(churn_data_encoded['Gender'])

# encode Geography column as one-hot (dummy) columns
Geography_dummies = pd.get_dummies(prefix='Geo',data=churn_data_encoded,columns=['Geography'])

# OR VIA LABEL ENCODING and ONE HOT ENCODING

#column_transformer = ColumnTransformer([("Geography", OneHotEncoder(), [1])], remainder = 'passthrough')
#churn_data_encoded = column_transformer.fit_transform(churn_data_encoded)
#churn_data_encoded = churn_data_encoded[:, 1:]
#churn_data_encoded = pd.DataFrame(churn_data_encoded)
#churn_data_encoded.head()
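
To see what the two encodings produce, here is a small pure-Python sketch on made-up values. The helper names `label_encode` and `one_hot_encode` are hypothetical and only mimic the idea behind `LabelEncoder` and `pd.get_dummies`; they are not the libraries' implementations.

```python
def label_encode(values):
    """Map each category to an integer code, assigned in sorted order."""
    codes = {category: i for i, category in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes


def one_hot_encode(values, prefix):
    """Create one 0/1 column per category, named '<prefix>_<category>'."""
    categories = sorted(set(values))
    return {
        f"{prefix}_{category}": [int(v == category) for v in values]
        for category in categories
    }


# Toy columns (made up for illustration).
genders = ["Female", "Male", "Male", "Female"]
countries = ["France", "Spain", "Germany", "France"]

gender_codes, mapping = label_encode(genders)
geo_columns = one_hot_encode(countries, prefix="Geo")

print(mapping)       # {'Female': 0, 'Male': 1}
print(gender_codes)  # [0, 1, 1, 0]
for name, column in geo_columns.items():
    print(name, column)
```

A two-valued column such as Gender fits in a single 0/1 column, while Geography gets one column per country so that the network does not read an artificial order into the codes.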

In [None]:
Geography_dummies.head()

In [None]:
churn_data_encoded = Geography_dummies.copy()
churn_data_encoded = churn_data_encoded.drop(['Surname', 'CustomerId'], axis = 1)
churn_data_encoded.head()

In [None]:
sns.countplot(y=churn_data_encoded.Exited ,data=churn_data_encoded, hue=churn_data_encoded.Exited)
plt.xlabel("Count of each Target class")
plt.ylabel("Target classes")
plt.show()

In [None]:
churn_data_encoded.hist(figsize=(15,12),bins = 15)
# suptitle labels the whole grid; plt.title would only label the last subplot
plt.suptitle("Features Distribution")
plt.show()

In [None]:
plt.figure(figsize=(12,5))
p=sns.heatmap(churn_data_encoded.corr(), annot=True,cmap='RdYlGn',center=0)

#### X and y definitions

In [None]:
X = churn_data_encoded.drop('Exited',axis=1).values
y = churn_data_encoded.Exited

#### Train and test split

In [None]:
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 1, stratify=y, shuffle=True)

#### Feature Scaling (Data Normalization)

In [None]:
# Feature scaling so that no independent variable dominates the others; it also makes the computations easier
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler()
X_train = sc.fit_transform(X_train)
# reuse the min/max learned on the training set; refitting on the test set would leak its statistics
X_test = sc.transform(X_test)
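
As a rough illustration of what min-max scaling computes, the sketch below applies the formula `(x - min) / (max - min)` by hand with NumPy. The toy values are made up, and the sketch assumes the usual practice of learning the minimum and range from the training rows only and reusing them for the test rows.

```python
import numpy as np

# Toy data: two features on very different scales (values are made up).
X_train_toy = np.array([[350.0, 18.0],
                        [600.0, 40.0],
                        [850.0, 92.0]])
X_test_toy = np.array([[700.0, 55.0]])

# Learn the per-column minimum and range from the training rows only.
col_min = X_train_toy.min(axis=0)
col_range = X_train_toy.max(axis=0) - col_min

# Apply the same statistics to both sets.
X_train_scaled = (X_train_toy - col_min) / col_range
X_test_scaled = (X_test_toy - col_min) / col_range

print(X_train_scaled)  # every training column now spans 0 to 1
print(X_test_scaled)   # test rows are placed on the training scale
```

Test values can fall slightly outside the 0 to 1 interval when they exceed the training minimum or maximum, which is expected.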

In [None]:
# sequential model to initialise our ann and dense module to build the layers
from tensorflow.keras.models import Sequential
from tensorflow.keras import Input
from tensorflow.keras.layers import Dense
import tensorflow as tf

#### ANN Building

In [None]:
ann = Sequential()
# Adding the input layer and the first hidden layer
input_layer = Dense(units = 12, activation = 'relu', input_dim = 12)
ann.add(input_layer)

# Adding the hidden layer 1
hidden_layer_1 = Dense(units = 6, activation = 'relu')
ann.add(hidden_layer_1)

# Adding the hidden layer 2
hidden_layer_2 = Dense(units = 3, activation = 'relu')
ann.add(hidden_layer_2)

# Adding the output layer
# Sigmoid -> Binary classification, Softmax -> Multiclassification
ann.add(Dense(units = 1, activation = 'sigmoid'))

# Loss (Binary Classification) -> binary_crossentropy, Multiclass -> categorical_crossentropy
# Compiling the ANN: sets the Adam optimizer, the loss and the metric used to train the whole network
ann.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics=['accuracy'])
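
To show what the sigmoid output and the binary cross-entropy loss mean numerically, here is a small pure-Python sketch. The logits and labels are made up, and the clipping constant `eps` is an assumption added to avoid `log(0)`; Keras's own implementation may differ in such details.

```python
import math


def sigmoid(z):
    """Squash a raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Made-up raw outputs of the last layer and their true labels.
logits = [2.0, -1.0, 0.0]
labels = [1, 0, 1]

probs = [sigmoid(z) for z in logits]
loss = binary_crossentropy(labels, probs)

print(probs)
print(loss)
```

The loss shrinks as the predicted probability moves towards the true label, which is what the optimizer exploits when updating the weights.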

#### ANN Training

In [None]:
# Fitting the ANN to the Training set
history = ann.fit(X_train, y_train, batch_size=50, epochs = 100,verbose = 1, validation_split=0.2, validation_freq=1)
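
A quick back-of-the-envelope calculation shows how `batch_size`, `epochs` and `validation_split` translate into weight updates. The dataset size of 10,000 rows is an assumption (check `churn_data.shape` for the real figure), and the rounding is simplified book-keeping rather than the exact logic of the libraries.

```python
import math

n_rows = 10_000          # ASSUMED size of the full dataset
test_size = 0.2          # from train_test_split
validation_split = 0.2   # from ann.fit
batch_size = 50
epochs = 100

n_test = round(n_rows * test_size)        # rows held out for testing
n_fit = n_rows - n_test                   # rows passed to ann.fit
n_val = round(n_fit * validation_split)   # rows kept aside for validation
n_train = n_fit - n_val                   # rows actually used for training

steps_per_epoch = math.ceil(n_train / batch_size)  # weight updates per epoch
total_updates = steps_per_epoch * epochs

print(n_train, n_val, steps_per_epoch, total_updates)
```

Under these assumptions a larger batch size means fewer, smoother updates per epoch, and a smaller one means more, noisier updates.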

<a id='11'></a>
# 3. Evaluation Metrics

#### Evaluating the generated ANN model

In [None]:
# summarize history for loss and accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss / accuracy convergence')
plt.xlabel('epoch')
plt.legend(['Train Accuracy', 'Validation Accuracy', "Train Loss", "Validation Loss"], loc='center right')
plt.show()

# summarize history for loss
# plt.plot(history.history['loss'])
# plt.plot(history.history['val_loss'])
# plt.title('model loss')
# plt.ylabel('loss')
# plt.xlabel('epoch')
# plt.legend(['train', 'validation'], loc='upper left')
# plt.show()

In [None]:
score, acc = ann.evaluate(X_train, y_train, batch_size=10)
print('Train score:', score)
print('Train accuracy:', acc)
# Part 3 - Making predictions and evaluating the model

# Predicting the Test set results
y_pred = ann.predict(X_test)
y_pred = (y_pred > 0.5).astype("int32")

print('*'*20)
score, acc = ann.evaluate(X_test, y_test, batch_size=10)
print('Test score:', score)
print('Test accuracy:', acc)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

In [None]:
p = sns.heatmap(cm, annot=True, cmap="Blues" ,fmt='g')
plt.title('Confusion matrix')
plt.ylabel('Actual label')
plt.xlabel('Predicted label')

In [None]:
#import classification_report
from sklearn.metrics import classification_report
print(classification_report(y_test,y_pred))
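
The numbers in the classification report can be derived from the four cells of the confusion matrix. The sketch below does this by hand on made-up labels; the helper `confusion_counts` is hypothetical and only mirrors the layout that scikit-learn uses for the binary case (rows are actual classes, columns are predicted classes).

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels 0/1."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


# Made-up labels: 6 customers stayed (0), 4 left (1).
y_true_toy = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred_toy = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

tn, fp, fn, tp = confusion_counts(y_true_toy, y_pred_toy)

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # of the predicted churners, how many really left
recall = tp / (tp + fn)      # of the real churners, how many we caught
f1 = 2 * precision * recall / (precision + recall)

print([[tn, fp], [fn, tp]])  # same layout as confusion_matrix
print(accuracy, precision, recall, f1)
```

With an imbalanced target such as churn, recall on the positive class is often more informative than accuracy alone.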

In [None]:
from sklearn.metrics import roc_curve
y_predict_probability = ann.predict(X_test)
fpr, tpr, thresholds = roc_curve(y_test, y_predict_probability)
plt.plot([0,1],[0,1],'k--')
plt.plot(fpr,tpr, label='ANN')
plt.xlabel('FPR')
plt.ylabel('TPR')
plt.title('ROC curve')
plt.show()

In [None]:
#Area under ROC curve
from sklearn.metrics import roc_auc_score
roc_auc_score(y_test,y_predict_probability)
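
One way to read the area under the ROC curve is as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The pure-Python sketch below computes it that way on made-up scores; `pairwise_auc` is a hypothetical helper for illustration, not how scikit-learn computes it internally.

```python
def pairwise_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels and predicted probabilities.
y_toy = [0, 0, 1, 1]
scores_toy = [0.1, 0.4, 0.35, 0.8]

auc = pairwise_auc(y_toy, scores_toy)
print(auc)  # 0.75: three of the four positive/negative pairs are ordered correctly
```

A value of 0.5 corresponds to the dashed diagonal in the plot (random ranking), and 1.0 to a perfect ranking.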

The deviation between the results is very low, so I'd say the model is unlikely to be overfitted. Across different training sets, the mean of all training results is still very close to that of the model above.

<a id='12'></a>
# 4. Improving ANN with Dropout layer

Dropout regularization randomly ignores certain neurons during training in order to reduce overfitting (the network fitting noise), when needed.

<img src="https://preview.ibb.co/e7yPPp/dropout.jpg" alt="dropout" border="0">

p is the fraction of input units to drop. Suppose a layer has ten neurons and p is 0.1: then, on average, one of the neurons is disabled at each training step and its output is not sent to the next layer.

It is advisable to start with p = 0.1 and move to higher values if the overfitting problem persists. Going over 0.5 is generally not advisable because it may cause underfitting, as most of the neurons are disabled.
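
The idea can be sketched in a few lines of NumPy. This is a simplified illustration of what happens during training only: a random mask zeroes a fraction `rate` of the units, and the surviving ones are scaled by `1 / (1 - rate)` so that the expected sum stays the same, which is also what the Keras `Dropout` layer does. The function name and the toy input are assumptions.

```python
import numpy as np


def dropout(activations, rate, rng):
    """Zero each unit with probability `rate` and rescale the survivors."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)


rng = np.random.default_rng(42)

# Toy layer output: 1000 samples, 10 neurons, all activations equal to 1.
layer_output = np.ones((1000, 10))
dropped = dropout(layer_output, rate=0.1, rng=rng)

fraction_zeroed = np.mean(dropped == 0.0)
print(fraction_zeroed)   # close to 0.1
print(dropped.mean())    # close to 1.0 thanks to the rescaling
```

At prediction time no units are dropped, so the layer simply passes its input through unchanged.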


#### Building our ANN with Dropout layers

In [None]:
# Improving the ANN
from tensorflow.keras.layers import Dropout

ann = Sequential()
# Adding the input layer and the first hidden layer (w/ Dropout)
ann.add(Dense(units = 12, activation = 'relu', input_dim = 12))
ann.add(Dropout(rate = 0.1))

# Adding the hidden layer 1 (w/ Dropout)
ann.add(Dense(units = 24, activation = 'relu'))
ann.add(Dropout(rate = 0.1))

# Adding the hidden layer 2 (w/ Dropout)
ann.add(Dense(units = 12, activation = 'relu'))
ann.add(Dropout(rate = 0.1))

# Adding the output layer
ann.add(Dense(units = 1, activation = 'sigmoid'))

# Compiling the ANN
ann.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics=['accuracy'])

#### Training our ANN

In [None]:
# Fitting the ANN to the Training set
history = ann.fit(X_train, y_train, batch_size=100, epochs = 200,verbose = 1, validation_split=0.2)

#### Evaluation Plots and Metrics

In [None]:
# summarize history for loss and accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss / accuracy convergence')
plt.xlabel('epoch')
plt.legend(['Train Accuracy', 'Validation Accuracy', "Train Loss", "Validation Loss"], loc='center right')
plt.show()

# summarize history for loss
# plt.plot(history.history['loss'])
# plt.plot(history.history['val_loss'])
# plt.title('model loss')
# plt.ylabel('loss')
# plt.xlabel('epoch')
# plt.legend(['train', 'validation'], loc='upper left')
# plt.show()

In [None]:
# Part 3 - Making predictions and evaluating the model

score, acc = ann.evaluate(X_train, y_train,
                            batch_size=10)
print('Train score:', score)
print('Train accuracy:', acc)

# Predicting the Test set results
y_pred = ann.predict(X_test)
y_pred = (y_pred > 0.5).astype("int32")  # cast booleans to 0/1 labels

print('*'*20)
score, acc = ann.evaluate(X_test, y_test,
                            batch_size=10)
print('Test score:', score)
print('Test accuracy:', acc)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

In [None]:
p = sns.heatmap(pd.DataFrame(cm), annot=True, cmap="Blues" ,fmt='g')
plt.title('Confusion matrix', y=1.1)
plt.ylabel('Actual label')
plt.xlabel('Predicted label')

In [None]:
#import classification_report
from sklearn.metrics import classification_report
print(classification_report(y_test,y_pred))

In [None]:
from sklearn.metrics import roc_curve
y_predict_probability = ann.predict(X_test)
fpr, tpr, thresholds = roc_curve(y_test, y_predict_probability)
plt.plot([0,1],[0,1],'k--')
plt.plot(fpr,tpr, label='ANN')
plt.xlabel('fpr')
plt.ylabel('tpr')
plt.title('ROC curve')
plt.show()

In [None]:
#Area under ROC curve
from sklearn.metrics import roc_auc_score
roc_auc_score(y_test,y_predict_probability)

# 5. Single prediction exercise

Use our ANN model to predict whether the customer with the following information will leave the bank:

- **Credit Score**: 600
- **Gender**: Male
- **Age**: 40 years old
- **Tenure**: 3 years
- **Balance**: \$60,000
- **Number of Products**: 2
- **Does this customer have a credit card?**: Yes
- **Is this customer an Active Member?**: Yes
- **Estimated Salary**: \$50,000
- **Geography**: France

So, should we say goodbye to that customer?

**And now this client?**

- **Credit Score**: 500
- **Gender**: Female
- **Age**: 65 years old
- **Tenure**: 5 years
- **Balance**: \$160,000
- **Number of Products**: 3
- **Does this customer have a credit card?**: Yes
- **Is this customer an Active Member?**: No
- **Estimated Salary**: \$100,000
- **Geography**: Spain
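
As a starting point for the exercise, here is a hedged sketch of how each customer could be turned into a feature row. The helper `encode_customer` and the column order in `FEATURE_ORDER` are assumptions (verify the order with `churn_data_encoded.drop('Exited', axis=1).columns`). Female = 0 and Male = 1 follows the alphabetical order used by `LabelEncoder`, and the amounts are read as 60000, 50000, 160000 and 100000.

```python
# ASSUMED column order of X after encoding; verify it against the notebook.
FEATURE_ORDER = [
    "CreditScore", "Gender", "Age", "Tenure", "Balance", "NumOfProducts",
    "HasCrCard", "IsActiveMember", "EstimatedSalary",
    "Geo_France", "Geo_Germany", "Geo_Spain",
]


def encode_customer(credit_score, gender, age, tenure, balance, n_products,
                    has_card, is_active, salary, geography):
    """Build one feature row in FEATURE_ORDER from raw customer details."""
    return [
        credit_score,
        1 if gender == "Male" else 0,   # Female -> 0, Male -> 1
        age, tenure, balance, n_products,
        int(has_card), int(is_active), salary,
        int(geography == "France"),
        int(geography == "Germany"),
        int(geography == "Spain"),
    ]


customer_1 = encode_customer(600, "Male", 40, 3, 60000, 2,
                             True, True, 50000, "France")
customer_2 = encode_customer(500, "Female", 65, 5, 160000, 3,
                             True, False, 100000, "Spain")

print(customer_1)
print(customer_2)

# In the notebook, the rows would then go through the fitted scaler and model:
# probabilities = ann.predict(sc.transform([customer_1, customer_2]))
# will_leave = probabilities > 0.5
```

The rows must be scaled with the same `sc` object that was fitted on the training data, otherwise the network sees values on a scale it was never trained on.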