# Artificial Neural Network (ANN)

* The dataset comes from a fictional bank with 10,000 customers. The bank has seen an unusually high churn rate (customers leaving the bank) and wants to address it, so it has hired you to examine the dataset and provide insights.
* The bank operates in three European countries: France, Spain and Germany. Six months ago it started measuring and recording everything about its customers.
* Some of the columns: Tenure (how long the customer has been with the bank), NumOfProducts (total number of products held, such as loans, credit cards and savings accounts), IsActiveMember (whether the customer has made any transaction in the past 6/3 months), Exited (whether the customer left the bank during those 6 months: left = 1, still with the bank = 0).
* Your task is to build a geodemographic segmentation model that tells the bank which of its customers are at high risk of leaving.
* The same approach carries over to other binary-outcome problems: estimating which people are more reliable can guide a bank's decision on whether to grant a loan, and it can also indicate which transactions are more or less likely to be fraudulent.

In [3]:
#importing the libraries
import numpy as np
import pandas as pd
import tensorflow as tf

In [4]:
tf.__version__

'2.5.0'

## Data Preprocessing

In [21]:
#importing dataset
dataset = pd.read_csv("Dataset/Churn_Modelling.csv")
x = dataset.iloc[:,3:-1].values
y = dataset.iloc[:,-1].values
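
To make the slicing in this cell concrete, here is a small sketch of which columns `iloc[:, 3:-1]` and `iloc[:, -1]` select. The column names below are an assumption based on the commonly distributed `Churn_Modelling.csv` file; check them against `dataset.columns` in your own copy.

```python
# Assumed column order of Churn_Modelling.csv (verify with dataset.columns).
columns = [
    "RowNumber", "CustomerId", "Surname", "CreditScore", "Geography",
    "Gender", "Age", "Tenure", "Balance", "NumOfProducts", "HasCrCard",
    "IsActiveMember", "EstimatedSalary", "Exited",
]

# iloc[:, 3:-1] drops the first three identifier columns and the last column.
feature_columns = columns[3:-1]
# iloc[:, -1] keeps only the last column, which is the label.
target_column = columns[-1]

print(feature_columns)
print(target_column)
```

Under that assumption, Geography sits at index 1 and Gender at index 2 of `x`, which matches the indices used by the encoders in the next cells.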

In [6]:
print(x)

[[619 'France' 'Female' ... 1 1 101348.88]
 [608 'Spain' 'Female' ... 0 1 112542.58]
 [502 'France' 'Female' ... 1 0 113931.57]
 ...
 [709 'France' 'Female' ... 0 1 42085.58]
 [772 'Germany' 'Male' ... 1 0 92888.52]
 [792 'France' 'Female' ... 1 0 38190.78]]


In [7]:
print(y)

[1 0 1 ... 1 1 0]


In [8]:
#encoding categorical data
#label encoding the "gender" column
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
x[:, 2] = le.fit_transform(x[:, 2])
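
`LabelEncoder` assigns integer codes to the classes in sorted order, which is why "Female" becomes 0 and "Male" becomes 1. The following is a minimal pure-Python sketch of that mapping; the helper `label_encode` is hypothetical and only illustrates the idea, it is not part of scikit-learn.

```python
def label_encode(values):
    """Map each distinct value to its index in the sorted list of classes."""
    classes = sorted(set(values))
    index = {label: code for code, label in enumerate(classes)}
    return [index[v] for v in values], classes


codes, classes = label_encode(["Female", "Male", "Female", "Male", "Male"])
print(classes)
print(codes)
```

With the real encoder, the same information should be available after fitting through `le.classes_`.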

In [9]:
print(x)

[[619 'France' 0 ... 1 1 101348.88]
 [608 'Spain' 0 ... 0 1 112542.58]
 [502 'France' 0 ... 1 0 113931.57]
 ...
 [709 'France' 0 ... 0 1 42085.58]
 [772 'Germany' 1 ... 1 0 92888.52]
 [792 'France' 0 ... 1 0 38190.78]]


In [11]:
#one hot encoding the "geography" column, because there is no order relationship between France, Spain and Germany
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])], remainder='passthrough')
x = np.array(ct.fit_transform(x))

In [12]:
print(x)   #"1.0 0.0 0.0"=France, "0.0 0.0 1.0"=Spain, "0.0 1.0 0.0"=Germany

[[1.0 0.0 0.0 ... 1 1 101348.88]
 [0.0 0.0 1.0 ... 0 1 112542.58]
 [1.0 0.0 0.0 ... 1 0 113931.57]
 ...
 [1.0 0.0 0.0 ... 0 1 42085.58]
 [0.0 1.0 0.0 ... 1 0 92888.52]
 [1.0 0.0 0.0 ... 1 0 38190.78]]
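
The column order of the one-hot output follows the sorted category names (France, Germany, Spain), which explains the mapping noted in the comment above. As a rough NumPy sketch of the same idea, using a made-up four-row sample rather than the real dataset:

```python
import numpy as np

# Hypothetical sample of the Geography column.
countries = np.array(["France", "Spain", "France", "Germany"])

# np.unique returns the sorted categories and, for each row, the index of its category.
categories, codes = np.unique(countries, return_inverse=True)

# Row i of the identity matrix is the one-hot vector for category i.
one_hot = np.eye(len(categories))[codes]

print(categories)
print(one_hot)
```

This is only an illustration of the encoding; in the notebook the work is done by `OneHotEncoder` inside the `ColumnTransformer`.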


In [13]:
#splitting the dataset into training set and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

In [14]:
#feature scaling: essential in deep learning, because gradient-based training works poorly when features are on very different scales
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
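
Note that the scaler is fitted on the training set only and then reused on the test set, so no information from the test set leaks into training. A minimal NumPy sketch of what standardization does, on made-up numbers (two hypothetical features, credit score and age):

```python
import numpy as np

# Hypothetical training and test rows: [credit score, age].
train = np.array([[600.0, 40.0], [700.0, 30.0], [800.0, 50.0]])
test = np.array([[650.0, 45.0]])

# "fit": learn the per-column mean and standard deviation from the training set only.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0)

# "transform": apply the training statistics to both sets.
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
print(test_scaled)
```

After scaling, each training column has mean 0 and standard deviation 1; the test rows are expressed on that same scale.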

## Building the ANN

In [16]:
#initializing the ANN as a sequence of layers

# the Sequential class builds an artificial neural network as a sequence of layers, as opposed to a computational graph
ann = tf.keras.models.Sequential()   #Sequential comes from the "models" module of the Keras library; since TensorFlow 2.0, Keras is included in TensorFlow

In [17]:
#adding the input layer and the first hidden layer

#.add() appends any kind of layer: a hidden layer, the output layer, a dropout layer (which helps prevent overfitting)
ann.add(tf.keras.layers.Dense(units=6, activation="relu"))   #adds a fully connected layer; the "layers" module has a class for each type of layer
#units: number of neurons in this first hidden layer; it is a hyperparameter (a setting that is not learned during training), so finding a good value takes experimentation
#activation: activation function ("relu" is the code name for the rectifier activation function)
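
As a rough illustration of what a fully connected layer with a rectifier activation computes, here is a NumPy sketch of a forward pass through one such layer. The random weights and the batch of four customers are made up for the example; in the real model Keras initializes and trains the weights.

```python
import numpy as np


def relu(z):
    """Rectifier: keeps positive values and replaces negative ones with 0."""
    return np.maximum(0.0, z)


rng = np.random.default_rng(0)
x_batch = rng.normal(size=(4, 12))  # 4 hypothetical customers, 12 scaled features
W = rng.normal(size=(12, 6))        # one weight column per neuron (units=6)
b = np.zeros(6)                     # one bias per neuron

# Dense layer: weighted sum of the inputs plus bias, then the activation.
hidden = relu(x_batch @ W + b)
print(hidden.shape)
```

Each customer's 12 features are mapped to 6 non-negative activations, which become the input of the next layer.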

In [18]:
#adding the second hidden layer

ann.add(tf.keras.layers.Dense(units=6, activation="relu"))

In [19]:
#adding the output layer

ann.add(tf.keras.layers.Dense(units=1, activation="sigmoid"))   #units=1 because the dependent variable is binary (1/0), so a single neuron is enough; a non-binary dependent variable would have to be one-hot encoded, with one unit per class (e.g. "units=3" for three classes)
#the sigmoid function outputs the probability that the binary outcome is 1, i.e. for each customer the probability that they leave the bank; comparing it with a threshold gives the final 0/1 prediction
#if the output is non-binary, i.e. has more than two classes, the activation must be "softmax"
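
To illustrate the two output activations mentioned in the comments, here is a small NumPy sketch. The input values are arbitrary examples, not outputs of the model above.

```python
import numpy as np


def sigmoid(z):
    """Squash any real number into (0, 1), read as the probability of class 1."""
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    """Turn a vector of scores into probabilities that sum to 1."""
    shifted = z - np.max(z)  # subtracting the max avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()


scores = np.array([-2.0, 0.0, 2.0])   # three hypothetical pre-activation values
probs = sigmoid(scores)               # one churn probability per value
labels = probs > 0.5                  # threshold to get a 0/1 style prediction

class_probs = softmax(np.array([1.0, 2.0, 3.0]))  # three-class case

print(probs, labels)
print(class_probs, class_probs.sum())
```

Sigmoid treats each output independently, which suits a single binary outcome; softmax couples the outputs so that the class probabilities sum to 1.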

## Training the ANN

In [20]:
#compiling the ANN with the optimizer and the loss function

ann.compile(optimizer= "adam", loss= "binary_crossentropy", metrics= ["accuracy"])   #the adam optimizer performs a form of stochastic gradient descent: after each batch it compares the predictions with the real results and updates the weights so as to reduce the loss
#the loss function computes the difference between the predictions and the real results; for a binary outcome use "binary_crossentropy", and for a non-binary outcome "categorical_crossentropy"
#you can pass a list of several metrics to evaluate the ANN with at the same time, but here we choose just one metric, accuracy
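
For intuition about the loss, here is a NumPy sketch of binary cross-entropy on made-up labels and probabilities. The clipping constant `eps` is an assumption added for numerical safety; this is a sketch of the formula, not the Keras implementation.

```python
import numpy as np


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all observations."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))


y_true = np.array([1, 0, 1, 0])
good_probs = np.array([0.9, 0.1, 0.8, 0.2])  # confident and correct
bad_probs = np.array([0.1, 0.9, 0.2, 0.8])   # confident and wrong

good_loss = binary_crossentropy(y_true, good_probs)
bad_loss = binary_crossentropy(y_true, bad_probs)
print(good_loss, bad_loss)
```

Confident wrong predictions are penalized much more heavily than confident correct ones, which is what pushes the weights in the right direction during training.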

In [21]:
#training the ANN on the training set

ann.fit(x_train, y_train, batch_size= 32, epochs= 100)   #batch_size because the predictions are compared with the real results batch by batch rather than one observation at a time
#a neural network has to be trained for a certain number of epochs so that accuracy improves over time; the number should not be too small

#in the output the final accuracy is 0.86, i.e. about 86 correct predictions out of 100 on the training set

Epoch 1/100
Epoch 2/100
...

<tensorflow.python.keras.callbacks.History at 0x2599136ba60>
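
To relate `batch_size` and `epochs` to the number of weight updates, here is a short arithmetic sketch. It assumes 8,000 training rows, which follows from the 10,000 customers and `test_size=0.2` used earlier.

```python
import math

n_train = 8_000   # assumed: 80% of the 10,000 customers
batch_size = 32
epochs = 100

# One weight update per batch; the last batch of an epoch may be smaller.
steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, total_updates)
```

A smaller batch size means more, noisier updates per epoch; a larger one means fewer, smoother updates.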

## Making the predictions and evaluating the model

### Predicting the result of a single observation

**Homework**

Use our ANN model to predict whether the customer with the following information will leave the bank:

Geography: France

Credit Score: 600

Gender: Male

Age: 40 years old

Tenure: 3 years

Balance: $ 60000

Number of Products: 2

Does this customer have a credit card ? Yes

Is this customer an Active Member: Yes

Estimated Salary: $ 50000

So, should we say goodbye to that customer ?

In [24]:
print(ann.predict(sc.transform([[1, 0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]])))   #the predicted probability that this customer leaves the bank
print(ann.predict(sc.transform([[1, 0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]])) > 0.5)   #comparing with the 0.5 threshold turns the probability into True/False: above 0.5 the customer is predicted to leave; here the probability is below 0.5, so the result is False, i.e. the customer is predicted to stay

[[0.05083597]]
[[False]]


Therefore, our ANN model predicts that this customer stays with the bank!

**Important note 1**: Notice that the values of the features were all input in a double pair of square brackets. That's because the "predict" method always expects a 2D array as the format of its inputs. And putting our values into a double pair of square brackets makes the input exactly a 2D array.

**Important note 2**: Notice also that the country "France" was not input as a string in the last column but as "1, 0, 0" in the first three columns. That's because the predict method expects the one-hot-encoded values of the country, and as we see in the first row of the matrix of features x, "France" was encoded as "1, 0, 0". Be careful to put these values in the first three columns, because the ColumnTransformer places the dummy variables before the passthrough columns.
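
Both notes can be captured in a small helper that builds the input row in the right shape and column order. This is a sketch: the function `encode_customer` and the two lookup tables are hypothetical names introduced here, and the orderings are taken from the encoded outputs shown earlier in this notebook.

```python
# One-hot column order (France, Germany, Spain) and gender codes as seen above.
COUNTRY_ORDER = ["France", "Germany", "Spain"]
GENDER_CODES = {"Female": 0, "Male": 1}


def encode_customer(geography, credit_score, gender, age, tenure, balance,
                    num_products, has_card, is_active, salary):
    """Return a 2D list (one row) in the column order the model was trained on."""
    one_hot = [1 if geography == country else 0 for country in COUNTRY_ORDER]
    row = one_hot + [credit_score, GENDER_CODES[gender], age, tenure, balance,
                     num_products, int(has_card), int(is_active), salary]
    return [row]  # double brackets: predict expects a 2D array


row = encode_customer("France", 600, "Male", 40, 3, 60000, 2, True, True, 50000)
print(row)
```

The result can then be passed through `sc.transform(...)` and `ann.predict(...)` exactly as in the cell above.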

In [25]:
#predicting the test set results

y_pred = ann.predict(x_test)
y_pred = (y_pred > 0.5)   #same 0.5 threshold as in the single prediction above
print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)) ,1))

[[0 0]
 [0 0]
 [0 0]
 ...
 [1 1]
 [0 1]
 [0 1]]


In [26]:
#confusion matrix

from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

#rows are actual classes, columns are predicted classes: 1533 correct predictions of customers staying, 190 correct predictions of customers leaving, 74 customers wrongly predicted to leave, 203 customers wrongly predicted to stay
#the 203 missed churners (false negatives) should be low for a good model

[[1533   74]
 [ 203  190]]


0.8615
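
Accuracy alone hides how the errors are distributed, so it can help to derive precision and recall for the "leaving" class from the same confusion matrix. A short sketch using the four counts printed above:

```python
# Counts from the confusion matrix above (rows: actual, columns: predicted).
tn, fp = 1533, 74    # actual stayers: predicted stay / predicted leave
fn, tp = 203, 190    # actual leavers: predicted stay / predicted leave

total = tn + fp + fn + tp
accuracy = (tp + tn) / total   # share of all predictions that are correct
precision = tp / (tp + fp)     # of customers predicted to leave, how many did
recall = tp / (tp + fn)        # of customers who left, how many were caught

print(round(accuracy, 4), round(precision, 4), round(recall, 4))
```

The accuracy matches the 0.8615 reported by `accuracy_score`, while the recall shows that the model catches a little under half of the customers who actually leave.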