# Artificial Neural Network

### Importing the libraries

In [41]:
import numpy as np
import pandas as pd
import tensorflow as tf

In [43]:
tf.__version__

'2.4.1'

## Part 1 - Data Preprocessing

### Importing the dataset

In [45]:
dataset = pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:, 3:-1].values
y = dataset.iloc[:, -1].values
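
The slice `3:-1` keeps every column from the fourth up to, but not including, the last one, and `-1` picks the last column as the target. A minimal sketch of that slicing on a made-up two-row frame follows; the column names are an assumption about the layout of `Churn_Modelling.csv` (only a subset is shown), and `toy`, `X_toy` and `y_toy` are hypothetical names.

```python
import pandas as pd

# Hypothetical frame; column names are assumed, values are illustrative only.
toy = pd.DataFrame({
    "RowNumber": [1, 2],
    "CustomerId": [1001, 1002],
    "Surname": ["A", "B"],
    "CreditScore": [619, 608],
    "Geography": ["France", "Spain"],
    "Gender": ["Female", "Female"],
    "Exited": [1, 0],
})

# Columns 3 .. second-to-last become the features; the last column is the target.
feature_cols = list(toy.columns[3:-1])
X_toy = toy.iloc[:, 3:-1].values
y_toy = toy.iloc[:, -1].values

print(feature_cols)
print(X_toy.shape, y_toy)
```

The first three columns are identifiers that carry no predictive signal, which is why the slice starts at index 3.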

In [47]:
print(X)

[[619 'France' 'Female' ... 1 1 101348.88]
 [608 'Spain' 'Female' ... 0 1 112542.58]
 [502 'France' 'Female' ... 1 0 113931.57]
 ...
 [709 'France' 'Female' ... 0 1 42085.58]
 [772 'Germany' 'Male' ... 1 0 92888.52]
 [792 'France' 'Female' ... 1 0 38190.78]]


In [48]:
print(y)

[1 0 1 ... 1 1 0]


### Encoding categorical data

Label Encoding the "Gender" column

In [49]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:, 2] = le.fit_transform(X[:, 2])

In [50]:
print(X)

[[619 'France' 0 ... 1 1 101348.88]
 [608 'Spain' 0 ... 0 1 112542.58]
 [502 'France' 0 ... 1 0 113931.57]
 ...
 [709 'France' 0 ... 0 1 42085.58]
 [772 'Germany' 1 ... 1 0 92888.52]
 [792 'France' 0 ... 1 0 38190.78]]
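
To see which integer each label received, the fitted encoder's `classes_` attribute can be inspected. The sketch below uses a separate, hypothetical encoder `le_demo` on a made-up array rather than the notebook's `le`.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Made-up sample of the "Gender" column.
genders = np.array(["Female", "Male", "Male", "Female"])

le_demo = LabelEncoder()
codes = le_demo.fit_transform(genders)

# classes_ is sorted, so the position of each label is its integer code.
mapping = {str(label): int(code) for code, label in enumerate(le_demo.classes_)}
print(mapping)
print(codes)
```

Because the classes are sorted alphabetically, "Female" maps to 0 and "Male" to 1, which matches the printed matrix above.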


One Hot Encoding the "Geography" column

In [51]:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])], remainder='passthrough')
X = np.array(ct.fit_transform(X))

In [52]:
print(X)

[[1.0 0.0 0.0 ... 1 1 101348.88]
 [0.0 0.0 1.0 ... 0 1 112542.58]
 [1.0 0.0 0.0 ... 1 0 113931.57]
 ...
 [1.0 0.0 0.0 ... 0 1 42085.58]
 [0.0 1.0 0.0 ... 1 0 92888.52]
 [1.0 0.0 0.0 ... 1 0 38190.78]]
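
The order of the three new columns follows the encoder's sorted categories, and the encoded columns are placed before the passthrough columns. A small sketch on three made-up rows shows both effects; `rows` and `ct_demo` are hypothetical names, not objects from the notebook.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Made-up rows: [CreditScore, Geography, Gender (already label encoded)]
rows = np.array(
    [[619, "France", 0], [608, "Spain", 0], [772, "Germany", 1]],
    dtype=object,
)

ct_demo = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(), [1])],
    remainder="passthrough",
)
encoded = np.array(ct_demo.fit_transform(rows))

# Categories are sorted, which fixes the meaning of the first three columns.
categories = ct_demo.named_transformers_["encoder"].categories_[0]
print(list(categories))
print(encoded)
```

With this ordering, France is `1, 0, 0`, Germany is `0, 1, 0` and Spain is `0, 0, 1`.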


### Splitting the dataset into the Training set and Test set

In [53]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
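
With `test_size = 0.2`, a 10000-row dataset yields 8000 training rows and 2000 test rows, and fixing `random_state` makes the split repeatable. The sketch below checks both on synthetic placeholder arrays (`X_demo`, `y_demo` are hypothetical and carry no meaning).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins with the same number of rows as the dataset.
X_demo = np.arange(10000 * 2).reshape(10000, 2)
y_demo = np.arange(10000) % 2

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)
# Same seed, same split.
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

print(X_tr.shape, X_te.shape)
print(np.array_equal(X_te, X_te2))
```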

### Feature Scaling

In [54]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
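
The scaler is fitted on the training set only, and the test set is transformed with the training mean and standard deviation so that no test information leaks into training. A minimal sketch with made-up numbers (`sc_demo` is a hypothetical scaler, not the notebook's `sc`):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train = np.array([[1.0], [2.0], [3.0]])  # made-up training column
test = np.array([[4.0]])                 # made-up test value

sc_demo = StandardScaler()
train_scaled = sc_demo.fit_transform(train)  # learns mean=2 and the population std
test_scaled = sc_demo.transform(test)        # reuses the training statistics

print(sc_demo.mean_, sc_demo.scale_)
print(train_scaled.ravel(), test_scaled.ravel())
```

The scaled training column has mean 0 and standard deviation 1; the test value is expressed in the same units and may fall outside the training range.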

## Part 2 - Building the ANN

### Initializing the ANN

In [55]:
"""
Billions of neurons in the human brain
Deep learning loosely mimics how the brain works

Neuron (Biology):
Dendrites (receive signals) --> neuron --> axon --> neurotransmitter --> receptor of the next neuron

Node (Programming):
Input values (senses?) (scaled by weights) --> neuron (applies an activation, e.g. rectifier) --> output signal (movements?) (e.g. sigmoid)


Feature scale the data first (standardize/normalize the input values)
Artificial Neural Net:
Input layer --> hidden layers (possibly many) --> output layer
         (connect)       (connect)       (connect)

Activation functions:
    Threshold: 1 if x >= 0 else 0
    Sigmoid: 1 / (1 + e^-x), smoother than threshold, from 0 to 1 (usable as the probability of y being 1)
    Rectifier: max(x, 0)
    Hyperbolic tangent: from -1 to 1

E.g.
Input: Area, Bedrooms, Distance to city, Age
Without a hidden layer, this is where classic machine learning algorithms apply

Hidden layer (top neuron): [name, weight]: [Area, 1], [Bedrooms, 0], [Distance, 1], [Age, 0]
Activates only when certain criteria are met (it has learned that only area and distance matter)

Hidden layer (middle neuron): picked: Area, Bedrooms, Age
(maybe a large family looking for properties)

Hidden layer (bottom neuron): picked: Age

Hidden layers increase the flexibility of the prediction
Output: price

Train:
Compare output to actual using Cost = 1/2 * (out - actual)^2 (smaller Cost, better result)
The error is then fed back through the model and the weights of the neurons are updated (back propagation)

How to adjust weights?
Batch gradient descent (deterministic: same result on every run):
Calculate the slope (roll the ball down the side of a bowl) (the actual path zigzags)
adjust weights after a complete pass over all rows

Stochastic gradient descent (faster) (random):
if the Cost function is not convex, batch gradient descent can get stuck in a local minimum instead of the global minimum; the random updates help escape it
adjust weights after every row

Mini-batch gradient descent:
choose a batch size between 1 row and the whole dataset; adjust weights after each batch

"""

ann = tf.keras.models.Sequential()
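
The four activation functions listed in the notes above can be written in a few lines of NumPy. This is only an illustrative sketch; the function names are my own, and Keras ships its own implementations.

```python
import numpy as np

def threshold(x):
    # 1 if x >= 0 else 0
    return np.where(x >= 0, 1.0, 0.0)

def sigmoid(x):
    # Smooth curve from 0 to 1
    return 1.0 / (1.0 + np.exp(-x))

def rectifier(x):
    # max(x, 0), known as ReLU
    return np.maximum(x, 0.0)

def hyperbolic_tangent(x):
    # Smooth curve from -1 to 1
    return np.tanh(x)

x = np.array([-2.0, 0.0, 2.0])
for fn in (threshold, sigmoid, rectifier, hyperbolic_tangent):
    print(fn.__name__, fn(x))
```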

### Adding the input layer and the first hidden layer

In [56]:
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))
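
What a `Dense` layer with `units=6` and `activation='relu'` computes can be sketched in NumPy as a matrix product plus a bias, followed by the rectifier. The weights below are random placeholders rather than trained values, and the 12 input features are assumed from the 12 values passed to `predict` later in the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features, n_units = 12, 6                  # 12 inputs assumed; units=6 as in the cell above
W = rng.normal(size=(n_features, n_units))   # placeholder weights, one column per neuron
b = np.zeros(n_units)                        # one bias per neuron

x_batch = rng.normal(size=(4, n_features))   # four made-up, already scaled rows
hidden = np.maximum(x_batch @ W + b, 0.0)    # relu(x W + b)

n_params = W.size + b.size                   # 12 * 6 weights + 6 biases
print(hidden.shape, n_params)
```

Keras infers the input size from the first batch it sees, which is why the layer above is added without an explicit input shape.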

### Adding the second hidden layer

In [57]:
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

### Adding the output layer

In [58]:
ann.add(tf.keras.layers.Dense(units=1, activation='sigmoid')) # one unit is enough for a binary (0/1) output
# for more than two classes, use one unit per class and activation='softmax'
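
For the multi-class case mentioned in the comment, softmax turns one raw score per class into probabilities that sum to 1. A minimal NumPy sketch for a hypothetical three-class output follows; the scores are made up.

```python
import numpy as np

def softmax(z):
    # Subtract the row maximum for numerical stability; the result is unchanged.
    shifted = z - z.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1]])   # made-up raw scores for three classes
probs = softmax(logits)
predicted_class = probs.argmax(axis=1)

print(probs, predicted_class)
```

With a single sigmoid unit, as used here, the output is instead one number read as the probability of class 1.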

## Part 3 - Training the ANN

### Compiling the ANN

In [59]:
ann.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
# adam is an adaptive variant of stochastic gradient descent; loss for binary output: binary_crossentropy, for more than two classes: categorical_crossentropy
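
The binary cross-entropy loss penalizes confident wrong probabilities far more than confident right ones. A rough NumPy sketch of the formula is shown below; the clipping constant `eps` is my own assumption to avoid `log(0)`, and the arrays are made up.

```python
import numpy as np

def binary_crossentropy(y_true, p_pred, eps=1e-7):
    # Mean of -[y * log(p) + (1 - y) * log(1 - p)]
    p = np.clip(p_pred, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))

y_true = np.array([1.0, 0.0, 1.0])
confident_right = np.array([0.9, 0.1, 0.9])
confident_wrong = np.array([0.1, 0.9, 0.1])

loss_right = binary_crossentropy(y_true, confident_right)
loss_wrong = binary_crossentropy(y_true, confident_wrong)
print(loss_right, loss_wrong)
```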

### Training the ANN on the Training set

In [60]:
ann.fit(X_train, y_train, batch_size = 32, epochs = 40) # 32 is the Keras default batch_size

Epoch 1/40
...
Epoch 40/40


<tensorflow.python.keras.callbacks.History at 0x26d1b66fbe0>
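
With `batch_size = 32`, training uses mini-batch updates: the weights change after every 32 rows, so 8000 training rows give 250 updates per epoch. The sketch below compares batch, mini-batch and per-row updates on a made-up one-weight problem with the cost `1/2 * (out - actual)^2` from the notes. It uses plain gradient descent rather than Adam, and every name in it is hypothetical.

```python
import math
import numpy as np

steps_per_epoch = math.ceil(8000 / 32)  # 8000 training rows, 32 rows per update

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(256, 1))
y_toy = 3.0 * X_toy[:, 0]               # the true weight is 3.0, no noise

def train(batch_size, epochs=20, lr=0.1):
    w = 0.0
    n_updates = 0
    for _ in range(epochs):
        order = rng.permutation(len(X_toy))          # shuffle rows each epoch
        for start in range(0, len(X_toy), batch_size):
            idx = order[start:start + batch_size]
            out = w * X_toy[idx, 0]
            # Gradient of the mean of 1/2 * (out - actual)^2 with respect to w
            grad = np.mean((out - y_toy[idx]) * X_toy[idx, 0])
            w -= lr * grad
            n_updates += 1
    return w, n_updates

w_batch, updates_batch = train(batch_size=256)  # batch: one update per epoch
w_mini, updates_mini = train(batch_size=32)     # mini-batch
w_sgd, updates_sgd = train(batch_size=1)        # stochastic: one update per row

print(steps_per_epoch)
print(updates_batch, updates_mini, updates_sgd)
print(w_batch, w_mini, w_sgd)
```

For the same number of epochs, the smaller batch sizes make many more updates and end up closer to the true weight in this toy setting.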

## Part 4 - Making the predictions and evaluating the model

### Predicting the result of a single observation

**Homework**

Use our ANN model to predict whether the customer with the following information will leave the bank:

Geography: France

Credit Score: 600

Gender: Male

Age: 40 years old

Tenure: 3 years

Balance: \$ 60000

Number of Products: 2

Does this customer have a credit card? Yes

Is this customer an Active Member: Yes

Estimated Salary: \$ 50000

So, should we say goodbye to that customer?

**Solution**

In [61]:
print(ann.predict(sc.transform([[1, 0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]])) > 0.5) 
# if predicted prob is > 0.5, we consider it to be 1 else 0

[[False]]


Therefore, our ANN model predicts that this customer stays with the bank!

**Important note 1:** Notice that the values of the features were all input in a double pair of square brackets. That's because the "predict" method always expects a 2D array as the format of its inputs. And putting our values into a double pair of square brackets makes the input exactly a 2D array.

**Important note 2:** Notice also that the country "France" was not input as a string but as "1, 0, 0" in the first three columns. That's because the predict method expects the one-hot-encoded values of the Geography column, and as we see in the first row of the matrix of features X, "France" was encoded as "1, 0, 0". Be careful to put these values in the first three columns: the ColumnTransformer places the encoded dummy variables first and appends the passthrough columns after them. The remaining values follow the column order of X, and the whole row goes through `sc.transform` because the network was trained on scaled features.
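
Instead of encoding the observation by hand, the raw values can be pushed through the same fitted transformers in the same order as the training data. The sketch below does this on a made-up four-column dataset; `le_demo`, `ct_demo` and `sc_demo` are hypothetical stand-ins for the notebook's `le`, `ct` and `sc`.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

# Made-up training rows: [CreditScore, Geography, Gender, Age]
raw = np.array(
    [
        [619, "France", "Female", 42],
        [608, "Spain", "Female", 41],
        [772, "Germany", "Male", 36],
        [600, "France", "Male", 40],
    ],
    dtype=object,
)

# Fit the three preprocessing steps in the same order as the notebook.
le_demo = LabelEncoder()
raw[:, 2] = le_demo.fit_transform(raw[:, 2])
ct_demo = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(), [1])], remainder="passthrough"
)
encoded = np.array(ct_demo.fit_transform(raw))
sc_demo = StandardScaler().fit(encoded)

# A new customer in raw form, wrapped in double brackets to make it 2D.
new_raw = np.array([[600, "France", "Male", 40]], dtype=object)
new_raw[:, 2] = le_demo.transform(new_raw[:, 2])
new_encoded = np.array(ct_demo.transform(new_raw))
new_scaled = sc_demo.transform(new_encoded)

print(new_encoded)
print(new_scaled.shape)
```

Reusing the fitted objects avoids mistakes in the dummy-variable order and in the Gender code.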

### Predicting the Test set results

In [62]:
y_pred = ann.predict(X_test)
y_pred = (y_pred > 0.5)
print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))

[[0 0]
 [0 1]
 [0 0]
 ...
 [0 0]
 [0 0]
 [0 0]]
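
The thresholding and side-by-side comparison can be checked on a few made-up probabilities. Note that a probability of exactly 0.5 is not greater than 0.5, so it maps to 0. All values below are illustrative.

```python
import numpy as np

probabilities = np.array([[0.12], [0.81], [0.50], [0.33]])  # made-up sigmoid outputs
actual = np.array([0, 1, 1, 0])                             # made-up true labels

predicted = (probabilities > 0.5).astype(int)               # shape (4, 1)
# Left column: prediction, right column: actual label
side_by_side = np.concatenate((predicted, actual.reshape(-1, 1)), axis=1)
print(side_by_side)
```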


### Making the Confusion Matrix

In [63]:
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

[[1513   82]
 [ 201  204]]


0.8585
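
Accuracy alone can hide how the model treats the customers who actually leave. As a rough check, the sketch below recomputes accuracy from the confusion matrix printed above and adds precision, recall and the accuracy of always predicting "stays"; the variable names are my own.

```python
import numpy as np

# Confusion matrix from the output above: rows = actual, columns = predicted.
cm = np.array([[1513, 82], [201, 204]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tn + tp) / cm.sum()
precision = tp / (tp + fp)          # of the predicted leavers, how many really left
recall = tp / (tp + fn)             # of the real leavers, how many were caught
baseline = cm[0].sum() / cm.sum()   # accuracy of always predicting 0 ("stays")

print(accuracy, round(precision, 3), round(recall, 3), baseline)
```

The model beats the always-stays baseline, but it catches only about half of the customers who actually left.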