**Training the ANN with Stochastic Gradient Descent**

- Step 1: Randomly initialise the weights to small numbers close to 0 (but not 0).
- Step 2: Input the first observation of your dataset in the input layer, each feature in one input node.
- Step 3: *Forward-Propagation*: from left to right, the neurons are activated, with each neuron's contribution to the next layer scaled by the weights. Propagate the activations until you get the predicted result y.
- Step 4: Compare the predicted result to the actual result. Measure the generated error.
- Step 5: *Back-Propagation*: from right to left, the error is back-propagated. Update the weights according to how much they are responsible for the error. The learning rate decides by how much we update the weights.
- Step 6: Repeat Steps 2-5 and update the weights after each observation (online learning, i.e. stochastic gradient descent). Or: repeat Steps 2-5 but update the weights only after a batch of observations (batch learning).
- Step 7: When the whole training set has passed through the ANN, that makes an **epoch**. Run more epochs.
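
The steps above can be sketched in plain NumPy. This is only an illustration under stated assumptions, not what Keras does internally: the toy dataset, the single hidden layer of 4 units, the learning rate, and every variable name are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumption): 200 observations, 2 features, label 1 when x0 + x1 > 0
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Step 1: small random weights close to (but not exactly) 0
W1 = rng.normal(scale=0.1, size=(2, 4))
b1 = np.zeros((1, 4))
W2 = rng.normal(scale=0.1, size=(4, 1))
b2 = np.zeros((1, 1))

def forward(X_batch):
    hidden = np.maximum(0.0, X_batch @ W1 + b1)   # ReLU hidden layer
    return hidden, sigmoid(hidden @ W2 + b2)      # sigmoid output

def loss(y_true, y_hat, eps=1e-12):
    # Binary cross-entropy, averaged over the observations
    return float(-np.mean(y_true * np.log(y_hat + eps)
                          + (1 - y_true) * np.log(1 - y_hat + eps)))

learning_rate = 0.1
batch_size = 1        # 1 = update after every observation; >1 = mini-batches
initial_loss = loss(y, forward(X)[1])

for epoch in range(20):                           # Step 7: several epochs
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):    # Step 6: repeat Steps 2-5
        idx = order[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        hidden, y_hat = forward(xb)               # Steps 2-3: forward pass
        # Step 4: for sigmoid + cross-entropy, dLoss/dz_out = y_hat - y
        delta_out = (y_hat - yb) / len(xb)
        # Step 5: back-propagate the error, then update the weights
        grad_W2 = hidden.T @ delta_out
        grad_b2 = delta_out.sum(axis=0, keepdims=True)
        delta_hidden = (delta_out @ W2.T) * (hidden > 0)
        grad_W1 = xb.T @ delta_hidden
        grad_b1 = delta_hidden.sum(axis=0, keepdims=True)
        W2 -= learning_rate * grad_W2
        b2 -= learning_rate * grad_b2
        W1 -= learning_rate * grad_W1
        b1 -= learning_rate * grad_b1

final_loss = loss(y, forward(X)[1])
print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

Setting `batch_size` to a larger value turns the same loop into mini-batch learning: the gradients are averaged over the batch before a single update.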


- Geodemographic segmentation model

In [1]:
# Import the libraries

import numpy as np
import pandas as pd
import tensorflow as tf

In [2]:
tf.__version__

'2.11.0'

In [3]:
# Part1: Data Preprocessing
# Importing the dataset
dataset = pd.read_csv("Churn_Modelling.csv")
X = dataset.iloc[:,3:-1].values
y = dataset.iloc[:,-1].values

In [4]:
print(X)

[[619 'France' 'Female' ... 1 1 101348.88]
 [608 'Spain' 'Female' ... 0 1 112542.58]
 [502 'France' 'Female' ... 1 0 113931.57]
 ...
 [709 'France' 'Female' ... 0 1 42085.58]
 [772 'Germany' 'Male' ... 1 0 92888.52]
 [792 'France' 'Female' ... 1 0 38190.78]]


In [5]:
print(y)

[1 0 1 ... 1 1 0]


In [6]:
# Encoding categorical data
# Label encoding the "Gender" column
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:, 2] = le.fit_transform(X[:, 2])

In [7]:
print(X)

[[619 'France' 0 ... 1 1 101348.88]
 [608 'Spain' 0 ... 0 1 112542.58]
 [502 'France' 0 ... 1 0 113931.57]
 ...
 [709 'France' 0 ... 0 1 42085.58]
 [772 'Germany' 1 ... 1 0 92888.52]
 [792 'France' 0 ... 1 0 38190.78]]
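
To confirm which integer each gender was mapped to, the fitted encoder's `classes_` attribute can be inspected. A small sketch on a toy list (the toy values are an assumption; the notebook itself encodes column 2 of `X`):

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
encoded = le.fit_transform(["Female", "Male", "Female", "Male"])

# classes_ is sorted, so the position of a label is its integer code
mapping = {label: code for code, label in enumerate(le.classes_)}
print(encoded)
print(mapping)

# inverse_transform recovers the original labels from the codes
decoded = le.inverse_transform([1, 0])
print(decoded)
```

This matches the output above, where 'Female' became 0 and 'Male' became 1.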


In [8]:
# One Hot Encoding the "Geography" column
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])], remainder='passthrough')
X = np.array(ct.fit_transform(X))

In [9]:
print(X)

[[1.0 0.0 0.0 ... 1 1 101348.88]
 [0.0 0.0 1.0 ... 0 1 112542.58]
 [1.0 0.0 0.0 ... 1 0 113931.57]
 ...
 [1.0 0.0 0.0 ... 0 1 42085.58]
 [0.0 1.0 0.0 ... 1 0 92888.52]
 [1.0 0.0 0.0 ... 1 0 38190.78]]


In [10]:
print(X[0])

[1.0 0.0 0.0 619 0 42 2 0.0 1 1 1 101348.88]
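
The first three values of each row are the one-hot columns, and the remaining features follow. The sketch below reproduces this on a three-row toy array so the column order can be checked. The toy array and the `sparse_threshold=0` setting (used here only to force a dense result on such a small input) are assumptions and are not part of the notebook.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy array (assumption): column 0 is the country, column 1 a numeric feature
X_toy = np.array([["France", 619],
                  ["Spain", 608],
                  ["Germany", 502]], dtype=object)

ct = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(), [0])],
    remainder="passthrough",   # keep the other columns, placed after the one-hot block
    sparse_threshold=0,        # always return a dense array
)
encoded = np.array(ct.fit_transform(X_toy))

# Categories are sorted alphabetically: France, Germany, Spain
categories = list(ct.named_transformers_["encoder"].categories_[0])
print(categories)
print(encoded)
```

With this ordering, France is `1 0 0`, Germany is `0 1 0` and Spain is `0 0 1`, which is the layout visible in the printed `X` above.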


In [11]:
# Splitting the dataset into the training set and the test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,
                                                    test_size=0.2,
                                                    random_state=0)

In [12]:
# Feature scaling: this step is compulsory when training an ANN
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()

X_train_scaled = sc.fit_transform(X_train)
# We don't fit_transform again as that would cause data leakage
X_test_scaled = sc.transform(X_test)
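
The reason for calling only `transform` on the test set can be checked by hand: the scaler stores the training mean and standard deviation and reuses them. A minimal sketch with made-up numbers (the toy arrays are assumptions):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data (assumption): two features on very different scales
X_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
X_test = np.array([[4.0, 400.0]])

sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)   # learns mean and std from the training set
X_test_scaled = sc.transform(X_test)         # reuses the training statistics

# Same computation by hand, using training statistics only
manual = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)
print(X_test_scaled)
print(manual)
```

Both printed arrays agree, which shows that no information from the test set enters the scaling parameters.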

In [13]:
# Part2: Building the ANN

In [14]:
# Initialize the ANN
ann = tf.keras.models.Sequential()

In [15]:
# Adding the input layer and the first hidden layer
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

In [16]:
# Adding the second hidden layer
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

In [17]:
# Adding the output layer. For binary classification, units=1 with a sigmoid
# activation, which returns the probability of the positive class.
# For a multi-class target (e.g. A, B, C), one-hot encode the labels
# (A = 1 0 0, B = 0 1 0, C = 0 0 1), use one unit per class (units=3 here)
# and set the activation to softmax.
ann.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))
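
The two output activations mentioned in the comments can be compared with a few lines of NumPy. This is a sketch of the maths only, with made-up pre-activation values; it is not how Keras implements the layers.

```python
import numpy as np

def sigmoid(z):
    # Squashes one value into (0, 1): probability of the positive class
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Turns one value per class into probabilities that sum to 1
    shifted = z - np.max(z, axis=-1, keepdims=True)   # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Binary case: one output unit
p_binary = sigmoid(np.array([0.0]))

# Three-class case (A, B, C): one output unit per class
p_classes = softmax(np.array([2.0, 1.0, 0.1]))
predicted_class = int(np.argmax(p_classes))

print(p_binary)
print(p_classes, p_classes.sum())
print(predicted_class)
```

For a three-class problem the predicted class is the index of the largest softmax output, whereas the binary case compares the single sigmoid output to a threshold such as 0.5.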

In [18]:
# Part3: Training the ANN

In [19]:
# Compiling the ANN
# For a multi-class (one-hot encoded) target, use loss='categorical_crossentropy'
ann.compile(optimizer='adam', loss='binary_crossentropy', metrics= ['accuracy'])
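
As a rough illustration of what the `binary_crossentropy` loss measures, the sketch below computes it by hand for two made-up sets of predictions. The function name, the clipping constant `eps`, and the example values are assumptions for this sketch.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Mean of -[y*log(p) + (1-y)*log(1-p)]; clipping avoids log(0)
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_prob)
                          + (1 - y_true) * np.log(1 - y_prob)))

# Confident and correct predictions give a small loss
confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
# Confident but wrong predictions are penalised heavily
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])

print(confident_right, confident_wrong)
```

The optimizer adjusts the weights to push this value down, while `metrics=['accuracy']` is only reported and does not drive the updates.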

In [22]:
# Training the ANN on the Training set
ann.fit(X_train_scaled, y_train, batch_size = 32, epochs=100)

Epoch 1/100
Epoch 2/100
Epoch 3/100
...

<keras.callbacks.History at 0x7fda3fb55c40>
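
To get a feel for how many weight updates this `fit` call performs, the arithmetic can be sketched as follows. The training-set size of 8000 is an inference rather than something printed in the notebook: the confusion matrix further down sums to 2000 test rows, which with `test_size=0.2` implies 10,000 rows in total.

```python
import math

n_train = 8000      # inferred: 2000 test rows at test_size=0.2 -> 8000 training rows
batch_size = 32
epochs = 100

# One weight update per batch; the last batch of an epoch may be smaller
steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, total_updates)
```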

In [None]:
# Part 4: Making the predictions and evaluating the model

**Observation**

Use our ANN model to predict whether the customer with the following information will leave the bank:

Geography: France

Credit Score: 600

Gender: Male

Age: 40 years old

Tenure: 3 years

Balance: \$ 60000

Number of Products: 2

Does this customer have a credit card? Yes

Is this customer an Active Member: Yes

Estimated Salary: \$ 50000

So, should we say goodbye to that customer?

In [25]:
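# Predicting the result of a single observation
# Feature order: France, Germany, Spain (one-hot), credit score, gender (Male = 1), age, tenure, balance, number of products, has credit card, active member, estimated salary
observation = sc.transform([[1, 0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]])
probability = ann.predict(observation)
print(f"Probability: {probability}")
print(f"Outcome (False/True): {probability > 0.5}")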

Probability: [[0.0294814]]
Outcome (False/True): [[False]]


In [27]:
# Predicting the test set results
y_pred = ann.predict(X_test_scaled)
# To get True/False or 0/1
y_pred = y_pred > 0.5
print(np.concatenate(
    (y_pred.reshape(len(y_pred),1),
     y_test.reshape(len(y_test),1)),
     1))

[[0 0]
 [0 1]
 [0 0]
 ...
 [0 0]
 [0 0]
 [0 0]]


In [33]:
# Making the Confusion Matrix and calculating accuracy
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test,y_pred)
acc = accuracy_score(y_test, y_pred)
print(f"Confusion Matrix:\n {cm}\n Accuracy: {acc}")

Confusion Matrix:
 [[1511   84]
 [ 202  203]]
 Accuracy: 0.857
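
Accuracy alone can hide how the model treats the minority class. The sketch below derives a few more figures from the confusion matrix printed above; the metric choices and variable names are additions for illustration, and the layout assumed is scikit-learn's (rows are true classes, columns are predicted classes).

```python
import numpy as np

# Confusion matrix copied from the output above
cm = np.array([[1511, 84],
               [202, 203]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)          # of the customers flagged as leaving, how many left
recall = tp / (tp + fn)             # of the customers who left, how many were flagged
baseline = (tn + fp) / cm.sum()     # accuracy of always predicting "stays"

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} baseline={baseline:.4f}")
```

The recall of about 0.50 means roughly half of the customers who actually left were missed, and the 0.857 accuracy should be read against the 0.7975 obtained by always predicting that a customer stays.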
