# Artificial Neural Network Demo

# Outline
## Part 1 - Data Preprocessing
## Part 2 - Building the ANN
## Part 3 - Compiling and Training the ANN
## Part 4 - Making predictions and evaluating the model

### Importing the libraries

In [1]:
#TensorFlow must be installed before it is used for the first time
!pip install tensorflow





In [5]:
import numpy as np
import pandas as pd
import tensorflow as tf

In [6]:
#check tensorflow version
tf.__version__

'2.11.0'

## Part 1 - Data Preprocessing

### Importing the dataset

In [7]:
#import dataset
dataset = pd.read_csv('Churn_Modelling.csv')

#take all the columns except the last column (the output)
#skip the first three columns (which have no impact on the output)
#store the result in X, the independent variables
X = dataset.iloc[:, 3:-1].values   #from column index 3 up to, but not including, the last column

'''[:, 3:-1]: The colon before the comma selects all rows,
and 3:-1 selects columns from the 4th column (index 3)
through the second-to-last column (the last column is excluded).'''


#take the last column (output), which is the dependent variable
y = dataset.iloc[:, -1].values
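The slice `3:-1` follows ordinary Python slicing rules, so its effect can be checked on a plain list of column names. The names below are an assumption (they are the header commonly shipped with `Churn_Modelling.csv`; confirm with `dataset.columns` on your copy).

```python
# Assumed header of Churn_Modelling.csv; verify with dataset.columns.
columns = [
    "RowNumber", "CustomerId", "Surname", "CreditScore", "Geography",
    "Gender", "Age", "Tenure", "Balance", "NumOfProducts", "HasCrCard",
    "IsActiveMember", "EstimatedSalary", "Exited",
]

feature_columns = columns[3:-1]   # same half-open slice as iloc[:, 3:-1]
target_column = columns[-1]       # same position as iloc[:, -1]

print(feature_columns)
print(target_column)
```

Under this assumption the slice keeps 10 feature columns, which becomes 12 after the country column is one-hot encoded later on.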

In [8]:
#have a look at independent variables 
print(X)

[[619 'France' 'Female' ... 1 1 101348.88]
 [608 'Spain' 'Female' ... 0 1 112542.58]
 [502 'France' 'Female' ... 1 0 113931.57]
 ...
 [709 'France' 'Female' ... 0 1 42085.58]
 [772 'Germany' 'Male' ... 1 0 92888.52]
 [792 'France' 'Female' ... 1 0 38190.78]]


In [9]:
#have a look at the dependent variable
print(y)

[1 0 1 ... 1 1 0]


### Encoding categorical data

Label Encoding the "Gender" column

In [10]:
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

#label encoding for gender column which is column with index 2
X[:, 2] = le.fit_transform(X[:, 2])

In [11]:
print(X)

[[619 'France' 0 ... 1 1 101348.88]
 [608 'Spain' 0 ... 0 1 112542.58]
 [502 'France' 0 ... 1 0 113931.57]
 ...
 [709 'France' 0 ... 0 1 42085.58]
 [772 'Germany' 1 ... 1 0 92888.52]
 [792 'France' 0 ... 1 0 38190.78]]
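To make the mapping visible, here is a minimal pure-Python sketch of what label encoding does: distinct labels are sorted and numbered from 0. The helper `fit_label_encoder` is hypothetical and only mimics the idea; it is not scikit-learn's implementation.

```python
def fit_label_encoder(values):
    """Map each distinct label to an integer, in sorted order."""
    classes = sorted(set(values))
    return {label: index for index, label in enumerate(classes)}

genders = ["Female", "Female", "Male", "Female"]   # illustrative values
mapping = fit_label_encoder(genders)
encoded = [mapping[g] for g in genders]

print(mapping)   # {'Female': 0, 'Male': 1}
print(encoded)   # [0, 0, 1, 0]
```

This agrees with the printed matrix above, where 'Female' rows show 0 and the 'Male' row shows 1.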


One Hot Encoding the "Geography" column

In [12]:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])], remainder='passthrough')

'''
The first transformer is specified as ('encoder', OneHotEncoder(), [1]):
---'encoder': A custom name for this transformation step (you can choose any name).
---OneHotEncoder(): The transformation method to be applied (one-hot encoding).
---[1]: The index (or indices) of the column(s) to be transformed using this method.
In this case, it’s column 1 (assuming 0-based indexing).

remainder='passthrough':
Indicates what to do with the columns not explicitly transformed.
'passthrough' means that any columns not specified 
(other than the one at index 1) will be kept unchanged in the output.
'''

#execute the encoding
X = np.array(ct.fit_transform(X))

In [13]:
print(X)

[[1.0 0.0 0.0 ... 1 1 101348.88]
 [0.0 0.0 1.0 ... 0 1 112542.58]
 [1.0 0.0 0.0 ... 1 0 113931.57]
 ...
 [1.0 0.0 0.0 ... 0 1 42085.58]
 [0.0 1.0 0.0 ... 1 0 92888.52]
 [1.0 0.0 0.0 ... 1 0 38190.78]]
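The effect of one-hot encoding with `remainder='passthrough'` can be sketched in plain Python. The helpers `one_hot` and `encode_row` are hypothetical illustrations, not scikit-learn code; they assume categories are ordered alphabetically, which matches the output above.

```python
def one_hot(value, categories):
    """Return a 0/1 vector with a single 1 at the position of value."""
    return [1.0 if value == c else 0.0 for c in categories]

def encode_row(row, column_index, categories):
    """Put the one-hot columns first, then the untouched remaining columns."""
    encoded = one_hot(row[column_index], categories)
    rest = row[:column_index] + row[column_index + 1:]
    return encoded + rest

categories = sorted({"France", "Spain", "Germany"})   # ['France', 'Germany', 'Spain']

print(encode_row([619, "France", 0], 1, categories))    # [1.0, 0.0, 0.0, 619, 0]
print(encode_row([608, "Spain", 0], 1, categories))     # [0.0, 0.0, 1.0, 608, 0]
print(encode_row([772, "Germany", 1], 1, categories))   # [0.0, 1.0, 0.0, 772, 1]
```

Note how the encoded columns move to the front of each row while the other columns keep their relative order.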


### Splitting the dataset into the Training set and Test set

In [14]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

'''
Parameters:
X: The feature matrix (input data).
y: The target vector (output labels).
test_size: The proportion of data allocated for testing (e.g., 0.2 means 20% for testing).
random_state: Seed for the random number generator (ensures reproducibility).
'''

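The roles of `test_size` and `random_state` can be illustrated with a small standard-library sketch. The function `simple_train_test_split` is hypothetical and is not how scikit-learn implements the split; it only shows the idea of a seeded shuffle followed by a cut.

```python
import random

def simple_train_test_split(rows, test_size=0.2, seed=0):
    """Shuffle row indices with a fixed seed, then cut off the test share."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(rows) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]

rows = list(range(10))   # stand-in for 10 observations
train_a, test_a = simple_train_test_split(rows, test_size=0.2, seed=0)
train_b, test_b = simple_train_test_split(rows, test_size=0.2, seed=0)

print(len(train_a), len(test_a))   # 8 2
print(test_a == test_b)            # True: same seed, same split
```

Fixing the seed is what makes the split, and therefore the reported results, repeatable between runs.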

### Feature Scaling

In [15]:
'''
Feature scaling ensures that all features are on a comparable scale 
and have similar ranges.

Why Use Feature Scaling?

Comparable Contribution: Scaling puts features on a comparable footing in training.
Without scaling, features with large numeric ranges (such as balance or salary)
might dominate those with small ranges, leading to skewed results.

Algorithm Performance: Many algorithms, including gradient-based training of
neural networks, perform better or converge faster when features are scaled.

Numerical Stability: Avoiding large scale differences between
features reduces the risk of numerical instability during calculations.
'''
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()

#fit the scaler on the training data only, then apply it to both sets
'''
Two Steps Combined:

Fit: Calculates the mean and standard deviation of each feature
from the training data.

Transform: Uses those parameters to standardize the data.

For X_train we call fit_transform, which does both steps.

For X_test we call only transform, which applies the same scaling
parameters (calculated from the training data) to the test data.
This keeps training and test data consistent and avoids leaking
information from the test set into the scaler.
'''
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
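The fit/transform distinction can be sketched for a single feature with the standard library. The salary values are made up, and `fit_scaler` and `transform` are hypothetical helpers; the sketch uses the population standard deviation, which is what `StandardScaler` uses by default.

```python
from statistics import fmean, pstdev

def fit_scaler(column):
    """Learn the mean and population standard deviation of one feature."""
    return fmean(column), pstdev(column)

def transform(column, mean, std):
    """Standardize values with previously learned parameters."""
    return [(x - mean) / std for x in column]

train_salary = [40000.0, 50000.0, 60000.0]   # made-up training values
test_salary = [50000.0, 70000.0]             # made-up test values

mean, std = fit_scaler(train_salary)               # fit on training data only
train_scaled = transform(train_salary, mean, std)
test_scaled = transform(test_salary, mean, std)    # reuse training parameters

print(train_scaled)
print(test_scaled)
```

The scaled training values have mean 0, while the test values generally do not, because they are scaled with the training statistics rather than their own.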

## Part 2 - Building the ANN

### Initializing the ANN

In [16]:
'''
What is a Sequential Model?
A Sequential model is a linear stack of layers in deep learning.
It’s the simplest type of neural network architecture.
Layers are added sequentially, one after the other.
'''
ann = tf.keras.models.Sequential()

### Adding the input layer and the first hidden layer

In [17]:
#This line adds a hidden layer to the neural network model.
#The layer has 6 neurons, and the ReLU activation function is applied 
#Hidden layers help the model learn complex patterns from the input data.


ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

'''
ann: Refers to the Sequential model (already initialized).

add(): Method to add a layer to the model.

Dense: A fully connected layer (also known as a dense layer).

units=6: Specifies the number of neurons (units) in this layer. 
In this case, there are 6 neurons.

activation='relu': Specifies the activation function for these neurons. 
Here, it’s the Rectified Linear Unit (ReLU).
ReLU outputs the input directly if it’s positive.
Otherwise, it outputs zero.
Mathematically: f(x)=max(0,x)
'''

### Adding the second hidden layer

In [18]:
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))
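As a rough illustration of what a `Dense` layer with ReLU computes, the sketch below evaluates a tiny layer by hand: each unit takes a weighted sum of its inputs plus a bias and passes it through f(x) = max(0, x). The weights, biases, and inputs are made up, and `dense` is a hypothetical helper, not the Keras implementation.

```python
def relu(x):
    """Rectified Linear Unit: pass positive values, clamp the rest to zero."""
    return max(0.0, x)

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each unit."""
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(unit_weights, inputs)) + bias
        outputs.append(activation(z))
    return outputs

inputs = [1.0, -2.0]
weights = [[0.5, 0.5], [1.0, -1.0]]   # made-up: 2 units, 2 inputs each
biases = [0.0, 0.5]

hidden = dense(inputs, weights, biases, relu)
print(hidden)   # [0.0, 3.5]
```

The first unit's weighted sum is negative (-0.5), so ReLU outputs 0; the second unit's sum (3.5) passes through unchanged.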

### Adding the output layer

In [19]:
ann.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))

'''
ann: Refers to the Sequential model (already initialized).

add(): Method to add a layer to the model.

Dense: A fully connected layer (also known as a dense layer).

units=1: Specifies the number of neurons (units) in this layer. 
In this case, there is only 1 neuron.
activation='sigmoid': Specifies the activation function 
for this output layer. Here, it’s the sigmoid function.
A sigmoid function is a mathematical function 
with a characteristic “S”-shaped curve. 
It transforms any value in the domain (−∞,∞)  to a number between 0 and 1
'''

## Part 3 - Training the ANN

### Compiling the ANN

In [20]:
#This line sets up the model for training
#by specifying how it should learn and evaluate itself.
#The optimizer, loss function, and evaluation metric are crucial 
#for model performance.


ann.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

'''
ann: Refers to the Sequential model (already initialized).

compile(): Method to configure the model for training.

optimizer='adam': Specifies the optimization algorithm. 
Here, we use the Adam optimizer, which adapts the learning rate during training.

loss='binary_crossentropy': Defines the loss function. 
It’s a loss function used in binary classification problems.
Specifically designed for models that predict probabilities.
Measures the difference between predicted probabilities and actual class labels.

metrics=['accuracy']: Specifies evaluation metrics. 
In this case, we track accuracy during training.

'''

### Training the ANN on the Training set

In [21]:
#This line trains the neural network model using the provided training data.
#The model learns from the features (X_train) 
#and adjusts its weights to minimize the specified loss function 
#over the given number of epochs.

ann.fit(X_train, y_train, batch_size = 32, epochs = 100)


'''
ann: Refers to the neural network model (already defined and compiled).

fit(): Method to train the model using the specified training data.

X_train: The feature matrix (input data) for training.

y_train: The corresponding target labels (output values) for training.

batch_size=32: Specifies the number of samples in each batch during training 
(helps with memory efficiency and convergence).

epochs=100: The number of complete passes over the training set 
during training.
'''

Epoch 1/100
Epoch 2/100
...
Epoch 77/100
Epoch 78

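Two quantities behind the `fit` call can be worked out by hand: how many weight updates the chosen `batch_size` and `epochs` imply, and what the binary cross-entropy loss looks like for good and bad predictions. The sketch assumes 8,000 training rows (inferred from the 2,000-row test set and `test_size = 0.2`); `binary_crossentropy` is a hypothetical helper with made-up probabilities, not the Keras implementation.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over a batch of predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

n_train = 8000      # assumption: 80% of 10,000 rows
batch_size = 32
epochs = 100

steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs
print(steps_per_epoch)   # 250
print(total_updates)     # 25000

good_loss = binary_crossentropy([1, 0], [0.9, 0.1])   # confident and right
bad_loss = binary_crossentropy([1, 0], [0.1, 0.9])    # confident and wrong
print(good_loss, bad_loss)
```

The loss is small when the predicted probabilities agree with the labels and grows quickly when the model is confidently wrong, which is what training tries to reduce.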

## Part 4 - Making the predictions and evaluating the model

### Predicting the result of a single observation

**Homework**

Use our ANN model to predict whether the customer with the following information will leave the bank:

Geography: France

Credit Score: 600

Gender: Male

Age: 40 years old

Tenure: 3 years

Balance: \$ 60000

Number of Products: 2

Does this customer have a credit card? Yes

Is this customer an Active Member: Yes

Estimated Salary: \$ 50000

So, should we say goodbye to that customer?

**Solution**

In [22]:
#The ann.predict method is used to obtain predictions 
#from a trained neural network model.

print(ann.predict(sc.transform([[1, 0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]])) > 0.5)

'''
The sigmoid output is the predicted probability that the customer leaves.
We apply a threshold of 0.5: a predicted probability above 0.5
is treated as class 1 (the customer leaves); otherwise the result
is class 0 (the customer stays).
'''

[[False]]



Therefore, our ANN model predicts that this customer stays in the bank!
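The thresholding step can be sketched on its own. The function below applies a sigmoid to a raw output-neuron value and compares the result with 0.5; the input values are made up, and `predict_label` is a hypothetical helper.

```python
import math

def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_label(z, threshold=0.5):
    """True means 'leaves the bank', False means 'stays'."""
    return sigmoid(z) > threshold

print(sigmoid(0.0))          # 0.5
print(predict_label(-1.2))   # False
print(predict_label(2.0))    # True
```

Because sigmoid(0) is exactly 0.5, a threshold of 0.5 on the probability is the same as asking whether the raw output is positive.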

**Important note 1:** Notice that the values of the features were all input in a double pair of square brackets. That's because the "predict" method always expects a 2D array as the format of its inputs. And putting our values into a double pair of square brackets makes the input exactly a 2D array.

**Important note 2:** Notice also that the country "France" was not input as a string but as "1, 0, 0" in the first three columns. That's because the predict method expects the one-hot-encoded values of the country, and as we see in the first row of the matrix of features X, "France" was encoded as "1, 0, 0". Be careful to put these values in the first three columns: the ColumnTransformer places the encoded dummy variables first and appends the passthrough columns after them.
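Both notes can be folded into a small helper that builds the input row in the right order and shape. `build_observation` and the two lookup tables are hypothetical names; the encodings are read off the printed feature matrices above (France, Germany, Spain as three dummy columns; Female as 0, Male as 1).

```python
GEOGRAPHY_ONE_HOT = {"France": [1, 0, 0], "Germany": [0, 1, 0], "Spain": [0, 0, 1]}
GENDER_LABEL = {"Female": 0, "Male": 1}

def build_observation(geography, credit_score, gender, age, tenure, balance,
                      num_products, has_card, is_active, salary):
    """Return a 2D list (one row) in the column order the model was trained on."""
    row = GEOGRAPHY_ONE_HOT[geography] + [
        credit_score, GENDER_LABEL[gender], age, tenure, balance,
        num_products, int(has_card), int(is_active), salary,
    ]
    return [row]   # double brackets: one observation, 12 features

observation = build_observation("France", 600, "Male", 40, 3, 60000,
                                2, True, True, 50000)
print(observation)
```

The result is the same nested list that is passed to `sc.transform` in the solution cell, so it could be used there in place of the hand-typed values.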

### Predicting the Test set results

In [23]:
y_pred = ann.predict(X_test)
y_pred = (y_pred > 0.5)

#The concatenated array shows predicted labels alongside true labels 
#for comparison.
print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))

'''
np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)), 1):

np.concatenate: A NumPy function that combines arrays along a specified axis.

y_pred and y_test are arrays containing predicted labels and true labels, 
respectively.

reshape(len(y_pred), 1): Reshapes each array to have a single column 
(vertical shape).

axis=1: Specifies concatenation along columns (horizontally).
'''

[[0 0]
 [0 1]
 [0 0]
 ...
 [0 0]
 [0 0]
 [0 0]]


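The reshape-then-concatenate pattern is easier to follow on a three-row example. The predictions and labels below are made up; the calls are the same NumPy functions used in the cell above.

```python
import numpy as np

y_pred = np.array([[False], [True], [False]])   # made-up predictions, shape (3, 1)
y_test = np.array([0, 1, 1])                    # made-up true labels, shape (3,)

# Turn both arrays into single columns, then glue them side by side.
side_by_side = np.concatenate(
    (y_pred.reshape(len(y_pred), 1), y_test.reshape(len(y_test), 1)), axis=1
)

print(side_by_side)
print(side_by_side.shape)   # (3, 2)
```

The boolean predictions are promoted to integers when combined with the integer labels, which is why the printed comparison shows 0 and 1 rather than False and True.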

### Making the Confusion Matrix

In [24]:
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print(cm)
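print(accuracy_score(y_test, y_pred))   #print explicitly; otherwise the docstring below is the cell's last expression and the score is not shown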

'''
A confusion matrix (also known as an error matrix) is a table 
that summarizes the performance of a machine learning model 
on a set of test data. It provides a detailed breakdown of the 
number of accurate and inaccurate predictions made by the model.

Here’s how it works:

Each row of the matrix represents instances of an actual class (ground truth).

Each column represents instances of a predicted class.

The diagonal of the matrix shows correctly predicted instances 
(true positives and true negatives).

Off-diagonal elements represent misclassifications 
(false positives and false negatives).

scikit-learn orders the classes by label, so for labels 0 and 1 
the printed matrix reads [[TN, FP], [FN, TP]].

'''



[[1526   69]
 [ 198  207]]



<table>
  <tr>
    <th></th>
    <th>Predicted Positive</th>
    <th>Predicted Negative</th>
  </tr>
  <tr>
    <td>Actual Positive</td>
    <td>True Positive (TP)</td>
    <td>False Negative (FN)</td>
  </tr>
  <tr>
    <td>Actual Negative</td>
    <td>False Positive (FP)</td>
    <td>True Negative (TN)</td>
  </tr>
</table>

<p><strong>Accuracy:</strong></p>
<p>Accuracy measures the proportion of correct predictions (both true positives and true negatives) out of all predictions made.</p>
<p>It answers the question: “How often was the model correct?”</p>
<p>Formula: \( \text{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN} \)</p>

<p><strong>Precision:</strong></p>
<p>Precision focuses on the accuracy of positive predictions.</p>
<p>It answers the question: “When the model predicted TRUE, how often was it right?”</p>
<p>Precision is crucial when the cost of false positives (Type I errors) is high.</p>
<p>Formula: \( \text{Precision} = \frac{TP}{TP + FP} \)</p>

<p><strong>Recall (Sensitivity):</strong></p>
<p>Recall measures the proportion of actual positive instances correctly identified by the model.</p>
<p>It answers the question: “Out of all actual positive cases, how many did the model predict correctly?”</p>
<p>Recall is important when minimizing false negatives (Type II errors) is critical.</p>
<p>Formula: \( \text{Recall} = \frac{TP}{TP + FN} \)</p>

<p>These metrics help us evaluate the performance of a classification model. Remember that the choice of metric depends on the specific problem and the associated costs of different types of errors.</p>
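The three formulas can be applied directly to the confusion matrix printed earlier. One caveat: scikit-learn orders classes by label, so with labels 0 and 1 the printed matrix reads [[TN, FP], [FN, TP]], which is the reverse of the positive-first table above. The sketch below unpacks it under that layout.

```python
# Confusion matrix printed by the notebook, in scikit-learn's layout:
# rows are actual classes (0, 1), columns are predicted classes (0, 1).
cm = [[1526, 69],
      [198, 207]]
(tn, fp), (fn, tp) = cm

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f}")
# accuracy=0.8665 precision=0.7500 recall=0.5111
```

Read this way, the model is right about 87% of the time overall, yet it finds only about half of the customers who actually leave, which is why accuracy alone can be misleading on an imbalanced target.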
