# Batch Normalization

**The inputs to hidden layers are not normalized by default: their distribution keeps shifting as earlier layers update during training, which can contribute to the vanishing gradient problem. Batch normalization addresses this by normalizing those inputs.**  

Batch normalization is a technique used to improve the training of deep neural networks. It helps to stabilize and accelerate training by normalizing the inputs of each layer over the current mini-batch, so that each feature has a mean close to 0 and a variance close to 1.

## Key Benefits of Batch Normalization
- **Improves Gradient Flow**: By normalizing the inputs, it helps to mitigate the problem of vanishing and exploding gradients.
- **Reduces Internal Covariate Shift**: It reduces the change in the distribution of network activations, which helps in faster convergence.
- **Acts as a Regularizer**: Because the statistics are computed per mini-batch, it introduces slight noise into the network, which can help reduce overfitting.

## How Batch Normalization Works
1. **Calculate Mean and Variance**: For each mini-batch, compute the mean and variance of the inputs.
2. **Normalize**: Subtract the mean and divide by the standard deviation to normalize the inputs.
3. **Scale and Shift**: Apply learned parameters (gamma and beta) to scale and shift the normalized inputs.
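
The three steps above can be sketched in NumPy. This is a minimal illustration of the training-time forward pass only, not how Keras or PyTorch implement it; the function name `batch_norm_forward`, the `eps` constant, and the random input batch are assumptions made for the example.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization for a batch of shape (batch, features)."""
    # Step 1: per-feature mean and variance over the mini-batch axis
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    # Step 2: normalize; eps (an assumed small constant) guards against division by zero
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Step 3: scale and shift with the learnable parameters gamma and beta
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
batch = rng.normal(loc=50.0, scale=10.0, size=(100, 4))  # made-up activations
gamma = np.ones(4)   # initial scale
beta = np.zeros(4)   # initial shift
out = batch_norm_forward(batch, gamma, beta)

print(out.mean(axis=0))  # per-feature means, close to 0
print(out.std(axis=0))   # per-feature standard deviations, close to 1
```

With `gamma` at 1 and `beta` at 0 the output is just the normalized batch; during training the network is free to learn other values if a different scale or offset suits the next layer better.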

## Batch Normalization in Practice
In popular deep learning frameworks like TensorFlow and PyTorch, batch normalization is available as a built-in layer, for example `keras.layers.BatchNormalization` or `torch.nn.BatchNorm1d`.

Batch normalization is a powerful tool that can significantly improve the performance and training speed of deep neural networks.


# Practical

In [14]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

In [15]:
dataset = pd.read_csv(r'C:\Users\shark\Desktop\Deep Learning\CampusX\Dataset\Churn_Modelling.csv')
dataset.head(3)

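|   | CreditScore | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 619 | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 608 | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 502 | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |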


In [16]:
input_data = dataset.iloc[:, :-1]
output_data = dataset.iloc[:, -1]

In [17]:
ss = StandardScaler()
ss.fit_transform(input_data)

array([[-0.32622142,  0.29351742, -1.04175968, ...,  0.64609167,
         0.97024255,  0.02188649],
       [-0.44003595,  0.19816383, -1.38753759, ..., -1.54776799,
         0.97024255,  0.21653375],
       [-1.53679418,  0.29351742,  1.03290776, ...,  0.64609167,
        -1.03067011,  0.2406869 ],
       ...,
       [ 0.60498839, -0.27860412,  0.68712986, ..., -1.54776799,
         0.97024255, -1.00864308],
       [ 1.25683526,  0.29351742, -0.69598177, ...,  0.64609167,
        -1.03067011, -0.12523071],
       [ 1.46377078, -1.04143285, -0.35020386, ...,  0.64609167,
        -1.03067011, -1.07636976]])

In [18]:
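# Standardize the features and keep the original column names
input_data = pd.DataFrame(ss.fit_transform(input_data), columns=input_data.columns)
input_data.head(3)
# Hold out 20% of the rows for validation
x_train, x_test, y_train, y_test = train_test_split(input_data, output_data, test_size=0.2, random_state=42)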
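
The cell above fits the scaler on the full dataset before splitting, so the held-out rows influence the scaling statistics. A stricter variant, sketched below, splits first and fits the scaler on the training rows only. The synthetic `features` and `labels` arrays are stand-ins invented for the example (the churn CSV is not assumed to be available), and the variable names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
features = rng.normal(loc=600.0, scale=100.0, size=(1000, 8))  # synthetic stand-in data
labels = rng.integers(0, 2, size=1000)                         # synthetic binary target

# Split first so the test rows never influence the scaler
x_tr, x_te, y_tr, y_te = train_test_split(
    features, labels, test_size=0.2, random_state=42
)

scaler = StandardScaler()
x_tr_scaled = scaler.fit_transform(x_tr)  # mean and std learned from training rows only
x_te_scaled = scaler.transform(x_te)      # test rows reuse the training statistics

print(x_tr_scaled.shape, x_te_scaled.shape)
```

On a dataset of this kind the difference in the final metrics is often small, but keeping the test rows out of the fit makes the validation numbers a more honest estimate.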

In [19]:
import tensorflow as tf
from keras.layers import Dense, BatchNormalization
from keras.regularizers import l2
from keras.optimizers import Adam
from keras.models import Sequential
from keras.callbacks import EarlyStopping

In [20]:
ann = Sequential()
# Hidden layer 1: L2 weight regularization, followed by batch normalization
ann.add(Dense(units=6, activation='relu', input_dim=8, kernel_regularizer=l2(0.01)))
ann.add(BatchNormalization())
# Hidden layer 2: L2 weight regularization, followed by batch normalization
ann.add(Dense(units=4, activation='relu', kernel_regularizer=l2(0.01)))
ann.add(BatchNormalization())
# Hidden layer 3: no regularizer, followed by batch normalization
ann.add(Dense(units=2, activation='relu'))
ann.add(BatchNormalization())
# Output layer: sigmoid for binary classification
ann.add(Dense(units=1, activation='sigmoid'))



In [21]:
ann.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

In [22]:
# Early stopping: halt once val_loss has not improved for 10 consecutive epochs
ann.fit(x_train, y_train, batch_size=100, epochs=100, validation_data=(x_test, y_test),
        callbacks=[EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=10)])

Epoch 1/100
80/80 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.6547 - loss: 0.8183 - val_accuracy: 0.7560 - val_loss: 0.7271
Epoch 2/100
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.7181 - loss: 0.7247 - val_accuracy: 0.8025 - val_loss: 0.6672
Epoch 3/100
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.7799 - loss: 0.6600 - val_accuracy: 0.8045 - val_loss: 0.6179
Epoch 4/100
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.7840 - loss: 0.6152 - val_accuracy: 0.8075 - val_loss: 0.5753
Epoch 5/100
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.7961 - loss: 0.5643 - val_accuracy: 0.8180 - val_loss: 0.5317
Epoch 6/100
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.8057 - loss: 0.5195 - val_accuracy: 0.8280 - val_loss: 0.4940
Epoch 7/100
80/80 ━━━

<keras.src.callbacks.history.History at 0x1e02ade6420>
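
The patience rule behind the `EarlyStopping` callback used in the training call above can be illustrated in plain Python. This is a simplified sketch of the idea (stop after `patience` epochs without a new best validation loss), not Keras's actual implementation; the helper `early_stop_epoch` and the loss values are made up for the example.

```python
def early_stop_epoch(val_losses, patience=10):
    """Return the 1-based epoch at which training would stop, or None if it never does."""
    best = float("inf")
    wait = 0  # epochs elapsed since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up validation losses: three improving epochs, then a plateau
losses = [0.70, 0.60, 0.55] + [0.56] * 12
print(early_stop_epoch(losses, patience=10))  # prints 13
```

The best loss is reached at epoch 3, and ten further epochs pass without improvement, so the sketch stops at epoch 13.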

In [23]:
from sklearn.metrics import confusion_matrix, accuracy_score

In [24]:
train_accuracy = ann.history.history['accuracy']
val_accuracy = ann.history.history['val_accuracy']
print(train_accuracy[-1])
print(val_accuracy[-1])

0.8523749709129333
0.8600000143051147
