
# Batch Normalization
Batch normalization (also known as batch norm) is a method used to make training of artificial neural networks faster and more stable through normalization of the layers' inputs by re-centering and re-scaling. It was proposed by Sergey Ioffe and Christian Szegedy in 2015.

[Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](http://proceedings.mlr.press/v37/ioffe15.pdf)

- It consists of normalizing the activation vector of a hidden layer using the mean and variance of the current batch. The normalization is applied before the non-linear activation function.

## Why Batch Normalization?


The main reasons for using Batch Normalization are:

1. **Stabilizing Learning:**
   - Batch Normalization helps stabilize and accelerate the training of deep neural networks by normalizing the input to each layer.
   - It reduces internal covariate shift, the change in the distribution of a layer's inputs during training. This shift can make training harder, especially in deeper networks.

2. **Faster Convergence:**
   - By normalizing the inputs, Batch Normalization reduces the risk of vanishing or exploding gradients. This enables the use of higher learning rates, leading to faster convergence during training.

3. **Reduced Sensitivity to Initialization:**
   - Batch Normalization reduces the sensitivity of a network to the choice of initial weights. This allows for more flexibility in choosing initial weights and simplifies the training process.

4. **Regularization Effect:**
   - Batch Normalization has a mild regularizing effect. Because the mean and variance are computed on each mini-batch, the normalized activations of a given example vary slightly from batch to batch, and this noise can help prevent overfitting.

5. **Applicability to Various Activation Functions:**
   - Batch Normalization works well with a variety of activation functions, including sigmoid, tanh, and rectified linear units (ReLU). This flexibility makes it suitable for different network architectures.

6. **Facilitates Deeper Networks:**
   - Batch Normalization enables the training of deeper neural networks by addressing issues such as vanishing gradients and *internal covariate shift*. Deeper networks can capture more complex features and relationships in data.
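
The regularization effect in point 4 can be illustrated with a small NumPy sketch. The pre-activation value and the random batch-mates are made up for illustration; the sketch only shows that the same example is normalized to a slightly different value depending on which mini-batch it lands in.

```python
import numpy as np

rng = np.random.default_rng(0)
sample = 2.0   # hypothetical pre-activation value of one training example

values = []
for step in range(3):
    # The same example is grouped with 31 different random batch-mates each time.
    others = rng.normal(loc=1.0, scale=1.0, size=31)
    batch = np.append(sample, others)
    # Normalize the example with the statistics of its current mini-batch.
    values.append((sample - batch.mean()) / batch.std())

print(values)   # three slightly different normalized values for the same input
```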



**Covariate Shift**

Covariate shift refers to a situation in which the distribution of the input features (covariates) to a machine learning model changes between the training and testing phases. In other words, the model sees differently distributed inputs during testing or deployment than it saw during training, even though the relationship between the inputs and the target is unchanged.


**Internal Covariate Shift**

Internal covariate shift is defined as the change in the distribution of network activations caused by changes in the network parameters during training.
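
A minimal NumPy sketch of this definition follows. The inputs, the weights, and the size of the parameter update are all hypothetical; the point is only that identical inputs produce differently distributed values for the next layer once the preceding layer's parameters change.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs: 1000 samples with 2 standardized features.
X = rng.normal(size=(1000, 2))

# Hypothetical first-layer parameters before and after some training updates.
W_before, b_before = np.array([[0.5], [-0.3]]), np.array([0.0])
W_after, b_after = np.array([[1.5], [0.4]]), np.array([0.8])

# Values that the next layer receives as its input.
z_before = X @ W_before + b_before
z_after = X @ W_after + b_after

print(f"before update: mean={z_before.mean():.2f}, std={z_before.std():.2f}")
print(f"after update:  mean={z_after.mean():.2f}, std={z_after.std():.2f}")
```

The inputs `X` never change, yet the next layer has to adapt to a new mean and spread after the update.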

## **How Does Batch Normalization Work?**

1. Batch normalization is used together with mini-batch gradient descent, because its statistics are computed over each mini-batch.
2. It is applied layer by layer: it can be added to every layer of the network or only to individual layers.



```python
import pandas as pd

# Toy dataset: two features (cgpa, iq) and a binary target column, 5 rows
df = pd.DataFrame({
    'cgpa': [3.0, 3.5, 4.0, 3.2, 3.7],
    'iq': [120, 125, 130, 127, 132],
    'target': [0, 1, 0, 0, 1]
})

df
```


|   | cgpa | iq  | target |
|---|------|-----|--------|
| 0 | 3.0  | 120 | 0      |
| 1 | 3.5  | 125 | 1      |
| 2 | 4.0  | 130 | 0      |
| 3 | 3.2  | 127 | 0      |
| 4 | 3.7  | 132 | 1      |


![Simplest neural network with one hidden layer](https://www.includehelp.com/python/images/one-hidden-layer-simplest-neural-network.jpg)


**Forward Propagation**

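1. Calculate 'z' for each hidden neuron (each neuron has its own weights and bias)

$$ z_{11} = w_{11}\,\text{cgpa} + w_{21}\,\text{iq} + b_{11} $$

$$ z_{12} = w_{12}\,\text{cgpa} + w_{22}\,\text{iq} + b_{12} $$

2. Calculate activation

$$ a_{11} = F(z_{11}) $$

$$ a_{12} = F(z_{12}) $$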
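
The two forward-propagation steps above can be sketched in NumPy for the five-row data frame. The weights and biases below are hypothetical values chosen for illustration, and the sigmoid is only one possible choice for the activation function F.

```python
import numpy as np

# Mini-batch from the data frame above: columns are cgpa and iq.
X = np.array([[3.0, 120], [3.5, 125], [4.0, 130], [3.2, 127], [3.7, 132]])

# Hypothetical weights and biases, one column per hidden neuron.
W = np.array([[0.2, -0.1],
              [0.01, 0.02]])
b = np.array([0.1, -0.2])

def F(z):
    """Sigmoid, used here only as an example activation."""
    return 1.0 / (1.0 + np.exp(-z))

Z = X @ W + b   # shape (5, 2): one row per example, one column per hidden neuron
A = F(Z)        # activations, same shape as Z

print(Z)
print(A)
```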

When Batch Normalization is applied to this hidden layer, it normalizes the pre-activation value z of each neuron across the mini-batch, so that each neuron's z has a mean of zero and a standard deviation of one before the activation function is applied.

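1. Create a mini-batch of the dataset
2. Calculate z for each neuron
3. Normalize each 'z' value using the mini-batch mean and standard deviation (shown here for neuron 11, where $i$ indexes the $m$ examples in the mini-batch)

$$ \mu_B = \frac{1}{m}\sum_{i=1}^{m} z_{11}^{(i)} $$

$$ \sigma_B = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(z_{11}^{(i)} - \mu_B\right)^2} $$

$$ \hat{z}_{11}^{(i)} = \frac{z_{11}^{(i)} - \mu_B}{\sigma_B + \epsilon} $$

$$ \tilde{z}_{11}^{(i)} = \gamma \hat{z}_{11}^{(i)} + \beta $$

Here $\epsilon$ is a small constant that prevents division by zero, and gamma and beta are learnable parameters. In the Keras implementation, gamma is initialized to 1 and beta to 0.

4. Calculate activation

$$ a_{11}^{(i)} = F\left(\tilde{z}_{11}^{(i)}\right) $$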
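
The four steps above can be sketched in NumPy as follows. The function name `batch_norm_forward`, the pre-activation values, the value of `eps`, and the choice of ReLU are assumptions made for illustration. The sketch follows the formulas in this section, which add epsilon to the standard deviation; the original paper and Keras instead add it to the variance inside the square root.

```python
import numpy as np

def batch_norm_forward(z, gamma, beta, eps=1e-5):
    """Normalize each column (neuron) of z over the mini-batch, then scale and shift."""
    mu = z.mean(axis=0)                # per-neuron mini-batch mean
    sigma = z.std(axis=0)              # per-neuron mini-batch standard deviation (1/m form)
    z_hat = (z - mu) / (sigma + eps)   # normalized pre-activations
    return gamma * z_hat + beta, mu, sigma

# Hypothetical pre-activations: 5 examples (rows) and 2 hidden neurons (columns).
z = np.array([[1.90, 1.90],
              [2.05, 1.95],
              [2.20, 2.00],
              [2.01, 2.02],
              [2.16, 2.07]])

gamma = np.ones(2)   # initial scale of 1
beta = np.zeros(2)   # initial shift of 0

z_tilde, mu, sigma = batch_norm_forward(z, gamma, beta)
a = np.maximum(z_tilde, 0.0)   # ReLU, chosen only as an example activation

print(z_tilde.mean(axis=0))   # close to 0 for every neuron
print(z_tilde.std(axis=0))    # close to 1 for every neuron
```

With gamma at 1 and beta at 0 the layer outputs the plain normalized values; during training these two parameters let the network recover any other scale and shift if that helps.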


> For each neuron, batch normalization has 4 parameters: gamma and beta are learnable, while the mean and sigma (standard deviation) are non-learnable statistics computed from the data.
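
One way to check this count is to build a Keras `BatchNormalization` layer and inspect its weights. This is a small sketch assuming a hidden layer with 2 neurons; note that Keras stores a moving variance rather than a standard deviation as its second non-trainable statistic.

```python
from tensorflow.keras.layers import BatchNormalization

# Build a BatchNormalization layer for a hidden layer with 2 neurons.
bn = BatchNormalization()
bn.build((None, 2))

# Count the individual values held in trainable and non-trainable weights.
trainable = sum(int(w.shape[0]) for w in bn.trainable_weights)
non_trainable = sum(int(w.shape[0]) for w in bn.non_trainable_weights)

print("trainable (gamma, beta):", trainable)
print("non-trainable (moving mean, moving variance):", non_trainable)
print("total:", bn.count_params())
```

For 2 neurons this should report 4 trainable and 4 non-trainable values, that is, 4 parameters per neuron.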

## During Test

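At test time, predictions are often made for a single example, so reliable batch statistics are not available. Instead, an exponentially weighted moving average (EWMA) of each neuron's mean and standard deviation is maintained across the mini-batches during training, and these stored values are used for normalization at test time.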
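
A minimal NumPy sketch of this idea is shown below. The simulated pre-activations, the batch size, and `eps` are assumptions; the update rule `moving = moving * momentum + batch_value * (1 - momentum)` and the default momentum of 0.99 follow the Keras `BatchNormalization` layer, although Keras tracks the variance where this sketch tracks the standard deviation to match the formulas above.

```python
import numpy as np

momentum = 0.99   # Keras default; higher values give slower-moving averages
eps = 1e-5        # assumed small constant to avoid division by zero

rng = np.random.default_rng(0)
running_mean = np.zeros(2)   # moving mean starts at 0
running_std = np.ones(2)     # moving spread starts at 1

# Training: update the running statistics after every mini-batch.
for _ in range(1000):
    # Hypothetical pre-activations of 2 neurons for a mini-batch of 32 examples.
    z_batch = rng.normal(loc=[2.0, -1.0], scale=[0.5, 3.0], size=(32, 2))
    running_mean = momentum * running_mean + (1 - momentum) * z_batch.mean(axis=0)
    running_std = momentum * running_std + (1 - momentum) * z_batch.std(axis=0)

# Test time: normalize a single example with the stored statistics.
z_single = np.array([2.5, 2.0])
z_hat = (z_single - running_mean) / (running_std + eps)

print(running_mean, running_std, z_hat)
```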


## Advantages
1. Training is more stable.
2. Training is faster.
3. It can act as a regularizer.
4. The impact of weight initialization is reduced.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, BatchNormalization
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Standardize the features, then split into training and test sets
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Create a model with Batch Normalization.
# Note: in this model each BatchNormalization layer normalizes the ReLU outputs,
# i.e. it is placed after the activation.
model_with_batchnorm = Sequential([
    Input(shape=(X_train.shape[1],)),  # explicit input layer instead of input_shape on Dense
    Dense(64, activation='relu'),
    BatchNormalization(),  # Batch Normalization layer after activation
    Dense(32, activation='relu'),
    BatchNormalization(),  # Batch Normalization layer after activation
    Dense(3, activation='softmax')
])

# Compile the model with Batch Normalization
model_with_batchnorm.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model with Batch Normalization
model_with_batchnorm.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.1, verbose=0)

# Evaluate the model with Batch Normalization on test data
loss_with_batchnorm, accuracy_with_batchnorm = model_with_batchnorm.evaluate(X_test, y_test)
print(f'Model with Batch Normalization - Test Accuracy: {accuracy_with_batchnorm:.4f}')
```


1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 42ms/step - accuracy: 0.9333 - loss: 0.2210
Model with Batch Normalization - Test Accuracy: 0.9333
