# Q.1 Theory and concepts 

a. Batch normalization is a technique for improving the training of deep neural networks by normalizing the inputs to a layer. Within each mini-batch, every feature is standardized to zero mean and unit variance, then rescaled and shifted by two learnable parameters (usually written γ and β) so the layer keeps its representational capacity. It was introduced to mitigate internal covariate shift, where the distribution of a layer's inputs changes during training as the weights of earlier layers are updated.
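
The normalization step described above can be sketched in a few lines of NumPy. This is a minimal illustration of the standard training-time formulation, not the Keras implementation; the function name `batch_norm_forward`, the `eps` value, and the toy input are assumptions made for the example.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift.

    x has shape (batch_size, num_features); gamma and beta have shape
    (num_features,). eps is a small constant that avoids division by zero.
    """
    mean = x.mean(axis=0)                    # per-feature mean over the batch
    var = x.var(axis=0)                      # per-feature (biased) variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta              # learnable scale and shift

# Toy mini-batch: 64 samples, 4 features, deliberately off-center and spread out
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))

out = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print("per-feature mean:", out.mean(axis=0))
print("per-feature variance:", out.var(axis=0))
```

With `gamma` set to ones and `beta` to zeros, the printed per-feature means should be close to zero and the variances close to one.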



b. Benefits of using batch normalization during training:

Stabilizes Learning: Normalizing layer inputs keeps the distribution of activations from drifting between updates, which leads to more stable and often faster convergence.
Higher Learning Rates: Training tolerates higher learning rates, which can speed up the training process.
Regularization Effect: The mini-batch statistics add a small amount of noise, which acts as a mild regularizer and can reduce the need for other techniques such as dropout.
Improved Gradient Flow: Keeps activations, and therefore gradients, within a reasonable range, reducing vanishing and exploding gradient problems.
Reduces Sensitivity to Initialization: Makes the network less sensitive to the initial choice of parameters, in particular the scale of the weights.
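
The reduced sensitivity to initialization can be checked numerically. The sketch below, which assumes a single dense layer without bias and a hypothetical helper named `normalize`, compares pre-activations produced by the same weights at two very different scales, before and after normalization over the batch.

```python
import numpy as np

def normalize(z, eps=1e-5):
    """Standardize each column of z over the batch dimension."""
    return (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(128, 16))   # toy mini-batch of inputs
w = rng.normal(size=(16, 8))     # one random weight matrix

# The same weights at two very different initial scales
z_small = x @ (0.1 * w)
z_large = x @ (10.0 * w)

# Without normalization the pre-activations differ in scale by a factor of 100
raw_ratio = z_large.std() / z_small.std()

# After normalization the two versions are almost identical
bn_ratio = normalize(z_large).std() / normalize(z_small).std()
max_diff = np.abs(normalize(z_large) - normalize(z_small)).max()

print("scale ratio without normalization:", raw_ratio)
print("scale ratio with normalization:", bn_ratio)
print("largest element-wise difference after normalization:", max_diff)
```

The small remaining difference comes from `eps`, which matters slightly more when the variance is small.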

# Q.2 Implementation

In [1]:
!pip install tensorflow
!pip install keras 


In [None]:


# Import the required libraries
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

# Load the CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Scale pixel values to the range [0, 1]
x_train, x_test = x_train / 255., x_test / 255.

# One-hot encode the labels
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# Define the baseline network (no batch normalization)
model = models.Sequential([
    layers.Flatten(input_shape=(32,32,3)),
    layers.Dense(512,activation='relu'),
    layers.Dense(10,activation='softmax')

])

model.compile(
    optimizer='adam',
    loss = 'categorical_crossentropy',
    metrics=['accuracy']

)

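# Train the baseline; the test set doubles as validation data here
history_no_bn = model.fit(x_train,y_train,epochs=10,validation_data=(x_test,y_test))

# Same architecture with batch normalization between the dense layer and the ReLU
model_bn = models.Sequential([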
    layers.Flatten(input_shape=(32,32,3)),
    layers.Dense(512),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.Dense(10,activation='softmax')

])

model_bn.compile(
    optimizer='adam',
    loss = 'categorical_crossentropy',
    metrics=['accuracy']

)

history_bn = model_bn.fit(x_train,y_train,epochs=10,validation_data=(x_test,y_test))

import matplotlib.pyplot as plt

# Compare accuracy and loss curves with and without batch normalization
plt.figure(figsize=(12,5))
plt.subplot(1,2,1)
plt.plot(history_no_bn.history['accuracy'],label='train (no BN)')
plt.plot(history_no_bn.history['val_accuracy'],label='validation (no BN)')
plt.plot(history_bn.history['accuracy'],label='train (BN)')
plt.plot(history_bn.history['val_accuracy'],label='validation (BN)')
plt.title('model accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()

plt.subplot(1,2,2)
plt.plot(history_no_bn.history['loss'],label='train (no BN)')
plt.plot(history_no_bn.history['val_loss'],label='validation (no BN)')
plt.plot(history_bn.history['loss'],label='train (BN)')
plt.plot(history_bn.history['val_loss'],label='validation (BN)')
plt.title('model loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.tight_layout()
plt.show()

# Train the batch-normalized model with several batch sizes
batch_sizes = [32,64,128,256]
result = {}

for batch_size in batch_sizes:
    print(f'training with batch size {batch_size}')
    model_bn = models.Sequential([
        layers.Flatten(input_shape=(32,32,3)),
        layers.Dense(512),
        layers.BatchNormalization(),
        layers.Activation('relu'),
        layers.Dense(10,activation='softmax')

    ])

    model_bn.compile(
        optimizer='adam',
        loss = 'categorical_crossentropy',
        metrics=['accuracy']

    )

    history = model_bn.fit(x_train,y_train,epochs=10,batch_size=batch_size,validation_data=(x_test,y_test))

    result[batch_size] = history



# Plot training and validation accuracy for each batch size
plt.figure(figsize=(14,10))
for i, batch_size in enumerate(batch_sizes):
    history = result[batch_size]
    plt.subplot(2,2,i+1)
    plt.plot(history.history['accuracy'],label='training accuracy')
    plt.plot(history.history['val_accuracy'],label='validation accuracy')
    plt.title(f'batch size {batch_size}')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.legend()

plt.tight_layout()
plt.show()
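
Alongside the plots, it can help to reduce each run to a single number. The helper below is a hypothetical addition (the name `summarize_histories` is not part of the notebook); it assumes Keras-style history dictionaries with a `val_accuracy` list. The values in `toy_histories` are placeholders for illustration only and are not results from any training run.

```python
def summarize_histories(histories):
    """Return {batch_size: (best_epoch, best_val_accuracy)}.

    histories maps a batch size to a Keras-style history dict that contains
    a 'val_accuracy' list with one entry per epoch. Epochs are 1-indexed.
    """
    summary = {}
    for batch_size, hist in histories.items():
        val_acc = hist["val_accuracy"]
        best_index = max(range(len(val_acc)), key=val_acc.__getitem__)
        summary[batch_size] = (best_index + 1, val_acc[best_index])
    return summary

# Placeholder numbers for illustration only; NOT results from any run
toy_histories = {
    32: {"val_accuracy": [0.40, 0.45, 0.44]},
    64: {"val_accuracy": [0.38, 0.43, 0.46]},
}

for batch_size, (epoch, acc) in summarize_histories(toy_histories).items():
    print(f"batch size {batch_size}: best val accuracy {acc:.2f} at epoch {epoch}")
```

With the notebook's own variables, the call would presumably be `summarize_histories({b: h.history for b, h in result.items()})`.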



# Q.3 Advantages and limitations

Advantages:

Accelerated Training: Batch normalization allows the use of higher learning rates and often leads to faster convergence.
Improved Generalization: Acts as a mild form of regularization, because the statistics of each mini-batch add noise to the activations, which can help reduce overfitting.
Stabilized Training: Reduces sensitivity to initialization and learning rate choices, leading to more stable training dynamics.
Gradient Flow: Mitigates issues related to vanishing and exploding gradients, particularly in deep networks.

Potential Limitations:

Dependency on Batch Size: Performance can be sensitive to the batch size. With small batches, the batch mean and variance are noisy estimates of the true statistics.
Computational Overhead: Adds extra computation and memory per layer for the normalization and the learnable scale and shift parameters.
Not Always Effective: In some cases other normalization techniques, such as layer normalization or instance normalization, may be more effective, especially for recurrent neural networks or small batch sizes.
Batch Statistics: Training uses the statistics of the current batch, while inference uses the running mean and variance accumulated during training. The two modes can therefore behave differently, and the layer must be run in the correct mode.
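
The difference between batch statistics and running statistics, and the dependence on batch size, can be illustrated with a toy layer. This is a simplified sketch, not the Keras layer: the class name `SimpleBatchNorm` is hypothetical, the learnable scale and shift are omitted, and the momentum of 0.9 is an illustrative choice.

```python
import numpy as np

class SimpleBatchNorm:
    """Toy batch norm layer (no learnable scale or shift)."""

    def __init__(self, num_features, momentum=0.9, eps=1e-5):
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)
        self.momentum = momentum
        self.eps = eps

    def __call__(self, x, training):
        if training:
            # Training: use this batch's statistics and update the running averages
            mean, var = x.mean(axis=0), x.var(axis=0)
            m = self.momentum
            self.running_mean = m * self.running_mean + (1 - m) * mean
            self.running_var = m * self.running_var + (1 - m) * var
        else:
            # Inference: use the accumulated running statistics
            mean, var = self.running_mean, self.running_var
        return (x - mean) / np.sqrt(var + self.eps)

rng = np.random.default_rng(0)
bn = SimpleBatchNorm(num_features=3)

# Simulate training on batches drawn from a distribution with mean 2 and std 4
for _ in range(200):
    batch = rng.normal(loc=2.0, scale=4.0, size=(32, 3))
    bn(batch, training=True)

print("running mean:", bn.running_mean)
print("running variance:", bn.running_var)

# Inference on a single sample works because it relies on the running statistics
sample = np.array([[2.0, 2.0, 2.0]])
inference_out = bn(sample, training=False)

# In training mode, a batch of one sample is its own mean, so the output is all zeros
single_sample_train_out = SimpleBatchNorm(num_features=3)(sample, training=True)

print("inference output:", inference_out)
print("training-mode output for a batch of one:", single_sample_train_out)
```

The last call shows the extreme case of the batch-size limitation: with one sample per batch, training-mode normalization removes all information from the input.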