<a href="https://colab.research.google.com/github/rssubramaniyan1/EVA8/blob/main/EVA8_Assignment5_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Normalization function**

The code below defines three types of normalization: Group, Batch, and Layer normalization.

Links to the papers:

Layer Normalization : https://arxiv.org/pdf/1607.06450.pdf

Group Normalization : https://arxiv.org/pdf/1803.08494.pdf

Batch Normalization : https://arxiv.org/pdf/1502.03167.pdf

This normalization function is called in the network (taken from the assignment 4 best model, attempt 2).

# Batch Normalization:

> Introduced to tackle internal covariate shift. During training, the parameters of each layer are updated, so the distribution of the inputs to the next layer keeps changing. Every layer has to keep adapting to this shifting input, which slows the convergence of a DNN.

> BN speeds up training by normalizing the inputs to each layer, which limits the shift in the distribution of inputs between layers and leads to faster convergence.

> During training, the mean and standard deviation used for normalization are computed from each mini-batch. Running averages of these statistics are kept and used at inference time.

> In BN, the pixels sharing the same channel index are normalized together, i.e. for each channel, BN computes one mean and one standard deviation across the batch and spatial dimensions.
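
To make the per-channel statistics concrete, here is a minimal sketch that normalizes a random `(N, C, H, W)` tensor by hand and compares the result with `torch.nn.BatchNorm2d` in training mode. The tensor sizes and variable names are illustrative assumptions, not taken from the assignment model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 4, 5, 5)  # (N, C, H, W): batch of 8 samples, 4 channels
eps = 1e-5

# One mean and one (biased) variance per channel, taken over the batch
# dimension and both spatial dimensions.
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
manual = (x - mean) / torch.sqrt(var + eps)

# Reference: BatchNorm2d in training mode, without the learnable affine step.
bn = nn.BatchNorm2d(4, eps=eps, affine=False)
bn.train()
reference = bn(x)

bn_matches = torch.allclose(manual, reference, atol=1e-5)
print(bn_matches)
```

After normalization, each channel has a mean close to 0 when averaged over the batch and spatial positions.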

# Group Normalization:

> This approach divides the channels into groups and, for each group, computes the mean and variance used for normalization.

> GN is independent of the batch size.

> In GN, the mean and standard deviation are computed across the channels (and spatial positions) of a given group, separately for each sample.

> Each group contains C/G channels, where C is the number of channels and G is the number of groups (a hyperparameter; G = 32 is the default in the paper).

> If G = 1, GN becomes layer normalization.
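
The points above can be checked with a short sketch. It groups the channels by hand, compares against `torch.nn.functional.group_norm`, and then verifies the G = 1 case and the batch-size independence. The sizes (`C = 8`, `G = 4`) are small illustrative assumptions rather than the paper's default of 32 groups.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, C, H, W = 2, 8, 3, 3
G = 4  # illustrative; each group then holds C // G = 2 channels
x = torch.randn(N, C, H, W)

# Manual GN: every (sample, group) slice is normalized on its own.
grouped = x.reshape(N, G, -1)
mean = grouped.mean(dim=2, keepdim=True)
var = grouped.var(dim=2, unbiased=False, keepdim=True)
manual_gn = ((grouped - mean) / torch.sqrt(var + 1e-5)).reshape(N, C, H, W)
gn_matches = torch.allclose(manual_gn, F.group_norm(x, G, eps=1e-5), atol=1e-5)

# With a single group, GN reduces to layer normalization over (C, H, W).
ln = F.layer_norm(x, (C, H, W), eps=1e-5)
g1_equals_ln = torch.allclose(F.group_norm(x, 1, eps=1e-5), ln, atol=1e-5)

# The result for one sample does not depend on the rest of the batch.
alone = F.group_norm(x[:1], G, eps=1e-5)
batch_independent = torch.allclose(alone, F.group_norm(x, G, eps=1e-5)[:1], atol=1e-6)

print(gn_matches, g1_equals_ln, batch_independent)
```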


# Layer Normalization:

> Takes all channels of a single sample (not of the whole batch).

> Computes the mean and standard deviation across all channels (and spatial positions) of that sample.
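
As a small sketch of the per-sample statistics, the snippet below normalizes each sample over its `(C, H, W)` values and compares the result with `torch.nn.functional.layer_norm`. The tensor shape is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(3, 4, 5, 5)  # (N, C, H, W)

# One mean and one (biased) variance per sample, over channels and pixels.
mean = x.mean(dim=(1, 2, 3), keepdim=True)
var = x.var(dim=(1, 2, 3), unbiased=False, keepdim=True)
manual_ln = (x - mean) / torch.sqrt(var + 1e-5)

reference = F.layer_norm(x, tuple(x.shape[1:]), eps=1e-5)
ln_matches = torch.allclose(manual_ln, reference, atol=1e-5)
print(ln_matches)
```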







In [None]:

import torch

# Define a class with one attribute and three methods: group normalization,
# layer normalization, and L1-norm normalization (selected with 'BN').
class Normalization:
    def __init__(self, norm_type):
        self.norm_type = norm_type

    def group_norm(self, x, num_channels, num_groups=32, eps=1e-5, gamma=1, beta=0):
        # Group Normalization: https://arxiv.org/pdf/1803.08494.pdf
        # x is expected to have shape (N, C, H, W) with C == num_channels.
        if num_channels % num_groups != 0:
            raise ValueError("num_channels must be divisible by num_groups")
        n = x.shape[0]
        # Split the C channels into num_groups groups: (N, G, (C // G) * H * W).
        # reshape() returns a view when possible and copies only if it has to.
        grouped = x.reshape(n, num_groups, -1)
        # Dimension 2 holds every value of one group of one sample, so the
        # mean and variance are computed per (sample, group).
        mean = grouped.mean(2, keepdim=True)
        var = grouped.var(2, unbiased=False, keepdim=True)
        # Normalize, using the eps argument for numerical stability.
        grouped = (grouped - mean) / torch.sqrt(var + eps)
        # Restore the original shape.
        out = grouped.reshape(x.shape)
        # gamma and beta default to 1 and 0 (identity). Tensors created inside
        # this method would be re-created on every call and never reach an
        # optimizer, so to learn them pass torch.nn.Parameter tensors of shape
        # (1, C, 1, 1) that are registered with the optimizer.
        return out * gamma + beta

    def layer_norm(self, x, num_channels, eps=1e-5, gamma=1, beta=0):
        # Layer Normalization: https://arxiv.org/pdf/1607.06450.pdf
        # x is expected to have shape (N, C, ...) with C == num_channels.
        # Flatten everything except the batch dimension so that each sample
        # gets a single mean and variance over all of its channels.
        flat = x.reshape(x.shape[0], -1)
        mean = flat.mean(1, keepdim=True)
        var = flat.var(1, unbiased=False, keepdim=True)
        # Normalize, then restore the original shape.
        out = ((flat - mean) / torch.sqrt(var + eps)).reshape(x.shape)
        # gamma and beta default to 1 and 0 (identity); to learn them, pass
        # torch.nn.Parameter tensors that are registered with the optimizer.
        return out * gamma + beta

    def l1_norm_bn(self, x, num_channels, eps=1e-5, gamma=1, beta=0):
        # L1 normalization (selected with norm_type 'BN').
        # The scale is the sum of absolute values along dimension 1 divided by
        # the number of channels. For an (N, C, H, W) input, dimension 1 is the
        # channel dimension, so each (sample, pixel) position gets its own scale.
        # Note: no batch statistics are involved, unlike standard BN.
        scale = x.abs().sum(1, keepdim=True) / num_channels
        # Normalize the input tensor.
        x = x / (scale + eps)
        # gamma and beta default to 1 and 0 (identity); to learn them, pass
        # torch.nn.Parameter tensors that are registered with the optimizer.
        return x * gamma + beta

    def __call__(self, x, num_channels):
        if self.norm_type == 'GN':
            return self.group_norm(x, num_channels)
        elif self.norm_type == 'LN':
            return self.layer_norm(x, num_channels)
        elif self.norm_type == 'BN':
            return self.l1_norm_bn(x, num_channels)
        else:
            return x
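
On the question of how gamma and beta are learned: one common approach is to store them as `torch.nn.Parameter` attributes of an `nn.Module`, so that an optimizer can update them. The sketch below illustrates this for group normalization. The class name `LearnableGroupNorm`, the toy loss, and the sizes are hypothetical and are not part of the assignment code.

```python
import torch
import torch.nn as nn


class LearnableGroupNorm(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, num_channels, num_groups, eps=1e-5):
        super().__init__()
        self.num_groups = num_groups
        self.eps = eps
        # nn.Parameter registers the tensors, so parameters() yields them.
        self.gamma = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, num_channels, 1, 1))

    def forward(self, x):
        grouped = x.reshape(x.shape[0], self.num_groups, -1)
        mean = grouped.mean(dim=2, keepdim=True)
        var = grouped.var(dim=2, unbiased=False, keepdim=True)
        normed = ((grouped - mean) / torch.sqrt(var + self.eps)).reshape(x.shape)
        return normed * self.gamma + self.beta


torch.manual_seed(0)
layer = LearnableGroupNorm(num_channels=8, num_groups=4)
x = torch.randn(2, 8, 3, 3)

# Before any update, the output agrees with the built-in nn.GroupNorm.
matches_builtin = torch.allclose(layer(x), nn.GroupNorm(4, 8)(x), atol=1e-5)

# One optimizer step on a toy loss changes beta through backpropagation.
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)
loss = (layer(x) - 1.0).pow(2).mean()
loss.backward()
optimizer.step()
beta_moved = not torch.equal(layer.beta.detach(), torch.zeros(1, 8, 1, 1))
print(matches_builtin, beta_moved)
```

Creating the parameters once in `__init__` keeps them alive across forward passes, which is what lets the optimizer accumulate updates to them.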