The most popular version of the normalization layer is the batch-norm layer.

Let's consider how it works in the simplest case, when the input is a batch of one-dimensional vectors.

*   A batch of one-dimensional vectors is supplied as input:


![](https://ucarecdn.com/c168101b-dc7d-4832-94e2-20de6e43c54f/)

where **j** is the index of the vector within the batch and **i** is the component index.

For the current batch:
* For each input component, the expected value and variance are calculated:

![](https://ucarecdn.com/e31fefb1-8398-4b7a-91ac-961278493d3e/)
![](https://ucarecdn.com/2bd38d22-00c5-4a7d-ac18-2c4fbf8e0ef2/)

* The input is normalized by the formula:
![](https://ucarecdn.com/dd34accb-876d-46f5-b092-5cd167e704d5/)

Epsilon is a small constant that prevents division by zero when the variance is zero.

* The normalized input is converted as follows:

![](https://ucarecdn.com/4ffd57b4-3b7f-4823-87e6-6270c3b24120/)

Here **Gamma** and **Beta** are the learnable parameters of the layer. Note that Gamma and Beta are vectors of the same length as the input instances.

These parameters can also be fixed. In the simplest case, Beta is the zero vector and Gamma is a vector of ones.

If we take Gamma equal to the denominator of the fraction from the formula for Z, and Beta equal to the mathematical expectation, then the layer will return the input tensor unchanged. That is, the layer will be equivalent to the identity function.
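
The identity case described above can be checked numerically. The following is a minimal sketch, assuming a small random batch and an illustrative `eps`; the variable names are chosen for this example only.

```python
import torch

torch.manual_seed(0)
eps = 1e-5
x = torch.randn(4, 3)  # batch of 4 vectors with 3 components each

# Batch statistics per component (biased variance).
mean = x.mean(dim=0)
var = x.var(dim=0, unbiased=False)
z = (x - mean) / torch.sqrt(var + eps)

# Gamma is the denominator from the formula for Z, Beta is the batch mean.
gamma = torch.sqrt(var + eps)
beta = mean
y = gamma * z + beta

# Up to floating-point error, the layer returns its input unchanged.
print(torch.allclose(y, x, atol=1e-5))
```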



Thus, the Beta and Gamma parameters allow the layer to preserve the information it receives while still normalizing the input. Normalization speeds up the convergence of the network parameters, and in some cases convergence is extremely difficult to achieve without it.

The final formula for converting the input is:
![](https://ucarecdn.com/e790660f-54c0-4c5e-aa58-4c17ae050594/)
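
To see why epsilon appears under the square root, consider a batch in which one component is constant, so its variance is exactly zero. This is a small illustrative sketch with made-up input values, with Gamma = 1 and Beta = 0.

```python
import torch

# The first component is constant across the batch, so its variance is zero.
x = torch.tensor([[1.0, 2.0], [1.0, 4.0]])
mean = x.mean(dim=0)
var = x.var(dim=0, unbiased=False)

# Without epsilon the first column becomes 0 / 0, which is NaN.
without_eps = (x - mean) / torch.sqrt(var)
# With epsilon the first column is simply mapped to zeros.
with_eps = (x - mean) / torch.sqrt(var + 1e-5)

print(torch.isnan(without_eps[:, 0]).all())
print(with_eps[:, 0])
```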


# Problem 1 (5 Points)

In this task, you need to implement the batch normalization function without using the [standard function](https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm1d.html#torch.nn.BatchNorm1d) with the following simplifications:

* The Beta parameter is set to 0.
* The Gamma parameter is taken equal to 1.
* The function should work correctly only during the training phase.
* The input has shape (number of elements in the batch) × (length of each instance).


> Look very carefully at the definition of the [function](https://pytorch.org/docs/stable/torch.html#torch.std) that calculates std.
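
The reason for this hint is that `torch.std` applies Bessel's correction by default (it divides by N - 1), whereas the batch statistics used for normalization divide by N. A short sketch with an illustrative input shows the difference:

```python
import torch

x = torch.tensor([[0.0, 2.0], [4.0, 10.0]])

# Default: unbiased estimate, divides by N - 1.
std_default = torch.std(x, dim=0)
# Biased estimate, divides by N; this is what the normalization formula needs.
std_biased = torch.std(x, dim=0, unbiased=False)

print(std_default)
print(std_biased)
```

With a batch of two elements the two estimates differ by a factor of sqrt(2), which is far more than `torch.allclose` tolerates.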

In [None]:
import numpy as np
import torch
import torch.nn as nn

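def custom_batch_norm1d(input_tensor, eps):
    # Per-component statistics over the batch dimension.
    mean = torch.mean(input_tensor, dim=0)
    # Biased variance (divide by N), matching BatchNorm1d in training mode.
    var = torch.var(input_tensor, dim=0, unbiased=False)
    normed_tensor = (input_tensor - mean) / torch.sqrt(var + eps)
    return normed_tensor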

input_tensor = torch.tensor([[0.0, 0, 1, 0, 2], [0, 1, 1, 0, 10]])
batch_norm = nn.BatchNorm1d(input_tensor.shape[1], affine=False)

Validation check

In [None]:
import numpy as np
all_correct = True
for eps_power in range(10):
  eps = np.power(10., -eps_power)
  batch_norm.eps = eps
  batch_norm_out = batch_norm(input_tensor)
  custom_batch_norm_out = custom_batch_norm1d(input_tensor, eps)

  all_correct &= torch.allclose(batch_norm_out, custom_batch_norm_out)
  all_correct &= batch_norm_out.shape == custom_batch_norm_out.shape
print(all_correct)

True


# Problem 2 (5 Points)
Let's generalize the function from the previous step a little: we'll add the ability to set the Beta and Gamma parameters.

In this task, you need to implement the batch normalization function without using the [standard function](https://pytorch.org/docs/stable/nn.html#batchnorm1d) with the following simplifications:

* The function should work correctly only during the training phase.
* The input has shape (number of elements in the batch) × (length of each instance).

In [None]:
import torch
import torch.nn as nn

input_size = 7
batch_size = 5
input_tensor = torch.randn(batch_size, input_size, dtype=torch.float)

eps = 1e-3

def custom_batch_norm1d(input_tensor, weight, bias, eps):
    mean = torch.mean(input_tensor, dim=0)
    var = torch.var(input_tensor, dim=0, unbiased=False)
    normed_tensor = (input_tensor - mean) / torch.sqrt(var + eps)
    return weight * normed_tensor + bias

Validation check

In [None]:
batch_norm = nn.BatchNorm1d(input_size, eps=eps)
batch_norm.bias.data = torch.randn(input_size, dtype=torch.float)
batch_norm.weight.data = torch.randn(input_size, dtype=torch.float)
batch_norm_out = batch_norm(input_tensor)
custom_batch_norm_out = custom_batch_norm1d(input_tensor, batch_norm.weight.data, batch_norm.bias.data, eps)
print(torch.allclose(batch_norm_out, custom_batch_norm_out) \
      and batch_norm_out.shape == custom_batch_norm_out.shape)

True


# Problem 3 (5 Points)

Let's remove one more simplification and implement the behavior of the batch normalization layer at the prediction stage.

At this stage, instead of batch statistics, we will use exponentially smoothed statistics from the layer’s training history.

In this step, you need to implement a full-fledged batch normalization class that takes a two-dimensional tensor as input, without using the [standard function](https://pytorch.org/docs/stable/nn.html#batchnorm1d). Be careful: the batch variance used for normalization is the biased estimate, while the running (exponentially smoothed) variance is updated with the unbiased estimate.
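
The update rule for the smoothed statistics can be observed directly on `nn.BatchNorm1d`. The sketch below assumes a single training step from the initial state (running mean 0, running variance 1) and an illustrative momentum of 0.5.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
momentum = 0.5
x = torch.randn(5, 3)

bn = nn.BatchNorm1d(3, momentum=momentum)  # a new layer starts in training mode
bn(x)  # one training step updates running_mean and running_var

# Exponential smoothing from the initial values; the variance is unbiased.
expected_mean = (1 - momentum) * torch.zeros(3) + momentum * x.mean(dim=0)
expected_var = (1 - momentum) * torch.ones(3) + momentum * x.var(dim=0, unbiased=True)

print(torch.allclose(bn.running_mean, expected_mean, atol=1e-6))
print(torch.allclose(bn.running_var, expected_var, atol=1e-6))
```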

In [None]:
import torch
import torch.nn as nn


input_size = 3
batch_size = 5
eps = 1e-1


class CustomBatchNorm1d:
    def __init__(self, weight, bias, eps, momentum):
        self.weight = weight
        self.bias = bias
        self.eps = eps
        self.momentum = momentum
        self.running_mean = torch.zeros_like(weight)
        self.running_var = torch.ones_like(weight)
        self.training = True

    def __call__(self, input_tensor):
        if self.training:
            mean = torch.mean(input_tensor, dim=0)
            var = torch.var(input_tensor, dim=0, unbiased=False)

            with torch.no_grad():
                self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mean
                self.running_var = (1 - self.momentum) * self.running_var + self.momentum * torch.var(input_tensor, dim=0, unbiased=True)

            normed_tensor = (input_tensor - mean) / torch.sqrt(var + self.eps)
        else:
            normed_tensor = (input_tensor - self.running_mean) / torch.sqrt(self.running_var + self.eps)

        return self.weight * normed_tensor + self.bias

    def eval(self):
        self.training = False

batch_norm = nn.BatchNorm1d(input_size, eps=eps)
batch_norm.bias.data = torch.randn(input_size, dtype=torch.float)
batch_norm.weight.data = torch.randn(input_size, dtype=torch.float)
batch_norm.momentum = 0.5

custom_batch_norm1d = CustomBatchNorm1d(batch_norm.weight.data,
                                        batch_norm.bias.data, eps, batch_norm.momentum)

Validation check

In [None]:
all_correct = True

for i in range(8):
  torch_input = torch.randn(batch_size, input_size, dtype=torch.float)
  norm_output = batch_norm(torch_input)
  custom_output = custom_batch_norm1d(torch_input)
  all_correct &= torch.allclose(norm_output, custom_output, atol=1e-04) \
  and norm_output.shape == custom_output.shape

# Switch both layers to evaluation mode once all training iterations are done.
batch_norm.eval()
custom_batch_norm1d.eval()

for i in range(8):
  torch_input = torch.randn(batch_size, input_size, dtype=torch.float)
  norm_output = batch_norm(torch_input)
  custom_output = custom_batch_norm1d(torch_input)
  all_correct &= torch.allclose(norm_output, custom_output, atol=1e-04) \
  and norm_output.shape == custom_output.shape

print(all_correct)

True


# Problem 4 (5 Points)

As you can see, implementing the batch norm layer at the prediction stage is not so easy, so in the later steps of this workshop we will no longer require implementing this part.

A batch normalization layer exists for input of any dimension.

In this step we will consider it for inputs that are multi-channel two-dimensional tensors, for example, images.

If you flatten each channel of the image into a vector, then the input will be three-dimensional:

* number of pictures in the batch
* number of channels in each picture
* number of pixels in the image

![](https://ucarecdn.com/2ce27998-abb8-4888-9034-97fe4efe95ef/)

Normalization process:

* The input is divided into slices parallel to the blue part. That is, each slice is all the pixels of all images in one of the channels.
* For each slice, the mathematical expectation and variance are computed.
* Each slice is normalized independently.

At this step, you are asked to **implement a batch norm layer for a four-dimensional input** (for example, a batch of multi-channel two-dimensional images) without using the [standard function](https://pytorch.org/docs/stable/nn.html#batchnorm2d) with the following simplifications:

* Beta parameter = 0.
* Gamma parameter = 1.
* The function should work correctly only during the training phase.

In [None]:
import torch
import torch.nn as nn

eps = 1e-3

input_channels = 3
batch_size = 3
height = 10
width = 10

batch_norm_2d = nn.BatchNorm2d(input_channels, affine=False, eps=eps)

input_tensor = torch.randn(batch_size, input_channels, height, width, dtype=torch.float)


def custom_batch_norm2d(input_tensor, eps):
    mean = torch.mean(input_tensor, dim=(0, 2, 3), keepdim=True)
    var = torch.var(input_tensor, dim=(0, 2, 3), unbiased=False, keepdim=True)

    normed_tensor = (input_tensor - mean) / torch.sqrt(var + eps)
    return normed_tensor

Validation check

In [None]:
norm_output = batch_norm_2d(input_tensor)
custom_output = custom_batch_norm2d(input_tensor, eps)
print(torch.allclose(norm_output, custom_output) and norm_output.shape == custom_output.shape)

True


We took a closer look at batch normalization. To simplify the rest of the presentation, we will focus on the case of a three-dimensional tensor at the input of the layer; if the input has more than three dimensions, we will flatten all dimensions except the first two into one dimension.
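
As a rough illustration of this flattening, the sketch below uses `Tensor.flatten` on a hypothetical batch of images and checks that the per-channel batch statistics are the same in both layouts.

```python
import torch

torch.manual_seed(0)
x = torch.randn(3, 4, 10, 10)   # (N, C, H, W)
x3d = x.flatten(start_dim=2)    # (N, C, H * W)

# Per-channel batch statistics do not depend on the layout.
mean_4d = x.mean(dim=(0, 2, 3))
mean_3d = x3d.mean(dim=(0, 2))

print(x3d.shape)
print(torch.allclose(mean_4d, mean_3d, atol=1e-6))
```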

Normalization can be performed not only over the batch, but also over other dimensions.

Take a look at the images below.

![](https://ucarecdn.com/d1894e62-5608-43ce-80a0-f767d1875ff9/)

Where:

* C - number of input channels.
* N - batch size.
* H, W - the spatial dimensions, flattened into the last (third) dimension of the input.


The following types of normalization can be seen in the image:

* By batch.
* By channel.
* By instance.
* By group.

In addition to these types, there are many others that are beyond the scope of our course.

We will consider these types of normalization in further steps.
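
As a preview, the four types differ only in the set of dimensions over which the statistics are computed. The sketch below assumes a three-dimensional input of shape (N, C, L), Gamma = 1 and Beta = 0; the helper `normalize` is hypothetical and exists only for this illustration.

```python
import torch

def normalize(x, dims, eps=1e-5):
    """Normalize x with the mean and biased variance taken over dims."""
    mean = x.mean(dim=dims, keepdim=True)
    var = x.var(dim=dims, unbiased=False, keepdim=True)
    return (x - mean) / torch.sqrt(var + eps)

torch.manual_seed(0)
x = torch.randn(8, 6, 10)  # (N, C, L)

by_batch = normalize(x, dims=(0, 2))     # one set of statistics per channel
by_channel = normalize(x, dims=(1, 2))   # one set per batch element
by_instance = normalize(x, dims=(2,))    # one set per (element, channel) pair

# By group: split the channels into groups first (2 groups of 3 channels).
groups = 2
grouped = x.view(8, groups, 6 // groups, 10)
by_group = normalize(grouped, dims=(2, 3)).view(8, 6, 10)
```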

# Problem 5 (5 Points)

The idea behind the per-channel normalization layer is that the network should be independent of the contrast of the original image.

Per-channel normalization is applied independently to each image in the batch.

![](https://ucarecdn.com/c9f3f179-7f3d-44dc-85ef-8d1a675dc6c4/)

This step asks you to implement per-channel normalization without using a [standard layer](https://pytorch.org/docs/stable/nn.html#layernorm), with the following simplifications:

* Beta parameter = 0.
* Gamma parameter = 1.
* Only the training phase needs to be implemented.
* Normalization is done across all input dimensions except the zeroth (batch) dimension.

Please note that the input dimension is not fixed at this step.

To clarify: in the “by channel” normalization layer, statistics are calculated over all dimensions except the zeroth (batch) dimension.
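
The contrast-independence idea behind this layer can be illustrated with a small experiment. This sketch models a contrast change as multiplication by a constant (an assumption made for illustration) and uses `torch.nn.functional.layer_norm` with a very small `eps`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
image = torch.rand(1, 3, 8, 8)   # one image with values in [0, 1)
high_contrast = 5.0 * image      # the same image with its values scaled up

shape = image.shape[1:]          # normalize over all dimensions except the batch
out = F.layer_norm(image, shape, eps=1e-10)
out_scaled = F.layer_norm(high_contrast, shape, eps=1e-10)

# Both versions of the image are mapped to (almost) the same output.
print(torch.allclose(out, out_scaled, atol=1e-4))
```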

In [None]:
import torch
import torch.nn as nn


eps = 1e-10


def custom_layer_norm(input_tensor, eps):
    normalize_dims = tuple(range(1, input_tensor.dim()))
    mean = torch.mean(input_tensor, dim=normalize_dims, keepdim=True)
    var = torch.var(input_tensor, dim=normalize_dims, unbiased=False, keepdim=True)

    normed_tensor = (input_tensor - mean) / torch.sqrt(var + eps)
    return normed_tensor

Validation check

In [None]:
all_correct = True
for dim_count in range(3, 9):
  input_tensor = torch.randn(*list(range(3, dim_count + 2)), dtype=torch.float)
  layer_norm = nn.LayerNorm(input_tensor.size()[1:], elementwise_affine=False, eps=eps)

  norm_output = layer_norm(input_tensor)
  custom_output = custom_layer_norm(input_tensor, eps)

  all_correct &= torch.allclose(norm_output, custom_output, rtol=1e-2)
  all_correct &= norm_output.shape == custom_output.shape
print(all_correct)

True


# Problem 6 (5 Points)

Instance normalization was originally developed for the style transfer task. The idea behind this layer is that the network should be independent of the contrast of the individual channels of the source image.

![](https://ucarecdn.com/fe13e8df-e2f8-4356-9001-8dc3cd734e64/)

This step asks you to implement per-instance normalization without using a standard layer with the following simplifications:

* Beta parameter = 0.
* Gamma parameter = 1.
* The input is a three-dimensional tensor (batch size, number of channels, length of each instance channel).
* Only the training phase needs to be implemented.

In the “by instance” normalization layer, statistics are calculated over the last dimension (separately for each input channel of each input example).



In [None]:
import torch
import torch.nn as nn

eps = 1e-3

batch_size = 5
input_channels = 2
input_length = 30

instance_norm = nn.InstanceNorm1d(input_channels, affine=False, eps=eps)

input_tensor = torch.randn(batch_size, input_channels, input_length, dtype=torch.float)


def custom_instance_norm1d(input_tensor, eps):
    mean = torch.mean(input_tensor, dim=2, keepdim=True)
    var = torch.var(input_tensor, dim=2, unbiased=False, keepdim=True)

    normed_tensor = (input_tensor - mean) / torch.sqrt(var + eps)
    return normed_tensor

Validation check

In [None]:
norm_output = instance_norm(input_tensor)
custom_output = custom_instance_norm1d(input_tensor, eps)
print(torch.allclose(norm_output, custom_output, atol=1e-06) and norm_output.shape == custom_output.shape)

True


# Problem 7 (5 Points)

Per-group normalization is a generalization of per-channel and per-instance normalization.

The channels in an image are not completely independent, so the ability to use neighboring channel statistics is an advantage of by-group normalization over by-instance normalization.

At the same time, the image channels can vary greatly, so per-group normalization is more flexible than per-channel normalization.

![](https://ucarecdn.com/7384f3ed-ac36-48dc-8b70-6fdc490f5092/)

This step asks you to implement "by group" normalization without using a [standard layer](https://pytorch.org/docs/stable/nn.html#groupnorm) with the following simplifications:

* Beta parameter = 0.
* Gamma parameter = 1.
* Only the training phase needs to be implemented.
* A three-dimensional tensor is supplied as input.

The layer also takes the number of groups as input.

In the “by group” normalization layer, statistics are calculated very similarly to “by channel” normalization, only the channels are divided into groups.
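
The claim that per-group normalization generalizes the other two types can be checked with the standard layers: with a single group it should match `nn.LayerNorm`, and with one group per channel it should match `nn.InstanceNorm1d`. The shapes and `eps` below are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
eps = 1e-3
x = torch.randn(4, 6, 5)  # (N, C, L)

# One group containing all channels behaves like "by channel" normalization.
one_group = nn.GroupNorm(1, 6, eps=eps, affine=False)(x)
layer = nn.LayerNorm(x.shape[1:], eps=eps, elementwise_affine=False)(x)

# One group per channel behaves like "by instance" normalization.
per_channel_groups = nn.GroupNorm(6, 6, eps=eps, affine=False)(x)
instance = nn.InstanceNorm1d(6, eps=eps, affine=False)(x)

print(torch.allclose(one_group, layer, atol=1e-5))
print(torch.allclose(per_channel_groups, instance, atol=1e-5))
```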

In [None]:
import torch
import torch.nn as nn

channel_count = 6
eps = 1e-3
batch_size = 20
input_size = 2

input_tensor = torch.randn(batch_size, channel_count, input_size)


def custom_group_norm(input_tensor, groups, eps):
    N, C, L = input_tensor.shape
    reshaped_input = input_tensor.view(N, groups, C // groups, L)

    mean = torch.mean(reshaped_input, dim=(2, 3), keepdim=True)
    var = torch.var(reshaped_input, dim=(2, 3), unbiased=False, keepdim=True)

    normed_tensor = (reshaped_input - mean) / torch.sqrt(var + eps)
    normed_tensor = normed_tensor.view(N, C, L)

    return normed_tensor

Validation check

In [None]:
all_correct = True
for groups in [1, 2, 3, 6]:
  group_norm = nn.GroupNorm(groups, channel_count, eps=eps, affine=False)
  norm_output = group_norm(input_tensor)
  custom_output = custom_group_norm(input_tensor, groups, eps)
  all_correct &= torch.allclose(norm_output, custom_output, rtol=1e-3)
  all_correct &= norm_output.shape == custom_output.shape
print(all_correct)

True
