# CNN Encoder
This notebook walks through a practical example of a Convolutional Neural Network (CNN) Encoder, which compresses an image into a latent representation. The encoder is typically trained end-to-end with a CNN Decoder block that aims to reconstruct the source image from the compressed representation. This reconstruction objective steers the compressed representation towards the key information in the image. Once trained, the compressed representation can be used for downstream tasks such as image classification.

The CNN Encoder component typically consists of several convolutional operators stacked on top of each other. It is thus helpful to first gain an understanding of the convolutional operator. From a practical standpoint, it is most helpful to understand how the convolutional operator's parameters affect the output shape of the image at each layer, as this dictates the final vector size of the compressed representation.

In what follows, we will take a practical look at a CNN Encoder using PyTorch.
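To make the shape bookkeeping described above concrete, here is a minimal sketch that stacks three convolutions and records the tensor shape after each one. The channel counts, kernel sizes, strides, and the 32 x 32 input are illustrative assumptions, not values used elsewhere in this notebook.

```python
import torch

# Hypothetical stack of three convolutions; the channel counts, kernel
# sizes and strides are illustrative choices, not taken from this notebook.
layers = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
    torch.nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1),
    torch.nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
)

x = torch.rand(1, 3, 32, 32)  # one batched RGB image of 32 x 32 pixels
shapes = [tuple(x.shape)]

with torch.no_grad():  # we only care about shapes here, not gradients
    for layer in layers:
        x = layer(x)
        shapes.append(tuple(x.shape))

for shape in shapes:
    print(shape)

# Flattening everything except the batch dimension gives the latent vector size
latent_size = x.flatten(start_dim=1).shape[1]
print(f"Flattened latent size: {latent_size}")
```

With these assumed settings each layer halves the height and width, so the final feature map is 32 channels of 4 x 4, which flattens to a vector of 512 values.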

## Convolutional Operator
Let us first showcase a simple example with one convolutional operator.

First, we will create a simple tensor representing our source image. This image will consist of 6 x 6 pixels and will have three channels representing Red, Green, and Blue (RGB). Note that PyTorch typically orders image tensors as $(C, H, W)$: channels, height, width.

In [10]:
import torch

source_image = torch.rand(3, 6, 6)
src_c, src_h, src_w = source_image.size()
print(f"Source image channels: {src_c}")
print(f"Source image height: {src_h}")
print(f"Source image width: {src_w}")


Source image channels: 3
Source image height: 6
Source image width: 6
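The $(C, H, W)$ ordering matters when an image arrives in a different layout. Many image tools (for example, arrays displayed with Matplotlib) store images channels-last, as $(H, W, C)$. The sketch below assumes such a tensor and reorders its axes with `permute`. The name `image_hwc` and its 4 x 6 size are hypothetical, chosen so that height and width are easy to tell apart.

```python
import torch

# Hypothetical channels-last image: height 4, width 6, 3 colour channels
image_hwc = torch.rand(4, 6, 3)

# Move the channel axis to the front to get PyTorch's (C, H, W) layout
image_chw = image_hwc.permute(2, 0, 1)

print(image_hwc.shape)  # torch.Size([4, 6, 3])
print(image_chw.shape)  # torch.Size([3, 4, 6])
```

`permute` only changes how the axes are ordered. The pixel values are untouched, so `image_chw[0]` holds the same red channel as `image_hwc[..., 0]`.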


Now we can create our convolutional operator, with some initial parameters:

In [30]:
out_channels: int = 3
kernel_size: int = 2
stride: int = 1
padding: int = 0

# dilation=1 is the Conv2d default (no spacing between kernel elements)
conv = torch.nn.Conv2d(src_c, out_channels, kernel_size, stride, padding, dilation=1)
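It can help to look at what the operator actually learns. The sketch below builds a separate layer, `conv_demo` (a hypothetical name, used so the `conv` object above is left alone), with the same channel, kernel, stride, and padding settings, and inspects its parameters.

```python
import torch

# Same settings as the cell above, with dilation left at its default of 1
conv_demo = torch.nn.Conv2d(in_channels=3, out_channels=3, kernel_size=2, stride=1, padding=0)

# One 2 x 2 kernel per (output channel, input channel) pair
print(conv_demo.weight.shape)  # torch.Size([3, 3, 2, 2])

# One bias value per output channel
print(conv_demo.bias.shape)  # torch.Size([3])

# 3 * 3 * 2 * 2 weights + 3 biases
n_params = sum(p.numel() for p in conv_demo.parameters())
print(n_params)  # 39
```

The parameter count depends only on the channel counts and the kernel size, not on the height or width of the input image.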

The `Conv2d` operator expects the first dimension of a batched input to be the batch size, giving a shape of $(N, C, H, W)$, so let's quickly add an extra dimension to our tensor:

In [32]:
source_image_batched = source_image.unsqueeze(0)
source_image_batched.size()

torch.Size([1, 3, 6, 6])

Now we can pass our image through our convolution:

In [33]:
conv(source_image_batched)

tensor([[[[-0.0736, -0.2053, -0.2680, -0.4883, -0.1210],
          [-0.4622, -0.3080, -0.1221, -0.2786, -0.5613],
          [-0.4050, -0.0385, -0.4267, -0.2689, -0.3818],
          [-0.4389, -0.4080, -0.3348, -0.3209, -0.5775],
          [-0.3485, -0.2811, -0.4165, -0.1599, -0.6860]],

         [[ 0.0074, -0.3226, -0.0768, -0.4606, -0.1248],
          [-0.4148,  0.0548, -0.1139, -0.1483, -0.0555],
          [-0.1762, -0.1939, -0.3049, -0.2043, -0.6298],
          [ 0.0801,  0.1616,  0.1777,  0.0369, -0.3378],
          [-0.3325, -0.3473, -0.1215, -0.0907, -0.2764]],

         [[ 0.2902, -0.0183,  0.3065,  0.0918, -0.1053],
          [ 0.2768,  0.0523,  0.0537,  0.0480,  0.3034],
          [ 0.0597,  0.0448, -0.1029, -0.0742,  0.0137],
          [ 0.2168,  0.0698,  0.0264,  0.0750,  0.1205],
          [ 0.2202, -0.0126,  0.1453,  0.0804,  0.4039]]]],
       grad_fn=<ConvolutionBackward0>)

With our given parameters, we can see that the number of channels is unchanged, but the height and width have each shrunk from 6 to 5:

In [34]:
conv(source_image_batched).size()

torch.Size([1, 3, 5, 5])
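The output size follows from the convolution parameters. Along each spatial dimension, the PyTorch `Conv2d` documentation gives

$$H_{out} = \left\lfloor \frac{H_{in} + 2 \cdot \text{padding} - \text{dilation} \cdot (\text{kernel\_size} - 1) - 1}{\text{stride}} + 1 \right\rfloor$$

With $H_{in} = 6$, a kernel size of 2, a stride of 1, no padding, and a dilation of 1, this gives $(6 - 1 - 1)/1 + 1 = 5$. The helper `conv_output_size` below is a hypothetical convenience function (not part of PyTorch) that evaluates this formula and cross-checks it against `Conv2d` for a few illustrative parameter settings.

```python
import torch


def conv_output_size(size: int, kernel_size: int, stride: int = 1,
                     padding: int = 0, dilation: int = 1) -> int:
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# The parameters used in this notebook: 6 -> 5
print(conv_output_size(6, kernel_size=2, stride=1, padding=0))  # 5

# Cross-check the formula against Conv2d on a 6 x 6 input
for k, s, p in [(2, 1, 0), (3, 1, 1), (2, 2, 0), (3, 2, 1)]:
    layer = torch.nn.Conv2d(3, 3, kernel_size=k, stride=s, padding=p)
    out = layer(torch.rand(1, 3, 6, 6))
    expected = conv_output_size(6, k, s, p)
    assert out.shape[-2:] == (expected, expected)
    print(f"kernel={k}, stride={s}, padding={p} -> {tuple(out.shape)}")
```

Note that a 3 x 3 kernel with a padding of 1 and a stride of 1 preserves the input size, while a stride of 2 roughly halves it.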