# CONVOLUTION

In PyTorch, convolutions are implemented using the __nn.Conv1d, nn.Conv2d, or nn.Conv3d__ modules, depending on the dimensionality of your data. Convolutional layers are the backbone of many deep learning models, particularly in computer vision and time-series applications.

A convolution operation applies a filter (kernel) to an input, e.g. an image or a sequence, to produce a feature map. The filter slides across the input, performing an element-wise multiplication and summation at each position, which captures the local patterns in the data.

![2D Convolution Animation](https://upload.wikimedia.org/wikipedia/commons/1/19/2D_Convolution_Animation.gif)
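
The sliding-window computation can be sketched by hand and compared with PyTorch's built-in operation. This is a minimal illustration, assuming a single-channel 4x4 input, a 3x3 kernel, stride 1, and no padding. The helper name `naive_conv2d` is hypothetical and exists only for this example.

```python
import torch
import torch.nn.functional as F

def naive_conv2d(image, kernel):
    """Slide `kernel` over `image` (both 2D tensors) with stride 1 and no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = torch.zeros(out_h, out_w)
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]
            # Element-wise multiplication followed by a sum
            out[i, j] = (window * kernel).sum()
    return out

image = torch.arange(16, dtype=torch.float32).reshape(4, 4)
kernel = torch.ones(3, 3)

manual = naive_conv2d(image, kernel)
# F.conv2d expects (batch, channels, H, W) input and (out_ch, in_ch, kH, kW) weights
builtin = F.conv2d(image[None, None], kernel[None, None])[0, 0]

print(manual)
print(torch.allclose(manual, builtin))
```

Note that PyTorch does not flip the kernel before sliding it, so strictly speaking the operation is a cross-correlation. For learned filters the distinction does not matter.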


## Parameters of CNN Layer

1. in_channels
2. out_channels
3. kernel_size
4. stride
5. padding
6. dilation
7. groups
8. bias

### In_channels

It specifies the number of input channels the layer expects. Each filter spans all input channels, so this value sets the depth of every filter. Channels represent the feature dimensions of the input data. For an __RGB__ image, the channels are __Red, Green, Blue__, so __in_channels = 3__. For a __grayscale__ image, there is only one channel (intensity, i.e. how dark a pixel is), so __in_channels = 1__. In audio or sequential data, channels might represent features like MFCCs, so in_channels depends on the number of extracted features.

### Out_channels

The __out_channels__ parameter in a convolutional layer determines the number of output channels (or feature maps) that the layer produces after applying the convolution operation. This number normally represents how many different feature maps/filters you would want the convolution layer to learn.

Each filter in the convolution layer learns a different set of features from the input, so increasing __out_channels__ increases the number of features the network learns. 
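
How the two channel parameters shape a layer can be seen by inspecting its weights. The sketch below assumes an RGB input batch of 32x32 images; the sizes are arbitrary values chosen for illustration.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)

# One filter per output channel; each filter spans all input channels.
print(conv.weight.shape)  # torch.Size([8, 3, 3, 3])
print(conv.bias.shape)    # torch.Size([8])

rgb_batch = torch.randn(4, 3, 32, 32)  # (batch, channels, height, width)
features = conv(rgb_batch)
print(features.shape)     # torch.Size([4, 8, 30, 30])
```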

### Kernel Size

It determines the spatial dimensions of the filters (or kernels) applied to the input data. It defines the size of the window that slides over the input tensor to perform the convolution operation.

__2D Convolutions (Images):__

The kernel is a 2D matrix that slides over the 2D input image. The size of the kernel is defined by two dimensions: height and width.
Common choices for kernel sizes are __3x3, 5x5,__ and __7x7__, though other sizes can also be used depending on the problem.

__1D Convolutions (Sequences or Audio):__

The kernel is a 1D vector that slides over the 1D input sequence.
The kernel size is typically an odd number to ensure that there is a center element (like 3, 5, etc.).

Smaller kernels (e.g., 3x3 or 5x5) capture local features in a fine-grained way, allowing the network to focus on small spatial details.

Larger kernels (e.g., 7x7 or more) cover larger portions of the input and may capture more global features, but they also require more parameters and computation.
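
The parameter cost of a larger kernel can be checked directly. This is a small sketch, assuming 3 input channels and 8 output channels; `count_params` is a hypothetical helper written for this example.

```python
import torch.nn as nn

def count_params(module):
    """Total number of learnable values in a module."""
    return sum(p.numel() for p in module.parameters())

param_counts = {}
for k in (3, 5, 7):
    conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=k)
    # Weights: 8 * 3 * k * k, plus 8 biases
    param_counts[k] = count_params(conv)

print(param_counts)  # {3: 224, 5: 608, 7: 1184}
```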

### Stride

It refers to the step size at which the filters/kernels move across the input data. It controls how far the filter shifts after each operation.

- __Stride = 1:__ The filter moves one unit at a time across the input. A stride of 1 ensures that the filter slides by one pixel at a time, producing a more detailed output feature map with higher spatial resolution.
- __Stride = 2:__ The filter moves two units at a time, effectively reducing the output size by a factor of 2 in that dimension. Larger strides can help in downsampling the input (reducing the size of the feature map), which is sometimes useful in reducing the computational load and capturing high-level features.
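
The effect of stride is easy to see on a tiny 1D input. The sketch below assumes a kernel of ones, so that each output value is simply the sum of the window the filter covered.

```python
import torch
import torch.nn.functional as F

signal = torch.arange(1, 9, dtype=torch.float32).reshape(1, 1, 8)  # values 1..8
kernel = torch.ones(1, 1, 3)  # sums each window of three values

stride1 = F.conv1d(signal, kernel, stride=1)
stride2 = F.conv1d(signal, kernel, stride=2)

print(stride1.flatten().tolist())  # [6.0, 9.0, 12.0, 15.0, 18.0, 21.0]
print(stride2.flatten().tolist())  # [6.0, 12.0, 18.0]
```

With a stride of 2 the filter skips every other window, so the output keeps only the sums at positions 0, 2, and 4.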


As you can see, the stride affects the output size. The formula below shows how the output size is calculated:


### Convolution Output Size Formula

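$
\text{Output Size} = \left\lfloor \frac{\text{Input Size} - \text{Kernel Size} + 2 \times \text{Padding}}{\text{Stride}} \right\rfloor + 1
$

The floor brackets round the division down when the stride does not divide evenly. The formula assumes a dilation of 1.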

#### Components:
- **Input Size**: Size of the input feature map.
- **Kernel Size**: Size of the filter applied to the input.
- **Padding**: Number of pixels added to the border of the input.
- **Stride**: Step size for the filter's movement across the input.
- **+1**: Ensures the first position of the kernel is counted.

#### Example:
For a 2D convolution with:
- Input Size = 32,
- Kernel Size = 5,
- Padding = 2,
- Stride = 1,

The output size is:

$
\text{Output Size} = \frac{32 - 5 + 2 \times 2}{1} + 1 = 32
$
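
The formula can be turned into a small helper and checked against a real layer. This is a sketch, assuming a dilation of 1 and integer (floor) division, which is how PyTorch rounds when the stride does not divide evenly. The function name `conv_output_size` is hypothetical.

```python
import torch
import torch.nn as nn

def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    # Floor division rounds down when the stride does not divide evenly.
    return (input_size - kernel_size + 2 * padding) // stride + 1

x = torch.randn(1, 3, 32, 32)
results = []
for kernel_size, padding, stride in [(5, 2, 1), (3, 1, 2), (7, 0, 2)]:
    conv = nn.Conv2d(3, 8, kernel_size, stride=stride, padding=padding)
    predicted = conv_output_size(32, kernel_size, padding, stride)
    actual = conv(x).shape[-1]
    results.append((predicted, actual))
    print(kernel_size, padding, stride, predicted, actual)
```

The three configurations give output sizes of 32, 16, and 13, and in each case the prediction matches the size PyTorch produces.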

### Padding

__Padding__ is the process of adding extra elements around the input feature map to control the size of the output. It is commonly used to preserve the input's spatial dimensions or to achieve specific output sizes.

Formula for Padding (Same Output Size):
$
\text{Padding} = \frac{\text{Kernel Size} - 1}{2}
$

This formula ensures the output size remains the same as the input size when the stride is 1 and the kernel size is odd. The padding is distributed evenly on both sides of the input.

#### Types of Padding

1. __Valid padding:__ No extra elements are added. The output shrinks according to the kernel size and stride, as given by the convolution output size formula.
2. __Same padding:__ Enough padding is added to make the output size equal to the input size, using the padding formula above (stride 1).
3. __Custom padding:__ Users can define arbitrary padding values. For example, a padding of 1 adds one extra element to every side of the input.
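
The three padding types can be compared on the same input. The sketch below assumes a PyTorch version recent enough to accept the string values `"valid"` and `"same"` for `padding` (the `"same"` option only works with a stride of 1). The 32x32 input and 5x5 kernel are arbitrary choices.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)

valid = nn.Conv2d(3, 8, kernel_size=5, padding="valid")  # same as padding=0
same = nn.Conv2d(3, 8, kernel_size=5, padding="same")    # here equivalent to padding=2
custom = nn.Conv2d(3, 8, kernel_size=5, padding=1)

valid_out, same_out, custom_out = valid(x), same(x), custom(x)
print(valid_out.shape)   # torch.Size([1, 8, 28, 28])
print(same_out.shape)    # torch.Size([1, 8, 32, 32])
print(custom_out.shape)  # torch.Size([1, 8, 30, 30])
```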

Convolution layers in neural networks come in different variants tailored to the nature of the input data: __1D, 2D,__ and __3D.__

### 1D

A 1D convolution operates on sequential data along a single spatial axis. It slides a filter (or kernel) across the input, performing dot products between the filter and overlapping input segments. The result is a sequence of features that captures local patterns in the input data.

![1D convolution](https://e2eml.school/images/conv1d/aa_copy.gif)

```python
import torch
import torch.nn as nn

conv1d = nn.Conv1d(in_channels=1, out_channels=4, kernel_size=3, stride=1, padding=1)
```
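
A quick forward pass shows the input layout a 1D layer expects. This sketch assumes a batch of 2 single-channel sequences of length 100; the sizes are illustrative only.

```python
import torch
import torch.nn as nn

conv1d = nn.Conv1d(in_channels=1, out_channels=4, kernel_size=3, stride=1, padding=1)

sequence = torch.randn(2, 1, 100)  # (batch, channels, length)
out_1d = conv1d(sequence)
print(out_1d.shape)  # torch.Size([2, 4, 100])
```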

### 2D

A 2D convolution operates on 2D data such as images. It slides a 2D filter/kernel across the input image, performing dot products between the filter and the overlapping regions of the image.

![convolution 2D](https://miro.medium.com/v2/resize:fit:828/format:webp/1*DTTpGlhwkctlv9CYannVsw.gif)

```python
import torch
import torch.nn as nn

conv2d = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, stride=1, padding=1)
```
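
For images, the layer expects a 4D tensor. The sketch below assumes a batch of 2 RGB images of size 64x64, chosen only for illustration.

```python
import torch
import torch.nn as nn

conv2d = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, stride=1, padding=1)

images = torch.randn(2, 3, 64, 64)  # (batch, channels, height, width)
out_2d = conv2d(images)
print(out_2d.shape)  # torch.Size([2, 8, 64, 64])
```

With a 3x3 kernel, stride 1, and padding 1, the height and width are preserved and only the channel count changes.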

### 3D

A 3D convolution operates on volumetric data, such as video frames (time, height, width), medical imaging data, or any data with three spatial dimensions (depth, height, width). Instead of sliding a 2D kernel, a 3D kernel moves through the data in three dimensions, capturing spatial and temporal relationships.

[![Convolution 3D Video](https://i.imgur.com/2nJzE83.jpg)](https://imgur.com/2nJzE83)


#### Applications of 3D Convolution

- __Video Analysis:__ Action recognition, video classification, and object tracking.
- __Medical Imaging:__ Processing 3D scans (e.g., CT, MRI).
- __Scientific Computing:__ Analyzing volumetric data in simulations.
- __Speech Processing:__ Features from spectrograms with time, frequency, and channel dimensions.




```python
import torch
import torch.nn as nn

conv3d = nn.Conv3d(in_channels=3, out_channels=8, kernel_size=(3, 3, 3), stride=1, padding=1)
```
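
Volumetric input adds one more axis in front of height and width. This sketch assumes a batch of 2 RGB video clips with 16 frames of 64x64 pixels; the sizes are illustrative only.

```python
import torch
import torch.nn as nn

conv3d = nn.Conv3d(in_channels=3, out_channels=8, kernel_size=(3, 3, 3), stride=1, padding=1)

clips = torch.randn(2, 3, 16, 64, 64)  # (batch, channels, depth, height, width)
out_3d = conv3d(clips)
print(out_3d.shape)  # torch.Size([2, 8, 16, 64, 64])
```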

## Common Patterns in CNN Architectures

The typical building block is `Conv layer -> Batch norm -> Activation`. This pattern repeats itself across the entire CNN architecture.

### Conv layer

This is the layer that learns features such as edges and corners. It extracts the raw spatial features.

### Batch Norm

It normalizes the output of the conv layer so that each channel has a mean of 0 and a variance of 1 across the batch. This stabilizes the output of the convolution layer.
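
The normalization can be verified numerically. This is a minimal sketch, assuming a freshly created `nn.BatchNorm2d` in training mode (its learnable scale starts at 1 and its shift at 0) and a random input that has been deliberately shifted and scaled.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(num_features=8)     # a new module starts in training mode
x = torch.randn(16, 8, 10, 10) * 5 + 3  # activations with mean ~3 and std ~5

y = bn(x)
# Statistics are computed per channel, over the batch and spatial dimensions.
channel_mean = y.mean(dim=(0, 2, 3))
channel_var = y.var(dim=(0, 2, 3), unbiased=False)

print(channel_mean.abs().max().item())                     # close to 0
print(channel_var.min().item(), channel_var.max().item())  # close to 1
```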

### Activation layer

It introduces non-linearity, so that stacked layers can learn more than a single linear transformation.

```python
import torch.nn as nn


class CNNArchitecture(nn.Module):
    """A single Conv -> BatchNorm -> ReLU unit."""

    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding)
        self.batchnorm = nn.BatchNorm2d(out_channels)
        self.act = nn.ReLU()

    def forward(self, x):
        x = self.conv(x)       # extract spatial features
        x = self.batchnorm(x)  # normalize each channel
        x = self.act(x)        # apply the non-linearity
        return x
```
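
To show how the pattern repeats, several such units can be stacked into a small backbone. This is a sketch under assumptions: the helper `conv_bn_relu`, the channel widths, and the strides are hypothetical choices, and `bias=False` is used because the batch norm layer that follows already has its own learnable shift.

```python
import torch
import torch.nn as nn

def conv_bn_relu(in_channels, out_channels, stride=1):
    # One Conv -> BatchNorm -> Activation unit with a 3x3 kernel.
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True),
    )

backbone = nn.Sequential(
    conv_bn_relu(3, 16),
    conv_bn_relu(16, 32, stride=2),  # halves height and width
    conv_bn_relu(32, 64, stride=2),  # halves them again
)

x = torch.randn(2, 3, 32, 32)
backbone_out = backbone(x)
print(backbone_out.shape)  # torch.Size([2, 64, 8, 8])
```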

The same pattern repeats in the ResNet basic block, which adds a skip connection around two convolution layers:

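```python
import torch.nn as nn


class BasicCNNBlock(nn.Module):
    # padding=1 keeps the spatial size for the default 3x3 kernel.
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1, downsample=None):
        super().__init__()
        # Only the first conv changes the channel count and applies the stride.
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, stride=stride, padding=padding)
        self.batchnorm1 = nn.BatchNorm2d(out_channels)
        self.act1 = nn.ReLU(inplace=True)
        # The second conv keeps the shape: out_channels in, stride 1.
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=kernel_size, stride=1, padding=padding)
        self.batchnorm2 = nn.BatchNorm2d(out_channels)
        # Optional module that reshapes the input so it can be added to the output.
        self.downsample = downsample

    def forward(self, x):
        identity = x
        x = self.conv1(x)
        x = self.batchnorm1(x)
        x = self.act1(x)
        x = self.conv2(x)
        x = self.batchnorm2(x)
        if self.downsample is not None:
            # Transform the block's input, not its output, to match shapes.
            identity = self.downsample(identity)
        x = x + identity  # residual (skip) connection
        return self.act1(x)
```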
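
When a residual block changes the channel count or the spatial size, the input can no longer be added to the output directly, which is what the `downsample` module is for. The sketch below is one common way to build it, assuming a 1x1 convolution followed by batch norm as the shortcut. The channel counts and stride are arbitrary values for illustration, and the main path is written out with `nn.Sequential` to keep the example self-contained.

```python
import torch
import torch.nn as nn

in_channels, out_channels, stride = 16, 32, 2

# Main path: two 3x3 convs; only the first one applies the stride.
main_path = nn.Sequential(
    nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1),
    nn.BatchNorm2d(out_channels),
    nn.ReLU(inplace=True),
    nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1),
    nn.BatchNorm2d(out_channels),
)
# Shortcut: a 1x1 conv so the identity matches the main path's shape.
downsample = nn.Sequential(
    nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride),
    nn.BatchNorm2d(out_channels),
)

x = torch.randn(2, in_channels, 32, 32)
residual_out = torch.relu(main_path(x) + downsample(x))
print(residual_out.shape)  # torch.Size([2, 32, 16, 16])
```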
        