# LAYERS

1. Linear layer
2. Convolution layer
3. Pooling layers (max pooling and average pooling)
4. Recurrent layers
5. LSTM
6. GRU
7. Normalization layers
8. Dropout layers
9. Activation layers
10. Embedding layers
11. Attention mechanisms

## Linear Layer

A linear layer performs a fully connected (dense) transformation. The equation for this layer is __output = xW^T + b__, where:

x = input vector
W^T = weight matrix transposed (W is what is learned); it maps input features to output features
b = bias vector

In PyTorch this is `nn.Linear`:

```python
import torch.nn as nn
linear = nn.Linear(in_features = 128, out_features = 64)
# or
linear = nn.Linear(128, 64)
```

The input tensor normally has the shape _[batch_size, in_features]_, where the batch size is the number of samples processed in a single forward pass. The weight matrix has the shape _[out_features, in_features]_ and is initialized randomly by default. The model learns appropriate values for these weights during training. The bias vector has the shape _[out_features]_ and is also updated during training.
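As a quick check of the shapes described above, the following sketch passes a random batch through the layer and compares the result with a manual __xW^T + b__. The batch size of 32 and the seed are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # only so the random initial weights are repeatable

linear = nn.Linear(in_features=128, out_features=64)

# A batch of 32 samples, each with 128 input features (32 is arbitrary).
x = torch.randn(32, 128)
y = linear(x)

print(y.shape)              # torch.Size([32, 64])
print(linear.weight.shape)  # torch.Size([64, 128])
print(linear.bias.shape)    # torch.Size([64])

# The layer computes x @ W^T + b, so the manual version should agree
# up to floating-point rounding.
manual = x @ linear.weight.T + linear.bias
print(torch.allclose(y, manual, atol=1e-5))
```

Note that the output keeps the batch dimension and only the feature dimension changes from 128 to 64.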

## Convolution Layer

A convolution layer is at the heart of convolutional neural networks (CNNs). Its goal is to extract meaningful features from input data by applying convolution operations. These layers perform a critical mathematical operation known as a __convolution__.

This process uses filters, known as __kernels__, that traverse the input image to learn complex visual patterns.

### Convolution operation

This is a linear operation between two functions, commonly applied to signals, images, or any structured data.

![2D Convolution Animation](https://upload.wikimedia.org/wikipedia/commons/1/19/2D_Convolution_Animation.gif)

Given an input matrix and a kernel, overlay the kernel on the input matrix, multiply each pair of overlapping elements, and sum the results. That sum forms the first element of the output matrix. Then slide the kernel by the stride and repeat until the output matrix is complete.

This output matrix is known as a __feature map__.
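The overlay, multiply, sum, and slide steps can be sketched by hand and compared against PyTorch. The input values, the kernel, and the helper name `conv2d_manual` are made up for illustration; the sketch assumes a single channel and no padding.

```python
import torch
import torch.nn.functional as F

# Example values chosen only for illustration.
image = torch.tensor([[1., 2., 3.],
                      [4., 5., 6.],
                      [7., 8., 9.]])
kernel = torch.tensor([[1., 0.],
                       [0., -1.]])

def conv2d_manual(inp, ker, stride=1):
    """Slide ker over inp (no padding) and sum the elementwise products."""
    kh, kw = ker.shape
    out_h = (inp.shape[0] - kh) // stride + 1
    out_w = (inp.shape[1] - kw) // stride + 1
    out = torch.zeros(out_h, out_w)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            patch = inp[r:r + kh, c:c + kw]   # region under the kernel
            out[i, j] = (patch * ker).sum()   # multiply, then sum
    return out

feature_map = conv2d_manual(image, kernel)
print(feature_map)  # every entry is -4 for this input and kernel

# F.conv2d expects [batch, channels, height, width], so add two leading dims.
reference = F.conv2d(image[None, None], kernel[None, None])[0, 0]
print(torch.allclose(feature_map, reference))
```

PyTorch applies the kernel without flipping it (strictly speaking a cross-correlation), which matches the sliding procedure described here.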

### Channels

Each channel represents a color component. In an RGB image each pixel consists of three channels, so the image can be described as _W x H x c_ with _c = 3_. Grayscale images have only one channel.
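In PyTorch the channel dimension comes before height and width, so image batches are laid out as _[batch, channels, height, width]_. A minimal sketch, assuming 32 x 32 images and 4 output channels purely for illustration:

```python
import torch
import torch.nn as nn

# Layout is [batch, channels, height, width]; 32 x 32 is an arbitrary size.
rgb_batch = torch.randn(1, 3, 32, 32)   # three channels: R, G, B
gray_batch = torch.randn(1, 1, 32, 32)  # one channel: grayscale

# in_channels must match the channel dimension of the input.
rgb_conv = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=3)
gray_conv = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3)

print(rgb_conv(rgb_batch).shape)    # torch.Size([1, 4, 30, 30])
print(gray_conv(gray_batch).shape)  # torch.Size([1, 4, 30, 30])
```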

### Parameters

1. Filter/Kernel size
2. Stride
3. Padding
4. Output channels
5. Input channels

```python
convLayer = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=2, stride = 1, padding = 0)
```

#### Filter/kernel size

These are the learnable weight matrices. Each kernel extracts specific features such as edges, textures, or patterns. For a filter of size _k x k_ and an input with _c_in_ channels, each filter has _k x k x c_in_ weights.

#### Input channels

Represents the number of channels in the input data: three for RGB (red, green, blue) and one for grayscale.

#### Kernel size

The spatial size of the filter, typically 3 x 3, 5 x 5, or 7 x 7.

#### Stride

Determines how far the filter moves at each step. A stride of 1 means one pixel at a time. Larger strides reduce the size of the output feature map.

#### Padding

Adds zeros around the input so that its spatial size can be maintained after convolution.
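Kernel size, stride, and padding together determine the output size. For an input of width _n_, the standard formula (with no dilation) is _floor((n + 2 x padding - kernel_size) / stride) + 1_. The sketch below checks it against `nn.Conv2d`; the helper name `conv_output_size` and the 28 x 28 input are assumptions for illustration.

```python
import torch
import torch.nn as nn

def conv_output_size(n, kernel_size, stride=1, padding=0):
    """Output width/height of a convolution without dilation."""
    return (n + 2 * padding - kernel_size) // stride + 1

x = torch.randn(1, 1, 28, 28)  # 28 x 28 is an arbitrary input size

# (kernel_size, stride, padding) combinations to compare.
for k, s, p in [(3, 1, 0), (3, 1, 1), (3, 2, 1), (5, 2, 0)]:
    conv = nn.Conv2d(1, 1, kernel_size=k, stride=s, padding=p)
    out = conv(x)
    print(k, s, p, tuple(out.shape[-2:]), conv_output_size(28, k, s, p))
```

With a 3 x 3 kernel and stride 1, a padding of 1 keeps the output at 28 x 28, while a padding of 0 shrinks it to 26 x 26.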

#### Hyperparameters affecting parameters

1. Increasing output channels increases the number of filters, and hence the number of learnable parameters.
2. Increasing kernel size makes each filter larger.
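To see these effects in numbers, the sketch below counts the learnable parameters of a few `nn.Conv2d` layers. The helper `count_params` and the specific channel and kernel sizes are illustrative choices; the counts include the bias, which `nn.Conv2d` adds by default (one value per output channel).

```python
import torch.nn as nn

def count_params(module):
    """Total number of learnable values (weights and biases) in a module."""
    return sum(p.numel() for p in module.parameters())

base = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)
more_filters = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
bigger_kernel = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=5)

# weights = out_channels * (k * k * in_channels), bias = out_channels
print(count_params(base))           # 8 * 27 + 8  = 224
print(count_params(more_filters))   # 16 * 27 + 16 = 448
print(count_params(bigger_kernel))  # 8 * 75 + 8  = 608
```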

## Pooling layers

Pooling, also known as subsampling or downsampling, is a technique used in CNNs to reduce the spatial dimensions of feature maps while retaining essential information. The two common variants are:

1. Max pooling
2. Average pooling

### Max pooling

Takes the maximum value from each region of the feature map. It captures the most prominent feature in each region.

### Average pooling

Takes the average value from each region of the feature map.

### Parameters

1. __kernel size__ - determines the region to be pooled
2. __stride__ - determines the step size of the pooling window; it defaults to the kernel size, so the pooled regions do not overlap
3. __padding__ - adds implicit padding around the input to control the output dimensions (zeros for average pooling; `nn.MaxPool2d` pads with negative infinity so padded values never win the maximum)

```python
max_pool = nn.MaxPool2d(kernel_size = 2, stride = 2)
avg_pool = nn.AvgPool2d(kernel_size = 2, stride = 2)
```
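A small worked example makes the difference between the two pooling types concrete. The 4 x 4 input below is made up for illustration; each 2 x 2 window is reduced to a single value.

```python
import torch
import torch.nn as nn

# A 4 x 4 feature map holding 1..16, shaped [batch, channels, height, width].
x = torch.arange(1., 17.).reshape(1, 1, 4, 4)

max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)

print(max_pool(x)[0, 0])  # [[6, 8], [14, 16]]
print(avg_pool(x)[0, 0])  # [[3.5, 5.5], [11.5, 13.5]]
```

Both outputs are 2 x 2: the spatial size is halved in each dimension because the stride equals the kernel size.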

Pooling:

1. reduces dimensions, which lowers the computation needed in later layers
2. provides translation invariance - helps the network recognize patterns regardless of small shifts in the input
3. helps reduce overfitting
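The translation invariance point can be illustrated with a toy case: a single active pixel that moves by one position inside the same pooling window gives an identical max-pooled output. The tensors below are hypothetical examples, and the invariance only holds for shifts that stay within a window.

```python
import torch
import torch.nn as nn

max_pool = nn.MaxPool2d(kernel_size=2, stride=2)

a = torch.zeros(1, 1, 4, 4)
a[0, 0, 0, 0] = 1.0  # feature in the top-left window

b = torch.zeros(1, 1, 4, 4)
b[0, 0, 1, 1] = 1.0  # same feature shifted by one pixel, still in that window

c = torch.zeros(1, 1, 4, 4)
c[0, 0, 2, 2] = 1.0  # shifted further, into a different window

print(torch.equal(max_pool(a), max_pool(b)))  # True: small shift is absorbed
print(torch.equal(max_pool(a), max_pool(c)))  # False: larger shift changes the output
```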

## Recurrent layers