### Convolution in CNNs

Convolution is a fundamental operation in Convolutional Neural Networks (CNNs), which are used primarily for image processing tasks. It involves the following concepts:

1. **Filter (Kernel)**: A small matrix of weights, typically of size $3 \times 3$ or $5 \times 5$, which is used to scan over the input image.
2. **Sliding Window**: The filter slides over the input image. At each position, it is multiplied element-wise with the patch of the image it covers, and the products are summed to produce a single output value.
3. **Feature Map**: The output values from the sliding window operation form a new matrix called the feature map or convolved feature.
4. **Stride**: The number of pixels by which the filter moves over the input image. A stride of 1 means the filter moves one pixel at a time.
5. **Padding**: Extra pixels (usually zeros) added around the input image to control the spatial dimensions of the output feature map. Common padding types are 'valid' (no padding) and 'same' (enough padding that, at a stride of 1, the output has the same size as the input).
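
The way filter size, stride, and padding interact can be sketched with the standard output-size formula, `floor((n + 2p - k) / s) + 1`, applied to one spatial axis. The helper name `conv_output_size` and its parameter names are hypothetical, chosen only for this illustration, and the 28-pixel input is an assumed example size.

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Output length along one spatial axis.

    n: input length, k: filter length,
    stride: step of the filter, padding: pixels added on each side.
    """
    return (n + 2 * padding - k) // stride + 1


# A 28-pixel-wide input with a 3-pixel-wide filter:
valid_size = conv_output_size(28, 3)               # 'valid' padding -> 26
same_size = conv_output_size(28, 3, padding=1)     # 'same' padding at stride 1 -> 28
strided_size = conv_output_size(28, 3, stride=2)   # stride 2, no padding -> 13
print(valid_size, same_size, strided_size)
```

For a $3 \times 3$ filter at stride 1, one pixel of padding on each side is enough to keep the output the same size as the input.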

The convolution operation can be described by the following equation:

$$
(I * K)(i, j) = \sum_m \sum_n I(i+m, j+n) \cdot K(m, n)
$$

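For comparison, the continuous one-dimensional convolution of two functions $f$ and $g$ defined for $t \ge 0$ is:

$$
(f * g)(t) = \int_{0}^{t} f(\tau) g(t - \tau) \, d\tau
$$

Note that the continuous form flips $g$, whereas the discrete form above does not flip the kernel. Strictly speaking, the discrete form is cross-correlation, which is what CNN layers typically compute.

In the discrete two-dimensional equation for $(I * K)(i, j)$: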
- $I$ is the input image  
- $K$ is the filter (kernel)  
- $(i, j)$ are the coordinates of the output feature map  
- $m$ and $n$ are the coordinates within the filter  
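
As a minimal sketch of the discrete equation, the loop below computes $(I * K)(i, j)$ with NumPy, assuming a single-channel image, a stride of 1, and no padding. The function name `conv2d_valid` and the sample image and kernel are made up for illustration. Real frameworks use much faster implementations.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Patch of the image currently covered by the filter: I(i+m, j+n)
            patch = image[i:i + kh, j:j + kw]
            # Element-wise multiply with K(m, n), then sum over m and n
            out[i, j] = np.sum(patch * kernel)
    return out


image = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 "image"
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])                   # toy 2x2 diagonal-difference filter
feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (3, 3)
print(feature_map)
```

Each output value here is `image[i, j] - image[i + 1, j + 1]`, so the filter responds to changes along the diagonal.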

Convolution helps in extracting features such as edges, textures, and patterns from the input image, which are crucial for tasks like image classification, object detection, and segmentation.


![Image](https://github.com/user-attachments/assets/f9d937d9-5098-46e5-a7f7-4c1242ce6983)

![Image](https://github.com/user-attachments/assets/543fd3f2-79ca-4ed6-aec4-7cf3b4961c12)

<button><a href="https://deeplizard.com/resource/pavq7noze2">Click here to Show Convolution Graphically</a></button>


### Pooling in CNNs

Pooling is another essential operation in Convolutional Neural Networks (CNNs). It reduces the spatial dimensions of the input feature maps, which lowers the computational cost and can help limit overfitting. There are two main types of pooling:

1. **Max Pooling**: This operation selects the maximum value from each patch of the feature map. It helps in retaining the most prominent features while reducing the spatial dimensions.
2. **Average Pooling**: This operation calculates the average value of each patch of the feature map. It provides a smoother representation of the feature map.

The pooling operation involves the following steps:

1. **Filter (Kernel)**: A small matrix, typically of size $2 \times 2$ or $3 \times 3$, which is used to scan over the input feature map.
2. **Sliding Window**: The filter slides over the input feature map, performing the pooling operation (max or average) on each patch.
3. **Stride**: The number of pixels by which the filter moves over the input feature map. A stride of 2 means the filter moves two pixels at a time.

Pooling can be described by the following equations:

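- **Max Pooling**:
$$
P_{max}(i, j) = \max_{0 \le m < M,\; 0 \le n < N} I(i+m, j+n)
$$

- **Average Pooling**:
$$
P_{avg}(i, j) = \frac{1}{MN} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} I(i+m, j+n)
$$

where:
- $I$ is the input feature map  
- $(i, j)$ are the coordinates of the output pooled feature map  
- $M \times N$ is the size of the pooling window  
- $m$ and $n$ are the coordinates within the pooling window  

The equations above are written for a stride of 1. With a stride of $s$, the input indices become $I(s \cdot i + m, s \cdot j + n)$.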
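
The two pooling operations can be sketched in NumPy as follows, assuming a single-channel feature map and a square window. The function name `pool2d`, its parameters (`size`, `stride`, `mode`), and the sample values are hypothetical and only serve this example.

```python
import numpy as np


def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Apply max or average pooling over size x size patches."""
    reduce_fn = np.max if mode == "max" else np.mean
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Top-left corner of the current patch in the input
            r, c = i * stride, j * stride
            out[i, j] = reduce_fn(feature_map[r:r + size, c:c + size])
    return out


fm = np.array([[1.0, 3.0, 2.0, 4.0],
               [5.0, 6.0, 1.0, 2.0],
               [7.0, 2.0, 9.0, 0.0],
               [1.0, 8.0, 3.0, 4.0]])

max_pooled = pool2d(fm, mode="max")   # [[6, 4], [8, 9]]
avg_pooled = pool2d(fm, mode="avg")   # [[3.75, 2.25], [4.5, 4.0]]
print(max_pooled)
print(avg_pooled)
```

With a $2 \times 2$ window and a stride of 2, the $4 \times 4$ input shrinks to $2 \times 2$: max pooling keeps the largest value of each patch, while average pooling keeps its mean.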

Pooling gives the network a degree of translation invariance and reduces the spatial dimensions, both of which are useful for tasks like image classification and object detection.


![Image](https://github.com/user-attachments/assets/aeac8560-7378-466d-a632-f0c2b0d4b1b0)

<button><a href="https://deeplizard.com/resource/pavq7noze3">Click here to Show MaxPool Graphically</a></button>


### Flattening in CNNs

Flattening is a process used in Convolutional Neural Networks (CNNs) to transform the multi-dimensional output of the convolutional and pooling layers into a one-dimensional vector. This step is essential before feeding the data into fully connected layers (dense layers) for classification or regression tasks.

The flattening operation involves the following steps:

1. **Input Feature Map**: The multi-dimensional array resulting from the convolutional and pooling layers.
2. **Flattening**: Converting the multi-dimensional array into a one-dimensional vector by concatenating the rows or columns.

For example, if the output feature map from the pooling layer is of shape (4, 4, 3), flattening will convert it into a one-dimensional vector of shape (48,).
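
The shape change in this example can be checked with a short NumPy sketch. The array contents are placeholder values, and `reshape(-1)` is just one way to flatten. It reads the elements in row-major order.

```python
import numpy as np

# Placeholder pooled output of shape (height, width, channels) = (4, 4, 3)
pooled = np.arange(48).reshape(4, 4, 3)

# Flatten into a one-dimensional vector; -1 lets NumPy infer the length
flat = pooled.reshape(-1)

print(pooled.shape)  # (4, 4, 3)
print(flat.shape)    # (48,)
```

The total number of values is unchanged ($4 \times 4 \times 3 = 48$). Only the arrangement differs.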

Flattening helps in connecting the convolutional and pooling layers to the fully connected layers, enabling the network to make predictions based on the extracted features.


![Image](https://github.com/user-attachments/assets/e6bc3e03-732d-495b-b492-d7769871e364)

![Image](https://github.com/user-attachments/assets/52dfa6f6-e2aa-4ce5-8676-a0c990172b46)

### Terms in one shot
- Convolution - extracts important features such as edges
- Pooling - reduces the spatial size of the feature maps
- Flatten - converts the feature maps into a 1-D array