**Lecture Notes: The Convolution Operation and Feature Extraction**


### I. Prerequisites: Image Representation

To understand the Convolution Operation, one must first understand how computers store images.

#### A. Grayscale Images (Black and White)
*   Grayscale images consist of **one channel**.
*   They are stored as a **2D array** (or grid) of pixels.
*   Each pixel value typically ranges from 0 (Black) to 255 (White).
*   The computer processes these images as **2D number arrays**.

#### B. RGB Images (Color)
*   Color images (RGB) consist of **three channels**: Red, Green, and Blue.
*   The image is composed of three separate "sheets," where each sheet acts like a single grayscale image.
*   Each channel contains values between 0 and 255.
*   The shape of an RGB image is $(H \times W \times 3)$, representing height, width, and depth (number of channels). For a square image, as assumed throughout these notes, this is $(N \times N \times 3)$.
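
The two storage layouts described above can be sketched with NumPy arrays. The pixel values and the 4 x 4 size here are made up for illustration; the array names `gray` and `rgb` are assumptions, not part of the lecture.

```python
import numpy as np

# A hypothetical 4x4 grayscale image: one channel, stored as a 2D array.
# Left half is black (0), right half is white (255).
gray = np.array([
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
], dtype=np.uint8)

# A hypothetical 4x4 RGB image: three stacked "sheets" (R, G, B).
rgb = np.zeros((4, 4, 3), dtype=np.uint8)
rgb[:, :, 0] = 255  # fill the red sheet; green and blue stay at 0

print(gray.shape)  # (4, 4)
print(rgb.shape)   # (4, 4, 3)
print(rgb[0, 0])   # top-left pixel: full red, no green, no blue
```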

### II. The Convolution Operation

The Convolution Operation is fundamentally different from the matrix multiplication used in Artificial Neural Networks (ANNs). Its purpose is **feature extraction**.

<a href="https://deeplizard.com/resource/pavq7noze2" target="_blank">See demo here</a>

#### A. Key Components
1.  **Image:** The input data (e.g., a $6 \times 6$ grid).
2.  **Filter (or Kernel):** A small matrix of numbers, typically $3 \times 3$.
    *   The filter is designed or learned to detect a specific feature, such as a horizontal or vertical edge.
3.  **Feature Map (or Activation Map):** The output generated after performing the convolution operation.

#### B. Process (Element-wise Multiplication and Summation)
1.  The filter is placed over the top-left corner of the input image.
2.  **Element-wise multiplication** is performed between the filter values and the corresponding pixel values underneath it.
3.  The products of these multiplications are then **summed up**.
4.  The resulting single number is recorded in the corresponding cell of the Feature Map.
5.  The filter is then **slid** one step to the right, and the process is repeated. At the end of a row, the filter moves down one step and starts again from the left, until the entire image is covered.
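
The five steps above can be sketched as a plain NumPy loop. This is a minimal illustration, assuming a step size of one pixel and no padding; the function name `convolve2d` and the test values are hypothetical. Like deep learning libraries, it does not flip the filter before sliding it.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` one pixel at a time and return the feature map."""
    n_rows, n_cols = image.shape
    k_rows, k_cols = kernel.shape
    out = np.zeros((n_rows - k_rows + 1, n_cols - k_cols + 1))
    for i in range(out.shape[0]):          # move down the image
        for j in range(out.shape[1]):      # move right across the image
            patch = image[i:i + k_rows, j:j + k_cols]
            # element-wise multiplication, then summation
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(36).reshape(6, 6)   # a made-up 6x6 "image" holding 0..35
kernel = np.ones((3, 3))              # a filter of all ones simply sums each patch
feature_map = convolve2d(image, kernel)

print(feature_map.shape)   # (4, 4)
print(feature_map[0, 0])   # 63.0  (0+1+2 + 6+7+8 + 12+13+14)
```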

#### C. Feature Detection (Edges)
*   The early layers of a CNN are tasked with detecting **primitive features**, such as edges.
*   An **edge** is a sharp **change in intensity** between neighboring pixels (e.g., moving from a black area, value 0, to a white area, value 255).
*   By using specific filter values (e.g., positive numbers on one side and negative numbers on the other side), the convolution operation can mathematically detect where this intensity change occurs.
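
As a rough illustration of this idea, the sketch below applies a hand-written vertical edge filter (negative values on the left, positive on the right) to a made-up 6 x 6 image that is black on the left and white on the right. The filter values and the helper name `convolve2d` are assumptions chosen for the example.

```python
import numpy as np

def convolve2d(image, kernel):
    """Stride-1, no-padding sliding-window sum of element-wise products."""
    k_rows, k_cols = kernel.shape
    out = np.zeros((image.shape[0] - k_rows + 1, image.shape[1] - k_cols + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + k_rows, j:j + k_cols] * kernel)
    return out

# Left three columns black (0), right three columns white (255).
image = np.zeros((6, 6), dtype=int)
image[:, 3:] = 255

# Responds positively where intensity rises from left to right.
vertical_edge = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
])

feature_map = convolve2d(image, vertical_edge)
print(feature_map[0].tolist())  # [0.0, 765.0, 765.0, 0.0]
```

The flat regions produce 0 because the positive and negative columns cancel; only the windows that straddle the black-to-white boundary produce a large value.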

### III. Filter Learning and Activation

<img src="https://www.analytixlabs.co.in/wp-content/uploads/2024/01/8-1.jpg">

#### A. Filter Values
*   Different values within the filter matrix allow the CNN to detect different features (e.g., horizontal, vertical, slanted, right, or left edges).
*   In deep learning, computer scientists **do not manually create these filters**.
*   The filter values are randomly initialized and then **automatically adjusted/learned** during training using **Backpropagation** (similar to how weights are learned in ANNs). This adaptability is key to the success of CNNs.

<img src="https://miro.medium.com/v2/format:webp/0*N7YTWQ-7P7s6Z9DZ.gif">

#### B. The Role of ReLU
*   When a filter (e.g., a "Left Edge" detector) is applied, the Feature Map may contain both positive activations (indicating a true left edge, sometimes visualized as red) and negative activations (indicating the opposite feature, e.g., a right edge, visualized as blue).
*   To ensure the network only focuses on the desired feature (the positive result), the **ReLU activation function** is applied to the Feature Map.
*   ReLU converts all **negative values to zero**, while positive values remain unchanged, resulting in a feature map containing only the desired feature activations.
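
A minimal sketch of this step, using a made-up feature map that contains both positive and negative activations:

```python
import numpy as np

# Hypothetical feature map: positive where the desired edge was found,
# negative where the opposite edge was found.
feature_map = np.array([
    [765, 0, -765],
    [765, 0, -765],
])

# ReLU: negative values become zero, positive values are unchanged.
activated = np.maximum(feature_map, 0)

print(activated.tolist())  # [[765, 0, 0], [765, 0, 0]]
```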

### IV. Convolution with Multi-Channel Images (RGB)

When performing convolution on a color (RGB) image, the filter automatically adapts to match the depth (number of channels) of the input image.

*   **Filter Shape:** If the input image has 3 channels (RGB), the filter will automatically have a shape of $M \times M \times 3$ (e.g., $3 \times 3 \times 3$).
*   **The Operation:** The $M \times M \times 3$ filter volume is placed over the corresponding volume of the image (e.g., 27 values in a $3 \times 3 \times 3$ region). Element-wise multiplication is performed across all channels, and all 27 results are summed together.
*   **Crucial Outcome:** Convolving a multi-channel image ($N \times N \times C$) with a multi-channel filter ($M \times M \times C$) always yields a **Single Channel Feature Map**.
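
The following sketch illustrates why the result has a single channel: each window position covers all channels at once and collapses to one number. The function name `convolve_multichannel` and the all-ones inputs are assumptions chosen to make the arithmetic easy to check.

```python
import numpy as np

def convolve_multichannel(image, kernel):
    """Convolve an (N, N, C) image with an (M, M, C) filter; stride 1, no padding."""
    if image.shape[2] != kernel.shape[2]:
        raise ValueError("filter depth must match the number of image channels")
    k_rows, k_cols, _ = kernel.shape
    out = np.zeros((image.shape[0] - k_rows + 1, image.shape[1] - k_cols + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # The patch spans every channel: 3 x 3 x 3 = 27 values here.
            patch = image[i:i + k_rows, j:j + k_cols, :]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.ones((6, 6, 3))    # made-up RGB image, every value 1
kernel = np.ones((3, 3, 3))   # made-up filter, every value 1
feature_map = convolve_multichannel(image, kernel)

print(feature_map.shape)   # (4, 4)  -> a single channel
print(feature_map[0, 0])   # 27.0    -> sum of 27 products
```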

### V. Managing Input and Output Dimensions

#### A. Output Shape Calculation (Single Filter)
The size of the resulting Feature Map depends on the size of the input image and the filter.

*   Let the **Input Image shape** be $N \times N$ and the **Filter shape** be $M \times M$, with the filter moving one pixel at a time and no padding added to the image.
*   The **Feature Map shape** will be:
    $$\mathbf{(N - M + 1) \times (N - M + 1)}$$
*   Example: A $6 \times 6$ image convolved with a $3 \times 3$ filter results in a $(6 - 3 + 1) \times (6 - 3 + 1) = 4 \times 4$ Feature Map.
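
The "+ 1" comes from counting positions: the filter's top-left corner can sit at offsets 0 through N - M along each axis, which is N - M + 1 positions. A small helper (the name `feature_map_size` is hypothetical) makes this easy to check, assuming a step of one pixel and no padding:

```python
def feature_map_size(n, m):
    """Side length of the feature map for an n x n image and an m x m filter."""
    if m > n:
        raise ValueError("filter cannot be larger than the image")
    # Valid top-left offsets are 0, 1, ..., n - m.
    return len(range(0, n - m + 1))

print(feature_map_size(6, 3))  # 4
print(feature_map_size(6, 5))  # 2
```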

#### B. Using Multiple Filters
In practice, CNNs almost always use **multiple filters** simultaneously (e.g., one for vertical, one for horizontal, one for slanted edges).

*   If $K$ different filters are applied, **$K$ different feature maps** are generated.
*   These individual feature maps are then stacked together to form a final output volume.
*   The shape of the final output volume is determined by: (Height $\times$ Width $\times$ **Number of Filters Used**).
*   Example: If $4 \times 4$ feature maps are generated using 10 filters, the resulting volume shape is $4 \times 4 \times 10$. This output volume then acts as the input for the next layer of the CNN.
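
The stacking step can be sketched as follows. The image and the 10 filters are filled with random values purely for illustration (in a real CNN the filters would be learned), and the helper name `convolve2d` is an assumption.

```python
import numpy as np

def convolve2d(image, kernel):
    """Stride-1, no-padding sliding-window sum of element-wise products."""
    k_rows, k_cols = kernel.shape
    out = np.zeros((image.shape[0] - k_rows + 1, image.shape[1] - k_cols + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + k_rows, j:j + k_cols] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(6, 6))   # made-up 6x6 grayscale image
filters = rng.standard_normal((10, 3, 3))   # K = 10 made-up 3x3 filters

# One 4x4 feature map per filter, stacked along a new last axis.
feature_maps = [convolve2d(image, f) for f in filters]
volume = np.stack(feature_maps, axis=-1)

print(volume.shape)  # (4, 4, 10)
```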
---

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential()
# 32 is the number of filters and (3, 3) is the height and width of each filter.
# The filter depth (3 here) is set automatically to match the input channels.
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(50, 50, 3)))
```
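
As a quick sanity check on this layer, the arithmetic below works out its output shape and number of trainable values by hand, without running TensorFlow. It assumes the Keras defaults for `Conv2D` (a step of one pixel, no padding, and one bias per filter); the variable names are illustrative.

```python
# Values taken from the Conv2D call: 50x50x3 input, 32 filters of size 3x3.
height, width, channels = 50, 50, 3
num_filters, k = 32, 3

# Each spatial side shrinks to N - M + 1; the depth equals the number of filters.
output_shape = (height - k + 1, width - k + 1, num_filters)

# Each filter spans all input channels and has one bias term.
weights_per_filter = k * k * channels
total_params = (weights_per_filter + 1) * num_filters

print(output_shape)  # (48, 48, 32)
print(total_params)  # 896
```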