```{contents}
```

## Convolution Operation in CNNs

A CNN processes an image by applying **filters (kernels)** to extract useful features like edges, textures, and shapes. The **first step** in this process is the *convolution operation*.

---

### 1. Input Image Setup

Example image:

* Size: **6 × 6 pixels**
* Grayscale → only **1 channel**
* So shape = **6 × 6 × 1**
* Pixel values range from **0 (black)** to **255 (white)**

Example shown:
Black (0) on one side, White (255) on the other → a clear **vertical edge** exists.
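A minimal NumPy sketch of this example image. The text does not say which side is which, so this assumes the left three columns are white (255) and the right three are black (0):

```python
import numpy as np

# 6 x 6 grayscale image stored as (height, width).
# Assumption: left half white (255), right half black (0), i.e. one vertical edge in the middle.
image = np.zeros((6, 6), dtype=np.uint8)
image[:, :3] = 255

# Adding an explicit channel axis gives the 6 x 6 x 1 shape described above.
image_hwc = image[:, :, np.newaxis]

print(image.shape)      # (6, 6)
print(image_hwc.shape)  # (6, 6, 1)
```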

---

### 2. Step 1 — Normalization

Before convolution, pixel values are **scaled to $[0, 1]$**:

$$
\text{Normalized Pixel} = \frac{\text{Original Pixel}}{255}
$$

So:

* 0 → 0
* 255 → 1

Now the image is **6×6 with 0s and 1s**.
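The normalization step as a short NumPy sketch, reusing the assumed white-left, black-right image:

```python
import numpy as np

# Hypothetical 6 x 6 example image: left half white (255), right half black (0).
image = np.zeros((6, 6), dtype=np.uint8)
image[:, :3] = 255

# Convert to float and divide by 255 so that 0 -> 0.0 and 255 -> 1.0.
normalized = image.astype(np.float32) / 255.0

print(normalized[0])  # [1. 1. 1. 0. 0. 0.]
```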

---

### 3. Filters (Kernels) = Feature Detectors

A **filter** is a small matrix of weights, e.g. a **3 × 3** filter (example: vertical edge detector).

This filter slides over the image and extracts features.

Filter example used:

```
[ 1   0  -1
  2   0  -2
  1   0  -1 ]
```

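This is a **vertical edge detection filter** (the Sobel x-kernel): the left column is weighted positively and the right column negatively, so it responds where brightness changes from left to right.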
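The same filter as a NumPy array (the name `sobel_x` is just an illustrative label). Its weights sum to zero, which is one way to see why it ignores flat regions: on a patch where every pixel has the same value, the weighted sum cancels out.

```python
import numpy as np

# The 3 x 3 vertical edge filter from the text.
sobel_x = np.array([[1, 0, -1],
                    [2, 0, -2],
                    [1, 0, -1]], dtype=np.float32)

# A uniform patch (no edge): every pixel has the same value.
flat_patch = np.ones((3, 3), dtype=np.float32)

print(sobel_x.sum())                        # 0.0
print(float(np.sum(sobel_x * flat_patch)))  # 0.0 -> no response on a flat region
```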

---

### 4. Convolution Operation

Procedure:

1. Place the **3×3 filter** on a **3×3 patch** of the image.
2. Multiply each filter value by the pixel value it overlaps.
3. Add all results → **one output value**
4. Slide the filter and repeat.

The number of pixels the filter moves at each step is called the **stride**.

* Here: **stride = 1**
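A minimal sketch of the four steps above in plain NumPy, with the stride as a parameter. It assumes no padding, and the function name `conv2d_valid` is hypothetical. Like deep-learning libraries, it does not flip the kernel, so strictly speaking it computes cross-correlation, which is what CNN convolution layers do.

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    """Slide `kernel` over `image` without padding and return the feature map."""
    n_h, n_w = image.shape
    f_h, f_w = kernel.shape
    out_h = (n_h - f_h) // stride + 1
    out_w = (n_w - f_w) // stride + 1
    out = np.zeros((out_h, out_w), dtype=np.float64)
    for i in range(out_h):
        for j in range(out_w):
            # Step 1: take the patch currently under the filter.
            r, c = i * stride, j * stride
            patch = image[r:r + f_h, c:c + f_w]
            # Steps 2-3: multiply element-wise, then add everything into one value.
            out[i, j] = np.sum(patch * kernel)
    # Step 4 is the loop itself: move `stride` pixels and repeat.
    return out

# Assumed normalized example: left half 1 (white), right half 0 (black).
image = np.zeros((6, 6))
image[:, :3] = 1.0
sobel_x = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=np.float64)

feature_map = conv2d_valid(image, sobel_x, stride=1)
print(feature_map.shape)  # (4, 4)
```

The explicit double loop mirrors the procedure one step at a time; real frameworks compute the same values with much faster vectorized kernels.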

Example pixel calculation:
Multiply each of the 9 filter values by the image value beneath it, then sum the 9 products to get one output value.
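One output value worked out explicitly. This assumes the normalized image has white (1) on the left and black (0) on the right, and the filter sits on rows 0 to 2, columns 1 to 3, a patch that straddles the edge:

```python
import numpy as np

# Patch at rows 0-2, columns 1-3 of the assumed image: [1, 1, 0] in every row.
patch = np.array([[1, 1, 0],
                  [1, 1, 0],
                  [1, 1, 0]], dtype=np.float64)
sobel_x = np.array([[1, 0, -1],
                    [2, 0, -2],
                    [1, 0, -1]], dtype=np.float64)

products = patch * sobel_x   # element-wise multiplication
value = products.sum()       # sum of the 9 products = one output value
print(value)  # 4.0
```

Only the left column contributes: $1 \cdot 1 + 2 \cdot 1 + 1 \cdot 1 = 4$. The middle column is multiplied by 0, and the right column's pixels are 0.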

This turns the **6×6** input into a **4×4** output.

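Why? For an $N \times N$ input and an $F \times F$ filter with stride 1 and no padding, the filter fits in $N - F + 1$ positions along each dimension: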
$$
\text{Output size} = (N - F) + 1 = (6 - 3) + 1 = 4
$$

So final output = **4 × 4**
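The formula above is the stride-1, no-padding case. A small helper using the standard general form $\lfloor (N + 2P - F)/S \rfloor + 1$, with padding $P$ and stride $S$ (the name `conv_output_size` is hypothetical):

```python
def conv_output_size(n: int, f: int, stride: int = 1, padding: int = 0) -> int:
    """Side length of a square convolution output: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * padding - f) // stride + 1

print(conv_output_size(6, 3))             # 4, the example in the text
print(conv_output_size(6, 3, stride=2))   # 2
print(conv_output_size(6, 3, padding=1))  # 6, same size as the input
```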

---

### 5. Output Meaning

After convolution with this **vertical edge filter**, the output highlights **edges**:

* Higher values appear white (mapped to 255 when the output is rescaled for display)
* Lower values appear black (mapped to 0 when the output is rescaled for display)

Result:
A **vertical line** appears in the output, showing where the black–white transition exists in the input.
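A sketch of that display step. It assumes min-max scaling to the range 0 to 255 (the text does not say which scaling is used) and the white-left, black-right example image. It uses NumPy's `sliding_window_view` (NumPy 1.20 or later) to collect every 3 × 3 patch at stride 1.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

image = np.zeros((6, 6))
image[:, :3] = 1.0  # assumed: white on the left, black on the right
sobel_x = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=np.float64)

# All 3 x 3 patches at stride 1 -> shape (4, 4, 3, 3); multiply each by the filter and sum.
windows = sliding_window_view(image, (3, 3))
feature_map = (windows * sobel_x).sum(axis=(-2, -1))

# Min-max rescale to 0..255: the largest response becomes white, the smallest black.
lo, hi = feature_map.min(), feature_map.max()
display = np.round(255 * (feature_map - lo) / (hi - lo)).astype(np.uint8)
print(display)
```

Every row of `display` comes out as `[0, 255, 255, 0]`: the two middle columns are white, which is the vertical line described above. If the image were mirrored (black on the left), the edge response would be -4 instead of 4 and min-max scaling would draw the edge in black; taking the absolute value of the feature map first makes the edge white in both cases.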

A CNN uses this same operation, but **learns the filter values automatically** from data instead of relying on hand-designed filters like this one.

---

### 6. Why Stride and Padding Matter

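Problem:
6×6 → 4×4 output, so the feature map **shrinks**, and border pixels are covered by fewer filter positions than central ones (a corner pixel falls under the filter only once, a central pixel up to 9 times), so **information near the edges is underused**.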

Solution:
Add **padding** (an extra border of pixels, usually zeros, around the image) so the output keeps the input's size.

Padding is discussed next, but its purpose is:

* Prevent shrinking of output
* Retain edge pixel information
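A short preview of the effect, assuming zero padding of one pixel on every side (other padding modes exist) and the same white-left, black-right example image:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

image = np.zeros((6, 6))
image[:, :3] = 1.0  # assumed example: white on the left, black on the right
sobel_x = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=np.float64)

# Zero padding of 1 pixel on every side: 6 x 6 -> 8 x 8.
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)

# Stride-1 convolution on the padded image: (8 - 3) + 1 = 6, so the output is 6 x 6 again.
windows = sliding_window_view(padded, (3, 3))
feature_map = (windows * sobel_x).sum(axis=(-2, -1))
print(padded.shape, feature_map.shape)  # (8, 8) (6, 6)
```

One side effect worth knowing: the black zero border next to the white left columns creates an artificial edge, so the first column of this feature map is negative.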

---

### 7. Filters = Feature Extraction Units

Each filter detects **one type of pattern**, like:

* Vertical edges
* Horizontal edges
* Corners
* Curves
* Textures

Multiple filters → multiple feature maps.
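A sketch of several filters applied to one image, producing one feature map per filter. The horizontal edge filter here is assumed to be the transpose of the vertical one, and the image is the assumed white-left, black-right example:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

image = np.zeros((6, 6))
image[:, :3] = 1.0  # assumed example image with one vertical edge

vertical = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=np.float64)
horizontal = vertical.T                       # transposed: responds to horizontal edges
filters = np.stack([vertical, horizontal])    # shape (2, 3, 3): one filter per feature map

windows = sliding_window_view(image, (3, 3))                 # (4, 4, 3, 3)
feature_maps = np.einsum("ijkl,fkl->fij", windows, filters)  # (2, 4, 4)

print(feature_maps.shape)             # (2, 4, 4)
print(np.abs(feature_maps[1]).max())  # 0.0: this image has no horizontal edges
```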

In real CNNs:

* Filter values are **not manually set**
* They are **randomly initialized**
* They are **learned during training** through forward propagation and backpropagation
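A toy sketch of that idea under heavy simplifying assumptions: one 3 × 3 filter with no bias and no activation, randomly initialized, and trained with plain gradient descent on mean squared error so that its outputs match the vertical edge filter's outputs on random images. For a single linear layer like this, backpropagation reduces to the one gradient computed in the loop. All names and hyperparameters are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
target_filter = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=np.float64)

# Training data: 8 random 6 x 6 images, cut into all 3 x 3 patches and flattened.
images = rng.random((8, 6, 6))
patches = sliding_window_view(images, (3, 3), axis=(1, 2)).reshape(-1, 9)  # (128, 9)
targets = patches @ target_filter.ravel()  # outputs the hand-designed filter would give

# Random initialization, as in a real CNN.
w = rng.normal(scale=0.1, size=9)
lr = 0.1
for step in range(5000):
    preds = patches @ w                          # forward pass: convolution as a matrix product
    error = preds - targets
    grad = 2 * patches.T @ error / len(targets)  # gradient of mean squared error w.r.t. w
    w -= lr * grad                               # gradient-descent update

learned = w.reshape(3, 3)
print(np.round(learned, 3))
```

Because the targets here were generated by the vertical edge filter itself, the learned weights should end up very close to it. In a real CNN the training signal comes from the task's labels, so learned filters need not match any hand-designed kernel.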

---

**Final Takeaway**

**Convolution = (Filter ⊙ Image Patch, then Sum) + Sliding**

It helps CNNs:

* Detect features (edges, textures, shapes)
* Preserve spatial information
* Reduce manual feature engineering

**First operation in CNN = Convolution**
Next comes: **Padding**, then **Pooling**, then **Flattening → Fully Connected Layers**
