# Lab 7, Module 0: What Is a Convolution?

**Estimated time:** 10 minutes

---

## **Opening: How Computers See Images**

When you look at a photo, your brain instantly detects edges, corners, textures, and objects. You recognize a dog by its ears, fur texture, and snout, not by memorizing every possible pixel arrangement.

**How does a computer do the same?**

The answer is **convolution**: a simple but powerful operation that slides small patterns across an image, looking for features like edges, textures, and shapes.

In this module, you'll learn the fundamental operation that powers modern computer vision systems, from face recognition on your phone to medical image analysis in hospitals.

---

## 📘 **What Is Convolution? (The Big Idea)**

**Convolution is:**
> A sliding window operation that detects patterns in an image by multiplying and adding pixel values.

**In plain language:**
1. You have a small pattern you're looking for (called a **filter** or **kernel**)
2. You slide this pattern across the entire image
3. At each position, you multiply the filter values with the pixel values
4. You add up all those products to get one output number
5. That number tells you "how much the pattern matched" at that location

### **Analogy: Pattern Matching with Stamps**

Imagine you have:
- A large piece of paper with random marks (the **image**)
- A small stamp with a vertical line pattern (the **filter**)

You press the stamp at different locations:
- Where there's a vertical line, the stamp matches well → **high response**
- Where there's a horizontal line, the stamp doesn't match → **low response**
- Where there's nothing, the stamp barely matches → **zero response**

**That's convolution!** The "match score" at each location creates a new image showing where patterns were found.

---

## 🔢 **The Math (Simple Version)**

Don't worry, we'll keep this intuitive!

### **Basic Formula:**
```
Convolution = Sliding Window + Multiply-and-Add
```

### **Example with Tiny Arrays:**

**Image patch (3×3):**
```
[1  2  1]
[0  1  0]
[1  0  1]
```

**Vertical edge filter (3×3):**
```
[-1  0  1]
[-1  0  1]
[-1  0  1]
```

**Convolution operation:**
```
Output = (1×-1) + (2×0) + (1×1) +
         (0×-1) + (1×0) + (0×1) +
         (1×-1) + (0×0) + (1×1)
       
       = -1 + 0 + 1 + 0 + 0 + 0 + -1 + 0 + 1
       = 0
```

**Interpretation:** A result of 0 means this patch doesn't have a strong vertical edge.
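
The arithmetic above can be checked with a few lines of NumPy. This is a minimal sketch; the variable names `patch`, `kernel`, `products`, and `response` are illustrative and not part of the lab code.

```python
import numpy as np

# The 3x3 image patch and vertical edge filter from the example above
patch = np.array([[1, 2, 1],
                  [0, 1, 0],
                  [1, 0, 1]])
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]])

# Multiply matching entries, then add all nine products
products = patch * kernel
response = int(products.sum())

print(products)
print("Response:", response)
```

The left and right columns of `products` cancel each other out, which is why the response is 0 for this patch.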

### **Why This Filter Detects Vertical Edges:**

Look at the filter pattern:
```
[-1  0  1]
[-1  0  1]
[-1  0  1]
```

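- **Left column:** negative weights (bright pixels here lower the response)
- **Middle column:** zeros (these pixels are ignored)
- **Right column:** positive weights (bright pixels here raise the response)

**When there's a vertical edge** (dark on left, bright on right), this filter produces a **strong positive response**! The reverse edge (bright on left, dark on right) produces a strong negative response.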
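
To see this behavior on concrete numbers, here is a small sketch that applies the filter to three made-up patches. The patches and the helper `response` are assumptions for illustration only, with pixel values of 0 (dark) and 1 (bright).

```python
import numpy as np

kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]])

# Hypothetical 3x3 patches with pixel values 0 (dark) or 1 (bright)
dark_to_bright = np.array([[0, 0, 1]] * 3)   # edge: dark left, bright right
bright_to_dark = np.array([[1, 0, 0]] * 3)   # same edge, opposite direction
flat_bright = np.ones((3, 3), dtype=int)     # no edge at all

def response(patch):
    """Multiply-and-add for a single filter position."""
    return int(np.sum(patch * kernel))

print(response(dark_to_bright))   # positive
print(response(bright_to_dark))   # negative
print(response(flat_bright))      # zero
```

The sign of the response encodes the direction of the edge, and a uniform patch gives 0 because the positive and negative weights cancel.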

---

## 🔧 **Common Filters and What They Detect**

Different filters detect different patterns:

### **1. Vertical Edge Detector**
```
[-1  0  1]
[-1  0  1]
[-1  0  1]
```
**Detects:** Vertical lines and edges

### **2. Horizontal Edge Detector**
```
[-1 -1 -1]
[ 0  0  0]
[ 1  1  1]
```
**Detects:** Horizontal lines and edges

### **3. Blur Filter (Averaging)**
```
[1  1  1]
[1  1  1]  ÷ 9
[1  1  1]
```
**Effect:** Averages nearby pixels, smooths noise

### **4. Sharpen Filter**
```
[ 0 -1  0]
[-1  5 -1]
[ 0 -1  0]
```
**Effect:** Emphasizes differences, enhances edges

### **5. Identity Filter (No Change)**
```
[0  0  0]
[0  1  0]
[0  0  0]
```
**Effect:** Leaves image unchanged (useful for testing!)
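
The effect of the blur, sharpen, and identity filters can be compared at a single position. The sketch below uses a made-up patch (one bright outlier pixel surrounded by dimmer neighbors); the patch values and the `results` dictionary are assumptions for illustration.

```python
import numpy as np

# Hypothetical patch: a bright outlier pixel (10) surrounded by 1s
patch = np.array([[1, 1, 1],
                  [1, 10, 1],
                  [1, 1, 1]], dtype=float)

blur = np.ones((3, 3)) / 9
sharpen = np.array([[0, -1, 0],
                    [-1, 5, -1],
                    [0, -1, 0]], dtype=float)
identity = np.array([[0, 0, 0],
                     [0, 1, 0],
                     [0, 0, 0]], dtype=float)

# One multiply-and-add per filter, all centered on the same pixel
results = {
    name: float(np.sum(patch * kernel))
    for name, kernel in [("blur", blur), ("sharpen", sharpen), ("identity", identity)]
}
print(results)
```

Here the identity filter returns the center pixel unchanged (10), the blur filter pulls it toward its neighbors (about 2), and the sharpen filter exaggerates the difference from its neighbors (46).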

---

## 💡 **Why Convolution Works for Images**

Convolution is perfect for image processing because of three key properties:

### **1. Local Patterns Matter**
Nearby pixels are related (they're part of the same edge, texture, or object). Distant pixels are usually unrelated.

**Example:** If pixel (10, 15) and pixel (11, 15) are both bright, they're probably part of the same edge. But pixels 100 pixels apart are likely unrelated.

### **2. Translation Invariance**
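The same filter detects the same pattern **anywhere** in the image. (Strictly speaking, convolution is translation *equivariant*: shift the pattern in the input and the response shifts by the same amount in the output.)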

**Example:** A vertical edge detector finds vertical edges whether they're:
- Top-left corner
- Center of the image
- Bottom-right corner

No need to train separate detectors for each location!
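
A quick way to convince yourself: apply one filter to two images that contain the same edge at different columns. This is a sketch under assumptions; the helper `slide_filter` and the two 3×7 test images are hypothetical and not part of the lab code.

```python
import numpy as np

def slide_filter(image, kernel):
    """Slide kernel over image (no padding, stride 1), multiply-and-add at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

kernel = np.array([[-1, 0, 1]] * 3)

# Two hypothetical 3x7 images: the same dark-to-bright edge at different columns
edge_left = np.zeros((3, 7))
edge_left[:, 2:] = 1
edge_right = np.zeros((3, 7))
edge_right[:, 4:] = 1

out_left = slide_filter(edge_left, kernel)
out_right = slide_filter(edge_right, kernel)
print(out_left)
print(out_right)
```

The edge moved two columns to the right, and the nonzero responses moved two columns to the right with it. The filter itself never changed.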

### **3. Parameter Sharing**
A 3×3 filter has only **9 parameters**, but it's applied to the entire image.

**Comparison** (single-channel 256×256 image, weights only):
- **Fully-connected layer** with one output per input pixel: 65,536 × 65,536 ≈ 4.3 billion parameters
- **Convolutional layer** with 32 filters of size 3×3: 32 × 9 = 288 parameters

**Result:** CNNs are much more efficient than fully-connected networks for images!
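
Parameter counts depend on what you assume about the layers. The sketch below works the numbers out under one set of assumptions that the lab does not state explicitly: a single grayscale channel, a fully-connected layer with one output per input pixel, 3×3 filters, and no bias terms.

```python
# Assumptions (not stated in the lab): one grayscale channel, the
# fully-connected layer produces one output per input pixel, no biases.
height, width = 256, 256
n_pixels = height * width                 # 65,536 input values

fc_weights = n_pixels * n_pixels          # every input connects to every output
conv_weights = 32 * 3 * 3                 # 32 filters, 9 weights each

print(f"Fully-connected: {fc_weights:,}")
print(f"Convolutional:   {conv_weights:,}")
print(f"Ratio:           {fc_weights // conv_weights:,}x")
```

Changing the assumptions (for example, three color channels or a smaller fully-connected output) changes the exact numbers, but the gap stays enormous.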

---

## 🖼️ **Hands-On: Tiny Convolution Example**

Let's see convolution in action with a simple 5×5 image.

---

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Create a simple 5x5 image with a vertical edge
image = np.array([
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1]
], dtype=float)

# Define three different filters
filters = {
    'Vertical Edge': np.array([[-1, 0, 1],
                               [-1, 0, 1],
                               [-1, 0, 1]]),
    
    'Horizontal Edge': np.array([[-1, -1, -1],
                                 [ 0,  0,  0],
                                 [ 1,  1,  1]]),
    
    'Identity': np.array([[0, 0, 0],
                         [0, 1, 0],
                         [0, 0, 0]])
}

# Function to perform convolution
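def convolve2d(image, kernel):
    """Simple 2D convolution (no padding, stride=1).

    The kernel is not flipped, so strictly this is cross-correlation,
    which is the operation deep learning libraries call convolution.
    """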
    i_height, i_width = image.shape
    k_height, k_width = kernel.shape
    
    output_height = i_height - k_height + 1
    output_width = i_width - k_width + 1
    output = np.zeros((output_height, output_width))
    
    for i in range(output_height):
        for j in range(output_width):
            patch = image[i:i+k_height, j:j+k_width]
            output[i, j] = np.sum(patch * kernel)
    
    return output

# Visualize image and filters
fig, axes = plt.subplots(2, 4, figsize=(16, 8))

# Plot original image
axes[0, 0].imshow(image, cmap='gray', vmin=0, vmax=1)
axes[0, 0].set_title('Original Image\n(5×5)', fontsize=12, fontweight='bold')
axes[0, 0].grid(True, which='both', color='red', linewidth=0.5)
axes[0, 0].set_xticks(np.arange(-0.5, 5, 1))
axes[0, 0].set_yticks(np.arange(-0.5, 5, 1))
axes[0, 0].set_xticklabels([])
axes[0, 0].set_yticklabels([])

# Add text to show it has a vertical edge
axes[0, 0].text(2.5, -1, 'Notice: Vertical edge at column 2', 
                ha='center', fontsize=10, style='italic')

# Plot each filter and its output
for idx, (name, kernel) in enumerate(filters.items()):
    # Plot filter
    axes[0, idx+1].imshow(kernel, cmap='RdBu', vmin=-1, vmax=1)
    axes[0, idx+1].set_title(f'{name} Filter\n(3×3)', fontsize=12, fontweight='bold')
    axes[0, idx+1].grid(True, which='both', color='black', linewidth=0.5)
    axes[0, idx+1].set_xticks(np.arange(-0.5, 3, 1))
    axes[0, idx+1].set_yticks(np.arange(-0.5, 3, 1))
    axes[0, idx+1].set_xticklabels([])
    axes[0, idx+1].set_yticklabels([])
    
    # Add filter values as text
    for i in range(3):
        for j in range(3):
            axes[0, idx+1].text(j, i, f'{kernel[i, j]:.0f}', 
                               ha='center', va='center', fontsize=11, fontweight='bold')
    
    # Compute and plot output
    output = convolve2d(image, kernel)
    im = axes[1, idx+1].imshow(output, cmap='RdBu', vmin=-6, vmax=6)
    axes[1, idx+1].set_title(f'Output After\n{name}', fontsize=12, fontweight='bold')
    axes[1, idx+1].grid(True, which='both', color='black', linewidth=0.5)
    axes[1, idx+1].set_xticks(np.arange(-0.5, 3, 1))
    axes[1, idx+1].set_yticks(np.arange(-0.5, 3, 1))
    axes[1, idx+1].set_xticklabels([])
    axes[1, idx+1].set_yticklabels([])
    
    # Add output values as text
    for i in range(output.shape[0]):
        for j in range(output.shape[1]):
            axes[1, idx+1].text(j, i, f'{output[i, j]:.1f}', 
                               ha='center', va='center', fontsize=10)
    
    # Add colorbar
    plt.colorbar(im, ax=axes[1, idx+1], fraction=0.046, pad=0.04)

# Hide the bottom-left subplot
axes[1, 0].axis('off')

plt.tight_layout()
plt.suptitle('Convolution in Action: How Filters Detect Patterns', 
             fontsize=14, fontweight='bold', y=1.02)
plt.show()

print("\n" + "="*70)
print("KEY OBSERVATIONS:")
print("="*70)
print("\n1. VERTICAL EDGE FILTER:")
print("   - Output shows STRONG positive response (+3) where the window covers the edge")
print("   - Why? The image has a vertical edge (dark→bright transition)")
print("   - The filter is designed to detect exactly this pattern!\n")

print("2. HORIZONTAL EDGE FILTER:")
print("   - Output shows ZERO response (all values near 0)")
print("   - Why? The image has no horizontal edges")
print("   - The filter is looking for a pattern that doesn't exist here\n")

print("3. IDENTITY FILTER:")
print("   - Output matches the center 3×3 region of the original image")
print("   - Why? The identity filter keeps only the center pixel (multiplies it by 1)")
print("   - Notice the output is smaller (3×3 instead of 5×5)")
print("="*70)

---

## 🎨 **What Just Happened?**

In the visualization above, you saw:

1. **Original Image (5×5):** A simple pattern with a vertical edge
   - Dark pixels (0) on the left
   - Bright pixels (1) on the right

2. **Three Filters (3×3):** Different patterns we're looking for
   - Vertical edge detector
   - Horizontal edge detector
   - Identity filter (no change)

3. **Three Outputs (3×3):** Results of convolution
   - **Vertical edge output:** +3 wherever the window straddles the edge, 0 where it sits entirely on bright pixels (filter found the pattern!)
   - **Horizontal edge output:** All zeros (pattern not present)
   - **Identity output:** The center 3×3 region of the original (filter preserves image)

### **Why Did the Output Get Smaller?**

Notice the output is 3×3, but the input was 5×5. Why?

**Answer:** The 3×3 filter can only fit in certain positions:
- Its top-left corner can start at (0,0) and move as far as (2,2)
- We get (5-3+1) × (5-3+1) = 3×3 output positions

**In real CNNs:** We often use "padding" (adding zeros around the image) to keep the output the same size as the input. We're keeping it simple here!
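
As a rough sketch of how padding works, the code below adds a 1-pixel border of zeros with `np.pad` and repeats the same sliding-window loop. It rebuilds the 5×5 vertical-edge image from the hands-on cell; the variable names are illustrative.

```python
import numpy as np

# Same 5x5 vertical-edge image as in the hands-on cell
image = np.zeros((5, 5))
image[:, 2:] = 1

kernel = np.array([[-1, 0, 1]] * 3)

# Add a 1-pixel border of zeros: 5x5 -> 7x7
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)

kh, kw = kernel.shape
out_h = padded.shape[0] - kh + 1   # 7 - 3 + 1 = 5
out_w = padded.shape[1] - kw + 1   # 7 - 3 + 1 = 5
output = np.zeros((out_h, out_w))
for i in range(out_h):
    for j in range(out_w):
        output[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)

print(padded.shape, "->", output.shape)
print(output[2])   # middle row of the padded result
```

The output is now 5×5, matching the input. One side effect to notice: the zero border looks like a dark region to the filter, so the last column shows a negative response where the bright pixels meet the padding.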

---

## 🔗 **Connection to Lab 4 (Hidden Layers)**

Remember Lab 4, Module 0, where you saw how neural networks transform 2D data into higher dimensions?

**Convolution does something similar:**

| Lab 4 (Fully-Connected) | Lab 7 (Convolution) |
|-------------------------|---------------------|
| Takes all input values | Takes local patches |
| Creates new representation | Creates new representation |
| One transformation per neuron | One transformation per filter |
| Learns patterns globally | Learns patterns locally |

**Both are creating new representations!**
- Lab 4: $(x, y) \rightarrow (h_1, h_2, h_3, \ldots)$ (hidden layer activations)
- Lab 7: Image → (edge map, texture map, shape map, ...) (feature maps)

**Key difference:**
- Fully-connected layers look at the entire input at once
- Convolutional layers look at small local patches, one at a time

This makes convolution **much more efficient** for images!

---

## 🔗 **Connection to Lab 6 (Saliency Maps)**

In Lab 6, you learned about **saliency maps**: visualizations showing which parts of an input were important for a prediction.

**The connection:**
- **Lab 6 (Saliency):** Shows **WHERE** the model looks
- **Lab 7 (Convolution):** Shows **WHAT** the model extracts

**Example: Dog Image Classification**
- **Saliency map:** "The model focused on the dog's face and ears"
- **Convolution feature maps:** "The model extracted edge patterns, fur textures, and triangular shapes (ears)"

**Together, they explain how CNNs work:**
1. Convolution layers extract features (edges, textures, shapes)
2. The model uses those features to make predictions
3. Saliency maps show which extracted features were most important

---

## 📝 **Questions (Q1-Q4)**

Before moving on, let's check your understanding. Record your answers in the **Answer Sheet**.

---

### **Q1. In your own words, what does a convolution operation do?**

*Hint: Think about the sliding window + multiply-and-add operation*

**Record your answer in the Answer Sheet.**

---

### **Q2. Look at the vertical edge filter `[[-1,0,1],[-1,0,1],[-1,0,1]]`. Why would this highlight vertical edges?**

*Hint: What happens when you multiply this filter with a patch that has dark pixels on the left and bright pixels on the right?*

**Record your answer in the Answer Sheet.**

---

### **Q3. What happens when you convolve an image with the identity filter `[[0,0,0],[0,1,0],[0,0,0]]`?**

*Hint: Look at the visualization above. Why does the identity filter preserve the image?*

**Record your answer in the Answer Sheet.**

---

### **Q4. How is convolution different from the dimension-lifting you saw in Lab 4 Module 0? What's similar?**

*Hint: Both create new representations. But one looks at the whole input, while the other looks at local patches.*

**Record your answer in the Answer Sheet.**

---

## ✅ Module 0 Complete!

You now understand:
- **What convolution is** (sliding window + multiply-and-add)
- **Why it works** (local patterns, translation invariance, parameter sharing)
- **What filters detect** (edges, textures, patterns)
- **How it connects to previous labs** (hidden layers, saliency maps)

**Key insight:** Convolution is the fundamental operation that lets computers "see" patterns in images!

**Ready for real images?**

Move on to **Module 1: Applying Filters to Real Images**, where you'll use classic computer vision filters (Sobel, Laplacian, blur, sharpen) on actual photographs!

---