# Week 8: Convolutional Neural Networks - Homework

**ML2: Advanced Machine Learning**

**Estimated Time**: 1 hour

---

This homework combines programming exercises and knowledge-based questions to reinforce this week's concepts.

## Setup

Run this cell to import necessary libraries:

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn

# Set random seed for reproducibility
np.random.seed(42)
torch.manual_seed(42)

print('✓ Libraries imported successfully')

---
## Part 1: Programming Exercises (60%)

Complete the following programming tasks. Read each description carefully and implement the requested functionality.

### Exercise 1: Hierarchical Features in CNNs (Experiment)

**Time**: 10 min

Observe how successive CNN layers progress from detecting simple edges to detecting complex patterns.

In [None]:
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

# Simple CNN to understand feature hierarchies
class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 8, kernel_size=3, padding=1)  # Layer 1: edges
        self.conv2 = nn.Conv2d(8, 16, kernel_size=3, padding=1)  # Layer 2: patterns
        self.conv3 = nn.Conv2d(16, 32, kernel_size=3, padding=1)  # Layer 3: objects
        self.pool = nn.MaxPool2d(2, 2)
        self.fc = nn.Linear(32 * 3 * 3, 10)
    
    def forward(self, x):
        x1 = torch.relu(self.conv1(x))  # First layer activations
        x = self.pool(x1)
        x2 = torch.relu(self.conv2(x))  # Second layer activations  
        x = self.pool(x2)
        x3 = torch.relu(self.conv3(x))  # Third layer activations
        x = self.pool(x3)
        x = x.view(-1, 32 * 3 * 3)
        x = self.fc(x)
        return x, (x1, x2, x3)

# TODO: Train on MNIST and visualize what each layer learns
# Layer 1 = edges/simple patterns
# Layer 2 = combinations of edges  
# Layer 3 = digit-specific features
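The `32 * 3 * 3` input size of `self.fc` follows from how the spatial size shrinks through the network. The sketch below traces that arithmetic, assuming standard 28x28 single-channel MNIST inputs; the helper names `conv_out` and `pool_out` are illustrative and not part of the exercise code.

```python
# Trace the spatial size of one image through SimpleCNN.
# Assumption: 28x28 single-channel inputs (standard MNIST).

def conv_out(size, kernel=3, padding=1, stride=1):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Output size of max pooling along one dimension (no padding, rounds down)."""
    return (size - kernel) // stride + 1

size = 28
sizes = []
for _ in range(3):          # three conv + pool stages
    size = conv_out(size)   # 3x3 conv with padding=1 keeps the size
    size = pool_out(size)   # 2x2 pool with stride 2 halves it, rounding down
    sizes.append(size)

flat_features = 32 * size * size  # conv3 produces 32 channels
print("Spatial size after each pool:", sizes)
print("Features entering the fc layer:", flat_features)
```

If you change the input resolution or the number of pooling stages, the `nn.Linear` input size has to change accordingly.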

---
## Part 2: Knowledge Questions (40%)

Answer the following questions to test your conceptual understanding.

### Question 1 (Short Answer)

**Question 1 - Why Convolution for Images?**

A 28x28 image has 784 pixels, so a fully connected layer needs 784 weights per neuron.

A single 3x3 convolution filter has only 9 weights (plus a bias), and the same filter slides across the entire image.

Explain:
1. Why does parameter sharing (reusing the same 3x3 filter) make sense for images?
2. What assumption are we making about images?
3. When might this assumption fail?

**Hint**: Images have LOCAL spatial structure. A horizontal edge detector works everywhere in the image.

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 2 (Short Answer)

**Question 2 - Hierarchical Feature Learning**

- CNN Layer 1 learns: edges, corners, color blobs
- CNN Layer 2 learns: textures, simple shapes
- CNN Layer 3 learns: object parts (eyes, wheels, ears)
- CNN Layer 4 learns: whole objects (faces, cars, dogs)

Explain:
1. Why does this hierarchy emerge automatically?
2. How does each layer build on the previous one?
3. Why can't Layer 1 learn complex objects directly?

**Hint**: Early layers have small receptive fields. Deeper layers see larger regions.

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 3 (Multiple Choice)

**Question 3 - Receptive Field**

A neuron's receptive field = the region of the input image it "sees".

After two stacked 3x3 conv layers (stride 1, no pooling in between), what is the receptive field of a neuron in the second layer?

A) Still 3x3
B) 6x6  
C) 5x5
D) 9x9

**Hint**: Each layer adds context. 3x3 + 3x3 overlaps to create a 5x5 field.
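To check your reasoning, the receptive field of a stack of layers can be computed with a simple recurrence. This is a minimal sketch that assumes square kernels and no dilation; the function name `receptive_field` and the `(kernel_size, stride)` tuple format are illustrative choices, not a library API.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit.

    layers: list of (kernel_size, stride) tuples, ordered from input to output.
    """
    rf = 1      # a unit at the input "sees" exactly one pixel
    jump = 1    # spacing, in input pixels, between neighbouring units
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # each layer widens the field by (k - 1) jumps
        jump *= stride              # strides spread neighbouring units further apart
    return rf

two_convs = receptive_field([(3, 1), (3, 1)])
three_convs = receptive_field([(3, 1), (3, 1), (3, 1)])
conv_pool_conv = receptive_field([(3, 1), (2, 2), (3, 1)])

print("two 3x3 convs:          ", two_convs)
print("three 3x3 convs:        ", three_convs)
print("conv, 2x2 pool, conv:   ", conv_pool_conv)
```

Note how inserting a stride-2 pooling layer makes the receptive field grow faster than stacking convolutions alone.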

**Your Answer**: [Write your answer here - e.g., 'B']

**Explanation**: [Explain why this is correct]

### Question 4 (Short Answer)

**Question 4 - Pooling's Purpose**

Max pooling with a 2x2 window and stride 2 reduces a 28x28 image to 14x14.

Explain:
1. What information is lost?
2. What is gained?
3. Why is this tradeoff beneficial?

**Hint**: Lost: exact position. Gained: translation invariance, reduced computation.
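The tradeoff can be seen on a toy input. The following NumPy sketch implements 2x2 max pooling by reshaping (the helper `max_pool_2x2` is illustrative and assumes even height and width) and compares two inputs whose single activation differs by one pixel.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a 2D array with even height and width."""
    h, w = x.shape
    # Split each axis into (block index, position inside block), then reduce.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

image = np.zeros((28, 28))
image[10, 10] = 1.0            # a single strong activation

shifted = np.zeros((28, 28))
shifted[11, 11] = 1.0          # same activation, moved one pixel diagonally

pooled = max_pool_2x2(image)
pooled_shifted = max_pool_2x2(shifted)

print("Pooled shape:", pooled.shape)
print("Same pooled output:", np.array_equal(pooled, pooled_shifted))
```

Both activations fall inside the same 2x2 window here, so the pooled outputs match. A shift that crosses a window boundary would change the output, so the invariance is only approximate.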

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 5 (Short Answer)

**Question 5 - Translation Invariance**

A dog on the left side of an image and a dog on the right side should both be recognized as "dog".

Explain:
1. How do CNNs achieve translation invariance?
2. What role do convolutions play?
3. What role does pooling play?

**Hint**: Convolution: same filter everywhere. Pooling: discards exact position.
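The two roles can be separated in a small experiment. In this NumPy sketch, a 2x2 array is a hypothetical stand-in for the "dog" pattern, `cross_correlate` is a hand-written helper (conv layers compute cross-correlation), and a global maximum stands in for pooling.

```python
import numpy as np

def cross_correlate(image, kernel):
    """Valid (no padding) 2D cross-correlation with stride 1."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

pattern = np.array([[1.0, -1.0],
                    [1.0, -1.0]])   # hypothetical stand-in for the "dog"
kernel = pattern.copy()             # a filter tuned to that pattern

left = np.zeros((8, 8))
left[3:5, 1:3] = pattern            # pattern on the left
right = np.zeros((8, 8))
right[3:5, 5:7] = pattern           # same pattern on the right

resp_left = cross_correlate(left, kernel)
resp_right = cross_correlate(right, kernel)

peak_left = np.unravel_index(resp_left.argmax(), resp_left.shape)
peak_right = np.unravel_index(resp_right.argmax(), resp_right.shape)

print("Peak locations:", peak_left, peak_right)             # move with the pattern
print("Global max:", resp_left.max(), resp_right.max())     # unaffected by position
```

The response map shifts along with the input, and taking a maximum over positions then discards where the peak occurred.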

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 6 (Multiple Choice)

**Question 6 - Filter/Kernel Count**

A conv layer with 64 filters means:

A) Each filter detects a specific feature (edge, texture, etc.)
B) All 64 filters do the same thing
C) More filters = slower only, no benefit
D) Filters must be 3x3

**Hint**: Each filter learns to detect a different pattern.

**Your Answer**: [Write your answer here - e.g., 'B']

**Explanation**: [Explain why this is correct]

### Question 7 (Short Answer)

**Question 7 - Why Not Fully Connected?**

Compare:
- FC layer on a 224x224x3 image (150,528 inputs) with 1,000 neurons: ~150 million parameters
- Conv layers: ~1-10 million parameters

Explain:
1. Why does the FC layer have so many more parameters?
2. What's the problem with that many parameters?
3. Why is weight sharing in CNNs more sample-efficient?

**Hint**: FC: every pixel connects to every neuron. Conv: 3x3 filter reused everywhere.
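The gap is easy to verify by counting. The sizes below are assumptions chosen for illustration (1,000 units in the FC layer, 64 filters of size 3x3 in the conv layer); only the 224x224x3 input comes from the question.

```python
# Assumed layer sizes for illustration only.
height, width, channels = 224, 224, 3

# Fully connected: every input value connects to every unit.
fc_units = 1000
fc_params = height * width * channels * fc_units + fc_units   # weights + biases

# Convolution: each filter has kernel*kernel*channels weights plus one bias,
# and is reused at every spatial position.
num_filters, kernel = 64, 3
conv_params = (kernel * kernel * channels + 1) * num_filters

print(f"FC layer:   {fc_params:,} parameters")
print(f"Conv layer: {conv_params:,} parameters")
print(f"Ratio:      {fc_params / conv_params:,.0f}x")
```

The conv layer's parameter count does not depend on the image size at all, which is the direct consequence of weight sharing.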

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 8 (Short Answer)

**Question 8 - 1x1 Convolutions**

A 1x1 convolution seems useless because it has no spatial context.

In practice it is very useful. Explain: what does a 1x1 conv accomplish?

**Hint**: It changes the NUMBER of channels (dimensionality reduction/expansion), not spatial dims.
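One way to see this is to write a 1x1 convolution as a matrix applied independently at every pixel. The NumPy sketch below uses hypothetical sizes (64 input channels, 16 output channels, a 7x7 feature map) and omits the bias for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
c_in, c_out, h, w = 64, 16, 7, 7           # hypothetical sizes

features = rng.normal(size=(c_in, h, w))   # input feature map
weights = rng.normal(size=(c_out, c_in))   # a 1x1 kernel is a c_out x c_in matrix

# Apply the same channel-mixing matrix at every spatial position.
out = np.einsum('oc,chw->ohw', weights, features)

# Each output pixel depends only on the input channels at that same pixel.
manual = weights @ features[:, 2, 3]

print(features.shape, "->", out.shape)
print("Matches per-pixel matrix product:", np.allclose(out[:, 2, 3], manual))
```

The height and width are untouched while the channel count drops from 64 to 16, which is why 1x1 convolutions are often used as cheap bottlenecks before more expensive 3x3 layers.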

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 9 (Short Answer)

**Question 9 - Skip Connections (ResNet)**

ResNets add skip connections: output = F(x) + x

This allows training networks with 100+ layers.

Explain:
1. Why do very deep networks without skip connections fail to train?
2. How do skip connections solve this?
3. What does this enable?

**Hint**: Vanishing gradients. Skip connections provide gradient highways.
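A toy calculation can make the "gradient highway" idea concrete. This is a deliberately simplified scalar model, not how real networks behave in detail: it assumes every layer's own derivative F'(x) is the same small constant (the value 0.01 is hypothetical) and applies the chain rule through 50 layers.

```python
# Toy scalar model of backpropagation through a deep stack of layers.
# Assumption: each layer's own derivative F'(x) is the same small constant.
depth = 50
layer_grad = 0.01   # hypothetical value of F'(x) for every layer

# Plain network: y = F(x), so gradients multiply as F'(x) per layer.
plain_grad = layer_grad ** depth

# Residual network: y = F(x) + x, so each layer contributes F'(x) + 1.
residual_grad = (1.0 + layer_grad) ** depth

print(f"Plain network gradient:    {plain_grad:.3e}")
print(f"Residual network gradient: {residual_grad:.3f}")
```

The `+ 1` term from the identity path keeps each factor close to 1, so the product does not collapse toward zero as depth grows.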

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 10 (Short Answer)

**Question 10 - Real-World Application**

Medical imaging uses CNNs to detect tumors in X-rays.

Explain:
1. Why are CNNs better than fully connected networks for this task?
2. What might early CNN layers learn to detect?
3. What might deeper layers detect?

**Hint**: Early: edges, tissue textures. Deep: tumor-specific patterns, anomalies.

**Your Answer**:

[Write your answer here in 2-4 sentences]

---
## Submission

Before submitting:
1. Run all cells to ensure code executes without errors
2. Check that all questions are answered
3. Review your explanations for clarity

**To Submit**:
- File → Download → Download .ipynb
- Submit the notebook file to your course LMS

**Note**: Make sure your name is in the filename (e.g., homework_01_yourname.ipynb)