# Part 1: The Geometry of Convolutions
## Q1.1 Manual Dimension Mapping


We design a CNN with:
- 3 convolution layers (3×3)
- 2 max pooling layers (2×2)
- 1 fully connected layer

Input image size: 64×64×3

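Layer-wise spatial dimensions (H×W×C), using output = floor((input + 2·padding − kernel) / stride) + 1:

Input: 64×64×3

Conv1: 3×3, stride=1, padding=1, filters=32 → 64×64×32
ReLU

Conv2: 3×3, stride=1, padding=1, filters=64 → 64×64×64
ReLU

MaxPool1: 2×2, stride=2 → 32×32×64

Conv3: 3×3, stride=1, padding=1, filters=128 → 32×32×128
ReLU

MaxPool2: 2×2, stride=2 → 16×16×128

Flatten → 16 × 16 × 128 = 32,768

FC: 10 outputs (Tiny-ImageNet-10 classes)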
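The layer specification above can be checked by hand with a few lines of arithmetic. The sketch below applies the standard output-size formula, floor((n + 2p − k) / s) + 1, to each layer in turn. The helper names `out_size` and `trace_shapes` and the `layers` list are illustrative assumptions and are not part of the notebook.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (floor division)."""
    return (n + 2 * padding - kernel) // stride + 1

# (name, kernel, stride, padding, out_channels)
# out_channels=None means the layer keeps the channel count (pooling).
layers = [
    ("Conv1", 3, 1, 1, 32),
    ("Conv2", 3, 1, 1, 64),
    ("MaxPool1", 2, 2, 0, None),
    ("Conv3", 3, 1, 1, 128),
    ("MaxPool2", 2, 2, 0, None),
]

def trace_shapes(size=64, channels=3):
    """Return (name, side length, channels) after every layer."""
    shapes = [("Input", size, channels)]
    for name, k, s, p, out_ch in layers:
        size = out_size(size, k, s, p)
        channels = out_ch if out_ch is not None else channels
        shapes.append((name, size, channels))
    return shapes

shapes = trace_shapes()
for name, size, ch in shapes:
    print(f"{name}: {size}x{size}x{ch}")

# Number of features fed to the fully connected layer.
flatten_size = shapes[-1][1] ** 2 * shapes[-1][2]
print("Flatten:", flatten_size)
```

With padding=1 and a 3×3 kernel the convolutions preserve the spatial size, so only the two pooling layers change it (64 → 32 → 16).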


In [7]:
import torch
import torch.nn as nn
import torch.nn.functional as F


In [8]:
# CNN architecture: 3 conv layers, 2 max-pooling layers, 1 fully connected layer
class CNN_Q1(nn.Module):
    def __init__(self):
        super().__init__()

        # Convolution layers
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, 3, padding=1)

        # Pooling
        self.pool = nn.MaxPool2d(2, 2)

        # Fully connected: two 2×2 pools shrink 64×64 to 16×16, with 128 channels
        self.fc = nn.Linear(16 * 16 * 128, 10)

    def forward(self, x):

        print("Input:", x.shape)

        x = F.relu(self.conv1(x))
        print("After Conv1:", x.shape)

        x = F.relu(self.conv2(x))
        print("After Conv2:", x.shape)

        x = self.pool(x)
        print("After MaxPool1:", x.shape)

        x = F.relu(self.conv3(x))
        print("After Conv3:", x.shape)

        x = self.pool(x)
        print("After MaxPool2:", x.shape)

        x = x.view(x.size(0), -1)
        print("After Flatten:", x.shape)

        x = self.fc(x)
        print("After FC:", x.shape)

        return x


In [9]:
model = CNN_Q1()

dummy_input = torch.randn(1, 3, 64, 64)  # batch_size=1

output = model(dummy_input)


Input: torch.Size([1, 3, 64, 64])
After Conv1: torch.Size([1, 32, 64, 64])
After Conv2: torch.Size([1, 64, 64, 64])
After MaxPool1: torch.Size([1, 64, 32, 32])
After Conv3: torch.Size([1, 128, 32, 32])
After MaxPool2: torch.Size([1, 128, 16, 16])
After Flatten: torch.Size([1, 32768])
After FC: torch.Size([1, 10])


## Effect of Removing Pooling Layers

With pooling:
Flatten size = 16 × 16 × 128 = 32,768  
FC parameters = 32,768 × 10 + 10 = 327,690  

Without pooling:
Flatten size = 64 × 64 × 128 = 524,288  
FC parameters = 524,288 × 10 + 10 = 5,242,890  

The FC layer grows by ~16× because each of the two 2×2 pooling layers
shrinks the spatial area by 4×. This growth is often called parameter explosion.

Pooling reduces spatial dimensions, which cuts the FC layer's parameter
count and memory use and can lower the risk of overfitting.
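The arithmetic above can be reproduced directly. This is a minimal sketch that counts weights plus biases for the fully connected layer in both cases; the helper name `fc_params` is a hypothetical name introduced here for illustration.

```python
def fc_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features

# Flatten sizes taken from the two architectures (H * W * C).
with_pool = fc_params(16 * 16 * 128, 10)
without_pool = fc_params(64 * 64 * 128, 10)

ratio = without_pool / with_pool
print("With pooling:   ", with_pool)
print("Without pooling:", without_pool)
print("Ratio:          ", round(ratio, 2))
```

The ratio is slightly below 16 because the 10 bias terms are the same in both cases; only the weight matrix scales with the flatten size.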


In [10]:
# Same convolutional stack as CNN_Q1, but with both pooling layers removed
class CNN_NoPooling(nn.Module):
    def __init__(self):
        super().__init__()

        self.conv1 = nn.Conv2d(3, 32, 3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, 3, padding=1)

        # Without pooling the feature map stays 64×64, so the FC input is 16× larger
        self.fc = nn.Linear(64 * 64 * 128, 10)

    def forward(self, x):

        print("Input:", x.shape)

        x = F.relu(self.conv1(x))
        print("After Conv1:", x.shape)

        x = F.relu(self.conv2(x))
        print("After Conv2:", x.shape)

        x = F.relu(self.conv3(x))
        print("After Conv3:", x.shape)

        x = x.view(x.size(0), -1)
        print("After Flatten:", x.shape)

        x = self.fc(x)
        print("After FC:", x.shape)

        return x


In [11]:
model_no_pool = CNN_NoPooling()

dummy_input = torch.randn(1, 3, 64, 64)

output = model_no_pool(dummy_input)


Input: torch.Size([1, 3, 64, 64])
After Conv1: torch.Size([1, 32, 64, 64])
After Conv2: torch.Size([1, 64, 64, 64])
After Conv3: torch.Size([1, 128, 64, 64])
After Flatten: torch.Size([1, 524288])
After FC: torch.Size([1, 10])


To validate the parameter explosion, we implemented the same CNN
architecture without pooling layers.

The flatten size increased from 32,768 to 524,288 (16×),
matching the manual calculation and showing why pooling matters for
controlling model complexity.
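The flatten sizes only describe the input to the FC layer, so it can also help to compare the total parameter counts of the two networks. The sketch below rebuilds both architectures with `nn.Sequential` (without the debug prints) and sums `numel()` over all parameters. `build_cnn` and `count_params` are hypothetical helper names and are not part of the notebook.

```python
import torch
import torch.nn as nn

def build_cnn(use_pooling=True):
    """Rebuild the Q1 architecture, optionally without the pooling layers."""
    layers = [
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    ]
    if use_pooling:
        layers.append(nn.MaxPool2d(2, 2))
    layers += [nn.Conv2d(64, 128, 3, padding=1), nn.ReLU()]
    if use_pooling:
        layers.append(nn.MaxPool2d(2, 2))
    # Spatial side length at the flatten step for a 64×64 input.
    side = 16 if use_pooling else 64
    layers += [nn.Flatten(), nn.Linear(side * side * 128, 10)]
    return nn.Sequential(*layers)

def count_params(model):
    """Total number of learnable parameters."""
    return sum(p.numel() for p in model.parameters())

pooled = build_cnn(use_pooling=True)
unpooled = build_cnn(use_pooling=False)

print("With pooling:   ", count_params(pooled))
print("Without pooling:", count_params(unpooled))

# Both variants still map a 64×64 RGB image to 10 logits.
out_shape = tuple(pooled(torch.randn(1, 3, 64, 64)).shape)
print("Output shape:", out_shape)
```

The three convolution layers contribute the same 93,248 parameters in both variants, so the entire difference comes from the fully connected layer.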
