# Implement a CNN for CIFAR-10 (With Custom Layers)


The task was taken from: https://github.com/Exorust/TorchLeet/tree/main/torch/medium

## Problem Statement
You are tasked with implementing a **Convolutional Neural Network (CNN)** for image classification on the **CIFAR-10** dataset using PyTorch. However, instead of using PyTorch's built-in `nn.Conv2d` and `nn.MaxPool2d`, you must implement these layers **from scratch** using `nn.Module`. Your model will include convolutional layers for feature extraction, pooling layers for downsampling, and fully connected layers for classification.

### Requirements
1. **Implement Custom Layers:**
   - Create a custom `Conv2dCustom` class that mimics the behavior of `nn.Conv2d`.
   - Create a custom `MaxPool2dCustom` class that mimics the behavior of `nn.MaxPool2d`.
2. **Define the CNN Model:**
   - Use `Conv2dCustom` for convolutional layers.
   - Use `MaxPool2dCustom` for pooling layers.
   - Use standard `nn.Linear` for fully connected layers.
   - The model should process input images of shape `(3, 32, 32)` as in the CIFAR-10 dataset.
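
To see where the `64 * 16 * 16` input size of the first fully connected layer in the code template comes from, the spatial size can be traced with the output-size formula documented for `nn.Conv2d` and `nn.MaxPool2d`. The sketch below is illustrative only: `conv_out` is a hypothetical helper name, and the layer settings (3x3 convolutions with stride 1 and padding 1, then a 2x2 pool with stride 2) are the ones used in the template.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0, dilation: int = 1) -> int:
    """Output length along one spatial axis (hypothetical helper for illustration)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


side = 32                                              # CIFAR-10 images are 32x32
side = conv_out(side, kernel=3, stride=1, padding=1)   # first 3x3 convolution
side = conv_out(side, kernel=3, stride=1, padding=1)   # second 3x3 convolution
side = conv_out(side, kernel=2, stride=2)              # 2x2 max pooling, stride 2

flat_features = 64 * side * side                       # 64 output channels after conv2
print(side, flat_features)  # 16 16384
```

The same formula has to hold for the custom layers, otherwise the flattened tensor will not match the `nn.Linear` input size.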

### Constraints
   - You must not use `nn.Conv2d` or `nn.MaxPool2d`. Use your own custom implementations.
   - The CNN should include multiple convolutional and pooling layers, followed by fully connected layers.
   - Ensure the model outputs class predictions for **10 classes**, as required by CIFAR-10.

**! Hint:**
   - Define `Conv2dCustom` and `MaxPool2dCustom` as subclasses of `nn.Module`.
   - Use nested loops and tensor slicing to perform the operations.
   - In `CNNModel.__init__`, use these custom layers to build the architecture.
   - Implement the forward pass to pass inputs through convolution, activation, pooling, flattening, and fully connected layers.

### Code template

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import torch.nn.functional as F

# Load CIFAR-10 dataset
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)

test_dataset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)

class Conv2dCustom(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0):
        ...

    def forward(self, x):
        ...

class MaxPool2dCustom(nn.Module):
    def __init__(self, kernel_size, stride=None):
        ...

    def forward(self, x):
        ...

# Define the CNN Model
class CNNModel(nn.Module):
    def __init__(self):
        super(CNNModel, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1)  # Output: 32x32x32
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)  # Output: 64x32x32
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)  # Output: 64x16x16
        self.fc1 = nn.Linear(64 * 16 * 16, 128)
        self.fc2 = nn.Linear(128, 10)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.pool(self.relu(self.conv2(x)))
        x = x.view(x.size(0), -1)  # Flatten
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Initialize the model, loss function, and optimizer
model = CNNModel()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
epochs = 10
for epoch in range(epochs):
    for images, labels in train_loader:
        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    print(f"Epoch [{epoch + 1}/{epochs}], Loss: {loss.item():.4f}")

# Evaluate on the test set
correct = 0
total = 0
with torch.no_grad():
    for images, labels in test_loader:
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f"Test Accuracy: {100 * correct / total:.2f}%")

## Solution

### Rephrase

- Implement convolution and pooling, which involves intensive work with multidimensional tensors (`[batch, channels, height, width]`):
   - work with `view()`, `permute()`, `reshape()`, `transpose()`, `expand()`, `repeat()`
   - implement complex indexing and slicing
- Initialize the weights
- Calculate gradients
- Include multiple convolutional and pooling layers, followed by fully connected layers
- Ensure the model outputs class predictions for 10 classes, as required by CIFAR-10
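
As an illustration of the tensor manipulation involved, the pooling half of the task can be sketched without explicit loops. This is a minimal sketch under stated assumptions, not the notebook's implementation: the class name `MaxPool2dSketch` is hypothetical, padding and dilation are not supported, and it relies on `Tensor.unfold`, which is not among the methods listed above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaxPool2dSketch(nn.Module):
    """Hypothetical max-pool layer (square kernel, no padding, no dilation)."""

    def __init__(self, kernel_size, stride=None):
        super().__init__()
        self.kernel_size = kernel_size
        # Like nn.MaxPool2d, the stride defaults to the kernel size.
        self.stride = stride if stride is not None else kernel_size

    def forward(self, x):
        k, s = self.kernel_size, self.stride
        # unfold(dim, size, step) slides a window along `dim` and appends it as a new last axis:
        # [N, C, H, W] -> [N, C, out_h, W, k] -> [N, C, out_h, out_w, k, k]
        windows = x.unfold(2, k, s).unfold(3, k, s)
        # Take the maximum over the two window axes.
        return windows.amax(dim=(-2, -1))


torch.manual_seed(0)
x = torch.randn(2, 3, 8, 8)
y = MaxPool2dSketch(kernel_size=2)(x)
print(y.shape)  # torch.Size([2, 3, 4, 4])
print(torch.equal(y, F.max_pool2d(x, kernel_size=2)))  # True
```

Comparing against `F.max_pool2d` on a small random tensor is a cheap way to check a custom layer before plugging it into the model.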

### Ideas behind

The key goal of this task is to break down the barrier between the "user" of a neural network and its "creator."

Using `nn.Conv2d` is the user level. You know what it does and can use it.

Implementing `Conv2dCustom` is the developer/researcher level. You know how it works and can create one from scratch. This transition is crucial for professional growth in Machine Learning.

So, even if you always use built-in modules in real work, this experience will give you invaluable insights that will help you debug models more effectively, design new architectures, and gain a deeper understanding of the underlying processes.

### Implementation note

- Strictly speaking, `nn.Conv2d` computes cross-correlation rather than convolution, so we do not flip (rotate by 180°) the kernel matrix
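
The difference is easy to check on a tiny input with an asymmetric kernel. The values below are arbitrary and chosen only for illustration; `F.conv2d` is used as the reference for what `nn.Conv2d` computes.

```python
import torch
import torch.nn.functional as F

x = torch.arange(9, dtype=torch.float32).reshape(1, 1, 3, 3)    # [[0,1,2],[3,4,5],[6,7,8]]
k = torch.tensor([[1.0, 2.0], [3.0, 4.0]]).reshape(1, 1, 2, 2)  # asymmetric 2x2 kernel

window = x[0, 0, 0:2, 0:2]  # top-left 2x2 window: [[0,1],[3,4]]

# Cross-correlation: multiply the window by the kernel as-is.
manual_xcorr = (window * k[0, 0]).sum()
# True convolution: flip the kernel along both spatial axes first.
manual_conv = (window * torch.flip(k[0, 0], dims=(0, 1))).sum()

out = F.conv2d(x, k)
print(out[0, 0, 0, 0].item(), manual_xcorr.item(), manual_conv.item())  # 27.0 27.0 13.0
```

The first output element of `F.conv2d` matches the unflipped product, so a custom layer that multiplies each window by the weights directly reproduces the built-in behavior.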

## Solution Code

### First Code Attempt

In [2]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import math


class Conv2dCustom(nn.Module):
  def __init__(
    self,
    in_channels: int,
    out_channels: int,
    kernel_size: tuple[int, int] = (1, 1),
    stride: tuple[int, int] = (1, 1),
    padding: tuple[int, int] = (0, 0),
    dilation: tuple[int, int] = (1, 1),
  ):
    super().__init__()

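    # save hyperparams; accept either an int or an (h, w) pair, as the code template passes ints
    def to_pair(value):
      return (value, value) if isinstance(value, int) else tuple(value)

    kernel_size = to_pair(kernel_size)
    self.in_channels = in_channels
    self.out_channels = out_channels
    self.kernel_size = kernel_size
    self.stride = to_pair(stride)
    self.padding = to_pair(padding)
    self.dilation = to_pair(dilation)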

    # init weights
    # https://docs.pytorch.org/docs/stable/generated/torch.nn.parameter.Parameter.html#torch.nn.parameter.Parameter
    self.weight = nn.Parameter(
        torch.empty(out_channels, in_channels, kernel_size[0], kernel_size[1])
        )

    # Initialize the weights with Kaiming uniform and a=sqrt(5) (same call as in PyTorch's nn.Conv2d).
    # With this value of `a`, the sampling bound simplifies to 1/sqrt(fan_in).
    # read more: https://github.com/galkinc/ml-projects-lab/blob/main/foundations/pytorch-coding-challenges/medium_tasks/implement_parameter_initialization_for_cnn.ipynb
    nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))

  def add_padding(self, input_tensor):
    padding = self.padding
    if  padding == (0, 0):
      return input_tensor

    if len(padding) == 4:
      return F.pad(input_tensor, padding, mode='constant', value=0)

    pad_height, pad_width = padding
    # F.pad expects to be filled in the format: [left, right, top, bottom]
    padding_tuple = (pad_width, pad_width, pad_height, pad_height)

    return F.pad(input_tensor, padding_tuple, mode='constant', value=0)

  def calculate_output_size(self, input_size):
    # output_size = floor((input_size + 2*padding - dilation*(kernel_size-1) - 1) / stride + 1)
    def calc_dim(input_dim, kernel, stride, pad, dil):
        return (input_dim + 2 * pad - dil * (kernel - 1) - 1) // stride + 1

    # expand values
    height, width = input_size
    kernel_h, kernel_w = self.kernel_size
    stride_h, stride_w = self.stride
    pad_h, pad_w = self.padding
    dilation_h, dilation_w = self.dilation

    out_h = calc_dim(height, kernel_h, stride_h, pad_h, dilation_h)
    out_w = calc_dim(width, kernel_w, stride_w, pad_w, dilation_w)
    return out_h, out_w

  def create_output_tensor(self, input_tensor):
    batch_size = input_tensor.shape[0]
    input_height, input_width = input_tensor.shape[2], input_tensor.shape[3]
    output_size = self.calculate_output_size(input_size=(input_height, input_width))

    out_height, out_width = output_size

    # create a tensor with a normal shape and on the same device
    return torch.empty(
        batch_size,
        self.out_channels,
        out_height,
        out_width,
        device=input_tensor.device,
        dtype=input_tensor.dtype
    )

  def conv_implementation(self, x_padded, output):
    batch_size = x_padded.shape[0]
    kernel_h, kernel_w = self.kernel_size
    stride_h, stride_w = self.stride
    dilation_h, dilation_w = self.dilation

    out_height = output.shape[2]
    out_width = output.shape[3]

    # Spatial extent covered by the (possibly dilated) kernel along each axis
    span_h = dilation_h * (kernel_h - 1) + 1
    span_w = dilation_w * (kernel_w - 1) + 1

    for oc in range(self.out_channels):
      # Weights for the current output channel, shape: [in_channels, kernel_h, kernel_w]
      weights = self.weight[oc]
      for b in range(batch_size):
          for oh in range(out_height):
              for ow in range(out_width):
                # Calculate the starting position of the window in the input tensor
                h_start = oh * stride_h
                w_start = ow * stride_w
                h_end = h_start + span_h
                w_end = w_start + span_w

                # Extract the window, stepping by the dilation so that it matches
                # the output size computed in calculate_output_size
                # window shape: [in_channels, kernel_h, kernel_w]
                window = x_padded[b, :, h_start:h_end:dilation_h, w_start:w_end:dilation_w]

                # Cross-correlation: element-wise multiplication, then sum over all elements
                output[b, oc, oh, ow] = torch.sum(window * weights)

    return output


  def forward(self, x):

      # 1. Add padding
      x_padded = self.add_padding(x)

      # 2. Calculate size of the output tensor and prepare it
      output = self.create_output_tensor(x)

      # 3. Custom convolution
      output = self.conv_implementation(x_padded, output)

      return output
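
The nested loops in the first attempt execute one Python-level operation per output element, which is very slow on full CIFAR-10 batches. One common alternative, shown here only as a sketch and not as the approach used in the code above, is the im2col trick: `F.unfold` lays every sliding window out as a column, after which the whole layer becomes a single matrix multiplication. The function name `conv2d_unfold` is hypothetical, bias is omitted, and the result is checked against `F.conv2d`.

```python
import torch
import torch.nn.functional as F


def conv2d_unfold(x, weight, stride=(1, 1), padding=(0, 0), dilation=(1, 1)):
    """Cross-correlation via im2col (hypothetical helper, no bias)."""
    n, _, h, w = x.shape
    out_c, _, kh, kw = weight.shape

    # One column per sliding window: [N, C*kh*kw, L], where L = out_h * out_w
    cols = F.unfold(x, kernel_size=(kh, kw), dilation=dilation, padding=padding, stride=stride)

    # [out_c, C*kh*kw] @ [N, C*kh*kw, L] -> [N, out_c, L] (matmul broadcasts over the batch)
    out = weight.reshape(out_c, -1) @ cols

    # Same output-size formula as in calculate_output_size
    out_h = (h + 2 * padding[0] - dilation[0] * (kh - 1) - 1) // stride[0] + 1
    out_w = (w + 2 * padding[1] - dilation[1] * (kw - 1) - 1) // stride[1] + 1
    return out.reshape(n, out_c, out_h, out_w)


torch.manual_seed(0)
x = torch.randn(2, 3, 8, 8)
weight = torch.randn(4, 3, 3, 3)

ours = conv2d_unfold(x, weight, stride=(2, 2), padding=(1, 1))
ref = F.conv2d(x, weight, stride=2, padding=1)
print(ours.shape)  # torch.Size([2, 4, 4, 4])
print(torch.allclose(ours, ref, atol=1e-4))
```

The same comparison against `F.conv2d` can be run on the loop-based `Conv2dCustom` with a small input to confirm that both produce matching outputs.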


### Full solution code

## Results

| Solution checked | Loss | Test accuracy | Status |
|----------|------------|----------|--------|
| Conv2D  | 0.2770     | 71.13%   | ✅ **Best** |
| CustomConv2D | 2.3027     | 10.00%   | 🔴 **Failed** |
| Conv2D + MaxPool2d | 57.3428    | 28.92%   | 🟡 **Poor** |
| CustomConv2D + MaxPool2d | 57.3428    | 28.92%   | 🟡 **Poor** |
| Conv2D + CustomMaxPool2d  | 0.1859     | 70.12%   | 🟢 **Good** |
| CustomConv2D + CustomMaxPool2d  | 0.1859     | 70.12%   | 🟢 **Good** |

---

## Post-Thinking notes

...

## Interesting Publications