# Problem Description
Tiny ImageNet contains 100,000 training images across 200 classes, each downsized to a 64×64 color image. Each class has 500 training images, 50 validation images, and 50 test images.

# Metrics
For the evaluation of the model, we will use accuracy as our metric. It is straightforward and defined as follows:
$$ \text{Accuracy} = \frac{\text{correct classifications}}{\text{all classifications}} $$
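As a minimal sketch of how this definition could be computed from raw model outputs, the snippet below takes the highest-scoring class per sample as the prediction. The helper name `accuracy` and the toy values are illustrative assumptions, not part of the project code.

```python
import torch


def accuracy(logits: torch.Tensor, targets: torch.Tensor) -> float:
    """Fraction of samples whose highest-scoring class equals the target."""
    predictions = logits.argmax(dim=1)
    return (predictions == targets).float().mean().item()


# Toy example with 4 samples and 3 classes (values chosen by hand).
logits = torch.tensor(
    [
        [2.0, 0.1, 0.3],  # predicted class 0
        [0.2, 1.5, 0.1],  # predicted class 1
        [0.9, 0.2, 0.4],  # predicted class 0
        [0.1, 0.2, 3.0],  # predicted class 2
    ]
)
targets = torch.tensor([0, 1, 2, 2])

print(accuracy(logits, targets))  # 3 of 4 predictions are correct
```

The same function should apply unchanged to the `(batch_size, 200)` outputs of a Tiny ImageNet classifier.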

However, accuracy can be misleading for multiclass problems with class imbalance: a model that is biased towards the most frequent class can still score highly, so the bias goes unnoticed. Since Tiny ImageNet has the same number of images per class, accuracy should be sufficient for our evaluation.

As a cross-check on the chosen metric, we could additionally report the F1 score. It is computed per class from precision and recall, so it accounts for false positives and false negatives rather than only counting correct predictions.
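To illustrate the difference between the two metrics, here is a hedged sketch of a macro-averaged F1 score (the unweighted mean of per-class F1) next to accuracy. The function `macro_f1`, the choice of macro averaging, and the imbalanced toy data are assumptions made for this example; Tiny ImageNet itself is balanced.

```python
import torch


def macro_f1(predictions: torch.Tensor, targets: torch.Tensor, num_classes: int) -> float:
    """Unweighted mean of the per-class F1 scores."""
    f1_scores = []
    for c in range(num_classes):
        tp = ((predictions == c) & (targets == c)).sum().item()
        fp = ((predictions == c) & (targets != c)).sum().item()
        fn = ((predictions != c) & (targets == c)).sum().item()
        denom = 2 * tp + fp + fn
        # Assumed convention: a class with no predictions and no targets scores 0.
        f1_scores.append(2 * tp / denom if denom > 0 else 0.0)
    return sum(f1_scores) / num_classes


# Hypothetical imbalanced case: 9 samples of class 0, 1 sample of class 1,
# and a model that always predicts class 0.
targets = torch.tensor([0] * 9 + [1])
predictions = torch.zeros(10, dtype=torch.long)

acc = (predictions == targets).float().mean().item()
f1 = macro_f1(predictions, targets, num_classes=2)
print(f"accuracy = {acc:.3f}, macro F1 = {f1:.3f}")
```

In this toy case accuracy is 0.9 although the model never predicts class 1, while the macro F1 drops to roughly 0.47 because the missed class contributes an F1 of 0.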


# Base Architecture
- Explain base model
- Train with single sample or batch and show that it works

In [12]:
import torch
import torch.nn as nn
import utils
from typing import List, Tuple, Dict


class CNN(nn.Module):
    def __init__(
        self,
        dim: int,
        num_classes: int,
        confs: List[Tuple[str, Dict]],
        in_channels: int,
        weight_init=None,
    ):
        super(CNN, self).__init__()

        self.net = nn.ModuleList()

        linear_idxs = [idx for idx, (layer, _) in enumerate(confs) if layer == "L"]
        linear_start = linear_idxs[0]
        convolution_conf = confs[:linear_start]
        linear_conf = confs[linear_start:]
        for layer, conf in convolution_conf:
            if layer == "C":
                self.net.append(
                    nn.Conv2d(
                        in_channels,
                        out_channels=conf["channels"],
                        kernel_size=conf["kernel"],
                        stride=conf.get("stride", 1),
                        padding=conf.get("padding", 0),
                    )
                )
                self.net.append(nn.ReLU())
                if conf.get("batch_norm", False):
                    self.net.append(nn.BatchNorm2d(conf["channels"]))
                if conf.get("dropout", 0):
                    self.net.append(nn.Dropout(conf["dropout"]))
                in_channels = conf["channels"]
            elif layer == "P":
                self.net.append(nn.MaxPool2d(kernel_size=conf["kernel"]))
            else:
                raise NotImplementedError(f"Layer {layer} not implemented")

        self.dim = utils.get_dim_after_conv_and_pool(dim_init=dim, confs=confs)
        print(f"self.dim: {self.dim},\nin_channels: {in_channels}")
        # Build the fully connected head. Track the previous layer's width so
        # that layers with different "units" values still connect correctly.
        self.net.append(nn.Flatten())
        in_features = self.dim * self.dim * in_channels
        for idx, (layer, conf) in enumerate(linear_conf):
            if idx == len(linear_conf) - 1:
                # The final layer maps to class logits; its "units" and
                # "dropout" entries are not used.
                self.net.append(nn.Linear(in_features, num_classes))
            else:
                self.net.append(nn.Linear(in_features, conf["units"]))
                self.net.append(nn.ReLU())
                if conf.get("dropout", 0):
                    self.net.append(nn.Dropout(conf["dropout"]))
                in_features = conf["units"]

    def forward(self, x):
        N, H, W, C = x.shape
        x = x.permute(
            0, 3, 1, 2
        )  # Adjust (batch_size, H, W, C) to (batch_size, C, H, W)
        assert x.shape == (N, C, H, W)

        for layer in self.net:
            x = layer(x)

        return x

In [13]:
confs = [
    ("C", {"kernel": 3, "channels": 16}),
    ("P", {"kernel": 2}),
    ("C", {"kernel": 3, "channels": 32}),
    ("P", {"kernel": 2}),
    ("L", {"units": 500, "dropout": 0.5}),
    ("L", {"units": 500, "dropout": 0.5}),
]

In [24]:
x = torch.rand(10, 64, 64, 3)
model = CNN(dim=64, num_classes=200, confs=confs, in_channels=3)
model(x)

self.dim: 14,
in_channels: 32


tensor([[ 0.1232,  0.0999,  0.0368,  ...,  0.0669,  0.0408,  0.0596],
        [ 0.0231,  0.0232, -0.0239,  ...,  0.0165,  0.0469,  0.1135],
        [-0.0378,  0.0657,  0.0646,  ...,  0.0202, -0.0382, -0.0960],
        ...,
        [ 0.0591, -0.0282,  0.0562,  ...,  0.0490, -0.0234, -0.0127],
        [ 0.0834,  0.1145,  0.0871,  ...,  0.0984, -0.0463,  0.0839],
        [ 0.0103,  0.0018,  0.0955,  ..., -0.0178, -0.0084,  0.0198]],
       grad_fn=<AddmmBackward0>)
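The printed `self.dim: 14` can be reproduced with the standard output-size arithmetic for convolution and pooling layers. The sketch below is a hypothetical stand-in, not the actual `utils.get_dim_after_conv_and_pool` (whose implementation is not shown here). It assumes square inputs, no dilation, and a pooling stride equal to the pooling kernel size, which is the default of `nn.MaxPool2d`.

```python
from typing import Dict, List, Tuple


def spatial_dim(dim: int, confs: List[Tuple[str, Dict]]) -> int:
    """Side length of the feature map after all conv ("C") and pool ("P") layers."""
    for layer, conf in confs:
        if layer == "C":
            kernel = conf["kernel"]
            stride = conf.get("stride", 1)
            padding = conf.get("padding", 0)
            dim = (dim + 2 * padding - kernel) // stride + 1
        elif layer == "P":
            kernel = conf["kernel"]
            dim = (dim - kernel) // kernel + 1  # pooling stride == kernel size
        # "L" entries do not change the spatial size.
    return dim


confs = [
    ("C", {"kernel": 3, "channels": 16}),
    ("P", {"kernel": 2}),
    ("C", {"kernel": 3, "channels": 32}),
    ("P", {"kernel": 2}),
    ("L", {"units": 500, "dropout": 0.5}),
    ("L", {"units": 500, "dropout": 0.5}),
]

dim = spatial_dim(64, confs)  # 64 -> 62 -> 31 -> 29 -> 14
flat_features = dim * dim * 32  # input size of the first linear layer
print(dim, flat_features)
```

With the configuration above, the first linear layer therefore receives 14 × 14 × 32 = 6272 features per image, and the `(10, 200)` output corresponds to one logit per class for each of the 10 inputs.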

## Discussion

# SGD, Tuning of Learning Rate and Batch Size
- Explain SGD
- Explain Learning Rate
- Explain Batch Size

In [15]:
def my_code():
    pass

## Discussion

# SGD, Weight Initialization, Model Complexity, Convolution Settings
- Explain what you will do here

## Weight Initialization
- Explain the different weight initialization methods

In [16]:
def my_code():
    pass

### Discussion

## Model Complexity
- Explain what you will do

### Model Variant 1
- Explain

### Model Variant 2
- Explain

### Model Variant 3
- Explain

### Model Variant 4
- Explain

### Discussion
- Variant 1
- Variant 2
- Variant 3
- Variant 4

# Regularization
- Briefly describe what the goal of regularization methods in general is

## L1/L2
- Explain

In [17]:
def my_code():
    pass

## Dropout
- Explain

In [18]:
def my_code():
    pass

## Discussion
- To what extent is this goal achieved in the given case?

# Batchnorm (without REG, with SGD)
- Evaluate whether Batchnorm is useful. Describe what the idea of BN is, what it is supposed to help.

In [19]:
def my_code():
    pass

## Discussion

# Adam
- Explain

## Without BN, without REG
- Explain

In [20]:
def my_code():
    pass

## Without BN, with REG
- Explain

In [21]:
def my_code():
    pass

## Discussion

# Transfer Learning
- Explain

In [22]:
def my_code():
    pass

## Discussion

# Conclusion