<a href="https://colab.research.google.com/github/MatchLab-Imperial/deep-learning-course/blob/master/04_Common_CNN_architectures.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Coursework**

## Task 1: Classification on Tiny-ImageNet

In this task, we explore different models for classification on Tiny-ImageNet. Tiny-ImageNet is a smaller version of ImageNet (as the name indicates), containing "only" 200 classes. Each class has 500 training images, and the test set contains 10,000 images. All images are 64x64 RGB images.

In the Network Training notebook, we explained how to define a validation set; now we will put that into practice. Because we now have a bigger dataset, we use the standard split into training, validation, and test data. You will monitor the network's performance on the validation set while training, and your design decisions must be based on validation performance. Once you have obtained your best model using the training and validation data, report its performance on the test set. Please try not to overfit to the test data, as in other problems it may not be available to you.
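As a quick sanity check, the split sizes can be worked out from the numbers used in this notebook. This is a minimal sketch, assuming the data-loading code further down keeps the first 450 training images of each class for training and the remaining 50 for validation; the constant names are only illustrative.

```python
# Split sizes implied by the notebook's loading code: 200 classes with
# 500 training images each, split 450 / 50 into training and validation.
NUM_CLASSES = 200
TRAIN_PER_CLASS = 450
VAL_PER_CLASS = 50

n_train = NUM_CLASSES * TRAIN_PER_CLASS
n_val = NUM_CLASSES * VAL_PER_CLASS
n_test = 10_000  # test-set size stated in the task description

print(f"train={n_train}, val={n_val}, test={n_test}")
```

If the arrays returned by the loader have different lengths, some image paths probably failed to load.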

In this exercise, you are asked to train VGG models with different strategies. Optionally, you are asked to use any other architecture of your choice to do classification in Tiny-ImageNet.

Run the following script to get the data.




In [None]:
# Download TinyImageNet
! git clone https://github.com/seshuad/IMagenet

In [None]:
import cv2
import matplotlib.pyplot as plt
import numpy as np

def get_id_dictionary():
    id_dict = {}
    for i, line in enumerate(open('IMagenet/tiny-imagenet-200/wnids.txt', 'r')):
        id_dict[line.replace('\n', '')] = i
    return id_dict

def get_class_to_id_dict():
    id_dict = get_id_dictionary()
    all_classes = {}
    result = {}
    for i, line in enumerate(open('IMagenet/tiny-imagenet-200/words.txt', 'r')):
        n_id, word = line.split('\t')[:2]
        all_classes[n_id] = word
    for key, value in id_dict.items():
        result[value] = (key, all_classes[key])

    return result

def get_data(id_dict):
    train_data, val_data, test_data = [], [], []
    train_labels, val_labels, test_labels = [], [], []
    for key, value in id_dict.items():
        train_data += [cv2.imread('IMagenet/tiny-imagenet-200/train/{}/images/{}_{}.JPEG'.format(key, key, str(i))) for i in range(450)]
        train_labels += [value] * 450

        val_data += [cv2.imread('IMagenet/tiny-imagenet-200/train/{}/images/{}_{}.JPEG'.format(key, key, str(i))) for i in range(450, 500)]
        val_labels += [value] * 50

    for line in open('IMagenet/tiny-imagenet-200/val/val_annotations.txt'):
        img_name, class_id = line.split('\t')[:2]
        test_data.append(cv2.imread(f'IMagenet/tiny-imagenet-200/val/images/{img_name}'))
        test_labels.append(id_dict[class_id])

    return np.array(train_data), np.array(train_labels), np.array(val_data), np.array(val_labels), np.array(test_data), np.array(test_labels)

def shuffle_data(train_data, train_labels, val_data, val_labels):
    # This function shuffles separately the train set and the
    # validation set
    size = len(train_data)
    train_idx = np.arange(size)
    np.random.shuffle(train_idx)

    size = len(val_data)
    val_idx = np.arange(size)
    np.random.shuffle(val_idx)

    return train_data[train_idx], train_labels[train_idx], val_data[val_idx], val_labels[val_idx]

train_data, train_labels, val_data, val_labels, test_data, test_labels = get_data(get_id_dictionary())
train_data, train_labels, val_data, val_labels = shuffle_data(train_data, train_labels, val_data, val_labels)

# Let's visualize some examples
N=3
start_val = 0 # pick an element for the code to plot the following N**2 values
fig, axes = plt.subplots(N,N)
for row in range(N):
  for col in range(N):
    idx = start_val+row+N*col
    tmp = cv2.cvtColor(train_data[idx],cv2.COLOR_BGR2RGB)
    axes[row,col].imshow(tmp, cmap='gray')
    fig.subplots_adjust(hspace=0.5)
    axes[row,col].set_xticks([])
    axes[row,col].set_yticks([])

In [None]:
import torch
from torch.utils.data import DataLoader, TensorDataset

train_data, train_labels, val_data, val_labels, test_data, test_labels = get_data(get_id_dictionary())
train_data, train_labels, val_data, val_labels = shuffle_data(train_data, train_labels, val_data, val_labels)

# Normalize to [0, 1]
train_data = torch.from_numpy(train_data).permute(0, 3, 1, 2).float() / 255.0
val_data   = torch.from_numpy(val_data).permute(0, 3, 1, 2).float() / 255.0
test_data  = torch.from_numpy(test_data).permute(0, 3, 1, 2).float() / 255.0

# Standardize each channel to zero mean and unit variance, using training-set statistics
mean = train_data.mean(dim=(0, 2, 3))
std  = train_data.std(dim=(0, 2, 3))
train_data = (train_data - mean[None, :, None, None]) / (std[None, :, None, None] + 1e-7)
val_data   = (val_data   - mean[None, :, None, None]) / (std[None, :, None, None] + 1e-7)
test_data  = (test_data  - mean[None, :, None, None]) / (std[None, :, None, None] + 1e-7)

# Build data loader
train_dataset = TensorDataset(train_data, torch.from_numpy(train_labels))
train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True)

val_dataset = TensorDataset(val_data, torch.from_numpy(val_labels))
val_loader = DataLoader(val_dataset, batch_size=128, shuffle=False)

test_dataset = TensorDataset(test_data, torch.from_numpy(test_labels))
test_loader = DataLoader(test_dataset, batch_size=128, shuffle=False)

**Report**:
*   In a plot, please report the training and validation accuracy curves for the following models:

> *   VGG16 trained from scratch.

> *   Transfer Learning VGG16: load pre-trained ImageNet weights and only train the newly added dense layers.  To do so, freeze all layers and only train the dense layers you have modified in the model.

> *   Fine-tuning VGG16: load pre-trained ImageNet weights and train the whole architecture.

*   Discuss the previous figure in the main text, and report in a table the test accuracy and the training and inference times of the previous VGG16 experiments. Training times are measured per epoch and are printed by the training loop below ("Training Time/Epoch"). Report either the total training time, or the number of epochs and the training time per epoch. Inference times are measured per image, and the code below computes them.

*   Now that we are familiar with loading and using models in PyTorch, you can use any model of your choice to classify Tiny-ImageNet. You can take the model directly from PyTorch or from any GitHub repository, or write the code yourself. Report your results in the previous table and compare your model of choice with the previous VGG16 networks.
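The difference between the transfer-learning and fine-tuning setups listed above comes down to which parameters have `requires_grad` enabled. The following is a minimal sketch, not a reference solution: `freeze_all_but_head` is a hypothetical helper, and a tiny stand-in network is used so that the example runs without downloading weights.

```python
import torch.nn as nn

def freeze_all_but_head(model: nn.Module, head: nn.Module) -> list:
    """Freeze every parameter of `model` except those of `head`.

    Returns the list of parameters that remain trainable, which can be
    passed directly to an optimizer.
    """
    for p in model.parameters():
        p.requires_grad = False
    for p in head.parameters():
        p.requires_grad = True
    return [p for p in model.parameters() if p.requires_grad]

# Tiny stand-in for a pretrained network (an assumption made for illustration).
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
head = nn.Linear(8, 200)  # new dense layer for the 200 Tiny-ImageNet classes
model = nn.Sequential(backbone, head)

trainable = freeze_all_but_head(model, head)
print(len(trainable))  # number of trainable tensors (head weight and bias)

# With torchvision's VGG16 the same idea would look roughly like this:
#   model = torchvision.models.vgg16(weights=torchvision.models.VGG16_Weights.IMAGENET1K_V1)
#   model.classifier[6] = nn.Linear(4096, 200)
#   params = freeze_all_but_head(model, model.classifier[6])
#   optimizer = optim.Adam(params, lr=1e-3)
```

For the fine-tuning variant, skip the freezing step and pass `model.parameters()` to the optimizer. For training from scratch, additionally build the model with `weights=None`.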

Note that training and inference times depend on the GPU you are using. Measure all timings on the same instance, or at least on the same GPU model, and state which GPU you used.
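Because GPU work is launched asynchronously, wall-clock timings are only meaningful if the device is synchronized before the clock is read. The helper below is a hedged sketch of that pattern in plain Python; `time_per_item` and its `sync` argument are hypothetical names, not part of PyTorch.

```python
import time

def time_per_item(fn, batches, sync=None):
    """Average wall-clock seconds per item when calling `fn` on each batch.

    `sync` is an optional zero-argument callable (for example
    torch.cuda.synchronize) that is called before each clock reading.
    """
    total_time, n_items = 0.0, 0
    for batch in batches:
        if sync is not None:
            sync()
        start = time.perf_counter()
        fn(batch)
        if sync is not None:
            sync()
        total_time += time.perf_counter() - start
        n_items += len(batch)
    return total_time / n_items

# Toy usage with a CPU-only "model" that just sums each batch.
demo_batches = [[1, 2, 3], [4, 5, 6], [7, 8]]
seconds = time_per_item(sum, demo_batches)
print(f"{seconds:.2e} s per item")
```

On a GPU, one would pass the model as `fn`, batches of input tensors already moved to the device, and `sync=torch.cuda.synchronize`.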

In [None]:
import copy
import time
# You may want to import more modules.

# Early stopping utility
class EarlyStopping:
    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.counter = 0
        self.best_accuracy = None
        self.best_model_weight = None

    def __call__(self, model, val_accuracy):
        if self.best_accuracy is None:
            self.best_accuracy = val_accuracy
            self.best_model_weight = copy.deepcopy(model.state_dict())
            return False  # first epoch: nothing to compare against yet
        elif val_accuracy < self.best_accuracy + self.min_delta:
            self.counter += 1
            return self.counter >= self.patience
        else:
            self.best_accuracy = val_accuracy
            self.best_model_weight = copy.deepcopy(model.state_dict())
            self.counter = 0
            return False


set_seed(42)

# Define your model here
# model = ...

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)
early_stopping = EarlyStopping(patience=5, min_delta=1e-3)

for epoch in range(20):
    # Training phase
    model.train()
    train_loss = 0.0
    train_correct = 0.0

    time_start = time.time()
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        train_loss += loss.item() * inputs.size(0)
        train_correct += (outputs.argmax(1) == labels).sum().item()
    time_end = time.time()

    # Validation phase
    model.eval()
    val_loss = 0.0
    val_correct = 0

    with torch.no_grad():
        for inputs, labels in val_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)

            loss = criterion(outputs, labels)
            val_loss += loss.item() * inputs.size(0)
            val_correct += (outputs.argmax(1) == labels).sum().item()

    train_loss /= len(train_loader.dataset)
    train_accuracy = train_correct / len(train_loader.dataset) * 100
    val_loss /= len(val_loader.dataset)
    val_accuracy = val_correct / len(val_loader.dataset) * 100

    print((
        f"Epoch [{epoch+1}/20] "
        f"Train Loss: {train_loss:.4f}, "
        f"Train Accuracy: {train_accuracy:.2f}%, "
        f"Validation Loss: {val_loss:.4f}, "
        f"Validation Accuracy: {val_accuracy:.2f}%, "
        f"Training Time/Epoch: {time_end-time_start:.2f}s"
    ))

    # Early stopping check
    if early_stopping(model, val_accuracy):
        print("Early stopping triggered.")
        print()
        break

# Load best model weights
model.load_state_dict(early_stopping.best_model_weight)

# Evaluate on test set
running_time = 0.0
test_loss = 0.0
test_correct = 0

model.eval()
with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
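        # CUDA kernels run asynchronously, so synchronize before reading the clock
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        time_start = time.time()
        outputs = model(inputs)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        time_end = time.time()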
        loss = criterion(outputs, labels)

        test_loss += loss.item() * inputs.size(0)
        test_correct += (outputs.argmax(1) == labels).sum().item()
        running_time += time_end - time_start

test_loss /= len(test_loader.dataset)
test_accuracy = test_correct / len(test_loader.dataset) * 100
time_per_image = running_time / len(test_loader.dataset)

print(f'Test Loss: {test_loss:.4f}, Test Accuracy: {test_accuracy:.2f}%, Inference Time/Image: {time_per_image:.4f}s')

## Task 2: ConvNeXt Model Scaling on Tiny-ImageNet
ConvNeXt is a modern convolutional neural network architecture introduced by Facebook AI Research in the 2022 paper titled "[A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545)" by Liu et al. It reimagines the classic ResNet architecture by integrating design principles from Vision Transformers (ViTs), such as large kernel sizes, inverted bottlenecks, depthwise convolutions, Layer Normalization, and GELU activations. Despite being fully convolutional, ConvNeXt matches or even surpasses the performance of transformer-based models like the Swin Transformer on benchmarks such as ImageNet. Its success demonstrated that, with the right architectural updates, CNNs can remain competitive in the transformer era, significantly influencing how researchers view the future of convolutional models in computer vision.

PyTorch (via torchvision) provides pretrained ConvNeXt models in several sizes, listed [here](https://docs.pytorch.org/vision/main/models/convnext.html).

In this task, we explore how increasing model size affects performance while keeping the architecture family the same. You will load ConvNeXt-Tiny, Small, Base, and Large pretrained on ImageNet, and evaluate them on the previously loaded Tiny-ImageNet test data to investigate the trade-off between model size and performance.

*Note: You are NOT training or fine-tuning these models.*

Use the code below to load the data in the format required by the models.

In [None]:
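from torch.utils.data import Dataset
from torchvision import transforms

# Define transforms for ConvNeXt input (resize to 224, normalize like ImageNet).
# ToPILImage comes first so that Resize and ToTensor receive a PIL image
# when the dataset yields raw uint8 HxWx3 arrays.
transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])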

class TinyImageNetTestDataset(Dataset):
    def __init__(self, images, labels, transform=None):
        self.images = images
        self.labels = np.asarray(labels)  # labels are integer class ids, not one-hot vectors
        self.transform = transform

    def __len__(self):
        return len(self.images)

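    def __getitem__(self, idx):
        # cv2.imread returns BGR, while the ImageNet statistics above assume RGB
        img = cv2.cvtColor(self.images[idx], cv2.COLOR_BGR2RGB)
        label = int(self.labels[idx])
        if self.transform:
            img = self.transform(img)
        return img, label

# test_data was turned into a standardized tensor in Task 1, so reload the raw uint8 images
*_, test_data, test_labels = get_data(get_id_dictionary())
test_dataset = TinyImageNetTestDataset(test_data, test_labels, transform=transform)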
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

In [None]:
def fit_model_head(model, num_classes=200):
    # Replace the classifier head so that it outputs the Tiny-ImageNet classes
    num_ftrs = model.classifier[2].in_features  # ConvNeXt classifier is nn.Sequential(..., nn.Linear)
    model.classifier[2] = nn.Linear(num_ftrs, num_classes)
    return model

# ...

**Report**:
*   In a plot, report the accuracy and test MSE against number of parameters of the model.

*   Discuss the previous figure(s) in the main text, explaining the trend seen between model size and performance and comment on the benefits/drawbacks of using a larger/smaller model.
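For the x-axis of the requested plot, the number of parameters can be read directly from a model. The snippet below is a minimal sketch using a toy layer so that it runs without downloading weights; `count_parameters` is a hypothetical helper name, and the commented loop over ConvNeXt constructors is an assumption about how one might apply it.

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Total number of scalar parameters in `model`."""
    return sum(p.numel() for p in model.parameters())

# Toy example: a 10 -> 5 linear layer has 10 * 5 weights plus 5 biases.
toy = nn.Linear(10, 5)
print(count_parameters(toy))

# Applied to the ConvNeXt family, this might look like:
#   from torchvision import models
#   for build in (models.convnext_tiny, models.convnext_small,
#                 models.convnext_base, models.convnext_large):
#       model = build(weights="DEFAULT")
#       print(build.__name__, count_parameters(model))
```

Plotting the parameter count on a logarithmic axis may make the spacing between the four model sizes easier to read.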
