# Assignment 2 -- CNN

## Goal

1. Learn to design a CNN architecture.

2. Learn to implement batch normalization.

3. Learn to implement the pooling layer.

4. Learn to implement the residual block.

5. Learn to implement the **fully** convolutional residual network without fully connected layers or MLP.


## Score

1. Part 4: Batch Normalization 20%

2. Part 6: Residual Block 20%

3. Part 7: Residual Network 20%

4. Model size 15%:

* 10%: If your model size (determined by the number of parameters) is smaller than **6MB**, you will get 10%. Otherwise, no points will be awarded.
* 5%:  The remaining 5% will depend on your ranking within the class.

5. Model accuracy 15%:

* 10%: If your accuracy is higher than **78%**, you will get 10%. Otherwise, no points will be awarded.
* 5%:  The remaining 5% will depend on your ranking within the class.

6. Model accuracy on another dataset 10%: This will depend on your ranking within the class.

## Rule

1. Please do NOT call any existing library for your implementations.
2. Please do NOT attempt to modify the sections `DO NOT MODIFY`.

## Submission

Upload your files to NTU Cool.
* This .ipynb file: Please rename this file with the format (DL_HW2_StudentID.ipynb)
* Model : .pt file
* Output: .csv file

Deadline: 4/8 midnight (23:59)

## Task

Following the instructions below, please design a **fully** convolutional residual network to label images from the CIFAR-10 dataset. The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

Here are the classes in the dataset, as well as 10 random images from each:
![picture](https://drive.google.com/uc?id=1ipIz2kN9fbvaDE1tSXgED3Sthhtyy_Gh)

The classes are completely mutually exclusive. There is no overlap between automobiles and trucks. "Automobile" includes sedans, SUVs, things of that sort. "Truck" includes only big trucks. Neither includes pickup trucks.

You can find more information on https://www.cs.toronto.edu/~kriz/cifar.html.

Please fill in your student ID number below.

In [60]:
# Please fill in your student ID number
student_id = 'b10705016'

## Part 1

Import the necessary libraries

`DO NOT MODIFY`

In [61]:
# Model
import torch
import torch.nn as nn

# Dataset
from torchvision import datasets
from torch.utils.data.dataset import Dataset
from torch.utils.data import DataLoader
from scipy.io import loadmat

# Optimizer
from torch.optim.optimizer import Optimizer

# Pre-processing
import torchvision.transforms as trns
from PIL import Image

## Part 2

`DO NOT MODIFY`

Global variables.

Please keep these hyper-parameters unchanged.

In [62]:
batch_size = 16
num_classes = 10
input_channel = 3
num_epoch = 10
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

## Part 3

`DO NOT MODIFY`

Create the data loaders, with pre-processing applied to the datasets.

In [63]:
# Create train/test transforms
train_transform = trns.Compose([
    trns.ToTensor(),
])

test_transform = trns.Compose([
    trns.ToTensor(),
])

# Create train/test datasets with pre-processing
# The dataset is downloaded automatically if it does not exist
data_train = datasets.CIFAR10(root='./dataset/', train=True, transform=train_transform, download=True)
data_test = datasets.CIFAR10(root='./dataset/', train=False, transform=test_transform, download=True)

# Create train/test dataloaders for the pre-processed datasets
train_loader = DataLoader(data_train, batch_size=batch_size, shuffle=True)
test_loader  = DataLoader(data_test,  batch_size=batch_size, shuffle=False)

Files already downloaded and verified
Files already downloaded and verified


## Part 4

Please implement batch normalization.

![picture](https://drive.google.com/uc?id=1agPNiE0-YmnmMs711RW52CrH67ETSgPU)
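
For reference, the usual batch-normalization transform for the $m$ values $x_1, \dots, x_m$ of one channel in a mini-batch can be written as below. This is a sketch of the standard formulation; the symbols $\mu_B$, $\sigma_B^2$ and $\hat{x}_i$ are notation assumed here. For convolutional feature maps, the statistics are taken per channel over the batch and both spatial dimensions, and $\gamma$, $\beta$ are the learnable scale and shift.

$$
\begin{aligned}
\mu_B &= \frac{1}{m}\sum_{i=1}^{m} x_i \\
\sigma_B^2 &= \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2 \\
\hat{x}_i &= \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} \\
y_i &= \gamma\,\hat{x}_i + \beta
\end{aligned}
$$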

In [64]:
class myBatchNorm(nn.Module):

    def __init__(self, input_channel, eps=1e-4, momentum=0.1):

        super().__init__()

        self.eps = eps
        self.momentum = momentum
        shape = (1, input_channel, 1, 1)

        self.gamma = nn.Parameter(torch.ones(shape))
        self.beta = nn.Parameter(torch.zeros(shape))

        self.moving_mean = torch.zeros(shape)
        self.moving_var = torch.ones(shape)

    def forward(self, x):

        if self.moving_mean.device != x.device:
            self.moving_mean = self.moving_mean.to(x.device)
            self.moving_var = self.moving_var.to(x.device)

        y, self.moving_mean, self.moving_var = self.batch_norm(
            x, self.gamma, self.beta, self.moving_mean,
            self.moving_var, self.eps, self.momentum)

        return y

    def batch_norm(self, x, gamma, beta, moving_mean, moving_var, eps, momentum):
        if not torch.is_grad_enabled():
            # Use moving averages for inference
            x_hat = (x - moving_mean) / torch.sqrt(moving_var + eps)
        else:
            # Compute mean and variance from x (current batch) for training
            batch_mean = torch.mean(x, dim=(0, 2, 3), keepdim=True)
            batch_var = torch.var(x, dim=(0, 2, 3), keepdim=True, unbiased=False)

            # Normalize the input
            x_hat = (x - batch_mean) / torch.sqrt(batch_var + eps)

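            # Update moving averages; momentum (default 0.1) is the weight given to the
            # current batch statistics, following the nn.BatchNorm2d convention
            moving_mean.data = (1.0 - momentum) * moving_mean.data + momentum * batch_mean.data
            moving_var.data = (1.0 - momentum) * moving_var.data + momentum * batch_var.data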

        # Scale and shift
        y = gamma * x_hat + beta

        return y, moving_mean, moving_var
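
The training-mode computation can be checked by hand on a tiny example. The following is a minimal, dependency-free sketch for a single channel, assuming all batch and spatial positions of that channel have been flattened into one list; `batch_norm_1ch` is a hypothetical helper and not part of the assignment code.

```python
import math

def batch_norm_1ch(values, gamma=1.0, beta=0.0, eps=1e-4):
    """Normalize all values of one channel (batch and spatial positions flattened)."""
    m = len(values)
    mean = sum(values) / m
    # Biased variance (divide by m), matching unbiased=False
    var = sum((v - mean) ** 2 for v in values) / m
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

out = batch_norm_1ch([1.0, 2.0, 3.0, 4.0])
out_mean = sum(out) / len(out)
out_var = sum((v - out_mean) ** 2 for v in out) / len(out)
print(out_mean, out_var)  # mean is about 0, variance slightly below 1 because of eps
```

With `gamma=1` and `beta=0`, the output variance is `var / (var + eps)`, which is why it ends up just under 1 rather than exactly 1.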


## Part 5

`DO NOT MODIFY`

Basic convolutional layer, activation function, and pooling layers.

In [65]:
class myConvolution(nn.Module):

    def __init__(self, input_channel, output_channel, kernel_size = 1, stride = 1, padding = 0):

        super().__init__()

        self.conv = nn.Conv2d(input_channel, output_channel, kernel_size, stride, padding)

    def forward(self, x):

        return self.conv(x)

In [66]:
class myActivation(nn.Module):

    def __init__(self):

        super().__init__()

        # ReLU activation function
        self.act = nn.ReLU()

    def forward(self, x):

        return self.act(x)

In [67]:
class myMaxPooling(nn.Module):

    def __init__(self, kernel_size = 2, stride = 2, padding = 0):

        super().__init__()

        # Max pooling layer
        self.pool = nn.MaxPool2d(kernel_size, stride, padding)

    def forward(self, x):

        return self.pool(x)

In [68]:
class myAvgPooling(nn.Module):

    def __init__(self, kernel_size = 2, stride = 2, padding = 0):

        super().__init__()

        # Average pooling layer
        self.pool = nn.AvgPool2d(kernel_size, stride, padding)

    def forward(self, x):

        return self.pool(x)
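
When stacking these layers, it helps to track how `kernel_size`, `stride` and `padding` change the spatial size of the feature map. The sketch below uses the standard output-size formula for convolution and pooling, assuming dilation 1 and floor rounding; `out_size` is a hypothetical helper introduced for illustration.

```python
def out_size(n, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (dilation 1, floor rounding)."""
    return (n + 2 * padding - kernel_size) // stride + 1

size = 32  # CIFAR-10 images are 32x32
size = out_size(size, kernel_size=3, stride=1, padding=1)  # 3x3 conv with padding 1
print(size)  # 32: the spatial size is preserved

sizes = []
for _ in range(5):
    size = out_size(size, kernel_size=2, stride=2)  # 2x2 pooling with stride 2
    sizes.append(size)
print(sizes)  # [16, 8, 4, 2, 1]
```

On 32x32 inputs, five 2x2 poolings with stride 2 are therefore the most that fit before the feature map collapses to a single position.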

## Part 6

Please implement at least one of the following residual blocks.

![picture](https://drive.google.com/uc?id=1T-prdNyAWnS5qbmxTwt3d6-KxfijiqYW)

In [69]:
class myResBlock(nn.Module):
    def __init__(self, input_channel, med_channel, stride=1, padding=1):
        super(myResBlock, self).__init__()
        # First Conv -> BN -> ReLU
        self.conv1 = myConvolution(input_channel, med_channel, kernel_size=3, stride=stride, padding=padding)
        self.bn1 = myBatchNorm(med_channel)
        self.relu = myActivation()

        # Second Conv -> BN
        self.conv2 = myConvolution(med_channel, med_channel, kernel_size=3, stride=1, padding=padding)
        self.bn2 = myBatchNorm(med_channel)

        # Shortcut connection
        self.shortcut = nn.Sequential()
        if stride != 1 or input_channel != med_channel:
            # Adjusting the shortcut to match dimensions if needed
            self.shortcut = nn.Sequential(
                myConvolution(input_channel, med_channel, kernel_size=1, stride=stride),
                myBatchNorm(med_channel)
            )

    def forward(self, x):
        identity = self.shortcut(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        # Adding the identity (skip connection)
        out += identity
        out = self.relu(out)

        return out
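
The addition `out += identity` only works if the main path and the shortcut produce tensors of the same shape. The sketch below checks the spatial part of that condition using the standard output-size formula, assuming the default `padding=1` for the 3x3 convolutions and `padding=0` for the 1x1 shortcut convolution; `out_size` and `paths_match` are hypothetical helpers.

```python
def out_size(n, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution (dilation 1, floor rounding)."""
    return (n + 2 * padding - kernel_size) // stride + 1

def paths_match(n, stride):
    main = out_size(n, 3, stride, 1)       # conv1: 3x3, given stride, padding 1
    main = out_size(main, 3, 1, 1)         # conv2: 3x3, stride 1, padding 1
    shortcut = out_size(n, 1, stride, 0)   # shortcut: 1x1, given stride, no padding
    return main == shortcut

print(all(paths_match(n, s) for n in range(1, 65) for s in (1, 2, 3)))  # True
```

Both paths reduce to `(n - 1) // stride + 1`, so the sizes agree for any stride. With a different `padding` value for the 3x3 convolutions, this would no longer hold.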


## Part 7

Please design your CNN architecture using the implementation of {myBatchNorm, myConvolution, myMaxPooling, myAvgPooling, myActivation, myResBlock}.

You have the flexibility to determine the number of layers, the number of hidden neurons in each layer, and the activation function of each layer to design your CNN. Please note that your score will depend on both the size and accuracy of your model.

In [70]:
class myCNN(nn.Module):
    def __init__(self, input_channel=3, num_classes=10):
        super(myCNN, self).__init__()
        # Define the sequence of ResBlocks and Pooling layers
        self.layer1 = myResBlock(input_channel, 128)
        self.pool1 = myMaxPooling(2, 2)
        self.layer2 = myResBlock(128, 64)
        self.pool2 = myMaxPooling(2, 2)
        self.layer3 = myResBlock(64, 32)
        self.pool3 = myMaxPooling(2, 2)
        self.layer4 = myResBlock(32, 128)
        self.pool4 = myMaxPooling(2, 2)
        self.layer5 = myResBlock(128, 256)
        self.pool5 = myMaxPooling(2, 2)
        self.layer6 = myResBlock(256, num_classes)
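        # The feature map is already 1x1 here (32x32 input halved by five 2x2 max-pools),
        # so this 1x1 average pool leaves it unchanged rather than acting as a global pool
        self.avgpool = myAvgPooling((1, 1))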

    def forward(self, x):
        x = self.layer1(x)
        x = self.pool1(x)
        x = self.layer2(x)
        x = self.pool2(x)
        x = self.layer3(x)
        x = self.pool3(x)
        x = self.layer4(x)
        x = self.pool4(x)
        x = self.layer5(x)
        x = self.pool5(x)
        x = self.layer6(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)

        return x

model = myCNN(input_channel, num_classes).to(device)
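
Since the score depends on model size, it is worth estimating the parameter count before training. The following is a rough sketch for the architecture above, assuming every convolution has a bias term (the `nn.Conv2d` default), each batch-norm layer contributes only `gamma` and `beta`, and each parameter is stored as a 4-byte float32. How the 6MB limit is actually measured is not stated here, so treat the size figure as an estimate. The helper names are hypothetical.

```python
def conv_params(c_in, c_out, k):
    return k * k * c_in * c_out + c_out  # weights plus bias

def bn_params(c):
    return 2 * c  # gamma and beta; moving averages are not trainable parameters

def res_block_params(c_in, c_mid):
    total = conv_params(c_in, c_mid, 3) + bn_params(c_mid)    # conv1 + bn1
    total += conv_params(c_mid, c_mid, 3) + bn_params(c_mid)  # conv2 + bn2
    if c_in != c_mid:  # projection shortcut (stride is 1 in every block of myCNN)
        total += conv_params(c_in, c_mid, 1) + bn_params(c_mid)
    return total

channels = [3, 128, 64, 32, 128, 256, 10]  # channel widths of the six residual blocks
total = sum(res_block_params(a, b) for a, b in zip(channels, channels[1:]))
print(total)                          # 1437758
print(f"{total * 4 / 1e6:.2f} MB")    # 5.75 MB under the float32 assumption
```

Most of the parameters come from the 128-to-256 block, so narrowing that block is the most direct way to shrink the model.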

## Part 8

`DO NOT MODIFY`

Multiclass cross-entropy loss

In [71]:
criterion = nn.CrossEntropyLoss()
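
`nn.CrossEntropyLoss` takes raw logits, applies a log-softmax, and by default averages the negative log-likelihood of the target class over the batch. A dependency-free sketch of that computation, with hypothetical helper names, is:

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def batch_cross_entropy(batch_logits, targets):
    """Mean cross-entropy over a batch of logit vectors."""
    losses = [cross_entropy(l, t) for l, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)

uniform = cross_entropy([0.0] * 10, 3)
print(round(uniform, 4))  # 2.3026, i.e. log(10) for 10 equally likely classes
```

A loss near `log(10)` therefore means the network is doing no better than guessing uniformly among the 10 classes.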

## Part 9

`DO NOT MODIFY`

Mini-batch SGD with momentum and weight decay.

In [72]:
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=0.0001)
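
To make the roles of `lr`, `momentum` and `weight_decay` concrete, here is a minimal sketch of one update for a single scalar parameter. It follows the update rule documented for `torch.optim.SGD`, assuming zero dampening and no Nesterov momentum (the defaults); `sgd_step` is a hypothetical helper.

```python
def sgd_step(p, grad, buf, lr=0.01, momentum=0.9, weight_decay=0.0001):
    """One SGD update for a scalar parameter; returns (new_p, new_buf)."""
    g = grad + weight_decay * p  # L2 weight decay folded into the gradient
    # The momentum buffer starts as the first gradient, then accumulates
    buf = g if buf is None else momentum * buf + g
    return p - lr * buf, buf

p, buf = 1.0, None
for _ in range(2):
    p, buf = sgd_step(p, grad=0.5, buf=buf)
print(p, buf)
```

With a constant gradient, the buffer grows towards `g / (1 - momentum)`, so the effective step size becomes roughly ten times `lr` when `momentum=0.9`.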

## Part 10

`DO NOT MODIFY`

Model training

In [73]:
model.train()

for epoch in range(num_epoch):

    losses = []

    for batch_num, input_data in enumerate(train_loader):

        optimizer.zero_grad()

        x, y = input_data
        x = x.to(device).float()
        y = y.to(device)

        output = model(x)
        loss = criterion(output, y)
        loss.backward()
        losses.append(loss.item())

        optimizer.step()

        if batch_num % 500 == 0:
            print('\tEpoch %d | Batch %d | Loss %6.4f' % (epoch, batch_num, loss.item()))

    print('Epoch %d | Loss %6.4f' % (epoch, sum(losses)/len(losses)))

torch.save(model, student_id + '_submission.pt')

	Epoch 0 | Batch 0 | Loss 2.9537
	Epoch 0 | Batch 500 | Loss 1.5500
	Epoch 0 | Batch 1000 | Loss 1.3492
	Epoch 0 | Batch 1500 | Loss 1.2026
	Epoch 0 | Batch 2000 | Loss 1.4346
	Epoch 0 | Batch 2500 | Loss 1.0975
	Epoch 0 | Batch 3000 | Loss 1.0860
Epoch 0 | Loss 1.2908
	Epoch 1 | Batch 0 | Loss 0.2640
	Epoch 1 | Batch 500 | Loss 0.7613
	Epoch 1 | Batch 1000 | Loss 0.8320
	Epoch 1 | Batch 1500 | Loss 0.6086
	Epoch 1 | Batch 2000 | Loss 0.7496
	Epoch 1 | Batch 2500 | Loss 1.3592
	Epoch 1 | Batch 3000 | Loss 0.8054
Epoch 1 | Loss 0.8909
	Epoch 2 | Batch 0 | Loss 0.3259
	Epoch 2 | Batch 500 | Loss 0.8763
	Epoch 2 | Batch 1000 | Loss 0.6875
	Epoch 2 | Batch 1500 | Loss 1.0773
	Epoch 2 | Batch 2000 | Loss 0.7218
	Epoch 2 | Batch 2500 | Loss 0.9519
	Epoch 2 | Batch 3000 | Loss 0.2953
Epoch 2 | Loss 0.7291
	Epoch 3 | Batch 0 | Loss 0.5826
	Epoch 3 | Batch 500 | Loss 0.7251
	Epoch 3 | Batch 1000 | Loss 0.4232
	Epoch 3 | Batch 1500 | Loss 0.2890
	Epoch 3 | Batch 2000 | Loss 0.7856
	Epoch 3 | Bat

## Part 11

`DO NOT MODIFY`

Model evaluation

In [74]:
import csv
model.eval()

with open(student_id + '_submission.csv', 'w') as f:

    fieldnames = ['ImageId', 'Prediction', 'Label']

    writer = csv.DictWriter(f, fieldnames=fieldnames, lineterminator = '\n')
    writer.writeheader()

    correct = 0
    total = 0

    with torch.no_grad():

        for x, t in test_loader:

            x = x.to(device).float()
            output = model(x).argmax(dim=1)

            for y,l in zip(output, t):

                writer.writerow({fieldnames[0]: (total+1),
                                 fieldnames[1]: y.item(),
                                 fieldnames[2]: l.item()})

                total += 1
                if y.item() == l.item():
                    correct += 1

    print('Accuracy: %6.4f' % (correct / total))

Accuracy: 0.8102
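
The accuracy can also be recomputed from the CSV file as a sanity check on the submission. The following is a minimal sketch that assumes the `ImageId`, `Prediction`, `Label` column layout written above; the sample rows are made up for illustration and `accuracy_from_csv` is a hypothetical helper.

```python
import csv
import io

def accuracy_from_csv(f):
    """Fraction of rows whose Prediction equals Label."""
    rows = list(csv.DictReader(f))
    correct = sum(1 for r in rows if r['Prediction'] == r['Label'])
    return correct / len(rows)

# Hypothetical sample in the same column layout as the submission file
sample = io.StringIO("ImageId,Prediction,Label\n1,3,3\n2,8,8\n3,0,8\n4,6,6\n")
acc = accuracy_from_csv(sample)
print(acc)  # 0.75
```

For the real file, pass an open file handle instead of the `io.StringIO` sample, for example `open(student_id + '_submission.csv')`.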
