# Assignment 1 -- MLP

## Goal

1. Learn to design an MLP architecture.

2. Learn to implement activation functions.

3. Learn to implement a loss function.

4. Learn to implement optimizers and adjust hyper-parameters.


## Score

1. MLP 15%

2. Activation function 15%

3. Loss function 15%

4. Optimizer 15%

5. Model size 15%:

* 10%: If your model size (determined by the number of parameters) is smaller than 2 MB, you will get 10%. Otherwise, no points will be awarded.
* 5%: The remaining 5% depends on your ranking within the class.

6. Model accuracy 15%:

* 10%: If your accuracy is higher than 85%, you will get 10%. Otherwise, no points will be awarded.
* 5%: The remaining 5% depends on your ranking within the class.

7. Model accuracy on another dataset 10%: this depends on your ranking within the class.
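
As a rough check against the 2 MB threshold, the parameter count of a fully connected network can be worked out by hand. The sketch below does this for the example architecture given later in Part 4 (784 -> 512 -> 512 -> 10). The names `layer_sizes` and `count_parameters` are hypothetical, and the size estimate assumes float32 parameters (4 bytes each) and 1 MB = 10^6 bytes; how the size is actually measured for grading is not stated here.

```python
# Layer widths of the example MLP from Part 4: 784 -> 512 -> 512 -> 10.
layer_sizes = [28 * 28, 512, 512, 10]


def count_parameters(sizes):
    """Weights plus biases of a fully connected network with these layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


num_params = count_parameters(layer_sizes)
size_mb = num_params * 4 / 1e6  # assumption: float32, 4 bytes per parameter

print(num_params)         # 669706
print(round(size_mb, 2))  # 2.68
```

Under these assumptions the example architecture is above 2 MB, so narrower hidden layers would be needed to earn the size points.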

## Task

Please use Google Colab for implementation.

Following the instructions below, design an MLP to classify images from Fashion-MNIST. Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image associated with a label from 10 classes.

#### Labels

Each training and test example is assigned to one of the following labels:

* 0 T-shirt/top
* 1 Trouser
* 2 Pullover
* 3 Dress
* 4 Coat
* 5 Sandal
* 6 Shirt
* 7 Sneaker
* 8 Bag
* 9 Ankle boot

You can find more information at https://www.tensorflow.org/datasets/catalog/fashion_mnist.
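
When inspecting predictions, it can help to map a class index back to its name. A small sketch using the label table above; `FASHION_MNIST_LABELS` and `label_name` are hypothetical names introduced only for illustration.

```python
# Class names in index order, copied from the label table above.
FASHION_MNIST_LABELS = [
    'T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
    'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot',
]


def label_name(index):
    """Return the human-readable class name for a label index in 0..9."""
    return FASHION_MNIST_LABELS[index]


print(label_name(9))  # Ankle boot
```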

## Rule

1. Please do NOT call any existing library for your implementations.
2. Please do NOT attempt to modify the sections marked `DO NOT MODIFY`.

## Submission

Upload your files to NTU Cool.
* This .ipynb file: please rename it using the format `DL_HW1_StudentID.ipynb`
* Model : .pt file
* Output: .csv file

Deadline: 3/25 midnight (23:59)

Please fill in your student ID number below.

In [None]:
# Please fill in your student ID number
student_id = 'xxxxx'

## Part 1

`DO NOT MODIFY`

Import the necessary libraries


In [None]:
# Model
import torch
import torch.nn as nn

# Dataset
from torchvision import datasets
from torch.utils.data.dataset import Dataset
from torch.utils.data import DataLoader
from scipy.io import loadmat

# Optimizer
from torch.optim.optimizer import Optimizer

# Pre-processing
import torchvision.transforms as trns
from PIL import Image

## Part 2

`DO NOT MODIFY`

Global variables.

Please keep these hyper-parameters unchanged.

In [None]:
batch_size = 16
num_classes = 10
input_dim = (28 * 28)
num_epoch = 10
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

## Part 3

`DO NOT MODIFY`

Create the data loaders with dataset pre-processing.

In [None]:
# Create train/test transforms
train_transform = trns.Compose([
    trns.ToTensor(),
    trns.Lambda(lambda x: torch.flatten(x)),
])

test_transform = trns.Compose([
    trns.ToTensor(),
    trns.Lambda(lambda x: torch.flatten(x)),
])

# Create train/test datasets with pre-processing
# The dataset is downloaded automatically if it does not exist locally
data_train = datasets.FashionMNIST(root='./dataset/', train=True, transform=train_transform, download=True)
data_test = datasets.FashionMNIST(root='./dataset/', train=False, transform=test_transform, download=True)

# Create train/test dataloaders for the pre-processed datasets
train_loader = DataLoader(data_train, batch_size=batch_size, shuffle=True)
test_loader  = DataLoader(data_test,  batch_size=batch_size, shuffle=False)
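
To see what the model receives from these loaders without downloading the dataset, the sketch below mimics the two transforms on random data. `ToTensor()` produces a 1x28x28 float tensor with values in [0, 1], and the `Lambda` flattens each image to 784 values before batching. The random tensor is only a stand-in for real images.

```python
import torch

batch_size = 16

# Stand-in for 16 images after ToTensor(): 1 channel, 28x28, values in [0, 1).
images = torch.rand(batch_size, 1, 28, 28)

# The Lambda transform flattens each image on its own, before batching.
flat_batch = torch.stack([torch.flatten(img) for img in images])

print(flat_batch.shape)  # torch.Size([16, 784])
```

Each row of the batch therefore matches `input_dim = 28 * 28` from Part 2.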

## Part 4

Please implement at least one of the following activation functions (15%), and use it to build your MLP.

![picture](https://drive.google.com/uc?id=1IemALenuP0kuxEyAgfPTT0JqtVaIUbra)

In [None]:
class myActivation(nn.Module):

    def __init__(self):

        super().__init__()

    def forward(self, x):

        # example: identity
        out = x

        return out
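
For illustration, here is a minimal sketch of an activation module built from elementary tensor operations. It implements ReLU, max(0, x); whether ReLU is among the functions listed in the figure above is an assumption, and `ReLUSketch` is a hypothetical name. Autograd derives the backward pass from the forward computation, so no manual gradient is needed.

```python
import torch
import torch.nn as nn


class ReLUSketch(nn.Module):
    """ReLU written with elementary tensor ops: max(0, x) element-wise."""

    def forward(self, x):
        # Keep positive entries, replace everything else with zero.
        return torch.where(x > 0, x, torch.zeros_like(x))


act = ReLUSketch()
x = torch.tensor([-1.0, 0.0, 2.0], requires_grad=True)
out = act(x)
out.sum().backward()

print(out.detach().tolist())  # [0.0, 0.0, 2.0]
print(x.grad.tolist())        # [0.0, 0.0, 1.0]
```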

Please design your MLP architecture (15%).

You can decide the number of layers, the number of hidden neurons of each layer, and the activation function of each layer.
Please note that your score also depends on both the size of your model (the number of parameters) and the accuracy of your model.

In [None]:
class myMLP(nn.Module):

    def __init__(self, input_dim, num_classes):

        super(myMLP, self).__init__()

        # example: 2 hidden layers, 1 output layer MLP
        self.mlp = nn.Sequential(
            nn.Linear(input_dim, 512),
            myActivation(),
            nn.Linear(512, 512),
            myActivation(),
            nn.Linear(512, num_classes)
        )

    def forward(self, x):

        out = self.mlp(x)

        return out

model = myMLP(input_dim, num_classes).to(device)

## Part 5

Please implement the multiclass cross-entropy loss (15%).

![picture](https://drive.google.com/uc?id=1MBwkcvt5thDpc7L8J6htoGEdy0yc2_PN)
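
For reference, one common way to write the multiclass cross-entropy over a batch is shown below. The notation is an assumption and may differ from the figure: $z_{i,c}$ is the model output (logit) for sample $i$ and class $c$, $y_{i,c}$ is the one-hot target, $N$ is the batch size, and $C$ is the number of classes.

$$
p_{i,c} = \frac{\exp(z_{i,c})}{\sum_{k=1}^{C} \exp(z_{i,k})}, \qquad
L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \log p_{i,c}
$$

Because $y_{i,c}$ is one-hot, the inner sum keeps only the log-probability of the true class of each sample.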

In [None]:
class myLoss(nn.Module):

    def __init__(self):

        super(myLoss, self).__init__()

        self.softmax = nn.Softmax(dim=1)

    def forward(self, outputs, targets):

        # Transform targets to one-hot vector
        # ex.
        # 0 => [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
        # 3 => [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
        targets_onehot = torch.zeros_like(outputs)
        targets_onehot.scatter_(1, targets.unsqueeze(-1), 1)

        # example: mean square error loss
        loss = (targets_onehot.float() - outputs) ** 2

        return torch.mean(loss)

criterion = myLoss()
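
A minimal sketch of how a softmax cross-entropy can be computed from raw model outputs with elementary tensor operations. `cross_entropy_sketch` is a hypothetical helper, not the required class interface; it assumes `logits` has shape (batch, classes) and `targets` holds integer class indices. Subtracting the row-wise maximum leaves the softmax unchanged and avoids overflow in `exp`.

```python
import torch


def cross_entropy_sketch(logits, targets):
    """Mean softmax cross-entropy over a batch of raw logits."""
    # Shift each row by its maximum for numerical stability.
    shifted = logits - logits.max(dim=1, keepdim=True).values
    # log softmax = shifted - log(sum(exp(shifted)))
    log_probs = shifted - torch.log(torch.exp(shifted).sum(dim=1, keepdim=True))
    # Pick the log-probability of the true class of each sample.
    picked = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    return -picked.mean()


logits = torch.tensor([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])
loss = cross_entropy_sketch(logits, targets)
print(loss.item())
```

Selecting the true-class entry with `gather` gives the same result as multiplying by the one-hot targets and summing over classes.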

## Part 6

Please implement mini-batch SGD with momentum and weight decay (15%).

![picture](https://drive.google.com/uc?id=1TsnWXvJjjsLZX4apuCVKHewzBA35UP_-)
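
For reference, one common formulation of SGD with momentum and weight decay is written out below. It follows the convention used by `torch.optim.SGD` with zero dampening; the figure above may use a different but related form, so treat the notation as an assumption. Here $\theta$ is a parameter, $\eta$ the learning rate, $\mu$ the momentum coefficient, $\lambda$ the weight decay, and $v_0 = 0$.

$$
g_t = \nabla_\theta L(\theta_{t-1}) + \lambda\, \theta_{t-1}, \qquad
v_t = \mu\, v_{t-1} + g_t, \qquad
\theta_t = \theta_{t-1} - \eta\, v_t
$$

With $\mu = 0$ and $\lambda = 0$ this reduces to the plain mini-batch SGD update in the example code.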

In [None]:
class myOptimizer(Optimizer):

    def __init__(self, params, lr=0.01, momentum=0.0, weight_decay=0.0):

        defaults = dict(lr=lr, momentum=momentum, weight_decay=weight_decay)
        super(myOptimizer, self).__init__(params, defaults)

    def step(self, closure=None):

        loss = None

        if closure is not None:
            loss = closure()

        for group in self.param_groups:

            for p in group['params']:

                if p.grad is None:
                    continue

                # p.data: weight
                # p.grad.data: gradient

                # example: mini-batch SGD
                # new_weight = old_weight + ( -lr * gradient)
                p.data.add_(p.grad.data, alpha=-group['lr'])

                # hint: momentum needs to store additional state, for example 'm_state'.
                # The following code shows how to add a new state to the optimizer and how to read it back.

                # # add new state
                # param_state = self.state[p]
                # if 'm_state' not in param_state:
                #     param_state['m_state'] = xxx

                # # get stored state
                # xxx = param_state['m_state']

        return loss

optimizer = myOptimizer(model.parameters(), 0.01)
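
To make the roles of the three hyper-parameters concrete, here is a scalar sketch in plain Python, applied to f(w) = w**2. It assumes the convention where weight decay is added to the gradient and the velocity accumulates gradients (v = momentum * v + g, then w = w - lr * v); the assignment's figure may define the update differently. `sgd_step` is a hypothetical helper, and the velocity plays the role of the per-parameter state mentioned in the hint.

```python
def sgd_step(w, grad, velocity, lr=0.1, momentum=0.9, weight_decay=0.0):
    """One SGD update with momentum and weight decay for a single scalar weight."""
    g = grad + weight_decay * w         # L2 penalty folded into the gradient
    velocity = momentum * velocity + g  # running accumulation of past gradients
    w = w - lr * velocity               # step against the accumulated direction
    return w, velocity


w, v = 1.0, 0.0
for step in range(3):
    grad = 2.0 * w  # gradient of f(w) = w**2
    w, v = sgd_step(w, grad, v)
    print(step, round(w, 4), round(v, 4))
```

In a tensor implementation the same velocity would be kept per parameter, for example in `self.state[p]`, and updated in place.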

## Part 7

`DO NOT MODIFY`

Model training

In [None]:
model.train()

for epoch in range(num_epoch):

    losses = []

    for batch_num, input_data in enumerate(train_loader):

        optimizer.zero_grad()

        x, y = input_data
        x = x.to(device).float()
        y = y.to(device)

        output = model(x)
        loss = criterion(output, y)
        loss.backward()
        losses.append(loss.item())

        optimizer.step()

        if batch_num % 500 == 0:
            print('\tEpoch %d | Batch %d | Loss %6.4f' % (epoch, batch_num, loss.item()))

    print('Epoch %d | Loss %6.4f' % (epoch, sum(losses)/len(losses)))

torch.save(model, student_id + '_submission.pt')
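
Note that `torch.save(model, ...)` pickles the whole module rather than only its weights, so the class definitions must be available when the file is loaded. The sketch below saves a small stand-in model to a temporary directory, checks the file size on disk, and loads it back. The stand-in architecture and file name are assumptions; `weights_only=False` is passed explicitly because recent PyTorch releases default to loading weights only.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Stand-in for a trained model; built from standard modules so it can be unpickled anywhere.
model = nn.Sequential(nn.Linear(28 * 28, 32), nn.ReLU(), nn.Linear(32, 10))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'xxxxx_submission.pt')
    torch.save(model, path)

    # File size on disk in MB (1 MB = 10**6 bytes assumed).
    size_mb = os.path.getsize(path) / 1e6

    # Load the full pickled module back.
    reloaded = torch.load(path, weights_only=False)

print(round(size_mb, 3))
print(type(reloaded).__name__)  # Sequential
```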

## Part 8

`DO NOT MODIFY`

Model evaluation

In [None]:
import csv
model.eval()

with open(student_id + '_submission.csv', 'w') as f:

    fieldnames = ['ImageId', 'Prediction', 'Label']

    writer = csv.DictWriter(f, fieldnames=fieldnames, lineterminator = '\n')
    writer.writeheader()

    correct = 0
    total = 0

    with torch.no_grad():

        for x, t in test_loader:

            x = x.to(device).float()
            output = model(x).argmax(dim=1)

            for y,l in zip(output, t):

                writer.writerow({fieldnames[0]: (total+1),
                                 fieldnames[1]: y.item(),
                                 fieldnames[2]: l.item()})

                total += 1
                if y.item() == l.item():
                    correct += 1

    print('Accuracy: %6.4f' % (correct / total))
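
The submission CSV has the columns `ImageId`, `Prediction`, and `Label`, so the accuracy can be recomputed from the file as a sanity check. A minimal sketch follows; `accuracy_from_csv` is a hypothetical helper and the four rows are made-up sample data, not real results.

```python
import csv
import io


def accuracy_from_csv(f):
    """Fraction of rows whose Prediction equals Label."""
    rows = list(csv.DictReader(f))
    correct = sum(row['Prediction'] == row['Label'] for row in rows)
    return correct / len(rows)


# Made-up sample in the same format as the submission file.
sample = io.StringIO('ImageId,Prediction,Label\n1,9,9\n2,2,2\n3,1,6\n4,1,1\n')
print(accuracy_from_csv(sample))  # 0.75
```

For a real file, pass an open file handle instead, for example `open(student_id + '_submission.csv')`.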