Name:   Anh Tuan Tran
Matrikelnummer:  7015463
Email:   antr00001@stud.uni-saarland.de
   
Name:   Deborah Dormah Kanubala
Matrikelnummer:   7025906
Email:  dkanubala@aimsammi.org

Name:    Irem Begüm Gündüz
Matrikelnummer:     7026821
Email: irgu00001@stud.uni-saarland.de

#### Preamble

In [44]:
# Import necessary libraries
import numpy as np
import torch
import torch.nn as nn
from sklearn.metrics import f1_score
from torchvision import datasets, transforms

# 7.5 Build your own regularized NN

In this exercise you get to use your previously built networks, but this time you need to add regularization in the form of dropout and $L_2$-regularization.

Each layer has the option of using dropout. Your code needs to allow for this flexibility.
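
As a small illustration of what a per-layer dropout switch implies, the sketch below shows how `nn.Dropout` behaves in training versus evaluation mode. The tensor size and `p=0.2` are arbitrary choices for the demo, not values required by the exercise.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
p = 0.2
drop = nn.Dropout(p=p)
x = torch.ones(10000)

# Training mode: each element is zeroed with probability p and the
# survivors are scaled by 1 / (1 - p) so the expected value is unchanged.
drop.train()
out_train = drop(x)
dropped_fraction = (out_train == 0).float().mean().item()
survivors = out_train[out_train != 0]

# Evaluation mode: dropout is the identity function.
drop.eval()
out_eval = drop(x)

print("fraction dropped in train mode:", round(dropped_fraction, 3))
print("value of surviving elements:", survivors[0].item())
print("eval output equals input:", torch.equal(out_eval, x))
```

This is why a model must be switched with `model.train()` before training and `model.eval()` before computing metrics.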

Additionally, adding $L_2$-regularization should also be optional upon creation.

**NOTE**: You are allowed to use built-in functions from pytorch to incorporate this functionality.
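
One built-in way to get $L_2$-regularization is the `weight_decay` argument of the optimizers in `torch.optim`. The sketch below checks, for a single plain SGD step on a toy linear model, that `weight_decay=wd` gives the same update as adding the explicit penalty $\frac{wd}{2}\sum_p \lVert p \rVert^2$ to the loss. The toy data, the use of SGD, and the values of `wd` and `lr` are assumptions made only for this demo.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
wd, lr = 1e-2, 0.1
x = torch.randn(8, 4)
y = torch.randn(8, 1)

# Two linear models with identical starting parameters
model_a = nn.Linear(4, 1)
model_b = nn.Linear(4, 1)
model_b.load_state_dict(model_a.state_dict())

# (a) Built-in: weight_decay adds wd * p to the gradient of every parameter p
opt_a = torch.optim.SGD(model_a.parameters(), lr=lr, weight_decay=wd)
opt_a.zero_grad()
nn.functional.mse_loss(model_a(x), y).backward()
opt_a.step()

# (b) Explicit: add (wd / 2) * sum of squared parameters to the loss
opt_b = torch.optim.SGD(model_b.parameters(), lr=lr)
opt_b.zero_grad()
penalty = sum(p.pow(2).sum() for p in model_b.parameters())
loss_b = nn.functional.mse_loss(model_b(x), y) + 0.5 * wd * penalty
loss_b.backward()
opt_b.step()

same = all(
    torch.allclose(pa, pb, atol=1e-6)
    for pa, pb in zip(model_a.parameters(), model_b.parameters())
)
print("weight_decay matches explicit L2 penalty:", same)
```

Note that `weight_decay` as used here also penalizes the bias terms, because it is applied to every parameter handed to the optimizer.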

### 7.5.1 Implement a regularized model (1 point)

Implement your own model (using `torch`) using the skeleton code provided.

In [33]:
class Model(nn.Module):
    """
    Implement a model that incorporates dropout and L2 regularization
    depending on arguments passed.
    
    Args:
    input_dim: dimensionality of the inputs
    hidden_dim: how many units each hidden layer will have
    out_dim: how many output units
    num_layers: how many hidden layers to create/use
    dropout: a list of booleans specifying which hidden layers will have dropout
    dropout_p: the probability used for the `Dropout` layers
    l2_reg: a boolean value that indicates whether L2 regularization should be used
    """
    def __init__(self,
                 input_dim: int,
                 hidden_dim: int,
                 out_dim: int,
                 num_layers: int,
                 dropout: list,
                 dropout_p: float,
                 l2_reg: bool):
        super().__init__()
        if len(dropout) != num_layers:
            raise ValueError("dropout needs one boolean per hidden layer")
        # Whether to use L2 regularization; train_loop reads this flag and
        # applies it through the optimizer's weight_decay.
        self.l2_reg = l2_reg
        self.input_dim = input_dim

        # Hidden layers: Linear -> ReLU -> (optional) Dropout
        layers = []
        in_dim = input_dim
        for i in range(num_layers):
            layers.append(nn.Linear(in_dim, hidden_dim))
            layers.append(nn.ReLU())
            if dropout[i]:
                layers.append(nn.Dropout(p=dropout_p))
            in_dim = hidden_dim

        # Output layer returns raw logits (CrossEntropyLoss applies log-softmax)
        layers.append(nn.Linear(in_dim, out_dim))
        self.layers = nn.Sequential(*layers)

    def forward(self, inp):
        # Flatten each image into a vector of length input_dim
        inp = inp.view(-1, self.input_dim)
        return self.layers(inp)
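
A quick sanity check of the layer layout can be sketched as follows. `build_layers` is a hypothetical stand-alone helper that mirrors the Linear -> ReLU -> optional Dropout pattern used in `Model`; it is not part of the assignment skeleton, and the random batch only imitates the shape of an MNIST batch.

```python
import torch
import torch.nn as nn

def build_layers(input_dim, hidden_dim, out_dim, dropout, dropout_p):
    """Hypothetical helper: one hidden block per entry in `dropout`."""
    layers, in_dim = [], input_dim
    for use_dropout in dropout:
        layers += [nn.Linear(in_dim, hidden_dim), nn.ReLU()]
        if use_dropout:
            layers.append(nn.Dropout(p=dropout_p))
        in_dim = hidden_dim
    layers.append(nn.Linear(in_dim, out_dim))
    return nn.Sequential(*layers)

# Three hidden layers, dropout after the first and third only
net = build_layers(28 * 28, 128, 10, dropout=[True, False, True], dropout_p=0.2)
n_dropout = sum(isinstance(m, nn.Dropout) for m in net)

fake_batch = torch.randn(32, 1, 28, 28)          # same shape as an MNIST batch
logits = net(fake_batch.view(-1, 28 * 28))       # flatten, then forward pass
print("dropout layers:", n_dropout, "| output shape:", tuple(logits.shape))
```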

### 7.5.2 Experiment with your model (1 point)

Use the MNIST dataset and evaluation code from the previous assignment to run some experiments. Run the following experiments:

1. Shallow network (not more than 1 hidden layer)
1. Shallow regularized network
1. Deep network (at least 3 hidden layers)
1. Deep regularized network

Report Accuracy and $F_1$ metrics for your experiments and discuss your results. What did you expect to see, and what did you end up seeing?

**NOTE**: You can choose how you use regularization. Ideally you would experiment with various parameters for this regularization, the 4 listed variants are merely what you must cover as a minimum. Report results for all your experiments concisely in a table.

**NOTE 2**: Make sure to report your metrics on the training and evaluation/heldout sets.
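
When choosing how to report $F_1$, it helps to know that micro-averaged $F_1$ coincides with accuracy for single-label multiclass problems such as MNIST, while macro-averaged $F_1$ weights every class equally. The dependency-free sketch below illustrates this on made-up labels; `f1_report` is a hypothetical helper, not a library function.

```python
from collections import Counter

def f1_report(y_true, y_pred):
    """Return (accuracy, micro-F1, macro-F1) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted class p wrongly
            fn[t] += 1   # missed true class t
    accuracy = sum(tp.values()) / len(y_true)

    # Micro: pool the counts over all classes, then compute one F1
    TP, FP, FN = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro = 2 * TP / (2 * TP + FP + FN)

    # Macro: compute F1 per class, then take the unweighted mean
    per_class = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        per_class.append(2 * tp[c] / denom if denom else 0.0)
    macro = sum(per_class) / len(per_class)
    return accuracy, micro, macro

# Toy, imbalanced example (made-up labels)
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 1, 0, 2, 0]
accuracy, micro, macro = f1_report(y_true, y_pred)
print(f"accuracy={accuracy:.4f}  micro-F1={micro:.4f}  macro-F1={macro:.4f}")
```

If class-level differences matter, macro-averaged $F_1$ is the more informative of the two.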

In [6]:
# Load the data
# DO NOT CHANGE THE CODE IN THIS CELL EXCEPT FOR THE BATCH SIZE IF NECESSARY
transform_fn = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.7,), (0.7,)),])

mnist_train = datasets.MNIST(root='./data', train=True, download=True, transform=transform_fn)
train_dl = torch.utils.data.DataLoader(mnist_train, batch_size=32, shuffle=True)

mnist_test = datasets.MNIST(root='./data', train=False, download=True, transform=transform_fn)
test_dl = torch.utils.data.DataLoader(mnist_test, batch_size=32, shuffle=False)

# Use the above data for your experiments

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz
Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz
Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz
Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw

In [75]:
def train_loop(data, model, loss_fn, l2_weight=1e-2, num_epoch=4, learning_rate=1e-2):
    # Adam's weight_decay adds l2_weight * param to each gradient (an L2 penalty);
    # 0.0 is the optimizer's default and disables it.
    weight_decay = l2_weight if model.l2_reg else 0.0
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
    model.train()  # enable dropout
    losses = []
    for i_epoch in range(num_epoch):
        epoch_losses = []
        for x, y in data:
            optimizer.zero_grad()
            y_pred = model(x)
            loss = loss_fn(y_pred, y)
            loss.backward()
            optimizer.step()
            epoch_losses.append(loss.detach().numpy())
        # Report the mean of the per-batch losses after every epoch
        print("epoch:", i_epoch, "mean epoch loss:", np.mean(epoch_losses))
        losses.append(np.mean(epoch_losses))
    return losses

def evaluate_loop(data, model, loss_fn, set_name='Test'):
    model.eval()  # disable dropout
    test_loss, correct, n_samples = 0, 0, 0
    preds, gts = [], []

    with torch.no_grad():
        for X, y in data:
            n_samples += y.size(0)
            logits = model(X)
            test_loss += loss_fn(logits, y).item()
            batch_pred = logits.argmax(1)
            correct += (batch_pred == y).sum().item()
            preds.extend(batch_pred.tolist())
            gts.extend(y.tolist())

    # NOTE: loss_fn returns a per-batch mean, so this is the sum of batch means
    # divided by the sample count, i.e. roughly (mean loss / batch size).
    test_loss /= n_samples
    correct /= n_samples
    # f1_score expects (y_true, y_pred); micro-averaged F1 equals accuracy
    # for single-label multiclass data.
    f1_scores = f1_score(gts, preds, average='micro')

    print(set_name + f" Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f}")
    print(set_name + " f1 scores: ", f1_scores)
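
A note on the "Avg loss" value: `nn.CrossEntropyLoss()` returns the mean over a batch by default, and `evaluate_loop` sums these batch means and then divides by the number of samples. The printed number is therefore about the mean loss divided by the batch size. The sketch below, with made-up batch means, contrasts that scaling with a sample-weighted mean; `per_sample_mean` is a hypothetical helper.

```python
def per_sample_mean(batch_means, batch_sizes):
    """Sample-weighted average of per-batch mean losses."""
    total = sum(m * n for m, n in zip(batch_means, batch_sizes))
    return total / sum(batch_sizes)

# Made-up per-batch mean losses; the last batch is smaller, as happens
# when the dataset size is not a multiple of the batch size.
batch_means = [0.30, 0.36, 0.50]
batch_sizes = [32, 32, 16]

# Sum of batch means divided by the sample count (the scaling used above)
as_logged = sum(batch_means) / sum(batch_sizes)
# Average loss per sample
weighted = per_sample_mean(batch_means, batch_sizes)

print(f"sum of batch means / samples: {as_logged:.4f}")
print(f"sample-weighted mean loss:    {weighted:.4f}")
```

The two numbers differ by roughly a factor of the batch size, which is worth keeping in mind when comparing the "Avg loss" values with the training-loop losses.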
    

In [68]:
input_dim = 28 * 28
output_dim = 10
loss_fn = nn.CrossEntropyLoss()
learning_rate = 1e-2
hidden_dim = 128
l2_weight = 0.01

#### Shallow network (no L2 regularization, no dropout)

In [69]:
torch.manual_seed (10)

num_layers = 1
dropout = [False] * num_layers
dropout_p = 0.2
l2_reg = False

model = Model(input_dim, hidden_dim, output_dim, num_layers, dropout, dropout_p, l2_reg)
train_loop(train_dl, model, loss_fn, num_epoch=10)
evaluate_loop(test_dl, model, loss_fn)
evaluate_loop(train_dl, model, loss_fn, "Train")

epoch: 0 mean epoch loss: 0.5171073
epoch: 1 mean epoch loss: 0.39869726
epoch: 2 mean epoch loss: 0.36913988
epoch: 3 mean epoch loss: 0.36770475
epoch: 4 mean epoch loss: 0.35216275
epoch: 5 mean epoch loss: 0.34529477
epoch: 6 mean epoch loss: 0.3470298
epoch: 7 mean epoch loss: 0.33369783
epoch: 8 mean epoch loss: 0.32655314
epoch: 9 mean epoch loss: 0.33219647
Test Accuracy: 91.0%, Avg loss: 0.010272
Test f1 scores:  0.9098
Train Accuracy: 91.0%, Avg loss: 0.009636
Train f1 scores:  0.9100666666666667


#### Shallow network (L2 regularization and dropout on all layers)

In [70]:
torch.manual_seed (10)

num_layers = 1
dropout = [True] * num_layers
dropout_p = 0.2
l2_reg = True

model = Model(input_dim, hidden_dim, output_dim, num_layers, dropout, dropout_p, l2_reg)
train_loop(train_dl, model, loss_fn, num_epoch=10)
evaluate_loop(test_dl, model, loss_fn)
evaluate_loop(train_dl, model, loss_fn, "Train")

epoch: 0 mean epoch loss: 1.1116096
epoch: 1 mean epoch loss: 1.2757297
epoch: 2 mean epoch loss: 1.2971585
epoch: 3 mean epoch loss: 1.2848763
epoch: 4 mean epoch loss: 1.2920935
epoch: 5 mean epoch loss: 1.3280604
epoch: 6 mean epoch loss: 1.4148575
epoch: 7 mean epoch loss: 1.3138914
epoch: 8 mean epoch loss: 1.4355352
epoch: 9 mean epoch loss: 1.3010944
Test Accuracy: 77.8%, Avg loss: 0.030824
Test f1 scores:  0.7781
Train Accuracy: 76.7%, Avg loss: 0.031459
Train f1 scores:  0.7670166666666667


#### Deep network (3 hidden layers, no L2 regularization, no dropout)

In [85]:
torch.manual_seed (10)

num_layers = 3
dropout = [False] * num_layers
dropout_p = 0.2
l2_reg = False

model = Model(input_dim, hidden_dim, output_dim, num_layers, dropout, dropout_p, l2_reg)
train_loop(train_dl, model, loss_fn, num_epoch=10, learning_rate=1e-3)
evaluate_loop(test_dl, model, loss_fn)
evaluate_loop(train_dl, model, loss_fn, "Train")

epoch: 0 mean epoch loss: 0.42152494
epoch: 1 mean epoch loss: 0.2077022
epoch: 2 mean epoch loss: 0.16131927
epoch: 3 mean epoch loss: 0.13877788
epoch: 4 mean epoch loss: 0.12274186
epoch: 5 mean epoch loss: 0.10574757
epoch: 6 mean epoch loss: 0.096944235
epoch: 7 mean epoch loss: 0.08912283
epoch: 8 mean epoch loss: 0.0848133
epoch: 9 mean epoch loss: 0.07850643
Test Accuracy: 96.9%, Avg loss: 0.003288
Test f1 scores:  0.9693
Train Accuracy: 98.2%, Avg loss: 0.001741
Train f1 scores:  0.9815166666666667


#### Deep network (3 hidden layers, L2 regularization and dropout on all layers)

In [86]:
torch.manual_seed (10)

num_layers = 3
dropout = [True] * num_layers
dropout_p = 0.2
l2_reg = True

model = Model(input_dim, hidden_dim, output_dim, num_layers, dropout, dropout_p, l2_reg)
train_loop(train_dl, model, loss_fn, num_epoch=10, learning_rate=1e-3)
evaluate_loop(test_dl, model, loss_fn)
evaluate_loop(train_dl, model, loss_fn, "Train")

epoch: 0 mean epoch loss: 0.6207891
epoch: 1 mean epoch loss: 0.44457653
epoch: 2 mean epoch loss: 0.42135078
epoch: 3 mean epoch loss: 0.41527197
epoch: 4 mean epoch loss: 0.4058945
epoch: 5 mean epoch loss: 0.40442845
epoch: 6 mean epoch loss: 0.40433857
epoch: 7 mean epoch loss: 0.40379792
epoch: 8 mean epoch loss: 0.40211317
epoch: 9 mean epoch loss: 0.4072769
Test Accuracy: 92.7%, Avg loss: 0.007682
Test f1 scores:  0.9273
Train Accuracy: 92.7%, Avg loss: 0.007810
Train f1 scores:  0.9268166666666666


#### Deep network (5 hidden layers, no L2 regularization, no dropout)

In [83]:
torch.manual_seed (10)

num_layers = 5
dropout = [False] * num_layers
dropout_p = 0.2
l2_reg = False

model = Model(input_dim, hidden_dim, output_dim, num_layers, dropout, dropout_p, l2_reg)
train_loop(train_dl, model, loss_fn, num_epoch=10, learning_rate=1e-3)
evaluate_loop(test_dl, model, loss_fn)
evaluate_loop(train_dl, model, loss_fn, "Train")

epoch: 0 mean epoch loss: 0.48359734
epoch: 1 mean epoch loss: 0.23092112
epoch: 2 mean epoch loss: 0.17692493
epoch: 3 mean epoch loss: 0.1572411
epoch: 4 mean epoch loss: 0.13244267
epoch: 5 mean epoch loss: 0.12145918
epoch: 6 mean epoch loss: 0.11138751
epoch: 7 mean epoch loss: 0.096630335
epoch: 8 mean epoch loss: 0.09768703
epoch: 9 mean epoch loss: 0.0872717
Test Accuracy: 96.2%, Avg loss: 0.004154
Test f1 scores:  0.9617
Train Accuracy: 97.3%, Avg loss: 0.002636
Train f1 scores:  0.9727333333333333


#### Deep network (5 hidden layers, dropout on all layers, no L2 regularization)

In [82]:
torch.manual_seed (10)

num_layers = 5
dropout = [True] * num_layers
dropout_p = 0.2
l2_reg = False

model = Model(input_dim, hidden_dim, output_dim, num_layers, dropout, dropout_p, l2_reg)
train_loop(train_dl, model, loss_fn, num_epoch=10, learning_rate=1e-3)
evaluate_loop(test_dl, model, loss_fn)
evaluate_loop(train_dl, model, loss_fn, "Train")

epoch: 0 mean epoch loss: 0.7416826
epoch: 1 mean epoch loss: 0.43363568
epoch: 2 mean epoch loss: 0.3850245
epoch: 3 mean epoch loss: 0.361695
epoch: 4 mean epoch loss: 0.33900133
epoch: 5 mean epoch loss: 0.31971908
epoch: 6 mean epoch loss: 0.3133735
epoch: 7 mean epoch loss: 0.30384713
epoch: 8 mean epoch loss: 0.29254916
epoch: 9 mean epoch loss: 0.29017526
Test Accuracy: 95.0%, Avg loss: 0.005383
Test f1 scores:  0.9501
Train Accuracy: 95.2%, Avg loss: 0.005090
Train f1 scores:  0.9520166666666665


#### Deep network (5 hidden layers, dropout on the first hidden layer only)

In [80]:
torch.manual_seed (10)

num_layers = 5
dropout = [False] * num_layers
dropout[0] = True
dropout_p = 0.2
l2_reg = False

model = Model(input_dim, hidden_dim, output_dim, num_layers, dropout, dropout_p, l2_reg)
train_loop(train_dl, model, loss_fn, num_epoch=10, learning_rate=1e-3)
evaluate_loop(test_dl, model, loss_fn)
evaluate_loop(train_dl, model, loss_fn, "Train")

epoch: 0 mean epoch loss: 0.62281764
epoch: 1 mean epoch loss: 0.33964366
epoch: 2 mean epoch loss: 0.29113665
epoch: 3 mean epoch loss: 0.26444286
epoch: 4 mean epoch loss: 0.24793516
epoch: 5 mean epoch loss: 0.23228389
epoch: 6 mean epoch loss: 0.22169606
epoch: 7 mean epoch loss: 0.20569672
epoch: 8 mean epoch loss: 0.20357373
epoch: 9 mean epoch loss: 0.19426337
Test Accuracy: 95.8%, Avg loss: 0.004240
Test f1 scores:  0.9576
Train Accuracy: 96.3%, Avg loss: 0.003743
Train f1 scores:  0.9627666666666667


#### Deep network (5 hidden layers, L2 regularization only, no dropout)

In [84]:
torch.manual_seed (10)

num_layers = 5
dropout = [False] * num_layers
dropout_p = 0.2
l2_reg = True

model = Model(input_dim, hidden_dim, output_dim, num_layers, dropout, dropout_p, l2_reg)
train_loop(train_dl, model, loss_fn, num_epoch=10, learning_rate=1e-3)
evaluate_loop(test_dl, model, loss_fn)
evaluate_loop(train_dl, model, loss_fn, "Train")

epoch: 0 mean epoch loss: 1.1919228
epoch: 1 mean epoch loss: 0.53703046
epoch: 2 mean epoch loss: 0.45945904
epoch: 3 mean epoch loss: 0.42421094
epoch: 4 mean epoch loss: 0.4101319
epoch: 5 mean epoch loss: 0.3975013
epoch: 6 mean epoch loss: 0.39467883
epoch: 7 mean epoch loss: 0.3911738
epoch: 8 mean epoch loss: 0.38654754
epoch: 9 mean epoch loss: 0.3843605
Test Accuracy: 89.6%, Avg loss: 0.011126
Test f1 scores:  0.8957
Train Accuracy: 89.9%, Avg loss: 0.011001
Train f1 scores:  0.89875
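
The logged results can be gathered into one table, for example with the sketch below. The numbers are copied from the outputs above and rounded to four decimals; only $F_1$ is listed because the micro-averaged $F_1$ printed by `evaluate_loop` matches the accuracy. The `runs` list and `format_table` are hypothetical names introduced for this summary.

```python
# (hidden layers, regularization, learning rate, train F1, test F1),
# copied from the logged outputs and rounded to four decimals
runs = [
    (1, "none",                  1e-2, 0.9101, 0.9098),
    (1, "L2 + dropout",          1e-2, 0.7670, 0.7781),
    (3, "none",                  1e-3, 0.9815, 0.9693),
    (3, "L2 + dropout",          1e-3, 0.9268, 0.9273),
    (5, "none",                  1e-3, 0.9727, 0.9617),
    (5, "dropout (all layers)",  1e-3, 0.9520, 0.9501),
    (5, "dropout (first layer)", 1e-3, 0.9628, 0.9576),
    (5, "L2",                    1e-3, 0.8988, 0.8957),
]

def format_table(rows):
    """Render the runs as a fixed-width text table with a train/test gap column."""
    header = (f"{'layers':>6}  {'regularization':<22}{'lr':>7}"
              f"  {'train F1':>8}  {'test F1':>8}  {'gap':>7}")
    lines = [header, "-" * len(header)]
    for n_layers, reg, lr, train_f1, test_f1 in rows:
        gap = train_f1 - test_f1
        lines.append(f"{n_layers:>6}  {reg:<22}{lr:>7.0e}"
                     f"  {train_f1:>8.4f}  {test_f1:>8.4f}  {gap:>+7.4f}")
    return "\n".join(lines)

table = format_table(runs)
print(table)

# Configuration with the highest test F1
best = max(runs, key=lambda r: r[4])
print("best test F1:", best)
```

In these runs the unregularized 3-layer network has the highest test $F_1$, and each regularized variant scores below its unregularized counterpart while showing a smaller train/test gap. A plausible but untested explanation is that `l2_weight=1e-2` is a strong penalty for only 10 epochs of training. The shallow runs also used a learning rate of 1e-2 rather than 1e-3, so shallow and deep results are not directly comparable.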


### 7.5.3 Get the best model! (1 + 1 point (bonus))

* Present your model during a tutorial session. Justify your decisions when designing your model/solution.
* If you achieve one of the top N results, you get yet another extra point!