Name: Subrat Kishore Dutta  
Matrikelnummer:  7028082
Email:   subratkishoredutta1234@gmail.com,sudu00001@stud.uni-saarland.de
   
Name:   Prathvish Mithare
Matrikelnummer:   7028692
Email: prmi00001@stud.uni-saarland.de

#### Preamble

In [1]:
# TODO: Import necessary libraries
import torch
import torch.nn as nn
from torchvision import transforms
from torchvision import datasets
import torch.nn.functional as F
from tqdm import tqdm
from torchmetrics import F1Score
import matplotlib.pyplot as plt

In [2]:
torch.cuda.is_available()
device = torch.device("cpu")

# 7.5 Build your own regularized NN

In this exercise you get to use your previously built networks, but this time you need to add regularization in the form of dropout and $L_2$-regularization.

Each layer has the option of using dropout. Your code needs to allow for this flexibility.
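
As a small illustration of what a built-in dropout layer does (a sketch, not part of the assignment skeleton): in training mode `nn.Dropout` zeroes entries with probability `p` and rescales the survivors by `1/(1-p)`, while in evaluation mode it passes the input through unchanged. The all-ones tensor `x` and the value `p = 0.5` are arbitrary choices for the demo.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

p = 0.5                      # illustrative dropout probability
drop = nn.Dropout(p=p)
x = torch.ones(1000)

drop.train()                 # training mode: entries are dropped at random
y_train = drop(x)

drop.eval()                  # evaluation mode: dropout is the identity
y_eval = drop(x)

# Surviving entries are scaled by 1 / (1 - p), which preserves the expected value.
kept = y_train[y_train != 0]
print("value of kept entries:", kept[0].item())
print("fraction kept:", (y_train != 0).float().mean().item())
print("eval output unchanged:", torch.equal(y_eval, x))
```

This is why a model containing dropout has to be switched to evaluation mode before metrics are computed.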

Additionally, adding $L_2$-regularization should also be optional upon creation.

**NOTE**: You are allowed to use built-in functions from pytorch to incorporate this functionality.
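
One built-in route to $L_2$-regularization is the optimizer's `weight_decay` argument. The sketch below checks, on a toy linear layer, that a single SGD step with `weight_decay=wd` matches a step on the loss plus $\frac{wd}{2}\sum_p \lVert p \rVert^2$. The layer size, data, `wd` and `lr` are illustrative assumptions. For adaptive optimizers such as Adam the penalty gradient is rescaled by the optimizer, so this exact equivalence is only demonstrated for plain SGD.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

wd, lr = 0.1, 0.5                       # illustrative values
x, y = torch.randn(8, 4), torch.randn(8, 1)

layer_a = nn.Linear(4, 1)
layer_b = copy.deepcopy(layer_a)        # identical starting weights

# (a) built-in: the optimizer adds wd * p to every parameter gradient
opt_a = torch.optim.SGD(layer_a.parameters(), lr=lr, weight_decay=wd)
opt_a.zero_grad()
nn.functional.mse_loss(layer_a(x), y).backward()
opt_a.step()

# (b) manual: add (wd / 2) * sum of squared parameters to the loss
opt_b = torch.optim.SGD(layer_b.parameters(), lr=lr)
opt_b.zero_grad()
penalty = sum(p.pow(2).sum() for p in layer_b.parameters())
(nn.functional.mse_loss(layer_b(x), y) + 0.5 * wd * penalty).backward()
opt_b.step()

same = all(torch.allclose(pa, pb, atol=1e-6)
           for pa, pb in zip(layer_a.parameters(), layer_b.parameters()))
print("identical update:", same)
```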

### 7.5.1 Implement a regularized model (1 point)

Implement your own model (using `torch`) using the skeleton code provided.

In [4]:
class Model(nn.Module):
    """
    Implement a model that incorporates dropout and L2 regularization
    depending on arguments passed.
    
    Args:
    input_dim: dimensionality of the inputs
    hidden_dim: how many units each hidden layer will have
    out_dim: how many output units
    num_layers: how many hidden layers to create/use
    dropout: a list of booleans specifying which hidden layers will have dropout
    dropout_p: the probability used for the `Dropout` layers
    l2_reg: a boolean value that indicates whether L2 regularization should be used
    """
    # TODO: Implement
    def __init__(self, input_dim: int, hidden_dim: int, out_dim: int, num_layers: int, dropout: list, dropout_p: float,
                 l2_reg: bool):
        super(Model, self).__init__()
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.out_dim = out_dim
        self.num_layers = num_layers
        self.dropout = dropout
        self.dropout_p = dropout_p
        self.l2_reg = l2_reg
        # Note: fci is followed by a ReLU, so the network has num_layers + 1 hidden layers in total.
        self.fci = nn.Linear(self.input_dim, self.hidden_dim)
        # One separate Linear per hidden layer; reusing a single layer would share its weights across the depth.
        self.fch = nn.ModuleList([nn.Linear(self.hidden_dim, self.hidden_dim) for _ in range(self.num_layers)])
        self.fco = nn.Linear(self.hidden_dim, self.out_dim)
        self.m = nn.Dropout(p=self.dropout_p)

    def forward(self, x):
        out = F.relu(self.fci(x))
        for num, layer in enumerate(self.fch):
            out = F.relu(layer(out))
            if self.dropout[num]:
                out = self.m(out)
        # Return raw logits: nn.CrossEntropyLoss applies log-softmax itself.
        return self.fco(out)

    def train(self, train_loader, learning_rate=0.01, epochs=5, lam=0.01):
        # This method shadows nn.Module.train(mode), so training mode is set through the base class.
        nn.Module.train(self, True)
        loss_fn = nn.CrossEntropyLoss()
        optimizer = torch.optim.Adam(self.parameters(), lr=float(learning_rate))
        ep = []
        lossrec = []
        for epoch in range(epochs):
            tloss = 0.0
            ep.append(epoch)
            for xs, ys in train_loader:
                xs = xs.to(device)
                ys = ys.to(device)
                pred = self(xs.view(-1, 28*28))
                loss = loss_fn(pred, ys)
                if self.l2_reg:
                    # L2 penalty: sum of squared parameter entries (not the sum of unsquared norms).
                    l2 = sum(p.pow(2).sum() for p in self.parameters())
                    loss = loss + lam * l2

                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
                # Accumulate a Python float so the autograd graph of each batch can be freed.
                tloss += loss.item()
            avg_loss = tloss / len(train_loader)
            lossrec.append(torch.log(torch.tensor(avg_loss)).item())
            print('epoch:', epoch, 'loss:', avg_loss)

        plt.figure(figsize=(10,10))
        plt.plot(ep, lossrec, color='orange')
        plt.xlabel('epochs')
        plt.ylabel('log loss')
        return lossrec

    def test(self, test_loader):
        # Evaluation mode disables dropout; model.eval() would dispatch to the train() method above.
        nn.Module.train(self, False)
        loss_fn = nn.CrossEntropyLoss()
        loss = 0.0
        with torch.no_grad():
            for xs, ys in test_loader:
                xs = xs.to(device)
                ys = ys.to(device)
                pred = self(xs.view(-1, 28*28))
                loss += loss_fn(pred, ys).item()
        print(loss/len(test_loader))
        return loss/len(test_loader)
            
    

In [5]:
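##function to get accuracy:
def get_accuracy(data,model):
    # Evaluation mode (dropout off). nn.Module.train is called directly because
    # Model defines its own train() method, which model.eval() would dispatch to.
    nn.Module.train(model, False)
    accdata=torch.utils.data.DataLoader(data,batch_size=len(data))
    with torch.no_grad():
        for X,Y in accdata:
            X=X.to(device)
            Y=Y.to(device)
            ypred=model(X.view(-1,28*28))
            correct = torch.sum(ypred.argmax(1) == Y)
            accuracy = correct/len(data)
    print(accuracy.item()*100,"%")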

In [6]:
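def get_f1(data,model):
    # Evaluation mode (dropout off); see get_accuracy for why nn.Module.train is called directly.
    nn.Module.train(model, False)
    accdata=torch.utils.data.DataLoader(data,batch_size=len(data))
    # Macro averaging is requested explicitly: micro-averaged F1 coincides with
    # accuracy for single-label classification and would add no information.
    metric = F1Score(task="multiclass", num_classes=10, average="macro").to(device)
    with torch.no_grad():
        for X,Y in accdata:
            X=X.to(device)
            Y=Y.to(device)
            ypred=model(X.view(-1,28*28))
            f1 = metric(ypred.argmax(1),Y)
    print("F1 score:",f1.item())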

### 7.5.2 Experiment with your model (1 point)

Use the MNIST dataset and evaluation code from the previous assignment to run some experiments. Run the following experiments:

1. Shallow network (not more than 1 hidden layer)
1. Shallow regularized network
1. Deep network (at least 3 hidden layers)
1. Deep regularized network

Report Accuracy and $F_1$ metrics for your experiments and discuss your results. What did you expect to see and what did you end up seeing.

**NOTE**: You can choose how you use regularization. Ideally you would experiment with various parameters for this regularization, the 4 listed variants are merely what you must cover as a minimum. Report results for all your experiments concisely in a table.

**NOTE 2**: Make sure to report your metrics on the training and evaluation/heldout sets.
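
To make the two requested metrics concrete, here is a minimal sketch that computes accuracy and macro-averaged $F_1$ from a confusion matrix in plain `torch` and prints one table row. The helper name `accuracy_and_macro_f1` and the toy labels are hypothetical and are not MNIST results. Classes that never occur are assigned $F_1 = 0$, which is one possible convention.

```python
import torch

def accuracy_and_macro_f1(preds: torch.Tensor, targets: torch.Tensor, num_classes: int):
    """Return (accuracy, macro-averaged F1) for integer class predictions."""
    # Confusion matrix: rows are true classes, columns are predicted classes.
    idx = targets * num_classes + preds
    cm = torch.bincount(idx, minlength=num_classes ** 2)
    cm = cm.reshape(num_classes, num_classes).float()
    tp = cm.diag()
    fp = cm.sum(dim=0) - tp          # predicted as class c but belonging elsewhere
    fn = cm.sum(dim=1) - tp          # belonging to class c but predicted elsewhere
    # Per-class F1 = 2TP / (2TP + FP + FN); empty classes get 0.
    denom = 2 * tp + fp + fn
    f1 = torch.where(denom > 0, 2 * tp / denom.clamp(min=1), torch.zeros_like(denom))
    accuracy = tp.sum() / cm.sum()
    return accuracy.item(), f1.mean().item()

# Toy example with 3 classes (made-up labels).
targets = torch.tensor([0, 0, 1, 1, 2, 2])
preds = torch.tensor([0, 1, 1, 1, 2, 0])
acc, f1 = accuracy_and_macro_f1(preds, targets, num_classes=3)
print("| split | accuracy | macro F1 |")
print(f"| toy | {acc:.3f} | {f1:.3f} |")
```

Calling such a helper once per model and per split (training and held-out) yields the rows of the results table.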

In [7]:
# Load the data
# DO NOT CHANGE THE CODE IN THIS CELL EXCEPT FOR THE BATCH SIZE IF NECESSARY
transform_fn = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.7,), (0.7,)),])

mnist_train = datasets.MNIST(root='./data', train=True, download=True, transform=transform_fn)
train_dl = torch.utils.data.DataLoader(mnist_train, batch_size=128, shuffle=True)

mnist_test = datasets.MNIST(root='./data', train=False, download=True, transform=transform_fn)
test_dl = torch.utils.data.DataLoader(mnist_test, batch_size=128, shuffle=False)

# Use the above data for your experiments

### Experiments:

#### 1. Choosing the learning rate

We sweep a log-spaced range of learning rates on a shallow model and keep the one with the lowest training loss for all subsequent experiments.
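
As a side note on the grid itself, a short sketch: raising 10 to evenly spaced exponents gives values that are evenly spaced on a logarithmic axis, and `torch.logspace` builds the same kind of grid directly. The variable names `lrs_manual` and `lrs_builtin` are only for this illustration.

```python
import torch

# 50 learning rates spaced evenly in log10 between 1e-4 and 1e-2
lrs_manual = 10 ** torch.linspace(-4, -2, 50)
lrs_builtin = torch.logspace(-4, -2, 50)      # base 10 by default

print(lrs_builtin[0].item(), lrs_builtin[-1].item())
print("grids match:", torch.allclose(lrs_manual, lrs_builtin, rtol=1e-4))
```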

In [None]:
lrs = 10**torch.linspace(-4,-2,50)
losses=[]
for lr in tqdm(lrs):
    print('lr=',lr)
    modellr = Model(28*28, 500, 10,1, [0,0,0,0,0],0,False)
    loss=modellr.train(train_dl,epochs = 10, learning_rate=lr,lam=0)
    loss=torch.tensor(loss).exp()
    losses.append(min(loss).item())

  0%|                                                                                           | 0/50 [00:00<?, ?it/s]

lr= tensor(1.0000e-04)
epoch: 0 loss: 0.7344396710395813
epoch: 1 loss: 0.3338369131088257
epoch: 2 loss: 0.2800363302230835
epoch: 3 loss: 0.2392803430557251
epoch: 4 loss: 0.20572535693645477
epoch: 5 loss: 0.1783585101366043
epoch: 6 loss: 0.15549670159816742
epoch: 7 loss: 0.1395590901374817
epoch: 8 loss: 0.12360269576311111
epoch: 9 loss: 0.11185803264379501


  2%|█▌                                                                              | 1/50 [02:26<1:59:28, 146.29s/it]

lr= tensor(0.0001)
epoch: 0 loss: 0.7074087262153625
epoch: 1 loss: 0.325470894575119
epoch: 2 loss: 0.26737648248672485
epoch: 3 loss: 0.22245457768440247
epoch: 4 loss: 0.19126781821250916
epoch: 5 loss: 0.163621187210083
epoch: 6 loss: 0.14394983649253845
epoch: 7 loss: 0.12815718352794647
epoch: 8 loss: 0.11349968612194061
epoch: 9 loss: 0.10290881246328354


  4%|███▏                                                                            | 2/50 [04:55<1:57:38, 147.04s/it]

lr= tensor(0.0001)
epoch: 0 loss: 0.6690120100975037
epoch: 1 loss: 0.31775954365730286
epoch: 2 loss: 0.26141199469566345
epoch: 3 loss: 0.21578697860240936
epoch: 4 loss: 0.18142224848270416
epoch: 5 loss: 0.15617986023426056
epoch: 6 loss: 0.13621804118156433
epoch: 7 loss: 0.12166616320610046
epoch: 8 loss: 0.10664428770542145
epoch: 9 loss: 0.09684914350509644


  6%|████▊                                                                           | 3/50 [07:38<1:59:07, 152.07s/it]

lr= tensor(0.0001)
epoch: 0 loss: 0.6618608832359314
epoch: 1 loss: 0.3155348002910614
epoch: 2 loss: 0.2544049024581909
epoch: 3 loss: 0.20766378939151764
epoch: 4 loss: 0.175788015127182
epoch: 5 loss: 0.14920586347579956
epoch: 6 loss: 0.13078856468200684
epoch: 7 loss: 0.11626923829317093
epoch: 8 loss: 0.10357780754566193
epoch: 9 loss: 0.09434712678194046


  8%|██████▍                                                                         | 4/50 [10:17<1:58:05, 154.04s/it]

lr= tensor(0.0001)
epoch: 0 loss: 0.6352406740188599
epoch: 1 loss: 0.30909615755081177
epoch: 2 loss: 0.24739566445350647
epoch: 3 loss: 0.1991301029920578
epoch: 4 loss: 0.16666141152381897
epoch: 5 loss: 0.14313487708568573
epoch: 6 loss: 0.12389469146728516
epoch: 7 loss: 0.10982787609100342
epoch: 8 loss: 0.09792826324701309
epoch: 9 loss: 0.08667797595262527


 10%|████████                                                                        | 5/50 [12:43<1:53:39, 151.55s/it]

lr= tensor(0.0002)
epoch: 0 loss: 0.6137754321098328
epoch: 1 loss: 0.29647642374038696
epoch: 2 loss: 0.23391462862491608
epoch: 3 loss: 0.19154486060142517
epoch: 4 loss: 0.16032056510448456
epoch: 5 loss: 0.13575603067874908
epoch: 6 loss: 0.1181708574295044
epoch: 7 loss: 0.10292201489210129
epoch: 8 loss: 0.09188222885131836
epoch: 9 loss: 0.0810440182685852


 12%|█████████▌                                                                      | 6/50 [15:18<1:51:55, 152.63s/it]

lr= tensor(0.0002)
epoch: 0 loss: 0.5994277596473694
epoch: 1 loss: 0.3022109270095825
epoch: 2 loss: 0.2426067739725113
epoch: 3 loss: 0.19822770357131958
epoch: 4 loss: 0.16294272243976593
epoch: 5 loss: 0.13868972659111023
epoch: 6 loss: 0.11935918033123016
epoch: 7 loss: 0.10501895844936371
epoch: 8 loss: 0.09305837750434875


In [None]:
## lr vs losses:
optimal_lr = lrs[losses.index(min(losses))]
plt.figure(figsize=(10,10))
plt.plot(lrs,torch.tensor(losses))
plt.xlabel('learning rates')
plt.ylabel('minimum training loss')
plt.axvline(x=optimal_lr,color='orange',linestyle='--',alpha=0.5)

In [None]:
print('optimal learning rate: \n',optimal_lr.item())
print('least training loss',min(losses))

#### 2. Choosing the regularization coefficient (lambda)

With the learning rate fixed at its optimal value, we repeat the sweep over the regularisation coefficient (lambda) and keep the value with the lowest loss on the test set.
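
A possible refinement, sketched under assumptions: lambda could be selected on a validation split carved out of the training data, so that the test set is used only for the final report. The example uses a random `TensorDataset` as a stand-in for `mnist_train`; the 90/10 ratio, the seed and the names `train_part`, `val_part`, `train_part_dl` and `val_dl` are illustrative.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader, random_split

# Stand-in for the MNIST training set: 1000 fake 28x28 images with labels 0..9.
fake_train = TensorDataset(torch.randn(1000, 1, 28, 28),
                           torch.randint(0, 10, (1000,)))

n_val = len(fake_train) // 10                    # hold out 10% (arbitrary choice)
n_train = len(fake_train) - n_val
generator = torch.Generator().manual_seed(0)     # fixed seed for a reproducible split
train_part, val_part = random_split(fake_train, [n_train, n_val], generator=generator)

train_part_dl = DataLoader(train_part, batch_size=128, shuffle=True)
val_dl = DataLoader(val_part, batch_size=128, shuffle=False)

print(len(train_part), len(val_part))
```

The sweep would then train on `train_part_dl` and compare lambdas by their loss on `val_dl`.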

In [None]:
lambdas = 10**torch.linspace(-4,-2,50)
losses=[]
for lam in tqdm(lambdas):
    modellam = Model(28*28, 500, 10,1, [0,0,0,0,0],0,True)
    modellam.train(train_dl,epochs = 10, learning_rate=optimal_lr.item(),lam=lam)
    loss = modellam.test(test_dl)
    losses.append(loss)

In [None]:
## lambda vs losses:
optimal_lam = lambdas[losses.index(min(losses))]
plt.figure(figsize=(10,10))
plt.plot(lambdas,losses)
plt.xlabel('lambdas')
plt.ylabel('test loss')
plt.axvline(x=optimal_lam,color='orange',linestyle='--',alpha=0.5)

In [None]:
print(f'optimal lambda: \n{optimal_lam.item():.7f}')
print(f'least test loss: {min(losses):.4f}')

#### 3. Shallow network without regularisation

In [None]:
#shallow non regularised
model1 = Model(28*28, 500, 10,1, [0,0,0,0,0],0,False)
model1.to(device)
model1.train(train_dl,epochs = 100, learning_rate=optimal_lr,lam=optimal_lam)
print("Loss on test:")
model1.test(test_dl)
print("Training accuracy:")
get_accuracy(mnist_train,model1)
print("Test accuracy:")
get_accuracy(mnist_test,model1)
print("Training f1 score:")
get_f1(mnist_train,model1)
print("Test f1 score:")
get_f1(mnist_test,model1)

#### 4. Shallow network with regularisation

In [None]:
model2 = Model(28*28, 500, 10,1, [1,1,1,1,1],0.2,True)
model2.to(device)
model2.train(train_dl,epochs = 100, learning_rate=optimal_lr,lam=optimal_lam)
print("Loss on test:")
model2.test(test_dl)
print("Training accuracy:")
get_accuracy(mnist_train,model2)
print("Test accuracy:")
get_accuracy(mnist_test,model2)
print("Training f1 score:")
get_f1(mnist_train,model2)
print("Test f1 score:")
get_f1(mnist_test,model2)

#### 5. Deep network without regularisation

In [None]:
model3 = Model(28*28, 500, 10,5, [0,0,0,0,0],0,False)
model3.to(device)
model3.train(train_dl,epochs = 100, learning_rate=optimal_lr,lam=optimal_lam)
print("Loss on test:")
model3.test(test_dl)
print("Training accuracy:")
get_accuracy(mnist_train,model3)
print("Test accuracy:")
get_accuracy(mnist_test,model3)
print("Training f1 score:")
get_f1(mnist_train,model3)
print("Test f1 score:")
get_f1(mnist_test,model3)

#### 6. Deep network with regularisation

In [None]:
model4 = Model(28*28, 500, 10,5, [1,1,1,1,1],0.05,True)
model4.to(device)
model4.train(train_dl,epochs = 5, learning_rate=optimal_lr,lam=optimal_lam)
print("Loss on test:")
model4.test(test_dl)
print("Training accuracy:")
get_accuracy(mnist_train,model4)
print("Test accuracy:")
get_accuracy(mnist_test,model4)
print("Training f1 score:")
get_f1(mnist_train,model4)
print("Test f1 score:")
get_f1(mnist_test,model4)

In [None]:
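# get_f1 needs the model as its second argument; model4 is assumed here because this cell follows its experiment.
get_f1(mnist_test, model4)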

### 7.5.3 Get the best model! (1 + 1 point (bonus))

* Present your model during a tutorial session. Justify your decisions when designing your model/solution.
* If you achieve one of the top N results, you get yet another extra point!

In [None]:
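# `model`, `xs` and `ys` are not defined at notebook level; as an assumption,
# inspect model4 on the first batch of the test loader.
model = model4
xs, ys = next(iter(test_dl))
model(xs[15].view(-1,28*28)).argmax()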

In [None]:
plt.imshow(xs[15][0])
print(ys[15])