## Lab 2
### Part 2: Dealing with overfitting

Today we work with the [Fashion-MNIST dataset](https://github.com/zalandoresearch/fashion-mnist) (*hint: it is available in `torchvision`*).

Your goal for today:
1. Train an FC (fully-connected) network that achieves >= 0.885 test accuracy.
2. Cause considerable overfitting by modifying the network (e.g. increasing the number of network parameters and/or layers) and demonstrate it in an appropriate way (e.g. plot loss and accuracy on the train and validation sets w.r.t. network complexity).
3. Try to deal with overfitting (at least partially) by using regularization techniques (Dropout/Batchnorm/...) and demonstrate the results.

__Please write a small report describing your ideas, attempts and achieved results at the end of this file.__

*Note*: Tasks 2 and 3 are interrelated: in task 3 your goal is to make the network from task 2 less prone to overfitting. Task 1 is independent of tasks 2 and 3.

*Note 2*: We recommend using Google Colab or another machine with GPU acceleration.

In [None]:
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
import torchsummary
from IPython.display import clear_output
from matplotlib import pyplot as plt
from matplotlib.pyplot import figure
import numpy as np
import os


device = 'cuda:0' if torch.cuda.is_available() else 'cpu'

In [None]:
# Helper: create the data directory if it does not exist yet
def mkdir(path):
    if not os.path.exists(path):
        os.mkdir(path)
        print('Directory', path, 'is created!')
    else:
        print('Directory', path, 'already exists!')

root_path = 'fmnist'
mkdir(root_path)

Directory fmnist already exists!


In [None]:
download = True
# Convert PIL images to float tensors scaled to [0, 1]; no augmentation is applied
train_transform = transforms.ToTensor()
test_transform = transforms.ToTensor()


fmnist_dataset_train = torchvision.datasets.FashionMNIST(root_path,
                                                        train=True,
                                                        transform=train_transform,
                                                        target_transform=None,
                                                        download=download)
fmnist_dataset_test = torchvision.datasets.FashionMNIST(root_path,
                                                       train=False,
                                                       transform=test_transform,
                                                       target_transform=None,
                                                       download=download)

In [None]:
train_loader = torch.utils.data.DataLoader(fmnist_dataset_train,
                                           batch_size=128,
                                           shuffle=True,
                                           num_workers=2)
test_loader = torch.utils.data.DataLoader(fmnist_dataset_test,
                                          batch_size=256,
                                          shuffle=False,
                                          num_workers=2)

In [None]:
len(fmnist_dataset_test)

10000

In [None]:
for img, label in train_loader:
    print(img.shape)
#     print(img)
    print(label.shape)
    print(label.size(0))
    break

torch.Size([128, 1, 28, 28])
torch.Size([128])
128


### Task 1
Train a network that achieves $\geq 0.885$ test accuracy. It's fine to use only Linear (`nn.Linear`) layers and activations/dropout/batchnorm. Convolutional layers would be very useful here, but we will meet them a bit later.

In [None]:
class TinyNeuralNetwork(nn.Module):
    def __init__(self, input_shape=28*28, num_classes=10, input_channels=1):
        super().__init__()  # super(self.__class__, self) recurses infinitely if this class is subclassed
        self.model = nn.Sequential(
            nn.Flatten(), # This layer converts image into a vector to use Linear layers afterwards
            # Your network structure comes here
            nn.Linear(input_shape, 512),
            nn.ReLU(),
            nn.Linear(512, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes)
        )

    def forward(self, inp):
        out = self.model(inp)
        return out

In [None]:
torchsummary.summary(TinyNeuralNetwork().to(device), (28*28,))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
           Flatten-1                  [-1, 784]               0
            Linear-2                  [-1, 512]         401,920
              ReLU-3                  [-1, 512]               0
            Linear-4                  [-1, 128]          65,664
              ReLU-5                  [-1, 128]               0
            Linear-6                   [-1, 10]           1,290
Total params: 468,874
Trainable params: 468,874
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.02
Params size (MB): 1.79
Estimated Total Size (MB): 1.81
----------------------------------------------------------------
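
The parameter counts in the summary above can be checked by hand: a linear layer with `in_features` inputs and `out_features` outputs has `in_features * out_features` weights plus `out_features` biases. A minimal sketch in plain Python, where the helper name `linear_params` is hypothetical and not part of the lab:

```python
def linear_params(in_features, out_features):
    """Number of trainable parameters in a dense layer: weights plus biases."""
    return in_features * out_features + out_features


# Layer widths of TinyNeuralNetwork: 784 -> 512 -> 128 -> 10
layer_sizes = [28 * 28, 512, 128, 10]
per_layer = [linear_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(per_layer)       # [401920, 65664, 1290]
print(sum(per_layer))  # 468874
```

The same arithmetic applies to any stack of `nn.Linear` layers, so it is a quick way to estimate how much capacity a wider or deeper variant adds.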


Your experiments come here:

In [None]:
model = TinyNeuralNetwork().to(device)
opt = torch.optim.Adam(model.parameters(), lr=0.001)
loss_func = nn.CrossEntropyLoss()

num_epochs = 15

for epoch in range(num_epochs):
    model.train()
    for X_batch, y_batch in train_loader:
        X_batch, y_batch = X_batch.to(device), y_batch.to(device)

        opt.zero_grad()
        logits = model(X_batch)
        loss = loss_func(logits, y_batch)
        loss.backward()
        opt.step()

    model.eval()
    total_correct = 0
    total_samples = 0
    with torch.no_grad():
        for X_val, y_val in test_loader:
            X_val, y_val = X_val.to(device), y_val.to(device)

            logits = model(X_val)
            predicted = logits.argmax(dim=1)
            total_samples += y_val.size(0)
            total_correct += (predicted == y_val).sum().item()

    accuracy = total_correct / total_samples
    print(f'Epoch {epoch+1}, Accuracy {accuracy}')

Epoch 1, Accuracy 0.8442
Epoch 2, Accuracy 0.859
Epoch 3, Accuracy 0.8714
Epoch 4, Accuracy 0.8724
Epoch 5, Accuracy 0.8759
Epoch 6, Accuracy 0.8843
Epoch 7, Accuracy 0.8809
Epoch 8, Accuracy 0.8857
Epoch 9, Accuracy 0.8825
Epoch 10, Accuracy 0.8873
Epoch 11, Accuracy 0.884
Epoch 12, Accuracy 0.8885
Epoch 13, Accuracy 0.8857
Epoch 14, Accuracy 0.8886
Epoch 15, Accuracy 0.8949


### Task 2: Overfit it.
Build a network that will overfit to this dataset. Demonstrate the overfitting in an appropriate way (e.g. plot loss and accuracy on the train and test sets w.r.t. network complexity).
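
Showing overfitting requires loss and accuracy on both the train and the held-out split, since a growing gap between the two is the usual sign of it. One possible way to collect these numbers is a small evaluation helper that is called on each loader after every epoch. This is only a sketch: the function name `evaluate`, the `history` dictionary and the synthetic toy data are assumptions made for illustration, not part of the lab template.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, loss_func, device="cpu"):
    """Return (mean per-sample loss, accuracy) of `model` over `loader`."""
    model.eval()
    total_loss, total_correct, total_samples = 0.0, 0, 0
    with torch.no_grad():
        for X, y in loader:
            X, y = X.to(device), y.to(device)
            logits = model(X)
            # Weight each batch loss by its size so the mean is per sample.
            total_loss += loss_func(logits, y).item() * y.size(0)
            total_correct += (logits.argmax(dim=1) == y).sum().item()
            total_samples += y.size(0)
    return total_loss / total_samples, total_correct / total_samples


# Synthetic stand-in for Fashion-MNIST so the sketch runs without a download.
torch.manual_seed(0)
toy_data = TensorDataset(torch.randn(64, 1, 28, 28), torch.randint(0, 10, (64,)))
toy_loader = DataLoader(toy_data, batch_size=16)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

history = {"train_loss": [], "train_acc": []}
loss, acc = evaluate(toy_model, toy_loader, nn.CrossEntropyLoss())
history["train_loss"].append(loss)
history["train_acc"].append(acc)
print(history)
```

In the notebook, the same helper could be called with `train_loader` and `test_loader` at the end of each epoch, and the collected lists passed to `plt.plot` to draw the train and test curves on one figure.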

*Note:* you also might decrease the size of `train` dataset to enforce the overfitting and speed up the computations.
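
One way to shrink the training set is `torch.utils.data.Subset` with a fixed random set of indices. The sketch below uses a synthetic dataset in place of `fmnist_dataset_train`, and the subset size of 200 is an arbitrary illustrative value, not a recommendation from the lab.

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# Synthetic stand-in with the same sample shape as Fashion-MNIST images.
full_train = TensorDataset(torch.randn(1000, 1, 28, 28),
                           torch.randint(0, 10, (1000,)))

subset_size = 200  # hypothetical value; smaller subsets overfit faster
generator = torch.Generator().manual_seed(0)  # fixed seed for reproducibility
indices = torch.randperm(len(full_train), generator=generator)[:subset_size].tolist()
small_train = Subset(full_train, indices)

small_loader = DataLoader(small_train, batch_size=128, shuffle=True)
print(len(small_train), len(small_loader))
```

Only the training loader changes; the test loader should keep the full test set so that the reported accuracy stays comparable between runs.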

In [None]:
class OverfittingNeuralNetwork(nn.Module):
    def __init__(self, input_shape=28*28, num_classes=10, input_channels=1):
        super().__init__()  # super(self.__class__, self) recurses infinitely if this class is subclassed
        self.model = nn.Sequential(
            nn.Flatten(), # This layer converts image into a vector to use Linear layers afterwards
            # Your network structure comes here
            nn.Linear(input_shape, 2048),
            nn.ReLU(),
            nn.Linear(2048, 1024),
            nn.ReLU(),
            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.Linear(512, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes)
        )

    def forward(self, inp):
        out = self.model(inp)
        return out

In [None]:
torchsummary.summary(OverfittingNeuralNetwork().to(device), (28*28,))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
           Flatten-1                  [-1, 784]               0
            Linear-2                 [-1, 2048]       1,607,680
              ReLU-3                 [-1, 2048]               0
            Linear-4                 [-1, 1024]       2,098,176
              ReLU-5                 [-1, 1024]               0
            Linear-6                  [-1, 512]         524,800
              ReLU-7                  [-1, 512]               0
            Linear-8                  [-1, 128]          65,664
              ReLU-9                  [-1, 128]               0
           Linear-10                   [-1, 10]           1,290
Total params: 4,297,610
Trainable params: 4,297,610
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.06
Params size (MB): 16.39
Estima

In [None]:
model = OverfittingNeuralNetwork().to(device)
opt = torch.optim.Adam(model.parameters(), lr=0.001)
loss_func = nn.CrossEntropyLoss()

num_epochs = 30

for epoch in range(num_epochs):
    model.train()
    for X_batch, y_batch in train_loader:
        X_batch, y_batch = X_batch.to(device), y_batch.to(device)

        opt.zero_grad()
        logits = model(X_batch)
        loss = loss_func(logits, y_batch)
        loss.backward()
        opt.step()

    model.eval()
    total_correct = 0
    total_samples = 0
    with torch.no_grad():
        for X_val, y_val in test_loader:
            X_val, y_val = X_val.to(device), y_val.to(device)

            logits = model(X_val)
            predicted = logits.argmax(dim=1)
            total_samples += y_val.size(0)
            total_correct += (predicted == y_val).sum().item()

    accuracy = total_correct / total_samples
    print(f'Epoch {epoch+1}, Accuracy {accuracy}')

Epoch 1, Accuracy 0.831
Epoch 2, Accuracy 0.8622
Epoch 3, Accuracy 0.8722
Epoch 4, Accuracy 0.8767
Epoch 5, Accuracy 0.8797
Epoch 6, Accuracy 0.8739
Epoch 7, Accuracy 0.8869
Epoch 8, Accuracy 0.8839
Epoch 9, Accuracy 0.8893
Epoch 10, Accuracy 0.8883
Epoch 11, Accuracy 0.8907
Epoch 12, Accuracy 0.8835
Epoch 13, Accuracy 0.8898
Epoch 14, Accuracy 0.8901
Epoch 15, Accuracy 0.8877
Epoch 16, Accuracy 0.8866
Epoch 17, Accuracy 0.8936
Epoch 18, Accuracy 0.8923
Epoch 19, Accuracy 0.8942
Epoch 20, Accuracy 0.8923
Epoch 21, Accuracy 0.8957
Epoch 22, Accuracy 0.8945
Epoch 23, Accuracy 0.8912
Epoch 24, Accuracy 0.8906
Epoch 25, Accuracy 0.9006
Epoch 26, Accuracy 0.8965
Epoch 27, Accuracy 0.8962
Epoch 28, Accuracy 0.8989
Epoch 29, Accuracy 0.896
Epoch 30, Accuracy 0.9007


### Task 3: Fix it.
Fix the overfitted network from the previous step (at least partially) by using regularization techniques (Dropout/Batchnorm/...) and demonstrate the results.
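
Dropout behaves differently in training and evaluation mode, which is why the loops in this notebook switch between `model.train()` and `model.eval()`. The sketch below illustrates this on a standalone `nn.Dropout` layer; the input of ones and the variable names are chosen only for illustration.

```python
import torch
import torch.nn as nn

p = 0.3
dropout = nn.Dropout(p)
x = torch.ones(1000)

# Training mode: each element is zeroed with probability p,
# and the survivors are scaled by 1 / (1 - p) to keep the expected value.
dropout.train()
train_out = dropout(x)

# Evaluation mode: dropout is a no-op and returns the input unchanged.
dropout.eval()
eval_out = dropout(x)

print("fraction zeroed in train mode:", (train_out == 0).float().mean().item())
print("identity in eval mode:", torch.equal(eval_out, x))
```

`nn.BatchNorm1d` has an analogous switch: it normalizes with batch statistics in training mode and with its running estimates in evaluation mode, so forgetting `model.eval()` before testing affects both layer types.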

In [None]:
class FixedNeuralNetwork(nn.Module):
    def __init__(self, input_shape=28*28, num_classes=10, input_channels=1):
        super().__init__()  # super(self.__class__, self) recurses infinitely if this class is subclassed
        self.model = nn.Sequential(
            nn.Flatten(), # This layer converts image into a vector to use Linear layers afterwards
            # Your network structure comes here
            nn.Linear(input_shape, 2048),
            nn.ReLU(),
            nn.BatchNorm1d(2048),
            nn.Dropout(0.3),

            nn.Linear(2048, 1024),
            nn.ReLU(),
            nn.BatchNorm1d(1024),
            nn.Dropout(0.3),

            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.BatchNorm1d(512),
            nn.Dropout(0.3),

            nn.Linear(512, 128),
            nn.ReLU(),
            nn.BatchNorm1d(128),
            nn.Dropout(0.3),

            nn.Linear(128, num_classes)
        )

    def forward(self, inp):
        out = self.model(inp)
        return out

In [None]:
torchsummary.summary(FixedNeuralNetwork().to(device), (28*28,))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
           Flatten-1                  [-1, 784]               0
            Linear-2                 [-1, 2048]       1,607,680
              ReLU-3                 [-1, 2048]               0
       BatchNorm1d-4                 [-1, 2048]           4,096
           Dropout-5                 [-1, 2048]               0
            Linear-6                 [-1, 1024]       2,098,176
              ReLU-7                 [-1, 1024]               0
       BatchNorm1d-8                 [-1, 1024]           2,048
           Dropout-9                 [-1, 1024]               0
           Linear-10                  [-1, 512]         524,800
             ReLU-11                  [-1, 512]               0
      BatchNorm1d-12                  [-1, 512]           1,024
          Dropout-13                  [-1, 512]               0
           Linear-14                  [

In [None]:
model = FixedNeuralNetwork().to(device)
opt = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-5)
loss_func = nn.CrossEntropyLoss()

num_epochs = 30

for epoch in range(num_epochs):
    model.train()
    for X_batch, y_batch in train_loader:
        X_batch, y_batch = X_batch.to(device), y_batch.to(device)

        opt.zero_grad()
        logits = model(X_batch)
        loss = loss_func(logits, y_batch)
        loss.backward()
        opt.step()

    model.eval()
    total_correct = 0
    total_samples = 0
    with torch.no_grad():
        for X_val, y_val in test_loader:
            X_val, y_val = X_val.to(device), y_val.to(device)

            logits = model(X_val)
            predicted = logits.argmax(dim=1)
            total_samples += y_val.size(0)
            total_correct += (predicted == y_val).sum().item()

    accuracy = total_correct / total_samples
    print(f'Epoch {epoch+1}, Accuracy {accuracy}')

Epoch 1, Accuracy 0.852
Epoch 2, Accuracy 0.8602
Epoch 3, Accuracy 0.8591
Epoch 4, Accuracy 0.8656
Epoch 5, Accuracy 0.8624
Epoch 6, Accuracy 0.867
Epoch 7, Accuracy 0.8668
Epoch 8, Accuracy 0.8792
Epoch 9, Accuracy 0.8737
Epoch 10, Accuracy 0.8821
Epoch 11, Accuracy 0.879
Epoch 12, Accuracy 0.8825
Epoch 13, Accuracy 0.884
Epoch 14, Accuracy 0.8845
Epoch 15, Accuracy 0.878
Epoch 16, Accuracy 0.8876
Epoch 17, Accuracy 0.8832
Epoch 18, Accuracy 0.8832
Epoch 19, Accuracy 0.8893
Epoch 20, Accuracy 0.8851
Epoch 21, Accuracy 0.8865
Epoch 22, Accuracy 0.8838
Epoch 23, Accuracy 0.8907
Epoch 24, Accuracy 0.8891
Epoch 25, Accuracy 0.8875
Epoch 26, Accuracy 0.89
Epoch 27, Accuracy 0.887
Epoch 28, Accuracy 0.8901
Epoch 29, Accuracy 0.8878
Epoch 30, Accuracy 0.89


### Conclusions:
_Write down a small report with your conclusions and ideas._