1. ##    1. Importing Required Packages

In [1]:
from __future__ import print_function
#%matplotlib inline
import argparse
import os
import random
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.optim as optim
import torch.utils.data
import torchvision
# import torchvision.datasets as dset
# import torchvision.transforms as transforms
# import torchvision.utils as vutils
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from IPython.display import HTML
from copy import deepcopy
import time
from itertools import product
# Set random seed for reproducibility
manual_seed = random.randint(1, 10000)
print("Random Seed: ", manual_seed)
random.seed(manual_seed)
torch.manual_seed(manual_seed)

!mkdir results

Random Seed:  2529
mkdir: cannot create directory ‘results’: File exists


## 2. Downloading Dataset

Here, I will use Fashion MNIST as out input data set. It is image data set where each image is clothes. 
![Fashion-MNIST](https://github.com/zalandoresearch/fashion-mnist/raw/master/doc/img/fashion-mnist-sprite.png)

Each data is 28×28 with grayscale image associated with 10 class labels, total 784 dimensions for input size.

0.	T-shirt/top
1.	Trouser
2.	Pullover
3.	Dress
4.	Coat
5.	Sandal
6.	Shirt
7.	Sneaker
8.	Bag
9.	Ankle boot


The data set contains total 60,000 data for training and 10,000 data for testing. Here I will use 50,000 among training set for training and 10,000 for validation, and finally 10,000 for testing.

In [2]:
transform = torchvision.transforms.Compose(
    [torchvision.transforms.ToTensor(), 
     torchvision.transforms.Normalize((0,), (0.5,))])
# Need to add normalize
trainset = torchvision.datasets.FashionMNIST(root='../data', train=True,
                                        download=True, transform=transform)
testset = torchvision.datasets.FashionMNIST(root='../data', train=False,
                                       download=True, transform=transform)
partition = {'train+val': trainset, 'test':testset}

0it [00:00, ?it/s]

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to ../data/FashionMNIST/raw/train-images-idx3-ubyte.gz


26427392it [00:04, 5865455.51it/s]                              


Extracting ../data/FashionMNIST/raw/train-images-idx3-ubyte.gz to ../data/FashionMNIST/raw


0it [00:00, ?it/s]

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to ../data/FashionMNIST/raw/train-labels-idx1-ubyte.gz


32768it [00:00, 39870.20it/s]                           
0it [00:00, ?it/s]

Extracting ../data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to ../data/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to ../data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz


4423680it [00:02, 1683046.56it/s]                             
0it [00:00, ?it/s]

Extracting ../data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to ../data/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to ../data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz


8192it [00:00, 15078.95it/s]            

Extracting ../data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to ../data/FashionMNIST/raw
Processing...
Done!





In [3]:
print(len(partition["train+val"]))
print(len(partition["test"]))

60000
10000


## 3. Model of neural network

Here, I will use MLP with batch normalization and dropout. I will fix size of neural net as 576-500-500-500-500-10 so we get enough overfitting, and regularization method works well. 

Also, I will apply weight initialization method. Here I will use xavier init for tanh/sigmoid activation function, and He init for ReLU activation function. For this choice, I referred this article [Weight Initialization](https://reniew.github.io/13/). 

One of the regularization method we studied was ridge/lasso regularization. However, it is not implemented in error function, because weight_decay in optimizer's field works exactly same as l2 regularization. So I will use that for another regularization method.

In [0]:
class MLP(nn.Module):
    def __init__(self, in_dim, out_dim, hid_dim, n_layer, act, dropout, batchnorm):
        super(MLP, self).__init__()
        self.in_dim = in_dim
        self.out_dim = out_dim
        self.hid_dim = hid_dim
        self.n_layer = n_layer
        self.act = act
        self.dropout = dropout
        self.batchnorm = batchnorm
        
        # ====== Create Linear Layers ====== #
        self.fc1 = nn.Linear(self.in_dim, self.hid_dim)
        
        self.linears = nn.ModuleList()
        self.bns = nn.ModuleList()
        self.init_bn = nn.BatchNorm1d(self.hid_dim)
        self.last_bn = nn.BatchNorm1d(self.out_dim)
        for i in range(self.n_layer-1):
            self.linears.append(nn.Linear(self.hid_dim, self.hid_dim))
            if self.batchnorm:
                self.bns.append(nn.BatchNorm1d(self.hid_dim))
                
        self.fc2 = nn.Linear(self.hid_dim, self.out_dim)
        
        # ====== Create Activation Function ====== #
        if self.act == 'relu':
            self.act = nn.ReLU()
        elif self.act == 'tanh':
            self.act == nn.Tanh()
        elif self.act == 'sigmoid':
            self.act = nn.Sigmoid()
        else:
            raise ValueError('no valid activation function selected!')
        
        # ====== Create Regularization Layer ======= #
        self.dropout = nn.Dropout(self.dropout)
        self.weight_init()
          
    def forward(self, x):
        # Ordering : FC -> BatchNorm -> act -> dropout
        # I referred this stackoverflow answer for ordering. 
        # https://stackoverflow.com/a/40295999
        x = self.fc1(x)
        x = self.init_bn(x)
        x = self.act(x)
        x = self.dropout(x)
        for i in range(len(self.linears)):
            x = self.linears[i](x)
            if self.batchnorm: 
                x = self.bns[i](x)
            x = self.act(x)
            x = self.dropout(x)
        x = self.fc2(x)
        x = self.last_bn(x)
        x = self.act(x)
        # No dropout in last
        return x
    
    def weight_init(self):
        for linear in self.linears:
            if self.act == 'tanh' or self.act == 'sigmoid':
                nn.init.xavier_normal_(linear.weight)
            else:
                nn.init.kaiming_normal_(linear.weight)
            linear.bias.data.fill_(0.01)
            
test_net = MLP(in_dim = 784, out_dim = 10, hid_dim = 100, n_layer = 4, act = 'sigmoid', dropout = 0.1, batchnorm = True) # Testing Model Construction

## 4. Defining Hyperparameters. 

In [5]:
# parser declaration
parser = argparse.ArgumentParser()
args = parser.parse_args("")
args.in_dim = 784 # Fixed
args.out_dim = 10 # Fixed
args.batch_size = 8 # Fixed
args.test_batch_size = 1000 # Fixed
args.lr = .01 # Scheduler will change
args.epoch = 21 # Fixed
args.hid_dim = 300 # Fixed
args.n_layer = 4 # Fixed
args.dropout = .1 # tuned
args.batch_norm = True # tuned
args.act = 'relu' # tuned
args.step_size = 3 # Fixed
args.gamma = 1/np.sqrt(10) # Fixed
args.weight_decay = 1 # tuned

print(args)


Namespace(act='relu', batch_norm=True, batch_size=8, dropout=0.1, epoch=21, gamma=0.31622776601683794, hid_dim=300, in_dim=784, lr=0.01, n_layer=4, out_dim=10, step_size=3, test_batch_size=1000, weight_decay=1)


## 5. Defining training related function, train and validate, test.

Here, I used cross validation. Unlike cross validation on our lecture slide, I randomly sampled 50,000 data for training for each epoch. 

In [0]:
def train(net, partition, optimizer, criterion, args):
    trainloader = torch.utils.data.DataLoader(partition['train'], 
                                              batch_size=args.batch_size, 
                                              shuffle=True, num_workers=2)
    net.train()
    optimizer.zero_grad()

    correct = 0
    total = 0
    train_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data
        inputs = inputs.view(-1, args.in_dim)
        outputs = net(inputs)

        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        train_loss += loss.item()
        assert(not np.isnan(loss.item()))
        #print(loss.item())
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    train_loss = train_loss / len(trainloader)
    train_acc = 100 * correct / total
    return net, train_loss, train_acc

In [0]:
def validate(net, partition, criterion, args):
    valloader = torch.utils.data.DataLoader(partition['val'], 
                                            batch_size=args.test_batch_size, 
                                            shuffle=False, num_workers=2)
    net.eval()

    correct = 0
    total = 0
    val_loss = 0 
    with torch.no_grad():
        for data in valloader:
            images, labels = data
            images = images.view(-1, args.in_dim)
            outputs = net(images)

            loss = criterion(outputs, labels)
            
            val_loss += loss.item()
            assert(not np.isnan(loss.item()))
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

        val_loss = val_loss / len(valloader)
        val_acc = 100 * correct / total
    return val_loss, val_acc

In [0]:
def test(net, partition, args):
    testloader = torch.utils.data.DataLoader(partition['test'], 
                                             batch_size=args.test_batch_size, 
                                             shuffle=False, num_workers=2)
    net.eval()
    
    correct = 0
    total = 0
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            images = images.view(-1, args.in_dim)

            outputs = net(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

        test_acc = 100 * correct / total
    return test_acc

In [0]:
def experiment(partition, args):
  
    net = MLP(args.in_dim, args.out_dim, args.hid_dim, args.n_layer, args.act, args.dropout, args.batchnorm)
    # net.cuda()
    print(args)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=args.lr, weight_decay=args.weight_decay)
    scheduler = optim.lr_scheduler.StepLR(optimizer, args.step_size, args.gamma)
    
    train_losses = []
    val_losses = []
    train_accs = []
    val_accs = []
        
    for epoch in range(args.epoch):  # loop over the dataset multiple times
        ts = time.time()
        trainset, valset = torch.utils.data.random_split(partition["train+val"], [50000, 10000])
        net, train_loss, train_acc = train(net, {"train":trainset, "val":valset, "test":partition["test"]}, optimizer, criterion, args)
        val_loss, val_acc = validate(net, {"train":trainset, "val":valset, "test":partition["test"]}, criterion, args)

        te = time.time()
        scheduler.step()
        
        train_losses.append(train_loss)
        val_losses.append(val_loss)
        train_accs.append(train_acc)
        val_accs.append(val_acc)
        
        print('Epoch {}, Acc(train/val): {:2.2f}/{:2.2f}, Loss(train/val) {:2.2f}/{:2.2f}. Took {:2.2f} sec'.format(epoch, train_acc, val_acc, train_loss, val_loss, te-ts))
        
    test_acc = test(net, partition, args)    
    
    result = {}
    result['train_losses'] = train_losses
    result['val_losses'] = val_losses
    result['train_accs'] = train_accs
    result['val_accs'] = val_accs
    result['train_acc'] = train_acc
    result['val_acc'] = val_acc
    result['test_acc'] = test_acc
    return vars(args), result

## 6. Plotting & Saving experiment result

In [0]:
import hashlib
import json
from os import listdir
from os.path import isfile, join
import pandas as pd

def save_exp_result(setting, result):
    exp_name = setting['exp_name']
    del setting['epoch']
    del setting['test_batch_size']

    hash_key = hashlib.sha1(str(setting).encode()).hexdigest()[:6]
    filename = './results/{}-{}.json'.format(exp_name, hash_key)
    result.update(setting)
    with open(filename, 'w') as f:
        json.dump(result, f)

    
def load_exp_result(exp_name):
    dir_path = './results'
    filenames = [f for f in listdir(dir_path) if isfile(join(dir_path, f)) if '.json' in f]
    list_result = []
    for filename in filenames:
        if exp_name in filename:
            with open(join(dir_path, filename), 'r') as infile:
                results = json.load(infile)
                list_result.append(results)
    df = pd.DataFrame(list_result) # .drop(columns=[])
    return df

In [0]:
def plot_acc(var1, var2, df):

    fig, ax = plt.subplots(1, 3)
    fig.set_size_inches(15, 6)
    sns.set_style("darkgrid", {"axes.facecolor": ".9"})

    sns.barplot(x=var1, y='train_acc', hue=var2, data=df, ax=ax[0])
    sns.barplot(x=var1, y='val_acc', hue=var2, data=df, ax=ax[1])
    sns.barplot(x=var1, y='test_acc', hue=var2, data=df, ax=ax[2])
    
    ax[0].set_title('Train Accuracy')
    ax[1].set_title('Validation Accuracy')
    ax[2].set_title('Test Accuracy')

    
def plot_loss_variation(var1, var2, df, **kwargs):

    list_v1 = df[var1].unique()
    list_v2 = df[var2].unique()
    list_data = []

    for value1 in list_v1:
        for value2 in list_v2:
            row = df.loc[df[var1]==value1]
            row = row.loc[df[var2]==value2]

            train_losses = list(row.train_losses)[0]
            val_losses = list(row.val_losses)[0]

            for epoch, train_loss in enumerate(train_losses):
                list_data.append({'type':'train', 'loss':train_loss, 'epoch':epoch, var1:value1, var2:value2})
            for epoch, val_loss in enumerate(val_losses):
                list_data.append({'type':'val', 'loss':val_loss, 'epoch':epoch, var1:value1, var2:value2})

    df = pd.DataFrame(list_data)
    g = sns.FacetGrid(df, row=var2, col=var1, hue='type', **kwargs)
    g = g.map(plt.plot, 'epoch', 'loss', marker='.')
    g.add_legend()
    g.fig.suptitle('Train loss vs Val loss')
    plt.subplots_adjust(top=0.89) # 만약 Title이 그래프랑 겹친다면 top 값을 조정해주면 됩니다! 함수 인자로 받으면 그래프마다 조절할 수 있겠죠?


def plot_acc_variation(var1, var2, df, **kwargs):
    list_v1 = df[var1].unique()
    list_v2 = df[var2].unique()
    list_data = []

    for value1 in list_v1:
        for value2 in list_v2:
            row = df.loc[df[var1]==value1]
            row = row.loc[df[var2]==value2]

            train_accs = list(row.train_accs)[0]
            val_accs = list(row.val_accs)[0]
            test_acc = list(row.test_acc)[0]

            for epoch, train_acc in enumerate(train_accs):
                list_data.append({'type':'train', 'Acc':train_acc, 'test_acc':test_acc, 'epoch':epoch, var1:value1, var2:value2})
            for epoch, val_acc in enumerate(val_accs):
                list_data.append({'type':'val', 'Acc':val_acc, 'test_acc':test_acc, 'epoch':epoch, var1:value1, var2:value2})

    df = pd.DataFrame(list_data)
    g = sns.FacetGrid(df, row=var2, col=var1, hue='type', **kwargs)
    g = g.map(plt.plot, 'epoch', 'Acc', marker='.')

    def show_acc(x, y, metric, **kwargs):
        plt.scatter(x, y, alpha=0.3, s=1)
        metric = "Test Acc: {:1.3f}".format(list(metric.values)[0])
        plt.text(0.05, 0.95, metric,  horizontalalignment='left', verticalalignment='center', transform=plt.gca().transAxes, bbox=dict(facecolor='yellow', alpha=0.5, boxstyle="round,pad=0.1"))
    g = g.map(show_acc, 'epoch', 'Acc', 'test_acc')

    g.add_legend()
    g.fig.suptitle('Train Accuracy vs Val Accuracy')
    plt.subplots_adjust(top=0.89)

## 7. Experiment

My experiment will proceed as following. Our hyperparameter tuning will focus on 3 parameters, dropout, batchnorm, weight_decay(works for l2 regularization). 

The goal of this experiment is minimizing validation loss. 

The hyperparameter tuning will be grid tested for following list.

dropout : \[0.0, 0.1, 0.2, 0.3, 0.4\]

batchnorm : \[True, False\]

weight_decay : \[$10^{-2}, 10^{-1}, 1, 10^1, 10^2$\]

For learning rate, I will use learning rate scheduler in following schedule.

| epoch   &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;     | lr          |
|--------------|-------------|
| $0\le e<3$   | $10^{-2}$   |
| $3\le e<6$   | $10^{-2.5}$ |
| $6\le e<9$   | $10^{-3}$   |
| $9\le e<12$  | $10^{-3.5}$ |
| $12\le e<15$ | $10^{-4}$   |

In [0]:
args.exp_name = "exp1"

name_var1 = 'weight_decay'
name_var2 = 'dropout'
name_var3 = 'batchnorm'
list_var1 = [.1, 1, 10]
list_var2 = [.0, .1, .2, .3, .4]
list_var3 = [True, False]

for var_list in product(list_var1, list_var2, list_var3):
    setattr(args, name_var1, var_list[0])
    setattr(args, name_var2, var_list[1])
    setattr(args, name_var3, var_list[2])
    setting, result = experiment(partition, deepcopy(args))
    save_exp_result(setting, result)

Namespace(act='relu', batch_norm=True, batch_size=8, batchnorm=True, dropout=0.0, epoch=21, exp_name='exp1', gamma=0.31622776601683794, hid_dim=300, in_dim=784, lr=0.01, n_layer=4, out_dim=10, step_size=3, test_batch_size=1000, weight_decay=0.1)
Epoch 0, Acc(train/val): 16.75/11.70, Loss(train/val) 2.29/2.48. Took 99.05 sec
Epoch 1, Acc(train/val): 14.54/9.84, Loss(train/val) 2.37/2.69. Took 99.79 sec
Epoch 2, Acc(train/val): 13.08/10.02, Loss(train/val) 2.47/4.30. Took 99.55 sec
Epoch 3, Acc(train/val): 12.38/9.89, Loss(train/val) 2.39/2.48. Took 100.13 sec
Epoch 4, Acc(train/val): 18.06/13.29, Loss(train/val) 2.25/2.32. Took 100.14 sec
Epoch 5, Acc(train/val): 28.96/11.55, Loss(train/val) 2.07/2.30. Took 99.71 sec
Epoch 6, Acc(train/val): 38.90/45.85, Loss(train/val) 1.90/1.71. Took 99.47 sec
Epoch 7, Acc(train/val): 37.82/47.83, Loss(train/val) 1.92/1.69. Took 99.73 sec
Epoch 8, Acc(train/val): 37.18/45.64, Loss(train/val) 1.93/1.85. Took 99.73 sec
Epoch 9, Acc(train/val): 41.73/50.