# M2177.003100 Deep Learning <br> Assignment #1 Part 3: Playing with Neural Networks by PyTorch

Copyright (C) Data Science & AI Laboratory, Seoul National University. This material is for educational uses only. Some contents are based on the material provided by other paper/book authors and may be copyrighted by them. 

Previously in `Assignment1-1_Data_Curation.ipynb`, we created a pickle with formatted datasets for training, development and testing on the [notMNIST dataset](http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html).

The goal of this assignment is to progressively train deeper and more accurate models using PyTorch.

**Note**: certain details are missing or ambiguous on purpose, in order to test your knowledge of the related material. However, if you really feel that something essential is missing and you cannot proceed to the next step, contact the teaching staff with a clear description of your problem.

### Submitting your work:
<font color=red>**DO NOT clear the final outputs**</font> so that TAs can grade both your code and results.  
Once you have done **part 1 - 3**, run the *CollectSubmission.sh* script with your **Student number** as input argument. <br>
This will produce a compressed file called *[Your student number].tar.gz*. Please submit this file on ETL. &nbsp;&nbsp; (Usage: ./*CollectSubmission.sh* &nbsp; 20\*\*-\*\*\*\*\*)

## Load datasets

First reload the data we generated in `Assignment1-1_Data_Curation.ipynb`.

In [1]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import torch
from six.moves import cPickle as pickle
from six.moves import range
import os

In [2]:
pickle_file = '/home/jackyoung96/2020_2/Deeplearning_assignment/HW1_data/notMNIST.pickle'

with open(pickle_file, 'rb') as f:
    save = pickle.load(f)
    train_dataset = save['train_dataset']
    train_labels = save['train_labels']
    valid_dataset = save['valid_dataset']
    valid_labels = save['valid_labels']
    test_dataset = save['test_dataset']
    test_labels = save['test_labels']
    del save  # hint to help gc free up memory
    print('Training set', train_dataset.shape, train_labels.shape)
    print('Validation set', valid_dataset.shape, valid_labels.shape)
    print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)


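Reformat into a shape that's better suited to the models we're going to train:
- undo the normalization (`x * 255 + 255/2`, which maps the normalized values back to raw pixel intensities)
- flatten each 28x28 image into a row of 784 values, so the whole dataset is a flat matrix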
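A small NumPy check that `reformat` inverts the normalization and flattens each image. It assumes Part 1 normalized pixels as `(pixel - 255/2) / 255`; the `normalize` helper below is a hypothetical reconstruction of that step, not code from Part 1.

```python
import numpy as np

PIXEL_DEPTH = 255.0  # assumption: 8-bit grayscale images, as in Part 1
IMAGE_SIZE = 28

def normalize(images):
    # Hypothetical Part 1 step: map [0, 255] to roughly [-0.5, 0.5]
    return (images - PIXEL_DEPTH / 2) / PIXEL_DEPTH

def reformat(dataset):
    # Undo the normalization, then flatten each 28x28 image into 784 values
    dataset = dataset * PIXEL_DEPTH + PIXEL_DEPTH / 2
    return dataset.reshape((-1, IMAGE_SIZE * IMAGE_SIZE)).astype(np.float32)

rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(5, IMAGE_SIZE, IMAGE_SIZE)).astype(np.float64)
flat = reformat(normalize(raw))
print(flat.shape)                              # (5, 784)
print(np.allclose(flat, raw.reshape(5, -1)))   # True
```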

In [3]:
image_size = 28
num_labels = 10

def reformat(dataset):
    dataset = dataset * 255.0 + 255.0/2
    dataset = dataset.reshape((-1, image_size * image_size)).astype(np.float32)
    return dataset

train_dataset = reformat(train_dataset)
valid_dataset = reformat(valid_dataset)
test_dataset = reformat(test_dataset)

print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 784) (200000,)
Validation set (10000, 784) (10000,)
Test set (10000, 784) (10000,)


## PyTorch tutorial: Fully Connected Network

We're first going to train a **fully connected network** with *1 hidden layer* with *1024 units* using stochastic gradient descent (SGD).

In [4]:
from torch.utils.data import Dataset
from torch.utils.data import DataLoader

import torch.nn as nn
import torch.nn.functional as F
from torch.nn.parameter import Parameter

import torch.optim as optim

### First, define the NotMNIST dataset class.  
- the dataset class inherits from the torch.utils.data.Dataset class  
- every dataset class should define \_\_len\_\_() and \_\_getitem\_\_()
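A minimal sketch of the map-style dataset protocol on toy arrays (the class name `ArrayDataset` is hypothetical). `DataLoader` calls `__getitem__` for each sampled index, and its default collate function stacks the returned NumPy pairs into batched tensors.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class ArrayDataset(Dataset):
    """Hypothetical map-style dataset over two in-memory NumPy arrays."""
    def __init__(self, data, label):
        self.data = data
        self.label = label

    def __len__(self):
        # Number of samples; the default sampler draws indices from range(len(self))
        return len(self.data)

    def __getitem__(self, idx):
        # One (features, label) pair for a single integer index
        return self.data[idx], self.label[idx]

data = np.arange(12, dtype=np.float32).reshape(4, 3)
label = np.array([0, 1, 2, 3], dtype=np.int64)
ds = ArrayDataset(data, label)
print(len(ds))  # 4

# DataLoader fetches items by index and collates them into batched tensors
xb, yb = next(iter(DataLoader(ds, batch_size=2, shuffle=False)))
print(xb.shape)  # torch.Size([2, 3])
```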

In [5]:
class NotMNIST(Dataset):
    def __init__(self, data, label):
        self.data = data
        self.label = label
        
    def __len__(self):
        return len(self.data)
    
    def __getitem__(self, idx):
        data_idx = self.data[idx]
        label_idx = self.label[idx]
        return data_idx, label_idx

    
notmnist_train = NotMNIST(train_dataset, train_labels)
notmnist_valid = NotMNIST(valid_dataset, valid_labels)
notmnist_test = NotMNIST(test_dataset, test_labels)

print('training set length: ', len(notmnist_train))
print('validation set length: ', len(notmnist_valid))
print('test set length: ', len(notmnist_test))

training set length:  200000
validation set length:  10000
test set length:  10000


### Then, make DataLoaders from the NotMNIST dataset objects  
Note that torch.utils.data.DataLoader is iterable: `issubclass(DataLoader, Iterable)` is True because it defines `__iter__`, so a DataLoader can be used directly in a `for` loop (for a more detailed explanation of iterables, refer to https://shoark7.github.io/programming/python/iterable-iterator-generator-in-python)  
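A simplified pure-Python sketch of how a DataLoader with the default sampler groups indices into mini-batches, and why `len(train_loader)` is 3125. The helpers are hypothetical and ignore details such as worker processes and collation. Because 200000 = 64 * 3125 exactly, `drop_last=True` discards nothing for the training set here.

```python
import random

def iterate_minibatches(n_items, batch_size, shuffle=True, drop_last=False, seed=0):
    """Hypothetical helper: yields lists of indices the way a DataLoader batches them."""
    indices = list(range(n_items))
    if shuffle:
        random.Random(seed).shuffle(indices)  # a fresh permutation of all indices
    for start in range(0, n_items, batch_size):
        batch = indices[start:start + batch_size]
        if drop_last and len(batch) < batch_size:
            break  # discard the final, incomplete batch
        yield batch

def num_batches(n_items, batch_size, drop_last):
    # Floor division when incomplete batches are dropped, ceiling division otherwise
    if drop_last:
        return n_items // batch_size
    return (n_items + batch_size - 1) // batch_size

print(num_batches(200000, 64, drop_last=True))     # 3125
print(num_batches(10000, 10000, drop_last=False))  # 1
print([len(b) for b in iterate_minibatches(10, 4, drop_last=True)])   # [4, 4]
print([len(b) for b in iterate_minibatches(10, 4, drop_last=False)])  # [4, 4, 2]
```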

In [6]:
batch_size = 64

train_loader = DataLoader(dataset=notmnist_train, batch_size=batch_size, shuffle=True, drop_last=True)
# accuracy does not depend on example order, so the evaluation loaders are not shuffled
valid_loader = DataLoader(dataset=notmnist_valid, batch_size=len(notmnist_valid), shuffle=False)
test_loader = DataLoader(dataset=notmnist_test, batch_size=len(notmnist_test), shuffle=False)

from collections.abc import Iterable
print(issubclass(DataLoader, Iterable))

print('train loader length: ', len(train_loader)) # same as len(dataset) // batch_size
print('valid loader length: ', len(valid_loader))
print('test loader length: ', len(test_loader))

True
train loader length:  3125
valid loader length:  1
test loader length:  1


### Define a naive linear model
- the model should inherit from nn.Module
- implement the forward pass by overriding the **forward** method of nn.Module
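A minimal sketch of the two `nn.Module` mechanics the model relies on (the `Scale` module is hypothetical). Assigning a `Parameter` as an attribute registers it, so it shows up in `model.parameters()` and is updated by the optimizer. Calling `module(x)` runs `forward(x)` through `nn.Module.__call__`.

```python
import torch
import torch.nn as nn
from torch.nn.parameter import Parameter

class Scale(nn.Module):
    """Hypothetical one-parameter module: multiplies its input by a learnable scalar."""
    def __init__(self):
        super().__init__()
        # Assigning a Parameter as an attribute registers it with the module
        self.alpha = Parameter(torch.ones(1))
        # A plain tensor attribute is NOT registered as a parameter
        self.offset = torch.zeros(1)

    def forward(self, x):
        return self.alpha * x + self.offset

m = Scale()
names = [name for name, _ in m.named_parameters()]
print(names)  # ['alpha']

out = m(torch.tensor([1.0, 2.0]))  # m(x) dispatches to forward(x)
out.sum().backward()               # d/d(alpha) of sum(alpha * x) is sum(x) = 3
print(m.alpha.grad)  # tensor([3.])
```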

In [7]:
torch.manual_seed(1)

class NaiveLinear(nn.Module):
    
    def __init__(self, in_features, out_features):
        super(NaiveLinear, self).__init__()
        self.weight = Parameter(torch.Tensor(in_features, out_features))
        self.bias = Parameter(torch.Tensor(out_features))
        torch.nn.init.uniform_(self.weight, -1.0, 1.0)
        torch.nn.init.zeros_(self.bias)
    
    def forward(self, input):
        return torch.matmul(input, self.weight) + self.bias

In [8]:
class Model(nn.Module):
    
    def __init__(self, in_features, nn_hidden, num_labels):
        super(Model, self).__init__()
        self.fc1 = NaiveLinear(in_features, nn_hidden)
        self.fc2 = NaiveLinear(nn_hidden, num_labels)
        
    def forward(self, x):
        x = torch.tanh(self.fc1(x))
        x = F.log_softmax(self.fc2(x), dim=1)
        return x

In [9]:
nn_hidden = 1024

model = Model(image_size*image_size, nn_hidden, num_labels)

# move model to GPU
model.cuda()
# print model, initialized weight, grad buffer
print(model)
print(model.fc1.weight.data)
print(model.fc1.bias.grad)

Model(
  (fc1): NaiveLinear()
  (fc2): NaiveLinear()
)
tensor([[ 0.5153, -0.4414, -0.1939,  ..., -0.0334, -0.3184, -0.6335],
        [ 0.1658,  0.0407,  0.1526,  ..., -0.9122, -0.9774, -0.8611],
        [-0.5218,  0.6270,  0.8866,  ..., -0.1884,  0.3789, -0.5059],
        ...,
        [ 0.0949, -0.5369,  0.1770,  ...,  0.7149,  0.9331, -0.7348],
        [-0.7956,  0.9141,  0.3562,  ..., -0.1767, -0.8524,  0.0262],
        [-0.8096, -0.3422, -0.3679,  ...,  0.1363,  0.2865,  0.6339]],
       device='cuda:0')
None


Now, define the loss function and the optimizer.
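`nn.NLLLoss` expects log-probabilities, which is why the model ends with `F.log_softmax`. The pair is equivalent to `nn.CrossEntropyLoss` applied to raw scores. The sketch below checks this on random logits. It also checks that one step of `optim.SGD` with default settings (no momentum) is the plain update `p <- p - lr * grad`. The tensors are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(0)
logits = torch.randn(4, 10)
targets = torch.tensor([1, 0, 9, 3])  # class indices must be int64, hence .long() in train()

# NLLLoss on log-probabilities ...
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)
# ... equals CrossEntropyLoss on raw scores (it applies log_softmax internally)
ce = nn.CrossEntropyLoss()(logits, targets)
print(torch.allclose(nll, ce))  # True

# One SGD step: p <- p - lr * grad, with grad of sum(w**2) equal to 2 * w
w = torch.randn(3, requires_grad=True)
w_before = w.detach().clone()
opt = optim.SGD([w], lr=0.005)
opt.zero_grad()
(w ** 2).sum().backward()
opt.step()
print(torch.allclose(w.detach(), w_before - 0.005 * 2 * w_before))  # True
```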

In [10]:
criterion = nn.NLLLoss()
optimizer = optim.SGD(params=model.parameters(), lr=0.005)

Let's run this computation and iterate:

In [11]:
epochs = 10
log_step = 1000

def accuracy(logits, labels):
    logits = logits.cpu().detach().numpy()
    labels = labels.cpu().detach().numpy()
    
    return (100.0 * np.sum(np.equal(np.argmax(logits, 1), labels))
          / logits.shape[0])


# train_model
def train(model, train_loader, optimizer, criterion, epoch):
    model.train()
    for idx, data in enumerate(train_loader):
        images_flatten, labels = data[0].cuda(), data[1].long().cuda()
        logits = model(images_flatten)

        optimizer.zero_grad()
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()

        if (idx % log_step) == log_step-1:
            print(f'epoch: {epoch+1} [{idx + 1} / {len(train_loader)}]\t train_loss: {loss.item():.3f}\t train_accuracy: {accuracy(logits, labels):.1f}\n')


# evaluate model
def evaluate(model, test_loader):
    model.eval()  # put layers in inference mode (no effect for this model, but idiomatic)
    with torch.no_grad():
        for _, data in enumerate(test_loader):
            test_images_flatten, test_labels = data[0].cuda(), data[1].cuda()
            test_logits = model(test_images_flatten)

        print(f'accuracy: {accuracy(test_logits, test_labels):.1f}\n')
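`evaluate` reports the accuracy of the last batch it sees. That equals the accuracy on the whole set here only because `valid_loader` and `test_loader` each yield a single batch of 10,000 examples. The following minimal NumPy sketch accumulates correct counts across batches and gives the same number for any batch size. The helper names are hypothetical.

```python
import numpy as np

def batch_correct(logits, labels):
    # Number of rows whose arg-max prediction equals the label
    return int(np.sum(np.argmax(logits, axis=1) == labels))

def accuracy_over_batches(batches):
    """Accumulate correct predictions over (logits, labels) batches; returns a percentage."""
    correct, total = 0, 0
    for logits, labels in batches:
        correct += batch_correct(logits, labels)
        total += labels.shape[0]
    return 100.0 * correct / total

rng = np.random.default_rng(0)
logits = rng.normal(size=(10, 3))
labels = rng.integers(0, 3, size=10)
full = 100.0 * batch_correct(logits, labels) / 10

# Uneven batches of 4, 4 and 2 rows give the same overall accuracy
split = [(logits[i:i + 4], labels[i:i + 4]) for i in range(0, 10, 4)]
print(np.isclose(accuracy_over_batches(split), full))  # True
```

In PyTorch the same pattern is to sum `(pred.argmax(1) == labels).sum().item()` inside the `for` loop and divide by the dataset length after it.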
    

In [12]:
for epoch in range(epochs):
    train(model, train_loader, optimizer, criterion, epoch)
    print('-------- validation --------')
    evaluate(model, valid_loader)

            
print('-------- test ---------')
evaluate(model, test_loader)

    
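# save model (create the checkpoint directory first if it does not exist)
os.makedirs('./model_checkpoints', exist_ok=True)
torch.save(model.state_dict(), './model_checkpoints/naive_model_final.pt')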
print('naive model saved')

epoch: 1 [1000 / 3125]	 train_loss: 5.108	 train_accuracy: 68.8

epoch: 1 [2000 / 3125]	 train_loss: 4.471	 train_accuracy: 75.0

epoch: 1 [3000 / 3125]	 train_loss: 5.345	 train_accuracy: 70.3

-------- validation --------
accuracy: 73.7

epoch: 2 [1000 / 3125]	 train_loss: 3.363	 train_accuracy: 70.3

epoch: 2 [2000 / 3125]	 train_loss: 3.448	 train_accuracy: 78.1

epoch: 2 [3000 / 3125]	 train_loss: 2.918	 train_accuracy: 71.9

-------- validation --------
accuracy: 76.0

epoch: 3 [1000 / 3125]	 train_loss: 2.952	 train_accuracy: 76.6

epoch: 3 [2000 / 3125]	 train_loss: 4.682	 train_accuracy: 71.9

epoch: 3 [3000 / 3125]	 train_loss: 4.327	 train_accuracy: 70.3

-------- validation --------
accuracy: 76.8

epoch: 4 [1000 / 3125]	 train_loss: 3.225	 train_accuracy: 73.4

epoch: 4 [2000 / 3125]	 train_loss: 3.127	 train_accuracy: 73.4

epoch: 4 [3000 / 3125]	 train_loss: 3.204	 train_accuracy: 75.0

-------- validation --------
accuracy: 77.3

epoch: 5 [1000 / 3125]	 train_loss: 1.70

So far, you have built the model in a naive way. However, PyTorch provides a linear module, nn.Linear, for convenience. 

From now on, build the same model as above using the layer modules.

You can also build the model using nn.Sequential().
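`nn.Linear` stores its weight with shape `(out_features, in_features)` and computes `x @ W.T + b`. `NaiveLinear` above instead stores `(in_features, out_features)` and computes `x @ W + b`. The following short check of the `nn.Linear` formula uses a random input and the layer sizes of the model above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(784, 1024)
print(layer.weight.shape)  # torch.Size([1024, 784]), i.e. (out_features, in_features)

x = torch.randn(8, 784)
# nn.Linear computes x @ W^T + b
manual = torch.matmul(x, layer.weight.t()) + layer.bias
print(torch.allclose(layer(x), manual, atol=1e-5))  # True
```

So a `NaiveLinear` whose weight is set to `layer.weight.t()` and whose bias is set to `layer.bias` would produce the same outputs. The two layers differ only in layout and initialization.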

In [13]:
model_layer = nn.Sequential(
            # neural network using nn.Linear
            nn.Linear(image_size * image_size, nn_hidden),
            nn.Tanh(),
            nn.Linear(nn_hidden, num_labels),
            nn.LogSoftmax(dim=1)
            )

model_layer.cuda()

Sequential(
  (0): Linear(in_features=784, out_features=1024, bias=True)
  (1): Tanh()
  (2): Linear(in_features=1024, out_features=10, bias=True)
  (3): LogSoftmax(dim=1)
)

In [14]:
criterion_layer = nn.NLLLoss()
optimizer_layer = optim.SGD(model_layer.parameters(), lr=0.005)

In [15]:
for epoch in range(epochs):
    train(model_layer, train_loader, optimizer_layer, criterion_layer, epoch)
    print('-------- validation --------')
    evaluate(model_layer, valid_loader)

            
print('-------- test ---------')
evaluate(model_layer, test_loader)

    
# save model
torch.save(model_layer.state_dict(), './model_checkpoints/layer_model_final.pt')
print('layer_model saved')

epoch: 1 [1000 / 3125]	 train_loss: 0.745	 train_accuracy: 84.4

epoch: 1 [2000 / 3125]	 train_loss: 0.674	 train_accuracy: 82.8

epoch: 1 [3000 / 3125]	 train_loss: 1.024	 train_accuracy: 73.4

-------- validation --------
accuracy: 82.0

epoch: 2 [1000 / 3125]	 train_loss: 0.743	 train_accuracy: 78.1

epoch: 2 [2000 / 3125]	 train_loss: 0.989	 train_accuracy: 71.9

epoch: 2 [3000 / 3125]	 train_loss: 0.624	 train_accuracy: 84.4

-------- validation --------
accuracy: 82.5

epoch: 3 [1000 / 3125]	 train_loss: 0.676	 train_accuracy: 82.8

epoch: 3 [2000 / 3125]	 train_loss: 0.583	 train_accuracy: 85.9

epoch: 3 [3000 / 3125]	 train_loss: 0.777	 train_accuracy: 79.7

-------- validation --------
accuracy: 82.6

epoch: 4 [1000 / 3125]	 train_loss: 0.568	 train_accuracy: 82.8

epoch: 4 [2000 / 3125]	 train_loss: 0.527	 train_accuracy: 87.5

epoch: 4 [3000 / 3125]	 train_loss: 0.564	 train_accuracy: 79.7

-------- validation --------
accuracy: 82.9

epoch: 5 [1000 / 3125]	 train_loss: 0.80

---
Problem 1
-------

**Describe below** why the accuracy of the model built with nn.Linear differs from that of the model built in a naive way. **Explain briefly.**  
You can refer to the PyTorch documentation (https://pytorch.org/docs/stable/index.html) to check the nn.Linear() implementation.

---

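In the naive model, we initialize the weights from a uniform distribution on $[-1, 1]$ and the biases to zero. nn.Linear, however, initializes both the weights and the biases from a uniform distribution on $\left[-\frac{1}{\sqrt{n}}, \frac{1}{\sqrt{n}}\right]$, where $n$ is the number of input features (`in_features`). With $n = 784$ or $n = 1024$, the naive weights are therefore about 28 to 32 times larger. Combined with the unnormalized inputs (pixel values up to 255), they push the tanh hidden units into saturation, where their gradient is almost zero. The output layer then sums 1024 terms with weights of magnitude up to 1 and produces very large logits, so the softmax is extremely overconfident. The logs above reflect this: the training loss is roughly 3 to 5 for the naive model versus roughly 0.5 to 1 with nn.Linear. With the same learning rate, the naive model therefore most likely trains more slowly and reaches lower accuracy.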
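The following NumPy illustration mimics the two initializations of the output layer. It assumes the tanh hidden units are saturated (outputs of exactly +1 or -1). For $U(-a, a)$ weights the logit variance is $n a^2 / 3$. That gives a logit standard deviation of about $\sqrt{1024/3} \approx 18.5$ for the naive scheme and about $\sqrt{1/3} \approx 0.58$ for the nn.Linear-style scheme. The sketch checks this empirically and compares the resulting initial cross-entropy.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_out, batch = 1024, 10, 256

# Assumption: saturated tanh hidden units output values of +1 or -1
h = rng.choice([-1.0, 1.0], size=(batch, n_hidden))

# Naive initialization of the output layer: U(-1, 1)
w_naive = rng.uniform(-1.0, 1.0, size=(n_hidden, n_out))
# nn.Linear-style initialization: U(-1/sqrt(n), 1/sqrt(n)) with n = in_features
bound = 1.0 / np.sqrt(n_hidden)
w_default = rng.uniform(-bound, bound, size=(n_hidden, n_out))

def mean_cross_entropy(logits, labels):
    # Numerically stable log-softmax followed by the mean negative log-likelihood
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

labels = rng.integers(0, n_out, size=batch)
logits_naive, logits_default = h @ w_naive, h @ w_default

# Logit spread: theory predicts about 18.5 (naive) vs about 0.58 (nn.Linear-style)
print(logits_naive.std(), logits_default.std())
# Initial loss: far above ln(10) for the naive init, close to it for the default init
print(mean_cross_entropy(logits_naive, labels), mean_cross_entropy(logits_default, labels))
```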


---
Problem 2
-------

Try to get the best performance you can using a multi-layer model! (It doesn't matter whether you implement it in a naive way or with the layer modules. HOWEVER, you CANNOT use other types of layers such as conv.) 

The best reported test accuracy using a deep network is [97.1%](http://yaroslavvb.blogspot.kr/2011/09/notmnist-dataset.html?showComment=1391023266211#c8758720086795711595). You may use the techniques below.

1. Experiment with different hyperparameters: epochs, learning rate, etc.
2. We used a fixed learning rate epsilon for gradient descent. Implement an annealing schedule for the gradient descent learning rate ([more info](http://cs231n.github.io/neural-networks-3/#anneal)). *Hint*. Try using `torch.optim.lr_scheduler.ExponentialLR()`.    
3. We used a tanh activation function for our hidden layer. Experiment with other activation functions included in PyTorch.
4. Extend the network to multiple hidden layers. Experiment with the layer sizes. Adding another hidden layer means you will need to adjust the code. 
5. Introduce and tune a regularization method (e.g. L2 regularization) for your model. Remember that L2 regularization amounts to adding a penalty on the norm of the weights to the loss. The right amount of regularization should improve your validation / test accuracy.
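The following is a minimal sketch of techniques 2 and 5 on a hypothetical small network, using a random stand-in batch. `torch.optim.lr_scheduler.ExponentialLR` multiplies the learning rate by `gamma` each time `scheduler.step()` is called (once per epoch here), and the L2 penalty is added to the loss explicitly. The values `lr=0.1`, `gamma=0.9` and `l2_lambda=1e-4` are illustrative, not tuned.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import ExponentialLR

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))  # hypothetical net
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = ExponentialLR(optimizer, gamma=0.9)  # lr <- lr * gamma per scheduler.step()
criterion = nn.CrossEntropyLoss()
l2_lambda = 1e-4  # hypothetical regularization strength

x = torch.randn(32, 784)          # stand-in for one mini-batch of flattened images
y = torch.randint(0, 10, (32,))
lrs = []
for epoch in range(3):
    optimizer.zero_grad()
    data_loss = criterion(model(x), y)
    # Explicit L2 penalty on the weight matrices only (biases are commonly excluded)
    l2 = sum((p ** 2).sum() for name, p in model.named_parameters() if name.endswith('weight'))
    loss = data_loss + l2_lambda * l2
    loss.backward()
    optimizer.step()
    scheduler.step()  # anneal once per epoch, after the optimizer step
    lrs.append(optimizer.param_groups[0]['lr'])

print(lrs)  # lr after each epoch: 0.1 * 0.9**k for k = 1, 2, 3
```

Passing `weight_decay` to `optim.SGD` has a similar effect without the explicit term: it adds `weight_decay * p` to each gradient, for all parameters including biases. For `optim.Adam` this coupled L2 term is rescaled by the adaptive step. `optim.AdamW` implements decoupled weight decay instead.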


**Evaluation:** You will get full credit if your best test accuracy exceeds <font color=red>$93\%$</font>. Save your best performing model using `torch.save`.  
**<font color=red>Save your model in directory ./model_checkpoints</font>** (Refer to the cell above)  
**<font color=red>Please follow format as problem2_(Student Number)</font>** (e.g. set path as './model_checkpoints/problem2_2020-23456')
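`torch.save(model.state_dict(), path)` stores only the parameter tensors. Anyone who loads the checkpoint must therefore first rebuild the same architecture. For the model below, that means calling `makemodel` with the same `dim_hidden_list`, because the `state_dict` keys such as `linear0.weight` come from the module names. Below is a minimal round-trip sketch in which `build()` is a hypothetical stand-in architecture and a temporary directory replaces `./model_checkpoints`.

```python
import os
import tempfile
import torch
import torch.nn as nn

def build():
    # Hypothetical architecture; it must match the one that was saved
    return nn.Sequential(nn.Linear(784, 500), nn.ReLU(), nn.Linear(500, 10), nn.LogSoftmax(dim=1))

torch.manual_seed(0)
trained = build()
ckpt_dir = tempfile.mkdtemp()  # stand-in for './model_checkpoints'
path = os.path.join(ckpt_dir, 'problem2_2020-23456.pt')  # example number from the instructions
torch.save(trained.state_dict(), path)  # parameters only, not the class definition

restored = build()
restored.load_state_dict(torch.load(path, map_location='cpu'))
restored.eval()

x = torch.randn(2, 784)
print(torch.allclose(trained(x), restored(x)))  # True
```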

---


In [62]:
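# define model

def makemodel(dim_input, dim_output, dim_hidden_list):
    """Build an MLP: one Linear + ReLU block per entry of dim_hidden_list,
    then a Linear output layer followed by LogSoftmax. Returns the model on the GPU."""
    model = nn.Sequential()
    dim_in = dim_input
    print(dim_in)  # input size (784 for flattened 28x28 images)
    for idx, dim_h in enumerate(dim_hidden_list):
        model.add_module("linear" + str(idx), nn.Linear(dim_in, dim_h))
        model.add_module("activate" + str(idx), nn.ReLU())
        dim_in = dim_h  # the next layer takes this layer's output size as input
    # output layer: log-probabilities over the classes
    model.add_module("Linear", nn.Linear(dim_in, dim_output))
    model.add_module("Activate", nn.LogSoftmax(dim=1))

    return model.cuda()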
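A small pure-Python helper for comparing the architectures in `dim_hidden_list_list` by size. It is hypothetical and not part of the assignment. It counts the weights and biases that `makemodel` creates. For an actual PyTorch model, the same number is `sum(p.numel() for p in model.parameters())`.

```python
def count_parameters(dim_input, dim_output, dim_hidden_list):
    """Weights + biases of the fully connected stack that makemodel builds."""
    dims = [dim_input] + list(dim_hidden_list) + [dim_output]
    # Each nn.Linear(d_in, d_out) holds d_in * d_out weights and d_out biases
    return sum(d_in * d_out + d_out for d_in, d_out in zip(dims[:-1], dims[1:]))

for hidden in [[1024], [500, 300, 200, 100]]:
    print(hidden, count_parameters(784, 10, hidden))
# [1024] 814090
# [500, 300, 200, 100] 624110
```

This shows, for example, that the four-hidden-layer network has fewer parameters than the single 1024-unit layer.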

In [63]:
def accuracy(pred, label):
    pred = pred.cpu().detach().numpy()
    label = label.cpu().detach().numpy()
    
    return (100* np.sum(np.equal(np.argmax(pred,1),label))/pred.shape[0])
    

def train(model, train_loader, optimizer, criterion, verbose=True):
    model.train()
    for idx, data in enumerate(train_loader):
        input_data, labels = data[0].cuda(), data[1].long().cuda()
        pred = model(input_data)

        optimizer.zero_grad()
        loss = criterion(pred, labels)
        loss.backward()
        optimizer.step()

    # report the loss / accuracy of the last mini-batch of the epoch
    if verbose:
        print(f'train_loss: {loss.item():.3f}\t train_accuracy: {accuracy(pred, labels):.1f}\n')
            
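def evaluation(model, test_loader):
    model.eval()  # put layers in inference mode
    with torch.no_grad():
        for _, data in enumerate(test_loader):
            input_data, labels = data[0].cuda(), data[1].cuda()
            pred = model(input_data)

        # the evaluation loaders hold a single batch, so this is the whole-set accuracy
        print(f'accuracy: {accuracy(pred, labels):.1f}\n')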

In [67]:
# set hyperparameter

dim_hidden_list_list = [[1024],
                        [500,300],
                       [1000,500],
                       [1500,600],
                       [1000,500,200],
                       [900,300,100],
                       [500,300,200,100],
                       [500,300,200,100,50]]

epochs = 10
dim_input = image_size * image_size
dim_output = num_labels

for dim_hidden_list in dim_hidden_list_list:
    
    jk_model = makemodel(dim_input, dim_output, dim_hidden_list)
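    optimizer = optim.Adam(jk_model.parameters(), lr=0.001, weight_decay=0.001)  # weight_decay adds an L2 term to the gradients
    criterion = nn.NLLLoss()  # the model already ends in LogSoftmax, so NLLLoss gives the cross-entropy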
    print(f'-------- At hidden layer {dim_hidden_list} --------')
    for epoch in range(epochs):
        print(f'-------- At epoch {epoch} ----------')
        train(jk_model, train_loader, optimizer, criterion)
        evaluate(jk_model, valid_loader)


    print(f'-------- test at hidden layer {dim_hidden_list} ---------')
    evaluate(jk_model, test_loader)
    jk_model.cpu()

784
Sequential(
  (linear0): Linear(in_features=784, out_features=1024, bias=True)
  (activate0): ReLU()
  (Linear): Linear(in_features=1024, out_features=10, bias=True)
  (Activate): LogSoftmax(dim=1)
)
-------- At hidden layer [1024] --------
-------- At epoch 0 ----------
train_loss: 0.460	 train_accuracy: 84.4

accuracy: 86.2

-------- At epoch 1 ----------
train_loss: 0.414	 train_accuracy: 85.9

accuracy: 86.8

-------- At epoch 2 ----------
train_loss: 0.439	 train_accuracy: 89.1

accuracy: 87.2

-------- At epoch 3 ----------
train_loss: 0.480	 train_accuracy: 84.4

accuracy: 87.5

-------- At epoch 4 ----------
train_loss: 0.371	 train_accuracy: 85.9

accuracy: 87.3

-------- At epoch 5 ----------
train_loss: 0.285	 train_accuracy: 92.2

accuracy: 87.3

-------- At epoch 6 ----------
train_loss: 0.479	 train_accuracy: 85.9

accuracy: 87.3

-------- At epoch 7 ----------
train_loss: 0.525	 train_accuracy: 84.4

accuracy: 87.3

-------- At epoch 8 ----------
train_loss: 0.211	 t

train_loss: 0.561	 train_accuracy: 84.4

accuracy: 86.3

-------- At epoch 1 ----------
train_loss: 0.240	 train_accuracy: 93.8

accuracy: 87.0

-------- At epoch 2 ----------
train_loss: 0.399	 train_accuracy: 82.8

accuracy: 87.7

-------- At epoch 3 ----------
train_loss: 0.339	 train_accuracy: 90.6

accuracy: 87.8

-------- At epoch 4 ----------
train_loss: 0.317	 train_accuracy: 89.1

accuracy: 87.7

-------- At epoch 5 ----------
train_loss: 0.201	 train_accuracy: 95.3

accuracy: 88.0

-------- At epoch 6 ----------
train_loss: 0.366	 train_accuracy: 87.5

accuracy: 88.5

-------- At epoch 7 ----------
train_loss: 0.313	 train_accuracy: 87.5

accuracy: 88.1

-------- At epoch 8 ----------
train_loss: 0.251	 train_accuracy: 92.2

accuracy: 88.0

-------- At epoch 9 ----------
train_loss: 0.176	 train_accuracy: 96.9

accuracy: 87.8

-------- test at hidden layer [500, 300, 200, 100] ---------
accuracy: 93.5

784
Sequential(
  (linear0): Linear(in_features=784, out_features=500, bia

The last model is the best, so I will save it.

In [68]:
# save model
torch.save(jk_model.state_dict(), './model_checkpoints/problem2_2020-27508.pt')
print('model saved')

model saved
