# Homework 2, *part 2* (60 points)

In this assignment you will build a convolutional neural net (CNN) for Tiny ImageNet image classification. Try to achieve as high an accuracy as possible.

## Deliverables

* This file,
* a "checkpoint file" from `torch.save(model.state_dict(), ...)` that contains the model's weights (which a TA should be able to load to verify your accuracy).

## Grading

* 9 points for reproducible training code and a filled report below.
* 12 points for building a network that gets above 20% accuracy.
* 6.5 points for beating each of these milestones on the private **test** set:
  * 25.0%
  * 30.0%
  * 32.5%
  * 35.0%
  * 37.5%
  * 40.0%
  
*Private test set* means that you won't be able to evaluate your model on it. Rather, after you submit code and checkpoint, we will load your model and evaluate it on that test set ourselves (so please make sure it's easy for TAs to do!), reporting your accuracy in a comment to the grade.
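
The grading list above is plain arithmetic, and a minimal sketch of it follows. The function name `homework_score` and its arguments are hypothetical. "Beating" a milestone is assumed to mean strictly exceeding it, and the 20% threshold is assumed to refer to the same test accuracy, which the text does not specify.

```python
MILESTONES = (25.0, 30.0, 32.5, 35.0, 37.5, 40.0)  # percent, from the list above


def homework_score(test_accuracy, code_and_report_ok=True):
    """Return the points implied by the grading list (hypothetical helper)."""
    score = 0.0
    if code_and_report_ok:
        score += 9.0   # reproducible training code and a filled report
    if test_accuracy > 20.0:
        score += 12.0  # network above 20% accuracy (assumed: same test accuracy)
    # 6.5 points for every milestone that is strictly exceeded (assumption)
    score += 6.5 * sum(test_accuracy > m for m in MILESTONES)
    return score


print(homework_score(41.0))  # 60.0
print(homework_score(31.0))  # 34.0
```

With every milestone beaten the total is 9 + 12 + 6 x 6.5 = 60 points, which matches the title.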
    
## Restrictions

* Don't use pretrained networks.

## Tips

* One change at a time: never test several new things at once.
* Google a lot.
* Use GPU.
* Use regularization: L2, batch normalization, dropout, data augmentation.
* Use Tensorboard ([non-Colab](https://github.com/lanpa/tensorboardX) or [Colab](https://medium.com/@tommytao_54597/use-tensorboard-in-google-colab-16b4bb9812a6)) or a similar interactive tool for viewing progress.

In [0]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

## 1. Check whether a GPU is available

In [0]:
## check if there is a connection to GPU
import tensorflow as tf
tf.test.gpu_device_name()

'/device:GPU:0'

## 2. Download the dataset

In [0]:
import os
from urllib.request import urlretrieve

def download(path, url='http://cs231n.stanford.edu/tiny-imagenet-200.zip'):
    dataset_name = 'tiny-imagenet-200'

    if os.path.exists(os.path.join(path, dataset_name, "val", "n01443537")):
        print("%s already exists, not downloading" % os.path.join(path, dataset_name))
        return
    else:
        print("Dataset not exists or is broken, downloading it")
    urlretrieve(url, os.path.join(path, dataset_name + ".zip"))
    
    import zipfile
    with zipfile.ZipFile(os.path.join(path, dataset_name + ".zip"), 'r') as archive:
        archive.extractall(path)  # extract into `path`, not into the current working directory

    # move validation images to subfolders by class
    val_root = os.path.join(path, dataset_name, "val")
    with open(os.path.join(val_root, "val_annotations.txt"), 'r') as f:
        for image_filename, class_name, _, _, _, _ in map(str.split, f):
            class_path = os.path.join(val_root, class_name)
            os.makedirs(class_path, exist_ok=True)
            os.rename(
                os.path.join(val_root, "images", image_filename),
                os.path.join(class_path, image_filename))

    os.rmdir(os.path.join(val_root, "images"))
    os.remove(os.path.join(val_root, "val_annotations.txt"))
    
download(".")

Dataset not exists or is broken, downloading it


Training and validation images are now in `tiny-imagenet-200/train` and `tiny-imagenet-200/val`.

## 3. Define and apply transformations (augmentation) to the data

In [0]:
import torch
import torchvision
from torchvision import transforms
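# NOTE: these values match the commonly used CIFAR-10 channel statistics,
# not statistics computed on Tiny ImageNet
means = np.array((0.4914, 0.4822, 0.4465))
stds = np.array((0.2023, 0.1994, 0.2010))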

transform_train_val = transforms.Compose([
    transforms.RandomRotation(degrees = 30),
    transforms.RandomHorizontalFlip(p = 0.5),
    transforms.CenterCrop(size = 64),  # image 64x64
    transforms.ToTensor(),  # Just to get tensors in the end of transforms
    transforms.Normalize(means, stds)
])


# Don't rotate and crop test dataset
transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(means, stds)
])

# test_dataset = <YOUR CODE>

In [0]:
train_dataset = torchvision.datasets.ImageFolder(
    "tiny-imagenet-200/train",
    transform = transform_train_val)

val_dataset = torchvision.datasets.ImageFolder(
     "tiny-imagenet-200/val",
      transform = transform_train_val) #  transform=torchvision.transforms.ToTensor()

## 4. Create the batch generators

In [0]:
batch_size = 64
train_batch_gen = torch.utils.data.DataLoader(train_dataset, 
                                              batch_size=batch_size,
                                              shuffle=True,
                                              num_workers=2)

batch_size = 64
val_batch_gen = torch.utils.data.DataLoader(val_dataset, 
                                              batch_size=batch_size,
                                              shuffle=True,
                                              num_workers=2)

In [0]:
for (X_batch, y_batch) in train_batch_gen:
  print(X_batch.shape)
  break

torch.Size([64, 3, 64, 64])
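
With `batch_size = 64` and the default `drop_last=False`, a `DataLoader` yields `ceil(N / batch_size)` batches per epoch, and the last one may be smaller. The training loop in section 5 selects the latest epoch's entries with `-len(dataset) // batch_size`. Unary minus binds tighter than `//`, and floor division rounds toward negative infinity, so that expression equals `-ceil(N / batch_size)`. The sketch below checks this arithmetic. It assumes the usual Tiny ImageNet split sizes of 100,000 training and 10,000 validation images, which this notebook does not state.

```python
import math


def batches_per_epoch(n_samples, batch_size):
    """Number of batches when the last partial batch is kept."""
    return math.ceil(n_samples / batch_size)


batch_size = 64
for n_samples in (100_000, 10_000):   # assumed train / val sizes
    n_batches = batches_per_epoch(n_samples, batch_size)
    start = -n_samples // batch_size  # parsed as (-n_samples) // batch_size
    print(n_samples, n_batches, start)
    assert start == -n_batches
```

So the slices in the training loop cover exactly one epoch of recorded values, including the short final batch.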


In [0]:
import numpy as np
from PIL import Image

# scipy.misc.imread was removed in SciPy 1.2; Pillow reads the same file
im = np.array(Image.open("tiny-imagenet-200/val/n03814639/val_4697.JPEG"))
im.shape


(64, 64, 3)

## 5. Define the network architecture and train it

In [0]:
import torch, torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from torch.optim import Adam
import numpy as np
import time
from  tqdm import tqdm_notebook

# a special module that converts [batch, channel, w, h] to [batch, units]
class Flatten(nn.Module):
    def forward(self, input):
        return input.view(input.size(0), -1)
      



In [0]:
# Set random seed for reproducibility !!!
np.random.seed(42)
torch.manual_seed(42)
torch.backends.cudnn.deterministic = True  # for GPU
torch.backends.cudnn.benchmark = False  # for GPU


model = nn.Sequential()

# describe the convnet here
model.add_module('conv1', nn.Conv2d(in_channels=3, out_channels=128, kernel_size=5, stride = 2)) # stride = 2
model.add_module('batchnorm1', nn.BatchNorm2d(num_features = 128))  # number of input channels
model.add_module('relu1', nn.ReLU())
model.add_module('pool1', nn.AvgPool2d(kernel_size = 3, stride = 1)) # average pooling 3x3
model.add_module('conv2', nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, stride = 2))  # stride = 2
model.add_module('batchnorm2', nn.BatchNorm2d(num_features = 256))  # number of input channels
model.add_module('relu2', nn.ReLU())
model.add_module('pool2', nn.AvgPool2d(kernel_size = 2, stride = 1)) # average pooling 2x2
model.add_module('conv3', nn.Conv2d(in_channels=256, out_channels=1024, kernel_size=3, stride = 2))  # stride = 2
model.add_module('batchnorm3', nn.BatchNorm2d(num_features = 1024))  # number of input channels
model.add_module('relu3', nn.ReLU())
model.add_module('pool3', nn.AvgPool2d(kernel_size = 2, stride = 1)) # average pooling 2x2
model.add_module('flatten', Flatten())
model.add_module('dense4', nn.Linear(16384, 1024)) # Compute number of input neurons  ## 36864    
model.add_module('batchnorm4', nn.BatchNorm1d(num_features = 1024))  # number of input channels  
model.add_module('relu4', nn.LeakyReLU(0.05))
model.add_module('dropout4', nn.Dropout(0.30))
model.add_module('dense5', nn.Linear(1024, 200)) # logits for 200 classes   ##512

model = model.cuda()    # if wanna run on gpu


def compute_loss(X_batch, y_batch):
    X_batch = Variable(torch.FloatTensor(X_batch)).cuda()  # Variable(torch.FloatTensor(X_batch)).cuda()
    y_batch = Variable(torch.LongTensor(y_batch)).cuda() # Variable(torch.LongTensor(y_batch)).cuda()
    logits = model.cuda()(X_batch)  # model.cuda()(X_batch)
    return F.cross_entropy(logits, y_batch).mean()
  
  

opt = Adam(model.parameters(),
           lr = 1e-3,  #learning rate
           weight_decay = 1e-4) # L2-regularization

train_loss = []
val_accuracy = []

num_epochs = 50   ## previously 20 # total amount of full passes over training data
max_val_accuracy = 0 

for epoch in tqdm_notebook(range(num_epochs)):
    start_time = time.time()
    model.train(True) # enable dropout / batch_norm training behavior
    for (X_batch, y_batch) in train_batch_gen:
        # train on batch
        loss = compute_loss(X_batch, y_batch)
        loss.backward()
        opt.step()
        opt.zero_grad()
        train_loss.append(loss.cpu().data.numpy())
    
#     model.train(False) # disable dropout / use averages for batch_norm
    model.eval()
    for X_batch, y_batch in val_batch_gen:
        logits = model(Variable(torch.FloatTensor(X_batch)).cuda())  #model(Variable(torch.FloatTensor(X_batch)).cuda())
        y_pred = logits.max(1)[1].data
        val_accuracy.append(np.mean( (y_batch.cpu() == y_pred.cpu()).numpy() ))

    
    # Then we print the results for this epoch:
    print("Epoch {} of {} took {:.3f}s".format(
        epoch + 1, num_epochs, time.time() - start_time))
    print("  training loss (in-iteration): \t{:.6f}".format(
        np.mean(train_loss[-len(train_dataset) // batch_size :])))
    print("  validation accuracy: \t\t\t{:.2f} %".format(
        np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100))
    
    if np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100 > max_val_accuracy:
      max_val_accuracy =  np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100
      

############################################################### 
# Train NN further with lower learning rate (x 100 lower)
###############################################################

# Usually it improves the metric a bit

num_extra_epochs = 10
opt = Adam(model.parameters(),
           lr = 1e-5,  #learning rate  (MUCH lower)
           weight_decay = 1e-4) # L2-regularization
for epoch in tqdm_notebook(range(num_extra_epochs)):
    start_time = time.time()
    model.train(True) # enable dropout / batch_norm training behavior
    for (X_batch, y_batch) in train_batch_gen:
        # train on batch
        loss = compute_loss(X_batch, y_batch)
        loss.backward()
        opt.step()
        opt.zero_grad()
        train_loss.append(loss.cpu().data.numpy())
    
#     model.train(False) # disable dropout / use averages for batch_norm
    model.eval()
    for X_batch, y_batch in val_batch_gen:
        logits = model(Variable(torch.FloatTensor(X_batch)).cuda())  #model(Variable(torch.FloatTensor(X_batch)).cuda())
        y_pred = logits.max(1)[1].data
        val_accuracy.append(np.mean( (y_batch.cpu() == y_pred.cpu()).numpy() ))

    
    # Then we print the results for this epoch:
    print("Epoch {} of {} took {:.3f}s".format(
        epoch + 1, num_extra_epochs, time.time() - start_time))
    print("  training loss (in-iteration): \t{:.6f}".format(
        np.mean(train_loss[-len(train_dataset) // batch_size :])))
    print("  validation accuracy: \t\t\t{:.2f} %".format(
        np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100))
    
    if np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100 > max_val_accuracy:
      max_val_accuracy =  np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100
      
      
#################################################################################### 
## Train NN EVEN further with lower learning rate (x 10000 lower)
####################################################################################

# Usually it improves metric a bit (much less than first extra training)

num_extra_epochs = 10
opt = Adam(model.parameters(),
           lr = 1e-7,  #learning rate  (MUCH MUCH lower)
           weight_decay = 1e-4) # L2-regularization
for epoch in tqdm_notebook(range(num_extra_epochs)):
    start_time = time.time()
    model.train(True) # enable dropout / batch_norm training behavior
    for (X_batch, y_batch) in train_batch_gen:
        # train on batch
        loss = compute_loss(X_batch, y_batch)
        loss.backward()
        opt.step()
        opt.zero_grad()
        train_loss.append(loss.cpu().data.numpy())
    
#     model.train(False) # disable dropout / use averages for batch_norm
    model.eval()
    for X_batch, y_batch in val_batch_gen:
        logits = model(Variable(torch.FloatTensor(X_batch)).cuda())  #model(Variable(torch.FloatTensor(X_batch)).cuda())
        y_pred = logits.max(1)[1].data
        val_accuracy.append(np.mean( (y_batch.cpu() == y_pred.cpu()).numpy() ))

    
    # Then we print the results for this epoch:
    print("Epoch {} of {} took {:.3f}s".format(
        epoch + 1, num_extra_epochs, time.time() - start_time))
    print("  training loss (in-iteration): \t{:.6f}".format(
        np.mean(train_loss[-len(train_dataset) // batch_size :])))
    print("  validation accuracy: \t\t\t{:.2f} %".format(
        np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100))
    
    if np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100 > max_val_accuracy:
      max_val_accuracy =  np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100
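
The three training stages above repeat the same loop with learning rates 1e-3, 1e-5 and 1e-7. One way to avoid the duplication is to drive a single loop from a list of stages. The sketch below is framework-agnostic and is not the code used for the reported results. `run_schedule`, `train_one_epoch` and `evaluate` are hypothetical names, and the two callables stand in for this notebook's per-batch training and validation code.

```python
def run_schedule(stages, train_one_epoch, evaluate):
    """Run (num_epochs, lr) stages in order.

    Returns the best validation accuracy and a per-epoch history.
    """
    best = 0.0
    history = []
    for num_epochs, lr in stages:
        for epoch in range(num_epochs):
            train_one_epoch(lr)   # one full pass over the training data
            acc = evaluate()      # validation accuracy after this epoch
            history.append((lr, epoch + 1, acc))
            best = max(best, acc)
    return best, history


# Stage lengths and learning rates taken from the cells above.
stages = [(50, 1e-3), (10, 1e-5), (10, 1e-7)]

# Dummy callables so the sketch runs without a model.
calls = []


def fake_train(lr):
    calls.append(lr)


def fake_eval():
    return min(0.42, 0.01 * len(calls))


best, history = run_schedule(stages, fake_train, fake_eval)
print(len(history), best)
```

How the learning rate reaches the optimizer is left to `train_one_epoch`. The cells above create a fresh `Adam` instance at the start of each stage.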

When everything is done, please compute accuracy on the validation set and report it below.

In [0]:
val_accuracy_ = max_val_accuracy   # we can do this as it was said in telegram chat
print("Validation accuracy: %.2f%%" % (val_accuracy_))

Validation accuracy: 42.21%
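
The reported number is the best per-epoch mean of per-batch accuracies. Because the last batch can be smaller than the others, that mean can differ slightly from the fraction of correctly classified images. The sketch below compares the two on made-up predictions. The helper names are hypothetical.

```python
def batch_mean_accuracy(batches):
    """Average of per-batch accuracies (what the training loop tracks)."""
    per_batch = [sum(p == y for p, y in batch) / len(batch) for batch in batches]
    return sum(per_batch) / len(per_batch)


def overall_accuracy(batches):
    """Fraction of correct predictions over all samples."""
    correct = sum(p == y for batch in batches for p, y in batch)
    total = sum(len(batch) for batch in batches)
    return correct / total


# Two batches of (prediction, label) pairs; the second is a short last batch.
batches = [
    [(0, 0), (1, 1), (2, 0), (3, 3)],  # 3 of 4 correct
    [(1, 1)],                          # 1 of 1 correct
]
print(batch_mean_accuracy(batches))  # 0.875
print(overall_accuracy(batches))     # 0.8
```

With 64-image batches and only one short batch per epoch, the gap is far smaller than in this toy case. Counting correct predictions over the whole set removes it entirely.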


## 6. Save the model to Google Drive

In [0]:
print("Our model: \n\n", model, '\n')

Our model: 

 Sequential(
  (conv1): Conv2d(3, 128, kernel_size=(5, 5), stride=(2, 2))
  (batchnorm1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu1): ReLU()
  (pool1): AvgPool2d(kernel_size=3, stride=1, padding=0)
  (conv2): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2))
  (batchnorm2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu2): ReLU()
  (pool2): AvgPool2d(kernel_size=2, stride=1, padding=0)
  (conv3): Conv2d(256, 1024, kernel_size=(3, 3), stride=(2, 2))
  (batchnorm3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu3): ReLU()
  (pool3): AvgPool2d(kernel_size=2, stride=1, padding=0)
  (flatten): Flatten()
  (dense4): Linear(in_features=16384, out_features=1024, bias=True)
  (batchnorm4): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu4): LeakyReLU(negative_slope=0.05)
  (dropout4): Dropout(p=0.3)
  (dense5
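
The `in_features=16384` of `dense4` in the printout above can be checked by hand. For a convolution or pooling layer without padding or dilation, the output side is `floor((n - k) / s) + 1`. The sketch below walks a 64x64 input through the six spatial layers. The helper `out_size` is hypothetical.

```python
def out_size(n, kernel, stride, padding=0):
    """Output side length of a conv/pool layer (dilation 1, floor rounding)."""
    return (n + 2 * padding - kernel) // stride + 1


# (name, kernel_size, stride) for the spatial layers of the model
layers = [
    ("conv1", 5, 2),
    ("pool1", 3, 1),
    ("conv2", 3, 2),
    ("pool2", 2, 1),
    ("conv3", 3, 2),
    ("pool3", 2, 1),
]

side = 64  # Tiny ImageNet images are 64x64
for name, kernel, stride in layers:
    side = out_size(side, kernel, stride)
    print(name, side)

flat = 1024 * side * side  # 1024 channels after conv3
print(flat)  # 16384
```

The side lengths are 30, 28, 13, 12, 5 and 4, so the flattened vector has 1024 x 4 x 4 = 16384 entries.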

In [0]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [0]:
!ls /content/gdrive/

'My Drive'


In [0]:
model_save_name = '5_model_batchnorm_augment_bigger__add_training_x2.pth'
path = F"/content/gdrive/My Drive/{model_save_name}"
torch.save(model.state_dict(), path)

# or just `torch.save(model.state_dict(), 'model_name.pth')` to save inside the Colab VM

In [0]:
# Load the model
# load_state_dict expects a state dict (not a file path) and does not return the model
model.load_state_dict(torch.load('2_model_batchnorm.pth'))
print(model)

# Report

Below, please mention

* a brief history of tweaks and improvements;
* what is the final architecture and why?
* what is the training method (batch size, optimization algorithm, ...) and why?
* Any regularization and other techniques applied and their effects;

The reference format is:

*"I have analyzed these and these articles|sources|blog posts, tried that and that to adapt them to my problem and the conclusions are such and such".*


1. **Baseline**:   
   - 5x5 Conv2d + ReLU + MaxPool -> 3x3 Conv2d + ReLU + MaxPool -> Dense + ReLU + Dropout(0.3) -> Dense
   - n_channels increases through the conv layers: 3 -> 32 -> 64
   - batch_size = 64
   - n_epoch = 20
   - SGD optimizer
   - CPU

   Took ~15-16 epochs to converge to ~23% validation accuracy. NOT a smooth convergence.

2. **Changed MaxPool to AvgPool**.   
   - Almost no change.

3. **Changed the last ReLU to LeakyReLU(0.05)**.   
   - Almost no change.

4. **Inserted BatchNorm2d (and BatchNorm1d) before every ReLU/LeakyReLU**.   
   - Convergence in ~7-8 epochs, but to a LOWER validation accuracy (~20%).
   - However, the training loss (cross-entropy) is lower, so I am probably overfitting.

5. **Changed the optimizer from SGD to Adam**.   
   - Faster convergence (~5-6 epochs), much higher validation accuracy (~28%).
   - However, it overfits after 6-7 epochs and validation accuracy drops to ~24%.
 
 *How to store model parameters state to perform early stopping?*   
 *How to do early stopping?*    
 *Add data augmentation to avoid overfitting*  
 *How to insert TensorBoard to monitor progress?*

6. **Changed CPU to GPU**.   
Magnificent! One epoch now takes ~45 s instead of 10 minutes.  

7. **Added L2 regularization**.   
weight_decay = 0.05. Probably too large, as the loss even increases.  

8. **Changed weight decay to 1e-8**.  
No visible difference from weight_decay = 0.    

9. **Changed weight decay to 1e-4**.   
See https://www.fast.ai/2018/07/02/adam-weight-decay/ . Slightly better convergence.  

10. **Added augmentation**.   
The same logic as in the seminars. Much better convergence; oscillations are very small compared to previous results. Validation accuracy ~31%.

11. **Higher dimensionality of hidden layers**  
Training is a bit slower; validation accuracy reached 33.5%.

12. **One more convolution layer**  
Validation accuracy ~37.5%.  
 
13. **Continued training for 10 more epochs with a x100 lower learning rate**  
Validation accuracy reached 41%.  
 
14. **Continued training ONE MORE TIME for 10 more epochs with a x10000 lower learning rate**   
Validation accuracy reached 42%.
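
For the early-stopping questions noted under item 5, one common pattern is to keep a copy of the best weights seen so far and to stop once validation accuracy has not improved for a few epochs. The sketch below uses a hypothetical helper (`BestCheckpoint`, `patience`) and is not what produced the results above. `state` can be any object that `copy.deepcopy` can copy, for example the dictionary returned by `model.state_dict()`.

```python
import copy


class BestCheckpoint:
    """Keep the best-scoring state and signal when to stop (hypothetical helper)."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best_score = float("-inf")
        self.best_state = None
        self.epochs_without_improvement = 0

    def update(self, score, state):
        """Record one epoch; return True when training should stop."""
        if score > self.best_score:
            self.best_score = score
            self.best_state = copy.deepcopy(state)  # snapshot, not a reference
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


tracker = BestCheckpoint(patience=2)
scores = [0.20, 0.28, 0.26, 0.24, 0.30]  # made-up validation accuracies
for epoch, acc in enumerate(scores, start=1):
    state = {"epoch": epoch}  # placeholder for model.state_dict()
    if tracker.update(acc, state):
        print("stopping after epoch", epoch)
        break

print(tracker.best_score, tracker.best_state)
```

In this toy run training stops after epoch 4 and the kept state is the one from epoch 2. The later 0.30 is never reached, which is the usual trade-off of a small `patience`.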
