# Homework 2, *part 2* (60 points)

In this assignment you will build a convolutional neural net (CNN) to solve Tiny ImageNet image classification. Try to achieve as high accuracy as possible.

## Deliverables

* This file,
* a "checkpoint file" from `torch.save(model.state_dict(), ...)` that contains the model's weights (which a TA should be able to load to verify your accuracy).
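
The checkpoint deliverable amounts to a save/load round trip of a `state_dict`. Below is a minimal sketch of that round trip, assuming a tiny stand-in network (`build_model` is a hypothetical helper, not part of the assignment) and an in-memory buffer in place of a real file such as `model.pt`.

```python
import io

import torch
import torch.nn as nn


def build_model():
    # Hypothetical stand-in for the submitted architecture
    return nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))


model = build_model()

# Save only the weights, as the assignment asks
buffer = io.BytesIO()  # in-memory stand-in for a file on disk
torch.save(model.state_dict(), buffer)

# What a grader would do: rebuild the same architecture, then load the weights on CPU
buffer.seek(0)
state_dict = torch.load(buffer, map_location="cpu")
restored = build_model()
restored.load_state_dict(state_dict)

# Both copies should now produce identical outputs in eval mode
model.eval()
restored.eval()
x = torch.randn(3, 8)
same_output = torch.allclose(model(x), restored(x))
```

Since the checkpoint holds only tensors, whoever loads it also needs the code that defines the architecture.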

## Grading

* 9 points for reproducible training code and a completed report below.
* 12 points for building a network that gets above 20% accuracy.
* 6.5 points for beating each of these milestones on the private **test** set:
  * 25.0%
  * 30.0%
  * 32.5%
  * 35.0%
  * 37.5%
  * 40.0%
  
*Private test set* means that you won't be able to evaluate your model on it. Rather, after you submit code and checkpoint, we will load your model and evaluate it on that test set ourselves (so please make sure it's easy for TAs to do!), reporting your accuracy in a comment to the grade.
    
## Restrictions

* Don't use pretrained networks.

## Tips

* One change at a time: never test several new things at once.
* Google a lot.
* Use GPU.
* Use regularization: L2, batch normalization, dropout, data augmentation.
* Use Tensorboard ([non-Colab](https://github.com/lanpa/tensorboardX) or [Colab](https://medium.com/@tommytao_54597/use-tensorboard-in-google-colab-16b4bb9812a6)) or a similar interactive tool for viewing progress.

In [0]:
from tqdm import tqdm
from IPython.display import clear_output

In [0]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [109]:
ls

adc.json    model35.pt  __pycache__/  tiny-imagenet-200/     tiny_imagenet.py
model31.pt  model36.pt  sample_data/  tiny-imagenet-200.zip


In [0]:
from google.colab import files
files.upload()

Saving tiny_imagenet.py to tiny_imagenet.py



In [0]:
import tiny_imagenet
tiny_imagenet.download(".")

Dataset not exists or is broken, downloading it


In [0]:
ls

adc.json      sample_data/        tiny-imagenet-200.zip
__pycache__/  tiny-imagenet-200/  tiny_imagenet.py


Training and validation images are now in `tiny-imagenet-200/train` and `tiny-imagenet-200/val`.

In [0]:
import torchvision
import torch
from torchvision import transforms
import torchvision.models as models
import time

In [0]:
means = np.array((0.4914, 0.4822, 0.4465))
stds = np.array((0.2023, 0.1994, 0.2010))

transform_augment_train = transforms.Compose([
   # describe transformations here
   transforms.RandomRotation((-30,30)),
   transforms.RandomHorizontalFlip(),
   transforms.RandomCrop(50),
   transforms.ToTensor(),
   transforms.Normalize(means, stds),
])

# transform_test = transforms.Compose([
#     transforms.ToTensor(),
#     transforms.Normalize(means, stds),
# ])
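
The `means` and `stds` above appear to be the values commonly quoted for CIFAR-10 rather than statistics measured on Tiny ImageNet. A minimal sketch of how per-channel statistics could be accumulated over batches is shown below. `channel_stats` is a hypothetical helper, and random tensors stand in for real image batches.

```python
import torch


def channel_stats(batches):
    """Per-channel mean and std over an iterable of (N, C, H, W) float batches."""
    count = 0
    total = None
    total_sq = None
    for images in batches:
        images = images.double()  # accumulate in float64 to limit rounding error
        n_pixels = images.size(0) * images.size(2) * images.size(3)
        s = images.sum(dim=(0, 2, 3))
        sq = (images ** 2).sum(dim=(0, 2, 3))
        total = s if total is None else total + s
        total_sq = sq if total_sq is None else total_sq + sq
        count += n_pixels
    mean = total / count
    std = (total_sq / count - mean ** 2).sqrt()  # population std: sqrt(E[x^2] - E[x]^2)
    return mean, std


torch.manual_seed(0)
fake_images = torch.rand(40, 3, 16, 16)  # stand-in for batches of ToTensor() outputs
mean, std = channel_stats(fake_images.split(8))
```

With the real data, the batches would come from a `DataLoader` over the training folder with only `transforms.ToTensor()` applied.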

In [0]:
?transforms.RandomCrop

In [0]:
dataset = torchvision.datasets.ImageFolder('tiny-imagenet-200/train', transform_augment_train)
# test_dataset = torchvision.datasets.ImageFolder('tiny-imagenet-200/val', transform=transforms.ToTensor())

train_dataset, val_dataset = torch.utils.data.random_split(dataset, [90000, 10000])
# test_dataset, val_dataset = torch.utils.data.random_split(val_dataset, [10000, 10000])
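
Because `random_split` is applied to a dataset that already carries `transform_augment_train`, the validation subset is augmented too (random rotation, flip and crop). One possible alternative, sketched below under assumptions, is to share one index permutation between two views of the same data, each with its own transform. `ToyImages` is a hypothetical stand-in for `ImageFolder`, and the lambdas stand in for the real transform pipelines.

```python
import torch
from torch.utils.data import Dataset, Subset


class ToyImages(Dataset):
    """Hypothetical stand-in for ImageFolder: returns (transform(value), label)."""

    def __init__(self, transform=None):
        self.transform = transform

    def __len__(self):
        return 10

    def __getitem__(self, idx):
        x = float(idx)
        if self.transform is not None:
            x = self.transform(x)
        return x, idx % 2


augment = lambda x: x + 0.5  # stand-in for the random training augmentation
plain = lambda x: x          # stand-in for ToTensor + Normalize only

# One permutation of indices, shared by two views of the same underlying data
generator = torch.Generator().manual_seed(0)
perm = torch.randperm(10, generator=generator).tolist()
train_idx, val_idx = perm[:8], perm[8:]

train_view = Subset(ToyImages(transform=augment), train_idx)
val_view = Subset(ToyImages(transform=plain), val_idx)
```

The two subsets stay disjoint because they slice the same permutation, while only the training view sees the augmentation.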



In [0]:
batch_size = 150

train_batch_gen = torch.utils.data.DataLoader(train_dataset, 
                                              batch_size=batch_size,
                                              shuffle=True,
                                              num_workers=2)

val_batch_gen = torch.utils.data.DataLoader(val_dataset, 
                                              batch_size=batch_size,
                                              shuffle=True,
                                              num_workers=2)

# Building a network

In [0]:
import torch, torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

# a special module that converts [batch, channel, h, w] to [batch, units]
class Flatten(nn.Module):
    def forward(self, input):
        return input.view(input.size(0), -1)

In [0]:
def compute_loss(X_batch, y_batch):
    # the DataLoader already yields tensors, so they only need to be moved to the GPU
    X_batch = X_batch.cuda()
    y_batch = y_batch.cuda()
    logits = model(X_batch)
    # cross_entropy averages over the batch by default, so no extra .mean() is needed
    return F.cross_entropy(logits, y_batch)

In [0]:
# Inception v3 inspired

class BasicConv2d(nn.Module):

    def __init__(self, in_planes, out_planes, kernel_size, stride=1, padding=0):
        super(BasicConv2d, self).__init__()
        self.layer = nn.Sequential()
        self.layer.add_module("conv", nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=padding, bias=False))
        self.layer.add_module("bn", nn.BatchNorm2d(out_planes,
                                                   eps=0.001, # value found in tensorflow
                                                   momentum=0.1, # default pytorch value
                                                   affine=True))
        self.layer.add_module("relu", nn.ReLU(inplace=False))

    def forward(self, x):
        return self.layer(x)
      
      
      

class MyConvNet (nn.Module):
    def __init__(self):
        super(MyConvNet, self).__init__()
        self.convolutions = nn.Sequential()
        # N x 3 x 50 x 50 (input after RandomCrop(50))
        self.convolutions.add_module("conv0", BasicConv2d(3, 32, kernel_size=3, stride=2))
        # N x 32 x 24 x 24
        self.convolutions.add_module("conv1", BasicConv2d(32, 32, kernel_size=3))
        # N x 32 x 22 x 22
        self.convolutions.add_module("conv2", BasicConv2d(32, 64, kernel_size=3, padding=1))
        # N x 64 x 22 x 22
        self.convolutions.add_module("drop6", nn.Dropout())
        self.convolutions.add_module("conv3", BasicConv2d(64, 80, kernel_size=1, stride=2))
        # N x 80 x 11 x 11
        self.convolutions.add_module("conv4", BasicConv2d(80, 192, kernel_size=3))
        # N x 192 x 9 x 9

        self.convolutions.add_module("pool", nn.MaxPool2d(2))
        
        self.convolutions.add_module("conv5", BasicConv2d(192, 400, kernel_size=3, padding=1))
        self.convolutions.add_module("conv6", BasicConv2d(400, 150, kernel_size=3, padding=1))
        self.convolutions.add_module("conv7", BasicConv2d(150, 64, kernel_size=3, padding=1))
        self.convolutions.add_module("drop9", nn.Dropout())
        
        
        
        self.classifier = nn.Sequential()
        self.classifier.add_module("flatten", Flatten())
        self.classifier.add_module("lin0", nn.Linear(1024, 200))
#         self.classifier.add_module("lin0_bn", nn.BatchNorm1d(1000))
#         self.classifier.add_module("lin0_relu", nn.ReLU())
#         self.classifier.add_module("lin1", nn.Linear(1000, 200))
        
        
#         self.convolutions.add_module("conv5", InceptionA(192, pool_features=32))
#         # N x 256 x 27 x 27
#         self.convolutions.add_module("conv6", InceptionB(192))
        # N x 736 x 13 x 13
#         self.convolutions.add_module("conv7", InceptionC(736, channels_7x7=64))
#         # N x 768 x 13 x 13
#         self.convolutions.add_module("conv8", InceptionD(672))
        # N x 1280 x 6 x 6
#         self.convolutions.add_module("conv9", InceptionE(1280))
#         # N x 2048 x 6 x 6
       
        
        # adaptive_avg_pool2d
        # N x 2048 x 1 x 1
        
        
        
        
    
    def forward(self, x):
        x = self.convolutions(x)
        x = self.classifier(x)
        
#         x = nn.functional.adaptive_avg_pool2d(x, (1, 1))
#         Flatten(),
#         x = nn.functional.dropout(x, p=0.3)
#         x = x.view(x.size(0), -1)
# #         # N x 1184
        
#         fc = nn.Linear(1184, 200).cuda()
#         x = fc(x)
# #         # N x 200
        
        return x


      
model = MyConvNet()
model = model.cuda()
# model(torch.zeros(1,3,64,64).cuda()).shape
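
The input size of `nn.Linear(1024, 200)` in `MyConvNet` can be checked by hand with the usual output-size formula, floor((size + 2 * padding - kernel) / stride) + 1. The sketch below is plain Python (`out_size` and `trace` are illustrative names, not part of the notebook). It traces one spatial dimension through `conv0` ... `pool`, both for the 50x50 crops produced by `RandomCrop(50)` and for uncropped 64x64 images.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a Conv2d / MaxPool2d layer (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace(size):
    """Follow one spatial dimension through the convolutional part of MyConvNet."""
    size = out_size(size, kernel=3, stride=2)   # conv0
    size = out_size(size, kernel=3)             # conv1
    size = out_size(size, kernel=3, padding=1)  # conv2
    size = out_size(size, kernel=1, stride=2)   # conv3
    size = out_size(size, kernel=3)             # conv4
    size = out_size(size, kernel=2, stride=2)   # pool
    # conv5..conv7 use kernel 3 with padding 1, so they keep the size unchanged
    return size


side_50 = trace(50)
side_64 = trace(64)
features_50 = 64 * side_50 * side_50  # conv7 has 64 output channels
features_64 = 64 * side_64 * side_64
```

Under these assumptions 50x50 inputs give 64 * 4 * 4 = 1024 features, which matches `lin0`, while 64x64 inputs would give 2304 and would need a different `nn.Linear` input size.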





In [0]:
model = nn.Sequential(
  nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1),
  nn.BatchNorm2d(32),
  nn.MaxPool2d(2),
  nn.ReLU(),
  nn.Dropout(p=0.3),  

  nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1),
  nn.BatchNorm2d(64),
  nn.MaxPool2d(2),
  nn.ReLU(),

  nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1),
  nn.BatchNorm2d(128),
  #nn.MaxPool2d(2),
  nn.ReLU(),
  nn.Dropout(p=0.3),  
 
  nn.Conv2d(in_channels=128, out_channels=250, kernel_size=3, padding=(0, 4)),
  nn.BatchNorm2d(250),
  nn.ReLU(),

    
  nn.Conv2d(in_channels=250, out_channels=250, kernel_size=3, padding=(4, 0)),
  nn.BatchNorm2d(250),
  nn.ReLU(),
    
#   nn.Conv2d(in_channels=250, out_channels=400, kernel_size=3, padding=1),
#   nn.BatchNorm2d(400),
#   nn.ReLU(),
    
#   nn.Conv2d(in_channels=400, out_channels=400, kernel_size=3, padding=1),
#   nn.BatchNorm2d(400),
#   nn.ReLU(),
#   nn.Dropout(p=0.3), 

#   nn.Conv2d(in_channels=400, out_channels=250, kernel_size=3, padding=1),
#   nn.BatchNorm2d(250),
#   nn.ReLU(),

    
  nn.Conv2d(in_channels=250, out_channels=128, kernel_size=3, padding=1),
  nn.BatchNorm2d(128),
  nn.ReLU(), 
    
  nn.Conv2d(in_channels=128, out_channels=64, kernel_size=3),
  nn.BatchNorm2d(64),
  nn.ReLU(),
  nn.Dropout(p=0.3),  

  Flatten(),
  nn.Linear(12544, 1024),
  nn.BatchNorm1d(1024),
  nn.ReLU(),
  nn.Dropout(p=0.3),
  nn.Linear(1024, 200)
)


model = model.cuda()
# model(torch.zeros(1,3,64,64)).shape

In [0]:
model = models.resnet18(pretrained=False)
model.fc = nn.Linear(512, 200)
model = model.cuda()
# print(model)

In [0]:
# opt = torch.optim.SGD(model.parameters(), lr=0.01)
opt = torch.optim.Adam(model.parameters(), lr=0.001)


# opt = torch.optim.RMSprop(model.parameters())


train_loss = []
val_accuracy = []

In [0]:
opt = torch.optim.Adam(model.parameters(), lr=0.000001)

In [165]:
num_epochs = 1 # total amount of full passes over training data

for epoch in range(num_epochs):
    start_time = time.time()
    model.train(True) # enable dropout / batch_norm training behavior
    for (X_batch, y_batch) in train_batch_gen:
        # train on batch
        loss = compute_loss(X_batch, y_batch)
        loss.backward()
        opt.step()
        opt.zero_grad()
        train_loss.append(loss.cpu().data.numpy())

    model.train(False) # disable dropout / use averages for batch_norm
    for X_batch, y_batch in val_batch_gen:
        logits = model(Variable(torch.FloatTensor(X_batch)).cuda())
        y_pred = logits.max(1)[1].data
        val_accuracy.append(np.mean( (y_batch.cpu() == y_pred.cpu()).numpy() ))


    # Then we print the results for this epoch:
    print("Epoch {} of {} took {:.3f}s".format(
        epoch + 1, num_epochs, time.time() - start_time))
    print("  training loss (in-iteration): \t{:.6f}".format(
        np.mean(train_loss[-len(train_dataset) // batch_size :])))
    print("  validation accuracy: \t\t\t{:.2f} %".format(
        np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100))
    
    
#     clear_output(True)
#     plt.figure(figsize=(8, 6))
#     plt.subplot(1, 2, 1)
#     plt.plot(train_loss)
#     plt.title('train_loss')
#     plt.subplot(1, 2, 2)
#     plt.plot(val_accuracy)
#     plt.title('val_acc')
#     plt.grid()
#     plt.show()

Epoch 1 of 1 took 56.164s
  training loss (in-iteration): 	1.120944
  validation accuracy: 			40.57 %
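
The validation loop above averages per-batch accuracies, which gives the smaller last batch (100 of the 10000 validation images at batch size 150) slightly more weight per sample, and it builds autograd graphs that are never used. A minimal sketch of a sample-weighted evaluation under `torch.no_grad()` follows. `evaluate` and `AlwaysZero` are hypothetical names used only for this illustration.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, device="cpu"):
    """Accuracy over all samples in `loader` (not a mean of per-batch means)."""
    was_training = model.training
    model.eval()  # disable dropout, use running statistics in batch norm
    correct, total = 0, 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for X_batch, y_batch in loader:
            logits = model(X_batch.to(device))
            y_pred = logits.argmax(dim=1).cpu()
            correct += (y_pred == y_batch).sum().item()
            total += y_batch.size(0)
    model.train(was_training)  # restore the previous mode
    return correct / total


class AlwaysZero(nn.Module):
    """Toy model that always predicts class 0."""

    def forward(self, x):
        logits = torch.zeros(x.size(0), 3)
        logits[:, 0] = 1.0
        return logits


labels = torch.tensor([0, 0, 0, 1, 2, 0, 0])
toy_loader = DataLoader(TensorDataset(torch.randn(7, 4), labels), batch_size=3)
accuracy = evaluate(AlwaysZero(), toy_loader)
```

In the toy case 5 of 7 labels are class 0, so the sample-weighted accuracy is 5/7, whereas the mean of the three per-batch accuracies would be higher.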


In [0]:
torch.save(model.state_dict(),  'model41.pt')

In [0]:
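# torch.load returns the saved state_dict (parameter name -> tensor), not a model object
state_dict = torch.load('model36.pt', map_location=lambda storage, loc: storage)
# model.load_state_dict(state_dict)  # uncomment to restore these weights into `model`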

In [151]:
ls

adc.json    model35.pt  model40.pt    sample_data/        tiny-imagenet-200.zip
model31.pt  model36.pt  __pycache__/  tiny-imagenet-200/  tiny_imagenet.py


In [0]:
from google.colab import files
files.download('model41.pt')

When everything is done, please compute accuracy on the validation set and report it below.

In [158]:
model.train(False) # disable dropout / use averages for batch_norm
val_batch_acc = []
for X_batch, y_batch in val_batch_gen:
    logits = model(Variable(torch.FloatTensor(X_batch)).cuda())
    y_pred = logits.max(1)[1].data
    val_batch_acc.append(np.mean( (y_batch.cpu() == y_pred.cpu()).numpy() ))


val_accuracy_final = np.mean(val_batch_acc)
    
print("Final results:")
print("  val accuracy:\t\t{:.2f} %".format(
    val_accuracy_final * 100))

# if test_accuracy * 100 > 70:
#     print("U'r freakin' amazin'!")
# elif test_accuracy * 100 > 50:
#     print("Achievement unlocked: 110lvl Warlock!")
# elif test_accuracy * 100 > 40:
#     print("Achievement unlocked: 80lvl Warlock!")
# elif test_accuracy * 100 > 30:
#     print("Achievement unlocked: 70lvl Warlock!")
# elif test_accuracy * 100 > 20:
#     print("Achievement unlocked: 60lvl Warlock!")
# else:
#     print("We need more magic! Follow instructions below")

# val_accuracy = # Your code here
# print("Validation accuracy: %.2f%%" % (val_accuracy * 100))

Final results:
  val accuracy:		40.93 %


# Report

Below, please mention

* a brief history of tweaks and improvements;
* what the final architecture is and why;
* what the training method is (batch size, optimization algorithm, ...) and why;
* any regularization and other techniques applied, and their effects.

The reference format is:

*"I have analyzed these and these articles|sources|blog posts, tried that and that to adapt them to my problem and the conclusions are such and such".*

# Regularization

The first thing I did was to take care of regularization.

I applied data augmentation with:
transform_augment_train = transforms.Compose([
   transforms.RandomRotation((-30,30)),
   transforms.RandomHorizontalFlip(),
   transforms.RandomCrop(50),
   transforms.ToTensor(),
   transforms.Normalize(means, stds),
])

Before designing the architecture, I planned to insert several dropout layers in the convolutional part and one before every dense layer. I also planned to insert batch normalization after every convolution and after every dense layer (except the last one).

# First simple self-developed convnet

From the beginning I built the typical architecture: conv -> batchnorm -> nonlinearity.
The number of pooling layers has to provide a proper receptive field.
The most complicated part here is choosing the number of channels, the number of layers, and the other parameters.
I decided to start with the simplest architecture. The batch size was set to 500 (fairly large, for better regularization).
For the Adam optimizer the LR was set to 0.01, which is fairly high.
_________________________________________________________________________
Startup architecture:

Very simple, just to make sure it works at all. It consists of four convolutions, the first three each followed by max pooling:

model = nn.Sequential(
  nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3),
  nn.BatchNorm2d(32),
  nn.MaxPool2d(2),
  nn.ReLU(),

  nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3),
  nn.BatchNorm2d(64),
  nn.MaxPool2d(2),
  nn.ReLU(),
  
  nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3),
  nn.BatchNorm2d(128),
  nn.MaxPool2d(2),
  nn.ReLU(),
  
  nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3),
  nn.BatchNorm2d(256),
  nn.ReLU(),

  Flatten(),
  nn.Dropout(0.2),
  nn.Linear(256, 1024),
  nn.ReLU(),
  nn.Dropout(0.2),
  nn.Linear(1024, 200)
)

The receptive field covers almost the whole image.

Epoch 52 of 100 took 42.022s
  training loss (in-iteration): 	2.922985
  validation accuracy: 			27.90 %
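
The receptive-field claim for the startup architecture can be checked with the standard recurrence: each layer adds (kernel - 1) times the current stride product ("jump") to the receptive field. The sketch below is plain Python and `receptive_field` is an illustrative helper name. Padding does not affect the result, so only kernel sizes and strides are listed.

```python
def receptive_field(layers):
    """layers: (kernel, stride) pairs from input to output. Returns the RF side in pixels."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field by (kernel - 1) input steps
        jump *= stride             # distance in input pixels between neighbouring outputs
    return rf


conv, pool = (3, 1), (2, 2)  # Conv2d(kernel_size=3) and MaxPool2d(2)
startup_layers = [conv, pool, conv, pool, conv, pool, conv]
rf = receptive_field(startup_layers)
```

For this stack the last convolution sees a 38 x 38 pixel window, compared with 64 x 64 source images (or 50 x 50 crops).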

________________________________________________________________________

# More advanced self-developed convnet

The next step was the obvious one: to make the net larger.

More advanced network:

model = nn.Sequential(
  nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1),
  nn.BatchNorm2d(32),
  nn.MaxPool2d(2),
  nn.ReLU(),
  nn.Dropout(p=0.3),  

  nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1),
  nn.BatchNorm2d(64),
  nn.MaxPool2d(2),
  nn.ReLU(),

  nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1),
  nn.BatchNorm2d(128),
  #nn.MaxPool2d(2),
  nn.ReLU(),
  nn.Dropout(p=0.3),  
 
  nn.Conv2d(in_channels=128, out_channels=250, kernel_size=3, padding=1),
  nn.BatchNorm2d(250),
  nn.ReLU(),
    
  nn.Conv2d(in_channels=250, out_channels=250, kernel_size=3, padding=1),
  nn.BatchNorm2d(250),
  nn.ReLU(),
  nn.Dropout(p=0.3),  

  nn.Conv2d(in_channels=250, out_channels=128, kernel_size=3, padding=1),
  nn.BatchNorm2d(128),
  nn.ReLU(), 
    
  nn.Conv2d(in_channels=128, out_channels=64, kernel_size=3),
  nn.BatchNorm2d(64),
  nn.ReLU(),
  nn.Dropout(p=0.3),  

  Flatten(),
  nn.Linear(6400, 512),
  nn.BatchNorm1d(512),
  nn.ReLU(),
  nn.Dropout(p=0.3),
  nn.Linear(512, 200)
)


Epoch 58 of 100 took 50.212s
  training loss (in-iteration): 	2.273430
  validation accuracy: 			31.19 %
  
  
Decreasing the batch size to 150 and changing the LR to the Adam default (0.001) increased the validation accuracy to 35%:
  
  Epoch 41 of 100 took 51.407s
  training loss (in-iteration): 	2.322155
  validation accuracy: 			35.41 %
  
  
________________________________________________________________________
# Adaptation of Inception v3

I tried to build a more advanced network out of Inception v3 blocks:

class BasicConv2d(nn.Module):

    def __init__(self, in_planes, out_planes, kernel_size, stride=1, padding=0):
        super(BasicConv2d, self).__init__()
        self.layer = nn.Sequential()
        self.layer.add_module("conv", nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=padding, bias=False))
        self.layer.add_module("bn", nn.BatchNorm2d(out_planes,
                                                   eps=0.001, # value found in tensorflow
                                                   momentum=0.1, # default pytorch value
                                                   affine=True))
        self.layer.add_module("relu", nn.ReLU(inplace=False))

    def forward(self, x):
        return self.layer(x)
      
class MyConvNet (nn.Module):
    def __init__(self):
        super(MyConvNet, self).__init__()
        self.convolutions = nn.Sequential()
        # N x 3 x 64 x 64
        self.convolutions.add_module("conv0", BasicConv2d(3, 32, kernel_size=3, stride=2))
        # N x 32 x 31 x 31
        self.convolutions.add_module("conv1", BasicConv2d(32, 32, kernel_size=3))
        # N x 32 x 29 x 29
        self.convolutions.add_module("conv2", BasicConv2d(32, 64, kernel_size=3, padding=1))
        # N x 64 x 29 x 29
        self.convolutions.add_module("drop6", nn.Dropout())
        self.convolutions.add_module("conv3", BasicConv2d(64, 80, kernel_size=1, stride=2))
        # N x 80 x 29 x 29
        self.convolutions.add_module("conv4", BasicConv2d(80, 192, kernel_size=3))
        # N x 192 x 27 x 27

        self.convolutions.add_module("conv5", InceptionA(192, pool_features=32))
        # N x 256 x 27 x 27
        self.convolutions.add_module("conv6", InceptionB(192))
        # N x 736 x 13 x 13
        self.convolutions.add_module("conv7", InceptionC(736, channels_7x7=64))
        # N x 768 x 13 x 13
        self.convolutions.add_module("conv8", InceptionD(672))
        # N x 1280 x 6 x 6
        self.convolutions.add_module("conv9", InceptionE(1280))
        # N x 2048 x 6 x 6
       
        
        # adaptive_avg_pool2d
        # N x 2048 x 1 x 1
        
        
        
        
    
    def forward(self, x):
        x = self.convolutions(x)
       
        x = nn.functional.adaptive_avg_pool2d(x, (1, 1))

        x = nn.functional.dropout(x, p=0.3)
        x = x.view(x.size(0), -1)
        # N x 1184

        fc = nn.Linear(1184, 200).cuda()
        x = fc(x)
        # N x 200
        
        return x


But I did not succeed with that. It was too hard for me to implement without mistakes, and training progressed too slowly.
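
One pitfall visible in the `forward` above is that `nn.Linear` is constructed inside it: such a layer gets fresh random weights on every call and is never registered with the module, so an optimizer built from `model.parameters()` cannot update it. Whether this was the main cause of the slow training is an assumption, but the effect itself is easy to demonstrate. The two classes below are hypothetical minimal examples, not the network from the report.

```python
import torch
import torch.nn as nn


class HeadInForward(nn.Module):
    def forward(self, x):
        fc = nn.Linear(4, 2)  # new random weights on every call, never registered
        return fc(x)


class HeadInInit(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)  # registered once, visible to the optimizer

    def forward(self, x):
        return self.fc(x)


n_params_forward = len(list(HeadInForward().parameters()))
n_params_init = len(list(HeadInInit().parameters()))

# Two calls on the same input disagree when the head is rebuilt each time
x = torch.ones(1, 4)
rebuilt = HeadInForward()
outputs_differ = not torch.equal(rebuilt(x), rebuilt(x))
```

The first module exposes no parameters at all, while the second exposes the weight and bias of its linear layer.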

________________________________________________________________________

# Usage of ResNet

After some struggles with the self-built network, I decided that it should be much easier to use a well-developed architecture. ResNet is a good candidate for reuse. Using a network without pretrained parameters is not prohibited.

model = models.resnet18()
model.fc = nn.Linear(512, 200) # to modify the last layer for our number of classes.

ResNet out of the box gave me 35-36% validation accuracy with the default parameters of the Adam optimizer.
To get a better result, after reaching 35% validation accuracy I reduced the learning rate to 0.0001 and then to 0.00001 (taking smaller steps to descend further along the gradient), which trained the net up to 40.93%, my best score.
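
The manual learning-rate reductions described here could also be expressed with a scheduler, which avoids re-creating the optimizer (re-creating Adam resets its running moment estimates). The sketch below swaps in `MultiStepLR` with made-up milestones. It is not the procedure actually used for the reported score, and the linear layer is a hypothetical stand-in for the real network.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # hypothetical stand-in for the real network
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# Assumed schedule: multiply the LR by 0.1 after epochs 2 and 4
scheduler = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[2, 4], gamma=0.1)

lrs = []
for epoch in range(6):
    lrs.append(opt.param_groups[0]["lr"])  # LR in effect during this epoch
    # One dummy optimization step stands in for a full epoch of training
    loss = model(torch.randn(8, 4)).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    scheduler.step()  # called once per epoch, after the optimizer steps
```

With these milestones the learning rate goes 0.001, 0.001, 0.0001, 0.0001, 0.00001, 0.00001 over the six epochs, mirroring the 0.001 to 0.0001 to 0.00001 sequence above.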

L2 regularization is reported to be ineffective with Adam (source: https://medium.com/vitalify-asia/whats-up-with-deep-learning-optimizers-since-adam-5c1d862b9db0), so I didn't use it here.
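
One common reading of this claim is that Adam rescales the L2 penalty gradient together with the data gradient, so the penalty no longer acts as plain weight shrinkage. `torch.optim.AdamW` applies the decay directly to the weights instead. The sketch below is only an illustration of that difference on a single weight with a zero data gradient. AdamW was not used in this work, and `one_step` is a hypothetical helper.

```python
import torch


def one_step(optimizer_cls):
    """Take one optimizer step on a single weight with zero data gradient."""
    w = torch.nn.Parameter(torch.tensor([1.0]))
    opt = optimizer_cls([w], lr=0.1, weight_decay=0.1)
    w.grad = torch.zeros_like(w)  # zero data gradient isolates the decay term
    opt.step()
    return w.item()


w_adam = one_step(torch.optim.Adam)    # decay is added to the gradient, then rescaled by Adam
w_adamw = one_step(torch.optim.AdamW)  # decay shrinks the weight by lr * weight_decay
```

Here AdamW shrinks the weight from 1.0 to about 0.99 (a factor of 1 - lr * weight_decay), while Adam with `weight_decay` moves it by roughly a full learning-rate step to about 0.9, regardless of how small the penalty gradient is.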




