# Homework 2.2: The Quest For A Better Network

In this assignment you will build a monster network to solve Tiny ImageNet image classification.

This notebook is a sequel to seminar 3; please work through the seminar first if you haven't done so yet.

(please read it at least diagonally)

* The ultimate quest is to create a network with as high an __accuracy__ as you can get.
* There is a __mini-report__ at the end that you will have to fill in. We recommend reading it first and filling it in as you iterate.
 
## Grading
* starting at zero points
* +20% for describing your iteration path in a report below.
* +20% for building a network that gets above 20% accuracy
* +10% for beating each of these milestones on __TEST__ dataset:
    * 25% (50% points)
    * 30% (60% points)
    * 32.5% (70% points)
    * 35% (80% points)
    * 37.5% (90% points)
    * 40% (full points)
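
The grading arithmetic above can be sketched as a small function. This is only an illustration: `grade_percent` and `MILESTONES` are hypothetical names, and the sketch assumes that "beating" a milestone means being strictly above it.

```python
# Hypothetical helper: maps a test accuracy (in percent) to the grade described above.
MILESTONES = (25.0, 30.0, 32.5, 35.0, 37.5, 40.0)  # each beaten milestone adds +10%


def grade_percent(test_accuracy, has_report):
    """Return the grade in percent, assuming 'beating' means strictly above."""
    points = 0
    if has_report:
        points += 20  # describing the iteration path in the report
    if test_accuracy > 20.0:
        points += 20  # network that gets above 20% accuracy
    points += 10 * sum(test_accuracy > m for m in MILESTONES)
    return points


print(grade_percent(41.71, has_report=True))  # 20 + 20 + 6 * 10 = 100
```

With a report, a network just above 25% lands at 50% of the points, which matches the numbers in parentheses in the list.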
    
## Restrictions
* Please do NOT use pre-trained networks for this assignment until you reach 40%.
 * In other words, base milestones must be beaten without pre-trained nets (and that network must be included in the anytask attachments). After that, you can use whatever you want.
* You __can't__ do anything with validation data apart from running the evaluation procedure. Please split the train images into train and validation parts.

## Tips on what can be done:


 * __Network size__
   * MOAR neurons, 
   * MOAR layers, ([torch.nn docs](http://pytorch.org/docs/master/nn.html))

   * Nonlinearities in the hidden layers
     * tanh, relu, leaky relu, etc
   * Larger networks may take more epochs to train, so don't discard your net just because it didn't beat the baseline in 5 epochs.

   * Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn!


### The main rule of prototyping: one change at a time
   * By now you probably have several ideas on what to change. By all means, try them out! But there's a catch: __never test several new things at once__.


### Optimization
   * Training for 100 epochs regardless of anything is probably a bad idea.
   * Some networks converge over 5 epochs, others - over 500.
   * Way to go: stop when validation score is 10 iterations past maximum
   * You should certainly use adaptive optimizers
     * rmsprop, nesterov_momentum, adam, adagrad and so on.
     * Converge faster and sometimes reach better optima
     * It might make sense to tweak learning rate/momentum, other learning parameters, batch size and number of epochs
   * __BatchNormalization__ (nn.BatchNorm2d) for the win!
     * Sometimes more batch normalization is better.
   * __Regularize__ to prevent overfitting
     * Add an L2 weight penalty to the loss function; PyTorch will do the rest
       * Can be done manually or like [this](https://discuss.pytorch.org/t/simple-l2-regularization/139/2).
     * Dropout (`nn.Dropout`) - to prevent overfitting
       * Don't overdo it. Check if it actually makes your network better
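
The "stop when validation score is 10 iterations past maximum" rule can be sketched in plain Python. This is a minimal sketch, not a prescribed implementation: the `EarlyStopping` class, its `patience` parameter and the scores below are made up for illustration.

```python
class EarlyStopping:
    """Signals a stop once the validation score has not improved for `patience` checks."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_score = float("-inf")
        self.checks_since_best = 0

    def step(self, val_score):
        """Record a new validation score; return True when training should stop."""
        if val_score > self.best_score:
            self.best_score = val_score
            self.checks_since_best = 0  # a natural place to save a checkpoint
        else:
            self.checks_since_best += 1
        return self.checks_since_best >= self.patience


# Usage with made-up validation scores and a small patience
stopper = EarlyStopping(patience=3)
history = [0.21, 0.25, 0.29, 0.28, 0.29, 0.27, 0.26]
stopped_at = None
for epoch, score in enumerate(history):
    if stopper.step(score):
        stopped_at = epoch
        break
```

In a real training loop, `step` would be called once per epoch with the validation accuracy, and the best checkpoint would be restored after stopping.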
   
### Convolution architectures
   * This task __can__ be solved by a sequence of convolutions and poolings with batch_norm and ReLU seasoning, but you shouldn't necessarily stop there.
   * [Inception family](https://hacktilldawn.com/2016/09/25/inception-modules-explained-and-implemented/), [ResNet family](https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035?gi=9018057983ca), [Densely-connected convolutions (exotic)](https://arxiv.org/abs/1608.06993), [Capsule networks (exotic)](https://arxiv.org/abs/1710.09829)
   * Please do try a few simple architectures before you go for resnet-152.
   * Warning! Training convolutional networks can take long without GPU. That's okay.
     * If you are CPU-only, we still recommend that you try a simple convolutional architecture
     * A perfect option is to set it up to run overnight and check on it in the morning.
     * Make reasonable layer size estimates. A 128-neuron first convolution is likely an overkill.
     * __To reduce computation__ time by a constant factor in exchange for some accuracy drop, try using the __stride__ parameter. A stride=2 convolution should take roughly 1/4 of the time of the default (stride=1) one, since it produces 4x fewer output positions.
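
The 1/4 estimate for stride=2 follows from the standard convolution output-size formula. The sketch below counts multiply-accumulate operations for a single layer; the helper names are hypothetical, and real wall-clock time also depends on hardware and library details.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1


def conv_macs(size, in_channels, out_channels, kernel_size, stride=1, padding=0):
    """Rough multiply-accumulate count for a square input and square kernel."""
    out = conv_output_size(size, kernel_size, stride, padding)
    return out * out * in_channels * out_channels * kernel_size * kernel_size


# First layer on a 64x64 RGB image with a 3x3 kernel and padding=1
full = conv_macs(64, 3, 64, kernel_size=3, stride=1, padding=1)
strided = conv_macs(64, 3, 64, kernel_size=3, stride=2, padding=1)
print(strided / full)  # 0.25
```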
 
   
### Data augmentation
   * Getting a 5x larger dataset for free is a great deal
     * Zoom-in+slice = move
     * Rotate+zoom (to remove black stripes)
     * Add noise (gaussian or bernoulli)
   * Simple way to do that (if you have PIL/Image): 
     * ```from scipy.misc import imrotate,imresize``` (note: these helpers have been removed from recent SciPy releases)
     * and a bit of slicing
     * Other cool libraries: cv2, skimage, PIL/Pillow
   * A more advanced way is to use torchvision transforms:
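    ```
    import torch
    import torchvision
    from torchvision import transforms

    # Tiny ImageNet images are 64x64, so the crop size is 64 here.
    # The normalization statistics below are the commonly used CIFAR-10 ones.
    transform_train = transforms.Compose([
        transforms.RandomCrop(64, padding=4),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])
    # ImageFolder has no train/download arguments; root should point at the train folder
    trainset = torchvision.datasets.ImageFolder(root=path_to_tiny_imagenet, transform=transform_train)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True, num_workers=2)
    ```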
   * Or use this tool from Keras (requires theano/tensorflow): [tutorial](https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html), [docs](https://keras.io/preprocessing/image/)
   * Stay realistic. There's usually no point in flipping dogs upside down as that is not the way you usually see them.
   


In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline


In [3]:
# from tiny_img import download_tinyImg200
# data_path = '.'
# download_tinyImg200(data_path)


./tiny-imagenet-200.zip


In [2]:
import torchvision
from torchvision import transforms
import torch

means = np.array([0.485, 0.456, 0.406])
stds = np.array([0.229, 0.224, 0.225])

transforms = {
    'train': transforms.Compose([
        transforms.RandomRotation((-30,30)),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(means, stds)
    ]),
    'val': transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(means, stds)
    ]),
    'test': transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(means, stds)
    ])
}


In [3]:
dataset = torchvision.datasets.ImageFolder('tiny-imagenet-200/train', transform=transforms['train'])
train_dataset, val_dataset = torch.utils.data.random_split(dataset, [80000, 20000])
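
Note that splitting a dataset that already carries the train transform means the validation part is augmented too. One possible way around it is sketched below with a small wrapper that applies its own transform per split. `TransformedSubset` and `split_indices` are hypothetical helpers, and the toy list stands in for an `ImageFolder` created without a transform; the assumption is that `DataLoader` only needs `__len__` and `__getitem__` from a map-style dataset.

```python
import random


class TransformedSubset:
    """Map-style dataset: selected indices of `base`, with its own transform."""

    def __init__(self, base, indices, transform=None):
        self.base = base
        self.indices = list(indices)
        self.transform = transform

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        image, label = self.base[self.indices[i]]
        if self.transform is not None:
            image = self.transform(image)
        return image, label


def split_indices(n_items, n_train, seed=0):
    """Shuffle indices reproducibly and cut them into train and validation parts."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]


# Toy stand-in for an ImageFolder created WITHOUT a transform
base = [(x, x % 2) for x in range(10)]
train_idx, val_idx = split_indices(len(base), n_train=8)
train_part = TransformedSubset(base, train_idx, transform=lambda x: x * 10)  # "augmented"
val_part = TransformedSubset(base, val_idx)  # left untouched
```

In this notebook the two wrappers would receive `transforms['train']` and `transforms['val']` respectively, with 80000 train indices.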


In [5]:
batch_size = 128
train_batch_gen = torch.utils.data.DataLoader(train_dataset, 
                                              batch_size=batch_size,
                                              shuffle=True,
                                              num_workers=4)
val_batch_gen = torch.utils.data.DataLoader(val_dataset, 
                                            batch_size=batch_size,
                                            shuffle=True,
                                            num_workers=2)


In [31]:
test_dataset = torchvision.datasets.ImageFolder('tiny-imagenet-200/new_val/', transform=transforms['val'])
test_batch_gen = torch.utils.data.DataLoader(test_dataset, 
                                             batch_size=batch_size,
                                             shuffle=False,
                                             num_workers=2)

In [6]:
import torch, torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

# a special module that converts [batch, channel, h, w] to [batch, units]
class Flatten(nn.Module):
    def forward(self, input):
        return input.view(input.size(0), -1)

    

In [44]:
import torch 
torch.initial_seed()

15477778889165552190

In [7]:
cfg = [64, 'M', 128, 'M', 256, 256, 'M'] 

in_channels = 3
layers = []
for v in cfg:
    if v == 'M':
        layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
    else:
        conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
        layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
        in_channels = v
layers += [
    Flatten(),
    nn.Linear(16384, 1024),
    nn.ReLU(True),
    nn.Dropout(),
    nn.Linear(1024, 512),
    nn.ReLU(True),
    nn.Dropout(),
    nn.Linear(512, 200),
]
model = nn.Sequential(*layers)
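
The `16384` input size of the first linear layer can be checked by hand: every `'M'` halves the spatial size, while the 3x3 convolutions with `padding=1` keep it unchanged. A minimal sketch, assuming 64x64 inputs (`flattened_features` is a hypothetical helper):

```python
def flattened_features(cfg, input_size=64):
    """Number of features after Flatten for a VGG-style config.

    Assumes size-preserving 3x3 convs (padding=1) and 2x2 max pooling with stride 2.
    """
    size, channels = input_size, None
    for v in cfg:
        if v == 'M':
            size //= 2    # max pooling halves height and width
        else:
            channels = v  # a conv layer sets the number of channels
    return channels * size * size


cfg = [64, 'M', 128, 'M', 256, 256, 'M']
print(flattened_features(cfg))  # 256 * 8 * 8 = 16384
```

Changing `cfg` (for example adding another `'M'`) changes this number, so the first `nn.Linear` has to be updated accordingly.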

In [22]:
opt = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.0001)
# opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

train_loss = []
val_accuracy = []

In [30]:
import time
num_epochs = 1000 # total amount of full passes over training data


for epoch in range(num_epochs):
    # In each epoch, we do a full pass over the training data:
    start_time = time.time()
    model.train(True) # enable dropout / batch_norm training behavior
    for (X_batch, y_batch) in train_batch_gen:
        # train on batch
        X_batch = Variable(torch.FloatTensor(X_batch)).cuda()
        y_batch = Variable(torch.LongTensor(y_batch)).cuda()
        logits = model.cuda()(X_batch)
        loss = loss_fn(logits, y_batch)
        
        loss.backward()
        opt.step()
        opt.zero_grad()

        train_loss.append(loss.data.cpu().numpy())
    
    model.train(False) # disable dropout / use averages for batch_norm
    for X_batch, y_batch in val_batch_gen:
        logits = model(Variable(torch.FloatTensor(X_batch)).cuda())
        y_pred = logits.max(1)[1].data
        val_accuracy.append(np.mean( (y_batch.cpu() == y_pred.cpu()).numpy() ))

    # Then we print the results for this epoch:
    print("Epoch {} of {} took {:.3f}s".format(
        epoch + 1, num_epochs, time.time() - start_time))
    print("  training loss (in-iteration): \t{:.6f}".format(
        np.mean(train_loss[-len(train_dataset) // batch_size :])))
    print("  validation accuracy: \t\t\t{:.2f} %".format(
        np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100))
        

Epoch 1 of 1000 took 45.841s
  training loss (in-iteration): 	3.700233
  validation accuracy: 			21.98 %
Epoch 2 of 1000 took 46.259s
  training loss (in-iteration): 	3.612604
  validation accuracy: 			23.06 %
Epoch 3 of 1000 took 47.330s
  training loss (in-iteration): 	3.539748
  validation accuracy: 			21.51 %
Epoch 4 of 1000 took 47.739s
  training loss (in-iteration): 	3.467154
  validation accuracy: 			23.07 %
Epoch 5 of 1000 took 47.959s
  training loss (in-iteration): 	3.395841
  validation accuracy: 			26.11 %
Epoch 6 of 1000 took 47.896s
  training loss (in-iteration): 	3.336965
  validation accuracy: 			27.47 %
Epoch 7 of 1000 took 47.949s
  training loss (in-iteration): 	3.264905
  validation accuracy: 			29.08 %
Epoch 8 of 1000 took 47.940s
  training loss (in-iteration): 	3.215509
  validation accuracy: 			28.55 %
Epoch 9 of 1000 took 47.901s
  training loss (in-iteration): 	3.159193
  validation accuracy: 			29.05 %
Epoch 10 of 1000 took 47.991s
  training loss (in-itera

KeyboardInterrupt: 

In [24]:
# fig, axes = plt.subplots(4,4)
# for i in range(4):
#     for j in range(4):
#         axes[i, j].imshow(X_batch[4*i+j].cpu().numpy().transpose(1,2,0))

In [26]:
y_train = np.zeros((200))

for (X_batch, y_batch) in train_batch_gen:
    y_train = y_train + np.sum(np.eye(200)[y_batch.numpy()], axis=0)
y_train

array([ 394.,  402.,  403.,  399.,  412.,  402.,  400.,  397.,  404.,
        401.,  394.,  395.,  386.,  404.,  400.,  402.,  406.,  409.,
        407.,  403.,  399.,  410.,  407.,  395.,  397.,  396.,  397.,
        405.,  400.,  411.,  400.,  393.,  394.,  393.,  383.,  404.,
        392.,  395.,  407.,  391.,  397.,  396.,  410.,  421.,  419.,
        394.,  397.,  403.,  417.,  393.,  401.,  389.,  405.,  393.,
        399.,  395.,  405.,  417.,  396.,  384.,  392.,  418.,  408.,
        405.,  404.,  404.,  396.,  404.,  404.,  407.,  407.,  400.,
        383.,  410.,  406.,  391.,  408.,  409.,  383.,  400.,  399.,
        399.,  402.,  417.,  410.,  398.,  417.,  392.,  405.,  394.,
        410.,  389.,  402.,  408.,  406.,  393.,  402.,  394.,  396.,
        406.,  415.,  420.,  406.,  406.,  395.,  401.,  407.,  409.,
        401.,  386.,  414.,  402.,  399.,  409.,  393.,  388.,  412.,
        385.,  412.,  392.,  396.,  404.,  381.,  411.,  409.,  390.,
        393.,  397.,

In [32]:
y_test = np.zeros((200))

for (X_batch, y_batch) in test_batch_gen:
    y_test = y_test + np.sum(np.eye(200)[y_batch.numpy()], axis=0)
y_test

array([ 50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,
        50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,
        50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,
        50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,
        50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,
        50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,
        50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,
        50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,
        50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,
        50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,
        50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,
        50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,
        50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,  50.,
        50.,  50.,  50.,  50.,  50.,  50.,  50.,  5

In [33]:
# fig, axes = plt.subplots(4,4)
# for i in range(4):
#     for j in range(4):
#         axes[i, j].imshow(X_batch[4*i+j].cpu().numpy().transpose(1,2,0))

When everything is done, please calculate accuracy on `tiny-imagenet-200/val`

In [37]:
model.train(False) # disable dropout / use averages for batch_norm
train_acc = []

for X_batch, y_batch in train_batch_gen:
    logits = model(Variable(torch.FloatTensor(X_batch)).cuda())
    y_pred = logits.max(1)[1].data
    train_acc += list((y_batch.cpu() == y_pred.cpu()).numpy())

train_accuracy = np.mean(train_acc)


In [38]:
train_accuracy

0.53382499999999999

In [34]:
model.train(False) # disable dropout / use averages for batch_norm
test_acc = []

for X_batch, y_batch in test_batch_gen:
    logits = model(Variable(torch.FloatTensor(X_batch)).cuda())
    y_pred = logits.max(1)[1].data
    test_acc += list((y_batch.cpu() == y_pred.cpu()).numpy())




In [35]:
test_accuracy = np.mean(test_acc)

In [36]:
print("Final results:")
print("  test accuracy:\t\t{:.2f} %".format(
    test_accuracy * 100))

if test_accuracy * 100 > 70:
    print("U'r freakin' amazin'!")
elif test_accuracy * 100 > 50:
    print("Achievement unlocked: 110lvl Warlock!")
elif test_accuracy * 100 > 40:
    print("Achievement unlocked: 80lvl Warlock!")
elif test_accuracy * 100 > 30:
    print("Achievement unlocked: 70lvl Warlock!")
elif test_accuracy * 100 > 20:
    print("Achievement unlocked: 60lvl Warlock!")
else:
    print("We need more magic! Follow instructions below")

Final results:
  test accuracy:		41.71 %
Achievement unlocked: 80lvl Warlock!


In [39]:
! mkdir model_checkpoints

In [41]:
! ls

README.md		 homework_part1.ipynb  tiny-imagenet-200
Untitled.ipynb		 homework_part2.ipynb  tiny-imagenet-200.zip
__pycache__		 model_checkpoints     tiny_img.py
homework_advanced.ipynb  notmnist.py	       tiny_img.pyc


In [43]:
torch.save(model.state_dict(), 'model_state_dict_41.71.pcl')


# Report

My name is Irina Rudenko. At work I build neural-network segmentation applications in Keras, and this is my first experience with PyTorch

Here is my story:
* I wrote a standard block architecture, (conv+batchnorm+relu) plus occasional pooling; after flatten I added a couple of hidden layers with dropout, also with relu activations, and a softmax as the last layer
* Trouble! ~30% on validation, but ~0.5% on test
* I worked through the seminar and realized that softmax is redundant with this loss function, so I removed it and tried again.
* I decided that sgd was too dumb and switched to Adam; it didn't help: 0.47 %
* I gave the test set a suspicious look, then looked at how the test set is built in the seminar (there both test and validation are slices of train)
* I checked the class distribution on train and test: the whole test set was a single class, although the images did not bear that out, and the simple network from the seminar also gave near-zero quality on the correct test set
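
A quick sanity check of the kind described in the last bullet can be sketched with `collections.Counter`. The label lists below are made up; with torchvision's `ImageFolder` the labels should be available as `dataset.targets`, which is an assumption about the installed version.

```python
from collections import Counter


def summarize_labels(labels):
    """Return how many classes occur and the smallest/largest class size."""
    counts = Counter(labels)
    return {
        "num_classes": len(counts),
        "min_count": min(counts.values()),
        "max_count": max(counts.values()),
    }


# Made-up labels: a broken test set (single class) versus a balanced one
broken = [0] * 100
healthy = [i % 4 for i in range(100)]
print(summarize_labels(broken))
print(summarize_labels(healthy))
```

A `num_classes` of 1 on a 200-class problem is a strong hint that the folder layout, not the network, is at fault.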


In [None]:
import shutil
import os

def ensure_dir(dir):
    if not os.path.exists(dir):
        os.makedirs(dir)
        
with open('tiny-imagenet-200/val/val_annotations.txt') as f:
    for s in f.readlines():
        tokens = s.split()
        ensure_dir('tiny-imagenet-200/new_val/%s/images/'% tokens[1])
        shutil.copy2(os.path.join('tiny-imagenet-200/val/images/', tokens[0]), 
                     os.path.join('tiny-imagenet-200/new_val/%s/images/' % tokens[1], tokens[0]))

$\uparrow$ This script rearranges the test set into the right layout

After that I simply ran the original network without softmax and trained it for about half an hour on a GPU

Result:

|   | acc, %  |
|---|---|
| train  | 53.38  |
| val    | 40.24  |
| test   | 41.71  |


All in all it took 3 evenings, 2.5 of which went into hunting the bug



# Report

All creative approaches are highly welcome, but at the very least it would be great to mention
* the idea;
* brief history of tweaks and improvements;
* what is the final architecture and why?
* what is the training method and, again, why?
* Any regularizations and other techniques applied and their effects;


There is no need to write strict mathematical proofs (unless you want to).
 * "I tried this, this and this, and the second one turned out to be better. And I just didn't like the name of that one" - OK, but can be better
 * "I have analyzed these and these articles|sources|blog posts, tried that and that to adapt them to my problem and the conclusions are such and such" - the ideal one
 * "I took that code from that demo without understanding it, but I'll never confess that and instead I'll make up some pseudoscientific explanation" - __not_ok__