# Homework 2.2: The Quest For A Better Network

In this assignment you will build a monster network to solve CIFAR10 image classification.

This notebook is intended as a sequel to seminar 3; please give that one a try if you haven't done so yet.

(please read it at least diagonally)

* The ultimate quest is to create a network and push its __accuracy__ as high as you can.
* There is a __mini-report__ at the end that you will have to fill in. We recommend reading it first and filling it in as you iterate.
 
## Grading
* starting at zero points
* +20% for describing your iteration path in a report below.
* +20% for building a network that gets above 20% accuracy
* +10% for beating each of these milestones on __TEST__ dataset:
    * 50% (50% points)
    * 60% (60% points)
    * 65% (70% points)
    * 70% (80% points)
    * 75% (90% points)
    * 80% (full points)
    
## Restrictions
* Please do NOT use pre-trained networks for this assignment until you reach 80%.
 * In other words, base milestones must be beaten without pre-trained nets (and such net must be present in the e-mail). After that, you can use whatever you want.
* You __can__ use validation data for training, but you __can't__ do anything with test data apart from running the evaluation procedure.

## Tips on what can be done:


 * __Network size__
   * MOAR neurons, 
   * MOAR layers, ([torch.nn docs](http://pytorch.org/docs/master/nn.html))

   * Nonlinearities in the hidden layers
     * tanh, relu, leaky relu, etc
   * Larger networks may take more epochs to train, so don't discard your net just because it didn't beat the baseline in 5 epochs.

   * Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn!


### The main rule of prototyping: one change at a time
   * By now you probably have several ideas on what to change. By all means, try them out! But there's a catch: __never test several new things at once__.


### Optimization
   * Training for 100 epochs regardless of anything is probably a bad idea.
   * Some networks converge over 5 epochs, others - over 500.
   * Way to go: stop when validation score is 10 iterations past maximum
   * You should certainly use adaptive optimizers
     * rmsprop, nesterov_momentum, adam, adagrad and so on.
     * Converge faster and sometimes reach better optima
     * It might make sense to tweak learning rate/momentum, other learning parameters, batch size and number of epochs
   * __BatchNormalization__ (nn.BatchNorm2d) for the win!
     * Sometimes more batch normalization is better.
   * __Regularize__ to prevent overfitting
     * Add an L2 penalty on the weights to the loss function; autograd will do the rest
       * Can be done manually or like [this](https://discuss.pytorch.org/t/simple-l2-regularization/139/2).
     * Dropout (`nn.Dropout`) - to prevent overfitting
       * Don't overdo it. Check if it actually makes your network better
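
The stopping rule mentioned above (stop once the validation score has gone 10 evaluations without a new maximum) can be sketched in plain Python. The `EarlyStopping` class, its `patience` parameter, and the `epochs_run` helper are illustrative names, not part of the course template, so treat this as a minimal sketch rather than the expected solution.

```python
class EarlyStopping:
    """Signal a stop when the score has not improved for `patience` checks."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_score = float("-inf")
        self.checks_since_best = 0

    def update(self, score):
        """Record a new validation score; return True if training should stop."""
        if score > self.best_score:
            self.best_score = score
            self.checks_since_best = 0
        else:
            self.checks_since_best += 1
        return self.checks_since_best >= self.patience


def epochs_run(val_scores, patience=10):
    """Return how many scores are consumed before the stopper fires."""
    stopper = EarlyStopping(patience)
    for epoch, score in enumerate(val_scores, start=1):
        if stopper.update(score):
            return epoch
    return len(val_scores)
```

In a training loop you would call `update` once per epoch with that epoch's validation accuracy and `break` when it returns `True`. Saving the weights each time a new best score is seen lets you restore the best model afterwards.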
   
### Convolution architectures
   * This task __can__ be solved by a sequence of convolutions and poolings with batch_norm and ReLU seasoning, but you shouldn't necessarily stop there.
   * [Inception family](https://hacktilldawn.com/2016/09/25/inception-modules-explained-and-implemented/), [ResNet family](https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035?gi=9018057983ca), [Densely-connected convolutions (exotic)](https://arxiv.org/abs/1608.06993), [Capsule networks (exotic)](https://arxiv.org/abs/1710.09829)
   * Please do try a few simple architectures before you go for resnet-152.
   * Warning! Training convolutional networks can take long without GPU. That's okay.
     * If you are CPU-only, we still recommend that you try a simple convolutional architecture
     * A perfect option is to set it up to run overnight and check on it in the morning.
     * Make reasonable layer size estimates. A 128-neuron first convolution is likely an overkill.
     * __To reduce computation__ time in exchange for some accuracy drop, try using the __stride__ parameter. A stride=2 convolution produces a quarter as many output positions, so it should take roughly 1/4 of the time of the default (stride=1) one.
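
The 1/4 estimate for stride=2 follows from counting output positions. Below is a rough sketch that counts multiply-accumulate operations for a single convolution. The helper names and the layer sizes (3 input channels, 32 output channels, 3x3 kernel, padding 1) are assumptions chosen for illustration, and real wall-clock time also depends on hardware and library overhead.

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_macs(size, in_ch, out_ch, kernel, stride=1, padding=0):
    """Multiply-accumulate count for a square input and a square kernel."""
    out = conv_out_size(size, kernel, stride, padding)
    return out * out * in_ch * out_ch * kernel * kernel


# Same layer on a 32x32 CIFAR-10 image, with stride 1 and with stride 2.
full = conv_macs(32, in_ch=3, out_ch=32, kernel=3, stride=1, padding=1)
strided = conv_macs(32, in_ch=3, out_ch=32, kernel=3, stride=2, padding=1)
ratio = strided / full  # 16x16 outputs instead of 32x32
print(ratio)
```

Every later layer also sees a 16x16 map instead of a 32x32 one, so the saving carries through the rest of the network.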
 
   
### Data augmentation
   * Getting a 5x larger dataset for free is a great deal
     * Zoom-in+slice = move
     * Rotate+zoom (to remove black stripes)
     * Add noise (Gaussian or Bernoulli)
   * Simple way to do that (if you have PIL/Image): 
     * ```from scipy.misc import imrotate,imresize``` (only available in older SciPy releases)
     * and a bit of slicing
     * Other cool libraries: cv2, skimage, PIL/Pillow
   * A more advanced way is to use torchvision transforms:
    ```
    transform_train = transforms.Compose([
        transforms.RandomCrop(32, padding=4),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])
    trainset = torchvision.datasets.CIFAR10(root=path_to_cifar_like_in_seminar, train=True, download=True, transform=transform_train)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True, num_workers=2)

    ```
   * Or use this tool from Keras (requires theano/tensorflow): [tutorial](https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html), [docs](https://keras.io/preprocessing/image/)
   * Stay realistic. There's usually no point in flipping dogs upside down as that is not the way you usually see them.
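
If you would rather augment the raw `(C, H, W)` arrays by hand, the "move" and "noise" ideas above can be sketched with NumPy alone. This is an illustrative sketch, roughly analogous to `RandomCrop(32, padding=4)` followed by `RandomHorizontalFlip()`. The function names and the `sigma` default are assumptions, not part of the template.

```python
import numpy as np


def random_crop_flip(image, padding=4, rng=None):
    """Zero-pad a (C, H, W) image, crop back to (H, W) at a random offset,
    and mirror it horizontally with probability 0.5."""
    if rng is None:
        rng = np.random.default_rng()
    _, height, width = image.shape
    padded = np.pad(image, ((0, 0), (padding, padding), (padding, padding)),
                    mode="constant")
    # Offsets range over 0..2*padding inclusive (upper bound is exclusive).
    top = rng.integers(0, 2 * padding + 1)
    left = rng.integers(0, 2 * padding + 1)
    crop = padded[:, top:top + height, left:left + width]
    if rng.random() < 0.5:
        crop = crop[:, :, ::-1]
    return np.ascontiguousarray(crop)


def add_gaussian_noise(image, sigma=0.05, rng=None):
    """Add zero-mean Gaussian noise with standard deviation `sigma`."""
    if rng is None:
        rng = np.random.default_rng()
    return image + rng.normal(0.0, sigma, size=image.shape)
```

Apply these only to training batches. Validation and test images should stay untouched so the scores remain comparable between runs.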
   


   
There is a template for your solution below that you can opt to use or throw away and write it your way.

In [15]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [16]:
%load_ext jupyternotify


In [17]:
%%notify
from cifar import load_cifar10
X_train,y_train,X_val,y_val,X_test,y_test = load_cifar10("cifar_data")
class_names = np.array(['airplane','automobile ','bird ','cat ','deer ','dog ','frog ','horse ','ship ','truck'])

print(X_train.shape,y_train.shape)

((40000, 3, 32, 32), (40000,))



In [18]:
import torchvision

In [19]:
from torchvision import transforms

In [20]:
import torch


In [78]:
transform_train = transforms.Compose([
   transforms.RandomCrop(32, padding=4),
   transforms.RandomHorizontalFlip(),
   transforms.ToTensor(),
   transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])
trainset = torchvision.datasets.CIFAR10(root='cifar_data', train=True, download=True, transform=transform_train)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True, num_workers=2)

Files already downloaded and verified


In [79]:
transform_valid = transforms.Compose([
   transforms.ToTensor(),
   transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])
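# NOTE: train=True reloads the training images (without augmentation), so accuracy
# measured with this loader is not a held-out validation estimate.
validset = torchvision.datasets.CIFAR10(root='cifar_data', train=True, download=True, transform=transform_valid)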
validloader = torch.utils.data.DataLoader(validset, batch_size=128, shuffle=True, num_workers=2)

Files already downloaded and verified


In [28]:
trainloader

<torch.utils.data.dataloader.DataLoader at 0x7fc1d52f84d0>

In [29]:
validloader

<torch.utils.data.dataloader.DataLoader at 0x7fc1e0920a50>

In [36]:
len(elem[1])

128

In [33]:
len(elem)

2

In [30]:
for elem in trainloader:
    print(elem)
    break

[
( 0 , 0 ,.,.) = 
 -2.4291 -2.4291 -2.4291  ...   0.6144  0.3042  0.2073
 -2.4291 -2.4291 -2.4291  ...   1.1571  0.9439  0.7501
 -2.4291 -2.4291 -2.4291  ...   1.4091  1.3704  1.3316
           ...             ⋱             ...          
 -2.4291 -2.4291 -2.4291  ...  -0.6457 -0.6457 -0.6844
 -2.4291 -2.4291 -2.4291  ...  -2.4291 -2.4291 -2.4291
 -2.4291 -2.4291 -2.4291  ...  -2.4291 -2.4291 -2.4291

( 0 , 1 ,.,.) = 
 -2.4183 -2.4183 -2.4183  ...   0.0794  0.0794  0.1384
 -2.4183 -2.4183 -2.4183  ...   0.0204  0.0008 -0.0386
 -2.4183 -2.4183 -2.4183  ...   0.0204 -0.0189 -0.0189
           ...             ⋱             ...          
 -2.4183 -2.4183 -2.4183  ...  -0.3532 -0.3729 -0.3926
 -2.4183 -2.4183 -2.4183  ...  -2.4183 -2.4183 -2.4183
 -2.4183 -2.4183 -2.4183  ...  -2.4183 -2.4183 -2.4183

( 0 , 2 ,.,.) = 
 -2.2214 -2.2214 -2.2214  ...   0.1394  0.1589  0.2369
 -2.2214 -2.2214 -2.2214  ...  -0.2509 -0.1533 -0.1143
 -2.2214 -2.2214 -2.2214  ...  -0.3484 -0.3094 -0.2704
          

In [25]:
import torch, torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
class Flatten(nn.Module):
    def forward(self, input):
        return input.view(input.size(0), -1)

In [26]:
from PIL import Image

In [70]:
model = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=64, kernel_size=(3,3)),
    nn.Conv2d(in_channels=64, out_channels=16, kernel_size=(3,3)),
)

In [49]:
# model.add_module('fst_conv', nn.Conv2d(in_channels=3, out_channels=20, kernel_size=(3, 3)))

In [116]:
model = nn.Sequential()

In [None]:
rrrr=model(Variable(torch.FloatTensor(X_train[0:3])))
print(rrrr.shape)

In [117]:
model.add_module('fst_conv', nn.Conv2d(in_channels=3, out_channels=32, kernel_size=(3,3)))
model.add_module('fst_mpool', nn.MaxPool2d(kernel_size=(2,2)))

model.add_module('snd_conv', nn.Conv2d(in_channels=32, out_channels=64, kernel_size=(2,2)))
model.add_module('snd_mpool', nn.MaxPool2d(kernel_size=(2,2)))

model.add_module('thrd_conv', nn.Conv2d(in_channels=64, out_channels=128, kernel_size=(2,2)))
model.add_module('thrd_mpool', nn.MaxPool2d(kernel_size=(2,2)))

model.add_module('flatten', Flatten())

model.add_module('fst_ll', nn.Linear(in_features=1152, out_features=600))
model.add_module('fst_nonlin', nn.Tanh())


model.add_module('drop', nn.Dropout(p = 0.1))


model.add_module('snd_ll', nn.Linear(in_features=600, out_features=10))
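# No Softmax layer here: F.cross_entropy in compute_loss expects raw logits
# (and nn.Softmax(0) would normalize across the batch rather than across classes).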
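
The `in_features=1152` of `fst_ll` above can be checked by hand. The sketch below traces the spatial size through the three conv + pool pairs using the standard output-size formula (no padding, and `MaxPool2d` striding by its kernel size by default). The helper name `out_size` is illustrative.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Output size along one spatial dimension for Conv2d / MaxPool2d."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride) for each spatial layer of the model above.
layers = [
    ("fst_conv", 3, 1), ("fst_mpool", 2, 2),
    ("snd_conv", 2, 1), ("snd_mpool", 2, 2),
    ("thrd_conv", 2, 1), ("thrd_mpool", 2, 2),
]

size = 32  # CIFAR-10 images are 32x32
trace = []
for name, kernel, stride in layers:
    size = out_size(size, kernel, stride)
    trace.append((name, size))

flat_features = 128 * size * size  # thrd_conv outputs 128 channels
print(trace)
print(flat_features)
```

The sizes go 32 → 30 → 15 → 14 → 7 → 6 → 3, so the flattened vector has 128 * 3 * 3 = 1152 features.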

In [146]:
model = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=5),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            Flatten(),
        )

In [147]:
rrrr=model(Variable(torch.FloatTensor(X_train[0:3])))
print(rrrr.shape)

torch.Size([3, 256])


In [148]:
model.add_module('liniya', nn.Linear(in_features=256, out_features=10))

In [149]:
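# No Softmax layer is added: F.cross_entropy in compute_loss expects raw logits
# (and nn.Softmax(0) would normalize across the batch rather than across classes).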

In [150]:
rrrr=model(Variable(torch.FloatTensor(X_train[0:3])))
print(rrrr.shape)

torch.Size([3, 10])


In [41]:
import math

def conv3x3(in_planes, out_planes, stride=1):
    "3x3 convolution with padding"
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)


class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out


class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out


class ResNet(nn.Module):

    def __init__(self, depth, num_classes=1000):
        super(ResNet, self).__init__()
        # Model type specifies number of layers for CIFAR-10 model
        assert (depth - 2) % 6 == 0, 'depth should be 6n+2'
        n = (depth - 2) // 6

        block = Bottleneck if depth >=44 else BasicBlock

        self.inplanes = 16
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(16)
        self.relu = nn.ReLU(inplace=True)
        self.layer1 = self._make_layer(block, 16, n)
        self.layer2 = self._make_layer(block, 32, n, stride=2)
        self.layer3 = self._make_layer(block, 64, n, stride=2)
        self.avgpool = nn.AvgPool2d(8)
        self.fc = nn.Linear(64 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)    # 32x32

        x = self.layer1(x)  # 32x32
        x = self.layer2(x)  # 16x16
        x = self.layer3(x)  # 8x8

        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)

        return x


def resnet(**kwargs):
    """
    Constructs a ResNet model.
    """
    return ResNet(**kwargs)
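
It may help to see what the `depth` argument actually builds. The sketch below mirrors the block selection in `ResNet.__init__` above and counts weighted layers: the convolutions inside the blocks plus the stem convolution and the final linear layer, ignoring the 1x1 shortcut convolutions. The function name `describe_resnet` is hypothetical.

```python
def describe_resnet(depth):
    """Mirror the block choice in ResNet.__init__ and count weighted layers."""
    assert (depth - 2) % 6 == 0, "depth should be 6n+2"
    blocks_per_stage = (depth - 2) // 6
    use_bottleneck = depth >= 44
    convs_per_block = 3 if use_bottleneck else 2
    # Three stages of blocks, plus the stem convolution and the final fc layer.
    weighted_layers = 3 * blocks_per_stage * convs_per_block + 2
    return {
        "block": "Bottleneck" if use_bottleneck else "BasicBlock",
        "blocks_per_stage": blocks_per_stage,
        "weighted_layers": weighted_layers,
    }


print(describe_resnet(20))
print(describe_resnet(50))
```

For `depth=20` the count is 20, as the name suggests. For `depth=50` this code switches to `Bottleneck` blocks with 8 blocks per stage, which gives 74 weighted layers, because the 6n+2 rule assumes two convolutions per block. Bottleneck CIFAR ResNets are, as far as I know, usually sized with 9n+2 instead.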

In [42]:
setka = resnet(depth=20, num_classes = 10)

In [71]:
setka2 = resnet(depth=50, num_classes = 10)

In [108]:
setka2.cuda()

ResNet(
  (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True)
  (relu): ReLU(inplace)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(16, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True)
      (conv2): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True)
      (conv3): Conv2d(16, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
      (relu): ReLU(inplace)
      (downsample): Sequential(
        (0): Conv2d(16, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(64, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1

In [43]:
setka.cuda()

ResNet(
  (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True)
  (relu): ReLU(inplace)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True)
    )
    (2): BasicBlock(
      (conv1): Conv2d(

In [44]:
model = setka

In [84]:
model2 = setka2


In [74]:
rrrr=model2(Variable(torch.FloatTensor(X_train[0:3])).cuda())
rrrr.shape

torch.Size([3, 10])

In [75]:
def compute_loss(X_batch, y_batch):
    X_batch = Variable(torch.FloatTensor(X_batch)).cuda()
    y_batch = Variable(torch.LongTensor(y_batch)).cuda()
    logits = model(X_batch)
    return F.cross_entropy(logits, y_batch).mean()

In [81]:
def compute_loss2(X_batch, y_batch):
    X_batch = Variable(torch.FloatTensor(X_batch)).cuda()
    y_batch = Variable(torch.LongTensor(y_batch)).cuda()
    logits = model2(X_batch)
    return F.cross_entropy(logits, y_batch).mean()

__Training__

In [80]:
def iterate_minibatches(X, y, batchsize):
    indices = np.random.permutation(np.arange(len(X)))
    for start in range(0, len(indices), batchsize):
        ix = indices[start: start + batchsize]
        yield X[ix], y[ix]
        
opt = torch.optim.Adam(model.parameters())

train_loss = []
val_accuracy = []

In [86]:
train_loss2 = []
val_accuracy2 = []

In [87]:
opt2 = torch.optim.Adam(model2.parameters())

In [89]:
%%notify
import time
num_epochs = 100 # total amount of full passes over training data
batch_size = 128  # number of samples processed in one SGD iteration

for epoch in range(num_epochs):
    # In each epoch, we do a full pass over the training data:
    start_time = time.time()
    model2.train(True) # enable dropout / batch_norm training behavior
    for X_batch, y_batch in trainloader:
        # train on batch
        loss = compute_loss2(X_batch, y_batch)
        loss.backward()
        opt2.step()
        opt2.zero_grad()
        train_loss2.append(loss.data.cpu().numpy()[0])
        
    # And a full pass over the validation data:
    model2.train(False) # disable dropout / use averages for batch_norm
    for X_batch, y_batch in validloader:
        logits = model2(Variable(torch.FloatTensor(X_batch)).cuda())
        y_pred = logits.max(1)[1].data.cpu().numpy()
        val_accuracy2.append(np.mean(y_batch.numpy() == y_pred))

    
    # Then we print the results for this epoch:
    print("Epoch {} of {} took {:.3f}s".format(
        epoch + 1, num_epochs, time.time() - start_time))
    print("  training loss (in-iteration): \t{:.6f}".format(
        np.mean(train_loss2[-len(X_train) // batch_size :])))
    print("  validation accuracy: \t\t\t{:.2f} %".format(
        np.mean(val_accuracy2[-len(X_val) // batch_size :]) * 100))

Epoch 1 of 100 took 40.825s
  training loss (in-iteration): 	0.909471
  validation accuracy: 			68.30 %
Epoch 2 of 100 took 40.984s
  training loss (in-iteration): 	0.861573
  validation accuracy: 			71.97 %
Epoch 3 of 100 took 41.105s
  training loss (in-iteration): 	0.800657
  validation accuracy: 			72.51 %
Epoch 4 of 100 took 41.333s
  training loss (in-iteration): 	0.751132
  validation accuracy: 			75.02 %
Epoch 5 of 100 took 41.514s
  training loss (in-iteration): 	0.702403
  validation accuracy: 			76.34 %
Epoch 6 of 100 took 41.425s
  training loss (in-iteration): 	0.662269
  validation accuracy: 			76.66 %
Epoch 7 of 100 took 41.470s
  training loss (in-iteration): 	0.617647
  validation accuracy: 			77.33 %
Epoch 8 of 100 took 41.465s
  training loss (in-iteration): 	0.584012
  validation accuracy: 			78.39 %
Epoch 9 of 100 took 41.558s
  training loss (in-iteration): 	0.557070
  validation accuracy: 			80.83 %
Epoch 10 of 100 took 41.512s
  training loss (in-iteration): 	0.


In [95]:
del model

In [98]:
del trainloader

In [99]:
del validloader

In [100]:
torch.cuda.empty_cache()

In [107]:
model2.train(False) # disable dropout / use averages for batch_norm
test_batch_acc = []
for X_batch, y_batch in testloader:
    logits = model2(Variable(torch.FloatTensor(X_batch)).cuda())
    y_pred = logits.max(1)[1].data.cpu().numpy()
    test_batch_acc.append(np.mean(y_batch.numpy() == y_pred))

test_accuracy = np.mean(test_batch_acc)
    
print("Final results:")
print("  test accuracy:\t\t{:.2f} %".format(
    test_accuracy * 100))

if test_accuracy * 100 > 95:
    print("Double-check, then consider applying for NIPS'17. SRSly.")
elif test_accuracy * 100 > 90:
    print("U'r freakin' amazin'!")
elif test_accuracy * 100 > 80:
    print("Achievement unlocked: 110lvl Warlock!")
elif test_accuracy * 100 > 70:
    print("Achievement unlocked: 80lvl Warlock!")
elif test_accuracy * 100 > 60:
    print("Achievement unlocked: 70lvl Warlock!")
elif test_accuracy * 100 > 50:
    print("Achievement unlocked: 60lvl Warlock!")
else:
    print("We need more magic! Follow instructions below")

Final results:
  test accuracy:		89.79 %
Achievement unlocked: 110lvl Warlock!
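
One caveat about the evaluation cell above: it averages per-batch accuracies. The CIFAR-10 test set has 10,000 images, so with `batch_size=256` the last batch holds only 16 samples yet counts as much as a full batch. The sketch below compares the two ways of averaging. The per-batch counts are hypothetical numbers chosen for illustration, not results from this notebook.

```python
def mean_of_batch_means(batches):
    """Average per-batch accuracies, as the evaluation cell above does."""
    return sum(correct / size for correct, size in batches) / len(batches)


def sample_weighted_accuracy(batches):
    """Total correct predictions divided by total number of samples."""
    total_correct = sum(correct for correct, _ in batches)
    total_size = sum(size for _, size in batches)
    return total_correct / total_size


# Hypothetical counts: 39 full batches of 256 with 230 correct each,
# plus a final batch of 16 with 8 correct (10,000 samples in total).
batches = [(230, 256)] * 39 + [(8, 16)]

print(mean_of_batch_means(batches))
print(sample_weighted_accuracy(batches))
```

The gap is usually small, but accumulating correct predictions and sample counts across batches removes it entirely.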


In [66]:
%%notify
import time
num_epochs = 10 # total amount of full passes over training data
batch_size = 512  # number of samples processed in one SGD iteration

for epoch in range(num_epochs):
    # In each epoch, we do a full pass over the training data:
    start_time = time.time()
    model.train(True) # enable dropout / batch_norm training behavior
    for X_batch, y_batch in trainloader:
        # train on batch
        loss = compute_loss(X_batch, y_batch)
        loss.backward()
        opt.step()
        opt.zero_grad()
        train_loss.append(loss.data.cpu().numpy()[0])
        
    # And a full pass over the validation data:
    model.train(False) # disable dropout / use averages for batch_norm
    for X_batch, y_batch in validloader:
        logits = model(Variable(torch.FloatTensor(X_batch)).cuda())
        y_pred = logits.max(1)[1].data.cpu().numpy()
        val_accuracy.append(np.mean(y_batch.numpy() == y_pred))

    
    # Then we print the results for this epoch:
    print("Epoch {} of {} took {:.3f}s".format(
        epoch + 1, num_epochs, time.time() - start_time))
    print("  training loss (in-iteration): \t{:.6f}".format(
        np.mean(train_loss[-len(X_train) // batch_size :])))
    print("  validation accuracy: \t\t\t{:.2f} %".format(
        np.mean(val_accuracy[-len(X_val) // batch_size :]) * 100))

Epoch 1 of 100 took 8.118s
  training loss (in-iteration): 	0.020525
  validation accuracy: 			99.57 %
Epoch 2 of 100 took 7.998s
  training loss (in-iteration): 	0.016070
  validation accuracy: 			99.58 %
Epoch 3 of 100 took 7.930s
  training loss (in-iteration): 	0.014265
  validation accuracy: 			99.55 %
Epoch 4 of 100 took 8.027s
  training loss (in-iteration): 	0.012918
  validation accuracy: 			99.67 %
Epoch 5 of 100 took 8.099s
  training loss (in-iteration): 	0.013578
  validation accuracy: 			99.54 %
Epoch 6 of 100 took 8.093s
  training loss (in-iteration): 	0.014518
  validation accuracy: 			99.62 %
Epoch 7 of 100 took 7.974s
  training loss (in-iteration): 	0.013638
  validation accuracy: 			99.57 %
Epoch 8 of 100 took 8.291s
  training loss (in-iteration): 	0.011807
  validation accuracy: 			99.52 %
Epoch 9 of 100 took 8.060s
  training loss (in-iteration): 	0.013541
  validation accuracy: 			99.56 %
Epoch 10 of 100 took 8.189s
  training loss (in-iteration): 	0.012854
  v


In [106]:
transform_test = transforms.Compose([
   transforms.ToTensor(),
   transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])
testset = torchvision.datasets.CIFAR10(root='cifar_data', train=False, download=True, transform=transform_test)
testloader = torch.utils.data.DataLoader(testset, batch_size=256, shuffle=True, num_workers=2)

Files already downloaded and verified



In [70]:
model.train(False) # disable dropout / use averages for batch_norm
test_batch_acc = []
for X_batch, y_batch in testloader:
    logits = model(Variable(torch.FloatTensor(X_batch)).cuda())
    y_pred = logits.max(1)[1].data.cpu().numpy()
    test_batch_acc.append(np.mean(y_batch.numpy() == y_pred))

test_accuracy = np.mean(test_batch_acc)
    
print("Final results:")
print("  test accuracy:\t\t{:.2f} %".format(
    test_accuracy * 100))

if test_accuracy * 100 > 95:
    print("Double-check, then consider applying for NIPS'17. SRSly.")
elif test_accuracy * 100 > 90:
    print("U'r freakin' amazin'!")
elif test_accuracy * 100 > 80:
    print("Achievement unlocked: 110lvl Warlock!")
elif test_accuracy * 100 > 70:
    print("Achievement unlocked: 80lvl Warlock!")
elif test_accuracy * 100 > 60:
    print("Achievement unlocked: 70lvl Warlock!")
elif test_accuracy * 100 > 50:
    print("Achievement unlocked: 60lvl Warlock!")
else:
    print("We need more magic! Follow instructions below")

Final results:
  test accuracy:		88.86 %
Achievement unlocked: 110lvl Warlock!




# Report

All creative approaches are highly welcome, but at the very least it would be great to mention
* the idea;
* brief history of tweaks and improvements;
* what is the final architecture and why?
* what is the training method and, again, why?
* Any regularizations and other techniques applied and their effects;


There is no need to write strict mathematical proofs (unless you want to).
 * "I tried this, this and this, and the second one turned out to be better. And I just didn't like the name of that one" - OK, but can be better
 * "I have analyzed these and these articles|sources|blog posts, tried that and that to adapt them to my problem and the conclusions are such and such" - the ideal one
 * "I took that code from that demo without understanding it, but I'll never confess that and instead I'll make up some pseudoscientific explanation" - __not_ok__

### Hi, my name is `Semyon Fedotov`, and here's my story

At the very beginning I decided to try a simple network: convolution + pooling followed by a couple of fully connected layers with ReLUs. After training I got a so-so result (about 30% on the test set).

----

Next, I used the architecture suggested at the seminars (several convolution + pooling blocks with nonlinearities) and reached roughly 50%, which is better but still not good enough.

----

I looked for more powerful architectures and settled on a mini-ResNet. I tried training it at different depths, but the result stayed about the same (hovering around 70%). Then I added data augmentation and the quality immediately jumped to about 87%! I tried training for more epochs and with different batch sizes: 128, 256, 512. The final test accuracy was 88.86% after 300 epochs.

-----

I tried adding more layers to the final architecture and trained it, but at test time the GPU ran out of memory :( I could not free anything up, so I had to reduce the test batch size from 500 to 256. The resulting test accuracy was 89.79%.