# Deep learning for computer vision


This notebook will teach you to build and train convolutional networks for image recognition. Brace yourselves.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yandexdataschool/Practical_DL/blob/spring20/seminar3/seminar3_pytorch.ipynb)

# Tiny ImageNet dataset
This week, we shall focus on the image recognition problem on Tiny Image Net dataset
* 100k images of shape 3x64x64
* 200 different classes: snakes, spaiders, cats, trucks, grasshopper, gull, etc.


In [0]:
import torchvision
import torch
from torchvision import transforms

In [2]:
!wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=1UksGhGn63aQLAfGrAkGzdx69U6waEHPR' -O tinyim3.png
!wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=19qsD0o7pfAI8UYxgDY18sdRjV0Aantn2' -O tiny_img.py
!wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=12IrLjz8pss4284xsBAJt6CW6yELPH4tL' -O tiniim.png

--2020-04-05 15:26:39--  https://docs.google.com/uc?export=download&id=1UksGhGn63aQLAfGrAkGzdx69U6waEHPR
Resolving docs.google.com (docs.google.com)... 108.177.119.113, 108.177.119.101, 108.177.119.100, ...
Connecting to docs.google.com (docs.google.com)|108.177.119.113|:443... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://doc-0k-6s-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/nd71kvcq0ec6br7eso08i0db05duh1vi/1586100375000/01961971800886548445/*/1UksGhGn63aQLAfGrAkGzdx69U6waEHPR?e=download [following]
--2020-04-05 15:26:40--  https://doc-0k-6s-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/nd71kvcq0ec6br7eso08i0db05duh1vi/1586100375000/01961971800886548445/*/1UksGhGn63aQLAfGrAkGzdx69U6waEHPR?e=download
Resolving doc-0k-6s-docs.googleusercontent.com (doc-0k-6s-docs.googleusercontent.com)... 74.125.128.132, 2a00:1450:4013:c02::84
Connecting to doc-0k-6s-docs.googleusercontent.com (doc-0k-

In [3]:
from tiny_img import download_tinyImg200
data_path = '.'
download_tinyImg200(data_path)

./tiny-imagenet-200.zip


In [0]:
dataset = torchvision.datasets.ImageFolder('tiny-imagenet-200/train', transform=transforms.ToTensor())
test_dataset = torchvision.datasets.ImageFolder('tiny-imagenet-200/val', transform=transforms.ToTensor())
train_dataset, val_dataset = torch.utils.data.random_split(dataset, [80000, 20000])
test_dataset, val_dataset = torch.utils.data.random_split(val_dataset, [10000, 10000])

In [0]:
batch_size = 50
train_batch_gen = torch.utils.data.DataLoader(train_dataset, 
                                              batch_size=batch_size,
                                              shuffle=True,
                                              num_workers=1)
val_batch_gen = torch.utils.data.DataLoader(val_dataset, 
                                            batch_size=batch_size,
                                            shuffle=True,
                                            num_workers=1)

## Image examples ##



<tr>
    <td> <img src="https://github.com/yandexdataschool/Practical_DL/blob/sem3spring2019/week03_convnets/tinyim3.png?raw=1" alt="Drawing" style="width:90%"/> </td>
    <td> <img src="https://github.com/yandexdataschool/Practical_DL/blob/sem3spring2019/week03_convnets/tinyim2.png?raw=1" alt="Drawing" style="width:90%"/> </td>
</tr>


<tr>
    <td> <img src="https://github.com/yandexdataschool/Practical_DL/blob/sem3spring2019/week03_convnets/tiniim.png?raw=1" alt="Drawing" style="width:90%"/> </td>
</tr>

# Building a network

Simple neural networks with layers applied on top of one another can be implemented as `torch.nn.Sequential` - just add a list of pre-built modules and let it train.

In [0]:
import torch, torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

# a special module that converts [batch, channel, w, h] to [batch, units]
# there is probably already implemented layer in pytorch
class Flatten(nn.Module):
    def forward(self, input):
        return input.view(input.size(0), -1)

Let's start with a dense network for our baseline:

In [0]:
model = nn.Sequential()

# reshape from "images" to flat vectors
model.add_module('flatten', Flatten())

# dense "head"
model.add_module('dense1', nn.Linear(3 * 64 * 64, 1064))
model.add_module('dense2', nn.Linear(1064, 512))
model.add_module('dropout0', nn.Dropout(0.05)) 
model.add_module('dense3', nn.Linear(512, 256))
model.add_module('dropout1', nn.Dropout(0.05))
model.add_module('dense4', nn.Linear(256, 64))
model.add_module('dropout2', nn.Dropout(0.05))
model.add_module('dense1_relu', nn.ReLU())
model.add_module('dense2_logits', nn.Linear(64, 200)) # logits for 200 classes

As in our basic tutorial, we train our model with negative log-likelihood aka crossentropy.

In [0]:
def compute_loss(X_batch, y_batch):
    X_batch = Variable(torch.FloatTensor(X_batch)).cuda()
    y_batch = Variable(torch.LongTensor(y_batch)).cuda()
    logits = model.cuda()(X_batch)
    return F.cross_entropy(logits, y_batch).mean()

### Training on minibatches
* We got 100k images, that's way too many for a full-batch SGD. Let's train on minibatches instead
* Below is a function that splits the training sample into minibatches

In [0]:
opt = torch.optim.SGD(model.parameters(), lr=0.01)

train_loss = []
val_accuracy = []

In [11]:
import numpy as np

opt = torch.optim.SGD(model.parameters(), lr=0.01)

train_loss = []
val_accuracy = []

num_epochs = 50 # total amount of full passes over training data

import time

for epoch in range(num_epochs):
    start_time = time.time()
    model.train(True) # enable dropout / batch_norm training behavior
    for (X_batch, y_batch) in train_batch_gen:
        # train on batch
        loss = compute_loss(X_batch, y_batch)
        loss.backward()
        opt.step()
        opt.zero_grad()
        train_loss.append(loss.cpu().data.numpy())
    
    model.train(False) # disable dropout / use averages for batch_norm
    for X_batch, y_batch in val_batch_gen:
        logits = model(Variable(torch.FloatTensor(X_batch)).cuda())
        y_pred = logits.max(1)[1].data
        val_accuracy.append(np.mean( (y_batch.cpu() == y_pred.cpu()).numpy() ))

    
    # Then we print the results for this epoch:
    print("Epoch {} of {} took {:.3f}s".format(
        epoch + 1, num_epochs, time.time() - start_time))
    print("  training loss (in-iteration): \t{:.6f}".format(
        np.mean(train_loss[-len(train_dataset) // batch_size :])))
    print("  validation accuracy: \t\t\t{:.2f} %".format(
        np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100))

Epoch 1 of 50 took 43.470s
  training loss (in-iteration): 	5.292254
  validation accuracy: 			0.85 %
Epoch 2 of 50 took 33.296s
  training loss (in-iteration): 	5.217831
  validation accuracy: 			1.80 %
Epoch 3 of 50 took 33.089s
  training loss (in-iteration): 	5.082955
  validation accuracy: 			2.43 %
Epoch 4 of 50 took 33.185s
  training loss (in-iteration): 	4.994013
  validation accuracy: 			3.39 %
Epoch 5 of 50 took 33.569s
  training loss (in-iteration): 	4.896095
  validation accuracy: 			4.56 %


KeyboardInterrupt: ignored

Don't wait for full 100 epochs. You can interrupt training after 5-20 epochs once validation accuracy stops going up.
```
```

### Final test

In [12]:
model.train(False) # disable dropout / use averages for batch_norm
test_batch_acc = []
for X_batch, y_batch in val_batch_gen:
    logits = model(Variable(torch.FloatTensor(X_batch)).cuda())
    y_pred = logits.max(1)[1].data
    test_batch_acc.append(np.mean( (y_batch.cpu() == y_pred.cpu()).numpy() ))


test_accuracy = np.mean(test_batch_acc)
    
print("Final results:")
print("  test accuracy:\t\t{:.2f} %".format(
    test_accuracy * 100))

if test_accuracy * 100 > 70:
    print("U'r freakin' amazin'!")
elif test_accuracy * 100 > 50:
    print("Achievement unlocked: 110lvl Warlock!")
elif test_accuracy * 100 > 40:
    print("Achievement unlocked: 80lvl Warlock!")
elif test_accuracy * 100 > 30:
    print("Achievement unlocked: 70lvl Warlock!")
elif test_accuracy * 100 > 20:
    print("Achievement unlocked: 60lvl Warlock!")
else:
    print("We need more magic! Follow instructons below")

Final results:
  test accuracy:		4.15 %
We need more magic! Follow instructons below


## Task I: small convolution net
### First step

Let's create a mini-convolutional network with roughly such architecture:
* Input layer
* 3x3 convolution with 128 filters and _ReLU_ activation
* 2x2 pooling (or set previous convolution stride to 3)
* Flatten
* Dense layer with 1024 neurons and _ReLU_ activation
* 30% dropout
* Output dense layer.


__Convolutional layers__ in torch are just like all other layers, but with a specific set of parameters:

__`...`__

__`model.add_module('conv1', nn.Conv2d(in_channels=3, out_channels=128, kernel_size=3)) # convolution`__

__`model.add_module('pool1', nn.MaxPool2d(2)) # max pooling 2x2`__

__`...`__


Once you're done (and compute_loss no longer raises errors), train it with __Adam__ optimizer with default params (feel free to modify the code above).

If everything is right, you should get at least __16%__ validation accuracy.

__HACK_OF_THE_DAY__ :the number of channels must be in the order of the number of class_labels

### Before we start:
**Stride, Padding and Kernel_size**

In [13]:
from IPython.display import Image
Image(url='https://deeplearning.net/software/theano/_images/numerical_padding_strides.gif')  

In [0]:
# empiric rule - amount of output channels approximately equals number of classes
model = nn.Sequential()

model.add_module('conv1', nn.Conv2d(in_channels=3, out_channels=64, kernel_size=(3,3)))
model.add_module('conv1_relu', nn.ReLU())
model.add_module('pool1', nn.MaxPool2d(kernel_size=2))

model.add_module('conv2', nn.Conv2d(in_channels=64, out_channels=128, kernel_size=(3,3)))
model.add_module('conv2_relu', nn.ReLU())
model.add_module('pool2', nn.MaxPool2d(kernel_size=3))

#decribe convnet here
model.add_module('flatten', Flatten())
model.add_module('dense1_logits', nn.Linear(10368, 200)) # logits for 200 classes

In [17]:
from torchsummary import summary
# to see the size of final linear layer without calculating it - 10368
summary(model.cuda(), (3, 64, 64))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 64, 62, 62]           1,792
              ReLU-2           [-1, 64, 62, 62]               0
         MaxPool2d-3           [-1, 64, 31, 31]               0
            Conv2d-4          [-1, 128, 29, 29]          73,856
              ReLU-5          [-1, 128, 29, 29]               0
         MaxPool2d-6            [-1, 128, 9, 9]               0
           Flatten-7                [-1, 10368]               0
            Linear-8                  [-1, 200]       2,073,800
Total params: 2,149,448
Trainable params: 2,149,448
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.05
Forward/backward pass size (MB): 6.03
Params size (MB): 8.20
Estimated Total Size (MB): 14.27
----------------------------------------------------------------


In [0]:
opt = torch.optim.SGD(model.parameters(), lr=0.01)

train_loss = []
val_accuracy = []

## retrain it ##

In [23]:
import time
num_epochs = 100 # total amount of full passes over training data
batch_size = 50  # number of samples processed in one SGD iteration


for epoch in range(num_epochs):
    # In each epoch, we do a full pass over the training data:
    start_time = time.time()
    model.train(True) # enable dropout / batch_norm training behavior
    for (X_batch, y_batch) in train_batch_gen:
        # train on batch
        loss = compute_loss(X_batch, y_batch)
        loss.backward()
        opt.step()
        opt.zero_grad()
        train_loss.append(loss.data.cpu().numpy())
    model.train(False) # disable dropout / use averages for batch_norm
    for X_batch, y_batch in val_batch_gen:
        logits = model(Variable(torch.FloatTensor(X_batch)).cuda())
        y_pred = logits.max(1)[1].data

        batch_val_acc = np.mean((y_batch.cpu() == y_pred.cpu()).numpy())
        val_accuracy.append(batch_val_acc)
    epoch_time = time.time() - start_time
    train_l = np.mean(train_loss[-len(train_dataset) // batch_size :])
    val_acc = np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100
    # Then we print the results for this epoch:
    print(f"Epoch {epoch + 1} of {num_epochs} took {epoch_time:.3f}s")
    print(f"  training loss (in-iteration): \t{train_l:.6f}")
    print(f"  validation accuracy: \t\t\t{val_acc:.2f} %")

Epoch 1 of 100 took 37.377s
  training loss (in-iteration): 	4.763467
  validation accuracy: 			7.70
Epoch 2 of 100 took 37.091s
  training loss (in-iteration): 	4.464122
  validation accuracy: 			10.71
Epoch 3 of 100 took 36.928s
  training loss (in-iteration): 	4.211896
  validation accuracy: 			14.23
Epoch 4 of 100 took 36.840s
  training loss (in-iteration): 	3.971863
  validation accuracy: 			15.34
Epoch 5 of 100 took 36.697s
  training loss (in-iteration): 	3.785956
  validation accuracy: 			16.76
Epoch 6 of 100 took 36.625s
  training loss (in-iteration): 	3.634214
  validation accuracy: 			17.52
Epoch 7 of 100 took 36.645s
  training loss (in-iteration): 	3.494398
  validation accuracy: 			18.23
Epoch 8 of 100 took 36.457s
  training loss (in-iteration): 	3.356897
  validation accuracy: 			18.95
Epoch 9 of 100 took 36.693s
  training loss (in-iteration): 	3.219419
  validation accuracy: 			19.31
Epoch 10 of 100 took 36.674s
  training loss (in-iteration): 	3.079872
  validation

Exception ignored in: <bound method _MultiProcessingDataLoaderIter.__del__ of <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7ff6fda0afd0>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 961, in __del__
    self._shutdown_workers()
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 941, in _shutdown_workers
    w.join()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 122, in join
    assert self._parent_pid == os.getpid(), 'can only join a child process'
AssertionError: can only join a child process
Exception ignored in: <bound method _MultiProcessingDataLoaderIter.__del__ of <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7ff6fda67c18>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 961, in __del__
    self._shutdown_workers()
  File "/usr/local/l

Epoch 12 of 100 took 36.761s
  training loss (in-iteration): 	2.801317
  validation accuracy: 			20.16


Exception ignored in: <bound method _MultiProcessingDataLoaderIter.__del__ of <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7ff6fda0afd0>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 961, in __del__
    self._shutdown_workers()
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 941, in _shutdown_workers
    w.join()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 122, in join
    assert self._parent_pid == os.getpid(), 'can only join a child process'
AssertionError: can only join a child process
Exception ignored in: <bound method _MultiProcessingDataLoaderIter.__del__ of <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7ff6fda67c18>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 961, in __del__
    self._shutdown_workers()
  File "/usr/local/l

Epoch 13 of 100 took 36.768s
  training loss (in-iteration): 	2.647879
  validation accuracy: 			19.90
Epoch 14 of 100 took 36.733s
  training loss (in-iteration): 	2.500448
  validation accuracy: 			20.26
Epoch 15 of 100 took 36.569s
  training loss (in-iteration): 	2.345874
  validation accuracy: 			20.60
Epoch 16 of 100 took 36.488s
  training loss (in-iteration): 	2.185058
  validation accuracy: 			19.62
Epoch 17 of 100 took 36.532s
  training loss (in-iteration): 	2.021010
  validation accuracy: 			19.96
Epoch 18 of 100 took 36.513s
  training loss (in-iteration): 	1.855061
  validation accuracy: 			19.30
Epoch 19 of 100 took 36.486s
  training loss (in-iteration): 	1.687373
  validation accuracy: 			18.62
Epoch 20 of 100 took 36.544s
  training loss (in-iteration): 	1.516150
  validation accuracy: 			18.78
Epoch 21 of 100 took 36.396s
  training loss (in-iteration): 	1.348621
  validation accuracy: 			18.25
Epoch 22 of 100 took 36.484s
  training loss (in-iteration): 	1.184577
  


__Hint:__ If you don't want to compute shapes by hand, just plug in any shape (e.g. 1 unit) and run compute_loss. You will see something like this:

__`RuntimeError: size mismatch, m1: [5 x 1960], m2: [1 x 64] at /some/long/path/to/torch/operation`__

See the __1960__ there? That's your actual input shape.

## Task 2: adding normalization

* Add batch norm (with default params) between convolution and ReLU
  * nn.BatchNorm*d (1d for dense, 2d for conv)
  * usually better to put them after linear/conv but before nonlinearity
* Re-train the network with the same optimizer, it should get at least 20% validation accuracy at peak.

To know more about **batch_norm** and **data covariate shift**

https://towardsdatascience.com/batch-normalization-in-neural-networks-1ac91516821c

https://www.youtube.com/watch?v=nUUqwaxLnWs

In [0]:
# empiric rule - amount of output channels approximately equals number of classes
model = nn.Sequential()

model.add_module('conv1', nn.Conv2d(in_channels=3, out_channels=64, kernel_size=(3,3)))
model.add_module('bn1', nn.BatchNorm2d(64))
model.add_module('conv1_relu', nn.ReLU())
model.add_module('pool1', nn.MaxPool2d(kernel_size=2))

model.add_module('conv2', nn.Conv2d(in_channels=64, out_channels=128, kernel_size=(3,3)))
model.add_module('bn2', nn.BatchNorm2d(128))
model.add_module('conv2_relu', nn.ReLU())
model.add_module('pool2', nn.MaxPool2d(kernel_size=3))

model.add_module('flatten', Flatten())
model.add_module('dense1_logits', nn.Linear(10368, 200)) # logits for 200 classes

In [0]:
opt = torch.optim.SGD(model.parameters(), lr=0.01)

train_loss = []
val_accuracy = []

In [28]:
import time
num_epochs = 100 # total amount of full passes over training data
batch_size = 50  # number of samples processed in one SGD iteration


for epoch in range(num_epochs):
    # In each epoch, we do a full pass over the training data:
    start_time = time.time()
    model.train(True) # enable dropout / batch_norm training behavior
    for (X_batch, y_batch) in train_batch_gen:
        # train on batch
        loss = compute_loss(X_batch, y_batch)
        loss.backward()
        opt.step()
        opt.zero_grad()
        train_loss.append(loss.data.cpu().numpy())
    model.train(False) # disable dropout / use averages for batch_norm
    for X_batch, y_batch in val_batch_gen:
        logits = model(Variable(torch.FloatTensor(X_batch)).cuda())
        y_pred = logits.max(1)[1].data

        batch_val_acc = np.mean((y_batch.cpu() == y_pred.cpu()).numpy())
        val_accuracy.append(batch_val_acc)
    epoch_time = time.time() - start_time
    train_l = np.mean(train_loss[-len(train_dataset) // batch_size :])
    val_acc = np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100
    # Then we print the results for this epoch:
    print(f"Epoch {epoch + 1} of {num_epochs} took {epoch_time:.3f}s")
    print(f"  training loss (in-iteration): \t{train_l:.6f}")
    print(f"  validation accuracy: \t\t\t{val_acc:.2f} %")

KeyboardInterrupt: ignored

Now let's try batchnorm after relu

In [0]:
# empiric rule - amount of output channels approximately equals number of classes
model = nn.Sequential()

model.add_module('conv1', nn.Conv2d(in_channels=3, out_channels=64, kernel_size=(3,3)))
model.add_module('conv1_relu', nn.ReLU())
model.add_module('bn1', nn.BatchNorm2d(64))
model.add_module('pool1', nn.MaxPool2d(kernel_size=2))

model.add_module('conv2', nn.Conv2d(in_channels=64, out_channels=128, kernel_size=(3,3)))
model.add_module('conv2_relu', nn.ReLU())
model.add_module('bn2', nn.BatchNorm2d(128))
model.add_module('pool2', nn.MaxPool2d(kernel_size=3))

model.add_module('flatten', Flatten())
model.add_module('dense1_logits', nn.Linear(10368, 200)) # logits for 200 classes

In [30]:
opt = torch.optim.SGD(model.parameters(), lr=0.01)

train_loss = []
val_accuracy = []

import time
num_epochs = 100 # total amount of full passes over training data
batch_size = 50  # number of samples processed in one SGD iteration

for epoch in range(num_epochs):
    # In each epoch, we do a full pass over the training data:
    start_time = time.time()
    model.train(True) # enable dropout / batch_norm training behavior
    for (X_batch, y_batch) in train_batch_gen:
        # train on batch
        loss = compute_loss(X_batch, y_batch)
        loss.backward()
        opt.step()
        opt.zero_grad()
        train_loss.append(loss.data.cpu().numpy())
    model.train(False) # disable dropout / use averages for batch_norm
    for X_batch, y_batch in val_batch_gen:
        logits = model(Variable(torch.FloatTensor(X_batch)).cuda())
        y_pred = logits.max(1)[1].data

        batch_val_acc = np.mean((y_batch.cpu() == y_pred.cpu()).numpy())
        val_accuracy.append(batch_val_acc)
    epoch_time = time.time() - start_time
    train_l = np.mean(train_loss[-len(train_dataset) // batch_size :])
    val_acc = np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100
    # Then we print the results for this epoch:
    print(f"Epoch {epoch + 1} of {num_epochs} took {epoch_time:.3f}s")
    print(f"  training loss (in-iteration): \t{train_l:.6f}")
    print(f"  validation accuracy: \t\t\t{val_acc:.2f} %")

Epoch 1 of 100 took 37.290s
  training loss (in-iteration): 	4.584318
  validation accuracy: 			15.25 %
Epoch 2 of 100 took 37.368s
  training loss (in-iteration): 	3.567699
  validation accuracy: 			18.97 %
Epoch 3 of 100 took 37.497s
  training loss (in-iteration): 	2.999180
  validation accuracy: 			20.88 %
Epoch 4 of 100 took 38.044s
  training loss (in-iteration): 	2.517339
  validation accuracy: 			21.59 %
Epoch 5 of 100 took 37.454s
  training loss (in-iteration): 	2.023547
  validation accuracy: 			20.36 %
Epoch 6 of 100 took 37.801s
  training loss (in-iteration): 	1.506637
  validation accuracy: 			19.61 %
Epoch 7 of 100 took 37.635s
  training loss (in-iteration): 	0.989550
  validation accuracy: 			18.03 %
Epoch 8 of 100 took 37.663s
  training loss (in-iteration): 	0.550558
  validation accuracy: 			19.18 %
Epoch 9 of 100 took 37.481s
  training loss (in-iteration): 	0.268019
  validation accuracy: 			18.88 %
Epoch 10 of 100 took 37.769s
  training loss (in-iteration): 	0.

KeyboardInterrupt: ignored



```

```

```

```

```
## Task 3: Data Augmentation

** Augmenti - A spell used to produce water from a wand (Harry Potter Wiki) **

<img src="https://github.com/yandexdataschool/Practical_DL/blob/sem3spring2019/week03_convnets/HagridsHut_PM_B6C28_Hagrid_sHutFireHarryFang.jpg?raw=1" style="width:80%">

There's a powerful torch tool for image preprocessing useful to do data preprocessing and augmentation.

Here's how it works: we define a pipeline that
* makes random crops of data (augmentation)
* randomly flips image horizontally (augmentation)
* then normalizes it (preprocessing)

When testing, we don't need random crops, just normalize with same statistics.

In [0]:
import torchvision
from torchvision import transforms

transform_augment = transforms.Compose([
                                        transforms.RandomHorizontalFlip(),
                                        transforms.ToTensor()
                                        ])
 # decribe transformation here

In [0]:
dataset = torchvision.datasets.ImageFolder('tiny-imagenet-200/train', transform=transform_augment)
train_dataset, val_dataset = torch.utils.data.random_split(dataset, [90000, 10000])
train_batch_gen = torch.utils.data.DataLoader(train_dataset, 
                                              batch_size=batch_size,
                                              shuffle=True,
                                              num_workers=1)
val_batch_gen = torch.utils.data.DataLoader(val_dataset,
                                            batch_size=batch_size,
                                            shuffle=True,
                                            num_workers=1)

In [53]:
import time
num_epochs = 100 # total amount of full passes over training data
batch_size = 50  # number of samples processed in one SGD iteration


for epoch in range(num_epochs):
    print (num_epochs)
    # In each epoch, we do a full pass over the training data:
    start_time = time.time()
    model.train(True) # enable dropout / batch_norm training behavior
    for (X_batch, y_batch) in train_batch_gen:
        # train on batch
        loss = compute_loss(X_batch, y_batch)
        loss.backward()
        opt.step()
        opt.zero_grad()
        train_loss.append(loss.data.cpu().numpy())
    print (num_epochs)    
    model.train(False) # disable dropout / use averages for batch_norm
    for X_batch, y_batch in val_batch_gen:
        logits = model(Variable(torch.FloatTensor(X_batch)).cuda())
        y_pred = logits.max(1)[1].data
        val_accuracy.append(np.mean( (y_batch.cpu() == y_pred.cpu()).numpy() ))

    print (num_epochs)
    # Then we print the results for this epoch:
    print("Epoch {} of {} took {:.3f}s".format(
        epoch + 1, num_epochs, time.time() - start_time))
    print("  training loss (in-iteration): \t{:.6f}".format(
        np.mean(train_loss[-len(train_dataset) // batch_size :])))
    print("  validation accuracy: \t\t\t{:.2f} %".format(
        np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100))

100
100
100
Epoch 1 of 100 took 42.113s
  training loss (in-iteration): 	2.724849
  validation accuracy: 			45.18 %
100
100
100
Epoch 2 of 100 took 41.882s
  training loss (in-iteration): 	2.050311
  validation accuracy: 			45.11 %
100
100
100
Epoch 3 of 100 took 42.166s
  training loss (in-iteration): 	1.726609
  validation accuracy: 			43.68 %
100
100
100
Epoch 4 of 100 took 42.103s
  training loss (in-iteration): 	1.483451
  validation accuracy: 			42.21 %
100
100
100
Epoch 5 of 100 took 42.174s
  training loss (in-iteration): 	1.261368
  validation accuracy: 			39.82 %
100
100
100
Epoch 6 of 100 took 42.170s
  training loss (in-iteration): 	1.080441
  validation accuracy: 			38.55 %
100


KeyboardInterrupt: ignored

We need for test data __only normalization__, not cropping and rotation

In [54]:
transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(means, stds), #normalize by channel. all value along the channel have mean and deviation
])

test_dataset = <YOUR CODE>


NameError: ignored

## The Quest For A Better Network

See `practical_dl/homework02` for a full-scale assignment.