# Model training

In [2]:
import json
import pandas as pd
import matplotlib.pyplot as plt

from torchsummary import summary

from doggofier.data.dataset import DogsDataset
from doggofier.models import ResNet50, VGG16

In [3]:
categories = DogsDataset.create_categories('data')
num_classes = len(categories)

## ResNet-50

ResNet-50 is a 50-layer residual network proposed in the paper "Deep Residual Learning for Image Recognition". It is widely used for computer vision tasks because it makes very deep networks trainable without running into the vanishing/exploding gradient problem. This is achieved with residual blocks, whose shortcut connections skip one or more layers. An identity shortcut simply passes its input through unchanged and adds it to the output of the stacked layers, so it adds neither extra parameters nor computational complexity. The architecture of the ResNet-50 network is shown in the picture below:

![resnet](https://tech.showmax.com/2019/04/conveiro/resnet_50-f66170f4.png)
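
The identity shortcut described above can be sketched in a few lines of PyTorch. This is a simplified illustration, not the block used inside `doggofier`'s `ResNet50`: the class name `BasicResidualBlock` is hypothetical, and it stacks two 3x3 convolutions, whereas ResNet-50 itself uses three-layer bottleneck blocks.

```python
import torch
from torch import nn


class BasicResidualBlock(nn.Module):
    """Illustrative residual block computing ReLU(F(x) + x)."""

    def __init__(self, channels: int):
        super().__init__()
        # F(x): two 3x3 convolutions that keep the channel count and spatial size,
        # so the result can be added to the input element-wise.
        self.stacked = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Identity shortcut: x is added unchanged and has no parameters of its own.
        return self.relu(self.stacked(x) + x)


block = BasicResidualBlock(channels=64)
x = torch.randn(2, 64, 56, 56)
out = block(x)

total_params = sum(p.numel() for p in block.parameters())
stacked_params = sum(p.numel() for p in block.stacked.parameters())
print(out.shape, total_params == stacked_params)
```

All parameters of the block belong to the stacked layers, which is why the shortcut itself is free in terms of parameter count.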

In our case, we removed the original classifier and replaced it with a sequential block of 2 fully connected layers with ReLU activations and 2 dropout layers. A final fully connected layer with a log softmax function produces the output.
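
A replacement head of this kind might look like the sketch below. The function name `make_classifier_head`, the hidden sizes (512 and 256), the dropout probability (0.5) and the exact ordering of the layers are assumptions made for illustration; the actual values live in the `doggofier.models` code. The input size of 2048 is the number of features that ResNet-50 produces after global average pooling.

```python
import torch
from torch import nn


def make_classifier_head(in_features: int, num_classes: int,
                         hidden_1: int = 512, hidden_2: int = 256,
                         dropout: float = 0.5) -> nn.Sequential:
    """Build an illustrative head; hidden sizes and dropout rate are assumed values."""
    return nn.Sequential(
        nn.Linear(in_features, hidden_1), nn.ReLU(), nn.Dropout(dropout),
        nn.Linear(hidden_1, hidden_2), nn.ReLU(), nn.Dropout(dropout),
        # Final layer: one log-probability per class.
        nn.Linear(hidden_2, num_classes), nn.LogSoftmax(dim=1),
    )


head = make_classifier_head(in_features=2048, num_classes=130)
head.eval()  # disable dropout for a deterministic forward pass

features = torch.randn(4, 2048)  # stand-in for pooled ResNet-50 features
with torch.no_grad():
    log_probs = head(features)
print(log_probs.shape)
```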

In [5]:
resnet = ResNet50(num_classes, pretrained=True)

summary(resnet, input_size=(3, 224, 224), batch_size=32)

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [32, 64, 112, 112]           9,408
       BatchNorm2d-2         [32, 64, 112, 112]             128
              ReLU-3         [32, 64, 112, 112]               0
         MaxPool2d-4           [32, 64, 56, 56]               0
            Conv2d-5           [32, 64, 56, 56]           4,096
       BatchNorm2d-6           [32, 64, 56, 56]             128
              ReLU-7           [32, 64, 56, 56]               0
            Conv2d-8           [32, 64, 56, 56]          36,864
       BatchNorm2d-9           [32, 64, 56, 56]             128
             ReLU-10           [32, 64, 56, 56]               0
           Conv2d-11          [32, 256, 56, 56]          16,384
      BatchNorm2d-12          [32, 256, 56, 56]             512
           Conv2d-13          [32, 256, 56, 56]          16,384
      BatchNorm2d-14          [32, 256,

## VGG-16

VGG-16 is a convolutional neural network architecture proposed in the paper "Very Deep Convolutional Networks for Large-Scale Image Recognition". What sets it apart from earlier architectures is that it stacks several small 3x3 filters one after another instead of using large 7x7 or 11x11 filters at the beginning of the network. The convolution stride is fixed to 1 pixel, and the input of each convolutional layer is padded so that the spatial resolution is preserved after convolution. The architecture is depicted in the picture below:

![vgg](https://neurohive.io/wp-content/uploads/2018/11/vgg16-neural-network.jpg)
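
The trade-off behind stacking 3x3 filters can be checked with simple arithmetic. The sketch below uses hypothetical helper functions and an illustrative channel count of 64. It shows that three stacked 3x3 convolutions cover the same 7x7 receptive field as a single 7x7 convolution while using fewer weights, and that a 3x3 convolution with stride 1 and padding 1 preserves a 224x224 input.

```python
def receptive_field(num_layers: int, kernel_size: int) -> int:
    """Receptive field of stacked stride-1 convolutions with the same kernel size."""
    return 1 + num_layers * (kernel_size - 1)


def conv_weights(num_layers: int, kernel_size: int, channels: int) -> int:
    """Weights (biases ignored) when every layer maps `channels` to `channels`."""
    return num_layers * kernel_size * kernel_size * channels * channels


def conv_output_size(size: int, kernel_size: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1


channels = 64  # illustrative value
stacked_weights = conv_weights(num_layers=3, kernel_size=3, channels=channels)
single_weights = conv_weights(num_layers=1, kernel_size=7, channels=channels)

print("receptive field, 3 x (3x3):", receptive_field(3, 3))
print("receptive field, 1 x (7x7):", receptive_field(1, 7))
print("weights, 3 x (3x3):", stacked_weights)
print("weights, 1 x (7x7):", single_weights)
print("output size for 224 input:", conv_output_size(224, 3, stride=1, padding=1))
```

The stacked variant also applies a non-linearity after every layer, so it is more expressive for the same receptive field.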

As with the ResNet-50 model, we replaced the last layer of the classifier with one fully connected layer followed by a ReLU function and a dropout layer, and one fully connected layer with a log softmax function.
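
To make the role of the log softmax output concrete, here is a small pure-Python sketch with made-up logits for three classes. It is only an illustration of the function itself. A log softmax output is typically paired with a negative log-likelihood loss, which is assumed here and not confirmed by the training script.

```python
import math


def log_softmax(logits):
    """Numerically stable log softmax over a list of raw scores."""
    largest = max(logits)
    log_sum = largest + math.log(sum(math.exp(v - largest) for v in logits))
    return [v - log_sum for v in logits]


logits = [2.0, 1.0, 0.1]  # made-up raw scores for three classes
log_probs = log_softmax(logits)
probs = [math.exp(v) for v in log_probs]

# Negative log-likelihood if the true class were class 0.
nll = -log_probs[0]

print(log_probs)
print(sum(probs))
print(nll)
```

Exponentiating the outputs gives probabilities that sum to 1, and the predicted class is the one with the largest log-probability.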

In [6]:
vgg = VGG16(num_classes, pretrained=True)

summary(vgg, input_size=(3, 224, 224), batch_size=32)

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [32, 64, 224, 224]           1,792
              ReLU-2         [32, 64, 224, 224]               0
            Conv2d-3         [32, 64, 224, 224]          36,928
              ReLU-4         [32, 64, 224, 224]               0
         MaxPool2d-5         [32, 64, 112, 112]               0
            Conv2d-6        [32, 128, 112, 112]          73,856
              ReLU-7        [32, 128, 112, 112]               0
            Conv2d-8        [32, 128, 112, 112]         147,584
              ReLU-9        [32, 128, 112, 112]               0
        MaxPool2d-10          [32, 128, 56, 56]               0
           Conv2d-11          [32, 256, 56, 56]         295,168
             ReLU-12          [32, 256, 56, 56]               0
           Conv2d-13          [32, 256, 56, 56]         590,080
             ReLU-14          [32, 256,
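
The `Param #` column in the two summaries can be reproduced by hand. The helper functions below are hypothetical. The `bias=False` setting for the ResNet convolutions and `bias=True` for the VGG ones are inferred from the reported counts.

```python
def conv2d_params(in_channels: int, out_channels: int, kernel_size: int, bias: bool = True) -> int:
    """Number of trainable parameters in a square-kernel 2D convolution."""
    weights = out_channels * in_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


def batchnorm2d_params(channels: int) -> int:
    """Batch normalization learns one scale and one shift per channel."""
    return 2 * channels


# (description, computed value, value reported in the summary)
checks = [
    ("ResNet Conv2d-1, 7x7, 3->64, no bias", conv2d_params(3, 64, 7, bias=False), 9408),
    ("ResNet BatchNorm2d-2, 64 channels", batchnorm2d_params(64), 128),
    ("ResNet Conv2d-5, 1x1, 64->64, no bias", conv2d_params(64, 64, 1, bias=False), 4096),
    ("ResNet Conv2d-8, 3x3, 64->64, no bias", conv2d_params(64, 64, 3, bias=False), 36864),
    ("VGG Conv2d-1, 3x3, 3->64", conv2d_params(3, 64, 3), 1792),
    ("VGG Conv2d-3, 3x3, 64->64", conv2d_params(64, 64, 3), 36928),
    ("VGG Conv2d-6, 3x3, 64->128", conv2d_params(64, 128, 3), 73856),
]

for name, computed, reported in checks:
    print(f"{name}: computed={computed}, reported={reported}, match={computed == reported}")
```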

## Example of training

We have prepared a script that lets a user run the training without modifying any code. The only requirement is a JSON file with hyperparameters. An example of such a file is given below:

```
{
    "n_classes": 130,
    "num_workers": 0,
    "batch_size": 32,
    "lr": 0.001,
    "epochs": 1,
    "max_epoch_stop": 3
}
```
The JSON file is assumed to be stored in the `models` directory and the dataset in the `data` directory, but the user can change both by passing the appropriate script arguments. Only an example training run of the VGG-16 model for 1 epoch, using the file given above, is presented here. The proper training will be performed on a device with GPUs to speed up computation.
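
A hyperparameter file like the one above could be read and validated as in the following sketch. The function `load_hyperparameters` and the set of required keys are assumptions for illustration and may differ from what `doggofier/train.py` actually does. The example writes the file to a temporary directory so that it runs on its own.

```python
import json
import tempfile
from pathlib import Path

# Keys taken from the example file; treating all of them as required is an assumption.
REQUIRED_KEYS = {"n_classes", "num_workers", "batch_size", "lr", "epochs", "max_epoch_stop"}


def load_hyperparameters(path):
    """Load a JSON hyperparameter file and fail early if a key is missing."""
    with open(path, encoding="utf-8") as f:
        params = json.load(f)
    missing = REQUIRED_KEYS - params.keys()
    if missing:
        raise KeyError(f"Missing hyperparameters: {sorted(missing)}")
    return params


example = {
    "n_classes": 130,
    "num_workers": 0,
    "batch_size": 32,
    "lr": 0.001,
    "epochs": 1,
    "max_epoch_stop": 3,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "vgg16_example.json"
    config_path.write_text(json.dumps(example, indent=4), encoding="utf-8")
    params = load_hyperparameters(config_path)

print(params["batch_size"], params["lr"], params["epochs"])
```

Validating the keys up front means a typo in the file surfaces before any data is loaded.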

In [10]:
%run doggofier/train.py vgg16_example.json vgg16

Loading the datasets...
Dataset loading has been completed.
Training on cpu device...
Epoch: 1 / 1	                    Step: 100 / 2039	                    Loss: 2.6234065544605256
Epoch: 1 / 1	                    Step: 200 / 2039	                    Loss: 2.2220068752765654
Epoch: 1 / 1	                    Step: 300 / 2039	                    Loss: 1.9844504523277282
Epoch: 1 / 1	                    Step: 400 / 2039	                    Loss: 1.8420568689703942
Epoch: 1 / 1	                    Step: 500 / 2039	                    Loss: 1.7455630342960358
Epoch: 1 / 1	                    Step: 600 / 2039	                    Loss: 1.683325431148211
Epoch: 1 / 1	                    Step: 700 / 2039	                    Loss: 1.6278196588584355
Epoch: 1 / 1	                    Step: 800 / 2039	                    Loss: 1.5850052423030139
Epoch: 1 / 1	                    Step: 900 / 2039	                    Loss: 1.5520649543073441
Epoch: 1 / 1	                    Step: 1000 / 2039	         