# 05: Training a network

In [2]:
from IPython.display import Image

# set path containing data folder or use default for Colab (/gdrive/My Drive)
local_folder = "../data/"
import urllib.request
urllib.request.urlretrieve('https://raw.githubusercontent.com/guiwitz/DLImaging/master/utils/check_colab.py', 'check_colab.py')
from check_colab import set_datapath
colab, datapath = set_datapath(local_folder)

Image(url='https://github.com/guiwitz/DLImaging/raw/master/illustrations/ML_principle.jpg',width=700)




Training a network follows the principle illustrated above:

1. We pass training examples forward through the network.
2. We measure the error between the prediction and the true label: the loss.
3. We compute the gradient of the loss with respect to each parameter of the model. This is done by backpropagation.
4. We adjust the parameters using the computed gradients and an optimizer (e.g. SGD).
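The four steps map onto a handful of PyTorch calls. The sketch below is only an illustration under stated assumptions: a toy model (a flatten step plus one linear layer), a single batch of random "images" with random labels, and the Adam optimizer with a learning rate picked for the example. The steps are repeated a few times on the same batch:

```python
import torch
from torch import nn

torch.manual_seed(0)

# toy setup (assumption): linear classifier on 8 random 32x32 "images", 2 classes
model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 2))
images = torch.randn(8, 32, 32)
labels = torch.randint(0, 2, (8,))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr chosen for illustration

losses = []
for step in range(20):
    optimizer.zero_grad()             # clear gradients left over from the previous step
    output = model(images)            # 1. forward pass
    loss = criterion(output, labels)  # 2. loss between prediction and true label
    loss.backward()                   # 3. gradients by backpropagation
    optimizer.step()                  # 4. parameter update by the optimizer
    losses.append(loss.item())

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Since the same small batch is used at every step, the loss should drop quickly; on real data each step would use a different mini-batch.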

In this notebook we will also look at additional aspects such as training epochs and validation. The goal is to see the whole pipeline in detail once, before we start using tools that remove some of the boilerplate code needed here.


### Mini batches

Before we create our network and define a loss, let's recall how training samples are passed through the network. In principle, we would like to compute each optimization step on the entire dataset rather than on a single image, since gradients from individual images are noisy and make convergence difficult. However, this is usually not possible (the full dataset typically does not fit in memory), so instead we use mini-batches: the network is iteratively trained on subsets of the training examples. Instead of using the gradients produced by a single image, we can then use, for example, the average gradients over the mini-batch:

In [3]:
Image(url='https://github.com/guiwitz/DLImaging/raw/master/illustrations/batch_processing.jpg',width=700) 
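To make the "average gradient" idea concrete, here is a small check, assuming a toy linear layer and random data. With its default reduction='mean', nn.CrossEntropyLoss averages the loss over the batch, so the gradient of the batch loss equals the average of the gradients computed one sample at a time:

```python
import torch
from torch import nn

torch.manual_seed(0)

# toy layer and a random mini-batch of 4 samples with 3 features (assumptions)
layer = nn.Linear(3, 2)
x = torch.randn(4, 3)
y = torch.tensor([0, 1, 1, 0])
criterion = nn.CrossEntropyLoss()  # reduction='mean' by default

# gradient of the loss averaged over the whole mini-batch
layer.zero_grad()
criterion(layer(x), y).backward()
batch_grad = layer.weight.grad.clone()

# gradients computed separately for each sample, then averaged
per_sample = []
for i in range(len(x)):
    layer.zero_grad()
    criterion(layer(x[i:i + 1]), y[i:i + 1]).backward()
    per_sample.append(layer.weight.grad.clone())
mean_grad = torch.stack(per_sample).mean(dim=0)

print(torch.allclose(batch_grad, mean_grad, atol=1e-6))  # True
```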



PyTorch is designed to handle batches by default. Modules such as nn.Linear accept inputs of shape N x ..., where N stands for the batch size and ... for the other dimensions (channels, samples, etc.). Most modules follow this convention, including those computing losses. We can therefore feed examples with dimensions N x ... and PyTorch handles the batch computations for us.
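As a quick illustration (layer sizes chosen arbitrarily), applying an nn.Linear layer to a batch gives the same result as applying it to each sample separately and stacking the outputs: the batch dimension is simply carried along.

```python
import torch
from torch import nn

torch.manual_seed(0)

layer = nn.Linear(1024, 100)
batch = torch.randn(5, 1024)  # N = 5 samples of 1024 values

out_batch = layer(batch)  # one call on the whole batch: shape (5, 100)
# one call per sample (each of shape (1024,)), then stacked back into (5, 100)
out_single = torch.stack([layer(sample) for sample in batch])

print(out_batch.shape)                                    # torch.Size([5, 100])
print(torch.allclose(out_batch, out_single, atol=1e-5))   # True
```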


### Creating the network

What does this mean for our network? We only have to make one small modification. Previously we used x.view(-1) to flatten 32x32 images into vectors of 1024 elements. If we now feed a batch of size N x 32 x 32, this would produce a single long vector of N*1024 elements. We therefore need to specify the size of the first dimension in view(), so that only the image dimensions are flattened: x.view(batchsize, -1). Alternatively, we can use torch.flatten(x, start_dim=1) (or the method x.flatten(start_dim=1)), specifying from which dimension we want to start flattening:

In [4]:
import torch
from torch import nn
import torch.nn.functional as F


In [5]:
class Mynetwork(nn.Module):
    def __init__(self, input_size, num_categories):
        super(Mynetwork, self).__init__()

        # define layers:
        self.layer1 = nn.Linear(input_size, 100)
        self.layer2 = nn.Linear(100, 100)
        self.layer3 = nn.Linear(100, num_categories)

    def forward(self, x):

        # flatten all dimensions except the batch dimension: N x 32 x 32 -> N x 1024
        x = x.flatten(start_dim=1)
        # sequence of operations in the network, with ReLU activations between layers
        x = F.relu(self.layer1(x))
        x = F.relu(self.layer2(x))
        x = self.layer3(x)

        return x 
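To see why the first dimension must be kept, compare the shapes produced by the three flattening options on a random batch of five 32x32 images:

```python
import torch

x = torch.randn(5, 32, 32)  # batch of N = 5 images of 32x32

print(x.view(-1).shape)                     # torch.Size([5120]): batch merged into one vector
print(x.view(x.size(0), -1).shape)          # torch.Size([5, 1024]): one vector per image
print(torch.flatten(x, start_dim=1).shape)  # torch.Size([5, 1024]): same result
```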



In [6]:
# instantiate the network: 32*32 = 1024 inputs, 2 categories
model = Mynetwork(1024, 2)

In [8]:
# Check inputs/outputs

myinput = torch.randn((5,32,32))
myinput.size()


torch.Size([5, 32, 32])

In [9]:
myoutput = model(myinput)
myoutput.size()


torch.Size([5, 2])



If we want to pass a single image, e.g. during inference, we still have to reshape it to dimensions N x ..., with a first dimension of size 1. The simplest way to do that is unsqueeze():


In [13]:
myimage = torch.randn((32,32))
myimage.size()

torch.Size([32, 32])

In [14]:
myimage = myimage.unsqueeze(0)
myimage.size()

torch.Size([1, 32, 32])

In [15]:
output = model(myimage)
output.size()

torch.Size([1, 2])
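What happens if the batch dimension is forgotten? In the sketch below, a hypothetical stand-in for Mynetwork (nn.Flatten from dimension 1 followed by a single linear layer) receives a bare 32x32 image. Flattening from dimension 1 leaves it unchanged, so the linear layer sees 32 rows of 32 values instead of 1024 and raises an error. Indexing with None is an equivalent alternative to unsqueeze(0):

```python
import torch
from torch import nn

# hypothetical stand-in for Mynetwork: flatten from dim 1, then one linear layer
model = nn.Sequential(nn.Flatten(start_dim=1), nn.Linear(32 * 32, 2))
image = torch.randn(32, 32)  # a single image without batch dimension

try:
    model(image)  # flatten leaves shape (32, 32), incompatible with 1024 input features
    missing_batch_fails = False
except RuntimeError:
    missing_batch_fails = True
print(missing_batch_fails)  # True

print(model(image.unsqueeze(0)).shape)  # torch.Size([1, 2])
print(model(image[None]).shape)         # torch.Size([1, 2])
```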

## Defining a loss function and backpropagation

In this example we classify images, so we use a standard classification loss, cross-entropy, available in the torch.nn module:

In [16]:
criterion = nn.CrossEntropyLoss()

In [17]:
type(criterion)

torch.nn.modules.loss.CrossEntropyLoss

- The loss function is a module: it is differentiable and can be integrated into the network. It follows the same batch logic as the other layers, so it expects inputs whose first dimension N is the batch size.
- It therefore needs the network output, of size N x C where C is the number of categories (2 in this example), and the target ("true") labels, given as a tensor of N integer class indices.
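A short check of these shape conventions, using random scores in place of a network output: the default loss is a single number (the mean over the batch), while reduction='none' returns one loss value per sample.

```python
import torch
from torch import nn

torch.manual_seed(0)

logits = torch.randn(3, 2)         # network output: N = 3 samples, C = 2 categories
targets = torch.tensor([0, 1, 1])  # N integer class indices (dtype int64)

loss_mean = nn.CrossEntropyLoss()(logits, targets)                  # default: mean over batch
loss_each = nn.CrossEntropyLoss(reduction='none')(logits, targets)  # one value per sample

print(loss_mean.shape)                           # torch.Size([])
print(loss_each.shape)                           # torch.Size([3])
print(torch.allclose(loss_mean, loss_each.mean()))  # True
```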

In [19]:
# make some data:
mysample = torch.randn(3, 32*32)
mylabels = torch.tensor([0,1,1])

In [20]:
# pass them through the network
output = model(mysample)

In [21]:
# compare output to target with cross-entropy module:
loss = criterion(output, mylabels)

Note: nn.CrossEntropyLoss applies a (log-)softmax to the output before computing the loss. We therefore do NOT need a softmax layer at the end of the network.
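The following sketch, on random scores, recomputes the loss by hand: take the log-softmax over the category dimension, pick the log-probability of the true class for each sample, and average the negatives. It matches the built-in loss, whereas adding an explicit softmax before the loss changes the result:

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)

logits = torch.randn(3, 2)  # raw network output, no softmax applied
targets = torch.tensor([0, 1, 1])

builtin = nn.CrossEntropyLoss()(logits, targets)

# manual cross-entropy: -mean of log-probabilities of the true classes
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(3), targets].mean()
print(torch.allclose(builtin, manual))                          # True
print(torch.allclose(builtin, F.nll_loss(log_probs, targets)))  # True

# an extra softmax layer would apply softmax twice and give a different loss
double_softmax = nn.CrossEntropyLoss()(torch.softmax(logits, dim=1), targets)
print(torch.allclose(builtin, double_softmax))                  # False
```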

After the forward pass, we compute the gradients of the loss with respect to the model parameters by backpropagation, which is done by calling the backward() method of the loss:

In [22]:
loss.backward()
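After backward(), each parameter's gradient is stored in its .grad attribute. One detail worth knowing before writing a training loop: calling backward() again adds to the existing gradients instead of replacing them, which is why they are reset (e.g. with zero_grad()) before each step. A small demonstration on a toy linear layer with random data:

```python
import torch
from torch import nn

torch.manual_seed(0)

layer = nn.Linear(4, 2)  # toy layer (assumption)
x = torch.randn(3, 4)
y = torch.tensor([0, 1, 1])
criterion = nn.CrossEntropyLoss()

print(layer.weight.grad)  # None: no backward pass yet

criterion(layer(x), y).backward()
first = layer.weight.grad.clone()

criterion(layer(x), y).backward()  # second backward pass without resetting
accumulated = layer.weight.grad.clone()
print(torch.allclose(accumulated, 2 * first))  # True: gradients were summed

layer.zero_grad()  # reset before the next step (sets .grad to None in recent PyTorch versions)
```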

### Optimizer

Now that we have the loss and its gradients, we can update all our parameters with an optimization algorithm. Several are available in torch.optim. Here we use the Adam optimizer, one of the "safer" choices. As argument we need to pass the parameters that should be optimized, which we can recover from our model:

In [23]:
list(model.parameters())

[Parameter containing:
 tensor([[-0.0169, -0.0028,  0.0166,  ...,  0.0138, -0.0294,  0.0260],
         [ 0.0031, -0.0187, -0.0098,  ...,  0.0274, -0.0208, -0.0231],
         [ 0.0205,  0.0225,  0.0139,  ...,  0.0033, -0.0183, -0.0188],
         ...,
         [-0.0089, -0.0237, -0.0096,  ...,  0.0250,  0.0010, -0.0275],
         [ 0.0103, -0.0147,  0.0073,  ..., -0.0162, -0.0283,  0.0303],
         [ 0.0106, -0.0227,  0.0006,  ...,  0.0209,  0.0262,  0.0079]],
        requires_grad=True),
 Parameter containing:
 tensor([-0.0217, -0.0023,  0.0276,  0.0068,  0.0037, -0.0254,  0.0056, -0.0081,
         -0.0129, -0.0228,  0.0088,  0.0299,  0.0299,  0.0138,  0.0310,  0.0145,
          0.0083,  0.0136, -0.0153, -0.0108,  0.0017, -0.0076, -0.0203, -0.0193,
          0.0071, -0.0081, -0.0294, -0.0224,  0.0198,  0.0311, -0.0055, -0.0011,
         -0.0282, -0.0014, -0.0003, -0.0029,  0.0167,  0.0261,  0.0087, -0.0245,
         -0.0095, -0.0286,  0.0157, -0.0033, -0.0280, -0.0123, -0.0305,  0.0054
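Adam's update rule is fairly involved, so to see what optimizer.step() actually does, the plain SGD optimizer (torch.optim.SGD, swapped in here only for illustration) is easier to check: each parameter p is updated as p - lr * p.grad. The sketch below uses a toy linear layer, random data and an arbitrary learning rate, and compares one optimizer step with the manual update. With Adam, the construction is the same, e.g. torch.optim.Adam(model.parameters(), lr=1e-3):

```python
import torch
from torch import nn

torch.manual_seed(0)

model = nn.Linear(1024, 2)  # toy stand-in for the network (assumption)
criterion = nn.CrossEntropyLoss()
lr = 0.1                    # learning rate chosen for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=lr)

x = torch.randn(3, 1024)
y = torch.tensor([0, 1, 1])

optimizer.zero_grad()
criterion(model(x), y).backward()

# expected SGD update, computed by hand before the optimizer modifies the weights
expected = model.weight.detach() - lr * model.weight.grad

optimizer.step()
print(torch.allclose(model.weight.detach(), expected))  # True
```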

### Measuring accuracy

We use cross-entropy as the loss function because it is differentiable and therefore allows us to optimize the network. What we are ultimately interested in, however, is the **accuracy** of the model: whether the correct label has been predicted or not. Such a binary right/wrong answer cannot be used for optimization, since it is not differentiable, but it is what we want to monitor. Below we generate random data to practice computing it:

In [24]:
myimages = torch.randn((3,32,32))
labels = torch.randint(0,2,(3,))

In [25]:
labels

tensor([1, 1, 1])

In [26]:
output = model(myimages)
output

tensor([[-0.0885, -0.0422],
        [-0.0140, -0.0682],
        [-0.0438, -0.0390]], grad_fn=<AddmmBackward0>)



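The predicted category is the one with the highest score. The outputs here are not normalized into probabilities, but this does not matter: softmax preserves the ordering, so the largest output also has the largest probability. We can therefore simply take the index of the maximum value along dimension 1 (the category dimension):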


In [27]:
output.argmax(dim=1)

tensor([1, 0, 1])
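A quick check of the claim that normalization does not change the prediction, on random scores for 4 samples and 3 categories: the argmax of the raw outputs and of their softmax probabilities coincide.

```python
import torch

torch.manual_seed(0)

logits = torch.randn(4, 3)            # raw scores: 4 samples, 3 categories
probs = torch.softmax(logits, dim=1)  # normalized probabilities per sample

print(logits.argmax(dim=1))
print(torch.equal(logits.argmax(dim=1), probs.argmax(dim=1)))  # True
```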

In [29]:
# compare prediction and true label
labels == output.argmax(dim=1)

tensor([ True, False,  True])

Summing this boolean tensor gives the number of correctly predicted samples in the batch; dividing by the total number of samples gives the average accuracy:


In [30]:
(labels == output.argmax(dim=1)).sum() / len(labels)

tensor(0.6667)
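Over a full epoch, the data usually arrives in several mini-batches, and the last one may be smaller. A minimal sketch, assuming a hypothetical stand-in model and a random dataset of 10 images wrapped in a TensorDataset and DataLoader: counting correct predictions and samples across all batches, then dividing once, gives the accuracy over the whole dataset (simply averaging per-batch accuracies would over-weight the smaller last batch).

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# hypothetical stand-in model and random dataset (assumptions)
model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 2))
dataset = TensorDataset(torch.randn(10, 32, 32), torch.randint(0, 2, (10,)))
loader = DataLoader(dataset, batch_size=4)  # batches of 4, 4 and 2 samples

correct, total = 0, 0
model.eval()
with torch.no_grad():  # no gradients needed when only measuring accuracy
    for images, labels in loader:
        predictions = model(images).argmax(dim=1)
        correct += (predictions == labels).sum().item()
        total += labels.size(0)

accuracy = correct / total
print(f"accuracy: {accuracy:.3f} on {total} samples")
```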