### The Pipeline

So far in this series, we have learned about tensors and about building neural networks in PyTorch. We are now ready to begin the training process.

1. Prepare the data
2. Build the model
3. __Train the model__ 
4. Analyze the model's results

Training is essentially gradient descent: we calculate the gradient of the loss with respect to the weights, and use it to update the weights so that the resulting loss moves closer to a minimum.
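To make the idea concrete, here is a minimal sketch of gradient descent on a one-parameter toy function. The function `f(w) = (w - 3)^2`, the starting point, and the learning rate are all assumptions chosen for illustration; they are not part of the network trained later.

```python
# Minimal gradient descent on f(w) = (w - 3)^2, whose minimum is at w = 3.
def loss_fn(w):
    return (w - 3.0) ** 2

def grad_fn(w):
    # Analytic derivative df/dw = 2 * (w - 3)
    return 2.0 * (w - 3.0)

w = 0.0    # initial weight (arbitrary starting point)
lr = 0.1   # learning rate (step size), chosen for illustration

history = []
for step in range(50):
    history.append(loss_fn(w))  # record the loss before the update
    w = w - lr * grad_fn(w)     # move against the gradient

print(f"final w = {w:.4f}, final loss = {loss_fn(w):.6f}")
```

Each update shrinks the distance to the minimum by a constant factor here, so the recorded loss decreases at every step. Real networks follow the same rule, only with many weights and a loss surface that is not a simple parabola.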

### Training

During training, we get a __batch__ and pass it forward through the network. Once the output is obtained, we compare the predicted output to the actual labels. Knowing how close the predicted values are to the actual labels, we tweak the weights inside the network so that the values the network predicts move closer to the true values (labels). All of this is for a single batch, and we repeat this process for every batch until we have covered every sample in our training set.

After we've completed this process for all of the batches and passed over every sample in our training set, we say that an __epoch__ is complete. During the entire training process, we do as many epochs as necessary to reach our desired level of accuracy. 

__Training a neural net:__
1. Get batch from the training set.
2. Pass batch to network.
3. Calculate the __loss__ (difference between the predicted values and the true values).
4. Calculate the gradient of the loss function with respect to the network's weights.
5. Update the weights using the gradients to reduce the loss.
6. Repeat steps 1-5 until one epoch is completed.
7. Repeat steps 1-6 for as many epochs as required to reach the minimum loss.

We know how to do steps 1 and 2. The loss function in step 3 is chosen according to the problem. We use backpropagation to perform step 4 and an optimization algorithm to perform step 5. Steps 6 and 7 are just standard Python loops (the training loop).
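The loop structure behind steps 1, 6, and 7 can be sketched without any network at all. In the sketch below, the `batches` helper is hypothetical and a list of integers stands in for the training samples; the sizes (60,000 samples, batches of 100, 10 epochs) mirror the values used later in this notebook.

```python
def batches(samples, batch_size):
    """Yield consecutive slices of `samples`, each holding up to `batch_size` items."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

samples = list(range(60000))  # stand-in for 60,000 training samples
batch_size = 100
num_epochs = 10

num_updates = 0
for epoch in range(num_epochs):                 # step 7: repeat for several epochs
    seen = 0
    for batch in batches(samples, batch_size):  # steps 1 and 6: iterate over batches
        # steps 2-5 (forward pass, loss, gradients, weight update) would go here
        seen += len(batch)
        num_updates += 1
    assert seen == len(samples)                 # one epoch covers every sample once

print(num_updates)  # 600 batches per epoch * 10 epochs
```

The count shows that the weights are updated once per batch, not once per epoch, which is why a single epoch already changes the network substantially.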

In [2]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Network(nn.Module):
    def __init__(self, channels=1): # default grayscale
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=channels, out_channels=6, kernel_size=5) 
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        
        self.fc1 = nn.Linear(in_features=12*4*4, out_features=120) # ((28-5+1)/2 -5 +1)/2 = 4
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)
        
    def forward(self, t):
        # (1) input layer
        t = t
        
        # (2) hidden conv layer
        t = self.conv1(t)
        t = F.relu(t) # equivalent to activation='relu' in tf.keras
        t = F.max_pool2d(t, kernel_size=2, stride=2)
        
        # (3) hidden conv layer
        t = self.conv2(t)
        t = F.relu(t)
        t = F.max_pool2d(t, kernel_size=2, stride=2)

        # (4) hidden linear layer
        t = t.reshape(-1, 12*4*4)
        t = self.fc1(t)
        t = F.relu(t) # equivalent to activation='relu' in tf.keras
        
        # (5) hidden linear layer
        t = self.fc2(t)
        t = F.relu(t)
        
        # (6) output layer
        t = self.out(t)
        #t = F.softmax(t, dim=1) # first index is batch
        return t
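
The `12*4*4` input size of `fc1` follows from the comment next to it: each convolution with a 5x5 kernel and no padding shrinks the image by 4 pixels per side, and each 2x2 max pool halves it. The arithmetic can be checked with a small helper; `conv_out` is a hypothetical name introduced here for illustration, and it assumes square inputs with no dilation.

```python
def conv_out(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 28                                        # FashionMNIST images are 28 x 28
size = conv_out(size, kernel_size=5)             # conv1    -> 24
size = conv_out(size, kernel_size=2, stride=2)   # max pool -> 12
size = conv_out(size, kernel_size=5)             # conv2    -> 8
size = conv_out(size, kernel_size=2, stride=2)   # max pool -> 4

flat_features = 12 * size * size                 # 12 output channels from conv2
print(size, flat_features)                       # 4 192
```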

In [11]:
import torchvision
import torchvision.transforms as transform

train_set = torchvision.datasets.FashionMNIST(
    root='./data/FashionMNIST',
    download=True,
    transform=transform.ToTensor())

train_loader = torch.utils.data.DataLoader(train_set, batch_size=100)
batch = next(iter(train_loader)) # get one batch
images, labels = batch

# initialize a network
network = Network() 

In [12]:
def get_num_correct(preds, labels):
    return (preds.argmax(dim=1) == labels).sum()
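
As a quick sanity check of `get_num_correct`, the sketch below applies it to a hand-made batch of three samples and two classes. The scores and labels are made up for illustration. Note that the function returns a 0-dimensional tensor rather than a Python number, which is why the totals printed later in this notebook appear as `tensor(...)`.

```python
import torch

def get_num_correct(preds, labels):
    return (preds.argmax(dim=1) == labels).sum()

# Three samples, two classes: each row holds the scores for one sample.
preds = torch.tensor([[0.1, 0.9],
                      [0.8, 0.2],
                      [0.3, 0.7]])
labels = torch.tensor([1, 0, 0])

num_correct = get_num_correct(preds, labels)
print(num_correct)         # a 0-dimensional tensor
print(num_correct.item())  # the same count as a plain Python int
```

The predicted classes are the row-wise argmax (1, 0, 1), so the first two samples match their labels and the third does not.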

#### Cross entropy loss

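`F.cross_entropy` __applies log-softmax to the raw network outputs, computes the negative log-likelihood loss, and averages the result over the batch__. This is why the softmax activation in the network's `forward` method is commented out. Because the loss is averaged, comparing absolute losses across different batch sizes requires multiplying by the batch size. The example below uses a batch of size 2.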

In [39]:
F.cross_entropy(torch.tensor([[3, 6, 2], [1, 1, 2]]).float(), torch.tensor([2, 1]))

tensor(2.8087)

In [42]:
import numpy as np
softmax = lambda a: np.exp(a) / np.exp(a).sum()

(-np.log(softmax(np.array([3,6,2])))[2] + -np.log(softmax(np.array([1,1,2])))[1])/2 # averages

2.8086643088447403
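
The relationship between the averaged loss and the absolute loss can be made explicit with a small pure-Python sketch on the same two samples. The helper `cross_entropy_per_sample` is a hypothetical name introduced here; it is meant as an illustration of the formula, not as PyTorch's implementation.

```python
import math

def cross_entropy_per_sample(logits, target):
    """Negative log of the softmax probability assigned to the target class."""
    log_sum_exp = math.log(sum(math.exp(z) for z in logits))
    return log_sum_exp - logits[target]

batch_logits = [[3, 6, 2], [1, 1, 2]]
batch_targets = [2, 1]

losses = [cross_entropy_per_sample(l, t)
          for l, t in zip(batch_logits, batch_targets)]
mean_loss = sum(losses) / len(losses)  # averaged over the batch
sum_loss = sum(losses)                 # absolute loss over the batch

print(round(mean_loss, 4), round(sum_loss, 4))  # 2.8087 5.6173
```

The summed loss equals the mean loss times the batch size. In PyTorch, `F.cross_entropy` also accepts a `reduction` argument (`'mean'` by default, or `'sum'`) if the summed value is wanted directly.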

#### Backpropagation and Gradient Descent

In [13]:
import torch.optim as optim
optimizer = optim.Adam(network.parameters(), lr=0.01) # optimizer has access to network parameters

In [6]:
network.conv1.weight.grad is None

True

In [14]:
preds = network(images)
loss = F.cross_entropy(preds, labels) 
print('loss: ', loss) 
print('no. correct:', get_num_correct(preds, labels)) # out of 100

loss:  tensor(2.3068, grad_fn=<NllLossBackward>)
no. correct: tensor(12)


In [15]:
loss.backward() # backprop: walks the computation graph backward from the loss to compute gradients
network.conv1.weight.grad.shape # gradients are populated after one pass of backprop

torch.Size([6, 1, 5, 5])

In [16]:
optimizer.step() # update the weights according to Adam, using the new gradients, to reduce the loss

In [17]:
preds = network(images) # run new predictions
loss = F.cross_entropy(preds, labels) 
print('loss after backprop: ', loss, 'no. correct:', get_num_correct(preds, labels))

loss after backprop:  tensor(2.2887, grad_fn=<NllLossBackward>) no. correct: tensor(14)


The number of correct predictions has increased after a single optimizer step!

### Training one batch

Collect everything in one place:

In [18]:
# compile the neural net
network = Network()
optimizer = optim.Adam(network.parameters(), lr=0.01)


# loss
loss = F.cross_entropy(network(images), labels)
print("Step 0:")
print(loss.item())
print(get_num_correct(network(images), labels))

# backprop
loss.backward()  # update gradients
optimizer.step() # update weights using gradients to minimize loss

# recalculating loss based on new weights
loss = F.cross_entropy(network(images), labels)
print("\nStep 1:")
print(loss.item())
print(get_num_correct(network(images), labels))

Step 0:
2.3104944229125977
tensor(11)

Step 1:
2.2894699573516846
tensor(11)


### Training a single epoch

In [19]:
train_loader = torch.utils.data.DataLoader(train_set, batch_size=100) 

network = Network()
optimizer = optim.Adam(network.parameters(), lr=0.01)

total_loss = 0
total_correct = 0
for batch in train_loader:
    images, labels = batch 

    preds = network(images) 
    loss = F.cross_entropy(preds, labels) 

    optimizer.zero_grad() 
    loss.backward()  # calculate gradients
    optimizer.step() # update weights from the gradients using Adam

    total_loss += loss.item()
    total_correct += get_num_correct(preds, labels)
    
print(
    "epoch:", 0, 
    "total_correct:", total_correct, 
    "loss:", total_loss
)

epoch: 0 total_correct: tensor(46885) loss: 346.6361614763737


Calling `optimizer.zero_grad()` before `loss.backward()` in the loop above is necessary because `loss.backward()` _adds_ the newly calculated gradients to the stored ones instead of overwriting them.

In [20]:
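optimizer.zero_grad(set_to_none=False) # keep zero-filled gradient tensors; recent PyTorch versions set them to None by default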

In [21]:
network.conv1.weight.grad.sum()

tensor(0.)
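
The accumulation behaviour can be observed on a single tensor, independent of the network. The sketch below is a minimal illustration with a made-up two-element weight `w` and the toy loss `sum(w**2)`, whose gradient is `2*w`.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: the gradient of sum(w**2) is 2*w = [2, 4]
(w ** 2).sum().backward()
first = w.grad.clone()

# Second backward pass without zeroing: the new gradient is added to the old one
(w ** 2).sum().backward()
accumulated = w.grad.clone()

# Zero the stored gradient in place, then backpropagate again
w.grad.zero_()
(w ** 2).sum().backward()
after_reset = w.grad.clone()

print(first, accumulated, after_reset)
```

Without the reset, the second gradient is twice the correct value, so an optimizer step taken from it would be twice as large as intended.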

### Training with multiple epochs

In [22]:
train_loader = torch.utils.data.DataLoader(train_set, batch_size=100)

network = Network()
optimizer = optim.Adam(network.parameters(), lr=0.01)

for epoch in range(10):    
    total_loss = 0
    total_correct = 0
    
    for batch in train_loader:    
        images, labels = batch 
        preds = network(images)
        loss = F.cross_entropy(preds, labels) # the loss tensor carries a grad_fn,
                                              # which is what lets loss.backward() work
        optimizer.zero_grad() # set all gradients to zero
        loss.backward() # calculate gradients
        optimizer.step() # update weights

        total_loss += loss.item()
        total_correct += get_num_correct(preds, labels)

    print(
        "epoch", epoch, 
        "total_correct:", total_correct, 
        "loss:", total_loss
    )

epoch 0 total_correct: tensor(45570) loss: 380.7395526468754
epoch 1 total_correct: tensor(50782) loss: 251.97550985217094
epoch 2 total_correct: tensor(51535) loss: 229.5160759538412
epoch 3 total_correct: tensor(51801) loss: 220.17312617599964
epoch 4 total_correct: tensor(52229) loss: 211.77114780247211
epoch 5 total_correct: tensor(52350) loss: 208.3045560270548
epoch 6 total_correct: tensor(52407) loss: 206.49326403439045
epoch 7 total_correct: tensor(52563) loss: 201.06376388669014
epoch 8 total_correct: tensor(52860) loss: 194.65639048814774
epoch 9 total_correct: tensor(52807) loss: 196.51216650009155


In [25]:
52807/60000 # train accuracy

0.8801166666666667
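
The `loss` value printed per epoch is a sum of per-batch mean losses, so it depends on the number of batches. A small sketch, using the figures from the final epoch of the log above, converts it into an average loss per batch alongside the accuracy. It assumes 60,000 training samples split into equal batches of 100.

```python
num_samples = 60000
batch_size = 100
num_batches = num_samples // batch_size  # 600 batches per epoch

# Values copied from the final epoch in the log above
total_loss = 196.51216650009155
total_correct = 52807

avg_loss = total_loss / num_batches      # mean cross-entropy per batch
accuracy = total_correct / num_samples   # fraction of correct predictions

print(f"avg loss: {avg_loss:.4f}, accuracy: {accuracy:.4f}")
```

Both numbers are accumulated while the weights are still changing during the epoch, so they are a running estimate of training performance rather than an evaluation of the final weights.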