# A notebook to exemplify backpropagation using pytorch

The following notebook illustrates two ways of doing backpropagation:

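* by hand - building the layers from raw tensors, with a custom-made loss function and an explicit gradient descent update (autograd supplies the chain-rule derivatives)
* automatic - using a simple NN model and optimizer in PyTorch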

Made in the context of the course "PyTorch for Deep Learning".

Author: P. Silva (29/12/2020)

In [1]:
import torch
import numpy as np
import torch.nn as nn
import torch.nn.functional as F

## Model

A simple network with one input, one hidden and one output layer, each with two units. The hidden and output units are perceptrons with a logistic (sigmoid) activation function.
The values and architecture are based on [Matt Mazur's blog](https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/). The only difference is that we allow the individual biases of the perceptrons in the same layer to be updated separately.

In [2]:
layer_weights=[[0.15,0.20,0.25,0.30],[0.40,0.45,0.50,0.55]]
layer_biases=[[0.35,0.35],[0.60,0.60]]
inputs=[0.05,0.10]
exp_outputs=[0.01,0.99]

## Approach 1: doing back-propagation by hand

## Forward pass

We compute the values expected as output of the hidden and output layers.

In [3]:
#input layer
inputT = torch.Tensor(inputs).reshape(-1,1)

def getLayerAsTensor(inputT,weights,bias):
   
    if isinstance(bias,torch.Tensor):
        biasT = bias
    else:
        biasT = torch.tensor(np.array(bias).reshape(2,1), requires_grad=True, dtype=torch.float32)
    
    if isinstance(weights,torch.Tensor):
        weightsT = weights
    else:
        weightsT = torch.tensor(np.array(weights).reshape(2,2), requires_grad=True, dtype=torch.float32)

    activation = torch.nn.Sigmoid()
    layerT     = activation( torch.add(weightsT.mm(inputT),biasT) )

    return layerT,weightsT,biasT
    

#hidden layer definition
hiddenT, hiddenWeightsT, hiddenBiasT = getLayerAsTensor(inputT,  layer_weights[0], layer_biases[0])
outT,    outWeightsT,    outBiasT    = getLayerAsTensor(hiddenT, layer_weights[1], layer_biases[1])

print('Hidden layer:', hiddenT.detach().numpy())
print('Output layer:', outT.detach().numpy())

Hidden layer: [[0.59327   ]
 [0.59688437]]
Output layer: [[0.75136507]
 [0.7729285 ]]
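
As a cross-check, the same forward pass can be reproduced without PyTorch. The snippet below is a minimal sketch in plain Python; the helper names `sigmoid` and `layer_forward` are introduced here for illustration and are not part of the notebook.

```python
import math

def sigmoid(z):
    """Logistic activation."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(x, weights, biases):
    """Return sigmoid(W x + b), with W given as a list of rows."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

# same values as in the Model section, with the weights reshaped to 2x2
inputs = [0.05, 0.10]
hidden_w = [[0.15, 0.20], [0.25, 0.30]]
out_w = [[0.40, 0.45], [0.50, 0.55]]

hidden = layer_forward(inputs, hidden_w, [0.35, 0.35])
output = layer_forward(hidden, out_w, [0.60, 0.60])
print('Hidden layer:', hidden)
print('Output layer:', output)
```

Up to float32 rounding, the printed values should agree with the tensors shown above.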


In [4]:
#error computation
def getMSE(outT,exp_outputs):
    expOutT = torch.tensor(exp_outputs, dtype=torch.float32).reshape(-1,1)
    return ( 0.5*( outT - expOutT )**2 ).sum()

errorT=getMSE(outT,exp_outputs)
print('Total error:',errorT.detach().numpy())

Total error: 0.2983711


## Backward pass

We now apply the chain rule to perform back propagation and update the weights so that the final error is minimized.
The backward pass with gradient descent is given by the formula

$\vec{\theta}_{n+1}=\vec{\theta}_{n}-\eta \vec{\nabla} C(\vec{\theta}_n)$

with the following quantities defined

* $\vec{\theta}=(\vec{w},\vec{b})$ a vector of weights and biases
* $\eta$ the learning rate
* $C$ the cost function, in this case half the sum of the squared errors of the outputs (as computed by `getMSE` above)
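
For this two-layer network the gradient in the update rule can be written out explicitly with the chain rule. The sketch below assumes a notation that does not appear in the notebook: $a^{(0)}$ for the inputs, $a^{(l)}$ for the activations of layer $l$, $W^{(l)}$ and $b^{(l)}$ for its weights and biases, $y$ for the expected outputs, $\sigma$ for the logistic function and $\odot$ for the element-wise product.

$$
a^{(l)} = \sigma\left(W^{(l)} a^{(l-1)} + b^{(l)}\right), \qquad
C = \frac{1}{2}\sum_k \left(a^{(2)}_k - y_k\right)^2
$$

$$
\delta^{(2)} = \left(a^{(2)} - y\right) \odot a^{(2)} \odot \left(1 - a^{(2)}\right), \qquad
\delta^{(1)} = \left(W^{(2)\,T}\, \delta^{(2)}\right) \odot a^{(1)} \odot \left(1 - a^{(1)}\right)
$$

$$
\frac{\partial C}{\partial W^{(l)}} = \delta^{(l)} \left(a^{(l-1)}\right)^{T}, \qquad
\frac{\partial C}{\partial b^{(l)}} = \delta^{(l)}
$$

The factor $a \odot (1-a)$ is the derivative of the logistic function expressed through its own output.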

In [5]:
#compute the derivatives of the cost function
errorT.backward()

In [6]:
#gradient descent algorithm
def doGradientDescentBackProp(t,lr=0.5):
    return t.add(t.grad,alpha=-lr)
    
newHiddenWeightsT=doGradientDescentBackProp(hiddenWeightsT)
newHiddenBiasT=doGradientDescentBackProp(hiddenBiasT)
newOutWeightsT=doGradientDescentBackProp(outWeightsT)
newOutBiasT=doGradientDescentBackProp(outBiasT)

print('Updated hidden layer')
print('\t weights',newHiddenWeightsT.detach().numpy())
print('\t bias',newHiddenBiasT.detach().numpy())

print('Updated output layer')
print('\t weights',newOutWeightsT.detach().numpy())
print('\t bias',newOutBiasT.detach().numpy())

Updated hidden layer
	 weights [[0.14978072 0.19956143]
 [0.24975115 0.2995023 ]]
	 bias [[0.3456143 ]
 [0.34502286]]
Updated output layer
	 weights [[0.3589165  0.40866616]
 [0.5113013  0.56137013]]
	 bias [[0.53075075]
 [0.61904913]]
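
In the cells above the derivatives come from `errorT.backward()`. To make the chain rule itself visible, the following plain-Python sketch computes the same gradients explicitly and applies one gradient descent step with the same learning rate. The variable names (`d_out`, `d_hid`, `new_W1`, ...) are illustrative assumptions, not names used in the notebook.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

lr = 0.5
x = [0.05, 0.10]
target = [0.01, 0.99]
W1, b1 = [[0.15, 0.20], [0.25, 0.30]], [0.35, 0.35]
W2, b2 = [[0.40, 0.45], [0.50, 0.55]], [0.60, 0.60]

# forward pass
h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
o = [sigmoid(sum(w * hi for w, hi in zip(row, h)) + b) for row, b in zip(W2, b2)]

# output-layer deltas: dC/dz for C = 0.5 * sum((o - target)**2)
d_out = [(ok - tk) * ok * (1.0 - ok) for ok, tk in zip(o, target)]

# hidden-layer deltas: propagate back through the original output weights
d_hid = [sum(d_out[k] * W2[k][j] for k in range(2)) * h[j] * (1.0 - h[j])
         for j in range(2)]

# gradient descent step: theta <- theta - lr * dC/dtheta
new_W2 = [[W2[k][j] - lr * d_out[k] * h[j] for j in range(2)] for k in range(2)]
new_b2 = [b2[k] - lr * d_out[k] for k in range(2)]
new_W1 = [[W1[j][i] - lr * d_hid[j] * x[i] for i in range(2)] for j in range(2)]
new_b1 = [b1[j] - lr * d_hid[j] for j in range(2)]

print('hidden:', new_W1, new_b1)
print('output:', new_W2, new_b2)
```

Up to float32 rounding, these values should agree with the updated weights and biases printed above.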


In [7]:
#build the updated network and get its error
newHiddenT, _, _ = getLayerAsTensor(inputT,     newHiddenWeightsT, hiddenBiasT)
newOutT,    _, _ = getLayerAsTensor(newHiddenT, newOutWeightsT,    outBiasT)
newErrorT        = getMSE(newOutT,exp_outputs)
print('Total error (no updated biases):',newErrorT.detach().numpy())

newHiddenT, _, _ = getLayerAsTensor(inputT,     newHiddenWeightsT, newHiddenBiasT)
newOutT,    _, _ = getLayerAsTensor(newHiddenT, newOutWeightsT,    newOutBiasT)
newErrorT        = getMSE(newOutT,exp_outputs)
print('Total error (updated biases):',newErrorT.detach().numpy())

Total error (no updated biases): 0.2910278
Total error (updated biases): 0.28047147
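
The two errors can also be reproduced without PyTorch by plugging the printed parameters into a plain-Python forward pass. This is a minimal sketch: the helpers `forward` and `half_sse` are hypothetical names, and the parameter values are copied from the rounded output of the update cell.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, layers):
    """Apply each (weights, biases) pair followed by a sigmoid."""
    for weights, biases in layers:
        x = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x

def half_sse(out, target):
    """Half the sum of squared errors, as in getMSE."""
    return 0.5 * sum((o - t) ** 2 for o, t in zip(out, target))

x, target = [0.05, 0.10], [0.01, 0.99]

# updated parameters, copied from the printed output
new_W1 = [[0.14978072, 0.19956143], [0.24975115, 0.2995023]]
new_b1 = [0.3456143, 0.34502286]
new_W2 = [[0.3589165, 0.40866616], [0.5113013, 0.56137013]]
new_b2 = [0.53075075, 0.61904913]

# updated weights combined with the original biases
err_weights_only = half_sse(
    forward(x, [(new_W1, [0.35, 0.35]), (new_W2, [0.60, 0.60])]), target)
# updated weights and updated biases
err_all = half_sse(forward(x, [(new_W1, new_b1), (new_W2, new_b2)]), target)

print('Total error (no updated biases):', err_weights_only)
print('Total error (updated biases):', err_all)
```

Updating the biases as well lowers the error further, in line with the two values printed above.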


## Approach 2: implementation using a simple ANN model

We repeat the above exercise using an ANN class derived from `torch.nn.Module`.

In [8]:
class Model(nn.Module):
    
    """this model implements a dummy 2 layer ANN to explore the backpropagation algorithm"""
    
    def __init__(self,layer_weights,layer_biases):
        super().__init__()
        
        self.nlayers=len(layer_weights)
        for i in range(self.nlayers):       
            setattr(self,'layer_%d'%i, nn.Linear(2,2) )
            layer=getattr(self,'layer_%d'%i)
            
            layer.bias.data = torch.tensor( np.array(layer_biases[i]), dtype=torch.float32 )
            layer.weight.data = torch.tensor(np.array(layer_weights[i]).reshape(2,2), dtype=torch.float32)
                        
    
    def forward(self,x,debug=True):
        for i in range(self.nlayers):
            x = torch.sigmoid( getattr(self,'layer_%d'%i)(x) )  # F.sigmoid is deprecated in favour of torch.sigmoid
            if debug:
                print('Layer {} yields {}'.format(i,str(x)))
        return x
    
    def print(self):
        for name,param in self.named_parameters():
            print(name,param)
    
model=Model(layer_weights,layer_biases)
model.print()

layer_0.weight Parameter containing:
tensor([[0.1500, 0.2000],
        [0.2500, 0.3000]], requires_grad=True)
layer_0.bias Parameter containing:
tensor([0.3500, 0.3500], requires_grad=True)
layer_1.weight Parameter containing:
tensor([[0.4000, 0.4500],
        [0.5000, 0.5500]], requires_grad=True)
layer_1.bias Parameter containing:
tensor([0.6000, 0.6000], requires_grad=True)


## Forward pass 

The forward pass is computed for the given inputs, and the loss is defined using the standard `MSELoss` class from PyTorch.
The results match the ones obtained previously by hand: by default `MSELoss` averages the squared errors, which for two outputs equals the half-sum used in Approach 1.

In [9]:
criterion=nn.MSELoss()

x=torch.tensor(inputs)
yexp=torch.tensor(exp_outputs)

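#make the forward pass and compute the loss with MSE
#(MSELoss is documented as criterion(input, target); the loss value is symmetric in its two arguments)
y=model.forward(x)
loss=criterion(yexp,y)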
print('Error:',loss)

Layer 0 yields tensor([0.5933, 0.5969], grad_fn=<SigmoidBackward>)
Layer 1 yields tensor([0.7514, 0.7729], grad_fn=<SigmoidBackward>)
Error: tensor(0.2984, grad_fn=<MeanBackward0>)
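
The agreement between the averaged loss here and the half-sum used in Approach 1 depends on the network having exactly two outputs. The plain-Python sketch below illustrates this; the function names and the third output/target pair are made up for the illustration.

```python
def half_sse(out, target):
    """Half the sum of squared errors (the cost used in Approach 1)."""
    return 0.5 * sum((o - t) ** 2 for o, t in zip(out, target))

def mean_se(out, target):
    """Mean of the squared errors (an MSE with 'mean' reduction)."""
    return sum((o - t) ** 2 for o, t in zip(out, target)) / len(out)

# two outputs, values taken from the forward pass above
out2, target2 = [0.75136507, 0.7729285], [0.01, 0.99]
half2, mean2 = half_sse(out2, target2), mean_se(out2, target2)
print(half2, mean2)   # identical: 1/2 * sum == sum / 2

# hypothetical third output: the two definitions no longer agree
out3, target3 = out2 + [0.5], target2 + [0.0]
half3, mean3 = half_sse(out3, target3), mean_se(out3, target3)
print(half3, mean3)
```

With more than two outputs the gradients would differ by a constant factor, so the learning rate would have to be rescaled to obtain the same update.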




## Backward pass

We use the stochastic gradient descent (SGD) optimizer from PyTorch, setting only the learning rate, to the same value as before.
With the default settings (no momentum, no weight decay) and a single sample, an SGD step reduces to the plain gradient descent update used in Approach 1.
The updated parameters match the ones obtained previously.

In [10]:
#make the backward propagation with gradient descent
optimizer = torch.optim.SGD(model.parameters(), lr = 0.5)
optimizer.zero_grad()
loss.backward()
optimizer.step()
model.print()

layer_0.weight Parameter containing:
tensor([[0.1498, 0.1996],
        [0.2498, 0.2995]], requires_grad=True)
layer_0.bias Parameter containing:
tensor([0.3456, 0.3450], requires_grad=True)
layer_1.weight Parameter containing:
tensor([[0.3589, 0.4087],
        [0.5113, 0.5614]], requires_grad=True)
layer_1.bias Parameter containing:
tensor([0.5308, 0.6190], requires_grad=True)
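
To see why an SGD step without momentum coincides with plain gradient descent, here is a minimal plain-Python sketch of the update rule. The function `sgd_step` is a hypothetical helper modelled on the usual momentum formulation (velocity = momentum * velocity + gradient), not PyTorch's actual implementation. The gradients are the ones implied by the printed output-layer bias update, (old - new) / lr.

```python
def sgd_step(params, grads, lr, momentum=0.0, buffers=None):
    """One SGD update; with momentum=0 it is plain gradient descent."""
    if buffers is None:
        buffers = [0.0] * len(params)
    # velocity update: v <- momentum * v + grad
    new_buffers = [momentum * v + g for v, g in zip(buffers, grads)]
    # parameter update: p <- p - lr * v
    new_params = [p - lr * v for p, v in zip(params, new_buffers)]
    return new_params, new_buffers

lr = 0.5
old_bias = [0.60, 0.60]
printed_new_bias = [0.53075075, 0.61904913]

# gradients implied by the printed update
grads = [(old - new) / lr for old, new in zip(old_bias, printed_new_bias)]

plain, _ = sgd_step(old_bias, grads, lr)                        # momentum = 0
with_momentum, velocity = sgd_step(old_bias, grads, lr, momentum=0.9)
print(plain, with_momentum)
```

On the very first step the two variants agree because the velocity starts at zero; they would only diverge from the second step onwards.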


## Final forward pass

The forward pass with the updated weights and biases yields an error of 0.2805, which matches the one obtained by hand previously (with updated biases).

In [11]:
y_new=model.forward(x)
loss_new=criterion(yexp,y_new)
print('Error:',loss_new)

Layer 0 yields tensor([0.5922, 0.5957], grad_fn=<SigmoidBackward>)
Layer 1 yields tensor([0.7284, 0.7784], grad_fn=<SigmoidBackward>)
Error: tensor(0.2805, grad_fn=<MeanBackward0>)
