![UCA](https://univ-cotedazur.fr/medias/fichier/uca-logo-ligne-mono-bleu_1594383427225-)
# **Lab session 2: Backpropagation on neural networks**

## **Course: Optimization for data science**


#### Lab session proposed by
------------------------------------------------

### Rémy Sun
# Warning :
# "File -> Save a copy in Drive" before starting to modify the notebook, otherwise changes won't be saved.

In [None]:
!wget https://remysun.github.io/uploads/TPBackprop.zip
!unzip -j TPBackprop.zip
!wget https://remysun.github.io/uploads/utils-data.py

In [None]:
import math
import torch
from torch.autograd import Variable
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
%run 'utils-data.py'

# Part 1 : Fully manual backprop

We start by initializing the weights! It is done for you here, no need to code anything.

**Question: Why do we initialize with random values here?**

In [None]:
def init_params(nx, nh, ny):
    """
    nx, nh, ny: integers
    out params: dictionary
    """
    params = {}

    params["Wh"] = torch.randn((nh, nx))*0.3
    params["Wy"] = torch.randn((ny, nh))*0.3
    params["bh"] = torch.zeros((nh,1))
    params["by"] = torch.zeros((ny,1))

    return params
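
As a hint for the question above, the sketch below compares the gradient of the hidden weights under a constant initialization and under a random one. It is only an illustration on made-up data: the helper `hidden_grad`, the toy sizes, and the constant value 0.3 are assumptions, not part of the lab code.

```python
import torch

torch.manual_seed(0)
nx, nh, ny = 2, 4, 2
X = torch.randn(8, nx)                          # toy inputs (assumption)
Y = torch.eye(ny)[torch.randint(0, ny, (8,))]   # toy one-hot targets

def hidden_grad(Wh, Wy):
    """Return dL/dWh for a tanh hidden layer followed by softmax cross-entropy."""
    Wh = Wh.clone().requires_grad_(True)
    h = torch.tanh(X @ Wh.T)
    yhat = torch.softmax(h @ Wy.T, dim=1)
    loss = -(Y * torch.log(yhat)).sum(dim=1).mean()
    loss.backward()
    return Wh.grad

# Constant initialization: every hidden unit has the same incoming and outgoing weights.
Wh_const = torch.full((nh, nx), 0.3)
Wy_const = torch.tensor([[0.3], [-0.3]]).repeat(1, nh)
g_const = hidden_grad(Wh_const, Wy_const)

# Random initialization, in the spirit of init_params.
g_rand = hidden_grad(torch.randn(nh, nx) * 0.3, torch.randn(ny, nh) * 0.3)

# Each row of the gradient belongs to one hidden unit.
rows_identical_const = torch.allclose(g_const, g_const[0].expand_as(g_const))
rows_identical_rand = torch.allclose(g_rand, g_rand[0].expand_as(g_rand))
print(rows_identical_const, rows_identical_rand)
```

When all rows of the gradient are identical, every hidden unit receives the same update and the units can never become different from each other.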

Now we need to put the weights together into an actual neural network. Implement the forward function.

In [None]:
def forward(params, X):
    """
    params: dictionary
    X: (n_batch, dimension)
    """
    bsize = X.size(0)
    nh = params['Wh'].size(0)
    ny = params['Wy'].size(0)
    outputs = {}

    outputs["X"] = X
    outputs["htilde"] = torch.mm(X, params["Wh"].T) + params["bh"].T
    outputs["h"] = torch.tanh(outputs["htilde"])
    outputs["ytilde"] = torch.mm(outputs["h"], params["Wy"].T) + params["by"].T
    outputs["yhat"] = torch.exp(outputs["ytilde"])
    outputs["yhat"] = outputs["yhat"] / outputs["yhat"].sum(dim=-1, keepdim=True)


    return outputs['yhat'], outputs
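
The softmax above exponentiates `ytilde` directly, which is fine for the small values seen in this lab but can overflow for large logits. The sketch below shows a common variant that subtracts the row maximum first; `stable_softmax` is a hypothetical helper name, and the logits are made-up values.

```python
import torch

def stable_softmax(logits):
    """Row-wise softmax that subtracts each row's max before exponentiating."""
    shifted = logits - logits.max(dim=-1, keepdim=True).values
    exps = torch.exp(shifted)
    return exps / exps.sum(dim=-1, keepdim=True)

logits = torch.tensor([[1.0, 2.0, 3.0],
                       [1000.0, 1001.0, 1002.0]])

# Naive form, as written in forward: exp(1000) overflows in float32.
naive = torch.exp(logits) / torch.exp(logits).sum(dim=-1, keepdim=True)
stable = stable_softmax(logits)

print(naive)
print(stable)
```

Subtracting a constant from every entry of a row does not change the softmax output, so both rows of `stable` hold the same probabilities.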

We also need to compute the loss and the accuracy. This has little bearing on this lab session, so it is done for you here.

In [None]:
def loss_accuracy(Yhat, Y):


    L = - torch.mean((Y * torch.log(Yhat)).sum(dim=1)) # mean for the batch

    _, indYhat = torch.max(Yhat, 1)
    _, indY = torch.max(Y, 1)

    acc = torch.sum(indY == indYhat) * 100. / indY.size(0)


    return L, acc
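
To get a feel for what `loss_accuracy` computes, the sketch below checks the same cross-entropy formula against `torch.nn.functional.cross_entropy`, which takes logits and class indices rather than probabilities and one-hot targets. The logits and labels are made-up values.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 3)                   # stands in for ytilde
labels = torch.tensor([0, 2, 1, 1, 0])       # class indices (toy values)
Y = F.one_hot(labels, num_classes=3).float() # one-hot targets, as in the lab data

Yhat = torch.softmax(logits, dim=1)

# Same formula as in loss_accuracy: mean over the batch of -sum_k y_k log(yhat_k).
manual = -torch.mean((Y * torch.log(Yhat)).sum(dim=1))
# Built-in version, computed from the logits directly.
builtin = F.cross_entropy(logits, labels)

# Accuracy in percent: fraction of rows where the argmax matches the label.
accuracy = (Yhat.argmax(dim=1) == Y.argmax(dim=1)).float().mean() * 100.0
print(manual.item(), builtin.item(), accuracy.item())
```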

And now we need to implement the actual backpropagation algorithm: time to get the gradients!

**Question: How do you get gradients for the different weights?**

In [None]:
def backward(params, outputs, Y):
    bsize = Y.shape[0]
    grads = {}

    grads["Wy"] = None
    grads["Wh"] = None
    grads["by"] = None
    grads["bh"] = None

    return grads
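
One way to test a hand-written `backward` is to compare it with a finite-difference estimate. The sketch below is a generic checker, not part of the lab code: `numerical_grad` is a hypothetical helper, and it is demonstrated on the derivative of `tanh`, which is 1 - tanh², one of the building blocks you will need for the hidden layer.

```python
import torch

def numerical_grad(f, x, eps=1e-4):
    """Central finite-difference estimate of df/dx for a scalar-valued f."""
    grad = torch.zeros_like(x)
    flat_x, flat_g = x.view(-1), grad.view(-1)   # views share memory with x and grad
    for i in range(flat_x.numel()):
        old = flat_x[i].item()
        flat_x[i] = old + eps
        f_plus = f(x).item()
        flat_x[i] = old - eps
        f_minus = f(x).item()
        flat_x[i] = old                          # restore the original entry
        flat_g[i] = (f_plus - f_minus) / (2 * eps)
    return grad

torch.manual_seed(0)
x = torch.randn(3, 4, dtype=torch.float64)       # float64 keeps the estimate accurate

f = lambda t: torch.tanh(t).sum()
analytic = 1 - torch.tanh(x) ** 2                # derivative of tanh, elementwise
numeric = numerical_grad(f, x)

max_err = (analytic - numeric).abs().max().item()
print(max_err)
```

The same idea applies to each entry of `grads`: wrap the forward pass and the loss in a function of one parameter tensor and compare the estimate with your analytic gradient.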

Finally, we have to implement an SGD step, which you already did in the last practical.

In [None]:
def sgd(params, grads, eta):

    params['Wy'] = None
    params['Wh'] = None
    params['by'] = None
    params['bh'] = None

    return params
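
As a reminder of the update rule, the sketch below applies repeated gradient steps `w = w - eta * grad` to a toy quadratic. The objective, the target, and the step size are assumptions chosen for illustration; in the lab, the gradient comes from `backward` on a mini-batch, which is what makes the step stochastic.

```python
import torch

target = torch.tensor([1.0, -2.0])   # minimizer of the toy objective
w = torch.zeros(2)
eta = 0.1

for step in range(100):
    grad = 2 * (w - target)          # gradient of ||w - target||^2
    w = w - eta * grad               # gradient step with learning rate eta

final_error = (w - target).norm().item()
print(w, final_error)
```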

## Manual global learning algorithm

We now put everything together to train the model!

**Question: Describe what happens during training (comment on the colors, for instance).**

In [None]:
# init
data = CirclesData()
data.plot_data()
N = data.Xtrain.shape[0]
Nbatch = 10
nx = data.Xtrain.shape[1]
nh = 10
ny = data.Ytrain.shape[1]
eta = 0.03

params = init_params(nx, nh, ny)

curves = [[],[], [], []]

# epoch
for iteration in range(150):

    # permute
    perm = np.random.permutation(N)
    Xtrain = data.Xtrain[perm, :]
    Ytrain = data.Ytrain[perm, :]

    for j in range(N // Nbatch):

        indsBatch = range(j * Nbatch, (j+1) * Nbatch)
        X = Xtrain[indsBatch, :]
        Y = Ytrain[indsBatch, :]

        Y_hat, outputs = None
        loss, accuracy = None
        grads = None
        params = None


    Yhat_train, _ = forward(params, data.Xtrain)
    Yhat_test, _ = forward(params, data.Xtest)
    Ltrain, acctrain = loss_accuracy(Yhat_train, data.Ytrain)
    Ltest, acctest = loss_accuracy(Yhat_test, data.Ytest)
    Ygrid, _ = forward(params, data.Xgrid)

    title = 'Iter {}: Acc train {:.1f}% ({:.2f}), acc test {:.1f}% ({:.2f})'.format(iteration, acctrain, Ltrain, acctest, Ltest)
    print(title)
    data.plot_data_with_grid(Ygrid, title)

    curves[0].append(acctrain)
    curves[1].append(acctest)
    curves[2].append(Ltrain)
    curves[3].append(Ltest)

fig = plt.figure()
plt.plot(curves[0], label="acc. train")
plt.plot(curves[1], label="acc. test")
plt.plot(curves[2], label="loss train")
plt.plot(curves[3], label="loss test")
plt.legend()
plt.show()

# Part 2 : Simplify backward with `torch.autograd`

`torch.autograd` saves us a lot of hassle by computing gradients for us!

For that, however, we need to tell PyTorch that the parameter tensors are part of a tracked computational graph by specifying that they require gradients.


In [None]:
def init_params(nx, nh, ny):
    """
    nx, nh, ny: integers
    out params: dictionary
    """
    params = {}

    params["Wh"] = torch.randn(nh, nx) * 0.3
    params["Wh"].requires_grad = True
    params["Wy"] = torch.randn(ny, nh) * 0.3
    params["Wy"].requires_grad = True

    params["bh"] = torch.zeros(nh,1, requires_grad=True)
    params["by"] = torch.zeros(ny,1, requires_grad=True)

    return params
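
The sketch below illustrates, on toy tensors, what `requires_grad` changes: after `.backward()`, the gradient of the loss is available in `.grad`. It also suggests why `init_params` sets `requires_grad` after the multiplication by 0.3: a tensor produced by an operation on a tracked tensor is no longer a leaf, and by default only leaf tensors have `.grad` populated. The values used here are made up.

```python
import torch

w = torch.tensor([0.1, 0.2, 0.3], requires_grad=True)
x = torch.tensor([0.5, -1.0, 2.0])

loss = torch.tanh((w * x).sum())
loss.backward()                        # fills w.grad with dloss/dw

# Analytic gradient for comparison: (1 - tanh(s)^2) * x with s = sum(w * x).
s = (w * x).sum().detach()
expected = (1 - torch.tanh(s) ** 2) * x
print(w.grad, expected)

# Leaf vs non-leaf: where requires_grad is set matters.
leaf = torch.randn(3) * 0.3
leaf.requires_grad = True              # same pattern as init_params
non_leaf = torch.randn(3, requires_grad=True) * 0.3
print(leaf.is_leaf, non_leaf.is_leaf)
```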

`forward` and `loss_accuracy` are the same as before.

`backward` is no longer necessary thanks to PyTorch.

In [None]:
def sgd(params, eta):

    #####################
    ## Your code here  ##
    #####################
    # Update parameters with an SGD step
    # Take care to take this off the computation graph with torch.no_grad()
    # And remember to reset the gradients after use with .grad.zero_()

    params["Wh"] = None
    params["Wy"] = None
    params["bh"] = None
    params["by"] = None

    ####################
    ##      END        #
    ####################
    return params
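
The two hints in the comments above can be seen on a toy tensor. The sketch below shows that gradients accumulate across `.backward()` calls unless they are reset, and that an in-place update inside `torch.no_grad()` keeps the parameter a leaf tensor that still requires gradients. The tensor `w`, the objective, and the step size are assumptions for illustration only.

```python
import torch

w = torch.tensor([1.0, -1.0], requires_grad=True)
eta = 0.1

# Gradients accumulate across backward() calls unless they are reset.
(w ** 2).sum().backward()
first = w.grad.clone()
(w ** 2).sum().backward()
accumulated = w.grad.clone()          # twice the single-pass gradient

# Reset, recompute, then update outside the computation graph.
w.grad.zero_()
(w ** 2).sum().backward()
with torch.no_grad():
    w -= eta * w.grad                 # in-place update, not recorded by autograd
w.grad.zero_()                        # ready for the next batch

print(first, accumulated, w)
```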

And we put it all together in a training loop again.

**Question: What is the difference this time?**

In [None]:
# init
data = CirclesData()
data.plot_data()
N = data.Xtrain.shape[0]
Nbatch = 10
nx = data.Xtrain.shape[1]
nh = 10
ny = data.Ytrain.shape[1]
eta = 0.03

params = init_params(nx, nh, ny)

curves = [[],[], [], []]

# epoch
for iteration in range(150):

    # permute
    perm = np.random.permutation(N)
    Xtrain = data.Xtrain[perm, :]
    Ytrain = data.Ytrain[perm, :]

    #####################
    ## Your code here  ##
    #####################
    # batches
    for j in range(N // Nbatch):

        indsBatch = range(j * Nbatch, (j+1) * Nbatch)
        X = Xtrain[indsBatch, :]
        Y = Ytrain[indsBatch, :]

        # Write out the training code on batch inputs (X,Y)
        # Use our new forward, loss_accuracy, sgd
        # Gradients are taken care of by .backward() and .grad


    ####################
    ##      END        #
    ####################


    Yhat_train, _ = forward(params, data.Xtrain)
    Yhat_test, _ = forward(params, data.Xtest)
    Ltrain, acctrain = loss_accuracy(Yhat_train, data.Ytrain)
    Ltest, acctest = loss_accuracy(Yhat_test, data.Ytest)
    Ygrid, _ = forward(params, data.Xgrid)

    title = 'Iter {}: Acc train {:.1f}% ({:.2f}), acc test {:.1f}% ({:.2f})'.format(iteration, acctrain, Ltrain, acctest, Ltest)
    print(title)
    # detach() is used to detach the predictions from the autograd computation graph
    data.plot_data_with_grid(Ygrid.detach(), title)

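    # .item() stores plain Python floats, so matplotlib never sees tensors that require grad
    curves[0].append(acctrain.item())
    curves[1].append(acctest.item())
    curves[2].append(Ltrain.item())
    curves[3].append(Ltest.item())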

fig = plt.figure()
plt.plot(curves[0], label="acc. train")
plt.plot(curves[1], label="acc. test")
plt.plot(curves[2], label="loss train")
plt.plot(curves[3], label="loss test")
plt.legend()
plt.show()

# Part 3 : Deeper neural network (Bonus)

Fundamentally, adding an extra hidden layer just adds some computations to the forward and backward passes, along with new weights.

Can you adapt the previous code and derivations to train a three-layer neural network?
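
As a possible starting point for the autograd version, the sketch below adds a second hidden layer to the forward pass. Everything here is an assumption made for illustration: the function names `init_params3` and `forward3`, the parameter names `Wh1`, `Wh2`, `bh1`, `bh2`, the layer sizes, and the toy data. The manual derivation of the extra gradients is left to you.

```python
import torch

def init_params3(nx, nh1, nh2, ny):
    """Hypothetical three-layer variant of init_params (names are assumptions)."""
    shapes = {"Wh1": (nh1, nx), "Wh2": (nh2, nh1), "Wy": (ny, nh2)}
    params = {}
    for name, shape in shapes.items():
        # Scale first, then mark as requiring gradients so the tensor stays a leaf.
        params[name] = (torch.randn(shape) * 0.3).requires_grad_(True)
        # Bias names: bh1, bh2, by.
        params["b" + name[1:]] = torch.zeros(shape[0], 1, requires_grad=True)
    return params

def forward3(params, X):
    """Two tanh hidden layers followed by a softmax output."""
    h1 = torch.tanh(X @ params["Wh1"].T + params["bh1"].T)
    h2 = torch.tanh(h1 @ params["Wh2"].T + params["bh2"].T)
    ytilde = h2 @ params["Wy"].T + params["by"].T
    return torch.softmax(ytilde, dim=1)

torch.manual_seed(0)
params3 = init_params3(nx=2, nh1=5, nh2=4, ny=3)
X = torch.randn(6, 2)                              # toy batch
Y = torch.eye(3)[torch.randint(0, 3, (6,))]        # toy one-hot targets

Yhat = forward3(params3, X)
loss = -torch.mean((Y * torch.log(Yhat)).sum(dim=1))
loss.backward()                                    # gradients for all six tensors
print({name: p.grad.shape for name, p in params3.items()})
```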

# Part 4 : MNIST (Bonus)


Adapt the previous code to the MNIST dataset.

In [None]:
# init
data = MNISTData()
N = data.Xtrain.shape[0]
Nbatch = 100
nx = data.Xtrain.shape[1]
nh = 100
ny = data.Ytrain.shape[1]
eta = 0.03