[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/nkeriven/ensta-mt12/blob/main/notebooks/07_MLP_NN/N4_Autograd.ipynb)

In [None]:
import torch

We define a simple quadratic function for which we know the gradient, and test torch autograd.
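
For reference, here is a short derivation of the closed-form gradient used in the check below. It assumes $\Sigma$ is symmetric, which holds by construction when $\Sigma = A A^\top$:

$$
f(x) = x^\top \Sigma x = \sum_{i,j} \Sigma_{ij}\, x_i x_j,
\qquad
\frac{\partial f}{\partial x_k} = \sum_{j} \Sigma_{kj}\, x_j + \sum_{i} \Sigma_{ik}\, x_i,
\qquad
\nabla f(x) = (\Sigma + \Sigma^\top)\, x = 2\, \Sigma x .
$$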

In [None]:
# define a positive semi-definite matrix Sigma
Sigma = torch.randn(5,5)
Sigma = Sigma @ Sigma.t() # symmetric and positive semi-definite (positive definite with probability 1)

x = torch.rand(5, requires_grad=True)

value = x[None,:] @ Sigma @ x[:,None]
value.backward() # the magic happens here!
print(f'Gradient with autograd {x.grad}')
print(f'Gradient with math {2*Sigma @ x}')

In [None]:
print(value) # the result keeps a grad_fn, which records how it was computed

### Exercise

Test a few other functions with autograd.
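
As one possible starting point, here is a minimal sketch for a function whose gradient is easy to write by hand. The choice $f(x) = \sum_i \sin(x_i)$, with gradient $\cos(x)$, is only an illustrative assumption; any differentiable function works the same way.

```python
import torch

# Hypothetical test function: f(x) = sum(sin(x)), whose gradient is cos(x)
x = torch.linspace(0.0, 1.0, 5, requires_grad=True)
value = torch.sin(x).sum()
value.backward()

autograd_grad = x.grad
math_grad = torch.cos(x.detach())  # detach: plain values, not tracked by autograd
print(f'Gradient with autograd {autograd_grad}')
print(f'Gradient with math {math_grad}')
print(torch.allclose(autograd_grad, math_grad))
```

Note that `backward()` can only be called directly on a scalar, which is why the function sums over the entries of `x`.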

### Neural nets

We test simple neural nets on MNIST data.

In [None]:
!wget https://raw.githubusercontent.com/nkeriven/ensta-mt12/main/notebooks/data/zip_train_full.mat -O zip_train_full.mat
!wget https://raw.githubusercontent.com/nkeriven/ensta-mt12/main/notebooks/data/zip_test_full.mat -O zip_test_full.mat

In [None]:
%matplotlib inline

import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import scipy as sp
import scipy.io as spio

# Warning: put the data files in the notebook directory
data = spio.loadmat("zip_train_full.mat")
Xtrain = data["Xtrain_full"]
Ytrain = data["Ytrain_full"]
Xshape = Xtrain.shape
Ytrain = np.reshape(Ytrain, (Xshape[0],))
Yshape = Ytrain.shape

print("Xtrain is (n={},p={}) sized".format(Xshape[0], Xshape[1]))
print("Ytrain is a (n={},) sized vector of responses".format(Yshape[0]))

data_test = spio.loadmat("zip_test_full.mat")
Xtest = data_test["Xtest_full"]
Ytest = data_test["Ytest_full"]
Ytest = np.reshape(Ytest, (Xtest.shape[0],))
print("Xtest is (n={},p={}) sized".format(Xtest.shape[0], Xtest.shape[1]))

#### Dataloader
PyTorch offers a class, `DataLoader`, that automatically handles iterating over a dataset, shuffling it, and splitting it into batches. Its input can be a list of pairs `[x, y]`.
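
To see what a loader yields, here is a small sketch on synthetic data. The tensors `X` and `y` are made-up stand-ins for the real dataset, with sizes chosen only for illustration.

```python
import torch
from torch.utils.data import DataLoader

# Synthetic stand-in for the data: 100 samples with 256 features, labels in 0..9
X = torch.randn(100, 256)
y = torch.randint(0, 10, (100,))

pairs = [[X[i], y[i]] for i in range(len(X))]
loader = DataLoader(pairs, shuffle=True, batch_size=32)

# Each iteration yields a list [inputs, labels], stacked along a new first dimension
batch_shapes = [(xb.shape, yb.shape) for xb, yb in loader]
print(len(loader))       # number of batches
print(batch_shapes[0])   # a full batch
print(batch_shapes[-1])  # the last batch holds the remaining samples
```

With 100 samples and `batch_size=32`, the last batch is smaller than the others, so code that consumes batches should not assume a fixed batch size.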

In [None]:
from torch.utils.data import DataLoader

temp = []
for i in range(len(Xtrain)):
   temp.append([Xtrain[i], Ytrain[i]])

trainloader = DataLoader(temp, shuffle=True, batch_size=32)

temp = []
for i in range(len(Xtest)):
   temp.append([Xtest[i], Ytest[i]])

testloader = DataLoader(temp, shuffle=True, batch_size=32)

#### Module
The main superclass to define a Neural Net is called `Module`. The main function it has to implement is the function `forward`, which computes the output of the neural net on `x`. **The first dimension of `x` is necessarily the batch size**!
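
A quick sketch of the batch-first convention follows. The 16 x 16 image size is an assumption for illustration, chosen so that each flattened example has 256 entries.

```python
import torch

# Hypothetical batch: 32 images of size 16 x 16 (16 * 16 = 256 pixels each)
batch = torch.randn(32, 16, 16)
flat = torch.flatten(batch, 1)  # keep dimension 0 (the batch), flatten the rest
print(flat.shape)

# A single example needs an explicit batch dimension of size 1
single = torch.randn(16, 16)
single_batch = single.unsqueeze(0)
single_flat = torch.flatten(single_batch, 1)
print(single_flat.shape)
```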

Neural nets are built from layers, many of which are pre-implemented in PyTorch and are themselves Modules. See the documentation.

For a Multilayer perceptron, we consider only dense Linear layers, whose arguments are the input and output dimension. Let's consider 2 hidden layers.

In [None]:
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self, hidden_dim=30):
        super().__init__()
        self.lin1 = nn.Linear(256, hidden_dim) # implicitly, all parameters have requires_grad=True
        self.lin2 = nn.Linear(hidden_dim, hidden_dim)
        self.lin3 = nn.Linear(hidden_dim, 10)

    def forward(self, x):
        x = torch.flatten(x, 1) # flatten all dimensions except batch
        x = F.relu(self.lin1(x))
        x = F.relu(self.lin2(x))
        x = self.lin3(x)
        return x

net = Net()

In [None]:
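# each layer has a weight and a bias

print(net.lin1.weight) # of class Parameter, contains a tensor with requires_grad=True.
# All together, they form the parameters of the Net
print(net.parameters) # without parentheses, this prints the bound method (and the structure of the Net)
print(list(net.parameters())[0]) # net.parameters() returns a generator, so we convert it to a list to index it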
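
As a complement, here is a small sketch that lists the parameters by name and counts them. It uses an `nn.Sequential` stand-in with the same layer sizes as `Net` above (256, 30, 30, 10), so that the block runs on its own. The variable names are illustrative.

```python
import torch.nn as nn

# Stand-in with the same layer sizes as Net above (256 -> 30 -> 30 -> 10)
model = nn.Sequential(
    nn.Linear(256, 30), nn.ReLU(),
    nn.Linear(30, 30), nn.ReLU(),
    nn.Linear(30, 10),
)

# named_parameters() yields (name, Parameter) pairs
for name, p in model.named_parameters():
    print(name, tuple(p.shape), p.requires_grad)

# Count trainable scalars: (256*30 + 30) + (30*30 + 30) + (30*10 + 10)
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(n_params)
```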

#### Optimizer
In PyTorch, gradient descent is handled by an `Optimizer`, such as SGD, Adam, RMSProp, etc. It takes the parameters of a neural net as input (`net.parameters()`) and optimizes over those that have `requires_grad=True`.
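
To make the role of the optimizer concrete, here is a minimal sketch on a toy problem, not the network itself. It checks that one step of plain SGD (no momentum) is the update `w - lr * grad`. The tensor `w` and the quadratic loss are assumptions made for illustration.

```python
import torch
import torch.optim as optim

w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = optim.SGD([w], lr=0.1)  # plain SGD, no momentum

loss = (w ** 2).sum()  # the gradient is 2 * w
optimizer.zero_grad()
loss.backward()
expected = w.detach() - 0.1 * w.grad  # manual update, computed before the step
optimizer.step()  # updates w in place

print(w.detach(), expected)
print(torch.allclose(w.detach(), expected))
```

With momentum or with adaptive methods such as Adam, the update also depends on previous gradients, so this simple identity no longer holds.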

In [None]:
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

In [None]:
# let's optimize !
import time
start=time.time()
for epoch in range(100):  # loop over the dataset multiple times
    
    total_loss = 0
    for data in trainloader:
        # a dataloader is iterable: when iterated over, it yields batches one after the other
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data[0].float(), data[1]

        # zero the parameter gradients. DO NOT FORGET THIS !!!
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs) # compute the network forward
        loss = criterion(outputs, labels) # compute the loss
        loss.backward() # autograd: each parameter contains its gradient
        optimizer.step() # take a gradient step
        total_loss+=loss.item() # item takes only the value

    # criterion already averages over each batch, so we average over the number of batches
    print(f'[Epoch {epoch + 1}] loss: {total_loss / len(trainloader)}')

print(time.time()-start)
print('Finished Training')

In [None]:
# compute accuracy over test set
acc = 0
with torch.no_grad():  # no gradients are needed for evaluation
    for data in testloader:
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data[0].float(), data[1]

        # forward pass only
        outputs = net(inputs) # compute the network forward
        preds = outputs.argmax(dim=1) # predicted class = index of the largest output
        acc += (preds == labels).sum().item()
print(f'Accuracy {acc / Xtest.shape[0]}')

Draw the training error and test error curves as a function of the number of epochs. Run for many epochs. What do you observe?
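
One possible way to organize this is sketched below on synthetic data, so that the block runs on its own. The helper `error_rate`, the toy dataset, and the small model are all hypothetical. For the exercise, use the `Net`, `trainloader`, and `testloader` defined above instead.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for the data (hypothetical): 2 classes, 20 features
def make_data(n):
    X = torch.randn(n, 20)
    y = (X[:, 0] + X[:, 1] > 0).long()
    return TensorDataset(X, y)

train_loader = DataLoader(make_data(256), shuffle=True, batch_size=32)
test_loader = DataLoader(make_data(128), batch_size=32)

model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 2))
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

def error_rate(model, loader):
    """Fraction of misclassified examples over a loader."""
    wrong, total = 0, 0
    with torch.no_grad():  # evaluation only, no gradients needed
        for inputs, labels in loader:
            preds = model(inputs).argmax(dim=1)
            wrong += (preds != labels).sum().item()
            total += labels.shape[0]
    return wrong / total

train_errors, test_errors = [], []
for epoch in range(10):
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()
    # record both errors once per epoch
    train_errors.append(error_rate(model, train_loader))
    test_errors.append(error_rate(model, test_loader))

print(train_errors)
print(test_errors)
```

The two lists can then be plotted against the epoch index, for instance with `plt.plot(train_errors, label='train')` and `plt.plot(test_errors, label='test')`, followed by `plt.legend()`.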