# TP3

# Back to TP2

In [None]:
import numpy as np
import torch

We would like to solve $Ax = b$
with
$$
A = \begin{pmatrix}
   2 & -1 & 0 & 0 & \cdots & 0 & 0\\
   -1 & 2 & -1 & 0 &\cdots & 0 & 0 \\
   0 & -1 & 2 & -1 &\cdots & 0 & 0 \\
   0 & 0 & -1 & 2 &\cdots & 0 & 0 \\
   \vdots  & \vdots & \vdots & \vdots& \ddots & \vdots & \vdots  \\
   0 & 0 & 0 & 0 & \cdots & 2 & -1 \\
   0 & 0 & 0 & 0 & \cdots & -1 & 2 
 \end{pmatrix} \in \mathbb{R}^{n \times n} = \mathscr{M}_n(\mathbb{R})
 \text{ and } 
 b = 
 \begin{pmatrix}
 1 \\
 1 \\
 \vdots \\
 1 \\
 1
 \end{pmatrix} \in \mathbb{R}^n
$$

It is sufficient to minimize the functional:
$$F : x \in \mathbb{R}^n \mapsto \frac{1}{2}\langle Ax,x \rangle - \langle b, x \rangle$$
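
The reason minimizing $F$ is enough can be spelled out in one line. The derivation below assumes only that $A$ is symmetric positive definite, which holds for the tridiagonal matrix above:
$$
\nabla F(x) = \frac{1}{2}\left(A + A^\top\right)x - b = Ax - b,
\qquad
\nabla^2 F(x) = A \succ 0 .
$$
Hence $F$ is strictly convex, and its unique minimizer $x^\star$ satisfies $\nabla F(x^\star) = 0$, that is, $Ax^\star = b$.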

1. Implement the matrix ```A``` and the vector ```b``` in ```torch``` with $n = 20$.
2. Compute the gradient of $F$ with respect to a random vector ```x``` using ```torch.autograd```.
3. Verify that the computed gradient is correct.
4. Minimize $F$ using ```torch.optim.SGD``` with a learning rate of 0.1, a momentum of 0.9 and ```10**3``` iterations.

In [None]:
#1.
n=20
A = 2*torch.eye(n)-torch.diag(torch.ones(n-1),1)-torch.diag(torch.ones(n-1),-1)
b = torch.ones(n)

In [None]:
#2
x = torch.randn(n,requires_grad = True)
F = 1/2*(torch.matmul(A,x)*x).sum() - (b*x).sum()
F.backward() #F is a scalar, so this is equivalent to F.backward(gradient=torch.tensor(1.))

In [None]:
#3
# The exact gradient is Ax - b, so this norm should be (numerically) zero
print(torch.norm(x.grad-(torch.matmul(A,x)-b)))

In [None]:
#4
x = torch.randn(n,requires_grad = True)
optim = torch.optim.SGD([x], lr=1e-1,momentum=0.9)
Nit = 10**3
for k in range(Nit):
  optim.zero_grad() # clear x.grad (gradients accumulate otherwise)
  F = 1/2*(torch.matmul(A,x)*x).sum() - (b*x).sum()
  F.backward()
  optim.step() # momentum update: v = 0.9*v + x.grad, then x = x - lr*v

In [None]:
print(torch.norm(torch.matmul(A,x)-b))

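As a cross-check on the residual printed above, the iterate can also be compared with a direct solve. The sketch below is self-contained and makes one assumption that is not in the cells above: it works in `float64`, so that the comparison is not dominated by `float32` round-off. The variable names `x_direct` and `error` are illustrative.

```python
import torch

torch.manual_seed(0)
n = 20
# float64 is an assumption of this sketch (the cells above use float32)
A = (2 * torch.eye(n)
     - torch.diag(torch.ones(n - 1), 1)
     - torch.diag(torch.ones(n - 1), -1)).double()
b = torch.ones(n, dtype=torch.float64)

# Reference solution from a direct linear solve
x_direct = torch.linalg.solve(A, b)

# Same SGD-with-momentum loop as in question 4
x = torch.randn(n, dtype=torch.float64, requires_grad=True)
optimizer = torch.optim.SGD([x], lr=1e-1, momentum=0.9)
for _ in range(10**3):
    optimizer.zero_grad()
    F = 0.5 * torch.dot(A @ x, x) - torch.dot(b, x)
    F.backward()
    optimizer.step()

# Distance between the iterative and the direct solution
error = torch.norm(x.detach() - x_direct).item()
print(error)
```

A small value of `error` indicates that the minimizer of $F$ found by the optimizer agrees with the solution of $Ax = b$.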

# Neural Networks



Neural networks can be constructed using the ``torch.nn`` package.

Now that you have had a glimpse of ``autograd``, note that ``nn`` depends on
``autograd`` to define models and differentiate them.
An ``nn.Module`` contains layers, and a method ``forward(input)`` that
returns the ``output``.

A typical training procedure for a neural network is as follows:

- Define the neural network that has some learnable parameters (or
  weights)
- Iterate over a dataset of inputs
- Process input through the network
- Compute the loss (how far the output is from being correct)
- Propagate gradients back into the network’s parameters
- Update the weights of the network, typically using a simple update rule:
  ``weight = weight - learning_rate * gradient``
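
The steps above can be sketched on a toy problem. Everything in this block is illustrative: the model is a single hypothetical `nn.Linear` layer, the data are random, and the learning rate of 0.05 is an arbitrary choice. The point is only to show where each step of the procedure appears in code, with the update rule written by hand.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Define a (tiny) network with learnable parameters
model = nn.Linear(3, 1)

# A made-up dataset: 8 inputs of dimension 3 and their targets
inputs = torch.randn(8, 3)
targets = torch.randn(8, 1)

learning_rate = 0.05
losses = []
for step in range(50):
    output = model(inputs)                    # process input through the network
    loss = ((output - targets) ** 2).mean()   # compute the loss
    model.zero_grad()                         # clear gradients from the previous step
    loss.backward()                           # propagate gradients back to the parameters
    with torch.no_grad():
        for p in model.parameters():
            p -= learning_rate * p.grad       # weight = weight - learning_rate * gradient
    losses.append(loss.item())

print(losses[0], losses[-1])
```

In practice the manual update inside `torch.no_grad()` is usually delegated to an optimizer such as ``torch.optim.SGD``.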

## Define the network

Let’s define a network:


In [None]:
%matplotlib inline

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 16 channels of size 5x5 remain after the two conv/pool stages
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square, you can specify with a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = torch.flatten(x, 1) # flatten all dimensions except the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()
print(net)

 <font color='blue'> **Question** : Describe the architecture of the network. </font>

 <font color='blue'> **Question** : Print the weights and the bias of the first convolution </font>

You just have to define the ``forward`` function, and the ``backward``
function (where gradients are computed) is automatically defined for you
using ``autograd``.
You can use any of the Tensor operations in the ``forward`` function.

The learnable parameters of a model are returned by ``net.parameters()``



In [None]:
params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight

**Be careful!** The number of parameters of a network means the number of real numbers (scalars) in the matrices, biases and kernels of the network, not the number of parameter tensors.
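
To illustrate the distinction, here is a minimal sketch on a hypothetical small layer (not the network above): ``len(list(...parameters()))`` counts tensors, whereas summing ``numel()`` counts scalars.

```python
import torch.nn as nn

# Hypothetical layer used only for illustration: weight of shape 3x4, bias of shape 3
layer = nn.Linear(4, 3)

n_tensors = len(list(layer.parameters()))                # weight and bias
n_scalars = sum(p.numel() for p in layer.parameters())   # 3*4 + 3
print(n_tensors, n_scalars)
```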

 <font color='blue'> **Question** : How many parameters are in this network ? </font>

Let's try a random 32x32 input.


 <font color='blue'> **Question** : Test the network on a random input which is a grayscale image (one channel) of size $32 \times 32$. </font>

In [None]:
input = #TODO
out = net(input)
print(out)

<div class="alert alert-info"><h4>Note</h4><p>``torch.nn`` only supports mini-batches. The entire ``torch.nn``
    package only supports inputs that are a mini-batch of samples, and not
    a single sample.

    For example, ``nn.Conv2d`` will take in a 4D Tensor of
    ``nSamples x nChannels x Height x Width``.

    If you have a single sample, just use ``input.unsqueeze(0)`` to add
    a fake batch dimension.</p></div>
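
As an illustration of the note above, the following sketch adds a batch dimension to a single grayscale sample and tracks the tensor shapes through two convolution and pooling stages with the same layer sizes as the network defined earlier. The layers are created standalone here, so their weights are unrelated to those of ``net``.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Standalone layers with the same sizes as conv1 and conv2 above
conv1 = nn.Conv2d(1, 6, 5)
conv2 = nn.Conv2d(6, 16, 5)

sample = torch.randn(1, 32, 32)   # one grayscale image: nChannels x Height x Width
batch = sample.unsqueeze(0)       # fake batch dimension: 1 x 1 x 32 x 32

h1 = F.max_pool2d(F.relu(conv1(batch)), 2)   # 32 -> 28 (conv) -> 14 (pool)
h2 = F.max_pool2d(F.relu(conv2(h1)), 2)      # 14 -> 10 (conv) -> 5 (pool)
flat = torch.flatten(h2, 1)                  # 16 * 5 * 5 features per sample

print(batch.shape, h1.shape, h2.shape, flat.shape)
```

The last shape explains the input size ``16 * 5 * 5`` of the first linear layer.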

Before proceeding further, let's recap all the classes you’ve seen so far.

**Recap:**
  -  ``torch.Tensor`` - A *multi-dimensional array* with support for autograd
     operations like ``backward()``. Also *holds the gradient* w.r.t. the
     tensor.
  -  ``nn.Module`` - Neural network module. *Convenient way of
     encapsulating parameters*, with helpers for moving them to GPU,
     exporting, loading, etc.
  -  ``nn.Parameter`` - A kind of Tensor, that is *automatically
     registered as a parameter when assigned as an attribute to a*
     ``Module``.
  -  ``autograd.Function`` - Implements *forward and backward definitions
     of an autograd operation*. Every ``Tensor`` operation creates at
     least a single ``Function`` node that connects to functions that
     created a ``Tensor`` and *encodes its history*.
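
The registration behaviour of ``nn.Parameter`` can be seen on a small example. The module ``Scale`` below is hypothetical and exists only for this illustration: the attribute wrapped in ``nn.Parameter`` shows up in ``named_parameters()`` and receives a gradient, while the plain tensor attribute does not.

```python
import torch
import torch.nn as nn

class Scale(nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1))  # registered as a parameter
        self.offset = torch.zeros(1)              # plain tensor: not registered

    def forward(self, x):
        return self.alpha * x + self.offset

m = Scale()
names = [name for name, _ in m.named_parameters()]

y = m(torch.tensor([2.0])).sum()
y.backward()   # d y / d alpha = x = 2

print(names, m.alpha.grad, y.grad_fn)
```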

**At this point, we covered:**
  -  Defining a neural network
  -  Processing inputs

**Still Left:**
  -  Computing the loss
  -  Calling backward
  -  Updating the weights of the network

## Loss Function
A loss function takes the (output, target) pair of inputs, and computes a
value that estimates how far away the output is from the target.

There are several different
[loss functions](https://pytorch.org/docs/nn.html#loss-functions) under the
``nn`` package.
A simple loss is: ``nn.MSELoss`` which computes the mean-squared error
between the output and the target.

For example:



In [None]:
output = net(input)
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

Now, if you follow ``loss`` in the backward direction, using its
``.grad_fn`` attribute, you will see a graph of computations that looks
like this:



    input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
          -> flatten -> linear -> relu -> linear -> relu -> linear
          -> loss

So, when we call ``loss.backward()``, the whole graph is differentiated
w.r.t. the neural net parameters, and all Tensors in the graph that have
``requires_grad=True`` will have their ``.grad`` Tensor accumulated with the
gradient.
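
A minimal sketch of how such a graph can be inspected, assuming a hypothetical one-layer model in place of the network above. It follows ``.grad_fn`` and then the first entry of ``next_functions`` at each node; the exact node names depend on the PyTorch version, so none are listed here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lin = nn.Linear(4, 2)   # hypothetical tiny model standing in for the network
x = torch.randn(1, 4)
target = torch.randn(1, 2)
loss = nn.MSELoss()(torch.relu(lin(x)), target)

# Walk backward through the graph, always following the first input of each node
node = loss.grad_fn
names = []
while node is not None:
    names.append(type(node).__name__)
    node = node.next_functions[0][0] if node.next_functions else None
print(names)

# After backward, every parameter with requires_grad=True holds a gradient
loss.backward()
print(lin.weight.grad.shape, lin.bias.grad.shape)
```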



## Backprop
To backpropagate the error, all we have to do is call ``loss.backward()``.
You need to clear the existing gradients first, though; otherwise the new gradients
are accumulated into the existing ones.
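
The accumulation can be observed directly on a small tensor. This is an illustrative sketch with a made-up function $w \mapsto \sum_i w_i^2$, whose gradient is $2w$:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w ** 2).sum().backward()
first = w.grad.clone()     # gradient 2*w

(w ** 2).sum().backward()  # no reset: the new gradient is added to the old one
second = w.grad.clone()

w.grad.zero_()             # reset the gradient buffer
(w ** 2).sum().backward()
third = w.grad.clone()

print(first, second, third)
```

Without the reset the second gradient is twice the first one; after ``zero_()`` the correct value is recovered.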


Now we shall call ``loss.backward()``, and have a look at conv1's bias
gradients before and after the backward.



In [None]:
net.zero_grad()     # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

##  <font color='blue'> Exercise </font>

1. We would like to define a network that classifies images of size $(3,32,32)$.
Construct a class ``Net`` that applies, in order:
- a convolution with $6$ output channels and a kernel of size $5 \times 5$, followed by a ReLU
- a max pooling of size $2 \times 2$
- a convolution with $16$ output channels and a kernel of size $5 \times 5$, followed by a ReLU
- a linear layer with output size $120$, followed by a ReLU
- a linear layer with output size $84$, followed by a ReLU
- a linear layer with output size $10$

Verify that it works with a random input.

2. Draw the network as in the last slides of the course.

In [None]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        #TODO

    def forward(self, x):
      #TODO


net = Net()

In [None]:
x = #TODO
net(x)

## <font color='blue'> For your information: images as tensors </font>

- When an image is opened with ``PIL`` (as in the cell below), it is in "PIL" format
- It can be converted to a tensor, which then has shape [3,M,N] and values in [0,1]
- To convert a tensor to PIL format, first apply *.clip(0,1)* so that its values lie in [0,1]
- To display it, convert it back to PIL format and use *display*
- To save it, convert it back to PIL format and use *.save('name.png')*
- To display a tensor image with plt.imshow, convert it to numpy and rearrange it to shape [M,N,3]
- The advantage of display is that it shows the image at its true size, whereas *plt.imshow()* performs hidden interpolation.

In [None]:
import torch
from torchvision.transforms.functional import to_tensor, to_pil_image
from IPython.display import display
from PIL import Image
import os

#Open an image
# The URL is quoted so that the shell does not interpret its '&' characters
# (unquoted, they send wget to the background and cut the URL short);
# -O sets the name of the downloaded file
url = "https://render.fineartamerica.com/images/rendered/default/framed-print/images-medium-5/simeon-denis-poisson-french-physicist-science-source.jpg?imgWI=7.5&imgHI=10&sku=CRQ13&mat1=PM918&mat2=&t=2&b=2&l=2&r=2&off=0.5&frameW=0.875"
os.system(f"wget -O poisson.jpg '{url}'")
img_pil = Image.open("poisson.jpg")

#Display the image
img_as_tensor = to_tensor(img_pil)
print(img_as_tensor.shape)
img_as_tensor = img_as_tensor.clip(0,1) #The clip ensures that the image values lie between 0 and 1 (avoids a Python warning)
pil_img = to_pil_image(img_as_tensor) #This converts the tensor to PIL format
display(pil_img)

#Save the image
#pil_img.save('Simeon.png')

#If we wanted to use plt.imshow instead
import matplotlib.pyplot as plt
import numpy as np
npimg = img_as_tensor.numpy()
print(np.transpose(npimg, (1, 2, 0)).shape)
plt.imshow(np.transpose(npimg, (1, 2, 0)))
plt.show()


##  <font color='blue'> Exercise (possibly done during the next TP) </font>

Back to TP1

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.datasets import make_moons, make_circles, make_classification, make_blobs, make_gaussian_quantiles
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from IPython.display import display, clear_output

In [None]:
n_class = 3
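# Three examples of synthetic 2D datasets. Each line overwrites the previous one,
# so only the last is used: comment out lines to select another dataset.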
X, t = make_blobs(n_features=2, centers = n_class,n_samples=100) 
X, t = make_classification(n_features=2, n_redundant=0, n_informative=2, random_state=24, n_classes=n_class, n_clusters_per_class=1,n_samples=200)
X, t = make_gaussian_quantiles(n_features=2, n_classes=n_class, n_samples=500)

X = StandardScaler().fit_transform(X)

X_train, X_test, t_train, t_test = train_test_split(X, t, test_size=.4, random_state=12)

# Number of points in each set:
N_train = X_train.shape[0]
N_test = X_test.shape[0]

figure = plt.figure(figsize=(10, 10))
plt.scatter(X_train[:, 0], X_train[:, 1], marker='o', c=t_train, s=50, edgecolor='k')
plt.scatter(X_test[:, 0], X_test[:, 1], marker='P', c=t_test, s=50, edgecolor='k');
plt.show()

 <font color='blue'> **Question :** Code a network composed of a linear layer with output size $d$ followed by a ReLU and a final layer that performs the classification  </font>

In [None]:
import torch.nn as nn
import torch.nn.functional as F

d = 4

class Net(nn.Module):
    def __init__(self):
      #TODO

    def forward(self, x):
      #TODO


net = Net()

 <font color='blue'> **Question :** Complete the next cell </font>

In [None]:
import torch.optim as optim

criterion =  #TODO
optimizer = optim.SGD(______________, lr=0.001, momentum=0.9) #TODO

#Convert the NumPy arrays to tensors
X_train, X_test, t_train, t_test = torch.tensor(X_train, dtype=torch.float32), torch.tensor(X_test, dtype=torch.float32), torch.tensor(t_train, dtype=torch.int64), torch.tensor(t_test, dtype=torch.int64)

In [None]:
#training of the network

for epoch in range(100):  # loop over the dataset multiple times

    for i, x in enumerate(X_train, 0):
        # add a batch dimension to the sample and to its label
        x = x.unsqueeze(0)
        t = t_train[i].unsqueeze(0)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        output = net(x)
        loss = criterion(output, t)
        loss.backward()
        optimizer.step()

print('Finished Training')

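Once training has finished, one possible numerical check is the classification accuracy: the predicted class is the index of the largest output, as in the visualization cell. The sketch below uses made-up outputs and labels (the names `logits` and `labels` are hypothetical) rather than the trained network, so that it runs on its own.

```python
import torch

# Made-up network outputs for 4 samples and 3 classes, with their true labels
logits = torch.tensor([[2.0, 0.1, -1.0],
                       [0.2, 0.3, 1.5],
                       [0.0, 3.0, 0.5],
                       [1.0, 0.9, 0.8]])
labels = torch.tensor([0, 2, 1, 2])

pred = logits.argmax(dim=1)                         # predicted class per sample
accuracy = (pred == labels).float().mean().item()   # fraction of correct predictions
print(pred, accuracy)
```

With the real data, `logits` would presumably be replaced by the network output on `X_test` (computed under ``torch.no_grad()``) and `labels` by `t_test`.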
In [None]:
#visualize results:

x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
h = 0.02
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                        np.arange(y_min, y_max, h))
N_grid = xx.ravel().shape[0]
# Stack the grid coordinates into an array of shape (N_grid, 2)
X_grid = np.c_[xx.ravel(), yy.ravel()]

feature_transform = lambda x : (net(torch.tensor(x, dtype=torch.float32).unsqueeze(0)).detach().numpy())

Phi_grid = feature_transform(X_grid)

Z =np.argmax(Phi_grid,axis=2)
Z = Z.reshape(xx.shape)

figure = plt.figure(figsize=(16, 8))
ax = plt.subplot(1,2,1)
ax.set_title("Input data")
ax.scatter(X_train[:, 0], X_train[:, 1], marker='o', c=t_train, s=50, edgecolor='k')
ax.scatter(X_test[:, 0], X_test[:, 1], marker='P', c=t_test, s=50, edgecolor='k')
ax = plt.subplot(1,2,2)
cmap = ListedColormap(['b','y','r','m','g','c'])
plt.contourf(xx,yy,Z,  cmap = cmap, alpha=.8)
ax.scatter(X_train[:, 0], X_train[:, 1], marker='o', c=t_train, s=50, edgecolor='k')
ax.scatter(X_test[:, 0], X_test[:, 1], marker='P', c=t_test, s=50, edgecolor='k')