# From NumPy to PyTorch

**What you will learn in this notebook**:
- Tensor syntax
- Autograd
- Neural Network modules

Let's get back to last week's exercise and migrate it to PyTorch. Luckily, the syntax is almost identical. The main difference is that *arrays* are replaced by *tensors*, and most `np.*` functions have a `torch.*` counterpart. For more advanced functionality, we refer you to the [official documentation][torch_doc].

[torch_doc]: https://pytorch.org/docs/stable/index.html
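As a quick illustration of how close the two syntaxes are, here is a minimal sketch that runs the same operations on an array and on a tensor. The variable names are only illustrative and are not reused later in the notebook.

```python
import numpy as np
import torch

x_arr = np.array([[1.0, 2.0], [3.0, 4.0]])
x_ten = torch.from_numpy(x_arr)  # shares memory with x_arr and keeps dtype float64

# Same operations, different namespace
exp_arr = np.exp(x_arr).sum(axis=0)
exp_ten = torch.exp(x_ten).sum(dim=0)  # torch uses `dim` where NumPy uses `axis`

# The matrix product uses the same `@` operator
prod_arr = x_arr @ x_arr.T
prod_ten = x_ten @ x_ten.T

# `.numpy()` converts a CPU tensor back to an array
print(np.allclose(exp_arr, exp_ten.numpy()))
print(np.allclose(prod_arr, prod_ten.numpy()))
```

One difference worth remembering: `torch.tensor(...)` and `torch.randn(...)` default to `float32`, whereas NumPy defaults to `float64`.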

## Single layer MLP in NumPy

Recall the feedforward neural network with a single hidden layer.

![simple_mlp](./simple_mlp.png)

Below is the NumPy implementation of the activation function and of the feedforward propagation:

In [None]:
import numpy as np
from typing import Tuple
from numpy.typing import NDArray

def np_sigmoid(t):
    """apply sigmoid function on t."""
    return 1.0 / (1 + np.exp(-t))

def np_grad_sigmoid(t):
    """return the derivative of sigmoid on t."""
    return np_sigmoid(t) * (1 - np_sigmoid(t))

def np_mlp(
    x: NDArray[np.float64], w_1: NDArray[np.float64], w_2: NDArray[np.float64]
) -> Tuple[NDArray[np.float64], NDArray[np.float64], NDArray[np.float64]]:
    """Feed forward propagation on MLP

    Args:
        x (NDArray[np.float64]): Input vector of shape (d_in,)
        w_1 (NDArray[np.float64]): Parameter matrix of first hidden layer, of shape (d_in, d_hid)
        w_2 (NDArray[np.float64]): Parameter vector of output layer, of shape (d_hid,)

    Returns:
        Tuple[NDArray[np.float64], NDArray[np.float64], NDArray[np.float64]]: Three
            arrays `y_hat`, `z_1`, `z_2`, containing respectively the output and
            the two preactivations.
    """
    z_1 = w_1.T @ x
    x_1 = np_sigmoid(z_1)
    z_2 = w_2.T @ x_1
    y_hat = np_sigmoid(z_2)
    
    return y_hat, z_1, z_2


And this is the backpropagation with the squared error loss $\mathcal L (y, \hat y) = \frac{1}{2} \left( y - \hat y \right)^2$:

In [None]:
def np_mlp_backpropagation(
    y: NDArray[np.int_],
    x: NDArray[np.float64],
    w_2: NDArray[np.float64],
    y_hat: NDArray[np.float64],
    z_1: NDArray[np.float64],
    z_2: NDArray[np.float64],
) -> Tuple[NDArray[np.float64], NDArray[np.float64]]:
    """Do backpropagation and get parameter gradients.

    Args:
        y (NDArray[np.int_]): True label
        x (NDArray[np.float64]): Input data
        w_2 (NDArray[np.float64]): Readout layer parameters
        y_hat (NDArray[np.float64]): MLP output
        z_1 (NDArray[np.float64]): Hidden layer preactivations
        z_2 (NDArray[np.float64]): Readout layer preactivations

    Returns:
        Tuple[NDArray[np.float64], NDArray[np.float64]]: Gradients of w_1 and w_2
    """
    # Feed forward
    _loss = 0.5 * (y - y_hat)**2

    # Backpropagation
    delta_2 = (y_hat - y) * np_grad_sigmoid(z_2)
    x_1 = np_sigmoid(z_1)
    dw_2 = delta_2 * x_1
    delta_1 = delta_2 * w_2 * np_grad_sigmoid(z_1)
    dw_1 = np.outer(x, delta_1)

    return dw_1, dw_2
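For reference, the quantities in this function follow from the chain rule applied to the forward pass. The notation below is introduced here for illustration: $\sigma$ is the sigmoid, $\odot$ the elementwise product, and $\delta_1$, $\delta_2$ correspond to `delta_1`, `delta_2` in the code.

$$
\begin{aligned}
z_1 &= W_1^\top x, \qquad x_1 = \sigma(z_1), \qquad z_2 = w_2^\top x_1, \qquad \hat y = \sigma(z_2) \\
\delta_2 &= \frac{\partial \mathcal L}{\partial z_2} = (\hat y - y)\, \sigma'(z_2) \\
\frac{\partial \mathcal L}{\partial w_2} &= \delta_2 \, x_1 \\
\delta_1 &= \frac{\partial \mathcal L}{\partial z_1} = \delta_2 \, w_2 \odot \sigma'(z_1) \\
\frac{\partial \mathcal L}{\partial W_1} &= x \, \delta_1^\top
\end{aligned}
$$

with $\sigma'(t) = \sigma(t)\left(1 - \sigma(t)\right)$.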

Now we can compute the MLP output and retrieve the gradients:

In [None]:
x_np = np.array([0.01, 0.02, 0.03, 0.04])
w_1_np = np.random.randn(4, 5)
w_2_np = np.random.randn(5)

y = 1

y_hat_np, z_1, z_2 = np_mlp(x_np, w_1_np, w_2_np)
dw_1_np, dw_2_np = np_mlp_backpropagation(y, x_np, w_2_np, y_hat_np, z_1, z_2)

print(dw_1_np.shape)
print(dw_2_np.shape)
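Hand-written gradients are easy to get wrong, so a common sanity check is to compare them with central finite differences. The sketch below is self-contained: it re-implements the same forward pass and gradient formulas in compact form, and the helper names (`loss_fn`, `numerical_grad`, `gradient_check`) are hypothetical, not part of the exercise.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def loss_fn(x, w_1, w_2, y):
    """Forward pass followed by the loss 0.5 * (y - y_hat)**2."""
    y_hat = sigmoid(w_2 @ sigmoid(w_1.T @ x))
    return 0.5 * (y - y_hat) ** 2

def numerical_grad(f, w, eps):
    """Central finite differences of the scalar function f with respect to w."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[idx] += eps
        w_minus[idx] -= eps
        grad[idx] = (f(w_plus) - f(w_minus)) / (2 * eps)
    return grad

def gradient_check(seed=0, eps=1e-6):
    """Return the max absolute error between analytic and numerical gradients."""
    rng = np.random.default_rng(seed)
    x = np.array([0.01, 0.02, 0.03, 0.04])
    w_1 = rng.standard_normal((4, 5))
    w_2 = rng.standard_normal(5)
    y = 1.0

    # Analytic gradients, using sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))
    x_1 = sigmoid(w_1.T @ x)
    y_hat = sigmoid(w_2 @ x_1)
    delta_2 = (y_hat - y) * y_hat * (1 - y_hat)
    dw_2 = delta_2 * x_1
    delta_1 = delta_2 * w_2 * x_1 * (1 - x_1)
    dw_1 = np.outer(x, delta_1)

    # Numerical gradients, perturbing one parameter entry at a time
    num_dw_1 = numerical_grad(lambda w: loss_fn(x, w, w_2, y), w_1, eps)
    num_dw_2 = numerical_grad(lambda w: loss_fn(x, w_1, w, y), w_2, eps)

    return np.abs(dw_1 - num_dw_1).max(), np.abs(dw_2 - num_dw_2).max()

err_1, err_2 = gradient_check()
print(f"max abs error on dw_1: {err_1:.2e}")
print(f"max abs error on dw_2: {err_2:.2e}")
```

Both errors should be tiny compared with the gradient values. Note that this check needs two forward passes per parameter entry, so it is only practical for small models.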

This indeed works, but as soon as we change the neural network architecture we have to change our backpropagation function, and keep track of all the computations that involve each parameter. It is a lot of work which we want to delegate to the machine.
This is what *automatic differentiation* does, and libraries like PyTorch implement it.

## Exercise 1

We can manipulate tensors as we want and, by asking for `requires_grad=True`, PyTorch handles automatic differentiation!

In [None]:
import torch

In [None]:
# EXAMPLE

# Uniform samples in [0, 1) keep `a @ b` positive, so the log below is not NaN
a = torch.rand(10, 5)
b = torch.ones(5, requires_grad=True)

# Note that c is a scalar
c = torch.log(a @ b).sum()
print("c", c)
print("c.shape:", c.shape)
print()

# We ask to perform backpropagation
c.backward()

print("b.grad:", b.grad)
print("b.grad.shape:", b.grad.shape)
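To see what `backward()` actually computed, the gradient can be compared with the chain rule applied by hand: for $c = \sum_i \log\left((a b)_i\right)$ we get $\partial c / \partial b = a^\top \left(1 / (a b)\right)$. The sketch below is a standalone variant of the example, with a small offset added to `a` (an assumption made here) so that the logarithm is always finite.

```python
import torch

torch.manual_seed(0)
a = torch.rand(10, 5) + 0.1  # strictly positive entries, so the log is finite
b = torch.ones(5, requires_grad=True)

c = torch.log(a @ b).sum()
c.backward()

# Chain rule by hand: dc/db = a^T (1 / (a b))
with torch.no_grad():
    manual_grad = a.T @ (1.0 / (a @ b))

matches = torch.allclose(b.grad, manual_grad)
print("autograd matches the manual gradient:", matches)

# Gradients accumulate across backward() calls, so reset them before reusing b
b.grad.zero_()
```

The accumulation behaviour is the reason training loops reset gradients at every step.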


We now convert the previous code to PyTorch. Autograd is responsible for keeping track of every operation in the computation, so we only need to implement the forward pass!

In [None]:
def sigmoid(t: torch.Tensor) -> torch.Tensor:
    """apply sigmoid function on t."""
    #vvvvv YOUR CODE HERE vvvvv#
    return ...

    #^^^^^^^^^^^^^^^^^^^^^^^^^^#

def mlp(
    x: torch.Tensor, w_1: torch.Tensor, w_2: torch.Tensor
) -> torch.Tensor:
    """Feed forward propagation on MLP

    Args:
        x (torch.Tensor): Input vector of shape (d_in,)
        w_1 (torch.Tensor): Parameter matrix of first hidden layer, of shape (d_in, d_hid)
        w_2 (torch.Tensor): Parameter vector of output layer, of shape (d_hid,)

    Returns:
        torch.Tensor: Network output
    """
    #vvvvv YOUR CODE HERE vvvvv#
    z_1 = ...
    x_1 = ...
    z_2 = ...
    y_hat = ...

    #^^^^^^^^^^^^^^^^^^^^^^^^^^#
    
    return y_hat


Now we can verify that the gradients match those of the NumPy implementation:

In [None]:
#vvvvv YOUR CODE HERE vvvvv#

# Convert arrays to tensors. Mind that we will ask for the parameter gradients!
x = torch.tensor(x_np)
w_1 = ...
w_2 = ...


y_hat = mlp(x, w_1, w_2)
loss = ...

#Now perform backpropagation
loss.backward()

#^^^^^^^^^^^^^^^^^^^^^^^^^^#

print(np.allclose(w_1.grad.numpy(), dw_1_np))
print(np.allclose(w_2.grad.numpy(), dw_2_np))


## Exercise 2

Computing gradients just got much easier! :grin:

Still, PyTorch provides an even easier interface to build and train neural networks, whose components are in the `torch.nn` module.
The main tool is the `torch.nn.Module` class, from which all neural networks should inherit. A subclass must implement a `forward` method and, if needed, declare its parameters in the `__init__` method.
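To make the pattern concrete without giving away the exercise, here is a minimal sketch of an unrelated toy module. `ScaleShift` is a hypothetical name chosen for illustration; it only shows where parameters are declared and how `forward` is invoked.

```python
import torch

class ScaleShift(torch.nn.Module):
    """Hypothetical toy module computing scale * x + shift elementwise."""

    def __init__(self, dim: int) -> None:
        super().__init__()  # must be called before registering parameters
        self.scale = torch.nn.Parameter(torch.ones(dim))
        self.shift = torch.nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.scale * x + self.shift

layer = ScaleShift(3)
out = layer(torch.tensor([1.0, 2.0, 3.0]))  # calling the module runs forward()
out.sum().backward()

# Attributes wrapped in nn.Parameter are registered and tracked by autograd
for name, param in layer.named_parameters():
    print(name, tuple(param.shape), param.grad)
```

Tensors wrapped in `torch.nn.Parameter` require gradients by default, and the module exposes them through `parameters()` and `named_parameters()`.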

Let's convert our MLP to a proper Module:

In [None]:
class MLP(torch.nn.Module):
    def __init__(self, dim_in: int, dim_hidden: int) -> None:
        #vvvvv YOUR CODE HERE vvvvv#
        super().__init__()

        self.w_1 = torch.nn.Parameter(
            torch.randn(...),
            requires_grad=True,
        )
        self.w_2 = ...

        #^^^^^^^^^^^^^^^^^^^^^^^^^^#
    
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        #vvvvv YOUR CODE HERE vvvvv#
        raise NotImplementedError

        #^^^^^^^^^^^^^^^^^^^^^^^^^^#

Even better, `torch.nn` comes with a lot of layers and functions which are ready to use.

For instance, we have a `torch.sigmoid` function, as well as a `torch.nn.Linear` layer and a `torch.nn.MSELoss` loss.
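Two conventions of these built-ins differ from the hand-written version above and are easy to miss. The sketch below, with arbitrary illustrative dimensions and names, checks them: `nn.Linear` stores its weight with the output dimension first, and `nn.MSELoss` averages the squared errors without the factor $\frac{1}{2}$ used in our loss.

```python
import torch
from torch import nn

torch.manual_seed(0)

# nn.Linear(in_features, out_features) stores its weight as (out_features, in_features)
linear = nn.Linear(3, 2, bias=False)
inp = torch.randn(3)
print(tuple(linear.weight.shape))
print(torch.allclose(linear(inp), linear.weight @ inp))

# nn.MSELoss averages the squared errors and has no 1/2 factor
pred = torch.tensor([0.3])
target = torch.tensor([0.1])
mse = nn.MSELoss()(pred, target)
print(torch.allclose(mse, ((pred - target) ** 2).mean()))
print(torch.allclose(0.5 * mse, 0.5 * (target - pred) ** 2))
```

As a consequence, gradients obtained with `nn.MSELoss` on a single output are twice those of the loss $\frac{1}{2}(y - \hat y)^2$.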

Complete this minimal implementation:

In [None]:
from torch import nn

class MyMLP(nn.Module):
    def __init__(self, dim_in: int, dim_hidden: int) -> None:
        super().__init__()

        # NOTE: Linear has a `bias` term by default!
        #vvvvv YOUR CODE HERE vvvvv#
        self.linear1 = nn.Linear(...,  bias=False)
        self.linear2 = ...

        #^^^^^^^^^^^^^^^^^^^^^^^^^^#
    
    def forward(self, x):
        x = self.linear1(x).sigmoid()
        return self.linear2(x).sigmoid()

Now initialize your model and compute the gradients with respect to the MSE loss:

In [None]:
DIM_IN = 5
DIM_HIDDEN = 10

x = torch.ones(DIM_IN)
y = torch.tensor([0.1])

#vvvvv YOUR CODE HERE vvvvv#

my_mlp = ...
print(my_mlp)

loss = ...

# Backpropagate
...

#^^^^^^^^^^^^^^^^^^^^^^^^^^#

## Question

Check the sizes of the gradients of each layer and verify that they correspond to what you expect.

In [None]:
#vvvvv YOUR CODE HERE vvvvv#

#^^^^^^^^^^^^^^^^^^^^^^^^^^#

The `nn.Sequential` module stacks the given layers one after the other.
Still, to get more control over the forward pass, it is better to stick to a custom module.

In [None]:
sequential_mlp = nn.Sequential(
    nn.Linear(DIM_IN, DIM_HIDDEN, bias=False),
    nn.Sigmoid(),
    nn.Linear(DIM_HIDDEN, 1, bias=False),
    nn.Sigmoid(),
)

print(sequential_mlp)
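A `nn.Sequential` model is called like any other module, and its layers can be iterated over or indexed by position. The following standalone sketch rebuilds the same stack with literal sizes (5 and 10, matching `DIM_IN` and `DIM_HIDDEN` above) and checks that applying the layers one by one reproduces the full forward pass.

```python
import torch
from torch import nn

model = nn.Sequential(
    nn.Linear(5, 10, bias=False),
    nn.Sigmoid(),
    nn.Linear(10, 1, bias=False),
    nn.Sigmoid(),
)

inp = torch.ones(5)
out = model(inp)

# Applying the layers one by one reproduces the full forward pass
h = inp
for layer in model:
    h = layer(h)
print(torch.allclose(out, h))

# Parameter names are derived from the position of each layer in the stack
param_names = [name for name, _ in model.named_parameters()]
print(param_names)
```

Only the two `nn.Linear` layers contribute parameters, since `nn.Sigmoid` has none.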