# Multilayer perceptron with PyTorch

### Problem formulation

Let $\mathbf{x} \in \mathbb{R}^2$ denote the input vector and $y \in \{0,1\}$ the corresponding label.

We assume that there exists a function $f(\cdot; \boldsymbol\theta): \mathbb{R}^2 \to [0,1]$, parametrized by $\boldsymbol\theta$, such that:

$$p(y=1|\mathbf{x} ; \boldsymbol\theta) = f(\mathbf{x}; \boldsymbol\theta) = \hat{y}, \qquad p(y=0|\mathbf{x} ; \boldsymbol\theta) = 1- f(\mathbf{x}; \boldsymbol\theta) = 1- \hat{y}$$

Let's first load the data.

In [None]:
import numpy as np
import torch
import torch.nn as nn
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from utils import plot_data, plot_decision_boundary

# Fix the random seeds so that the data and the initial weights are reproducible
my_seed = 1
np.random.seed(my_seed)
torch.manual_seed(my_seed)

In [None]:
X, Y = make_moons(n_samples=2000, noise=0.1)

X_train, X_val, Y_train, Y_val = train_test_split(X, Y, test_size=0.20, random_state=my_seed)

X_train_tensor = torch.from_numpy(X_train).float()
Y_train_tensor = torch.from_numpy(Y_train).float()
X_val_tensor = torch.from_numpy(X_val).float()
Y_val_tensor = torch.from_numpy(Y_val).float()

print(X_train_tensor.shape)
print(Y_train_tensor.shape)
print(X_val_tensor.shape)
print(Y_val_tensor.shape)

In [None]:
%matplotlib inline
fig, ax = plt.subplots(1, 1, facecolor='#4B6EA9')
ax.set_title('training data')
plot_data(ax, X_train, Y_train)

### Model

In the following cell, define a class `MyFirstMLP` that implements a simple multilayer perceptron with one hidden layer. This class should inherit from [torch.nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module), the base class for all neural network modules in PyTorch.

In [None]:
class MyFirstMLP(nn.Module):
    # TO DO
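
As a starting point, here is one possible sketch of a one-hidden-layer network. The class name `SketchMLP`, the ReLU activation, and the final sigmoid are assumptions made for illustration; the constructor arguments simply mirror the call `MyFirstMLP(input_dim=2, output_dim=1, hidden_dim=50)` used in this notebook. The sigmoid is there so that the output lies in $[0,1]$ and can be read as $\hat{y}$.

```python
import torch
import torch.nn as nn


class SketchMLP(nn.Module):
    """One-hidden-layer MLP for binary classification (illustrative sketch)."""

    def __init__(self, input_dim, output_dim, hidden_dim):
        super().__init__()
        self.hidden = nn.Linear(input_dim, hidden_dim)   # input -> hidden
        self.activation = nn.ReLU()                      # assumed non-linearity
        self.output = nn.Linear(hidden_dim, output_dim)  # hidden -> output

    def forward(self, x):
        h = self.activation(self.hidden(x))
        # Sigmoid maps the output to (0, 1), i.e. an estimate of p(y=1 | x)
        return torch.sigmoid(self.output(h))


torch.manual_seed(0)
sketch_nn = SketchMLP(input_dim=2, output_dim=1, hidden_dim=50)
x_batch = torch.randn(4, 2)  # a batch of 4 two-dimensional inputs
y_hat = sketch_nn(x_batch)
print(y_hat.shape)
```

Layers assigned as attributes of an `nn.Module` are registered automatically, which is what later lets an optimizer find them through `parameters()`.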

We can instantiate this MLP, which creates a model with randomly initialized parameters. Let's have a look at the resulting decision boundary.

In [None]:
my_nn = MyFirstMLP(input_dim=2, output_dim=1, hidden_dim=50)

fig, ax = plt.subplots(1, 1, facecolor='#4B6EA9')
plot_decision_boundary(ax, X_train_tensor, Y_train_tensor, my_nn, use_tensor=True)

### Training

Now we need to train the model from a labeled training dataset $\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^N$. As seen during the lesson, we need to define a loss function:

$$\mathcal{L}(\boldsymbol\theta) = \frac{1}{N} \sum_{i=1}^N \ell(y_i, \hat{y}_i), \qquad \hat{y}_i = f(\mathbf{x}_i; \boldsymbol\theta).$$

For binary classification we use the **binary cross-entropy** loss:

$$ \ell(y, \hat{y}) = - (y \ln(\hat{y}) + (1-y)\ln(1-\hat{y})). $$
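
To make the formula concrete, the following sketch evaluates the binary cross-entropy by hand on four made-up predictions and compares the mean with `torch.nn.functional.binary_cross_entropy`. The label and prediction values are arbitrary and chosen only for illustration.

```python
import torch
import torch.nn.functional as F

# Made-up labels and predicted probabilities (illustration only)
y = torch.tensor([1.0, 0.0, 1.0, 0.0])
y_hat = torch.tensor([0.9, 0.2, 0.6, 0.7])

# Per-example loss, written exactly as in the formula above
per_example = -(y * torch.log(y_hat) + (1 - y) * torch.log(1 - y_hat))

# Averaging over the examples gives the loss L(theta)
manual_loss = per_example.mean()

# PyTorch's built-in version (mean reduction by default)
builtin_loss = F.binary_cross_entropy(y_hat, y)

print(per_example)
print(manual_loss.item(), builtin_loss.item())
```

The last example ($y=0$, $\hat{y}=0.7$) is the most confidently wrong prediction, so it contributes the largest term to the sum.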

To estimate the model parameters $\boldsymbol\theta$ we have to minimize the loss function $\mathcal{L}(\boldsymbol\theta)$. To do so, we can use the [gradient descent](https://en.wikipedia.org/wiki/Gradient_descent) algorithm, an iterative method that repeats the update:

$$ \boldsymbol\theta \leftarrow \boldsymbol\theta - \gamma \nabla \mathcal{L}(\boldsymbol\theta), $$

where $\gamma$ is the learning rate. Both the learning rate and the initialization of the parameters have a critical influence on the behavior of the algorithm.
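
The influence of the learning rate can be seen on a toy problem. The sketch below applies the update rule to the one-dimensional loss $\mathcal{L}(\theta) = (\theta - 3)^2$, whose minimum is at $\theta = 3$. The loss, the helper `gradient_descent`, and the three learning rates are assumptions chosen for illustration and are not part of the notebook.

```python
def grad_toy_loss(theta):
    """Gradient of the toy loss L(theta) = (theta - 3)^2."""
    return 2.0 * (theta - 3.0)


def gradient_descent(grad, theta0, lr, n_steps):
    """Apply theta <- theta - lr * grad(theta) for n_steps iterations."""
    theta = theta0
    for _ in range(n_steps):
        theta = theta - lr * grad(theta)
    return theta


# Same starting point and number of steps, three different learning rates
theta_small = gradient_descent(grad_toy_loss, theta0=0.0, lr=0.001, n_steps=50)
theta_good = gradient_descent(grad_toy_loss, theta0=0.0, lr=0.1, n_steps=50)
theta_large = gradient_descent(grad_toy_loss, theta0=0.0, lr=1.1, n_steps=50)

print(theta_small, theta_good, theta_large)
```

On this loss, each step multiplies the distance to the minimum by $(1 - 2\gamma)$: with $\gamma = 0.001$ the iterate barely moves, with $\gamma = 0.1$ it converges quickly, and with $\gamma = 1.1$ the factor has magnitude greater than one, so the iterates diverge.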

We have seen during the lesson that the gradient is computed using an algorithm called backpropagation. Fortunately, PyTorch handles this step automatically.
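
As a small illustration of this automatic differentiation, the sketch below builds a single-example logistic model $\hat{y} = \sigma(\boldsymbol\theta^\top \mathbf{x})$, calls `backward()` on its binary cross-entropy loss, and compares the result with the known closed-form gradient $(\hat{y} - y)\,\mathbf{x}$. The numerical values and the choice of a linear model are assumptions made to keep the check short.

```python
import torch

# Parameters we want gradients for (arbitrary values, illustration only)
theta = torch.tensor([1.0, -2.0], requires_grad=True)
x = torch.tensor([0.5, 2.0])
y = torch.tensor(1.0)

# Forward pass: logistic model followed by binary cross-entropy
y_hat = torch.sigmoid(theta @ x)
loss = -(y * torch.log(y_hat) + (1 - y) * torch.log(1 - y_hat))

# Backward pass: PyTorch fills theta.grad with d(loss)/d(theta)
loss.backward()

# Closed-form gradient for this particular model, for comparison
analytic_grad = (y_hat.detach() - y) * x

print(theta.grad)
print(analytic_grad)
```

For a multilayer network the closed form becomes tedious to derive by hand, but the call to `backward()` stays the same.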

In the following cell, implement the PyTorch pipeline to train the model `my_nn`.

In [1]:
train_loss = []
val_loss = []
# TO DO
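
One possible shape for such a pipeline is sketched below, under several assumptions: the model outputs probabilities (so `nn.BCELoss` applies), training uses full-batch gradient descent through `torch.optim.SGD`, and the helper name `train_sketch`, the toy data, and the default values of `n_epochs` and `lr` are all hypothetical choices for illustration. It is demonstrated on a small synthetic dataset so that it runs on its own.

```python
import torch
import torch.nn as nn


def train_sketch(model, X_tr, Y_tr, X_va, Y_va, n_epochs=200, lr=0.1):
    """Full-batch gradient descent; returns per-epoch training and validation losses."""
    criterion = nn.BCELoss()  # expects probabilities in [0, 1]
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    train_loss, val_loss = [], []
    for _ in range(n_epochs):
        model.train()
        optimizer.zero_grad()                            # reset accumulated gradients
        loss = criterion(model(X_tr).squeeze(-1), Y_tr)  # forward pass
        loss.backward()                                  # backpropagation
        optimizer.step()                                 # gradient descent update
        train_loss.append(loss.item())

        model.eval()
        with torch.no_grad():                            # no gradients for validation
            val = criterion(model(X_va).squeeze(-1), Y_va)
        val_loss.append(val.item())
    return train_loss, val_loss


# Toy, linearly separable data (illustration only)
torch.manual_seed(0)
X_toy = torch.randn(200, 2)
Y_toy = (X_toy[:, 0] + X_toy[:, 1] > 0).float()
toy_model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

toy_train_loss, toy_val_loss = train_sketch(
    toy_model, X_toy[:160], Y_toy[:160], X_toy[160:], Y_toy[160:]
)
print(toy_train_loss[0], toy_train_loss[-1])
```

In this notebook, the same loop could be run on `my_nn` with `X_train_tensor`, `Y_train_tensor`, `X_val_tensor`, and `Y_val_tensor`, provided `my_nn` returns probabilities. The `squeeze(-1)` turns an output of shape `(N, 1)` into `(N,)` so that it matches the label tensors.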

In [None]:
plt.plot(train_loss)
plt.plot(val_loss)
plt.legend(["training", "validation"])
plt.title("loss")
plt.xlabel("epochs")

In [None]:
fig, ax = plt.subplots(1, 1, facecolor='#4B6EA9')
plot_decision_boundary(ax, X_train_tensor, Y_train_tensor, my_nn, use_tensor=True)

## Questions

- What results do you obtain if you remove the hidden layer? Why?
- What results do you obtain if you add one or several hidden layers?
- What happens if you choose a learning rate that is either too low or too high?
- How can you use the validation set to choose the number of hidden layers and the number of training iterations?