<a href="https://colab.research.google.com/github/kscaman/DL_ENS/blob/main/TP/MLP_vs_toy_dataset.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Training and visualizing MLPs on toy datasets

In this practical, we will learn how to train a simple feedforward neural network (also known as a Multi-Layer Perceptron, or MLP) on a synthetic dataset and visualize the result.

First import the three libraries we will use: **torch**, **numpy** (as np) and **matplotlib.pyplot** (as plt).

In [None]:
### YOUR CODE HERE ###

## Data generation

Then, create a function `create_checkerboard` that returns two PyTorch tensors: `X` of shape $(100,2)$, where $100$ is the number of data points and each point is drawn uniformly at random in $[-2,2]^2$, and `y` of shape $(100,2)$, where $y_i=(1,0)$ if $X_{i,1} \cdot X_{i,2} > 0$, and $y_i=(0,1)$ otherwise. The vectors $y_i$ are called **one-hot** encodings of the classes 0 and 1.

**IMPORTANT:** Always check the shape of your tensors with `print(X.shape)` to verify that you are computing the right quantity.
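As a point of comparison, here is one possible sketch of the function described above. The parameter `n` and the intermediate name `positive` are assumptions made for illustration; only the shapes and the labeling rule come from the text.

```python
import torch

def create_checkerboard(n=100):
    # n is a hypothetical parameter; the text fixes it to 100 points.
    # torch.rand samples uniformly in [0, 1), so this maps to [-2, 2).
    X = torch.rand(n, 2) * 4 - 2
    # 1.0 where the two coordinates have the same sign, 0.0 otherwise.
    positive = (X[:, 0] * X[:, 1] > 0).float()
    # One-hot labels: (1, 0) if the product is positive, (0, 1) otherwise.
    y = torch.stack([positive, 1 - positive], dim=1)
    return X, y

X, y = create_checkerboard()
print(X.shape, y.shape)
```

Both printed shapes should be `torch.Size([100, 2])`.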



In [None]:
### YOUR CODE HERE ###

# Visualizing the dataset
X, y = create_checkerboard()
print(X.shape, y.shape)
plt.scatter(X[:, 0], X[:, 1], c=y[:,0])
plt.xlabel("$X_1$")
plt.ylabel("$X_2$")
plt.title("Checkerboard dataset")
plt.show()

## Model creation

First, let's create a two-layer MLP with ReLU activations, 2 inputs, 2 outputs, and 10 internal (hidden) neurons using `torch.nn.Sequential`, `torch.nn.Linear` and `torch.nn.ReLU` (see the PyTorch documentation).

Verify that the model outputs a 2-dimensional vector for each input data point on the checkerboard dataset.
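A minimal sketch of such a model, assuming that "two-layer" means two linear layers separated by a single ReLU. The random tensor `X` below is only a stand-in for the checkerboard inputs.

```python
import torch

torch.manual_seed(0)
# Stand-in inputs with the same shape as the checkerboard data (assumption).
X = torch.rand(100, 2) * 4 - 2

model = torch.nn.Sequential(
    torch.nn.Linear(2, 10),   # 2 inputs -> 10 hidden neurons
    torch.nn.ReLU(),
    torch.nn.Linear(10, 2),   # 10 hidden neurons -> 2 outputs
)

out = model(X)
print(out.shape)  # one 2-dimensional output per data point
```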

In [None]:
### YOUR CODE HERE ###

## Training pipeline

The following code is a routine to visualize the output of the network and the loss during training. Here, `epoch` is the number of training epochs completed so far, `data` is a tuple containing `X, y`, `model` is a neural network and `losses` is a list of loss values.

In [None]:
from IPython.display import clear_output

def visualize(epoch, data, model, losses):
    X, y = data
    # Creates a 2d grid
    xx, yy = np.meshgrid(np.linspace(-2.5, 2.5, 100), np.linspace(-2.5, 2.5, 100))

    # Uses the model to predict the class
    grid_tensor = torch.FloatTensor(np.stack([xx.ravel(), yy.ravel()], axis=1))
    with torch.no_grad():
        Z = model(grid_tensor)
        Z = Z[:,0] - Z[:,1]

    # Reshapes the predictions to fit the grid
    Z = Z.reshape(xx.shape)

    # Plots the output of the neural network
    fig = plt.figure(1, figsize=(10, 4))
    plt.subplot(1,2,1)
    CS = plt.contourf(xx, yy, Z, alpha=0.8)
    fig.colorbar(CS)
    plt.scatter(X[:, 0], X[:, 1], c=y[:,0])
    plt.xlim([-2,2])
    plt.ylim([-2,2])
    plt.xlabel("$X_1$")
    plt.ylabel("$X_2$")
    plt.title(f"Classification with an MLP (epoch = {epoch})")

    # Plots the loss function
    plt.subplot(1,2,2)
    plt.plot(losses)
    plt.ylim([0,1])
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.title("Loss vs. number of epochs")
    plt.tight_layout()
    clear_output()
    plt.show()


Our objective is to push the model, for an input `X`, to return an output that matches `y`. To do so, we will minimize the **mean square error** $MSE = \frac{1}{N}\sum_{i=1}^N \|\mbox{model}(X_i) - y_i\|^2$ over the entire dataset using gradient descent.
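To make the formula concrete, the sketch below evaluates it on three made-up outputs and targets (the values are illustrative only). It also compares against `torch.nn.functional.mse_loss`, which by default averages over every entry rather than over data points, so with 2 outputs per point it returns half the value of the formula above.

```python
import torch
import torch.nn.functional as F

# Made-up model outputs and one-hot targets for N = 3 points.
out = torch.tensor([[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]])
y = torch.tensor([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

# Formula from the text: squared norm per point, averaged over the N points.
mse_formula = ((out - y) ** 2).sum(dim=1).mean()

# Built-in loss: averages over all N * 2 entries.
mse_builtin = F.mse_loss(out, y)

print(mse_formula.item(), mse_builtin.item())  # 0.5 0.25
```

The two only differ by a constant factor, so they lead to the same minimizer; the effective step-size differs by that factor.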

Create a function `train(data, model, epochs)` that takes a dataset `data = (X, y)` of inputs and outputs, a neural network `model` and a number of `epochs`, and trains the neural network on the dataset for the specified number of epochs. You will need to create a loop over all epochs, in which you will:

1. Prepare gradient computations using `model.zero_grad()`
2. Compute the output of the `model` over the input data `X`
3. Compute the mean square error loss over the dataset.
4. Use `loss.backward()` to automatically compute the gradients of the loss with respect to the model parameters.
5. Make one step of gradient descent by updating each parameter `p` in `model.parameters()` with the formula $p = p - \eta \nabla L(p)$ where $\eta=10^{-2}$ is the step-size and $L$ is the loss. Note that the gradient computed by `loss.backward()` is accessible in `p.grad`.
6. Once every 1000 epochs, visualize the output of the neural network and loss.
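The steps above can be sketched as follows. This is one possible implementation, not the reference solution: the extra parameters `lr`, `visualize_fn` and `every` are hypothetical additions that keep the block self-contained, and the targets are cast to float in case they are stored as integers.

```python
import torch

def train(data, model, epochs, lr=1e-2, visualize_fn=None, every=1000):
    # lr, visualize_fn and every are assumptions, not part of the exercise.
    X, y = data
    y = y.float()  # integer one-hot labels are cast to float for the loss
    losses = []
    for epoch in range(epochs):
        # 1. Reset the gradients accumulated by the previous step.
        model.zero_grad()
        # 2. Forward pass over the whole dataset.
        output = model(X)
        # 3. Mean over data points of the squared norm of the error.
        loss = ((output - y) ** 2).sum(dim=1).mean()
        # 4. Backward pass: fills p.grad for every parameter p.
        loss.backward()
        # 5. Gradient descent step, done outside of autograd tracking.
        with torch.no_grad():
            for p in model.parameters():
                p -= lr * p.grad
        losses.append(loss.item())
        # 6. Periodic visualization, if a routine was provided.
        if visualize_fn is not None and (epoch + 1) % every == 0:
            visualize_fn(epoch + 1, data, model, losses)
    return losses
```

In the notebook, passing `visualize_fn=visualize` would plug in the plotting routine defined earlier.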


In [None]:
### YOUR CODE HERE ###

Finally, train your MLP on the checkerboard dataset for 10000 epochs. Is it predicting the correct classes? Try with 2, 10, and 100 neurons in the internal layer.
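To compare several widths, a small factory function can help. The sketch below uses a hypothetical helper `make_mlp(hidden)` and reports how many trainable parameters each width gives, assuming the same two-layer layout as before.

```python
import torch

def make_mlp(hidden):
    # Hypothetical helper: 2 inputs -> hidden neurons -> 2 outputs.
    return torch.nn.Sequential(
        torch.nn.Linear(2, hidden),
        torch.nn.ReLU(),
        torch.nn.Linear(hidden, 2),
    )

param_counts = {}
for hidden in [2, 10, 100]:
    model = make_mlp(hidden)
    # Weights and biases of both layers: 5 * hidden + 2 values in total.
    param_counts[hidden] = sum(p.numel() for p in model.parameters())

print(param_counts)  # {2: 12, 10: 52, 100: 502}
```

Each call to `make_mlp` returns a freshly initialized network, so the runs for different widths do not share parameters.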

In [None]:
### YOUR CODE HERE ###

## Harder dataset

The following code creates a more challenging dataset consisting of two intertwined spirals.

In [None]:
def create_spiral():
    # Create a first spiral
    theta = np.linspace(0, 4 * np.pi, 100)
    r = np.linspace(0.5, 2, 100)
    x = r * np.cos(theta)
    y = r * np.sin(theta)

    # Create a second spiral belonging to a different class
    theta = np.linspace(0, 4 * np.pi, 100)
    r = np.linspace(0.5, 2, 100) + 0.3
    x2 = r * np.cos(theta)
    y2 = r * np.sin(theta)

    # Merge the two datasets
    X = np.vstack((np.hstack((x, x2)), np.hstack((y, y2)))).T
    y = np.hstack((np.zeros(100), np.ones(100)))

    # Convert NumPy arrays to PyTorch tensors
    X = torch.FloatTensor(X)
    y = torch.LongTensor([y,1-y]).T

    return X, y

# Visualize the data
X, y = create_spiral()
plt.scatter(X[:, 0], X[:, 1], c=y[:,0])
plt.xlabel("$X_1$")
plt.ylabel("$X_2$")
plt.title("Spiral dataset")
plt.show()

Train your model (create a new model to reinitialize the parameters) on the spiral dataset for 50000 epochs. Is it predicting the correct classes? Try increasing the number of neurons.
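Besides inspecting the plots, one way to check whether the classes are predicted correctly is to measure the fraction of points whose largest output matches the one-hot label. The function name `accuracy` and the toy tensors below are assumptions made for illustration.

```python
import torch

def accuracy(model, X, y):
    # Predicted class: index of the largest of the two outputs.
    with torch.no_grad():
        pred = model(X).argmax(dim=1)
    # True class: index of the 1 in the one-hot label.
    target = y.argmax(dim=1)
    return (pred == target).float().mean().item()

# Toy check with stand-in "models" that ignore their input.
X_demo = torch.zeros(4, 2)
y_demo = torch.tensor([[1, 0], [0, 1], [1, 0], [0, 1]])
perfect = lambda inputs: y_demo.float()
always_first = lambda inputs: torch.tensor([[1.0, 0.0]] * 4)

print(accuracy(perfect, X_demo, y_demo))       # 1.0
print(accuracy(always_first, X_demo, y_demo))  # 0.5
```

Taking the argmax agrees with the sign of `Z[:,0] - Z[:,1]` that the visualization routine plots.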

In [None]:
### YOUR CODE HERE ###