# Non-linearity and activation functions

**Note:** to use this notebook in Google Colab, create a new cell with
the following line and run it.

``` shell
!pip install git+https://gitlab.in2p3.fr/jbarnier/ateliers_deep_learning.git
```

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import torch
from torch import nn

from adl import activations

In the previous notebooks, we used examples where we wanted to model a
linear relationship between several variables. In most real cases,
however, the relationship will not necessarily be linear.

Suppose we still want to model the relationship between a single input
`x` and a target `y`, but now the relationship is sinusoidal.

In [None]:
x = torch.tensor(np.linspace(-np.pi, np.pi, 100).reshape(-1, 1)).float()
y = x.sin()

plt.plot(x, y, ".")
plt.show()

We first try a single linear layer of size 1, which is equivalent to a
linear regression of `y` on `x`.

To do so, we define and instantiate a model class with a single
`nn.Linear` layer whose `in_features` and `out_features` are both 1.

In [None]:
class SingleLinearModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Linear(1, 1)

    def forward(self, x):
        return self.model(x)


linear_model = SingleLinearModel()


We then use predefined functions to train the model and plot the target
values and the trained model predictions.

In [None]:
trained_model = activations.train(linear_model, x, y)
activations.plot(x, y, trained_model)
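
The internals of `activations.train` are not shown in this notebook. As a rough idea of what such a helper might do, here is a minimal sketch of a full-batch training loop. The function name `train_sketch`, the choice of mean squared error and the Adam optimizer, and the `epochs` and `lr` values are all assumptions made for illustration, not the actual `adl` implementation.

```python
import math

import torch
from torch import nn


def train_sketch(model, x, y, epochs=500, lr=0.05):
    """Hypothetical stand-in for a training helper (not the adl code)."""
    loss_fn = nn.MSELoss()  # assumed loss: mean squared error
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # assumed optimizer
    for _ in range(epochs):
        optimizer.zero_grad()        # reset gradients from the previous step
        loss = loss_fn(model(x), y)  # forward pass on the whole dataset
        loss.backward()              # compute gradients
        optimizer.step()             # update weights and biases
    return model


torch.manual_seed(0)
x_demo = torch.linspace(-math.pi, math.pi, 100).reshape(-1, 1)
y_demo = x_demo.sin()

demo_model = nn.Linear(1, 1)
with torch.no_grad():
    initial_loss = nn.functional.mse_loss(demo_model(x_demo), y_demo).item()

train_sketch(demo_model, x_demo, y_demo)
with torch.no_grad():
    final_loss = nn.functional.mse_loss(demo_model(x_demo), y_demo).item()

print(f"MSE before training: {initial_loss:.3f}, after: {final_loss:.3f}")
```

The loss decreases during training, but for a single linear layer it cannot reach zero on sinusoidal data, since the best the model can do is a straight line.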

We can see that the result is a straight line, which is not a good
representation of our data.

Maybe we could improve the model by adding another linear layer with a
hidden dimension of size 5? Perhaps this would let it capture more
nuanced relationships.

To do this, we modify the `model` attribute of our model class and use
`nn.Sequential`, which lets us define a series of layers that are
applied sequentially to our input data.

In [None]:
class LinearModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(1, 5),
            nn.Linear(5, 1),
        )

    def forward(self, x):
        return self.model(x)


linear_model = LinearModel()


trained_model = activations.train(linear_model, x, y)
activations.plot(x, y, trained_model)


We can see that the result is still a straight line: that’s because a
composition of linear transformations is itself a linear
transformation, however many layers we stack.
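
To make this concrete: if the first layer computes $W_1 x + b_1$ and the second computes $W_2 h + b_2$, then stacking them gives $W_2 (W_1 x + b_1) + b_2 = (W_2 W_1) x + (W_2 b_1 + b_2)$, which has the form of a single linear layer. The sketch below checks this numerically on two freshly initialized layers; the variable names are illustrative only.

```python
import torch
from torch import nn

torch.manual_seed(0)

first = nn.Linear(1, 5)
second = nn.Linear(5, 1)
stacked = nn.Sequential(first, second)

with torch.no_grad():
    # Collapse the two layers into a single weight and bias.
    merged_weight = second.weight @ first.weight            # shape (1, 1)
    merged_bias = second.weight @ first.bias + second.bias  # shape (1,)

    inputs = torch.linspace(-3.0, 3.0, 7).reshape(-1, 1)
    stacked_out = stacked(inputs)
    merged_out = inputs @ merged_weight.T + merged_bias

# The two outputs agree up to floating point rounding.
outputs_match = torch.allclose(stacked_out, merged_out, atol=1e-5)
print(outputs_match)
```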

## Activation functions

To capture non-linear relationships, deep neural networks use
*activation functions*, *i.e.* functions that introduce non-linearity
between layers.

Many activation functions are available. Below is a plot of three of
them: ReLU, Sigmoid and Tanh.

In [None]:
activation_fns = {"ReLU": nn.ReLU(), "Sigmoid": nn.Sigmoid(), "Tanh": nn.Tanh()}
activations.plot_activation_fns(activation_fns)

-   The ReLU function keeps all positive values as they are, and
    transforms all negative values to 0.
-   The Sigmoid function maps values to the range between 0 and 1.
-   The Tanh function maps values to the range between -1 and 1.
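
As a quick check of these three behaviours, the following sketch applies each function to a small tensor containing a negative value, zero and a positive value. The tensor `values` is just an illustrative input.

```python
import torch
from torch import nn

values = torch.tensor([-2.0, 0.0, 2.0])

relu_out = nn.ReLU()(values)        # negatives become 0, positives unchanged
sigmoid_out = nn.Sigmoid()(values)  # every output lies strictly between 0 and 1
tanh_out = nn.Tanh()(values)        # every output lies strictly between -1 and 1

print("ReLU:   ", relu_out)
print("Sigmoid:", sigmoid_out)
print("Tanh:   ", tanh_out)
```

Note that Sigmoid maps 0 to 0.5, while ReLU and Tanh both map 0 to 0.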

In general, activation functions are fixed functions applied to their
inputs. They don’t “learn” anything during the training process and
don’t add any parameters to the model (there are some exceptions, like
`PReLU`).
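
One way to verify this is to count the trainable parameters of each module. The helper `count_parameters` below is a hypothetical name introduced for this sketch.

```python
from torch import nn


def count_parameters(module):
    """Total number of trainable values in a module (illustrative helper)."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


relu_params = count_parameters(nn.ReLU())          # no parameters at all
prelu_params = count_parameters(nn.PReLU())        # one learnable slope by default
linear_params = count_parameters(nn.Linear(1, 5))  # 5 weights + 5 biases

print(relu_params, prelu_params, linear_params)
```

Inserting `nn.ReLU()`, `nn.Sigmoid()` or `nn.Tanh()` between two linear layers therefore leaves the number of parameters to train unchanged.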

If we want to add an activation layer to our model, we just have to
insert it where we want, for example between our two linear layers.

Here is the result if we insert an `nn.ReLU()` layer.

In [None]:
class ReluModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(1, 5),
            nn.ReLU(),
            nn.Linear(5, 1),
        )

    def forward(self, x):
        return self.model(x)


relu_model = ReluModel()

trained_model = activations.train(relu_model, x, y)
activations.plot(x, y, trained_model)


We can see that `ReLU` breaks the linearity: the prediction is now made
of several straight “segments” joined together, which fit our data much
better.

We can replace `nn.ReLU()` with `nn.Sigmoid()` to use a Sigmoid
activation layer instead.

In [None]:
class SigmoidModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(1, 5),
            nn.Sigmoid(),
            nn.Linear(5, 1),
        )

    def forward(self, x):
        return self.model(x)


sigmoid_model = SigmoidModel()

trained_model = activations.train(sigmoid_model, x, y)
activations.plot(x, y, trained_model)


Because our data is sinusoidal, and therefore smooth, the Sigmoid
function lets the model fit the relationship quite smoothly.
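
The `ReluModel` and `SigmoidModel` classes differ only in their activation layer. One possible way to avoid repeating the class definition is to pass the activation as an argument. The class `ActivationModel` and its `hidden_dim` parameter are hypothetical names introduced for this sketch; they are not part of the `adl` package.

```python
import torch
from torch import nn


class ActivationModel(nn.Module):
    """Hypothetical helper: two linear layers with a configurable activation."""

    def __init__(self, activation, hidden_dim=5):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(1, hidden_dim),
            activation,  # any activation module, e.g. nn.ReLU() or nn.Sigmoid()
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, x):
        return self.model(x)


# Same architecture as SigmoidModel, built from the generic class.
generic_sigmoid_model = ActivationModel(nn.Sigmoid())
sample_out = generic_sigmoid_model(torch.zeros(4, 1))
print(sample_out.shape)
```

Such a model could then be passed to `activations.train()` and `activations.plot()` in the same way as the models defined earlier.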

## Exercise

We want to use a neural network to model the non-linear relationship
between the `xc` and `yc` variables plotted below.

In [None]:
xc = torch.tensor(np.linspace(-2.0, 2.0, 100)).reshape(-1, 1).float()
yc = torch.tensor([0.0] * 30 + [1.0] * 30 + [1.5] * 40).reshape(-1, 1)

plt.plot(xc, yc, ".")
plt.show()

Create the following three models, and for each model train it with the
`activations.train()` function and plot the results with the
`activations.plot()` function:

-   A linear model with two linear layers and a hidden dimension of 10
-   A model with two linear layers, a hidden dimension of 10, and a
    ReLU activation in between
-   A model with two linear layers, a hidden dimension of 3, and a Tanh
    activation in between