# Basic tutorial: regression task
#### Author: Matteo Caorsi

This short tutorial provides you with a basic example of regression.
Regression tasks can be performed in *giotto-deep* as easily as classification ones.

## Scope

The scope of a regression task is similar to that of a classification one. The difference is that the target is a continuous value rather than one of finitely many classes, so the space of possible outputs is infinite! Hence, we cannot simply reuse the losses and metrics of classification tasks (such as cross-entropy or accuracy) for regression. However, the models, the training loop, the optimisation and so on remain basically the same!
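For example, cross-entropy assumes a finite set of classes, whereas regression losses compare real-valued predictions with real-valued targets directly. Below is a minimal NumPy sketch of two common choices. The first is mean squared error, which is what `nn.MSELoss` computes with its default `reduction="mean"`. The second is mean absolute error. The helper names `mse` and `mae` are ours, for illustration only.

```python
import numpy as np


def mse(prediction: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error: average of the squared residuals."""
    return float(np.mean((prediction - target) ** 2))


def mae(prediction: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute error: average of the absolute residuals."""
    return float(np.mean(np.abs(prediction - target)))


pred = np.array([0.9, 1.2, 0.5])
target = np.array([1.0, 1.0, 0.5])
print(mse(pred, target))  # ≈ 0.0167
print(mae(pred, target))  # ≈ 0.1
```

Squaring penalises large errors more heavily than the absolute value does. This is why MSE is a common training loss, while the $L_1$ family is often easier to read as a metric.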

![img](./images/regression.jpeg)

In this tutorial we will fit a linear function of three variables (a hyperplane). This is not super exciting, but you can then have fun and use these tools to fit curves in the *forex market*, for example!

## Content

The main steps of the tutorial are the following:
 1. creation of a dataset
 2. creation of a model
 3. define metrics and losses
 4. run training
 5. visualise results interactively

In [None]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline
import numpy as np
import plotly.express as px
import torch
from torch import nn
import pandas as pd
from sklearn import datasets
from gtda.diagrams import BettiCurve
from gtda.plotting import plot_betti_surfaces
from torch.optim.lr_scheduler import ExponentialLR
from torch.optim import SGD, Adam

from gdeep.search import GiottoSummaryWriter
from gdeep.models import FFNet
from gdeep.visualization import persistence_diagrams_of_activations
from gdeep.trainer import Trainer
from gdeep.data.datasets import DatasetBuilder, FromArray, DataLoaderBuilder
from gdeep.utility import DEVICE


## Initialize the tensorboard writer

To analyse the results of your models, you need to start TensorBoard.
In a terminal, move into the `/example` folder and run the following command:

```
tensorboard --logdir=runs
```

Then, after the training phase, go [here](http://localhost:6006/) to see all the visualization results.

In [None]:
writer = GiottoSummaryWriter()


## Create your dataset

The dataset that we create is a 3D dataset representing a hyperplane. The inputs `X` are drawn uniformly at random from the unit cube, and the targets `y` depend linearly on `X`: `y = 0.3 * (x_1 + x_2 + x_3)`, with no noise added. The goal is to see whether the model recovers `0.3` as the linear coefficient once it has been trained.
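Because the target is exactly linear, an ordinary least-squares fit should recover the coefficient `0.3` for each of the three inputs. Here is a quick NumPy sanity check that is independent of giotto-deep. The random seed and the column of ones used to estimate an intercept are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 3)).astype(np.float32)
y = 0.3 * X.sum(axis=1)  # same relation as the tutorial's targets

# Append a column of ones so that the fit also estimates an intercept.
A = np.hstack([X, np.ones((X.shape[0], 1), dtype=np.float32)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)  # three slopes close to 0.3, intercept close to 0
```

This gives a reference point for the neural network trained below, which has to learn the same relation by gradient descent.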

In [None]:
X_train = np.random.rand(100, 3).astype(np.float32)
# targets lie on the hyperplane y = 0.3 * (x_1 + x_2 + x_3)
y_train = (0.3 * X_train.sum(axis=1, keepdims=True)).astype(np.float32)

X_val = np.random.rand(50, 3).astype(np.float32)
# validation targets must be computed from X_val (not X_train), so shapes match
y_val = (0.3 * X_val.sum(axis=1, keepdims=True)).astype(np.float32)

dl_builder = DataLoaderBuilder((FromArray(X_train, y_train), FromArray(X_val, y_val)))
dl_tr, dl_val, _ = dl_builder.build(({"batch_size": 32}, {"batch_size": 16}))
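With 100 training samples and 50 validation samples, batch sizes of 32 and 16 give four batches each, and the last batch is smaller. This assumes the loaders keep an incomplete final batch, as PyTorch's `DataLoader` does with its default `drop_last=False`. A stdlib sketch of that arithmetic follows. The helper `batch_sizes` is hypothetical.

```python
from typing import List


def batch_sizes(n_samples: int, batch_size: int, drop_last: bool = False) -> List[int]:
    """Sizes of the batches a loader yields for n_samples items."""
    n_full, remainder = divmod(n_samples, batch_size)
    sizes = [batch_size] * n_full
    if remainder and not drop_last:
        sizes.append(remainder)  # the final, incomplete batch
    return sizes


print(batch_sizes(100, 32))  # [32, 32, 32, 4]
print(batch_sizes(50, 16))   # [16, 16, 16, 2]
```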


## Define and train your model

The model is a simple feed-forward network: simple task, simple model.

In [None]:
class model1(nn.Module):
    def __init__(self):
        super().__init__()
        # feed-forward net: 3 inputs -> 5 hidden units -> 1 output
        self.seqmodel = FFNet(arch=[3, 5, 1])

    def forward(self, x):
        return self.seqmodel(x)


model = model1()
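To see what `arch=[3, 5, 1]` describes, here is a NumPy sketch of the forward pass of such a network. It assumes that `FFNet` stacks fully connected layers with a ReLU between them and no activation on the output. That choice is our assumption, so check `gdeep.models.FFNet` for the exact behaviour. The helpers `init_mlp` and `mlp_forward` are hypothetical.

```python
import numpy as np


def init_mlp(arch, rng):
    """One (weight, bias) pair per consecutive pair of layer widths."""
    return [
        (rng.standard_normal((n_in, n_out)) / np.sqrt(n_in), np.zeros(n_out))
        for n_in, n_out in zip(arch[:-1], arch[1:])
    ]


def mlp_forward(params, x):
    """Affine layers with a ReLU after every layer except the last (assumed)."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x


rng = np.random.default_rng(0)
params = init_mlp([3, 5, 1], rng)
out = mlp_forward(params, np.ones((4, 3)))
n_params = sum(W.size + b.size for W, b in params)
print(out.shape, n_params)  # (4, 1) 26
```

The network has only 26 trainable parameters, (3·5 + 5) + (5·1 + 1). This is plenty for a linear target.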


## Define your metric

For regression tasks, accuracy is not really a good metric. Here we propose to compute the $L_1$ norm of the difference between the prediction and the target:

In [None]:
def l1_norm(prediction, y):
    return torch.norm(prediction - y, p=1).to(DEVICE)
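Note that `torch.norm(..., p=1)` sums the absolute errors over the whole batch rather than averaging them, so its value grows with the batch size. Below is a NumPy equivalent, plus a per-sample mean variant that stays comparable across batch sizes. The `mean_l1` variant is our own hypothetical addition.

```python
import numpy as np


def l1_norm(prediction, y):
    """Sum of absolute errors, like torch.norm(prediction - y, p=1)."""
    return float(np.abs(prediction - y).sum())


def mean_l1(prediction, y):
    """Hypothetical variant: average absolute error per sample."""
    return l1_norm(prediction, y) / len(y)


pred = np.array([[1.0], [2.0], [3.0]])
y = np.array([[1.0], [1.0], [1.0]])
print(l1_norm(pred, y), mean_l1(pred, y))  # 3.0 1.0
```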


## Train the model

We are finally here: after setting up the dataset, the model and the metrics, we are ready to put all of this together with giotto-deep into a `Trainer` object. Check the cell below.

In [None]:
loss_fn = nn.MSELoss()

pipe = Trainer(model, (dl_tr, dl_val), loss_fn, writer, l1_norm)

# train the model with learning rate scheduler
pipe.train(
    Adam,
    3,  # number of epochs
    False,  # cross-validation disabled for this first run
    lr_scheduler=ExponentialLR,
    scheduler_params={"gamma": 0.9},
    profiling=False,
    store_grad_layer_hist=True,
    writer_tag="line",
)
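Conceptually, each training epoch repeats the same cycle: a forward pass, the loss, a gradient step and a scheduler step. The sketch below mimics that cycle in plain NumPy for a linear model, using full-batch gradient descent on the MSE loss. The learning-rate update `lr = lr0 * gamma**epoch` is the rule that `ExponentialLR` applies when it is stepped once per epoch. This is not giotto-deep's implementation. We chose the values `lr0=0.3`, `gamma=0.999` and 2000 epochs so that this toy problem converges; the tutorial itself uses `gamma=0.9` over only 3 epochs.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 3))
y = 0.3 * X.sum(axis=1)

w, b = np.zeros(3), 0.0      # linear model: y_hat = X @ w + b
lr0, gamma = 0.3, 0.999      # initial learning rate and per-epoch decay
for epoch in range(2000):
    lr = lr0 * gamma**epoch          # ExponentialLR-style schedule
    residual = X @ w + b - y         # forward pass minus target
    loss = np.mean(residual**2)      # MSE, as nn.MSELoss(reduction="mean")
    # gradients of the MSE with respect to w and b
    grad_w = 2 * X.T @ residual / len(y)
    grad_b = 2 * residual.mean()
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b, loss)  # w close to [0.3, 0.3, 0.3], b close to 0
```

With only a few epochs and a fast decay such as `gamma=0.9`, the learning rate shrinks quickly. This is one reason a single short run may not yet reach `0.3`.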


## Checking the results

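Did we manage to get `0.3` after this first part of the training? Since the target is `0.3 * (x_1 + x_2 + x_3)`, the prediction at `[1, 1, 1]` should be close to `3 * 0.3 = 0.9`.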

In [None]:
pipe.model(torch.tensor([[1, 1, 1]]).float().to(DEVICE))
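The network is not linear by construction, so its weights do not expose `0.3` directly. What we can check is the function it computes. For a linear function, `f(e_i) - f(0)` along each unit vector `e_i` gives the slope on that axis. Below is a minimal sketch with the ground-truth function as a stand-in. `probe_slopes` is a hypothetical helper. For the trained model you would pass a wrapper around `pipe.model`, and for a ReLU network the result is only an approximation of a local slope.

```python
import numpy as np


def probe_slopes(f, n_features: int) -> np.ndarray:
    """Estimate per-axis slopes of f from its values at 0 and at the unit vectors."""
    f0 = f(np.zeros(n_features))
    return np.array([f(e) - f0 for e in np.eye(n_features)])


def target(x: np.ndarray) -> float:
    """The ground-truth hyperplane used to build the dataset."""
    return 0.3 * x.sum()


print(probe_slopes(target, 3))  # approximately [0.3, 0.3, 0.3]
print(target(np.ones(3)))       # approximately 0.9, the value to expect at [1, 1, 1]
```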


Let's **train** the model **again**, this time with cross-validation: we just have to set the parameter `cross_validation=True`.

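The `keep_training=True` flag lets us resume training from the scheduler, optimiser and trained model obtained at the end of the previous run of the `pipe` instance. As the comment in the cell below notes, this means the optimiser is not replaced: the Adam optimiser from the previous training is kept, even though `SGD` is passed to `train`.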

In [None]:
# train the model with CV
pipe.train(SGD, 3, cross_validation=True, keep_training=True)

# since we used the keep training flag, the optimiser has not been modified compared to the previous training.
print(pipe.optimizer)
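Cross-validation repeatedly holds out one part of the training data for validation and trains on the rest. The sketch below shows a generic k-fold split of sample indices with NumPy. The number of folds, the shuffling and the helper name `k_fold_indices` are our assumptions, not necessarily what `Trainer` does internally.

```python
import numpy as np


def k_fold_indices(n_samples: int, k: int, seed: int = 0):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx


for train_idx, val_idx in k_fold_indices(100, 5):
    print(len(train_idx), len(val_idx))  # 80 20 for each of the 5 folds
```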


Did we manage to get `0.3` this time, i.e. a prediction close to `0.9` at `[1, 1, 1]`?

In [None]:
# evaluation
pipe.model(torch.tensor([[1, 1, 1]]).float().to(DEVICE))
