# Basic tutorial: regression task
#### Author: Matteo Caorsi

This short tutorial introduces the basic functionality of the *giotto-deep* API.

The main steps of the tutorial are the following:
 1. creation of a dataset
 2. creation of a model
 3. definition of metrics and losses
 4. running the training
 5. interactive visualisation of the results

In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

import numpy as np
import plotly.express as px
import torch
from torch import nn
import pandas as pd

from sklearn import datasets

from gdeep.models import FFNet
from gdeep.models import ModelExtractor
from gdeep.analysis.interpretability import Interpreter
from torch.optim.lr_scheduler import ExponentialLR
from torch.optim import SGD, Adam

from gdeep.visualisation import  persistence_diagrams_of_activations
from gdeep.pipeline import Pipeline

from torch.utils.tensorboard import SummaryWriter
from gdeep.data import TorchDataLoader, DataLoaderFromArray

from gtda.diagrams import BettiCurve

from gtda.plotting import plot_betti_surfaces

No TPUs...


# Initialize the tensorboard writer

To analyse the results of your models, you need to start TensorBoard.
In a terminal, move into the `/example` folder and run the following command:

```
tensorboard --logdir=runs
```

After the training, open [http://localhost:6006/](http://localhost:6006/) to see the visualisation results.

In [2]:
writer = SummaryWriter()
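
When called without arguments, `SummaryWriter` writes its event files into a subfolder of `runs/`, which is why the `tensorboard --logdir=runs` command above finds them. The following is a minimal sketch of what a writer does, independent of giotto-deep: the temporary folder, the tag `demo/loss` and the logged values are illustrative assumptions, not part of the tutorial.

```python
import os
import tempfile

from torch.utils.tensorboard import SummaryWriter

# Write to a temporary folder so the sketch does not touch the real `runs/` directory
log_dir = tempfile.mkdtemp(prefix="runs_demo_")
demo_writer = SummaryWriter(log_dir=log_dir)

# Log a few made-up scalar values under a hypothetical tag
for step, value in enumerate([0.24, 0.22, 0.21]):
    demo_writer.add_scalar("demo/loss", value, global_step=step)

# Closing the writer flushes the pending events to disk
demo_writer.close()

event_files = os.listdir(log_dir)
print(event_files)
```

Pointing TensorBoard at `log_dir` instead of `runs` would display the curve logged here.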

# Create your dataset

In [3]:
X_train = np.random.rand(100,3)
y_train = 0.3*np.array(list(map(sum,X_train)))  # a hyperplane

X_val = np.random.rand(50,3)
y_val = 0.3*np.array(list(map(sum,X_val)))  # targets must be computed from X_val

dl = DataLoaderFromArray(X_train, y_train, X_val, y_val)
dl_tr, dl_val, _ = dl.build_dataloaders(batch_size=32)
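
The targets lie on the hyperplane `y = 0.3 * (x1 + x2 + x3)`. As a sketch, the same kind of data can be generated with a vectorised NumPy sum instead of `map(sum, ...)`. The helper name `make_hyperplane_data` and the seed are assumptions introduced for this example only.

```python
import numpy as np


def make_hyperplane_data(n_samples, rng):
    """Return random points in the unit cube and their hyperplane targets."""
    X = rng.random((n_samples, 3))
    y = 0.3 * X.sum(axis=1)  # row-wise sum, one target per sample
    return X, y


rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
X_train_demo, y_train_demo = make_hyperplane_data(100, rng)
X_val_demo, y_val_demo = make_hyperplane_data(50, rng)

print(X_train_demo.shape, y_train_demo.shape)
print(X_val_demo.shape, y_val_demo.shape)
```

Generating features and targets in one function guarantees that each target array has the same number of rows as its feature array.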


# Define and train your model

In [4]:

class model1(nn.Module):
    def __init__(self):
        super(model1, self).__init__()
        self.seqmodel = FFNet(arch=[3, 5, 1])
    def forward(self, x):
        return self.seqmodel(x)

model = model1()

In [5]:

loss_fn = nn.MSELoss()

pipe = Pipeline(model, (dl_tr, dl_val), loss_fn, writer)

# train the model with learning rate scheduler
pipe.train(Adam, 3, False, lr_scheduler=ExponentialLR, scheduler_params={"gamma": 0.9}, 
           profiling=False, store_grad_layer_hist=True, writer_tag="line")


Epoch 1
-------------------------------
Epoch training loss: 0.241605 	Epoch training accuracy: 0.00%                                             
Time taken for this epoch: 0.00s
Learning rate value: 0.00100000
Validation results: 
 Accuracy: 0.00%,                 Avg loss: 0.220827 

Epoch 2
-------------------------------
Epoch training loss: 0.223411 	Epoch training accuracy: 0.00%                                             
Time taken for this epoch: 0.00s
Learning rate value: 0.00090000
Validation results: 
 Accuracy: 0.00%,                 Avg loss: 0.208553 

Epoch 3
-------------------------------
Epoch training loss: 0.212655 	Epoch training accuracy: 0.00%                                             
Time taken for this epoch: 0.00s
Learning rate value: 0.00081000



Using a target size (torch.Size([32])) that is different to the input size (torch.Size([32, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.


Using a target size (torch.Size([16])) that is different to the input size (torch.Size([16, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.


Using a target size (torch.Size([20])) that is different to the input size (torch.Size([20, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.



Validation results: 
 Accuracy: 0.00%,                 Avg loss: 0.197926 



(0.19792625308036804, 0.0)
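
The warnings in the log above report that the model output has shape `(batch, 1)` while the targets have shape `(batch,)`. In that situation `nn.MSELoss` broadcasts both tensors to `(batch, batch)`, so every prediction is compared with every target. The sketch below, in plain PyTorch with made-up values, shows the effect and one possible fix (adding a trailing dimension to the targets). How best to apply this to the arrays passed to `DataLoaderFromArray` is not covered here.

```python
import torch
from torch import nn

loss_fn = nn.MSELoss()

# Made-up predictions of shape (4, 1) that match the targets exactly
pred = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
target = torch.tensor([1.0, 2.0, 3.0, 4.0])  # shape (4,)

# Mismatched shapes: PyTorch warns and broadcasts to (4, 4),
# so the loss is non-zero although the predictions are perfect
loss_mismatched = loss_fn(pred, target)

# Matching shapes: give the targets a trailing dimension, shape (4, 1)
loss_matched = loss_fn(pred, target.unsqueeze(1))

print(loss_mismatched.item(), loss_matched.item())
```

With NumPy targets, the equivalent reshaping would be `y.reshape(-1, 1)`.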

In [6]:
pipe.model(torch.tensor([[1,1,1]]).float())

tensor([[-0.0731]], grad_fn=<AddmmBackward0>)
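
For reference, the point `[1, 1, 1]` has a true target of about `0.9` on the hyperplane used to build the dataset. The short sketch below compares it with the prediction printed above. The value `-0.0731` is copied from that output and will differ from run to run.

```python
# Query point and its target on the hyperplane y = 0.3 * (x1 + x2 + x3)
x_query = [1.0, 1.0, 1.0]
target = 0.3 * sum(x_query)

# Prediction copied from the cell output above (run-dependent)
printed_prediction = -0.0731

abs_error = abs(printed_prediction - target)
print(f"target={target:.4f}, absolute error={abs_error:.4f}")
```

A gap of this size is unsurprising after only three epochs at a small learning rate.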

Let's train the model with cross validation: we just have to set the parameter `cross_validation = True`.
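
How giotto-deep builds its folds is not shown in this tutorial. As a generic illustration of k-fold cross validation, the sketch below uses scikit-learn's `KFold` on a toy array. The number of folds (5), the shuffling and the seed are assumptions chosen for the example.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data: 10 samples with 2 features each
X_demo = np.arange(20).reshape(10, 2)

# Assumed settings for illustration only
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
folds = list(kfold.split(X_demo))

for fold_number, (train_idx, val_idx) in enumerate(folds, start=1):
    # Each fold trains on 8 samples and validates on the remaining 2
    print(f"Fold {fold_number}: train={train_idx.tolist()} val={val_idx.tolist()}")

# Every sample is used for validation exactly once across the folds
all_val = sorted(int(i) for _, val_idx in folds for i in val_idx)
```
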

The `keep_training = True` flag allows us to resume from the scheduler, optimiser and trained model obtained at the end of the previous training run stored in the `pipe` instance.
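
The learning-rate values printed in the training logs of this tutorial are consistent with exponential decay, `lr = lr_0 * gamma ** epoch`, with `lr_0 = 0.001` (the first value in the log) and `gamma = 0.9`. The first fold of the cross-validated run reports `0.00072900`, which equals `0.001 * 0.9 ** 3`, the value reached after the three epochs of the earlier run. A small pure-Python sketch of this arithmetic (the function name `exponential_lr` is hypothetical):

```python
def exponential_lr(initial_lr, gamma, n_epochs):
    """Learning rate at the start of each epoch under exponential decay."""
    return [initial_lr * gamma ** epoch for epoch in range(n_epochs)]


# Values taken from the tutorial: first logged rate 0.001 and gamma 0.9
schedule = exponential_lr(initial_lr=0.001, gamma=0.9, n_epochs=4)

# Format like the training log, with eight decimal places
formatted = [f"{lr:.8f}" for lr in schedule]
print(formatted)
# → ['0.00100000', '0.00090000', '0.00081000', '0.00072900']
```
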

In [7]:
# train the model with CV
pipe.train(SGD, 3, cross_validation=True, keep_training=True)

# since we used the keep training flag, the optimiser has not been modified compared to the previous training.
print(pipe.optimizer)



********** Fold  1 **************
Epoch 1
-------------------------------
Epoch training loss: 0.205768 	Epoch training accuracy: 0.00%                                             
Time taken for this epoch: 0.00s
Learning rate value: 0.00072900
Validation results: 
 Accuracy: 0.00%,                 Avg loss: 0.189330 

Epoch 2
-------------------------------
Epoch training loss: 0.190112 	Epoch training accuracy: 0.00%                                             
Time taken for this epoch: 0.00s
Learning rate value: 0.00072900
Validation results: 
 Accuracy: 0.00%,                 Avg loss: 0.179998 

Epoch 3
-------------------------------
Epoch training loss: 0.176886 	Epoch training accuracy: 0.00%                                             
Time taken for this epoch: 0.00s
Learning rate value: 0.00072900
Validation results: 
 Accuracy: 0.00%,                 Avg loss: 0.171004 



********** Fold  2 **************
Epoch 1
-------------------------------
Epoch training loss: 0.2

# Extract inner data from your models

In [8]:

me = ModelExtractor(pipe.model, loss_fn)

lista = me.get_layers_param()
for k, item in lista.items():
    print(k,item.shape)


seqmodel.linears.0.weight torch.Size([5, 3])
seqmodel.linears.0.bias torch.Size([5])
seqmodel.linears.1.weight torch.Size([1, 5])
seqmodel.linears.1.bias torch.Size([1])
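
The shapes printed above are enough to count the trainable parameters of the network. A minimal sketch, with the shapes copied by hand from that output:

```python
from math import prod

# Parameter shapes copied from the output above
shapes = {
    "seqmodel.linears.0.weight": (5, 3),
    "seqmodel.linears.0.bias": (5,),
    "seqmodel.linears.1.weight": (1, 5),
    "seqmodel.linears.1.bias": (1,),
}

# Number of entries in each tensor is the product of its dimensions
counts = {name: prod(shape) for name, shape in shapes.items()}
total = sum(counts.values())

print(counts)
print("total parameters:", total)  # → total parameters: 26
```

On a live PyTorch model the same total can be obtained with `sum(p.numel() for p in model.parameters())`.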


In [9]:
x = next(iter(dl_tr))[0]
list_activations = me.get_activations(x)
len(list_activations)


4
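
`me.get_activations(x)` returns a list of activations (four in the output above). Its internals are not shown in this tutorial. One common way to collect activations in plain PyTorch is to register forward hooks, sketched below on a stand-in network. The stand-in and its ReLU are assumptions: the activation function used by `FFNet` is not stated here, so the number of captured tensors need not match the tutorial's.

```python
import torch
from torch import nn

# Stand-in network with the same layer sizes as arch=[3, 5, 1]
demo_model = nn.Sequential(nn.Linear(3, 5), nn.ReLU(), nn.Linear(5, 1))

captured = []


def save_output(module, inputs, output):
    # Store a detached copy of each layer's output
    captured.append(output.detach())


# Attach the hook to every child layer of the sequential model
handles = [layer.register_forward_hook(save_output) for layer in demo_model]

batch = torch.rand(8, 3)
with torch.no_grad():
    demo_model(batch)

# Remove the hooks so later forward passes are not recorded
for handle in handles:
    handle.remove()

activation_shapes = [tuple(a.shape) for a in captured]
print(activation_shapes)  # → [(8, 5), (8, 5), (8, 1)]
```
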

# Visualise activations and other topological aspects of your model

In [10]:
from gdeep.visualisation import Visualiser

vs = Visualiser(pipe)

vs.plot_data_model()
vs.plot_activations(x)


Sending the plots to tensorboard: 
Step 4/4