# Basic Tutorial: Regularization with Giotto-deep
#### Author: Henry Kirveslahti

This short tutorial introduces the regularization techniques integrated into *giotto-deep*. We briefly introduce the topic and show how to fit a LASSO-regularized model with *giotto-deep*.

This notebook is organized as follows:

1. Introduction
2. Data generation and an unregularized regression model
3. Using Tihonov-type regularizers with *giotto-deep*
4. Comparing the results from regularized and unregularized models
5. Concluding remarks and further reading


## 1. Introduction

Regularization is a powerful tool for combating overfitting. If we want a model that is flexible enough to fit complex data, we need many parameters. In such cases there may be many parameter choices that fit the data very well, and the problem becomes ill-posed: there is no unique solution. We can then express a preference for models that are simpler in some sense, and we do this with the help of a *regularizer*. This is the subject of this notebook.

At a high level, without regularization, our loss $L$ is a function of the input data $X$ and the response variable $y$:
$$
L=L(X,y).
$$
In *giotto-deep*, these cases are handled by the usual Trainer class.

However, when we want our loss to also depend on the model $M$ itself, that is,

$$
L=L(X,y,M),
$$
we additionally pass a regularizer to the Trainer, as is done in Section 3.
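
To make the difference between the two kinds of loss concrete, here is a minimal sketch in plain PyTorch. It is not the giotto-deep implementation: the helper name `model_dependent_loss` and the penalty weight `lam` are hypothetical and chosen only for illustration.

```python
import torch
import torch.nn as nn


def model_dependent_loss(model, X, y, lam=0.1):
    """Hypothetical helper: a data-fit term plus a penalty on the model itself."""
    data_term = nn.functional.mse_loss(model(X), y)  # depends on X and y
    penalty = sum(p.abs().sum() for p in model.parameters())  # depends on M only
    return data_term + lam * penalty


torch.manual_seed(0)  # arbitrary seed, for reproducibility of the sketch
model = nn.Linear(2, 1, bias=False)
X = torch.randn(8, 2)
y = torch.randn(8, 1)

plain = nn.functional.mse_loss(model(X), y)  # L(X, y)
regularized = model_dependent_loss(model, X, y, lam=0.1)  # L(X, y, M)
print(float(plain), float(regularized))
```

The second value is never smaller than the first, because the penalty is non-negative. The penalty needs access to the model's parameters, which is why a loss of this kind cannot be written as a function of the predictions and targets alone.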


## 2. Example - Unregularized model
In this section we generate some data and fit a standard regression model with *giotto-deep*. First we import some dependencies:

In [None]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline
import numpy as np
import torch
import torch.nn as nn
from torch.optim import SGD, Adam, RMSprop
import matplotlib.pyplot as plt
from torch.utils.data import TensorDataset, DataLoader
from sklearn.model_selection import train_test_split
from gdeep.trainer import Trainer
from gdeep.trainer.regularizer import TihonovRegularizer
from gdeep.search import GiottoSummaryWriter
from gdeep.models import ModelExtractor
from gdeep.utility import DEVICE
writer = GiottoSummaryWriter()

### Data generation
Next we generate data with sample size $S$. We have two covariates, $z_1$ and $z_2$, and a response variable $y$. More precisely:

$$z_{0,i} \sim^{\textrm{i.i.d.}} N(0,1);$$
$$z_{1,i} = 0.9 \, z_{0,i} + 0.1 \, \tau_{1,i};$$
$$z_{2,i} = 0.85 \, z_{0,i} + 0.15 \, \tau_{2,i};$$
$$y_i=z_{0,i}+ \epsilon_i;$$
and
$$\tau_{j,i} \sim^{\textrm{i.i.d.}} N(0,1), \quad j \in \{1,2\};$$
$$\epsilon_i \sim^{\textrm{i.i.d.}} N(0,1),$$
independently of the $\tau$s.

In other words, we have two covariates that are both noisy versions of $z_0$, which is directly related to the response. The two covariates are therefore highly correlated, but $z_1$ has a better signal-to-noise ratio than $z_2$ (a variance ratio of $0.9^2/0.1^2 = 81$ against $0.85^2/0.15^2 \approx 32$). For predicting $y$, $z_2$ has no merit over $z_1$.


In [None]:
rng = np.random.default_rng()
S=100
z0=rng.standard_normal(S)
z1=0.9*z0+0.1*rng.standard_normal(S)
z2=0.85*z0+0.15*rng.standard_normal(S)
y=z0+rng.standard_normal(S)
X=np.stack([z1,z2],1)
y=y.reshape(-1,1)
y=y.astype(float)
X=X.astype(float)
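
As a quick sanity check of the claim that the two covariates are highly correlated, the sketch below regenerates data from the same model and prints the sample correlations. The fixed seed is an assumption made here for reproducibility; the notebook itself draws unseeded data, so its numbers will differ slightly.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption, not used in the notebook
S = 100
z0 = rng.standard_normal(S)
z1 = 0.9 * z0 + 0.1 * rng.standard_normal(S)
z2 = 0.85 * z0 + 0.15 * rng.standard_normal(S)
y = z0 + rng.standard_normal(S)

# Rows are variables, so corr is the 3 x 3 correlation matrix of (z1, z2, y)
corr = np.corrcoef(np.vstack([z1, z2, y]))
print("corr(z1, z2):", corr[0, 1])
print("corr(z1, y): ", corr[0, 2])
print("corr(z2, y): ", corr[1, 2])
```

The correlation between $z_1$ and $z_2$ should be close to 1, which is the setting in which an unregularized regression struggles to tell the two coefficients apart.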

### Modeling
Next we define our models. Our first model is an unregularized regression on 2 variables without intercept. Concretely, the underlying statistical model is:

$$
y_i = \alpha_1 z_{1,i} + \alpha_2 z_{2,i} + \epsilon_i.
$$

We train this simple model for 20 epochs:


In [None]:
train_x, val_x, train_y, val_y = train_test_split(X, y, test_size=0.1)
# Convert all NumPy arrays to float32 tensors, the default dtype of nn.Linear
tensor_x_t = torch.from_numpy(train_x).float()
tensor_y_t = torch.from_numpy(train_y).float()
tensor_x_v = torch.from_numpy(val_x).float()
tensor_y_v = torch.from_numpy(val_y).float()
train_dataset = TensorDataset(tensor_x_t,tensor_y_t)
dl_tr = DataLoader(train_dataset,batch_size=10)
val_dataset = TensorDataset(tensor_x_v,tensor_y_v)
dl_val = DataLoader(val_dataset,batch_size=10)

class Net(nn.Module):
    def __init__(self,featdim):
        super(Net, self).__init__() 
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(featdim, 1, bias=False),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

In [None]:
network=Net(2)

In [None]:
def l2_norm(prediction, y):
    return torch.norm(prediction - y, p=2).to(DEVICE)

In [None]:
loss_fn = nn.MSELoss()
pipe = Trainer(network, (dl_tr, dl_val), loss_fn, writer,l2_norm)
pipe.train(SGD, 20, False, {"lr": 0.1})
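
Because the model is linear and the loss is the mean squared error, the coefficients that SGD approaches can also be computed in closed form. The sketch below does this with NumPy on data regenerated from the same model. The fixed seed is an assumption, and the full sample is used rather than the notebook's training split, so the values will not match the trained network exactly.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption, not used in the notebook
S = 100
z0 = rng.standard_normal(S)
z1 = 0.9 * z0 + 0.1 * rng.standard_normal(S)
z2 = 0.85 * z0 + 0.15 * rng.standard_normal(S)
y = z0 + rng.standard_normal(S)
X = np.stack([z1, z2], axis=1)

# Least-squares solution of y ~ X @ alpha, without intercept
alpha_hat, residual_ss, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
print("least-squares coefficients:", alpha_hat)
print("condition number of X:", np.linalg.cond(X))
```

A large condition number reflects the near-collinearity of the two covariates: many coefficient pairs with a similar sum fit the data almost equally well.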

## 3. Regularized model
Next we fit a regularized model to the same dataset. Our loss function is

$$
L(f,X,y) = \textrm{MSE}\big(f(X),y \big) + \lambda \sum_{i=1}^{n} \|\beta_i\|_{1},
$$

where $f$ is the model, the second term is known as the LASSO penalty, and the $\beta$s are the network weights.

This regression technique falls into the broader category of *Tihonov-type* regularizers, which take the general form

$$
L(f,X,y) = \textrm{MSE}\big(f(X),y \big) + \lambda \sum_{i=1}^{n} \|\beta_i\|_{p}^{p},
$$

and which give our regularizer class its name; LASSO is the case $p=1$. More general penalties, such as the elastic net, can be implemented by mimicking the functionality of the TihonovRegularizer class.
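
The penalty term in the general form above is easy to evaluate by hand for a small weight vector. The sketch below does so in plain PyTorch; the helpers `lp_penalty` and `elastic_net_penalty` are hypothetical names introduced for illustration and are not part of giotto-deep.

```python
import torch


def lp_penalty(weights, lam, p):
    """Hypothetical helper: lam * ||weights||_p^p for one weight tensor."""
    return lam * weights.abs().pow(p).sum()


def elastic_net_penalty(weights, lam1, lam2):
    """Hypothetical helper: weighted sum of the p=1 and p=2 penalties."""
    return lp_penalty(weights, lam1, 1) + lp_penalty(weights, lam2, 2)


beta = torch.tensor([0.5, -0.25])
lasso = lp_penalty(beta, lam=0.2, p=1)  # 0.2 * (0.5 + 0.25) = 0.15
ridge = lp_penalty(beta, lam=0.2, p=2)  # 0.2 * (0.25 + 0.0625) = 0.0625
enet = elastic_net_penalty(beta, lam1=0.2, lam2=0.2)  # 0.15 + 0.0625 = 0.2125
print(lasso, ridge, enet)
```

For weights smaller than 1 in absolute value, the $p=1$ penalty is larger than the $p=2$ penalty, which is one way to see why LASSO pushes small coefficients all the way to zero.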


### 3.1 Fitting the regularized model

To fit a regularized model, we pass a regularizer to the Trainer. The inner workings of the giotto-deep regularizers are beyond the scope of this introductory tutorial; for now we only need to define a Tihonov regularizer with the regression penalty coefficient $\lambda$ and the exponent $p$. This is done as follows:

In [None]:
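# Use a fresh network so the regularized fit neither continues from nor modifies the one trained above
network_reg = Net(2)
pipe2 = Trainer(network_reg, (dl_tr, dl_val), loss_fn, writer, l2_norm, regularizer=TihonovRegularizer(0.2, p=1))
pipe2.train(SGD, 20, False, {"lr": 0.1})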

## 4. Results
Let us take a look at the regression coefficients we obtained from our two runs:

In [None]:
ex = ModelExtractor(pipe.model, loss_fn)
[*ex.get_layers_param().values()]

In [None]:
ex = ModelExtractor(pipe2.model, loss_fn)
[*ex.get_layers_param().values()]

We see that our regularized model shrinks the second regression coefficient while keeping the first one relatively close to 1.

### 4.1 Regression coefficients as a function of the regression penalty

In [None]:
regpens=[0.001,0.01,0.05,0.1,0.2,0.5,1]
param1=np.zeros(len(regpens))
param2=np.zeros(len(regpens))
for i in range(len(regpens)):
    # Train a fresh network for each penalty so that the fits are independent of each other
    pipe2 = Trainer(Net(2), (dl_tr, dl_val), loss_fn, writer, l2_norm, regularizer=TihonovRegularizer(regpens[i], p=1))
    pipe2.lamda=regpens[i]
    pipe2.regularize=True
    pipe2.train(SGD, 50, False, {"lr": 0.01})
    ex = ModelExtractor(pipe2.model, loss_fn)
    # The only parameter tensor is the weight matrix of the linear layer, of shape (1, 2)
    weights = [*ex.get_layers_param().values()][0]
    param1[i] = float(weights[0][0])
    param2[i] = float(weights[0][1])

In [None]:
plt.plot(regpens, param1, label = r'$\alpha_1$')
plt.plot(regpens, param2, label = r'$\alpha_2$')
plt.legend()
plt.title('Regression coefficients as a function of regularization penalty')
plt.xlabel('Regression penalty ' r'$\lambda$')
plt.ylabel('Value')
plt.show()

We should see that this regularization helps bring the coefficient $\alpha_2$ to zero, as long as the model is trained long enough to find the optimum. This also corrects the coefficient $\alpha_1$ upwards. If we set $\lambda$ too high, we over-regularize the model, which diminishes its generalizability.
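
As a cross-check that does not rely on SGD converging, the same coefficient path can be computed with scikit-learn's coordinate-descent `Lasso`. This sketch is not part of the original notebook. It regenerates the data with a fixed seed (an assumption) and uses the full sample, so the numbers will differ from the plot above. scikit-learn scales the squared error by $1/(2S)$ rather than $1/S$, so its `alpha` corresponds to $\lambda/2$ here.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)  # fixed seed: an assumption, not used in the notebook
S = 100
z0 = rng.standard_normal(S)
z1 = 0.9 * z0 + 0.1 * rng.standard_normal(S)
z2 = 0.85 * z0 + 0.15 * rng.standard_normal(S)
y = z0 + rng.standard_normal(S)
X = np.stack([z1, z2], axis=1)

regpens = [0.001, 0.01, 0.05, 0.1, 0.2, 0.5, 1]
coefs = []
for lam in regpens:
    # scikit-learn minimizes (1 / (2 * S)) * ||y - X w||^2 + alpha * ||w||_1,
    # which has the same minimizer as MSE + lam * ||w||_1 when alpha = lam / 2.
    model = Lasso(alpha=lam / 2, fit_intercept=False, tol=1e-10, max_iter=100_000)
    model.fit(X, y)
    coefs.append(model.coef_.copy())
coefs = np.array(coefs)  # shape (len(regpens), 2)

for lam, (a1, a2) in zip(regpens, coefs):
    print(f"lambda={lam:<6} alpha_1={a1:.3f} alpha_2={a2:.3f}")
```

If the SGD-trained coefficients differ markedly from this path at a given $\lambda$, the most likely explanation is that training has not yet converged, which is the caveat mentioned above.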

## 5. Concluding remarks and further reading
In this tutorial we showed how to pass a regularizer to the Trainer class and how to use it for Tihonov-type regularized regression. *Giotto-deep* supports a very flexible framework for defining more complicated regularization techniques; these are discussed in the notebook *deploying_custom_regularizers*.