# Customize training

### Outline

The CVs implemented in `mlcolvar.cvs` are subclasses of `pytorch_lightning.LightningModule`, which can be thought of as tasks rather than plain models: they also incorporate the optimizer and the loss function used in the training step. In this tutorial you will learn how to customize the different aspects of the training behaviour:

- optimizer
- loss function
- trainer

### Optimizer

The optimizer is returned by the method `configure_optimizers`, which is called by the Lightning trainer. The default optimizer is `Adam`. To change it, or to customize its arguments, set the CV's members `optimizer_name` and `optimizer_kwargs`.

For instance, this could be used to add an L2 regularization through the `weight_decay` argument.

In [8]:
from mlcolvar.cvs import RegressionCV

# define example CV
cv = RegressionCV(layers=[10,5,5,1], options={})

# choose optimizer
cv.optimizer_name = 'Adam' 

# choose arguments
cv.optimizer_kwargs = {'weight_decay' : 1e-4 }

print(f'Optimizer: {cv.optimizer_name}')
print(f'Arguments: {cv.optimizer_kwargs}')

Optimizer: Adam
Arguments: {'weight_decay': 0.0001}
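
To make the role of the two members more concrete, the sketch below shows one plausible way a name and a dictionary of arguments can be turned into a `torch.optim` optimizer. The helper `build_optimizer` is hypothetical and is not taken from mlcolvar; it only assumes that the name refers to a class in `torch.optim`. It also shows why the spelling of the keyword arguments matters: an unknown argument is rejected when the optimizer is built.

```python
import torch

# Hypothetical helper (not mlcolvar's code): resolve a name and a dictionary
# of keyword arguments into an optimizer from torch.optim.
def build_optimizer(params, optimizer_name="Adam", optimizer_kwargs=None):
    optimizer_cls = getattr(torch.optim, optimizer_name)  # e.g. torch.optim.Adam
    return optimizer_cls(params, **(optimizer_kwargs or {}))

model = torch.nn.Linear(10, 1)

# a correctly spelled argument ends up in the optimizer's parameter groups
optimizer = build_optimizer(model.parameters(), "Adam", {"weight_decay": 1e-4})
print(optimizer.param_groups[0]["weight_decay"])

# a misspelled argument raises a TypeError when the optimizer is built
try:
    build_optimizer(model.parameters(), "Adam", {"weigth_decay": 1e-4})
    misspelling_rejected = False
except TypeError as err:
    misspelling_rejected = True
    print(f"TypeError: {err}")
```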


### Loss function

The operations performed at each optimization step are encoded in the `training_step` method of each CV. They typically involve:
1. a forward pass of the model
2. the calculation of the loss function
3. a backward pass
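
The three steps can be sketched in plain PyTorch as follows. This is a toy illustration on random data, with a linear layer standing in for a CV; it is not the `training_step` of any mlcolvar class. Note that in Lightning the backward pass and the parameter update are normally triggered by the trainer once `training_step` has returned the loss, whereas here they are written out explicitly.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(10, 1)                    # stand-in for a CV
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
x, y = torch.randn(32, 10), torch.randn(32, 1)    # toy batch

def training_step(x, y):
    prediction = model(x)                                # 1. forward pass
    loss = torch.nn.functional.mse_loss(prediction, y)   # 2. loss function
    optimizer.zero_grad()
    loss.backward()                                      # 3. backward pass
    optimizer.step()                                     # parameter update
    return loss.item()

# repeat the step on the same batch and compare first and last loss
losses = [training_step(x, y) for _ in range(50)]
print(f"first: {losses[0]:.4f}, last: {losses[-1]:.4f}")
```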

The general workflow is specific to each CV and cannot be changed, unless you subclass a given CV and override its `training_step` method. However, some details can be changed.

For example, one might want to change the loss function in a `RegressionCV` (or in an `AutoEncoderCV`) from Mean Squared Error (MSE) to Mean Absolute Error (MAE). To do so, one needs to define a function with the same signature as the one used in the CV and then assign it to the `loss_fn` member:

In [30]:
from torch import Tensor

# print default loss
print(f'default: {cv.loss_fn}' )

# define new function
def mae_loss(input : Tensor, target: Tensor):
    return (input - target).abs().mean()

# assign it
cv.loss_fn = mae_loss

print(f'(a) new: {cv.loss_fn}' )

# this could also be accomplished with a lambda function
cv.loss_fn = lambda x,y : (x-y).abs().mean()

print(f'(b) new: {cv.loss_fn}' )

default: <function <lambda> at 0x7f89c069c670>
(a) new: <function mae_loss at 0x7f89f2c07280>
(b) new: <function <lambda> at 0x7f89c069c670>
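
As a small worked example of the difference between the two losses, the sketch below evaluates both on a toy prediction with a single outlier. The functions are written from the standard definitions of MSE and MAE with a mean reduction; they are not copied from mlcolvar. The outlier dominates the MSE, while the MAE grows only linearly with the error.

```python
import torch

def mse_loss(input, target):
    return (input - target).pow(2).mean()

def mae_loss(input, target):
    return (input - target).abs().mean()

prediction = torch.tensor([1.0, 2.0, 3.0, 10.0])
target     = torch.tensor([1.0, 2.0, 3.0,  4.0])

mse = mse_loss(prediction, target)  # errors (0, 0, 0, 6): 36 / 4 = 9
mae = mae_loss(prediction, target)  # errors (0, 0, 0, 6):  6 / 4 = 1.5
print(mse, mae)

# the same values are obtained from the built-in PyTorch functions
print(torch.nn.functional.mse_loss(prediction, target),
      torch.nn.functional.l1_loss(prediction, target))
```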


The loss function may also expose options that can be customized. For instance, in the DeepLDA and DeepTICA CVs the loss function is `ReduceEigenvaluesLoss`, which takes as input the eigenvalues of the underlying statistical problem and returns a scalar (e.g. the sum of the squared eigenvalues). To see which variables can be set, look at the documentation of the loss function in use.

For example, to change the reduction mode to the sum of the eigenvalues instead of the sum of the squared ones, one can update the loss function accordingly:

In [33]:
from mlcolvar.cvs import DeepTICA

# define CV
cv = DeepTICA(layers=[10, 5, 5, 2], options={})

# print default loss mode
print(f'default mode: {cv.loss_fn.mode}')

# change the mode
cv.loss_fn.mode = 'sum' 

# print new loss mode
print(f'>> new  mode: {cv.loss_fn.mode}')

default mode: sum2
>> new  mode: sum
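
To illustrate what the two modes compute, here is a minimal sketch of such a reduction. The function `reduce_eigenvalues` is hypothetical and is not mlcolvar's implementation. In particular, the sign convention (returning the negative of the reduced value, so that minimizing the loss maximizes the eigenvalues) is an assumption made for this example.

```python
import torch

# Hypothetical helper illustrating the 'sum' and 'sum2' modes.
# Assumption: the loss is the negative of the reduced eigenvalues.
def reduce_eigenvalues(evals, mode="sum2"):
    if mode == "sum":
        return -evals.sum()
    if mode == "sum2":
        return -evals.pow(2).sum()
    raise ValueError(f"unknown mode: {mode}")

evals = torch.tensor([0.9, 0.5])
loss_sum = reduce_eigenvalues(evals, mode="sum")    # -(0.9 + 0.5)   = -1.40
loss_sum2 = reduce_eigenvalues(evals, mode="sum2")  # -(0.81 + 0.25) = -1.06
print(loss_sum, loss_sum2)
```

With eigenvalues smaller than one, as in this toy case, squaring gives relatively more weight to the largest eigenvalue than the plain sum does.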


### Trainer

Since we are using the PyTorch Lightning framework, we can exploit all the features of this library. For instance, we can run the optimization of the model on a GPU, if available, without changing the rest of our code.

In [39]:
import pytorch_lightning as pl

# choose accelerators
trainer = pl.Trainer(accelerator='cpu') #options are: "cpu", "gpu", "tpu", "ipu", "hpu", "mps", "auto"

GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


An important class of objects that can be used to customize the behaviour during training are **callbacks**.

Quoting the lightning documentation: *[Callbacks](https://lightning.ai/docs/pytorch/latest/extensions/callbacks.html) allow you to add arbitrary self-contained programs to your training. At specific points during the flow of execution (hooks), the Callback interface allows you to design programs that encapsulate a full set of functionality. It de-couples functionality that does not need to be in the lightning module and can be shared across projects.*

For instance, they can be used to perform early stopping, to save model checkpoints, or to save metrics. Here we only give some examples of these functionalities and refer the reader to the Lightning [documentation](https://lightning.ai/docs/pytorch/latest/extensions/callbacks.html) for a more detailed overview.

#### Early stopping

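Early stopping halts the training when a given metric (typically the validation loss) stops improving, i.e. it no longer decreases (or no longer increases, for metrics that should be maximized). A validation loss that stops decreasing is a typical symptom of overfitting.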

In [40]:
from pytorch_lightning.callbacks.early_stopping import EarlyStopping

early_stopping = EarlyStopping(monitor="valid_loss",  # quantity to monitor
                               mode='min',            # whether the quantity should be minimized or maximized during training
                               min_delta=0,           # minimum change in the quantity that counts as an improvement
                               patience=10,           # number of checks (by default one per epoch) without improvement before stopping
                               verbose=False 
                               )

trainer = pl.Trainer(callbacks=[early_stopping])

GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
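
The interplay between `patience` and `min_delta` can be illustrated with a few lines of plain Python. The function `stopping_epoch` below is a hypothetical re-implementation of the stopping rule for `mode='min'`, written for illustration only; it is not the code of Lightning's `EarlyStopping`.

```python
# Hypothetical re-implementation of the stopping rule (mode='min').
def stopping_epoch(values, patience=10, min_delta=0.0):
    """Return the index of the check at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, value in enumerate(values):
        if value < best - min_delta:   # improvement: update best, reset counter
            best = value
            wait = 0
        else:                          # no improvement: count this check
            wait += 1
            if wait >= patience:
                return epoch
    return None

valid_loss = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.69]
stop_3 = stopping_epoch(valid_loss, patience=3)  # three checks without improvement: stops at index 5
stop_4 = stopping_epoch(valid_loss, patience=4)  # the value at index 6 improves in time: None
print(stop_3, stop_4)
```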


#### Model checkpointing

It is often useful to save a checkpoint of the model that performs best according to some metric, for instance in combination with early stopping.

After training finishes, you can use `best_model_path` to retrieve the path to the best checkpoint file and `best_model_score` to retrieve its score.

In [42]:
from pytorch_lightning.callbacks.model_checkpoint import ModelCheckpoint

# see documentation for additional customization, e.g. location and file names, etc.
checkpoint = ModelCheckpoint(save_top_k=1,          # number of models to save (top_k=1 means only the best one is stored)
                            monitor="valid_loss"    # quantity to monitor
                            )

# assign callback to trainer
trainer = pl.Trainer(callbacks=[checkpoint])

GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


After the training is over, remember to also export the TorchScript model, which is needed by PLUMED. This amounts to first loading the best checkpoint and then compiling it.
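
The sketch below mimics these two steps in plain PyTorch. It is a stand-in under stated assumptions: a small `torch.nn.Sequential` replaces the CV, a saved `state_dict` replaces the Lightning checkpoint, and the file names are arbitrary. With a Lightning module, the loading step would typically go through `load_from_checkpoint` applied to `checkpoint.best_model_path`, and the export through `to_torchscript`.

```python
import os
import tempfile
import torch

def make_model():
    # toy network standing in for a CV
    return torch.nn.Sequential(
        torch.nn.Linear(10, 5), torch.nn.Tanh(), torch.nn.Linear(5, 1)
    )

torch.manual_seed(0)
model = make_model()

with tempfile.TemporaryDirectory() as tmpdir:
    # stand-in for the best checkpoint written during training
    ckpt_path = os.path.join(tmpdir, "best.pt")
    torch.save(model.state_dict(), ckpt_path)

    # 1. load the best weights into a fresh model
    best_model = make_model()
    best_model.load_state_dict(torch.load(ckpt_path))
    best_model.eval()

    # 2. compile to TorchScript by tracing an example input, then save
    example_input = torch.randn(1, 10)
    traced = torch.jit.trace(best_model, example_input)
    script_path = os.path.join(tmpdir, "model.ptc")
    traced.save(script_path)

    # check: the exported file reproduces the original model's output
    reloaded = torch.jit.load(script_path)
    same_output = torch.allclose(reloaded(example_input), model(example_input))

print(same_output)
```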

#### Loggers

Lightning supports numerous ways of logging metrics, from saving CSV files to TensorBoard to Weights & Biases and more (see their website for the full list).

For instance, to save the metrics in a .csv file you can use the `CSVLogger`:

In [None]:
from pytorch_lightning.loggers import CSVLogger

logger = CSVLogger(save_dir="experiments",   # directory where to save file
                    name='myCV',             # name of experiment
                    version=None             # version number (if None it will be automatically assigned)
                    )

# assign logger to trainer
trainer = pl.Trainer(logger=logger)

Similarly, the `TensorBoardLogger` from the same module saves the metrics in the TensorBoard format (requires `tensorboard` to be installed).

#### Adding new callbacks: save metrics into a dictionary

Callbacks can also be easily implemented in order to perform custom tasks. 

For instance, in `mlcolvar.utils.trainer` we implemented a simple `MetricsCallback` object which saves the logged metrics into a dictionary. This makes it easy to display the results in the tutorials without having to save the metrics with the loggers and load them back afterwards.

In [43]:
from mlcolvar.utils.trainer import MetricsCallback

log = MetricsCallback()

# assign callback to trainer
trainer = pl.Trainer(callbacks=[log])

# after the training is over, the metrics can be accessed through the dictionary log.metrics

GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
