# An overview of `mlcvs`

## What is `mlcvs`
`mlcvs`, which stands for Machine Learning Collective Variables, is a Python library for the construction of machine learning-based Collective Variables (CVs) for atomistic simulations.

The main purposes of `mlcvs` are:
- Make the use of such CVs as simple as possible for users.
- Provide a flexible framework for developing new models on top of existing ones.

`mlcvs` allows the user to train and export machine learning CVs from scratch with only a few lines of code, which do not require any coding expertise.

The library is based on PyTorch and exploits many features of the PyTorch Lightning package to simplify the overall workflow.
The library is designed to be used alongside PLUMED, so it is structured to make the interaction with it as simple as possible, both in the handling of data files and in the use of the trained CVs.


## `mlcvs` workflow
The main goal of `mlcvs` is to make the construction of mlcvs as straightforward and accessible as possible for all types of users.

The basic workflow consists of a few steps, each corresponding to very few lines of code:
- Import the training data, e.g. from PLUMED colvar files, using the functions in `utils`.
- Organize the training data into a `DataModule` using the functions in `data`. This makes it possible to take full advantage of the Lightning features.
- Initialize the model as one of the CV classes in `cvs`.
- Initialize a `pytorch_lightning.Trainer`, which takes care of training, validation, logging and other boring stuff :)
- Export the trained model with `model.to_torchscript()`.
- TODO Generate a PLUMED input file.
- Enjoy the CV in PLUMED with our awesome interface.

## Structure of CVs classes in `mlcvs`

The final products of the `mlcvs` library are, of course, the CVs.
These are defined as classes that inherit from a `BaseCV` class and from `pytorch_lightning.LightningModule`, which in turn inherits from `torch.nn.Module`.
The first superclass defines a template for all the CVs, along with common utility methods and the handling of pre- and post-processing in the model.

The second gives access to all the utilities of PyTorch Lightning.

Each CV is characterized by its specific methods, attributes and properties, which are implemented on top of these two superclasses.
The structure of CVs in `mlcvs` is designed to be modular: the core of each model is defined as a series of `BLOCKS`, implemented as `torch.nn.Module`, that are automatically executed in sequence, similarly to what is done with `torch.nn.Sequential`.
Each CV also has a `loss_fn` attribute that sets the loss function to be minimized when optimizing the trainable blocks.
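
To make the block idea concrete, here is a minimal sketch in plain PyTorch. It is not the actual `BaseCV` implementation: the class `SketchCV`, the block names `norm_in` and `nn`, the choice of storing `BLOCKS` as a list of attribute names, and the use of the mean squared error as `loss_fn` are all assumptions made for illustration.

```python
import torch


class SketchCV(torch.nn.Module):
    """Hypothetical illustration of a block-based CV (not the mlcvs code)."""

    # Assumed convention: names of the blocks, in execution order
    BLOCKS = ["norm_in", "nn"]

    def __init__(self, n_inputs: int, n_cvs: int = 1):
        super().__init__()
        # Non-trainable pre-processing block
        self.norm_in = torch.nn.LayerNorm(n_inputs, elementwise_affine=False)
        # Trainable block
        self.nn = torch.nn.Sequential(
            torch.nn.Linear(n_inputs, 8),
            torch.nn.Tanh(),
            torch.nn.Linear(8, n_cvs),
        )
        # Loss minimized when optimizing the trainable blocks (assumed choice)
        self.loss_fn = torch.nn.functional.mse_loss

    def forward(self, x):
        # Run every block in the declared order, like torch.nn.Sequential
        for name in self.BLOCKS:
            x = getattr(self, name)(x)
        return x


torch.manual_seed(0)
cv = SketchCV(n_inputs=3)
x = torch.randn(8, 3)
out = cv(x)

# The loss is read from the attribute, so it can be swapped without
# touching the forward pass
loss = cv.loss_fn(out, torch.zeros_like(out))
loss.backward()
```

Keeping the block order in one class-level list means a subclass could add or reorder blocks without rewriting `forward`.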

The CV




## Structure of the code

### core
Implements building blocks of the mlcvs classes
- **loss** :      Implements loss functions for the training of mlcvs
- **nn** :        Implements trainable machine-learning building blocks of the mlcvs classes, conceptually similar to `torch.nn`
- **stats** :     Implements statistics-based building blocks for the mlcvs classes
- **transform** : Implements non-trainable transformations of data

### cvs
Implements ready-to-use mlcvs classes and the `BaseCV` template class.
The CVs are grouped according to the criterion used for their optimization:
- **unsupervised** :      Only require data about the system (`Autoencoder_CV` and `VAE_CV`, variational autoencoder CV).
- **supervised**:         Require either labeled data from the different metastable states of the system (`DeepLDA_CV` and `DeepTDA_CV`) or data and targets to be matched (`Regression_CV`)
- **timelagged**:         Require time-lagged data from reactive trajectories (`DeepTICA_CV`)
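
As an illustration of what "time-lagged data" means for the last category, the sketch below pairs each frame of a trajectory with the frame a fixed number of steps later. The helper `make_timelagged_pairs` is hypothetical and assumes frames saved at a uniform time interval; the functions shipped with the library may work differently.

```python
import torch


def make_timelagged_pairs(traj: torch.Tensor, lag: int):
    """Return (x_t, x_{t+lag}) pairs from a trajectory of shape (n_frames, n_features).

    Hypothetical helper for illustration, assuming uniformly spaced frames.
    """
    if not 0 < lag < len(traj):
        raise ValueError("lag must be between 1 and n_frames - 1")
    return traj[:-lag], traj[lag:]


# Toy trajectory: 5 frames, 2 descriptors each
traj = torch.arange(10, dtype=torch.float32).reshape(5, 2)
x_t, x_lag = make_timelagged_pairs(traj, lag=2)
```

With 5 frames and a lag of 2, three pairs are produced, and the first pair couples frame 0 with frame 2.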

