# 4. Using Experiments

Although the abstractions and facilities provided by this framework and PyTorch Lightning make training and evaluating models easier, we also standardize the way we conduct experiments, which allows a more systematic and organized approach to model development.

The `LightningExperiment` class standardizes how experiments are conducted, including default callbacks and loggers, the directory structure for logs and checkpoints, the logging of hyperparameters, and the handling of training and evaluation for models and data modules.

In this notebook, we will demonstrate how to use the `LightningExperiment` class to conduct experiments in a systematic and organized way.


## Experiment Structure

The `LightningExperiment` class follows the structure shown in the diagram below. For each class, the first box holds the class name, the second box lists the attributes and their types, and the third box lists the methods with their input parameters and return types.
The arrows represent inheritance relationships between the classes.
A derived class inherits the attributes and methods of its parent classes, that is, it has access to all of them.
Methods written in italics are abstract and must be implemented by a derived class. Other methods are concrete, and some are already implemented (overridden) in child classes.


![Experiment Structure](experiment_classes.svg)

### The `Experiment` class

The `Experiment` class is the base class for all experiments. It holds the `experiment_dir` (where logs, checkpoints, and outputs are saved), the `name`, and the `run_id` (typically a timestamp).
The experiment directory is created when the experiment is instantiated, and the `experiment_dir` attribute is set to the path of the created directory.
An experiment consists of three stages: `setup`, `run`, and `teardown`.
The `execute` method runs the experiment by calling `setup`, `run`, and `teardown` in sequence.
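
The lifecycle described above can be pictured with a small stand-in class. This is only a sketch under stated assumptions: `SketchExperiment`, its constructor arguments, and the `stages` list are hypothetical names used for illustration, and the real `Experiment` class may differ in its signature and in how it handles errors.

```python
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional


class SketchExperiment:
    """Hypothetical stand-in for `Experiment`; not the framework's code."""

    def __init__(self, name: str, log_dir: Path, run_id: Optional[str] = None):
        self.name = name
        # Assumption: the run id defaults to the current time.
        self.run_id = run_id or datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
        # The experiment directory is created at instantiation time.
        self.experiment_dir = Path(log_dir) / name / self.run_id
        self.experiment_dir.mkdir(parents=True, exist_ok=True)
        self.stages = []  # records the order in which the stages ran

    def setup(self):
        self.stages.append("setup")

    def run(self):
        self.stages.append("run")
        return "result"

    def teardown(self):
        self.stages.append("teardown")

    def execute(self):
        # Call the three stages in sequence and return the output of run().
        self.setup()
        result = self.run()
        self.teardown()
        return result


experiment = SketchExperiment("demo", Path(tempfile.mkdtemp()), run_id="run-0")
output = experiment.execute()
print(experiment.stages, output)
```

A derived class would only override the stages it needs, while `execute` stays the single entry point.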

### The `LightningExperiment` class

The `LightningExperiment` class adds common parameters for training and testing models with PyTorch Lightning. It is usually the base class for any experiment that uses PyTorch Lightning.
This class also implements the `run` method, which executes a generic PyTorch Lightning pipeline by calling the `get_callbacks`, `get_logger`, `get_data_module`, `get_model`, `get_trainer`, `load_checkpoint`, `run_model`, and `log_hyperparameters` methods.
The pseudo-code for the `run` method is:

1. Get the model and data module using the `get_model` and `get_data_module` methods.
2. If `self.load` is provided, load the checkpoint using the `load_checkpoint` method.
3. Get the callbacks and logger using the `get_callbacks` and `get_logger` methods.
4. Log the hyperparameters using the `log_hyperparameters` method.
5. Get the trainer using the `get_trainer` method.
6. Run the model using the `run_model` method.

The user can override these methods to customize the experiment. The `get_callbacks`, `get_logger`, `load_checkpoint`, and `log_hyperparameters` methods have default implementations, while `get_data_module`, `get_model`, `get_trainer`, and `run_model` are abstract and must be implemented by a derived class.
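
The six steps of `run` can be sketched as a template method. The method names come from the text, but their argument lists, the `trace` attribute, and the `ToyExperiment` subclass are assumptions made for this illustration; the actual signatures in the framework may differ.

```python
from abc import ABC, abstractmethod


class SketchLightningExperiment(ABC):
    """Hypothetical outline of the pipeline; signatures are assumptions."""

    def __init__(self, load=None):
        self.load = load
        self.trace = []  # records which default hooks were called

    # Abstract hooks: a derived class must implement these.
    @abstractmethod
    def get_model(self): ...

    @abstractmethod
    def get_data_module(self): ...

    @abstractmethod
    def get_trainer(self, logger, callbacks): ...

    @abstractmethod
    def run_model(self, model, data_module, trainer): ...

    # Hooks with default implementations: overriding is optional.
    def get_callbacks(self):
        return []

    def get_logger(self):
        return None

    def load_checkpoint(self, model, path):
        self.trace.append(f"load:{path}")
        return model

    def log_hyperparameters(self, logger):
        self.trace.append("log_hyperparameters")

    def run(self):
        model = self.get_model()                            # step 1
        data_module = self.get_data_module()
        if self.load:                                       # step 2
            model = self.load_checkpoint(model, self.load)
        callbacks = self.get_callbacks()                    # step 3
        logger = self.get_logger()
        self.log_hyperparameters(logger)                    # step 4
        trainer = self.get_trainer(logger, callbacks)       # step 5
        return self.run_model(model, data_module, trainer)  # step 6


class ToyExperiment(SketchLightningExperiment):
    """Implements only the abstract hooks, with placeholder objects."""

    def get_model(self):
        return "model"

    def get_data_module(self):
        return "data_module"

    def get_trainer(self, logger, callbacks):
        return "trainer"

    def run_model(self, model, data_module, trainer):
        return (model, data_module, trainer)


toy = ToyExperiment(load="last.ckpt")
pipeline_output = toy.run()
```

Because the four abstract hooks are declared with `abstractmethod`, trying to instantiate the base class directly raises a `TypeError`, which mirrors the requirement that derived classes implement them.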


### The `LightningTrain` and `LightningTest` classes

The `LightningTrain` and `LightningTest` classes derive from `LightningExperiment` and are used to train and test models, respectively. They add parameters specific to training or testing with PyTorch Lightning and implement their own `get_callbacks`, `get_trainer`, and `run_model` methods. This standardizes how we train and test models: the same information is logged, and the same callbacks and loggers are used. The user can therefore focus on the model and data module rather than on the training and testing process, which is already standardized, can still be customized, and can be reused across experiments.
The `get_model` and `get_data_module` methods remain abstract and must be implemented by the derived class, since they vary with the model and data module used in the experiment.


### The `LightningSSLTrain` class

The `LightningTrain` class can train arbitrary models.
The `LightningSSLTrain` class is a derived class used to train models with self-supervised learning. It adds four new methods:

* `get_pretrain_model` and `get_pretrain_data_module`: the user must return the model and data module used to pretrain the model.
* `get_finetune_model` and `get_finetune_data_module`: the user must return the model and data module used to finetune the model.

The `training_mode` variable indicates whether the experiment is in pretrain or finetune mode. The `get_model` and `get_data_module` methods call `get_pretrain_model` and `get_pretrain_data_module` if `training_mode` is `pretrain`, and `get_finetune_model` and `get_finetune_data_module` if `training_mode` is `finetune`.
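
A minimal sketch of this dispatch is shown below. The class `SketchSSLTrain`, the toy subclass, and the validation of `training_mode` in the constructor are assumptions for illustration only; the real `LightningSSLTrain` takes more parameters and may implement the switch differently.

```python
class SketchSSLTrain:
    """Hypothetical outline of the dispatch; not the framework's code."""

    def __init__(self, training_mode: str = "pretrain"):
        # Assumption: only these two modes are accepted.
        if training_mode not in ("pretrain", "finetune"):
            raise ValueError(f"Unknown training_mode: {training_mode!r}")
        self.training_mode = training_mode

    # The four methods a derived class is expected to provide.
    def get_pretrain_model(self):
        raise NotImplementedError

    def get_pretrain_data_module(self):
        raise NotImplementedError

    def get_finetune_model(self):
        raise NotImplementedError

    def get_finetune_data_module(self):
        raise NotImplementedError

    # Generic hooks that forward to the mode-specific methods.
    def get_model(self):
        if self.training_mode == "pretrain":
            return self.get_pretrain_model()
        return self.get_finetune_model()

    def get_data_module(self):
        if self.training_mode == "pretrain":
            return self.get_pretrain_data_module()
        return self.get_finetune_data_module()


class ToySSLTrain(SketchSSLTrain):
    """Returns placeholder strings instead of real models and data modules."""

    def get_pretrain_model(self):
        return "backbone"

    def get_pretrain_data_module(self):
        return "whole-series data"

    def get_finetune_model(self):
        return "backbone + prediction head"

    def get_finetune_data_module(self):
        return "windowed data"


pretrain = ToySSLTrain(training_mode="pretrain")
finetune = ToySSLTrain(training_mode="finetune")
print(pretrain.get_model(), "|", finetune.get_model())
```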

One important point concerns the `load` parameter.
If it is provided, the `load_checkpoint` method loads the checkpoint for the model in order to resume training. The `get_finetune_model` method receives an additional parameter, `load_backbone`, which points to the checkpoint of the pretrained backbone. After the backbone is loaded, the `load` parameter is used to resume finetuning, that is, to load the checkpoint for the finetune model (`SSLDiscriminator`).

## Running CPC Experiment

In this notebook, we demonstrate how to run a CPC experiment, from pretraining to finetuning. The `CPCTrain` class derives from `LightningSSLTrain` and implements the `get_pretrain_model`, `get_pretrain_data_module`, `get_finetune_model`, and `get_finetune_data_module` methods, while the `CPCTest` class derives from `LightningTest` and implements the `get_model` and `get_data_module` methods.
Both classes add specific parameters to create the CPC model and instantiate the data module.

Let's start by pretraining the CPC model on the KuHAR dataset, as in previous notebooks.

### Experiment of Pretraining CPC

The `CPCTrain` class encapsulates the default code for creating models and data modules from previous notebooks in the `get_pretrain_model` and `get_pretrain_data_module` methods.
Thus, we just need to pass the required parameters to the `CPCTrain` class and call the `execute` method to run the experiment.
Because `CPCTrain` is a derived class, its constructor accepts the parameters of all parent classes (`epochs`, `accelerator`, `batch_size`, *etc.*) as well as its own parameters (`window_size`, `num_classes`, *etc.*).

The `CPCTrain` includes parameters to create the model as well as the data module. These parameters include:

* `data`: the path to the dataset folder. For pretraining, it must point to a dataset whose samples are the whole time series of a user. For finetuning, it must point to a dataset whose samples are windows of the time series, as in previous notebooks.
* `encoding_size`: the size of the latent representation of the CPC model.
* `in_channel`: the number of features in the input data.
* `window_size`: size of the input windows (`X_t`) to be fed to the encoder.
* `pad_length`: boolean indicating if the input windows should be padded to the `window_size` or not.
* `num_classes`: number of classes in the dataset.
* `update_backbone`: boolean indicating if the backbone should be updated during finetuning (only useful for fine-tuning process).

Only the `data` parameter is required; the others have default values. Please check the documentation of the `CPCTrain` class for more details.

Let's create the `CPCTrain` class and run the pretraining experiment.

In [1]:
from ssl_tools.experiments.har_classification.cpc import CPCTrain

data_path = "/workspaces/hiaac-m4/ssl_tools/data/view_concatenated/KuHar_cpc"

cpc_experiment = CPCTrain(
    # General params
    training_mode="pretrain",
    # Data Module params
    data=data_path,
    # CPC model params
    encoding_size=150,
    window_size=60,
    in_channel=6,
    num_classes=6,
    # Trainer params
    epochs=10,
    num_workers=12,
    batch_size=1,
    accelerator="gpu",
    devices=1,
)

cpc_experiment

LightningExperiment(experiment_dir=logs/pretrain/CPC/2024-02-01_23-52-39, model=CPC, run_id=2024-02-01_23-52-39, finished=False)

In [2]:
# Executing the experiment. Result is the output of the run() method
result = cpc_experiment.execute()  

/usr/local/lib/python3.10/dist-packages/lightning/fabric/loggers/csv_logs.py:198: Experiment logs directory logs/pretrain/CPC/2024-02-01_23-52-39 exists and is not empty. Previous log files in this directory will be deleted when the new ones are saved!
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(limit_train_batches=1.0)` was configured so 100% of the batches per epoch will be used..
`Trainer(limit_val_batches=1.0)` was configured so 100% of the batches will be used..


Setting up experiment: CPC...
Running experiment: CPC...
Training will start
	Experiment path: logs/pretrain/CPC/2024-02-01_23-52-39


LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]


Output()

`Trainer.fit` stopped: `max_epochs=10` reached.


Training finished
Last checkpoint saved at: logs/pretrain/CPC/2024-02-01_23-52-39/checkpoints/last.ckpt
Teardown experiment: CPC...


Once the experiment finishes, we should have a directory structure like this:

```
logs/
    pretrain/
        CPC/
            2024-02-01_22-01-31/
                checkpoints/
                    epoch=9-step=570.ckpt
                    last.ckpt
                hparams.yaml
                metrics.csv
```

This is the default directory structure for experiments; here the experiment directory is `logs/pretrain/CPC/2024-02-01_22-01-31/`. The `checkpoints` directory contains the saved checkpoints and may include a `last.ckpt` file, which is the last checkpoint saved.
The `hparams.yaml` file contains the hyperparameters, and the `metrics.csv` file contains the metrics logged during training.
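
As a rough illustration of how `metrics.csv` can be inspected afterwards, the sketch below reads one metric column with the standard `csv` module. The helper `read_metric` and the column names (`train_loss`, `val_loss`) are hypothetical; the actual columns depend on what the model logs. The sketch skips empty cells because a CSV log typically leaves a metric blank on rows where it was not recorded.

```python
import csv
import tempfile
from pathlib import Path


def read_metric(metrics_file, metric):
    """Return the logged values of `metric`, skipping rows where it is empty."""
    values = []
    with open(metrics_file, newline="") as handle:
        for row in csv.DictReader(handle):
            value = row.get(metric)
            if value:  # empty or missing cell: metric not logged on this row
                values.append(float(value))
    return values


# Tiny stand-in file; the column names and numbers are made up for illustration.
sample = Path(tempfile.mkdtemp()) / "metrics.csv"
sample.write_text(
    "epoch,step,train_loss,val_loss\n"
    "0,56,0.9,\n"
    "0,56,,0.8\n"
    "1,113,0.7,\n"
)

train_losses = read_metric(sample, "train_loss")
val_losses = read_metric(sample, "val_loss")
print(train_losses, val_losses)
```

For a real run, the same helper could be pointed at the `metrics.csv` file inside the experiment directory.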


We can obtain the experiment's model, data module, logger, checkpoint directory, callbacks, trainer, and hyperparameters through the `cpc_experiment.model`, `cpc_experiment.data_module`, `cpc_experiment.logger`, `cpc_experiment.checkpoint_dir`, `cpc_experiment.callbacks`, `cpc_experiment.trainer`, and `cpc_experiment.hyperparameters` attributes, respectively.
These objects are cached in the `cpc_experiment` object, so each is instantiated only once and can be accessed multiple times.
The `cpc_experiment.finished` attribute is a boolean indicating whether the experiment has finished successfully.
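
One common way to obtain this "instantiate once, access many times" behaviour in Python is `functools.cached_property`. The sketch below only illustrates the idea; `SketchCachedExperiment` and `build_count` are hypothetical, and the framework may implement its caching differently.

```python
from functools import cached_property


class SketchCachedExperiment:
    """Hypothetical example of a lazily built, cached attribute."""

    def __init__(self):
        self.build_count = 0  # how many times the model was actually built

    def get_model(self):
        self.build_count += 1
        return {"name": "toy-model"}

    @cached_property
    def model(self):
        # Runs on first access only; later accesses reuse the stored object.
        return self.get_model()


cached_experiment = SketchCachedExperiment()
first_access = cached_experiment.model
second_access = cached_experiment.model
print(first_access is second_access, cached_experiment.build_count)
```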

We will need the last checkpoint to load the weights of the backbone for the finetuning process.
Let's obtain the checkpoint file and then run the finetuning experiment.

In [3]:
backbone_checkpoint_path = cpc_experiment.checkpoint_dir / "last.ckpt"
backbone_checkpoint_path

PosixPath('logs/pretrain/CPC/2024-02-01_23-52-39/checkpoints/last.ckpt')

### Experiment of Finetuning CPC

The `CPCTrain` class also encapsulates the default code for creating models and data modules from previous notebooks in the `get_finetune_model` and `get_finetune_data_module` methods.
These methods behave like `get_pretrain_model` and `get_pretrain_data_module`, but they create the model and data module for the finetuning process.
In particular, `get_finetune_model` wraps the CPC model inside the `SSLDiscriminator` class, as seen in previous notebooks.

As we use the same class for pretraining and finetuning, we just need to set the `training_mode` parameter to `finetune` and the `load_backbone` parameter to the checkpoint file obtained in the pretraining process.
Then, we can call the `execute` method to run the experiment.

However, it is worth noting that finetuning is a supervised learning process and uses windowed time series as input. Thus, the `data` parameter must be the path to a dataset whose samples are windows of the time series, as in previous notebooks. In our case, we will use the standardized balanced view of the KuHar dataset.

In [4]:
data_path = "/workspaces/hiaac-m4/ssl_tools/data/standartized_balanced/KuHar/"

cpc_experiment = CPCTrain(
    # General params
    training_mode="finetune",
    load_backbone=backbone_checkpoint_path,
    # Data Module params
    data=data_path,
    # CPC model params
    encoding_size=150,
    window_size=60,
    in_channel=6,
    num_classes=6,
    # Trainer params
    epochs=10,
    num_workers=12,
    batch_size=128,
    accelerator="gpu",
    devices=1,
)

cpc_experiment

LightningExperiment(experiment_dir=logs/finetune/CPC/2024-02-02_00-03-12, model=CPC, run_id=2024-02-02_00-03-12, finished=False)

In [5]:
# Executing the experiment. Result is the output of the run() method
result = cpc_experiment.execute()  

/usr/local/lib/python3.10/dist-packages/lightning/fabric/loggers/csv_logs.py:198: Experiment logs directory logs/finetune/CPC/2024-02-02_00-03-12 exists and is not empty. Previous log files in this directory will be deleted when the new ones are saved!
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(limit_train_batches=1.0)` was configured so 100% of the batches per epoch will be used..
`Trainer(limit_val_batches=1.0)` was configured so 100% of the batches will be used..


Setting up experiment: CPC...
Running experiment: CPC...
Loading model from: logs/pretrain/CPC/2024-02-01_23-52-39/checkpoints/last.ckpt...
Model loaded successfully
Training will start
	Experiment path: logs/finetune/CPC/2024-02-02_00-03-12


LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]


Output()

We will pick the last checkpoint from the fine-tuning process to evaluate the model.

In [None]:
fine_tuned_checkpoint_path = cpc_experiment.checkpoint_dir / "last.ckpt"
fine_tuned_checkpoint_path

PosixPath('logs/finetune/CPC/2024-02-01_22-38-32/checkpoints/last.ckpt')

### CPC performance evaluation experiment

Finally, we can evaluate the performance of the CPC model using the `CPCTest` class. This class inherits from `LightningTest` and encapsulates the default code for creating models and data modules from previous notebooks in the `get_model` and `get_data_module` methods.

The signature of the `CPCTest` class is very similar to that of `CPCTrain`, and we will use the same data module as in the finetuning process. Unlike training, however, testing calls the trainer's `.test` method rather than `.fit`.
The `load` parameter is used to load the checkpoint obtained in the finetuning process, which holds the weights of the `SSLDiscriminator` (backbone and prediction head).

Let's create experiments to test the CPC model, using the test set from different datasets besides KuHAR.

In [None]:
from pathlib import Path
from ssl_tools.experiments.har_classification.cpc import CPCTest

root_datasets_path = Path("/workspaces/hiaac-m4/ssl_tools/data/standartized_balanced/")

datasets = [
    "KuHar",
    "MotionSense",
    "RealWorld_thigh",
    "RealWorld_waist",
    "UCI",
    "WISDM",
]

results = dict()
for dataset in datasets:
    data_path = root_datasets_path / dataset
    print(f"Dataset at: {data_path}")
    cpc_experiment = CPCTest(
        # General params
        load=fine_tuned_checkpoint_path,
        # Data Module params
        data=data_path,
        # CPC model params
        encoding_size=150,
        window_size=60,
        in_channel=6,
        num_classes=6,
        # Trainer params
        accelerator="gpu",
        devices=1,
    )
    print(f"Loading model from {fine_tuned_checkpoint_path} and executing test using dataset at {dataset}...")
    results[dataset] = cpc_experiment.execute()
    print(f"Test on dataset {dataset} finished!")

/usr/local/lib/python3.10/dist-packages/lightning/fabric/loggers/csv_logs.py:198: Experiment logs directory logs/test/CPC/2024-02-01_23-01-24 exists and is not empty. Previous log files in this directory will be deleted when the new ones are saved!
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(limit_test_batches=1.0)` was configured so 100% of the batches will be used..
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]


Dataset at: /workspaces/hiaac-m4/ssl_tools/data/standartized_balanced/KuHar
Loading model from logs/finetune/CPC/2024-02-01_22-38-32/checkpoints/last.ckpt and executing test using dataset at KuHar...
Setting up experiment: CPC...
Running experiment: CPC...
Loading model from: logs/finetune/CPC/2024-02-01_22-38-32/checkpoints/last.ckpt...
Model loaded successfully


Output()

/usr/local/lib/python3.10/dist-packages/lightning/fabric/loggers/csv_logs.py:198: Experiment logs directory logs/test/CPC/2024-02-01_23-01-28 exists and is not empty. Previous log files in this directory will be deleted when the new ones are saved!
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(limit_test_batches=1.0)` was configured so 100% of the batches will be used..
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]


Teardown experiment: CPC...
Test on dataset KuHar finished !
Dataset at: /workspaces/hiaac-m4/ssl_tools/data/standartized_balanced/MotionSense
Loading model from logs/finetune/CPC/2024-02-01_22-38-32/checkpoints/last.ckpt and executing test using dataset at MotionSense...
Setting up experiment: CPC...
Running experiment: CPC...
Loading model from: logs/finetune/CPC/2024-02-01_22-38-32/checkpoints/last.ckpt...
Model loaded successfully


Output()

/usr/local/lib/python3.10/dist-packages/lightning/fabric/loggers/csv_logs.py:198: Experiment logs directory logs/test/CPC/2024-02-01_23-01-39 exists and is not empty. Previous log files in this directory will be deleted when the new ones are saved!
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(limit_test_batches=1.0)` was configured so 100% of the batches will be used..


Teardown experiment: CPC...
Test on dataset MotionSense finished !
Dataset at: /workspaces/hiaac-m4/ssl_tools/data/standartized_balanced/RealWorld_thigh
Loading model from logs/finetune/CPC/2024-02-01_22-38-32/checkpoints/last.ckpt and executing test using dataset at RealWorld_thigh...
Setting up experiment: CPC...
Running experiment: CPC...
Loading model from: logs/finetune/CPC/2024-02-01_22-38-32/checkpoints/last.ckpt...
Model loaded successfully


LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]


Output()

## Other advantages of using `LightningExperiment`

The `LightningExperiment` class also provides other advantages, such as:

* CLI applications are generated automatically for the experiments using the `jsonargparse` library. Every parameter in the class constructor is converted to a command-line argument, so the user can run the experiment from the command line with the same parameters as in the class constructor.
* Default `metrics.csv` and `hparams.yaml` files are created. The hyperparameters are logged in `hparams.yaml`, and `metrics.csv` contains the metrics logged during training, which can be used to analyze the performance of the model.
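
To make the first point concrete, the sketch below shows the general idea of turning constructor parameters into command-line options. It deliberately uses only the standard library (`argparse` and `inspect`) as a simplified stand-in and is not how `jsonargparse` or this framework implements it; `ToyCliExperiment`, `build_parser`, and `from_command_line` are hypothetical names.

```python
import argparse
import inspect


class ToyCliExperiment:
    """Hypothetical experiment with a typed constructor."""

    def __init__(self, data: str, epochs: int = 10, batch_size: int = 32):
        self.data = data
        self.epochs = epochs
        self.batch_size = batch_size


def build_parser(cls):
    """Create one `--name` option per constructor parameter."""
    parser = argparse.ArgumentParser(prog=cls.__name__)
    for name, param in inspect.signature(cls.__init__).parameters.items():
        if name == "self":
            continue
        has_default = param.default is not inspect.Parameter.empty
        has_type = param.annotation is not inspect.Parameter.empty
        parser.add_argument(
            f"--{name}",
            # Simplification: the annotation is used directly as the converter.
            type=param.annotation if has_type else str,
            # Parameters without a default become required options.
            required=not has_default,
            default=param.default if has_default else None,
        )
    return parser


def from_command_line(cls, argv):
    """Parse `argv` and instantiate `cls` with the parsed values."""
    args = build_parser(cls).parse_args(argv)
    return cls(**vars(args))


cli_experiment = from_command_line(
    ToyCliExperiment, ["--data", "some/path", "--epochs", "3"]
)
print(cli_experiment.data, cli_experiment.epochs, cli_experiment.batch_size)
```

Passing the annotation straight to `type=` works for simple types such as `str` and `int` but not for booleans or nested objects, which is one reason a dedicated library is used in practice.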