# Tutorial #1
This tutorial trains a model from scratch, optionally retrains it, and evaluates it on a test set.

### Pre-tutorial

##### NeuralLib package

In [2]:
# Cell to be removed once the package is stable
import sys
import os

# Get the absolute path of your project directory
project_path = os.path.abspath("..")

# Add the project directory to sys.path
if project_path not in sys.path:
    sys.path.append(project_path)

##### Virtual Environment

Make sure that a conda environment or a virtual environment with the required packages (see `requirements.txt`) is activated.

To use that environment from Jupyter, install the IPython kernel in it: see steps 6 through 8 in https://medium.com/@WamiqRaza/how-to-create-virtual-environment-jupyter-kernel-python-6836b50f4bf4

In [3]:
# Check that Python is running from the intended virtual environment
import sys
print(sys.executable)


C:\Users\Catia Bastos\dev\envs\NeuralLibraryEnv\Scripts\python.exe
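
Beyond printing `sys.executable`, a quick programmatic check can confirm that the interpreter belongs to an isolated environment. The following is a minimal sketch using only the standard library; the helper name `describe_environment` is hypothetical, and the conda check relies on the `CONDA_DEFAULT_ENV` variable that conda sets when an environment is activated.

```python
import os
import sys


def describe_environment():
    """Return a short description of the active Python environment."""
    # In a venv/virtualenv, sys.prefix points at the environment while
    # sys.base_prefix points at the interpreter it was created from.
    in_venv = sys.prefix != sys.base_prefix
    # conda exports the name of the active environment in this variable.
    conda_env = os.environ.get("CONDA_DEFAULT_ENV")
    return {
        "executable": sys.executable,
        "in_venv": in_venv,
        "conda_env": conda_env,  # None when no conda env is active
    }


info = describe_environment()
print(info)
```

If both `in_venv` is `False` and `conda_env` is `None`, the notebook is most likely running on the system interpreter rather than the intended environment.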


### Imports

In [4]:
import os  # used below to build the data paths

from NeuralLib.config import DATASETS_GIB01  # directories saved in config.py
from NeuralLib.architectures import GRUseq2seq

### Data paths

In [5]:
X = os.path.join(DATASETS_GIB01, 'x')
Y = os.path.join(DATASETS_GIB01, 'y_bin')
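
The training step later in this tutorial expects `path_x` and `path_y` to each contain `train`, `val`, and `test` folders. A minimal sketch for verifying that layout before training is shown here, using only the standard library; the helper name `missing_split_folders` is hypothetical and not part of NeuralLib, and the demonstration runs on a temporary directory rather than the real dataset.

```python
import tempfile
from pathlib import Path

REQUIRED_SPLITS = ("train", "val", "test")


def missing_split_folders(root):
    """Return the split folders that are absent under `root`."""
    root = Path(root)
    return [split for split in REQUIRED_SPLITS if not (root / split).is_dir()]


# Demonstration on a throwaway directory that lacks the 'test' folder.
with tempfile.TemporaryDirectory() as tmp:
    for split in ("train", "val"):
        (Path(tmp) / split).mkdir()
    demo_missing = missing_split_folders(tmp)

print(demo_missing)  # ['test']
```

In the notebook, the same helper could be called on `X` and `Y` to fail early with a clear message instead of partway through training.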

### Step 1: Define architecture's parameters

In [6]:
arch_params = {
    'model_name': 'ECGPeakDetector',
    'n_features': 1,
    'hid_dim': 16,
    'n_layers': 2,
    'dropout': 0.3,
    'learning_rate': 0.01,
    'bidirectional': True,
    'task': 'classification',
    'num_classes': 1,
}

### Step 2: Define training parameters

In [6]:
# Minimal values for testing purposes
train_params = {
    'path_x': X,
    'path_y': Y,
    'epochs': 3,
    'batch_size': 1,
    'patience': 2,
    'dataset_name': 'private_gib01',
    'trained_for': 'peak detection',
    'all_samples': False,
    'samples': 3,
    'gpu_id': None,
    'enable_tensorboard': True
}

### Step 3: Initialize the model

Define the model's architecture (see `biosignals_architectures.py`) and set its hyperparameters.

Because `task` is set to `classification` and `num_classes` is 1 (binary classification), the criterion (loss function) is automatically set to `BCEWithLogitsLoss`.

In [7]:
model = GRUseq2seq(**arch_params)
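
`BCEWithLogitsLoss` combines a sigmoid with binary cross-entropy in a numerically stable form, so the model can output raw logits. The sketch below illustrates the per-element formula in plain Python, independent of NeuralLib and PyTorch; the function names are hypothetical, and the code mirrors the standard stable formulation rather than any library internals.

```python
import math


def bce_with_logits(logit, target):
    """Stable binary cross-entropy for one raw logit and a 0/1 target."""
    # Equivalent to -[t*log(s) + (1-t)*log(1-s)] with s = sigmoid(logit),
    # rewritten so that exp() never overflows for large |logit|.
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def naive_bce(logit, target):
    """Direct sigmoid + cross-entropy; fine for moderate logits only."""
    s = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(s) + (1 - target) * math.log(1 - s))


# Both forms agree for moderate logits.
for z, t in [(2.0, 1), (-1.5, 0), (0.3, 1)]:
    assert math.isclose(bce_with_logits(z, t), naive_bce(z, t), rel_tol=1e-9)

# The stable form still works where the naive one would hit log(0).
extreme_loss = bce_with_logits(1000.0, 0)
print(extreme_loss)
```

The last call shows why the combined loss is preferred: applying a sigmoid first and taking the logarithm afterwards breaks down once the sigmoid saturates.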

In [5]:
# List the available architecture classes
from NeuralLib.architectures import list_architectures
list_architectures()

['GRUEncoderDecoder',
 'GRUseq2one',
 'GRUseq2seq',
 'TransformerEncoderDecoder',
 'Transformerseq2one',
 'Transformerseq2seq']

### Step 4: Train the model (from scratch)

In [8]:
model.train_from_scratch(**train_params)

No GPU available, using CPU.
Checkpoints directory created at C:\Users\Catia Bastos\dev\results\ECGPeakDetector\checkpoints\GRUseq2seq_[16, 16]hid_2l_bidirTrue_lr0.01_drop[0.3, 0.3]_dt2025-02-03_15-54-50


GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs


TensorBoard logs will be saved to C:\Users\Catia Bastos\dev\results\ECGPeakDetector\checkpoints\GRUseq2seq_[16, 16]hid_2l_bidirTrue_lr0.01_drop[0.3, 0.3]_dt2025-02-03_15-54-50\tensorboard_logs\version_0


C:\Users\Catia Bastos\dev\envs\NeuralLibraryEnv\lib\site-packages\pytorch_lightning\callbacks\model_checkpoint.py:654: Checkpoint directory C:\Users\Catia Bastos\dev\results\ECGPeakDetector\checkpoints\GRUseq2seq_[16, 16]hid_2l_bidirTrue_lr0.01_drop[0.3, 0.3]_dt2025-02-03_15-54-50 exists and is not empty.

  | Name           | Type              | Params | Mode 
-------------------------------------------------------------
0 | gru_layers     | ModuleList        | 6.6 K  | train
1 | dropout_layers | ModuleList        | 0      | train
2 | fc_out         | Linear            | 33     | train
3 | criterion      | BCEWithLogitsLoss | 0      | train
-------------------------------------------------------------
6.7 K     Trainable params
0         Non-trainable params
6.7 K     Total params
0.027     Total estimated model params size (MB)
8         Modules in train mode
0         Modules in eval mode


Sanity Checking: |                                                                               | 0/? [00:00<…

C:\Users\Catia Bastos\dev\envs\NeuralLibraryEnv\lib\site-packages\pytorch_lightning\trainer\connectors\data_connector.py:424: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.
C:\Users\Catia Bastos\dev\envs\NeuralLibraryEnv\lib\site-packages\pytorch_lightning\utilities\data.py:78: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 1. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`.
C:\Users\Catia Bastos\dev\envs\NeuralLibraryEnv\lib\site-packages\pytorch_lightning\trainer\connectors\data_connector.py:424: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.
C:\Users\Catia Bastos\dev\envs\NeuralLibraryEnv\lib\site-packages\pytorch_l

Training: |                                                                                      | 0/? [00:00<…

Validation: |                                                                                    | 0/? [00:00<…

Validation: |                                                                                    | 0/? [00:00<…

Validation: |                                                                                    | 0/? [00:00<…

`Trainer.fit` stopped: `max_epochs=3` reached.


Total training time: 54.91 seconds
{'architecture': 'GRUseq2seq', 'model_name': 'ECGPeakDetector', 'train_dataset': 'private_gib01', 'task': 'peak detection', 'gpu_model': None, 'epochs': 3, 'optimizer': 'Adam (\nParameter Group 0\n    amsgrad: False\n    betas: (0.9, 0.999)\n    capturable: False\n    differentiable: False\n    eps: 1e-08\n    foreach: None\n    fused: None\n    initial_lr: 0.01\n    lr: 0.01\n    maximize: False\n    weight_decay: 1e-05\n)', 'learning_rate': 0.01, 'validation_loss': 0.034851472824811935, 'training_time': 54.91203784942627, 'retraining': False}
Training complete. Best_model_path: C:\Users\Catia Bastos\dev\results\ECGPeakDetector\checkpoints\GRUseq2seq_[16, 16]hid_2l_bidirTrue_lr0.01_drop[0.3, 0.3]_dt2025-02-03_15-54-50\GRUseq2seq_[16, 16]hid_2l_lr0.01_drop[0.3, 0.3].ckpt
Weights saved as C:\Users\Catia Bastos\dev\results\ECGPeakDetector\checkpoints\GRUseq2seq_[16, 16]hid_2l_bidirTrue_lr0.01_drop[0.3, 0.3]_dt2025-02-03_15-54-50\model_weights.pth


In [9]:
# checkpoints directory
checkpoints_dir = model.checkpoints_directory
print(checkpoints_dir)

C:\Users\Catia Bastos\dev\results\ECGPeakDetector\checkpoints\GRUseq2seq_[16, 16]hid_2l_bidirTrue_lr0.01_drop[0.3, 0.3]_dt2025-02-03_15-54-50
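
The parameter counts in the model summary printed during training can be reproduced by hand. The sketch below assumes that each entry of `gru_layers` is a single-layer bidirectional GRU with PyTorch's standard parameterization (three gates, each with input weights, hidden weights, and two bias vectors), and that the second layer receives the concatenated outputs of both directions. These are assumptions about the architecture, not taken from the NeuralLib source, and the helper name is hypothetical.

```python
def gru_layer_params(input_size, hidden_size, bidirectional=True):
    """Parameter count of one GRU layer with PyTorch-style weights."""
    # Three gates, each with input weights, hidden weights, and two biases.
    per_direction = 3 * (
        hidden_size * input_size + hidden_size * hidden_size + 2 * hidden_size
    )
    return per_direction * (2 if bidirectional else 1)


n_features, hid_dim, num_classes = 1, 16, 1

layer_1 = gru_layer_params(n_features, hid_dim)     # input: raw signal
layer_2 = gru_layer_params(2 * hid_dim, hid_dim)    # input: both directions
fc_out = (2 * hid_dim) * num_classes + num_classes  # weights + bias

print(layer_1 + layer_2)           # 6624, shown as "6.6 K"
print(fc_out)                      # 33
print(layer_1 + layer_2 + fc_out)  # 6657, shown as "6.7 K"
```

Under these assumptions the totals match the summary table, which is a useful sanity check that `n_features`, `hid_dim`, `n_layers`, and `bidirectional` were applied as intended.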


Breakdown of `train_from_scratch`:
1. **Model Initialization & Checkpoints:**

A directory for storing model checkpoints is created at
`<DEV_BASE_DIR>/results/<model_name>/checkpoints/<architecture_name_hparams_datetime>`.

2. **Dataset & DataLoader Preparation:**

Training and validation datasets are instantiated from `DatasetSequence`, loading data from `path_x` (inputs) and `path_y` (outputs). PyTorch `DataLoader` objects are then created and set up to handle variable sequence lengths. Both `path_x` and `path_y` must contain `train`, `val`, and `test` folders.

3. **Defining Callbacks for Training:**
    - **Checkpoint Callback:** Saves the best model based on validation loss (`val_loss`).
    - **Early Stopping Callback:** Stops training early if validation loss doesn't improve for `patience` epochs.
    - **Loss Plot Callback:** Saves a loss curve to visualize training progress.

4. **Trainer Initialization & Logging:**
    - If TensorBoard is enabled, a `TensorBoardLogger` is set up for tracking metrics and hyperparameters (`hparams.yaml`). These are written inside the checkpoint directory.
    - The PyTorch Lightning `Trainer` is instantiated, specifying the maximum number of epochs (`epochs`), the device, the callbacks (Checkpointing, Early Stopping, Loss Plot), and the logger.
    
5. **Training Execution:**

The model is trained using `trainer.fit(model, train_dataloader, val_dataloader)`.

6. **Post-Training Processing & Model Saving:**
    - The **best (lowest) validation loss** is extracted from the checkpoint callback.
    - Training metadata (trainer state, optimizer, dataset, GPU info, loss, etc.) is saved (using `model.save_training_information()`) and written to `training_info.json` inside the checkpoint directory.
    - The **final model weights** (corresponding to the lowest validation loss) are saved in `model_weights.pth` inside the checkpoint directory.
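
The interplay between checkpointing and early stopping described in the breakdown can be illustrated without any framework. The following is a simplified sketch of the patience logic, not PyTorch Lightning's actual implementation; the function name and the loss values are made up for illustration.

```python
def run_with_early_stopping(val_losses, patience):
    """Return (best_loss, best_epoch, stopped_epoch) for a loss sequence."""
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    stopped_epoch = None

    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # A checkpoint callback would save the model here.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                stopped_epoch = epoch  # training halts after this epoch
                break

    return best_loss, best_epoch, stopped_epoch


# Illustrative values only: the loss improves twice, then stalls.
result = run_with_early_stopping([0.50, 0.40, 0.45, 0.42, 0.30], patience=2)
print(result)  # (0.4, 1, 3)
```

With `patience=2`, training stops at epoch 3 and never reaches the lower loss at epoch 4, which is the trade-off to keep in mind when choosing a small `patience` such as the one used in this tutorial.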

### Step 5 (optional): Retrain the model
##### In this case, training simply continues for 4 more epochs; the data, the parameters, and the task are unchanged.

In [10]:
train_params_retrain = train_params.copy()
train_params_retrain['epochs'] = 4
model.retrain(
    checkpoints_directory=checkpoints_dir,  # directory where the model's weights and parameters were stored in the previous step
    path_x=train_params_retrain['path_x'],
    path_y=train_params_retrain['path_y'],
    patience=train_params_retrain['patience'],
    batch_size=train_params_retrain['batch_size'],
    epochs=train_params_retrain['epochs'],
    gpu_id=train_params_retrain['gpu_id'],
    all_samples=train_params_retrain['all_samples'],
    samples=train_params_retrain['samples'],
    dataset_name=train_params_retrain['dataset_name'],
    trained_for=train_params_retrain['trained_for'],
    enable_tensorboard=train_params_retrain['enable_tensorboard'],
)

  return self.fget.__get__(instance, owner)()
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
C:\Users\Catia Bastos\dev\envs\NeuralLibraryEnv\lib\site-packages\pytorch_lightning\callbacks\model_checkpoint.py:654: Checkpoint directory C:\Users\Catia Bastos\dev\results\ECGPeakDetector\checkpoints\GRUseq2seq_[16, 16]hid_2l_bidirTrue_lr0.01_drop[0.3, 0.3]_retraining_dt2025-02-03_15-55-46 exists and is not empty.

  | Name           | Type              | Params | Mode 
-------------------------------------------------------------
0 | gru_layers     | ModuleList        | 6.6 K  | train
1 | dropout_layers | ModuleList        | 0      | train
2 | fc_out         | Linear            | 33     | train
3 | criterion      | BCEWithLogitsLoss | 0      | train
-------------------------------------------------------------
6.7 K     Trainable params
0         Non-trainable params
6.7 K     Total params
0.027     Total estimated model params 

No GPU available, using CPU.
Found existing .pth file: C:\Users\Catia Bastos\dev\results\ECGPeakDetector\checkpoints\GRUseq2seq_[16, 16]hid_2l_bidirTrue_lr0.01_drop[0.3, 0.3]_dt2025-02-03_15-54-50\model_weights.pth
Weights loaded successfully from C:\Users\Catia Bastos\dev\results\ECGPeakDetector\checkpoints\GRUseq2seq_[16, 16]hid_2l_bidirTrue_lr0.01_drop[0.3, 0.3]_dt2025-02-03_15-54-50\model_weights.pth
TensorBoard logs will be saved to C:\Users\Catia Bastos\dev\results\ECGPeakDetector\checkpoints\GRUseq2seq_[16, 16]hid_2l_bidirTrue_lr0.01_drop[0.3, 0.3]_retraining_dt2025-02-03_15-55-46\tensorboard_logs\version_0


Sanity Checking: |                                                                               | 0/? [00:00<…

Training: |                                                                                      | 0/? [00:00<…

Validation: |                                                                                    | 0/? [00:00<…

Validation: |                                                                                    | 0/? [00:00<…

Validation: |                                                                                    | 0/? [00:00<…

Total training time: 53.98 seconds
{'architecture': 'GRUseq2seq', 'model_name': 'ECGPeakDetector', 'train_dataset': 'private_gib01', 'task': 'peak detection', 'gpu_model': None, 'epochs': 3, 'optimizer': 'Adam (\nParameter Group 0\n    amsgrad: False\n    betas: (0.9, 0.999)\n    capturable: False\n    differentiable: False\n    eps: 1e-08\n    foreach: None\n    fused: None\n    initial_lr: 0.01\n    lr: 0.01\n    maximize: False\n    weight_decay: 1e-05\n)', 'learning_rate': 0.01, 'validation_loss': 0.02943500317633152, 'training_time': 53.97588133811951, 'retraining': True, 'training_history': {'architecture': 'GRUseq2seq', 'model_name': 'ECGPeakDetector', 'train_dataset': 'private_gib01', 'task': 'peak detection', 'gpu_model': None, 'epochs': 3, 'optimizer': 'Adam (\nParameter Group 0\n    amsgrad: False\n    betas: (0.9, 0.999)\n    capturable: False\n    differentiable: False\n    eps: 1e-08\n    foreach: None\n    fused: None\n    initial_lr: 0.01\n    lr: 0.01\n    maximize: Fa
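
The training metadata is also written to `training_info.json` in the checkpoint directory, so it can be inspected after the session ends. The sketch below assumes that the file holds the same keys as the printed dictionary (`validation_loss`, `training_time`, `retraining`, and, after retraining, a nested `training_history`); the helper name `summarize_training_info` is hypothetical and not part of NeuralLib.

```python
import json
from pathlib import Path


def summarize_training_info(checkpoints_directory):
    """Load training_info.json and list each run, oldest first."""
    info_path = Path(checkpoints_directory) / "training_info.json"
    with info_path.open(encoding="utf-8") as f:
        info = json.load(f)

    runs = []
    # Assumption: earlier runs are nested under 'training_history'.
    while isinstance(info, dict):
        runs.append(
            {
                "retraining": info.get("retraining"),
                "validation_loss": info.get("validation_loss"),
                "training_time": info.get("training_time"),
            }
        )
        info = info.get("training_history")
    return runs[::-1]


# Example usage in the notebook (uncomment after training):
# for run in summarize_training_info(checkpoints_dir):
#     print(run)
```

Listing the runs oldest first makes it easy to compare the validation loss before and after retraining.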

### Step 6: Test on test set

In [11]:
predictions, avg_loss = model.test_on_test_set(
    path_x=train_params['path_x'],
    path_y=train_params['path_y'],
    checkpoints_dir=checkpoints_dir,
    gpu_id=train_params['gpu_id'],
    all_samples=False, # if True, test on all available samples
    samples=5,
    save_predictions=True
)

print(f"Average Test Loss: {avg_loss:.4f}")

No GPU available, using CPU.
Using device: cpu
Weights successfully loaded from C:\Users\Catia Bastos\dev\results\ECGPeakDetector\checkpoints\GRUseq2seq_[16, 16]hid_2l_bidirTrue_lr0.01_drop[0.3, 0.3]_dt2025-02-03_15-54-50\model_weights.pth.
Sample 0: Test Loss: 0.0362
Sample 1: Test Loss: 0.0362
Sample 2: Test Loss: 0.0373
Sample 3: Test Loss: 0.0373
Sample 4: Test Loss: 0.0373
Average Test Loss: 0.0368
Average Test Loss: 0.0368
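
For peak detection, the predictions usually need to be turned into discrete peak positions. The sketch below assumes one sequence of raw logits given as a plain list of floats, which would be consistent with training under `BCEWithLogitsLoss`; the actual structure of `predictions` returned by `test_on_test_set` is not shown in this tutorial and may differ. The helper name, the threshold, and the example values are all hypothetical.

```python
import math


def logits_to_peak_indices(logits, threshold=0.5):
    """Return indices whose sigmoid probability is at least `threshold`."""
    # sigmoid(z) >= threshold  <=>  z >= log(threshold / (1 - threshold)),
    # so the comparison can be done on logits without calling exp().
    logit_threshold = math.log(threshold / (1.0 - threshold))
    return [i for i, z in enumerate(logits) if z >= logit_threshold]


# Made-up logits for illustration; not output from the model above.
example_logits = [-4.0, -2.5, 3.1, -3.0, 0.2, -5.0]
peak_indices = logits_to_peak_indices(example_logits)
print(peak_indices)  # [2, 4]
```

A threshold of 0.5 corresponds to a logit of 0; raising it trades missed peaks for fewer false detections.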
