# 2. Training a PyTorch Lightning model

In this notebook, we train a simple CNN model using PyTorch Lightning.
We start with the data, then define the model, and finally train it.

## Creating the KuHar LightningDataModule

To train a model, we must first create a `LightningDataModule`.
In this notebook, we use the standardized KuHar HAR data. Our data folder looks like this:

```
KuHar/
    test.csv
    train.csv
    validation.csv
```

Each CSV file contains time windows of signals from two 3-axis sensors (accelerometer and gyroscope), so we use the `MultiModalSeriesCSVDataset` class to read it. We then need a `LightningDataModule` that defines the data loaders for training, validation, and test. The `data_path` is the path to the `KuHar` folder.

### Facilitating the creation of the LightningDataModule with MultiModalHARSeriesDataModule

To simplify the creation of the `Dataset` and `DataLoader` objects, we use the `MultiModalHARSeriesDataModule`. If:

1. Your directory is organized like the one above; and
2. Each CSV file is a collection of time windows of signals (that is, a file that could be wrapped by `MultiModalSeriesCSVDataset`),

then you can use the `MultiModalHARSeriesDataModule` to create a `LightningDataModule` easily.
The `train_dataloader` method will use `train.csv`, `val_dataloader` will use `validation.csv`, and `test_dataloader` will use `test.csv`.

To create a `MultiModalHARSeriesDataModule`, we must pass:
- `data_path`: the path to the `KuHar` folder;
- `feature_prefixes`: the prefixes of the features in the CSV files. In this case, we have `accel-x`, `accel-y`, `accel-z`, `gyro-x`, `gyro-y` and `gyro-z`;
- `batch_size`: the batch size for the data loaders; and
- `num_workers`: the number of workers for the data loaders. Essentially, the number of parallel processes to load the data.

All data loaders share the parameters passed here, such as `batch_size`, `num_workers`, and `feature_prefixes`.
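To make the role of `feature_prefixes` more concrete, here is a minimal sketch of prefix-based column selection using only the standard library. It assumes, hypothetically, that each column is named `<prefix>-<time index>` (for example `accel-x-0`). The actual column layout of the KuHar CSV files and the internals of `MultiModalSeriesCSVDataset` may differ; the helper `row_to_channels` and the tiny CSV are invented for illustration.

```python
import csv
import io

# Hypothetical CSV: 2 samples, 2 prefixes, 3 time steps per prefix.
CSV_TEXT = """accel-x-0,accel-x-1,accel-x-2,gyro-x-0,gyro-x-1,gyro-x-2,standard activity code
0.1,0.2,0.3,1.0,1.1,1.2,4
0.4,0.5,0.6,1.3,1.4,1.5,2
"""


def row_to_channels(row, feature_prefixes):
    """Group the columns of one CSV row by prefix: one channel per prefix."""
    channels = []
    for prefix in feature_prefixes:
        # Assumed naming scheme: "<prefix>-<time index>".
        cols = [name for name in row if name.startswith(prefix + "-")]
        # Sort by the numeric time index so the samples stay in order.
        cols.sort(key=lambda name: int(name[len(prefix) + 1:]))
        channels.append([float(row[name]) for name in cols])
    return channels


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
samples = [row_to_channels(row, ("accel-x", "gyro-x")) for row in rows]
labels = [int(row["standard activity code"]) for row in rows]

# (number of samples, channels per sample, time steps per channel)
print(len(samples), len(samples[0]), len(samples[0][0]))  # 2 2 3
```

With six prefixes and 60 time steps per prefix, the same grouping would give one sample of shape `(6, 60)`.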

In [1]:
from ssl_tools.data.data_modules.har import MultiModalHARSeriesDataModule

data_path = "/workspaces/hiaac-m4/ssl_tools/data/standartized_balanced/KuHar/"

data_module = MultiModalHARSeriesDataModule(
    data_path=data_path,
    feature_prefixes=("accel-x", "accel-y", "accel-z", "gyro-x", "gyro-y", "gyro-z"),
    label="standard activity code",
    features_as_channels=True,
    batch_size=64,
    num_workers=0,  # Sequential, for notebook compatibility
)
data_module



<ssl_tools.data.data_modules.har.MultiModalHARSeriesDataModule at 0x7f4e2cc53340>

We can test the data loaders by fetching the first batch from one of them; here we do it only for `train_dataloader`. The `.setup()` method must be called before requesting the data loaders, otherwise they will not be created. When training a model, PyTorch Lightning's `.fit()` method calls `.setup()` for you, so the explicit call below is only for demonstration.

In [2]:
data_module.setup("fit")            # We just put it here to test.
                                    # When training a model, the Trainer will call this method.
train_dataloader = data_module.train_dataloader()

# Pick the first batch to inspect. The batch size is 64, so we have 64 samples.
batch = next(iter(train_dataloader))
# Each batch is a 2-element tuple with the first element being the 64 sample input and the second the 64 sample target.
inputs, targets = batch

print(f"Inputs shape: {inputs.shape}, Targets shape: {targets.shape}")

Inputs shape: torch.Size([64, 6, 60]), Targets shape: torch.Size([64])


## Training a simple model

We will create a simple 1D CNN PyTorch Lightning model using the `Simple1DConvNetwork` class. The model will be trained to classify the activities in the KuHar dataset.

A PyTorch Lightning model typically implements the `forward`, `training_step`, and `configure_optimizers` methods, and uses `__init__` to define its layers.
The `forward` method is the same as the PyTorch `forward` method.
The `training_step` method is called for each batch of data during training and returns the loss.
The `configure_optimizers` method defines the optimizer used during training.

The `Simple1DConvNetwork` is a simple 1D CNN model that will be used to classify the activities in the KuHar dataset. It has 3 convolutional layers and 2 fully connected layers. It is trained using the `Adam` optimizer and the `CrossEntropyLoss` loss function.

In addition, Lightning models implemented in this framework usually log the training and validation losses.
The test step usually also computes common metrics, such as accuracy.

In [3]:
from ssl_tools.models.nets.convnet import Simple1DConvNetwork

model = Simple1DConvNetwork(
    input_channels=6,   # The number of input channels (accel-x, accel-y, accel-z, gyro-x, gyro-y, gyro-z)
    num_classes=6,    # The number of output classes
    time_steps=60,  # Used to automatically calculate the input size of the linear layer
    learning_rate=1e-3, # The learning rate for the optimizer
)

model

Simple1DConvNetwork(
  (loss_func): CrossEntropyLoss()
  (features): Sequential(
    (0): Conv1d(6, 64, kernel_size=(5,), stride=(1,))
    (1): ReLU()
    (2): Dropout(p=0.5, inplace=False)
    (3): Conv1d(64, 64, kernel_size=(5,), stride=(1,))
    (4): ReLU()
    (5): Dropout(p=0.5, inplace=False)
    (6): Conv1d(64, 64, kernel_size=(5,), stride=(1,))
    (7): ReLU()
  )
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=3072, out_features=128, bias=True)
    (2): ReLU()
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=128, out_features=6, bias=True)
  )
)
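The `in_features=3072` of the first linear layer can be checked by hand from the printed layers. The sketch below applies the output-length formula given in the PyTorch `Conv1d` documentation to the three convolutions (kernel size 5, stride 1, and no padding, as shown in the printout). The helper `conv1d_output_length` is written here for illustration; how the library itself derives this value from `time_steps` is not shown in this notebook.

```python
def conv1d_output_length(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1D convolution for a given input length."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


length = 60  # time_steps
for _ in range(3):  # three Conv1d layers with kernel_size=5, stride=1
    length = conv1d_output_length(length, kernel_size=5)

out_channels = 64  # channels produced by the last Conv1d
flattened = out_channels * length
print(length, flattened)  # 48 3072
```

Each convolution shortens the sequence by 4 steps (60, 56, 52, 48), and flattening 64 channels of length 48 gives the 3072 features expected by the classifier.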

To train a model with PyTorch Lightning, we create a `Trainer` and call its `fit` method. The `Trainer` is responsible for running the training loop. It has several parameters, such as the number of epochs, the accelerator type, and the number of devices to use.

We will train our model using the data module defined above. The `fit` method trains the model using the training and validation data loaders. After training, we will test the model using the test data loader.

Training runs for 300 epochs (`max_epochs`) on a single (`devices=1`) GPU (`accelerator="gpu"`).

In [4]:
import lightning as L

trainer = L.Trainer(
    max_epochs=300,
    accelerator="gpu",
    devices=1,
    strategy="auto",
    num_nodes=1
)
trainer.fit(model, data_module)

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]

  | Name       | Type             | Params
------------------------------------------------
0 | loss_func  | CrossEntropyLoss | 0     
1 | features   | Sequential       | 43.1 K
2 | classifier | Sequential       | 394 K 
------------------------------------------------
437 K     Trainable params
0         Non-trainable params
437 K     Total params
1.749     Total estimated model params size (MB)


Sanity Checking DataLoader 0:   0%|          | 0/2 [00:00<?, ?it/s]

/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=47` in the `DataLoader` to improve performance.


                                                                           

/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=47` in the `DataLoader` to improve performance.
/usr/local/lib/python3.10/dist-packages/lightning/pytorch/loops/fit_loop.py:293: The number of training batches (22) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.


Epoch 299: 100%|██████████| 22/22 [00:00<00:00, 64.10it/s, v_num=11, val_loss=13.80, val_acc=0.568, train_loss=0.0279] 

`Trainer.fit` stopped: `max_epochs=300` reached.


Epoch 299: 100%|██████████| 22/22 [00:00<00:00, 59.83it/s, v_num=11, val_loss=13.80, val_acc=0.568, train_loss=0.0279]
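The parameter counts in the summary table above can be reproduced from the layer shapes printed earlier. The sketch below counts weights and biases for each `Conv1d` and `Linear` layer. The size estimate assumes 4 bytes per parameter (float32) and 1 MB = 10^6 bytes, which is an assumption that happens to match the printed value.

```python
def conv1d_params(in_channels, out_channels, kernel_size):
    """Weights (in * out * kernel) plus one bias per output channel."""
    return in_channels * out_channels * kernel_size + out_channels


def linear_params(in_features, out_features):
    """Weights (in * out) plus one bias per output feature."""
    return in_features * out_features + out_features


features = conv1d_params(6, 64, 5) + 2 * conv1d_params(64, 64, 5)
classifier = linear_params(3072, 128) + linear_params(128, 6)
total = features + classifier
size_mb = total * 4 / 1e6  # assuming float32 parameters

print(features, classifier, total, round(size_mb, 3))
# 43072 394118 437190 1.749
```

These values round to the 43.1 K, 394 K, and 437 K shown in the table.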


We can then test the model using the test data loader.

In [5]:
trainer.test(model, data_module)

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=47` in the `DataLoader` to improve performance.


Testing DataLoader 0: 100%|██████████| 3/3 [00:00<00:00, 89.65it/s] 


[{'test_loss': 0.626140832901001, 'test_acc': 0.9027777910232544}]
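As a reminder of what an accuracy value such as `test_acc` measures, here is a minimal sketch that computes classification accuracy from logits in plain PyTorch. The tensors are made up for illustration, and the framework may compute its metric differently (for example, through a metrics library).

```python
import torch

# Hypothetical logits for 4 samples and 2 classes, with their true labels.
logits = torch.tensor([[2.0, 0.1], [0.3, 1.5], [1.2, 0.4], [0.2, 0.9]])
targets = torch.tensor([0, 1, 1, 1])

predictions = logits.argmax(dim=1)  # predicted class = index of largest logit
accuracy = (predictions == targets).float().mean().item()
print(accuracy)  # 0.75
```

Three of the four predictions match the targets, so the accuracy is 0.75.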

To evaluate the model on the validation data instead, we can also use the `trainer.test` method, passing the `val_dataloader` explicitly. Note that the metrics are still reported with the `test_` prefix.

In [6]:
data_module.setup("fit")
trainer.test(model, data_module.val_dataloader())

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=47` in the `DataLoader` to improve performance.


Testing DataLoader 0: 100%|██████████| 7/7 [00:00<00:00, 167.33it/s]


[{'test_loss': 13.804328918457031, 'test_acc': 0.5680751204490662}]