# Supervised Vision Pre-training

The framework supports supervised pre-training of vision models on classification datasets.
In this example we use the [`BigEarthNet DataModule`](extra/bigearthnet.ipynb) with a [`pytorch lightning`](https://pytorch-lightning.readthedocs.io/en/stable/) trainer. The network is wrapped in a [`LightningModule`](https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html) so that we do not have to write the training loop ourselves.

We start by importing what we need from `torch` and `pytorch_lightning` to set up the `LightningModule`.

In [11]:
# remove-output
import pytorch_lightning as pl
import torch
import torch.nn.functional as F
from torch import optim

from configvlm import ConfigVLM

## Pytorch Lightning Module
The `LightningModule` that encapsulates the model splits the usual training loop into functions that `pytorch_lightning` calls internally. Only `training_step` and `configure_optimizers` are required, but to have a fully functional script we also add validation and test steps and evaluate their results. Each `_step` function works on a single batch, while each `_epoch_end` function is called after all batches have been processed and receives a list of the return values of its respective `_step` function.
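
The relationship between the `_step` and `_epoch_end` functions can be illustrated with a simplified loop in plain Python. This is only a sketch of the calling order, not Lightning's actual implementation; `ToyModule` and `run_validation_epoch` are hypothetical names, and the "loss" is a toy absolute error on lists of numbers.

```python
# Simplified illustration of the hook order for one validation epoch.
# NOT Lightning's real implementation; all names here are hypothetical.


class ToyModule:
    """Stand-in for a LightningModule with the two validation hooks."""

    def __init__(self):
        self.logged = {}

    def validation_step(self, batch, batch_idx):
        # Works on a single batch and returns a dict of results.
        inputs, targets = batch
        loss = sum(abs(x - y) for x, y in zip(inputs, targets)) / len(inputs)
        return {"loss": loss}

    def validation_epoch_end(self, outputs):
        # Receives the list of everything validation_step returned.
        self.logged["val/loss"] = sum(o["loss"] for o in outputs) / len(outputs)


def run_validation_epoch(module, batches):
    """Call the step hook once per batch, then the epoch-end hook once."""
    outputs = []
    for batch_idx, batch in enumerate(batches):
        outputs.append(module.validation_step(batch, batch_idx))
    module.validation_epoch_end(outputs)
    return outputs


batches = [([1.0, 2.0], [1.0, 4.0]), ([3.0], [0.0])]
module = ToyModule()
outputs = run_validation_epoch(module, batches)
print(len(outputs), module.logged)  # 2 {'val/loss': 2.0}
```

Note that the epoch-end value is a mean of per-batch means, so a smaller last batch carries the same weight as a full one. Hooks that receive an `outputs` list are the Lightning 1.x style used in this notebook; later major releases changed these hooks.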

In [12]:
class LitVQAEncoder(pl.LightningModule):
    def __init__(
        self,
        config: ConfigVLM.VLMConfiguration,
        lr: float = 1e-3,
    ):
        super().__init__()
        self.lr = lr
        self.config = config
        self.model = ConfigVLM.ConfigVLM(config)

    def training_step(self, batch, batch_idx):
        x, y = batch
        x_hat = self.model(x)
        loss = F.binary_cross_entropy_with_logits(x_hat, y)
        self.log("train/loss", loss)
        return {"loss": loss}

    def configure_optimizers(self):
        optimizer = optim.AdamW(self.parameters(), lr=self.lr, weight_decay=0.01)
        return optimizer

    # ============== NON-MANDATORY-FUNCTION ===============

    def validation_step(self, batch, batch_idx):
        x, y = batch
        x_hat = self.model(x)
        loss = F.binary_cross_entropy_with_logits(x_hat, y)
        return {"loss": loss, "outputs": x_hat, "labels": y}

    def validation_epoch_end(self, outputs):
        avg_loss = torch.stack([x["loss"] for x in outputs]).mean()
        self.log("val/loss", avg_loss)

    def test_step(self, batch, batch_idx):
        x, y = batch
        x_hat = self.model(x)
        loss = F.binary_cross_entropy_with_logits(x_hat, y)
        return {"loss": loss, "outputs": x_hat, "labels": y}

    def test_epoch_end(self, outputs):
        avg_loss = torch.stack([x["loss"] for x in outputs]).mean()
        self.log("test/loss", avg_loss)
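
All three step functions use `F.binary_cross_entropy_with_logits`, which treats every class as an independent yes/no decision. The following sketch, on random stand-in tensors rather than real data, shows the tensor shapes this loss expects and checks the result against a manual computation. The batch size of 4 is arbitrary, and the assumption that labels are multi-hot float vectors is inferred from the choice of loss.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

batch_size, num_classes = 4, 19  # 19 matches `classes=19` used later in this example

# Raw, unnormalised scores as a classification head would produce them.
logits = torch.randn(batch_size, num_classes)

# Assumed label format: multi-hot vectors, so a sample may carry several labels.
# The targets must be floats with the same shape as the logits.
targets = torch.randint(0, 2, (batch_size, num_classes)).float()

loss = F.binary_cross_entropy_with_logits(logits, targets)

# The same value by hand: sigmoid followed by binary cross-entropy,
# averaged over every (sample, class) entry.
probs = torch.sigmoid(logits)
manual = -(targets * probs.log() + (1 - targets) * (1 - probs).log()).mean()

print(loss.item(), manual.item())
```

Because the sigmoid is applied inside the loss, the model should return raw logits; apply `torch.sigmoid` yourself only when you need probabilities for evaluation.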

## Configuring
Now that we have our model, we will use the `pytorch_lightning.Trainer` to run our loops. Results are logged to `tensorboard`.

We start by importing the logger and the model configuration class.

In [13]:
from pytorch_lightning.loggers import TensorBoardLogger
from configvlm.ConfigVLM import VLMConfiguration

Next, we define our hyperparameters.

In [14]:
model_name = "resnet18"
seed = 42
number_of_channels = 12
image_size = 120
epochs = 4
lr = 5e-4

Then we create the model configuration, which is used later to build the model, and the logger.

In [15]:
# remove-output
# seed for pytorch, numpy, python.random, Dataloader workers, spawned subprocesses
pl.seed_everything(seed, workers=True)

model_config = VLMConfiguration(
    timm_model_name=model_name,
    hf_model_name=None,
    classes=19,
    image_size=image_size,
    channels=number_of_channels,
    network_type=ConfigVLM.VLMType.VISION_CLASSIFICATION
)

logger = TensorBoardLogger(
    save_dir="./tb_logs",
    name="Classification Test Model",
    version="testversion"
)

Global seed set to 42


We create a [Trainer](https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html) and log the hyperparameters.

In [16]:
# remove-output
trainer = pl.Trainer(
    max_epochs=epochs,
    accelerator="auto",
    logger=logger,
    log_every_n_steps=1,
)

logger.log_hyperparams({
    "Model Name": "Classification Test Model",
    "Seed": seed,
    "Epochs": epochs,
    "Channels": number_of_channels,
    "Image Size": image_size,
    "GPU": torch.cuda.get_device_name() if torch.cuda.is_available() else "-",
    "Learning Rate": lr,
})

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


## Creating Model + Dataset
Finally, we create the model defined above and our datamodule.

In [17]:
# remove-input
# remove-output
import pathlib
my_data_path = str(
    pathlib.Path("").resolve().parent.joinpath("configvlm", "extra", "mock_data").resolve(strict=True)
)
# allow float32 matmuls to use bfloat16 internally on GPUs that support it (e.g. Ampere)
torch.set_float32_matmul_precision('medium')
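
The precision setting is global and affects how float32 matrix multiplications are computed internally, not the dtype of the tensors. A minimal sketch of setting, checking and restoring it:

```python
import torch

# Remember the current setting ("highest" unless it was changed earlier).
previous = torch.get_float32_matmul_precision()

torch.set_float32_matmul_precision("medium")
current = torch.get_float32_matmul_precision()

a = torch.randn(64, 64)
b = torch.randn(64, 64)
c = a @ b

# Tensors stay float32; only the internal matmul arithmetic may change,
# and only on hardware with a faster reduced-precision path.
print(current, c.dtype)

# Restore the earlier setting so other code is unaffected.
torch.set_float32_matmul_precision(previous)
```

The trade-off is speed against numerical precision, so it is worth confirming that the loss curve is unaffected before relying on it for longer runs.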

In [18]:
# remove-output
from configvlm.extra.BEN_DataModule_LMDB_Encoder import BENDataModule
model = LitVQAEncoder(config=model_config, lr=lr)
dm = BENDataModule(
    data_dir=my_data_path,
    img_size=(number_of_channels, image_size, image_size),
    num_workers_dataloader=4,
)

Dataloader using 4 workers
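
The module's step functions unpack each batch as `x, y = batch`. If the mock data is not available, the expected shapes can be reproduced with random tensors, as in the sketch below. It uses a plain `DataLoader` instead of `BENDataModule`; the batch size of 16 and the multi-hot label format are assumptions, not values taken from the data module.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

channels, size, classes, samples = 12, 120, 19, 25

# Random stand-ins for 25 image patches and their multi-hot labels (assumed format).
images = torch.randn(samples, channels, size, size)
labels = torch.randint(0, 2, (samples, classes)).float()

# batch_size=16 is a hypothetical choice for this sketch.
loader = DataLoader(TensorDataset(images, labels), batch_size=16, shuffle=False)

x, y = next(iter(loader))
print(x.shape, y.shape, len(loader))
```

With 25 samples and this batch size the loader yields two batches, the second one smaller than the first.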


## Running
Now we just have to call the `fit()` and optionally the `test()` functions.

:::{note}
These calls generate quite a bit of output depending on the number of batches and epochs. The output is removed for readability.
:::

In [19]:
# remove-output
trainer.fit(model, datamodule=dm)

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name  | Type      | Params
------------------------------------
0 | model | ConfigVLM | 11.2 M
------------------------------------
11.2 M    Trainable params
0         Non-trainable params
11.2 M    Total params
44.858    Total estimated model params size (MB)


(12:21:51) Datamodule setup called
Loading BEN data for train...
    25 patches indexed
    25 filtered patches indexed
Loading BEN data for val...
    25 patches indexed
    25 filtered patches indexed
setup took 0.00 seconds
  Total training samples:       25  Total validation samples:       25
Epoch 0:  50%|█████     | 2/4 [00:00<00:00,  7.77it/s, loss=0.705, v_num=sion]
Epoch 0:  75%|███████▌  | 3/4 [00:00<00:00,  6.14it/s, loss=0.705, v_num=sion]
Epoch 0: 100%|██████████| 4/4 [00:00<00:00,  8.03it/s, loss=0.705, v_num=sion]
Epoch 1:  50%|█████     | 2/4 [00:00<00:00,  7.11it/s, loss=0.666, v_num=sion]
Epoch 1:  75%|███████▌  | 3/4 [00:00<00:00,  6.04it/s, loss=0.666, v_num=sion]

`Trainer.fit` stopped: `max_epochs=4` reached.


Epoch 3: 100%|██████████| 4/4 [00:00<00:00,  5.77it/s, loss=0.608, v_num=sion]


In [20]:
# remove-output
trainer.test(model, datamodule=dm)

(12:21:54) Datamodule setup called
Loading BEN data for test...
    25 patches indexed
    25 filtered patches indexed
setup took 0.00 seconds
  Total test samples:       25


LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Testing DataLoader 0: 100%|██████████| 2/2 [00:00<00:00, 98.35it/s] 


[{'test/loss': 0.6129004955291748}]
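
`trainer.test` returns a list with one dictionary of logged metrics per test dataloader. The sketch below reads the value printed above and compares it with a simple reference point: a model that outputs a logit of 0 for every class has a binary cross-entropy of ln(2), whatever the labels are. This is only a sanity check, not a proper baseline.

```python
import math

# Return value of trainer.test(...) as printed above.
results = [{"test/loss": 0.6129004955291748}]
test_loss = results[0]["test/loss"]

# Logit 0 means probability 0.5 for every class, which gives
# -log(0.5) = ln(2) per entry regardless of the target.
reference_loss = math.log(2)

print(f"test loss   : {test_loss:.4f}")
print(f"ln(2)       : {reference_loss:.4f}")
print(f"difference  : {reference_loss - test_loss:.4f}")
```

The first training loss shown in the progress bar (0.705) is close to ln(2) ≈ 0.693, as one would expect from a freshly initialised network, and the test loss after four epochs on the mock data is only slightly below it.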