# Supervised Vision Pre-training

The framework supports supervised pre-training of vision models on classification datasets.
This example uses the [`BigEarthNet DataModule`](extra/bigearthnet.ipynb) with a [`pytorch lightning`](https://pytorch-lightning.readthedocs.io/en/stable/) trainer. The network is wrapped in a [`LightningModule`](https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html) so that we do not have to write the training loop ourselves.

We start by importing what we need from `torch` and `pytorch_lightning` to set up the `LightningModule`.

In [1]:
import pytorch_lightning as pl
import torch
import torch.nn.functional as F
from torch import optim
from torchmetrics.classification import MultilabelF1Score

from configvlm import ConfigVLM



The `LightningModule` we use to encapsulate the model splits the usual training loop into functions that `pytorch_lightning` calls internally. Only `training_step` and `configure_optimizers` are required, but to have a fully functional script we also add validation and test steps and evaluate their results. Every `_step` function works on a single batch, while each `_epoch_end` function is called after all batches have been processed and receives a list of all return values of its respective `_step` function.
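The relationship between the `_step` and `_epoch_end` hooks can be pictured as a plain Python loop. The following is only a simplified sketch of that contract, assuming the Lightning 1.x hook semantics this notebook relies on; `run_validation_epoch` and `ToyModule` are hypothetical names made up for illustration and are not part of `pytorch_lightning`.

```python
def run_validation_epoch(module, batches):
    """Simplified stand-in for the validation loop that Lightning runs internally."""
    # Each step sees exactly one batch and returns a dict.
    outputs = [module.validation_step(batch, idx) for idx, batch in enumerate(batches)]
    # The epoch-end hook receives the list of all step return values.
    module.validation_epoch_end(outputs)
    return outputs


class ToyModule:
    """Hypothetical module whose 'loss' is just the mean of the batch values."""

    def validation_step(self, batch, batch_idx):
        return {"loss": sum(batch) / len(batch)}

    def validation_epoch_end(self, outputs):
        self.avg_loss = sum(o["loss"] for o in outputs) / len(outputs)


toy = ToyModule()
run_validation_epoch(toy, [[1.0, 3.0], [2.0, 4.0]])
print(toy.avg_loss)
```

The real trainer additionally handles device placement, gradient tracking, and logging, none of which is modelled here.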

In [2]:
class LitVQAEncoder(pl.LightningModule):
    def __init__(
        self,
        config: ConfigVLM.VLMConfiguration,
        lr: float = 1e-3,
    ):
        super().__init__()
        self.lr = lr
        # keep the configuration; get_metrics reads self.config.classes
        self.config = config
        self.model = ConfigVLM.ConfigVLM(config)

    def training_step(self, batch, batch_idx):
        x, y = batch
        x_hat = self.model(x)
        loss = F.binary_cross_entropy_with_logits(x_hat, y)
        self.log("train/loss", loss)
        return {"loss": loss}

    def configure_optimizers(self):
        optimizer = optim.AdamW(self.parameters(), lr=self.lr, weight_decay=0.01)
        return optimizer

    # ============== NON-MANDATORY-FUNCTION ===============

    def validation_step(self, batch, batch_idx):
        x, y = batch
        x_hat = self.model(x)
        loss = F.binary_cross_entropy_with_logits(x_hat, y)
        return {"loss": loss, "outputs": x_hat, "labels": y}

    def validation_epoch_end(self, outputs):
        metrics = self.get_metrics(outputs)
        self.log("val/loss", metrics["avg_loss"])
        self.log("val/f1", metrics["avg_f1_score"])

    def test_step(self, batch, batch_idx):
        x, y = batch
        x_hat = self.model(x)
        loss = F.binary_cross_entropy_with_logits(x_hat, y)
        return {"loss": loss, "outputs": x_hat, "labels": y}

    def test_epoch_end(self, outputs):
        metrics = self.get_metrics(outputs)
        self.log("test/loss", metrics["avg_loss"])
        self.log("test/f1", metrics["avg_f1_score"])

    def get_metrics(self, outputs):
        avg_loss = torch.stack([x["loss"] for x in outputs]).mean()
        logits = torch.cat([x["outputs"].cpu() for x in outputs], 0)
        labels = torch.cat(
            [x["labels"].cpu() for x in outputs], 0
        )  # Tensor of size (#samples x classes)

        f1_score = MultilabelF1Score(num_labels=self.config.classes, average=None).to(
            logits.device
        )(logits, labels)

        avg_f1_score = float(
            torch.sum(f1_score) / self.config.classes
        )  # macro average f1 score

        return {
            "avg_loss": avg_loss,
            "avg_f1_score": avg_f1_score,
        }
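To make the macro average in `get_metrics` concrete, here is a minimal pure-Python sketch of the same quantity: one F1 score per class, then the unweighted mean over classes. `macro_f1` is a hypothetical helper, not a `torchmetrics` function. It assumes that a logit above 0 counts as a positive prediction (equivalent to a sigmoid probability above 0.5) and that a class with no positives at all scores 0.

```python
def macro_f1(logits, labels):
    """Macro-averaged F1 for multilabel data given as nested lists (samples x classes)."""
    num_classes = len(labels[0])
    per_class = []
    for c in range(num_classes):
        # Assumption: logit > 0 means "class present" (sigmoid(logit) > 0.5).
        preds = [1 if row[c] > 0 else 0 for row in logits]
        truth = [row[c] for row in labels]
        tp = sum(p == 1 and t == 1 for p, t in zip(preds, truth))
        fp = sum(p == 1 and t == 0 for p, t in zip(preds, truth))
        fn = sum(p == 0 and t == 1 for p, t in zip(preds, truth))
        denom = 2 * tp + fp + fn
        # Assumption: an empty class contributes an F1 of 0.
        per_class.append(2 * tp / denom if denom else 0.0)
    # Unweighted mean over classes, i.e. the macro average.
    return sum(per_class) / num_classes


example_logits = [[2.0, -1.0], [1.5, 0.5], [-0.3, 3.0]]
example_labels = [[1, 0], [0, 1], [1, 1]]
print(macro_f1(example_logits, example_labels))
```

In this toy input the first class has one true positive, one false positive, and one false negative (F1 of 0.5), while the second class is predicted perfectly (F1 of 1.0), so the macro average is 0.75.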

Now that we have our model, we will use the `pytorch_lightning.Trainer` to run our loops. Results are logged to `tensorboard`.

We start by importing the callbacks and the logger used during training.

In [3]:
from pytorch_lightning.callbacks import EarlyStopping
from pytorch_lightning.callbacks import ModelCheckpoint
from pytorch_lightning.loggers import TensorBoardLogger
from configvlm.ConfigVLM import VLMConfiguration

Next, we define our hyperparameters.

In [4]:
data_dir = None
model_name = "resnet18"
seed = 42
number_of_channels = 12
image_size = 120
batch_size = 32
num_workers = 4
max_img_index = 100 * batch_size
epochs = 10
val_epoch_interval = 5
early_stopping_patience = 5
lr = 5e-4
drop_rate = 0.2

Then we seed all random number generators and create the model configuration, the logger, and the callbacks.

In [5]:
# seed for pytorch, numpy, python.random, Dataloader workers, spawned subprocesses
pl.seed_everything(seed, workers=True)

model_config = VLMConfiguration(
    timm_model_name=model_name,
    hf_model_name=None,
    classes=19,
    image_size=image_size,
    channels=number_of_channels,
    drop_rate=drop_rate,
    network_type=ConfigVLM.VLMType.VISION_CLASSIFICATION
)

logger = TensorBoardLogger(
    save_dir="./tb_logs",
    name=f"{model_name} Test Model"
)

monitor = "val/f1"
monitor_str = "F1_Score"

# checkpointing
checkpoint_callback = ModelCheckpoint(
    monitor="val/f1",
    dirpath="./checkpoints",
    filename=f"{model_name}-seed="
    + str(seed)
    + "-epoch={epoch:03d}-"
    + f"{monitor_str}"
    + "={"
    + f"{monitor}"
    + ":.3f}",
    auto_insert_metric_name=False,
    save_top_k=1,
    mode="max",
    save_last=True,
)
early_stopping_callback = EarlyStopping(
    monitor=monitor,
    min_delta=0.00,
    patience=early_stopping_patience,
    verbose=False,
    mode="max",
)

Global seed set to 42
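The concatenated `filename` argument above is easier to read once expanded. The sketch below rebuilds the same template as a single f-string and fills it with made-up metric values. `expand` is a hypothetical helper that only approximates how the checkpoint callback substitutes logged metrics; it is not Lightning's implementation, and the epoch and score are invented for illustration.

```python
import re

model_name, seed = "resnet18", 42
monitor, monitor_str = "val/f1", "F1_Score"

# Same template as the concatenation in the callback; doubled braces are literal braces.
template = f"{model_name}-seed={seed}-epoch={{epoch:03d}}-{monitor_str}={{{monitor}:.3f}}"


def expand(template, metrics):
    """Fill '{name:spec}' placeholders from a metrics dict (illustrative only)."""
    # Rewrite '{name' to '{0[name]' so that names containing '/' can be looked up.
    pattern = re.sub(r"\{([^{}:]+)", r"{0[\1]", template)
    return pattern.format(metrics)


print(template)
print(expand(template, {"epoch": 7, "val/f1": 0.61234}))
```

Because the monitored metric is called `val/f1`, letting the callback insert the metric name into the filename would place a `/` in the path. Setting `auto_insert_metric_name=False` and writing the label `F1_Score` by hand avoids that.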


We create a [Trainer](https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html) and log the hyperparameters.

In [6]:
trainer = pl.Trainer(
    max_epochs=epochs,
    accelerator="cpu",
    check_val_every_n_epoch=val_epoch_interval,
    logger=logger,
    log_every_n_steps=5,
    callbacks=[checkpoint_callback, early_stopping_callback],
)

logger.log_hyperparams({
    "Model Name": model_name,
    "Seed": seed,
    "Epochs": epochs,
    "Channels": number_of_channels,
    "Image Size": image_size,
    "Max. Image Index": max_img_index,
    "Batch Size": batch_size,
    "# Workers": num_workers,
    "GPU": torch.cuda.get_device_name() if torch.cuda.is_available() else "-",
    "Validation Interval": val_epoch_interval,
    "Early Stopping Patience": early_stopping_patience,
    "Learning Rate": lr,
    "Drop Rate": drop_rate,
})

GPU available: True (cuda), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
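One interaction between the hyperparameters is worth spelling out: the `patience` of `EarlyStopping` counts validation checks, not training epochs. The small calculation below estimates the earliest epoch at which early stopping could trigger. `first_possible_stop_epoch` is a hypothetical helper, and it assumes that the first validation check always counts as an improvement.

```python
def first_possible_stop_epoch(val_interval, patience):
    """Earliest epoch (counted from 1) at which early stopping could trigger.

    Assumption: the first validation check sets the baseline, after which
    `patience` further checks without improvement are required.
    """
    return val_interval * (patience + 1)


epochs = 10
val_epoch_interval = 5
early_stopping_patience = 5

validation_checks = epochs // val_epoch_interval
earliest_stop = first_possible_stop_epoch(val_epoch_interval, early_stopping_patience)
print(validation_checks, earliest_stop, earliest_stop <= epochs)
```

With the values used in this notebook, validation runs only twice and early stopping cannot trigger before `max_epochs` is reached. To see early stopping in action, a smaller validation interval or more epochs would be needed.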


Finally, we create the model defined above and our datamodule.

In [7]:
import pathlib

from configvlm.extra.BEN_DataModule_LMDB_Encoder import BENDataModule

# mock data directory, resolved relative to the parent of the current working
# directory; strict=True raises an error if the directory does not exist
mock_data_dir = (
    pathlib.Path("").resolve().parent.joinpath("configvlm", "extra", "mock_data").resolve(strict=True)
)

model = LitVQAEncoder(config=model_config, lr=lr)
dm = BENDataModule(
    batch_size=batch_size,
    data_dir=str(mock_data_dir),
    img_size=(number_of_channels, image_size, image_size),
    num_workers_dataloader=num_workers,
    max_img_idx=max_img_index
)



Dataloader using 4 workers


In [8]:
trainer.fit(model, datamodule=dm)


  | Name  | Type      | Params
------------------------------------
0 | model | ConfigVLM | 11.2 M
------------------------------------
11.2 M    Trainable params
0         Non-trainable params
11.2 M    Total params
44.858    Total estimated model params size (MB)


(12:14:34) Datamodule setup called
Loading BEN data for train...
    25 patches indexed
    25 filtered patches indexed
Loading BEN data for val...
    25 patches indexed
    25 filtered patches indexed
setup took 0.00 seconds
  Total training samples:       25  Total validation samples:       25


RuntimeError: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.

In [1]:
trainer.test(model, datamodule=dm, ckpt_path="best")

KeyError: 'WANDB_API_KEY'