# Prov4ML PyTorch Lightning MNIST Example

This notebook shows how to use Prov4ML with PyTorch Lightning on the MNIST dataset. The task is digit classification with a small model consisting of a single linear layer.

#### Importing Libraries and Constants

In [1]:
import lightning as L
import torch
import torch.nn.functional as F
from torchvision.datasets import MNIST
from torchvision import transforms
from torch.utils.data import DataLoader, Subset

import prov4ml


In [2]:
PATH_DATASETS = "./data"
BATCH_SIZE = 64
EPOCHS = 1

#### Create Run, Experiment and start logging

Initialize a new run within an experiment and start logging provenance data.
The call sets a user namespace, names the experiment, chooses the directory for the provenance logs, and sets how often logs are written to disk.
 - **prov_user_namespace**: A unique identifier for the user or organization, so that the provenance data is attributed correctly.
 - **experiment_name**: The name of the experiment, used to group related runs together.
 - **provenance_save_dir**: The directory where the provenance logs are stored.
 - **save_after_n_logs**: The number of log entries after which logs are saved to file, which frees the entries held in memory.
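
To make the role of `save_after_n_logs` more concrete, here is a minimal sketch of interval-based flushing. It is not Prov4ML's implementation: the `BufferedLog` class, its methods, and the JSON-lines file format are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


class BufferedLog:
    """Hypothetical buffer that writes records to disk every `save_after_n_logs` entries."""

    def __init__(self, path, save_after_n_logs):
        self.path = Path(path)
        self.save_after_n_logs = save_after_n_logs
        self.buffer = []

    def log(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.save_after_n_logs:
            self.flush()

    def flush(self):
        # Append the buffered records as JSON lines, then free the memory.
        with self.path.open("a") as f:
            for record in self.buffer:
                f.write(json.dumps(record) + "\n")
        self.buffer.clear()


with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / "metrics.jsonl"
    log = BufferedLog(log_file, save_after_n_logs=100)
    for step in range(250):
        log.log({"name": "loss", "value": 1.0 / (step + 1), "step": step})
    in_memory = len(log.buffer)                        # records not yet written
    on_disk = len(log_file.read_text().splitlines())   # records already flushed
```

A smaller interval keeps less data in memory at the cost of more frequent file writes. Any records still in the buffer need a final flush when the run ends.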

In [None]:
prov4ml.start_run(
    prov_user_namespace="www.example.org",
    experiment_name="experiment_name", 
    provenance_save_dir="prov",
    save_after_n_logs=100,
)

#### Define the PyTorch Lightning Model

Prov4ML provides functions for logging metrics and parameters so that the experiment’s provenance is tracked comprehensively.
- **log_metric**: Logs a metric value to the provenance data, keeping track of the value, time, epoch and context.
- **log_param**: Logs a parameter used in the experiment to the provenance data.

In [3]:
class MNISTModel(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = torch.nn.Sequential(
            torch.nn.Linear(28 * 28, 10), 
        )

    def forward(self, x):
        return self.model(x.view(x.size(0), -1))

  
    def training_step(self, batch, _):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        # Log the cross-entropy loss for this batch under the training context.
        prov4ml.log_metric("cross_entropy", loss, prov4ml.Context.TRAINING, step=self.current_epoch)
        return loss

    def validation_step(self, batch, _):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        prov4ml.log_metric("cross_entropy", loss, prov4ml.Context.VALIDATION, step=self.current_epoch)
        return loss

    def test_step(self, batch, _):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        prov4ml.log_metric("cross_entropy", loss, prov4ml.Context.EVALUATION, step=self.current_epoch)
        return loss
    
    def on_train_epoch_end(self) -> None:
        prov4ml.log_metric("epoch", self.current_epoch, prov4ml.Context.TRAINING, step=self.current_epoch)
        prov4ml.log_system_metrics(prov4ml.Context.TRAINING, step=self.current_epoch)
        prov4ml.log_carbon_metrics(prov4ml.Context.TRAINING, step=self.current_epoch)
        prov4ml.save_model_version(self, f"model_version_{self.current_epoch}", prov4ml.Context.TRAINING, step=self.current_epoch)
        prov4ml.log_current_execution_time("train_epoch_time", prov4ml.Context.TRAINING, self.current_epoch)

    def configure_optimizers(self):
        optim = torch.optim.Adam(self.parameters(), lr=0.0002)
        prov4ml.log_param("optimizer", optim)
        return optim
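
Each step above computes its loss with `F.cross_entropy`. For a single sample, this equals the negative log of the softmax probability that the model assigns to the true class. The following plain-Python sketch illustrates the calculation; the `cross_entropy` helper is written for this example and handles one sample rather than a batch.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# Equal logits over 10 classes: the loss is ln(10), roughly 2.30.
uniform = cross_entropy([0.0] * 10, target=3)

# A large logit on the true class drives the loss toward 0.
confident = cross_entropy([0.0] * 9 + [10.0], target=9)
```

A value near ln(10) is therefore what an untrained 10-class classifier with roughly uniform predictions would be expected to log.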


#### Dataset and DataLoader definition, instantiate the model and the trainer

When defining the dataset transformations, datasets, and data loaders, Prov4ML can log relevant information through the `log_dataset` and `log_param` functions.
- **log_dataset**: Logs various information extracted from the dataset used in the experiment.

In [5]:

tform = transforms.Compose([
    transforms.RandomRotation(10), 
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.ToTensor()
])
# Log the dataset transformation as a one-time parameter
prov4ml.log_param("dataset_transformation", tform)

train_ds = MNIST(PATH_DATASETS, train=True, download=True, transform=tform)
test_ds = MNIST(PATH_DATASETS, train=False, download=True, transform=tform)
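# Take the validation samples from a range that does not overlap the training subset.
val_ds = Subset(train_ds, range(BATCH_SIZE * 2, BATCH_SIZE * 3))
train_ds = Subset(train_ds, range(BATCH_SIZE * 2))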
test_ds = Subset(test_ds, range(BATCH_SIZE * 2))

train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE)
val_loader = DataLoader(val_ds, batch_size=BATCH_SIZE)
test_loader = DataLoader(test_ds, batch_size=BATCH_SIZE)

prov4ml.log_dataset(train_loader, "train_dataset")
prov4ml.log_dataset(val_loader, "val_dataset")
prov4ml.log_dataset(test_loader, "test_dataset")
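
As a quick sanity check on the subsets defined above, the number of batches each loader yields can be worked out by hand. This sketch uses only the subset sizes and `BATCH_SIZE` from this notebook and assumes the `DataLoader` default of `drop_last=False`.

```python
import math

BATCH_SIZE = 64

# Subset sizes taken from the cell above.
subset_sizes = {
    "train": BATCH_SIZE * 2,
    "val": BATCH_SIZE * 1,
    "test": BATCH_SIZE * 2,
}

# With drop_last=False, a DataLoader yields ceil(n / batch_size) batches.
batches = {name: math.ceil(n / BATCH_SIZE) for name, n in subset_sizes.items()}
print(batches)  # {'train': 2, 'val': 1, 'test': 2}
```

The two training and two test batches are consistent with the `2/2` progress bars in the training output.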

In [6]:
mnist_model = MNISTModel()
trainer = L.Trainer(
    accelerator="cpu",
    devices=1,
    max_epochs=EPOCHS,
    enable_checkpointing=False, 
)

GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


#### Train the model

Train the MNIST model with the PyTorch Lightning `Trainer`, log the final model version with Prov4ML, and then evaluate the model on the test dataset.
- **log_model**: Logs the model to the provenance data, including the model architecture, parameters, and weights.

In [7]:
trainer.fit(mnist_model, train_loader, val_dataloaders=val_loader)

prov4ml.log_model(mnist_model, "model_version_final")

result = trainer.test(mnist_model, test_loader)

Missing logger folder: /Users/gabrielepadovani/Desktop/Università/PhD/provenance/lightning_logs

  | Name  | Type       | Params
-------------------------------------
0 | model | Sequential | 7.9 K 
-------------------------------------
7.9 K     Trainable params
0         Non-trainable params
7.9 K     Total params
0.031     Total estimated model params size (MB)

/opt/homebrew/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=9` in the `DataLoader` to improve performance.
/opt/homebrew/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=9` in the `DataLoader` to improve performance.
/opt/homebrew/lib/python3.11/site-packages/lightning/pytorch/loops/fit_loop.py:298: The number of training batches (2) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.


Epoch 0: 100%|██████████| 2/2 [00:00<00:00, 17.65it/s, v_num=0]

`Trainer.fit` stopped: `max_epochs=1` reached.


Epoch 0: 100%|██████████| 2/2 [00:00<00:00, 17.11it/s, v_num=0]


/opt/homebrew/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=9` in the `DataLoader` to improve performance.


Testing DataLoader 0: 100%|██████████| 2/2 [00:00<00:00, 106.20it/s]
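
The model summary above reports 7.9 K parameters and an estimated size of 0.031 MB. Both figures can be reproduced by hand from the single `Linear(28 * 28, 10)` layer, assuming float32 parameters at 4 bytes each:

```python
in_features = 28 * 28   # flattened MNIST image
out_features = 10       # one output per digit class

weights = in_features * out_features
biases = out_features
total_params = weights + biases

# Assumes 4 bytes per parameter (float32, the PyTorch default dtype).
size_mb = total_params * 4 / 1e6

print(total_params, round(size_mb, 3))  # 7850 0.031
```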


#### Save the training information to ProvJSON

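Ending the run saves the provenance data to a ProvJSON file for further analysis and visualization. The `create_graph` and `create_svg` flags additionally request a provenance graph and an SVG rendering.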

In [8]:
prov4ml.end_run(create_graph=True, create_svg=True)

Git not found, skipping commit hash retrieval


fatal: not a git repository (or any of the parent directories): .git
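
Once a provenance file exists, it can be inspected with the standard `json` module. The sketch below is illustrative only: the exact file name and contents written by `end_run` are not shown in this notebook, so it uses a small hypothetical document laid out in the W3C PROV-JSON style, where top-level sections such as `entity` and `activity` map identifiers to records. The `summarize` helper and all identifiers in `sample` are made up for this example.

```python
import json

# Hypothetical stand-in for a saved provenance file; real output will differ.
sample = json.loads("""
{
  "prefix": {"ex": "http://www.example.org/"},
  "entity": {"ex:model_version_final": {}, "ex:train_dataset": {}},
  "activity": {"ex:training": {}},
  "wasGeneratedBy": {
    "_:g1": {"prov:entity": "ex:model_version_final", "prov:activity": "ex:training"}
  }
}
""")


def summarize(doc):
    """Count the records under each top-level section, skipping namespace prefixes."""
    return {section: len(records) for section, records in doc.items() if section != "prefix"}


summary = summarize(sample)
print(summary)
```

For a real run, the same function could be applied to the result of `json.load` on the file found in the directory passed as `provenance_save_dir`.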
