# Prov4ML InterTwinAI Logger Example

This notebook shows how to use Prov4ML through the InterTwinAI logger interface on the MNIST dataset. The task is handwritten-digit classification with a small fully connected model (a single linear layer).
The notebook calls the logger's main functions by hand; in a *normal* use case the InterTwinAI platform would call the logger automatically.

#### Importing the necessary libraries and defining constants

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, Subset
from torchvision.datasets import MNIST
from torchvision import transforms
from tqdm import tqdm

import prov4ml

In [2]:
PATH_DATASETS = "./data"
BATCH_SIZE = 64
EPOCHS = 2

#### Define the logger and create context

Initialize a new run within an experiment and start logging provenance data. 
The constructor call sets the user namespace, the experiment name, the directory where provenance logs are saved, and how often logs are written to file.
 - **prov_user_namespace**: The unique identifier for the user or organization, ensuring the provenance data is correctly attributed.
 - **experiment_name**: The name of the experiment, used to group related runs together.
 - **provenance_save_dir**: The directory where the provenance logs are stored.
 - **save_after_n_logs**: The number of logged items after which the logs are written to file, which frees the values held in memory.
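
The effect of `save_after_n_logs` can be pictured as a buffer that is written to disk every N entries. The sketch below is not Prov4ML's implementation: `BufferedLog`, the JSON-lines format, and the file name `metrics.jsonl` are hypothetical and only illustrate the general pattern.

```python
import json
import os
import tempfile


class BufferedLog:
    """Hypothetical buffer that flushes to a file every `save_after_n_logs` entries."""

    def __init__(self, path, save_after_n_logs):
        self.path = path
        self.save_after_n_logs = save_after_n_logs
        self.buffer = []

    def log(self, identifier, value):
        self.buffer.append({"id": identifier, "value": value})
        if len(self.buffer) >= self.save_after_n_logs:
            self.flush()

    def flush(self):
        # Append the buffered entries as JSON lines, then free the memory.
        with open(self.path, "a", encoding="utf-8") as f:
            for entry in self.buffer:
                f.write(json.dumps(entry) + "\n")
        self.buffer.clear()


path = os.path.join(tempfile.mkdtemp(), "metrics.jsonl")
log = BufferedLog(path, save_after_n_logs=3)
for step in range(7):
    log.log("loss", 1.0 / (step + 1))

pending = len(log.buffer)  # entries still held in memory after 7 logs
log.flush()                # final flush, as a logger might do when it is closed
with open(path, encoding="utf-8") as f:
    written = sum(1 for _ in f)
```

A small interval keeps memory use low at the cost of more frequent file writes; a large one does the opposite.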

The logger can be passed to the InterTwinAI workflow, which then calls the logger's `create_logger_context` method automatically.
In this example, we will manually call the function to demonstrate the logger's functionalities.

In [4]:
logger = prov4ml.ProvMLItwinAILogger(
    prov_user_namespace="www.example.org",
    experiment_name="experiment_name",
    provenance_save_dir="prov",
    save_after_n_logs=100,
)
logger.create_logger_context()
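
When the context is managed by hand, as here, the matching `destroy_logger_context` call (made in the last cell of this notebook) is skipped if an exception interrupts training. One possible way to guard against that in a script is a small context manager. This is a sketch only: `logger_context` and `StubLogger` are hypothetical helpers, and the stub stands in for the real logger so the example runs without prov4ml installed.

```python
from contextlib import contextmanager


@contextmanager
def logger_context(logger):
    """Open the logger context and always close it, even if the body raises."""
    logger.create_logger_context()
    try:
        yield logger
    finally:
        logger.destroy_logger_context()


class StubLogger:
    """Hypothetical stand-in exposing the two methods used in this notebook."""

    def __init__(self):
        self.events = []

    def create_logger_context(self):
        self.events.append("create")

    def destroy_logger_context(self):
        self.events.append("destroy")


stub = StubLogger()
try:
    with logger_context(stub):
        raise RuntimeError("training failed")
except RuntimeError:
    pass  # the context was still closed before the error propagated
```

After this runs, `stub.events` holds both calls in order, showing that the context is closed despite the error.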

#### Define the model and dataset classes

Prov4ML can log various metrics and parameters to ensure comprehensive tracking of the experiment's provenance.
Dataset transformations, datasets, and data loaders are defined in the usual way, and Prov4ML records the relevant information about them through the same `log` function.

In [3]:
class MNISTModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = torch.nn.Sequential(
            torch.nn.Linear(28 * 28, 10), 
        )

    def forward(self, x):
        return self.model(x.view(x.size(0), -1))
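
As a quick check of what `x.view(x.size(0), -1)` and `Linear(28 * 28, 10)` imply, the plain-Python arithmetic below walks through the shapes and the parameter count. It assumes MNIST batches of shape `(64, 1, 28, 28)`, which is what `ToTensor` and a batch size of 64 produce; no PyTorch is needed to run it.

```python
# Shapes and parameter count for a single Linear(28 * 28, 10) layer on MNIST batches.
batch_shape = (64, 1, 28, 28)  # (batch, channels, height, width)

# x.view(x.size(0), -1) keeps the batch dimension and flattens the rest.
in_features = batch_shape[1] * batch_shape[2] * batch_shape[3]
out_features = 10  # one logit per digit class

flattened_shape = (batch_shape[0], in_features)
output_shape = (batch_shape[0], out_features)

# A linear layer has one weight per (input, output) pair plus one bias per output.
n_parameters = in_features * out_features + out_features
```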

In [5]:
tform = transforms.Compose([
    transforms.RandomRotation(10), 
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.ToTensor()
])

logger.log(item=tform, identifier="dataset transformation", kind=prov4ml.LoggingItemKind.PARAMETER)

train_ds = MNIST(PATH_DATASETS, train=True, download=True, transform=tform)
test_ds = MNIST(PATH_DATASETS, train=False, download=True, transform=tform)
train_ds = Subset(train_ds, range(BATCH_SIZE*4))
test_ds = Subset(test_ds, range(BATCH_SIZE*2))
train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE)
test_loader = DataLoader(test_ds, batch_size=BATCH_SIZE)

logger.log(item=train_loader, identifier="train_dataset", kind=prov4ml.LoggingItemKind.PARAMETER)
logger.log(item=test_loader, identifier="test_dataset", kind=prov4ml.LoggingItemKind.PARAMETER)
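
The `Subset` calls keep the run short. The arithmetic below sketches how many batches each loader yields and how many training-loss values the training loop in this notebook will log, assuming the `DataLoader` default `drop_last=False`.

```python
import math

BATCH_SIZE = 64
EPOCHS = 2

n_train = BATCH_SIZE * 4  # samples kept by Subset for training
n_test = BATCH_SIZE * 2   # samples kept by Subset for testing

# With drop_last=False, a partial final batch would still count as one batch.
train_batches = math.ceil(n_train / BATCH_SIZE)
test_batches = math.ceil(n_test / BATCH_SIZE)

# The training loop logs one loss value per batch, in every epoch.
train_metric_logs = EPOCHS * train_batches
```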


#### Train the model

Train the MNIST model using PyTorch, evaluate it on the test dataset, and then log the final model version using prov4ml.

In [6]:
mnist_model = MNISTModel()

optim = torch.optim.Adam(mnist_model.parameters(), lr=0.0002)
logger.log(item=optim, identifier="optimizer", kind=prov4ml.LoggingItemKind.PARAMETER)

for epoch in tqdm(range(EPOCHS)):
    for i, (x, y) in enumerate(train_loader):
        optim.zero_grad()
        y_hat = mnist_model(x)
        loss = F.cross_entropy(y_hat, y)
        loss.backward()
        optim.step()
        # The loss is cross-entropy, so the metric is named accordingly.
        logger.log(item=loss.item(), identifier="cross_entropy_train", kind=prov4ml.LoggingItemKind.METRIC, context=prov4ml.Context.TRAINING, step=epoch)
    
    logger.log(item=epoch, identifier="epoch", kind=prov4ml.LoggingItemKind.CARBON_METRIC, context=prov4ml.Context.TRAINING, step=epoch)
    logger.log(item=epoch, identifier="epoch", kind=prov4ml.LoggingItemKind.SYSTEM_METRIC, context=prov4ml.Context.TRAINING, step=epoch)
    logger.log(item=mnist_model, identifier=f"mnist_model_version_{epoch}", kind=prov4ml.LoggingItemKind.MODEL_VERSION, context=prov4ml.Context.TRAINING, step=epoch)


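mnist_model.eval()
with torch.no_grad():  # gradients are not needed for evaluation
    for i, (x, y) in tqdm(enumerate(test_loader)):
        y_hat = mnist_model(x)
        loss = F.cross_entropy(y_hat, y)
        # `epoch` still holds the index of the last training epoch.
        logger.log(item=loss.item(), identifier="cross_entropy_test", kind=prov4ml.LoggingItemKind.METRIC, context=prov4ml.Context.EVALUATION, step=epoch)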

logger.log(item=mnist_model, identifier="mnist_model_final", kind=prov4ml.LoggingItemKind.FINAL_MODEL_VERSION)

100%|██████████| 2/2 [00:00<00:00, 29.77it/s]
2it [00:00, 237.22it/s]
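
In the loop above, every batch of an epoch is logged with the same `step=epoch`, so several loss values share one step. If one value per step is preferred, two common conventions are a global step counter or a per-epoch mean. The helpers below are a hypothetical sketch of both, with made-up loss values; whether Prov4ML expects one convention over the other is not stated in this notebook.

```python
def global_step(epoch, batch_idx, batches_per_epoch):
    """Return a step index that keeps increasing across epochs."""
    return epoch * batches_per_epoch + batch_idx


def epoch_mean(batch_losses):
    """Average per-batch losses so that each epoch yields a single value."""
    return sum(batch_losses) / len(batch_losses)


batches_per_epoch = 4  # matches the 4 training batches used in this notebook
steps = [
    global_step(epoch, batch_idx, batches_per_epoch)
    for epoch in range(2)
    for batch_idx in range(batches_per_epoch)
]
mean_loss = epoch_mean([2.0, 1.5, 1.0, 0.5])  # made-up values for illustration
```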


#### Close the logger and save to ProvJSON

Save the provenance data to a ProvJSON file for further analysis and visualization. 

In [7]:
logger.destroy_logger_context()

fatal: not a git repository (or any of the parent directories): .git


Git not found, skipping commit hash retrieval
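
Once the file has been written, it can be inspected with the standard `json` module. The sketch below assumes the output follows the W3C PROV-JSON layout, where top-level sections such as `entity`, `activity`, and `wasGeneratedBy` map identifiers to attribute dictionaries. `summarize_prov` is a hypothetical helper, and the document is a tiny hand-written example rather than the output of this run.

```python
import json


def summarize_prov(doc):
    """Count records per top-level section of a PROV-JSON document."""
    return {
        section: len(records)
        for section, records in doc.items()
        if section != "prefix" and isinstance(records, dict)
    }


# Hand-written example; for a real run, load the file saved under
# `provenance_save_dir` with json.load instead.
example = json.loads("""
{
  "prefix": {"ex": "http://www.example.org/"},
  "entity": {"ex:model": {}, "ex:dataset": {}},
  "activity": {"ex:training": {}},
  "used": {"_:u1": {"prov:activity": "ex:training", "prov:entity": "ex:dataset"}},
  "wasGeneratedBy": {"_:g1": {"prov:entity": "ex:model", "prov:activity": "ex:training"}}
}
""")
summary = summarize_prov(example)
```

A summary like this is a quick sanity check that the expected entities and activities were recorded before moving on to visualization.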
