# **Modlee Experiment Documentation Example Walkthrough**

In this walkthrough, we will train a machine learning model to classify images of clothing items from the `Fashion MNIST` dataset. We'll also document the experiment using the `modlee` package, which keeps track of the model's training and validation processes.

We'll go through the process step-by-step, including importing necessary libraries, setting up the dataset, building a custom convolutional neural network model, using Modlee for automatic experiment documentation, and training the model while saving the training metadata and artifacts.

## Tips

For best performance, ensure that the runtime is set to use a GPU (`Runtime > Change runtime type > T4 GPU`).

## Help & Questions

If you have any questions, please reach out on our [Discord](https://discord.gg/dncQwFdN9m).

You can also use our [documentation](https://docs.modlee.ai/README.html) as a reference for using our package.

# **Environment Setup**
## Step 1:

First, we need to make sure that we have the necessary packages installed. We will need `modlee` and its related packages.


In [None]:
# Install required packages
!pip install modlee torch torchvision pytorch-lightning

# This should take a few minutes, thanks for your patience!
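
Before moving on, it can help to confirm that the installation succeeded. The sketch below uses only the standard library's `importlib.metadata`; the helper name `installed_versions` is illustrative and is not part of Modlee.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# The package names mirror the pip command above
required = ["modlee", "torch", "torchvision", "pytorch-lightning"]
for name, version in installed_versions(required).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` suggests the `pip install` cell should be re-run before continuing.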

## Step 2:

We will import the necessary libraries, including `modlee` for automating experiment documentation and `torch` for handling neural networks.

We will also set our Modlee API key and initialize the Modlee package.
Make sure that you have a Modlee account and an API key [from the dashboard](https://www.dashboard.modlee.ai/).
Replace `replace-with-your-api-key` with your API key.

In [1]:
# Boilerplate imports
import os, sys
import ssl
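# Disable SSL certificate verification (a workaround for dataset download errors
# in some environments; not recommended outside a tutorial setting)
ssl._create_default_https_context = ssl._create_unverified_context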
import lightning.pytorch as pl
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision



In [2]:
# Set the API key as an environment variable
os.environ['MODLEE_API_KEY'] = "replace-with-your-api-key"

# Modlee-specific imports
import modlee

# Initialize Modlee with the API key
modlee.init(api_key=os.environ['MODLEE_API_KEY'])
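
To avoid committing a real key to a notebook, the key can be read from the environment and checked before it is passed on. This is a minimal sketch: `resolve_api_key` and `PLACEHOLDER` are hypothetical names introduced here, not part of the Modlee API.

```python
import os

# The placeholder string used in this walkthrough
PLACEHOLDER = "replace-with-your-api-key"


def resolve_api_key(env=None, var_name="MODLEE_API_KEY"):
    """Return the API key from the environment, rejecting empty or placeholder values."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(
            f"Set {var_name} to your Modlee API key before calling modlee.init()."
        )
    return key


# Usage in this walkthrough (assumes the variable was exported beforehand):
# modlee.init(api_key=resolve_api_key())
```

Failing early with a clear message is easier to debug than an authentication error later in the run.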

# **Load the Data**

We will load the Fashion MNIST dataset, convert the grayscale images to RGB (3 channels) so that they match the input the model expects, and create DataLoaders for training and validation.



In [3]:
# Get Fashion MNIST, and convert from grayscale to RGB for compatibility with the model
train_dataloader, val_dataloader = modlee.utils.get_fashion_mnist(num_output_channels=3)

# Get the number of classes in the dataset (10 classes for Fashion MNIST)
num_classes = len(train_dataloader.dataset.classes)

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to .data/FashionMNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 26421880/26421880 [00:16<00:00, 1634899.56it/s]


Extracting .data/FashionMNIST/raw/train-images-idx3-ubyte.gz to .data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to .data/FashionMNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 29515/29515 [00:00<00:00, 188752.28it/s]


Extracting .data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to .data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to .data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz


100%|██████████| 4422102/4422102 [00:07<00:00, 601054.88it/s] 


Extracting .data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to .data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to .data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 5148/5148 [00:00<00:00, 5237030.56it/s]

Extracting .data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to .data/FashionMNIST/raw
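
The grayscale-to-RGB conversion can be pictured as repeating the single channel three times. The sketch below uses a random stand-in batch rather than the real dataset, and it is not the implementation of `modlee.utils.get_fashion_mnist`, whose internals are not shown in this walkthrough.

```python
import torch

# Stand-in batch: 4 grayscale images of size 28x28 (the Fashion MNIST image size)
gray_batch = torch.rand(4, 1, 28, 28)

# Repeat the single channel three times along the channel dimension
rgb_batch = gray_batch.repeat(1, 3, 1, 1)

print(rgb_batch.shape)  # torch.Size([4, 3, 28, 28])

# All three channels carry identical pixel values
channels_match = torch.equal(rgb_batch[:, 0], rgb_batch[:, 1]) and torch.equal(
    rgb_batch[:, 1], rgb_batch[:, 2]
)
print(channels_match)  # True
```

No new information is added by the extra channels; they only make the input compatible with a network whose first convolution expects 3 input channels.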






# **Build the Model**

We create a CNN from torchvision's ResNet-18 architecture and wrap it in a custom class, `ModleeClassifier`, which extends `modlee.model.ImageClassificationModleeModel`. Because the call does not request pretrained weights, the network starts from random initialization. The class defines the forward pass, the training and validation steps, and the optimizer.




In [5]:
# Use a torchvision ResNet-18 (randomly initialized) with one output per class
classifier_model = torchvision.models.resnet18(num_classes=num_classes)

# Subclass the ModleeModel class to enable automatic documentation
class ModleeClassifier(modlee.model.ImageClassificationModleeModel):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.model = classifier_model
        # Define the loss function as cross-entropy loss
        self.loss_fn = F.cross_entropy

    def forward(self, x):
        return self.model(x)

    # Define the training step
    def training_step(self, batch, batch_idx):
        x, y_target = batch
        y_pred = self(x)
        loss = self.loss_fn(y_pred, y_target)
        return {"loss": loss}

    # Define the validation step
    def validation_step(self, val_batch, batch_idx):
        x, y_target = val_batch
        y_pred = self(x)
        val_loss = self.loss_fn(y_pred, y_target)
        return {'val_loss': val_loss}

    # Set up the optimizer for training
    def configure_optimizers(self):
        optimizer = torch.optim.SGD(self.parameters(), lr=0.001, momentum=0.9)
        return optimizer

# Create an instance of the model wrapped in Modlee's documentation class
modlee_model = ModleeClassifier()
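
To make the optimizer settings above concrete, here is the SGD-with-momentum update written out in plain Python for a single scalar parameter. It follows the update form given in the PyTorch `SGD` documentation (without dampening, weight decay, or Nesterov); the toy objective `f(p) = p**2` is an assumption chosen only for illustration.

```python
def sgd_momentum_step(param, grad, velocity, lr=0.001, momentum=0.9):
    """One update: velocity <- momentum * velocity + grad; param <- param - lr * velocity."""
    velocity = momentum * velocity + grad
    param = param - lr * velocity
    return param, velocity


# Minimize the toy objective f(p) = p**2, whose gradient is 2 * p
param, velocity = 1.0, 0.0
for step in range(3):
    grad = 2.0 * param
    param, velocity = sgd_momentum_step(param, grad, velocity)
    print(f"step {step}: param={param:.6f}, velocity={velocity:.6f}")
```

The velocity accumulates past gradients, so consecutive steps in the same direction grow larger than plain SGD with the same learning rate.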

# **Training the Model**

The next step is to train the model with PyTorch Lightning. The `Trainer` runs `modlee_model` for one epoch, and wrapping the call in `modlee.start_run()` lets Modlee record the run.

In [6]:
with modlee.start_run() as run:
    # Create a PyTorch Lightning trainer and set it to train for 1 epoch
    trainer = pl.Trainer(max_epochs=1)

    # Train the model using the training and validation data loaders
    trainer.fit(
        model=modlee_model,
        train_dataloaders=train_dataloader,
        val_dataloaders=val_dataloader
    )


  | Name  | Type   | Params | Mode 
-----------------------------------------
0 | model | ResNet | 11.2 M | train
-----------------------------------------
11.2 M    Trainable params
0         Non-trainable params
11.2 M    Total params
44.727    Total estimated model params size (MB)


Training: |          | 0/? [00:00<?, ?it/s]                                

INFO:Logging data metafeatures with <class 'modlee.data_metafeatures.ImageDataMetafeatures'>
INFO:Logging model metafeatures...
INFO:Logging model as code (model_graph.py) and text (model_graph.txt)...


Epoch 0: 100%|██████████| 938/938 [00:34<00:00, 27.50it/s, v_num=4]
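
The numbers in the training output can be reproduced with a little arithmetic. The sketch below assumes a batch size of 64 (the batch dimension seen in the shapes printed later in this walkthrough), 32-bit floats, and the standard ResNet-18 parameter count for the 1000-class ImageNet head.

```python
import math

# Steps per epoch: Fashion MNIST has 60,000 training images
train_images = 60_000
batch_size = 64
steps_per_epoch = math.ceil(train_images / batch_size)
print(steps_per_epoch)  # 938

# Parameter count: swap the 1000-class head for a 10-class head
imagenet_resnet18_params = 11_689_512
fc_in_features = 512
head_1000 = fc_in_features * 1000 + 1000  # weights + biases
head_10 = fc_in_features * 10 + 10
params = imagenet_resnet18_params - head_1000 + head_10
print(params)  # 11181642

# Estimated size: 4 bytes per float32 parameter, reported in units of 10**6 bytes
size_mb = params * 4 / 1e6
print(round(size_mb, 3))  # 44.727
```

These values line up with the `938/938` progress bar, the `11.2 M` parameter count, and the `44.727` MB estimate in the summary table.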


# **Document and Inspect Artifacts**

After training, we inspect the artifacts saved by Modlee, including the model graph and various statistics.

In [7]:
# Get the path to the last run's saved data
last_run_path = modlee.last_run_path()
print(f"Run path: {last_run_path}")

# Get the path to the saved artifacts
artifacts_path = os.path.join(last_run_path, 'artifacts')
artifacts = os.listdir(artifacts_path)
print(f"Saved artifacts: {artifacts}")

# Set the artifacts path as an environment variable
os.environ['ARTIFACTS_PATH'] = artifacts_path

# Add the artifacts directory to the system path
sys.path.insert(0, artifacts_path)

Run path: /Users/mansiagrawal/Documents/modlee_pypi/src/modlee/notebooks_tests/mlruns/0/d3285e8aec274a70855b919da0af90e6
Saved artifacts: ['model_metafeatures', 'model_size', 'model_summary.txt', 'checkpoints', 'model.py', 'cached_vars', 'transforms.txt', 'stats_rep', 'model', 'model_graph.py', 'model_graph.txt', 'data_metafeatures']
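
A flat `os.listdir` does not show which artifacts are files and which are directories. The sketch below separates the two; `summarize_artifacts` is an illustrative helper, and the demo runs on a throwaway directory that only mimics two of the artifact names above.

```python
import tempfile
from pathlib import Path


def summarize_artifacts(artifacts_dir):
    """Return {name: 'dir' or size in bytes} for each entry, sorted by name."""
    summary = {}
    for entry in sorted(Path(artifacts_dir).iterdir()):
        summary[entry.name] = "dir" if entry.is_dir() else entry.stat().st_size
    return summary


# Demo on a temporary directory standing in for the real artifacts folder
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "checkpoints").mkdir()
    (Path(tmp) / "model_graph.py").write_text("import torch\n")
    demo_summary = summarize_artifacts(tmp)

print(demo_summary)
```

In the walkthrough itself, the call would be `summarize_artifacts(artifacts_path)`.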


In [8]:
# Print out the first few lines of the model
print("Model graph:")
!sed -n -e 1,15p $ARTIFACTS_PATH/model_graph.py
!echo "        ..."
!sed -n -e 58,68p $ARTIFACTS_PATH/model_graph.py
!echo "        ..."

Model graph:

import torch, onnx2torch
from torch import tensor
class Model(torch.nn.Module):
    
    def __init__(self):
        super().__init__()
        setattr(self,'Conv', torch.nn.modules.conv.Conv2d(**{'in_channels':3,'out_channels':64,'kernel_size':(7, 7),'stride':(2, 2),'padding':(3, 3),'dilation':(1, 1),'groups':1,'padding_mode':'zeros'}))
        setattr(self,'Relu', torch.nn.modules.activation.ReLU(**{'inplace':False}))
        setattr(self,'MaxPool', torch.nn.modules.pooling.MaxPool2d(**{'kernel_size':[3, 3],'stride':[2, 2],'padding':[1, 1],'dilation':[1, 1],'return_indices':False,'ceil_mode':False}))
        setattr(self,'Conv_1', torch.nn.modules.conv.Conv2d(**{'in_channels':64,'out_channels':64,'kernel_size':(3, 3),'stride':(1, 1),'padding':(1, 1),'dilation':(1, 1),'groups':1,'padding_mode':'zeros'}))
        setattr(self,'Relu_1', torch.nn.modules.activation.ReLU(**{'inplace':False}))
        setattr(self,'Conv_2', torch.nn.modules.conv.Conv2d(**{'in_channels':64,'ou
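
The `sed` commands above rely on a Unix shell. Where that is not available, a pure-Python equivalent can print the same line ranges. This is a sketch; `read_line_range` is a hypothetical helper, and the demo uses a small temporary file instead of the real `model_graph.py`.

```python
import os
import tempfile
from itertools import islice


def read_line_range(path, start, stop):
    """Return lines start..stop (1-indexed, inclusive), similar to `sed -n 'start,stop p'`."""
    with open(path, encoding="utf-8") as handle:
        return list(islice(handle, start - 1, stop))


# Demo on a ten-line temporary file
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_graph.py")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.writelines(f"line {i}\n" for i in range(1, 11))
    selected = read_line_range(demo_path, 3, 5)

print("".join(selected), end="")
```

For the saved graph, the first call would look like `read_line_range(os.path.join(artifacts_path, "model_graph.py"), 1, 15)`.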

In [9]:
# Print the first lines of the data metafeatures
print("Data metafeatures:")
!head -20 $ARTIFACTS_PATH/stats_rep

Data metafeatures:
{
  "dataset_size": 60032,
  "num_sample": 1000,
  "batch_element_0": {
    "resnet18": {
      "feature_shape": [
        1,
        1
      ],
      "stats": {
        "kmeans": {}
      },
      "time_taken": "0.00019502639770507812"
    },
    "vgg16": {
      "feature_shape": [
        1,
        1
      ],
      "stats": {

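The printed output looks like JSON, so the metafeatures can likely be loaded into a dictionary for programmatic access. That the whole `stats_rep` file parses as JSON is an assumption based on the excerpt above; the sample below is a trimmed stand-in, not the real file.

```python
import json

# Trimmed sample shaped like the excerpt above; the real file is much longer
sample = """
{
  "dataset_size": 60032,
  "num_sample": 1000,
  "batch_element_0": {
    "resnet18": {"feature_shape": [1, 1]},
    "vgg16": {"feature_shape": [1, 1]}
  }
}
"""

stats = json.loads(sample)
print(stats["dataset_size"], stats["num_sample"])
print(sorted(stats["batch_element_0"]))

# With the real artifact (assuming it is valid JSON):
# with open(os.path.join(artifacts_path, "stats_rep")) as handle:
#     stats = json.load(handle)
```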

# **Rebuild and Validate the Model**

We rebuild the model from the saved graph and validate that it produces the same output shapes as the original model.

In [10]:
import model_graph

# Rebuild the model from the saved graph
rebuilt_model = model_graph.Model()

# Set both models to evaluation mode
modlee_model.eval(); rebuilt_model.eval()

Model(
  (Conv): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3))
  (Relu): ReLU()
  (MaxPool): MaxPool2d(kernel_size=[3, 3], stride=[2, 2], padding=[1, 1], dilation=[1, 1], ceil_mode=False)
  (Conv_1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (Relu_1): ReLU()
  (Conv_2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (Add): OnnxBinaryMathOperation()
  (Relu_2): ReLU()
  (Conv_3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (Relu_3): ReLU()
  (Conv_4): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (Add_1): OnnxBinaryMathOperation()
  (Relu_4): ReLU()
  (Conv_5): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (Relu_5): ReLU()
  (Conv_6): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (Conv_7): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2))
  (Add_2): OnnxBinaryMathOperation()
  (Relu_6): ReLU()
  (Conv_8): Conv2d(128, 128, kernel_

In [11]:
# Get a batch from the training loader
x, y = next(iter(train_dataloader))

with torch.no_grad():
    # Get predictions from the original model
    y_original = modlee_model(x)
    # Get predictions from the rebuilt model
    y_rebuilt = rebuilt_model(x)
assert y_original.shape == y_rebuilt.shape # Ensure the output shapes match

print(f"Original input and output shapes: {x.shape}, {y_original.shape}")
print(f"Output shape from module-rebuilt model: {y_rebuilt.shape}")

Original input and output shapes: torch.Size([64, 3, 28, 28]), torch.Size([64, 10])
Output shape from module-rebuilt model: torch.Size([64, 10])
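
Matching shapes only shows that each model maps a batch to one score per class. To see how well a model does on the batch, the scores can be turned into class predictions and compared with the labels. The sketch below uses hand-written stand-in logits; `batch_accuracy` is an illustrative helper, not a Modlee function.

```python
import torch


def batch_accuracy(logits, targets):
    """Fraction of samples whose highest-scoring class matches the target."""
    predictions = logits.argmax(dim=1)
    return (predictions == targets).float().mean().item()


# Stand-in logits for 4 samples and 3 classes
logits = torch.tensor(
    [[2.0, 0.1, 0.3], [0.2, 1.5, 0.1], [0.9, 0.8, 0.1], [0.1, 0.2, 3.0]]
)
targets = torch.tensor([0, 1, 1, 2])

accuracy = batch_accuracy(logits, targets)
print(accuracy)  # 0.75
```

With the tensors from the cell above, the call would be `batch_accuracy(y_original, y)`.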


# **Load the Model from Checkpoint**

Alternatively, we can load the model from a saved checkpoint and check that it produces the same output shape as the original model.

In [12]:
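# Reload the full model object saved with the run.
# weights_only=False lets recent PyTorch versions unpickle a whole model;
# only use it for files you trust.
reloaded_model = torch.load(
    os.path.join(artifacts_path, 'model', 'data', 'model.pth'),
    weights_only=False,
)
reloaded_model.eval()

# Get predictions from the reloaded model
with torch.no_grad():
    y_reloaded = reloaded_model(x)

# Ensure the output shapes match
assert y_original.shape == y_reloaded.shape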
print(f"Output shape from checkpoint-reloaded model: {y_reloaded.shape}")

Output shape from checkpoint-reloaded model: torch.Size([64, 10])
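
A stricter check than comparing shapes is to compare the output values themselves. The sketch below uses a small stand-in model and a deep copy in place of the checkpoint. Whether the saved checkpoint holds exactly the same weights as the in-memory model is an assumption that this kind of check would test.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-ins: a tiny model and an exact copy playing the role of the reloaded model
original = nn.Linear(8, 3).eval()
reloaded = copy.deepcopy(original).eval()

batch = torch.rand(5, 8)
with torch.no_grad():
    same_values = torch.allclose(original(batch), reloaded(batch), atol=1e-6)

print(same_values)  # True
```

In the walkthrough, the equivalent check would be `torch.allclose(y_original, y_reloaded, atol=1e-6)`, with both models in evaluation mode.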


# **Great Work!**

We set up the environment, loaded Fashion MNIST, wrapped a ResNet-18 in a Modlee model class, trained it with automatic experiment documentation, and then inspected, rebuilt, and reloaded the saved artifacts. Keep experimenting and learning!