# Documentation

In this exercise, you will use the `modlee` package to document an image segmentation experiment with a [U-Net](https://en.wikipedia.org/wiki/U-Net).

In [1]:
# Boilerplate imports
import lightning.pytorch as pl
import torch.nn.functional as F
import torch.nn as nn
import torch
from torch.utils.data import DataLoader
import torchvision
from torchvision import transforms
from torchvision.transforms.functional import InterpolationMode
import os
import ssl
ssl._create_default_https_context = ssl._create_unverified_context


imagenet_mean = [0.485, 0.456, 0.406]  # mean of the imagenet dataset for normalizing
imagenet_std = [0.229, 0.224, 0.225]  # std of the imagenet dataset for normalizing
# set_seeds(43)

In the next cell, import `modlee` and initialize with an API key.
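
Rather than hard-coding the key in the notebook, one option is to read it from an environment variable. The sketch below is a minimal example under that assumption: the helper `read_api_key` and the variable name `MODLEE_API_KEY` are hypothetical and are not part of `modlee`.

```python
import os


def read_api_key(env_var: str = "MODLEE_API_KEY") -> str:
    """Return the API key stored in `env_var` (a hypothetical variable name).

    Raises RuntimeError if the variable is unset or empty.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Possible usage in the notebook (requires the modlee package):
# modlee.init(api_key=read_api_key())
```

This keeps the key out of the saved notebook and its cell outputs.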

In [2]:
# Your code goes here. Import the modlee package and initialize with your API key.
import modlee
modlee.init(api_key="modleemichael")



In [3]:
modlee.modlee_client.__dict__

{'origin': 'http://ec2-3-84-155-233.compute-1.amazonaws.com:7070',
 'get_object': <bound method ModleeClient.get_callable of <modlee.client.ModleeClient object at 0x7fe1e4457b80>>,
 'get_function': <bound method ModleeClient.get_callable of <modlee.client.ModleeClient object at 0x7fe1e4457b80>>,
 'timeout': 3,
 'api_key': 'modleemichael'}

Load the training data.

In [4]:
def replace_tensor_value_(tensor, a, b):
    tensor[tensor == a] = b
    return tensor

input_resize = transforms.Resize((224, 224))
input_transform = transforms.Compose(
    [
        input_resize,
        transforms.ToTensor(),
        transforms.Normalize(imagenet_mean, imagenet_std),
    ]
)

target_resize = transforms.Resize((224, 224), interpolation=InterpolationMode.NEAREST)
target_transform = transforms.Compose(
    [
        target_resize,
        transforms.PILToTensor(),
        transforms.Lambda(lambda x: replace_tensor_value_(x.squeeze(0).long(), 255, 21)),
    ]
)

# Creating the dataset
train_dataset = torchvision.datasets.VOCSegmentation(
    './datasets/',
    year='2007',
    download=True,
    image_set='train',
    transform=input_transform,
    target_transform=target_transform,
)
val_dataset = torchvision.datasets.VOCSegmentation(
    './datasets/',
    year='2007',
    download=True,
    image_set='val',
    transform=input_transform,
    target_transform=target_transform,
)

BATCH_SIZE = 16
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False)

Using downloaded and verified file: ./datasets/VOCtrainval_06-Nov-2007.tar
Extracting ./datasets/VOCtrainval_06-Nov-2007.tar to ./datasets/
Using downloaded and verified file: ./datasets/VOCtrainval_06-Nov-2007.tar
Extracting ./datasets/VOCtrainval_06-Nov-2007.tar to ./datasets/


Create the model.

In the next cell, wrap a segmentation model in a `modlee.model.ModleeModel` object.
The cell uses torchvision's `fcn_resnet50` with 22 output classes in place of a hand-written U-Net.
At minimum, you must define the `__init__()`, `forward()`, `training_step()`, and `configure_optimizers()` methods.
Refer to the [Lightning documentation](https://lightning.ai/docs/pytorch/stable/starter/introduction.html) for a refresher.

In [5]:
class ModleeUNet(modlee.model.ModleeModel):
    def __init__(self):
        super().__init__()
        # FCN-ResNet50 stands in for a U-Net here.
        # 22 classes: VOC labels 0-20 plus the void label remapped from 255 to 21.
        self.model = torchvision.models.segmentation.fcn_resnet50(num_classes=22)

    def forward(self, x):
        # torchvision segmentation models return a dict; the logits are under 'out'
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y_target = batch
        y_pred = self(x)['out']  # logits of shape (N, 22, H, W)
        loss = F.cross_entropy(y_pred, y_target)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.001)

model = ModleeUNet()

In [6]:
batch = next(iter(train_loader))
print(batch[1].shape)
print(batch[0].shape)
print(batch[1].max(), batch[1].min())
y_out = model(batch[0])
print(y_out['out'].shape)

torch.Size([16, 224, 224])
torch.Size([16, 3, 224, 224])
tensor(21) tensor(0)
torch.Size([16, 22, 224, 224])
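
The shapes printed above follow the convention that `F.cross_entropy` expects for segmentation: logits of shape `(N, C, H, W)` and integer targets of shape `(N, H, W)`. The sketch below illustrates this on a toy 2x2 mask, together with the 255-to-21 remapping performed by `replace_tensor_value_`. The tensor values are made up for illustration.

```python
import math

import torch
import torch.nn.functional as F

# A 2x2 mask in which one pixel carries the VOC "void" label 255.
mask = torch.tensor([[0, 255], [15, 0]], dtype=torch.long)
mask[mask == 255] = 21  # same in-place remapping as replace_tensor_value_

logits = torch.zeros(1, 22, 2, 2)  # (batch, classes, height, width)
target = mask.unsqueeze(0)         # (batch, height, width)

loss = F.cross_entropy(logits, target)

# All-zero logits give a uniform prediction over 22 classes,
# so the mean per-pixel loss equals log(22).
expected = math.log(22)
```

Remapping 255 to 21 keeps every target index inside the range `[0, 22)`, which `F.cross_entropy` requires when the model outputs 22 channels.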


In the next cell, start training within a `modlee.start_run()` [context manager](https://realpython.com/python-with-statement/).
Refer to [`mlflow`'s implementation](https://mlflow.org/docs/latest/python_api/mlflow.html) as a refresher. 
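
As a refresher on the `with` statement, the toy context manager below shows the general pattern: setup code runs on entry, the yielded object is bound by `as`, and cleanup runs on exit even if the body raises. The name `start_run_sketch` is hypothetical, and this is not how `modlee` or `mlflow` implement their run contexts.

```python
from contextlib import contextmanager

events = []  # records the order in which things happen


@contextmanager
def start_run_sketch(name):
    """Toy stand-in for a run context (illustration only)."""
    events.append(f"start {name}")
    run = {"name": name, "metrics": {}}
    try:
        yield run  # bound to the name after `as`
    finally:
        # Runs when the block exits, whether or not it raised.
        events.append(f"end {name}")


with start_run_sketch("demo") as run:
    run["metrics"]["loss"] = 0.5
    events.append("training")
```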

In [7]:
# Your code goes here. Start training within a modlee.start_run() context manager
with modlee.start_run() as run:
    trainer = pl.Trainer(max_epochs=1)
    trainer.fit(
        model=model,
        train_dataloaders=train_loader,
    )

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name  | Type | Params
-------------------------------
0 | model | FCN  | 33.0 M
-------------------------------
33.0 M    Trainable params
0         Non-trainable params
33.0 M    Total params
131.830   Total estimated model params size (MB)


Training: 0it [00:00, ?it/s]



AttributeError: 'NoneType' object has no attribute 'post_run'

`modlee`, with `mlflow` underneath, documents the experiment in an automatically generated `artifacts` folder inside the run directory.

In [None]:
last_run_path = modlee.last_run_path()
print(f"Run path: {last_run_path}")
artifacts_path = os.path.join(last_run_path, 'artifacts')
artifacts = os.listdir(artifacts_path)
print(f"Saved artifacts: {artifacts}")

Run path: /home/ubuntu/projects/modlee_pypi/examples/mlruns/0/297bc8969bb64235bbfa0824f20dd24b/artifacts/mlruns/0/0729922d48c14c8b9c78f3b15ca962e3/artifacts/mlruns/0/910e72fc9bef4e958046ffc5fe3e3585
Saved artifacts: ['model_graph.py', 'model_graph.txt', 'model_size', 'model', 'cached_vars', 'stats_rep', 'snapshot_1.npy', 'snapshot_0.npy', 'model.py', 'loss_calls.txt', 'model_summary.txt']
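
If a run stops early, the `artifacts` folder may be missing, and `os.listdir` would then raise. A minimal sketch of a more defensive listing follows. The helper `list_artifacts` is hypothetical and only assumes the `<run_path>/artifacts` layout shown above.

```python
from pathlib import Path


def list_artifacts(run_path):
    """Return the sorted names in <run_path>/artifacts, or [] if the folder is missing."""
    artifacts_dir = Path(run_path) / "artifacts"
    if not artifacts_dir.is_dir():
        return []
    return sorted(p.name for p in artifacts_dir.iterdir())


# Possible usage in the notebook:
# print(list_artifacts(modlee.last_run_path()))
```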


We can rebuild the model from the cached `model_graph.Model` class and confirm that an input passes through it.
Note that the rebuilt model does not carry the trained weights.
To restore the model from the last checkpoint, load it directly from the cached `model.pth`.

Rebuild the saved model.
First, determine the path to the most recent run.

In [None]:
last_run_path = modlee.last_run_path()  # Get the most recent run
artifacts_path = os.path.join(last_run_path, 'artifacts')

Next, import the model from the assets saved in the `artifacts/` directory.

In [None]:
os.chdir(artifacts_path)

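import model_graph  # the cached model graph (artifacts/model_graph.py)
rebuilt_model = model_graph.Model()  # Construct the model; assumes no constructor arguments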

# Pass an input through the model
x, _ = next(iter(train_loader))
y_rebuilt = rebuilt_model(x)
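
Changing the working directory with `os.chdir` affects the rest of the notebook. One alternative is to load the cached module by file path with the standard library's `importlib`. This is a sketch under the assumption that `model_graph.py` is an ordinary importable Python file; the helper `load_module_from_path` is hypothetical.

```python
import importlib.util


def load_module_from_path(module_name, file_path):
    """Import a Python source file as a module without touching sys.path or the cwd."""
    spec = importlib.util.spec_from_file_location(module_name, str(file_path))
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load a module from {file_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module


# Possible usage in the notebook:
# model_graph = load_module_from_path(
#     "model_graph", os.path.join(artifacts_path, "model_graph.py")
# )
# rebuilt_model = model_graph.Model()
```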

You've reached the end of the tutorial and can now integrate `modlee` into your machine learning experiments.
Congratulations!