# Deployment preparation

In this notebook we perform several supplementary steps to prepare our segmentation model for deployment. Although none of these steps is strictly necessary, we recommend this approach because it makes deployment with the containerized Vitis AI tools easier in the next step. Sharing the main codebase between the repository and the Vitis AI container can be tricky, as the deployment environment is rather constrained. We therefore prepare the model and the quantization calibration data in advance.

The Vitis AI deployment procedure requires several elements:
* The model to be deployed (pretrained, stripped of any third-party dependencies if possible).
* A *quantization calibration dataset* derived from the training input samples that will be used by the model quantizer.
* Optionally, a test dataset. The quantized model can be evaluated before it is deployed to the target platform, so in this notebook we also prepare the test subset of our dataset for post-quantization evaluation.

In [1]:
from pathlib import Path

import h5py
import pytorch_lightning as pl
import torch
from torchsummary import summary
from tqdm import tqdm

from sml_tutorials_ml_deployment.datasets import DeepGlobeLandCover, transforms
from sml_tutorials_ml_deployment.training import SegmentationTask
from sml_tutorials_ml_deployment.model import Unet

In [2]:
# Reproducibility
pl.seed_everything(42)

Seed set to 42


42

## Preparing the quantization dataset

The model quantizer needs to be fed with a dataset that is representative of the data the model will see in use. To make the quantization process inside the Vitis AI container easier, we can prepare the quantization subset in advance and save it in an HDF5 (`.h5`) file.

The Vitis AI documentation recommends a quantization dataset of 100 to 1000 samples drawn from the training subset. The ground truth (GT) labels of the training samples are not required in the quantization process.

In [3]:
QUANTIZATION_SAMPLES_NUM = 250

It is best to preprocess all samples now and store them in an HDF5 file, ready for inference. If you use complex transforms in your preprocessing, it may be challenging to run them in the Vitis AI container.

In [4]:
train_ds = DeepGlobeLandCover(root_dir=Path("../../dataset/deep_globe_patched"), split="training", transforms=transforms)
test_ds = DeepGlobeLandCover(root_dir=Path("../../dataset/deep_globe_patched"), split="test", transforms=transforms)
num_classes = len(train_ds.CLASSES)

Now we can draw 250 preprocessed sample inputs from the training subset and store them together with the test inputs. Notice that we only save model inputs; the ground truth masks are not needed by the quantizer.

In [5]:
output_dir = Path("../../output/03-quantize")
output_dir.mkdir(parents=True, exist_ok=True)

In [6]:
quantization_indices = torch.randperm(len(train_ds))[:QUANTIZATION_SAMPLES_NUM]

# Open the file once and write both groups, instead of reopening it for every sample.
with h5py.File(output_dir / "quantization_samples.h5", "w") as f:
    calibration = f.create_group("calibration")
    test = f.create_group("test")

    # Random subset of training inputs used to calibrate the quantizer.
    for sample_idx in tqdm(quantization_indices, desc="Quantization samples"):
        sample = train_ds[sample_idx]
        calibration.create_dataset(f"{sample['id']}", data=sample["image"])

    # All test inputs, kept for post-quantization evaluation.
    for sample in tqdm(test_ds, desc="Test samples"):
        test.create_dataset(f"{sample['id']}", data=sample["image"])

Quantization samples: 100%|██████████| 250/250 [00:10<00:00, 24.63it/s]
Test samples: 100%|██████████| 3000/3000 [02:00<00:00, 24.81it/s]


Notice that you can also load the samples from your data directory and preprocess them directly in the Docker environment, skipping the process we have just shown. However, we recommend the demonstrated approach, as it allows you to do any kind of complex data processing in advance in your host environment.
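
For reference, reading the stored samples back needs nothing beyond `h5py` and NumPy. The following is a minimal sketch, not the tutorial's actual container code: the helper name `load_group` is hypothetical, and a tiny stand-in file with made-up sample ids and shapes replaces the real `quantization_samples.h5` so that the snippet runs on its own.

```python
import tempfile
from pathlib import Path

import h5py
import numpy as np


def load_group(h5_path, group_name):
    """Read every dataset of one group into a list of (id, array) pairs."""
    with h5py.File(h5_path, "r") as f:
        # dataset[()] copies the full dataset into an in-memory NumPy array.
        return [(name, dataset[()]) for name, dataset in f[group_name].items()]


# Stand-in file with the same group layout as above (ids and shapes are made up).
with tempfile.TemporaryDirectory() as tmp_dir:
    h5_path = Path(tmp_dir) / "quantization_samples.h5"
    with h5py.File(h5_path, "w") as f:
        calibration = f.create_group("calibration")
        for idx in range(3):
            image = np.full((3, 4, 4), idx, dtype=np.float32)
            calibration.create_dataset(f"sample_{idx}", data=image)

    samples = load_group(h5_path, "calibration")

# Stack the samples into one (N, C, H, W) batch for the quantizer.
batch = np.stack([image for _, image in samples])
print(batch.shape)
```

Because the arrays were preprocessed before they were saved, the reading side does not need any of the dataset or transform code from the main repository.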

## Saving model weights

If you trained your model with third-party Python modules or tools, now is a good time to make it as standalone as possible. Importing custom libraries into the deployment environment can be rather challenging. For our demo model we used PyTorch Lightning to speed up the training process. We will extract the model weights from the Lightning training checkpoint and save them as a vanilla PyTorch state dict that can be easily loaded inside the Vitis AI Docker container with pure PyTorch.

In [7]:
MODEL_CHECKPOINT_PATH = Path("../../output/02-train/model.ckpt")

In [8]:
task = SegmentationTask.load_from_checkpoint(MODEL_CHECKPOINT_PATH, model=Unet(num_classes), map_location="cpu")
model = task.model

In [9]:
torch.save(model.state_dict(), output_dir / "state_dict.pt")
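
The cells above let Lightning rebuild the task and then read `task.model`. A Lightning checkpoint also stores its weights under the `"state_dict"` key, with every key prefixed by the attribute name of the submodule, so the same extraction could be done by hand. The sketch below is an illustrative alternative, not the method used in this notebook: it assumes the prefix is `model.` (as `task.model` suggests), the helper name `strip_prefix` and all keys are hypothetical, and plain lists stand in for tensors so that it runs without PyTorch.

```python
from collections import OrderedDict


def strip_prefix(state_dict, prefix="model."):
    """Keep only entries whose key starts with `prefix` and drop the prefix."""
    stripped = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(prefix):
            stripped[key[len(prefix):]] = value
    return stripped


# Stand-in for torch.load(MODEL_CHECKPOINT_PATH, map_location="cpu")["state_dict"].
# Keys and values are hypothetical; lists replace tensors.
lightning_state_dict = OrderedDict(
    [
        ("model.encoder.conv1.weight", [0.1, 0.2]),
        ("model.decoder.head.bias", [0.0]),
        ("loss.class_weights", [1.0, 2.0]),  # entry that does not belong to the network
    ]
)

plain_state_dict = strip_prefix(lightning_state_dict)
print(list(plain_state_dict))
```

With real tensors, the resulting dictionary could be passed to `load_state_dict` of a freshly constructed network, provided its parameter names match the stripped keys.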

Finally, we can double-check whether our final model architecture is compatible with the deployment tools. We can print the model summary to obtain a (rather lengthy) list of all elements in the network. You can check whether the model layers are supported by Vitis AI by comparing them against the list of supported operators in the [documentation][1].

Incompatible layers can be delegated to the CPU, but in that case you have to reimplement them on the edge device yourself (presumably in C++ or Python). This is an advanced technique that will also likely introduce a performance overhead. It is best to design your network with full Vitis AI acceleration in mind.
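
The comparison against the documentation can be partly automated. The sketch below is only an illustration: the helper `find_unsupported` is hypothetical, and the allowlist is a made-up placeholder rather than the actual list of operators supported by Vitis AI, which has to be taken from the documentation. Layer names are hard-coded so the snippet runs without PyTorch.

```python
def find_unsupported(layer_types, supported_types):
    """Return the sorted layer type names that are missing from the allowlist."""
    return sorted(set(layer_types) - set(supported_types))


# With a real model the names could be collected as:
#   layer_types = [type(m).__name__ for m in model.modules()]
# Hard-coded here for illustration.
layer_types = ["Conv2d", "BatchNorm2d", "ReLU", "Conv2d", "GELU"]

# HYPOTHETICAL allowlist, for illustration only. It does not reflect
# which operators Vitis AI actually supports.
supported_types = {"Conv2d", "BatchNorm2d", "ReLU", "MaxPool2d", "ConvTranspose2d"}

unsupported = find_unsupported(layer_types, supported_types)
print(unsupported)
```

Note that `model.modules()` also yields container modules (the network itself, `Sequential` blocks), and that operations called through `torch.nn.functional` do not appear as modules at all, so a check like this complements a manual review and does not replace it.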

[1]: https://docs.amd.com/r/en-US/ug1414-vitis-ai/Operators-Supported-by-PyTorch

In [10]:
demo_input = train_ds[0]["image"]
demo_batch = demo_input.unsqueeze(0)
print(summary(model, demo_batch, depth=10))

Layer (type:depth-idx)                                  Output Shape              Param #
├─GraphModuleImpl: 1-1                                  [[-1, 512, 16, 16]]       --
├─ResNet: 1                                             []                        --
|    └─Conv2d: 2-1                                      [-1, 64, 256, 256]        9,408
├─GraphModuleImpl: 1                                    []                        --
|    └─Conv2d: 2-2                                      [-1, 64, 256, 256]        (recursive)
├─ResNet: 1                                             []                        --
|    └─BatchNorm2d: 2-3                                 [-1, 64, 256, 256]        128
├─GraphModuleImpl: 1                                    []                        --
|    └─BatchNorm2d: 2-4                                 [-1, 64, 256, 256]        (recursive)
├─ResNet: 1                                             []                        --
|    └─ReLU: 2-5                      