# Kidney Segmentation with PyTorch Lightning and OpenVINO™ Toolkit

## **Part 2:** Convert and Quantize Model and Show Live Inference

This tutorial demonstrates training and inference with a kidney segmentation model. For training, [PyTorch Lightning](https://www.pytorchlightning.ai/) is used with a [UNet](https://arxiv.org/abs/1505.04597) segmentation model. The model is converted to OpenVINO IR with [Model Optimizer](https://docs.openvinotoolkit.org/latest/openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html), and quantized with OpenVINO's [Post-Training Optimization Tool](https://docs.openvinotoolkit.org/latest/pot_compression_api_README.html) API. 

Other notebooks in this series:

- [Data Preparation for 2D Segmentation of 3D Medical Data](../210-ct-scan-data-preparation)
- [Live Inference and Benchmark CT-scan data](../210-ct-scan-live-inference/210-ct-scan-live-inference.ipynb)


## Instructions

This notebook needs a trained UNet model that is converted to [ONNX](https://github.com/onnx/onnx) format. We provide a pretrained model trained for 20 epochs with the full [Kits-19](https://github.com/neheller/kits19) frames dataset, which has an F1 score on the validation set of 0.9. The training code will be made available soon.

## Imports
The Post Training Optimization API is implemented in the `compression` library.

In [None]:
import glob
import os
import random
import sys
import time
import warnings
import zipfile
from pathlib import Path
from typing import List

warnings.filterwarnings("ignore")
import cv2
import matplotlib.pyplot as plt
import numpy as np
from addict import Dict
from async_inference import CTAsyncPipeline, SegModel
from compression.api import Metric
from compression.engines.ie_engine import IEEngine
from compression.graph import load_model, save_model
from compression.graph.model_utils import compress_model_weights
from compression.pipeline.initializer import create_pipeline
from IPython.display import Image
from omz_python.models import model as omz_model
from openvino.inference_engine import IECore
from yaspin import yaspin

sys.path.append("../utils")
from notebook_utils import benchmark_model, download_file

## Settings

To use the pretrained model, set MODEL_DIR to `Path("pretrained_model")` in the cell below. This is the default. To use a model that you trained yourself, adjust the paths. By default, this notebook will quantize one CT scan from the KITS19 dataset. To use the full dataset, set BASEDIR to the path of the dataset.

In [None]:
BASEDIR = Path("kits19_frames_1")
MODEL_DIR = Path("pretrained_model")

onnx_path = MODEL_DIR / "unet44.onnx"
ir_path = onnx_path.with_suffix(".xml")
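
The IR path above is derived from the ONNX path with `Path.with_suffix`. The short sketch below shows what that derivation produces. The directory name `my_model` is a made-up example, not a path used by this notebook.

```python
from pathlib import Path

# Hypothetical model directory, used only for this illustration
model_dir = Path("my_model")
onnx_path = model_dir / "unet44.onnx"

# with_suffix replaces only the final extension; directory and stem are kept
ir_xml_path = onnx_path.with_suffix(".xml")
ir_bin_path = onnx_path.with_suffix(".bin")

print(ir_xml_path.name, ir_bin_path.name)
```

If you train your own model, only `MODEL_DIR` and the ONNX file name should need to change; the IR paths follow from them.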

## Download CT Scan Data

In [None]:
# The CT scan case number. For example: 16 for data from the case_00016 directory
# Currently only 16 is supported
case = 16
if not (BASEDIR / f"case_{case:05d}").exists():
    BASEDIR.mkdir(exist_ok=True)
    filename = download_file(
        f"https://s3.us-west-1.amazonaws.com/openvino.notebooks/case_{case:05d}.zip"
    )
    with zipfile.ZipFile(filename, "r") as zip_ref:
        zip_ref.extractall(path=BASEDIR)
    os.remove(filename)  # remove the zip file after the archive has been closed
    print(f"Downloaded and extracted data for case_{case:05d}")
else:
    print(f"Data for case_{case:05d} exists")

## Convert Model to OpenVINO IR
Call the Model Optimizer tool to convert the ONNX model to OpenVINO IR, with FP16 precision. The IR files are saved to the `MODEL_DIR` directory, next to the ONNX model. See the [Model Optimizer Developer Guide](https://docs.openvinotoolkit.org/latest/openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html) for more information.

Model Optimization was successful if the last lines of the output include `[ SUCCESS ] Generated IR version 10 model`.

In [None]:
!mo --input_model $onnx_path --output_dir $MODEL_DIR --data_type FP16
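
Besides reading the Model Optimizer output, you can check that the IR files were written. An IR model consists of an `.xml` file and a `.bin` file with the same stem. The helper `ir_files_exist` below is a hypothetical convenience function, not part of OpenVINO, and the demonstration uses empty placeholder files in a temporary directory rather than a real model.

```python
import tempfile
from pathlib import Path


def ir_files_exist(xml_path: Path) -> bool:
    """Return True if both the IR .xml file and its .bin weights file exist."""
    xml_path = Path(xml_path)
    return xml_path.is_file() and xml_path.with_suffix(".bin").is_file()


# Demonstrate on empty placeholder files in a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    xml_file = Path(tmp_dir) / "unet44.xml"
    found_before = ir_files_exist(xml_file)
    xml_file.touch()
    xml_file.with_suffix(".bin").touch()
    found_after = ir_files_exist(xml_file)

print(found_before, found_after)
```

In the notebook, a call such as `ir_files_exist(ir_path)` after the conversion cell would serve the same purpose.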

## Post-Training Optimization Tool (POT) Quantization
The Post-Training Optimization Tool (POT) `compression` API defines base classes for `Metric` and `DataLoader`. In this notebook, we use a custom Metric and DataLoader that show all the required methods.

### Configuration

#### Metric
Define a metric to determine the performance of the model. For the Default Quantization algorithm that is used in this tutorial, defining a metric is optional. The metric is used to compare the quantized INT8 model with the original FP16 IR model.

A metric for POT inherits from `compression.api.Metric` and should implement all the methods in this example.

For this demo, the [F1 score](https://en.wikipedia.org/wiki/F-score) or Dice coefficient is used.

In [None]:
def sigmoid(x):
    return np.exp(-np.logaddexp(0, -x))


class BinaryF1(Metric):
    """
    Metric to compute F1/Dice score for binary segmentation. F1 is computed as
    (2 * precision * recall) / (precision + recall) where precision is computed as
    the ratio of pixels that were correctly predicted as true and all predicted true
    pixels, and recall as the ratio of pixels that were correctly predicted as true
    and all actual true pixels.

    See https://en.wikipedia.org/wiki/F-score
    """

    # Required methods
    def __init__(self):
        super().__init__()
        self._name = "F1"
        self.y_true = 0
        self.y_pred = 0
        self.correct_true = 0

    @property
    def value(self):
        """Returns metric value for the last model output.
        Possible format: {metric_name: [metric_values_per_image]}
        """
        return {self._name: [0, 0]}

    @property
    def avg_value(self):
        """Returns average metric value for all model outputs.
        Possible format: {metric_name: metric_value}
        """

        precision = self.correct_true / self.y_pred
        recall = self.correct_true / self.y_true

        f1 = (2 * precision * recall) / (precision + recall)
        return {self._name: f1}

    def update(self, output, target):
        """
        :param output: model output
        :param target: annotations for model output
        """

        label = target[0].astype(np.byte)
        prediction = sigmoid(output[0]).round().astype(np.byte)

        self.y_true += np.sum(label)
        self.y_pred += np.sum(prediction)

        correct_true = np.sum(
            (label == prediction).astype(np.byte) * (label == 1).astype(np.byte)
        ).astype(np.float32)

        self.correct_true += correct_true

    def reset(self):
        """Resets metric"""
        self.y_true = 0
        self.y_pred = 0
        self.correct_true = 0

    def get_attributes(self):
        """
        Returns a dictionary of metric attributes {metric_name: {attribute_name: value}}.
        Required attributes: 'direction': 'higher-better' or 'higher-worse'
                             'type': metric type
        """
        return {self._name: {"direction": "higher-better", "type": "F1"}}
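
To make the metric concrete, here is a small worked example that follows the same steps as `update` and `avg_value` on a tiny mask. The label and logit arrays are made up for illustration. The sketch also checks that F1 equals the Dice coefficient, 2·TP / (predicted positives + actual positives).

```python
import numpy as np


def sigmoid(x):
    # Same numerically stable formulation as in the cell above
    return np.exp(-np.logaddexp(0, -x))


# Hypothetical 2x4 ground-truth mask and raw model outputs (logits)
label = np.array([[1, 1, 0, 0], [0, 1, 0, 0]], dtype=np.uint8)
logits = np.array([[3.0, -2.0, -4.0, -1.0], [-3.0, 2.5, 1.0, 0.5]])

# Threshold the sigmoid output at 0.5 to get a binary prediction
prediction = sigmoid(logits).round().astype(np.uint8)

true_positives = int(np.sum((label == 1) & (prediction == 1)))
predicted_positives = int(np.sum(prediction))
actual_positives = int(np.sum(label))

precision = true_positives / predicted_positives
recall = true_positives / actual_positives
f1 = 2 * precision * recall / (precision + recall)
dice = 2 * true_positives / (predicted_positives + actual_positives)

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} dice={dice:.3f}")
```

Because F1 is symmetric in precision and recall, only the three accumulated sums matter for the final score.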

#### Data

##### Dataset

The dataset in the next cell is copied from the training notebook. It expects images and masks in the *basedir* directory, in a folder per patient. For more information about the dataset, see the data preparation notebook. This dataset follows POT's `compression.api.DataLoader` interface, which should implement `__init__`, `__getitem__` and `__len__`. It can therefore be used directly for POT.

In [None]:
class KitsDataset(object):
    def __init__(self, basedir: str, dataset_type: str, transforms=None):
        """
        Dataset class for prepared Kits19 data, for binary segmentation (background/kidney).

        :param basedir: Directory that contains the prepared CT scans, in subdirectories
                        case_00000 until case_00210
        :param dataset_type: either "train" or "val"
        :param transforms: Compose object with augmentations
        """
        allmasks = sorted(glob.glob(f"{basedir}/case_*/segmentation_frames/*png"))

        # Reserve 10% of the patients for the validation dataset
        # Set a random seed to ensure that this list is reproducible
        random.seed(2.71828)
        self.valpatients = sorted(random.choices(range(210), k=21))

        valcases = [f"case_{i:05d}" for i in self.valpatients]

        if dataset_type == "train":
            masks = [mask for mask in allmasks if Path(mask).parents[1].name not in valcases]
        elif dataset_type == "val":
            masks = [mask for mask in allmasks if Path(mask).parents[1].name in valcases]
        else:
            raise ValueError("Please choose train or val dataset split")

        random.shuffle(masks)
        self.basedir = basedir
        self.dataset_type = dataset_type
        self.dataset = masks
        self.transforms = transforms
        print(f"Created {dataset_type} dataset with {len(self.dataset)} items.")

    def __getitem__(self, index):
        """
        Get an item from the dataset at the specified index.
        Labels are converted to binary labels (background/kidney).

        :return: (annotation, input_image) where annotation is (index, segmentation_mask)
        """
        mask_path = self.dataset[index]
        mask = cv2.imread(mask_path, cv2.IMREAD_UNCHANGED)
        # The masks contain annotations for kidneys and tumors, in this tutorial we only segment
        # kidneys so we can set all pixels that contain a non-background value to 1.
        mask[mask > 0] = 1

        image_path = str(Path(mask_path.replace("segmentation", "imaging")).with_suffix(".jpg"))
        img = cv2.imread(image_path, cv2.IMREAD_UNCHANGED)

        if img.shape[:2] != (512, 512):
            img = cv2.resize(img, (512, 512))
            mask = cv2.resize(mask, (512, 512))

        # TODO: add transforms with torchvision.transforms instead of albumentations
        # if self.transforms is not None:

        annotation = (index, mask.astype(np.uint8))
        input_image = np.expand_dims(img, axis=0).astype(np.float32)
        return annotation, input_image

    def __len__(self):
        return len(self.dataset)
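
Two details of `KitsDataset` are easy to miss: how the validation cases are chosen, and how the image path is derived from the mask path. The sketch below isolates both. The function names and the example file name are hypothetical; `PurePosixPath` is used here only so that the printed paths look the same on every platform.

```python
import random
from pathlib import PurePosixPath


def validation_cases(seed=2.71828, num_patients=210, k=21):
    """Return the sorted validation case names for a given seed."""
    rng = random.Random(seed)
    # choices() samples with replacement, so duplicates are possible
    patients = sorted(rng.choices(range(num_patients), k=k))
    return [f"case_{i:05d}" for i in patients]


def image_path_for_mask(mask_path: str) -> str:
    """Map a segmentation mask path to the matching image path."""
    image_path = PurePosixPath(mask_path.replace("segmentation", "imaging"))
    return str(image_path.with_suffix(".jpg"))


cases = validation_cases()
print(f"{len(cases)} draws, {len(set(cases))} distinct validation cases")

# Hypothetical file name inside the prepared dataset
mask_path = "kits19_frames_1/case_00016/segmentation_frames/0001.png"
image_path = image_path_for_mask(mask_path)
case_name = PurePosixPath(mask_path).parents[1].name
print(image_path, case_name)
```

Since `random.choices` samples with replacement, the number of distinct validation patients can be smaller than 21. With a fixed seed the selection is the same on every run.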

To test that the data loader returns the expected output, we create a DataLoader instance and show an image and a mask. The image and mask are shown as returned by the dataloader, after resizing and preprocessing. Since this dataset contains a lot of slices without kidneys, we select a slice that contains at least 100 kidney pixels to verify that the annotations look correct.

In [None]:
# Create data loader
data_loader = KitsDataset(BASEDIR, "val")

# Find a slice that contains kidney annotations
# item[0] is the annotation: (id, annotation_data)
annotation, image_data = next(item for item in data_loader if np.count_nonzero(item[0][1]) > 100)

# The data loader returns images as floating point data with (C,H,W) layout. Convert to 8-bit
# integer data and transpose to (H,W,C) for visualization
image = image_data.astype(np.uint8).transpose(1, 2, 0)

# The data loader returns annotations as (index, mask). Grab only the mask;
# squeeze() removes any singleton dimension so that the mask is (H,W) for visualization
mask = annotation[1].squeeze()

fig, ax = plt.subplots(1, 2, figsize=(12, 6))
ax[0].imshow(image, cmap="gray")
ax[1].imshow(mask, cmap="gray");
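
The layout change in the cell above can be checked on a dummy array. This sketch assumes a single-channel 512x512 image, which is what the data loader produces after `np.expand_dims(img, axis=0)`.

```python
import numpy as np

# Dummy single-channel image in (C, H, W) layout, as returned by the data loader
chw = np.zeros((1, 512, 512), dtype=np.float32)

# transpose(1, 2, 0) moves the channel axis to the end: (C, H, W) -> (H, W, C)
hwc = chw.astype(np.uint8).transpose(1, 2, 0)

print(chw.shape, "->", hwc.shape)
```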

#### Quantization Config

POT methods expect configuration dictionaries as arguments, which are defined in the cell below. The variable `ir_path` is defined in the [Settings](#Settings) cell at the top of the notebook. The other variables are defined in the cell above.

See [Post-Training Optimization Best Practices](https://docs.openvino.ai/2021.4/pot_docs_BestPractices.html) for more information on the settings.

In [None]:
# Model config specifies the model name and paths to model .xml and .bin file
model_config = Dict(
    {
        "model_name": f"quantized_{ir_path.stem}",
        "model": ir_path,
        "weights": ir_path.with_suffix(".bin"),
    }
)

# Engine config
engine_config = Dict({"device": "CPU"})

algorithms = [
    {
        "name": "DefaultQuantization",
        "stat_subset_size": 300,
        "params": {
            "target_device": "ANY",
            "preset": "mixed",  # choose between "mixed" and "performance"
        },
    }
]

print(f"model_config: {model_config}")
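
If you want to try both presets, it can help to build the algorithm list from a small function. The helper `default_quantization_config` below is a hypothetical wrapper, not part of the POT API. It reproduces the dictionary structure from the cell above and rejects preset names other than the two mentioned there.

```python
ALLOWED_PRESETS = ("mixed", "performance")


def default_quantization_config(preset="mixed", stat_subset_size=300, target_device="ANY"):
    """Build the algorithms list for DefaultQuantization with a validated preset."""
    if preset not in ALLOWED_PRESETS:
        raise ValueError(f"preset must be one of {ALLOWED_PRESETS}, got {preset!r}")
    return [
        {
            "name": "DefaultQuantization",
            "stat_subset_size": stat_subset_size,
            "params": {"target_device": target_device, "preset": preset},
        }
    ]


performance_algorithms = default_quantization_config(preset="performance")
print(performance_algorithms)
```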

### Prepare Quantization Pipeline: DataLoader, Model, Metric, Inference Engine

The POT pipeline uses the functions `load_model()`, `IEEngine` and `create_pipeline()`. `load_model()` loads the IR model specified in `model_config`. `IEEngine` is a POT implementation of Inference Engine that is passed to the POT pipeline created by `create_pipeline()`. The POT classes and functions expect a config argument. These configs are created in the Configuration section. The `BinaryF1` metric and the `KitsDataset` data loader are defined earlier in this notebook.

Running the POT quantization pipeline takes just two lines of code. We create the pipeline with the `create_pipeline` function, and then run that pipeline with `pipeline.run()`. To reuse the quantized model later, we compress the model weights and save the compressed model to disk.

In [None]:
# Step 1: create data loader
data_loader = KitsDataset(BASEDIR, "val")

# Step 2: load model
ir_model = load_model(model_config=model_config)

# Step 3: initialize the metric
metric = BinaryF1()

# Step 4: Initialize the engine for metric calculation and statistics collection.
engine = IEEngine(config=engine_config, data_loader=data_loader, metric=metric)

# Step 5: Create a pipeline of compression algorithms.
# `algorithms` is defined in the Quantization Config cell
pipeline = create_pipeline(algorithms, engine)
algorithm_name = pipeline.algo_seq[0].name

# Step 6: Execute the pipeline to quantize the model
with yaspin(text=f"Executing POT pipeline on {model_config['model']} with {algorithm_name}") as sp:
    start_time = time.perf_counter()
    compressed_model = pipeline.run(ir_model)
    end_time = time.perf_counter()
    sp.text = f"Quantization finished in {end_time - start_time:.2f} seconds"
    sp.ok("✔")

# Step 7 (Optional): Compress model weights to quantized precision
#                    in order to reduce the size of the final .bin file.
compress_model_weights(compressed_model)

# Step 8: Save the compressed model to the desired path.
# Set save_path to the directory where the model should be saved
compressed_model_paths = save_model(
    model=compressed_model, save_path="optimized_model", model_name=ir_model.name
)

compressed_model_path = compressed_model_paths[0]["model"]
print("The quantized model is stored at", compressed_model_path)
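
For intuition about what INT8 quantization does to a tensor, the sketch below applies a simple symmetric per-tensor scheme with NumPy. This is a conceptual illustration only and not the algorithm that POT's DefaultQuantization implements. The function names and the weight values are made up, and the input is assumed to contain at least one non-zero value.

```python
import numpy as np


def quantize_symmetric(values, num_bits=8):
    """Map float values to signed integers with one shared scale factor."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    scale = np.max(np.abs(values)) / qmax  # assumes values are not all zero
    quantized = np.clip(np.round(values / scale), -qmax, qmax).astype(np.int8)
    return quantized, scale


def dequantize(quantized, scale):
    """Map the integers back to approximate float values."""
    return quantized.astype(np.float32) * scale


weights = np.array([-1.27, -0.5, 0.0, 0.25, 1.0], dtype=np.float32)
quantized, scale = quantize_symmetric(weights)
restored = dequantize(quantized, scale)
max_error = float(np.max(np.abs(weights - restored)))

print(quantized, f"scale={scale:.5f}", f"max_error={max_error:.6f}")
```

The rounding error per value is bounded by half the scale, which is why a representative calibration subset matters: it determines the value ranges from which the scales are derived.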

## Compare Metric of FP16 and INT8 Model

In [None]:
# Compute the F1 score on the quantized model and compare with the F1 score on the FP16 IR model.

ir_model = load_model(model_config=model_config)
evaluation_pipeline = create_pipeline(algo_config=algorithms, engine=engine)

original_metric = None
with yaspin(text="Evaluating original IR model") as sp:
    start_time = time.time()
    original_metric = evaluation_pipeline.evaluate(ir_model)
    stop_time = time.time()
    sp.text = f"Finished evaluating original IR model in {stop_time-start_time:.2f} seconds"
    sp.ok("✔")

with yaspin(text="Evaluating quantized IR model") as sp:
    start_time = time.time()
    quantized_metric = evaluation_pipeline.evaluate(compressed_model)
    stop_time = time.time()
    sp.text = f"Finished evaluating quantized IR model in {stop_time-start_time:.2f} seconds"
    sp.ok("✔")

if quantized_metric:
    for key, value in quantized_metric.items():
        print(f"The {key} score of the quantized INT8 model is {value:.3f}")

if original_metric:
    for key, value in original_metric.items():
        print(f"The {key} score of the original FP16 model is {value:.3f}")
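
To summarize the comparison in one number, you can compute the difference per metric. The sketch below assumes that both results are dictionaries of the form `{metric_name: value}`, as the loops above suggest. The helper name is hypothetical and the scores are placeholder numbers, not measured results.

```python
def metric_drop(original: dict, quantized: dict) -> dict:
    """Return original minus quantized for every metric present in both dicts."""
    return {
        name: original[name] - quantized[name]
        for name in original
        if name in quantized
    }


# Placeholder scores for illustration only
original_scores = {"F1": 0.90}
quantized_scores = {"F1": 0.89}

drop = metric_drop(original_scores, quantized_scores)
for name, value in drop.items():
    print(f"{name} changed by {value:+.3f} after quantization")
```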

## Compare Performance of the Original and Quantized Models

To measure the inference performance of the FP16 and INT8 models, we use [Benchmark Tool](https://docs.openvinotoolkit.org/latest/openvino_inference_engine_tools_benchmark_tool_README.html), OpenVINO's inference performance measurement tool. Benchmark tool is a command line application that can be run in the notebook with `! benchmark_app` or `%sx benchmark_app`.

In this tutorial, we use a wrapper function from [Notebook Utils](https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/utils/notebook_utils.ipynb). It prints the `benchmark_app` command with the chosen parameters.

> NOTE: For the most accurate performance estimation, we recommend running `benchmark_app` in a terminal/command prompt after closing other applications. Run `benchmark_app --help` to see all command line options.

In [None]:
# Show the parameters and docstring for `benchmark_model`
benchmark_model?

In [None]:
# By default, benchmark on MULTI:CPU,GPU if a GPU is available, otherwise on CPU.
ie = IECore()
device = "MULTI:CPU,GPU" if "GPU" in ie.available_devices else "CPU"
# Uncomment one of the options below to benchmark on other devices
# device = "GPU"
# device = "CPU"
# device = "AUTO"

In [None]:
# Benchmark FP16 model
benchmark_model(model_path=ir_path, device=device, seconds=15)

In [None]:
# Benchmark INT8 model
benchmark_model(model_path=compressed_model_path, device=device, seconds=15)
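
To run the same benchmark in a terminal, you need the equivalent command line. The sketch below builds one with the `benchmark_app` options `-m` (model), `-d` (device) and `-t` (duration in seconds). The helper `benchmark_command` is hypothetical, and whether `benchmark_model` passes exactly these options is an assumption; compare with the command that the wrapper prints.

```python
import shlex


def benchmark_command(model_path, device="CPU", seconds=15):
    """Return a benchmark_app argument list for the given model and device."""
    return ["benchmark_app", "-m", str(model_path), "-d", device, "-t", str(seconds)]


command = benchmark_command("pretrained_model/unet44.xml", device="CPU", seconds=15)
command_line = shlex.join(command)
print(command_line)
```

The argument list can also be passed to `subprocess.run` if you prefer to start the benchmark from a script.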

## Show Live Inference

To show live inference on the model in the notebook, we use the asynchronous processing feature of OpenVINO Inference Engine.

If you use a GPU device, with `device="GPU"` or `device="MULTI:CPU,GPU"` to do inference on an integrated graphics card, model loading will be slow the first time you run this code. The model will be cached, so after the first time model loading will be fast. See the [OpenVINO API tutorial](../002-openvino-api/002-openvino-api.ipynb) for more information on Inference Engine, including Model Caching.

#### Visualization Functions

We define a helper function `showarray` to efficiently show images in the notebook. The `do_inference` function uses [Open Model Zoo](https://github.com/openvinotoolkit/open_model_zoo/)'s AsyncPipeline to perform asynchronous inference. After inference on the specified CT scan has completed, the total time and throughput (fps), including preprocessing and displaying, will be printed.

In [None]:
def showarray(frame: np.ndarray, display_handle=None):
    """
    Display array `frame`. Replace information at `display_handle` with `frame`
    encoded as jpeg image

    Create a display_handle with: `display_handle = display(display_id=True)`
    """
    _, frame = cv2.imencode(ext=".jpeg", img=frame)
    if display_handle is None:
        display_handle = display(Image(data=frame.tobytes()), display_id=True)
    else:
        display_handle.update(Image(data=frame.tobytes()))
    return display_handle


def do_inference(imagelist: List, model: omz_model.Model, device: str):
    """
    Do inference of images in `imagelist` on `model` on the given `device` and show
    the results in real time in a Jupyter Notebook

    :param imagelist: list of images/frames to do inference on
    :param model: Model instance for inference
    :param device: Name of device to perform inference on. For example: "CPU"
    """
    display_handle = None
    next_frame_id = 0
    next_frame_id_to_show = 0

    input_layer = next(iter(model.net.input_info))

    # Create asynchronous pipeline and print time it takes to load the model
    load_start_time = time.perf_counter()
    pipeline = CTAsyncPipeline(
        ie=ie, model=model, plugin_config={}, device=device, max_num_requests=0
    )
    load_end_time = time.perf_counter()

    # Perform asynchronous inference
    start_time = time.perf_counter()

    while next_frame_id < len(imagelist) - 1:
        results = pipeline.get_result(next_frame_id_to_show)

        if results:
            # Show next result from async pipeline
            result, meta = results
            display_handle = showarray(result, display_handle)

            next_frame_id_to_show += 1

        if pipeline.is_ready():
            # Submit new image to async pipeline
            image = imagelist[next_frame_id]
            pipeline.submit_data(
                inputs={input_layer: image}, id=next_frame_id, meta={"frame": image}
            )
            next_frame_id += 1
        else:
            # If the pipeline is not ready yet and there are no results: wait
            pipeline.await_any()

    pipeline.await_all()

    # Show all frames that are in the pipeline after all images have been submitted
    while pipeline.has_completed_request():
        results = pipeline.get_result(next_frame_id_to_show)
        if results:
            result, meta = results
            display_handle = showarray(result, display_handle)
            next_frame_id_to_show += 1

    end_time = time.perf_counter()
    duration = end_time - start_time
    # next_frame_id equals the number of frames that were submitted to the pipeline
    fps = next_frame_id / duration
    print(f"Loaded model to {device} in {load_end_time-load_start_time:.2f} seconds.")
    print(f"Total time for {next_frame_id} frames: {duration:.2f} seconds, fps:{fps:.2f}")

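
The control flow in `do_inference` (submit a frame whenever the pipeline has a free slot, show results in frame order, otherwise wait) can be illustrated without OpenVINO. The sketch below uses a `ThreadPoolExecutor` as a stand-in for the asynchronous pipeline and a sleeping function as a stand-in for the model. All names are hypothetical, and this is not how `CTAsyncPipeline` is implemented.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_inference(frame: int) -> int:
    """Stand-in for a model: wait briefly, then return a 'result'."""
    time.sleep(0.01)
    return frame * 2


def run_async(frames, max_requests=4):
    """Submit frames as slots free up and collect the results in frame order."""
    results = []
    pending = {}  # frame id -> Future of a submitted request
    next_id = 0
    next_to_show = 0
    with ThreadPoolExecutor(max_workers=max_requests) as pool:
        while next_to_show < len(frames):
            future = pending.get(next_to_show)
            if future is not None and future.done():
                # The next frame in display order is finished: collect it
                results.append(pending.pop(next_to_show).result())
                next_to_show += 1
                continue
            if next_id < len(frames) and len(pending) < max_requests:
                # A request slot is free: submit the next frame
                pending[next_id] = pool.submit(fake_inference, frames[next_id])
                next_id += 1
            else:
                # Nothing to submit: block until the next frame to show is done
                pending[next_to_show].result()
    return results


outputs = run_async(list(range(10)))
print(outputs)
```

Keeping several requests in flight hides the latency of a single inference, while the separate "next to show" counter keeps the displayed frames in their original order.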
#### Load Model and Images

Load the segmentation model with `SegModel`, based on the [Open Model Zoo](https://github.com/openvinotoolkit/open_model_zoo/) Model API. Load a CT scan from the `BASEDIR` directory (by default: `kits19_frames_1`) into a list.

In [None]:
ie = IECore()
segmentation_model = SegModel(ie=ie, model_path=Path(compressed_model_path))

In [None]:
case = 16
demopattern = f"{BASEDIR}/case_{case:05d}/imaging_frames/*jpg"
imlist = sorted(glob.glob(demopattern))
images = [cv2.imread(im, cv2.IMREAD_UNCHANGED) for im in imlist]

#### Show Inference

In the next cell, we run the `do_inference` function, which loads the model to the specified device (using caching for faster model loading on GPU devices), performs inference, and displays the results in real-time.

In [None]:
# Possible options for device include "CPU", "GPU", "AUTO", "MULTI"
device = "MULTI:CPU,GPU" if "GPU" in ie.available_devices else "CPU"
do_inference(imagelist=images, model=segmentation_model, device=device)
