# Tutorial: INT8 Quantization with POT in Simplified Mode

This tutorial shows how to quantize a ResNet20 image classification model, trained on the CIFAR10 dataset, using the Simplified Mode of OpenVINO's Post-Training Optimization Tool (POT).

Simplified mode is designed to make data preparation for model optimization easier. The mode is implemented as an Engine, one of the interfaces of the POT API, and reads data from an arbitrary folder specified by the user. Currently, Simplified mode supports only image data in PNG or JPEG format, stored in a single folder.
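
As a rough illustration of this constraint, the sketch below lists the files in a folder that match the supported formats. The helper name `list_calibration_images` and the exact set of extensions are assumptions made for this example; they are not part of the POT API.

```python
from pathlib import Path

# Extensions assumed for PNG and JPEG files (an assumption of this sketch)
SUPPORTED_SUFFIXES = {".png", ".jpg", ".jpeg"}


def list_calibration_images(folder):
    """Return sorted paths of PNG/JPEG files located directly in `folder`."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )
```

Subfolders and files with other extensions are skipped, which mirrors the "single folder of images" layout described above.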

Note: This mode cannot be used with accuracy-aware methods. There is no way to control accuracy after optimization. Nevertheless, this mode can be helpful to estimate performance benefits when using model optimizations.

This tutorial has the following steps:

- Downloading and saving the CIFAR10 dataset
- Preparing the model for quantization
- Compressing the prepared model
- Measuring and comparing the performance of the original and quantized models
- Demonstrating the use of the quantized model for image classification


In [None]:
import os
from pathlib import Path
import warnings

import torch
from torchvision import transforms as T
from torchvision.datasets import CIFAR10

import matplotlib.pyplot as plt
import numpy as np

from openvino.inference_engine import IECore

warnings.filterwarnings("ignore")

# Set the data and model directories
MODEL_DIR = 'model'
CALIB_DIR = 'calib'
CIFAR_DIR = 'cifar'
CALIB_SET_SIZE = 300

os.makedirs(MODEL_DIR, exist_ok=True)
os.makedirs(CALIB_DIR, exist_ok=True)

## Prepare the calibration dataset
To prepare the calibration dataset, we need to do the following:
- Download the CIFAR10 dataset using `torchvision.datasets`
- Save the selected number of elements from this dataset as .png images to a separate folder

In [None]:
transform = T.Compose([T.ToTensor()])
dataset = CIFAR10(root=CIFAR_DIR, train=False, transform=transform, download=True)

In [None]:
pil_converter = T.ToPILImage(mode="RGB")

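for idx, (im, label) in enumerate(dataset):
    # Stop once the requested number of calibration images has been saved
    if idx >= CALIB_SET_SIZE:
        break
    # Each image is a 3x32x32 tensor; save it as <label>_<idx>.png
    pil_converter(im).save(Path(CALIB_DIR) / f'{label}_{idx}.png')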
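
Since the file names encode the label, the contents of the calibration folder can be inspected without opening any image. The following is a minimal sketch, assuming the `<label>_<idx>.png` naming scheme used above; `label_histogram` is a hypothetical helper and the file names in the example are invented.

```python
from collections import Counter
from pathlib import Path


def label_histogram(file_names):
    """Count images per label, assuming names follow '<label>_<idx>.png'."""
    counts = Counter()
    for name in file_names:
        # The label is the part of the file stem before the first underscore
        label = int(Path(name).stem.split("_")[0])
        counts[label] += 1
    return counts


# Hypothetical file names, used only to illustrate the naming scheme
print(label_histogram(["3_0.png", "8_1.png", "8_2.png", "0_3.png"]))
```

For the real folder, the names could be collected with `os.listdir(CALIB_DIR)`.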

## Prepare the Model
The model preparation stage includes the following steps:
- Download the pretrained PyTorch model with `torch.hub`
- Convert it to ONNX format
- Run the OpenVINO Model Optimizer tool to convert ONNX to OpenVINO Intermediate Representation (IR)

In [None]:
model = torch.hub.load("chenyaofo/pytorch-cifar-models", "cifar10_resnet20", pretrained=True)
dummy_input = torch.randn(1, 3, 32, 32)

onnx_model_path = Path(MODEL_DIR) / 'resnet20.onnx'
ir_model_xml = onnx_model_path.with_suffix('.xml')
ir_model_bin = onnx_model_path.with_suffix('.bin')

torch.onnx.export(model, dummy_input, onnx_model_path)

Now we convert this model into the OpenVINO IR using the Model Optimizer:

In [None]:
!mo --framework=onnx --data_type=FP32 --input_shape=[1,3,32,32] -m $onnx_model_path  --output_dir $MODEL_DIR

## Compression stage
Model compression can be performed by calling the following command:
  
`pot -q default -m <path_to_xml> -w <path_to_bin> --engine simplified --data-source <path_to_data>`

In [None]:
!pot -q default -m $ir_model_xml -w $ir_model_bin --engine simplified --data-source $CALIB_DIR --output-dir compressed --direct-dump 
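
Outside a notebook, the same command line can be assembled in Python. The sketch below only builds the argument list from the flags shown above; `build_pot_command` is a hypothetical helper, and the paths in the example are placeholders.

```python
import shlex


def build_pot_command(xml_path, bin_path, data_dir, output_dir="compressed"):
    """Assemble the Simplified mode command line as a list of arguments."""
    return [
        "pot", "-q", "default",
        "-m", str(xml_path),
        "-w", str(bin_path),
        "--engine", "simplified",
        "--data-source", str(data_dir),
        "--output-dir", str(output_dir),
        "--direct-dump",
    ]


# Placeholder paths, for illustration only
cmd = build_pot_command("model/resnet20.xml", "model/resnet20.bin", "calib")
print(shlex.join(cmd))
```

In an environment where `pot` is installed, the list could be passed to `subprocess.run(cmd, check=True)`.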

## Compare Performance of the Original and Quantized Models

Finally, we will measure the inference performance of the FP32 and INT8 models. To do this, we use the [Benchmark Tool](https://docs.openvinotoolkit.org/latest/openvino_inference_engine_tools_benchmark_tool_README.html), OpenVINO's inference performance measurement tool.

NOTE: For more accurate performance measurements, we recommend running `benchmark_app` in a terminal/command prompt after closing other applications. Run `benchmark_app -m model.xml -d CPU` to benchmark async inference on CPU for one minute. Change `CPU` to `GPU` to benchmark on GPU. Run `benchmark_app --help` to see an overview of all command line options.

In [None]:
optimized_model_path = Path('compressed/optimized')
optimized_model_xml = optimized_model_path / 'resnet20.xml'
optimized_model_bin = optimized_model_path / 'resnet20.bin'
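
If the optimized IR stores its weights in INT8, its `.bin` file should be noticeably smaller than the FP32 one, so comparing file sizes is a quick sanity check. The sketch below is only an illustration: `size_ratio` is a hypothetical helper, and the actual ratio depends on which layers are quantized and how the weights are stored.

```python
import os


def size_ratio(original_path, optimized_path):
    """Return the original file size divided by the optimized file size."""
    original_size = os.path.getsize(original_path)
    optimized_size = os.path.getsize(optimized_path)
    if optimized_size == 0:
        raise ValueError("optimized file is empty")
    return original_size / optimized_size
```

With the variables defined in this tutorial, the call would be `size_ratio(ir_model_bin, optimized_model_bin)`.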

In [None]:
# Inference FP32 model (IR)
!benchmark_app -m $ir_model_xml -d CPU -api async

In [None]:
# Inference INT8 model (IR)
!benchmark_app -m $optimized_model_xml -d CPU -api async
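
To compare the two runs numerically, the throughput can be extracted from the text printed by `benchmark_app`. The sketch below assumes the report contains a line of the form `Throughput: <number> FPS`; the exact wording may differ between OpenVINO versions. `parse_throughput` is a hypothetical helper, and the report fragments are made up, not measured results.

```python
import re


def parse_throughput(report):
    """Extract the FPS value from a line such as 'Throughput: 100.00 FPS'."""
    match = re.search(r"Throughput:\s*([\d.]+)\s*FPS", report)
    if match is None:
        raise ValueError("no throughput line found")
    return float(match.group(1))


# Made-up report fragments, not measured results
fp32_report = "Latency: 2.00 ms\nThroughput: 100.00 FPS"
int8_report = "Latency: 1.00 ms\nThroughput: 250.00 FPS"

speedup = parse_throughput(int8_report) / parse_throughput(fp32_report)
print(f"INT8 speedup over FP32: {speedup:.2f}x")
```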

## Demonstration of the results

This section demonstrates how to use the compressed model. We run the optimized model on a selected number of pictures from the CIFAR10 dataset and show its predictions.

First, the network is loaded using `IECore`:

In [None]:
ie = IECore()

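# Read the quantized model and load it to the CPU device.
# Paths are converted to str, which read_network is known to accept.
quantized_net = ie.read_network(
    model=str(optimized_model_xml), weights=str(optimized_model_bin)
)
quantized_net = ie.load_network(network=quantized_net, device_name="CPU")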
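
The inference code later in this tutorial refers to the model input by name. Rather than hard-coding that name, it can usually be looked up from the loaded network. The sketch below assumes the network object exposes a dict-like `input_info` attribute keyed by input name; `first_input_name` is a hypothetical helper, and a plain dictionary stands in for the real attribute so that the example runs without OpenVINO.

```python
def first_input_name(input_info):
    """Return the name of the first input in a dict-like `input_info`."""
    names = list(input_info)
    if not names:
        raise ValueError("the network has no inputs")
    return names[0]


# Stand-in for `quantized_net.input_info`; the value is irrelevant here
fake_input_info = {"input.1": object()}
print(first_input_name(fake_input_info))
```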

Next, all pictures and their labels from the dataset are stored in two lists:

In [None]:
# define all possible labels from CIFAR10
labels_names = ["airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"]
all_pictures = []
all_labels = []

# get all pictures and their labels 
for batch in dataset:
    all_pictures.append(batch[0])
    all_labels.append(batch[1])

The following function displays the pictures with the given indexes, together with their labels, using the two lists built in the previous step:

In [None]:
def plot_pictures(indexes: list, all_pictures=all_pictures, all_labels=all_labels):
    """Plot pictures with the specified indexes.
    :param indexes: a list of indexes of pictures to be displayed.
    :param all_pictures: list of image tensors in CHW layout.
    :param all_labels: list of integer labels matching all_pictures.
    """
    num_pics = len(indexes)
    # squeeze=False keeps axarr two-dimensional even for a single picture
    f, axarr = plt.subplots(1, num_pics, squeeze=False)
    for idx, im_idx in enumerate(indexes):
        assert im_idx < len(all_pictures), f'Cannot get index {im_idx}, there are only {len(all_pictures)} pictures'
        # Convert CHW to HWC, the layout expected by imshow
        pic = np.rollaxis(all_pictures[im_idx].squeeze().numpy(), 0, 3)
        axarr[0, idx].imshow(pic)
        axarr[0, idx].set_title(labels_names[all_labels[im_idx]])

Next, we define a function that uses the optimized model to get predictions for the selected pictures:

In [None]:
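def infer_on_pictures(net, indexes: list, all_pictures=all_pictures):
    """Run inference on a set of pictures.
    :param net: loaded model used for inference.
    :param indexes: list of indexes of the pictures to classify.
    :param all_pictures: list of image tensors to pick the pictures from.
    :return: list of predicted label names.
    """
    predicted_labels = []
    for idx in indexes:
        assert idx < len(all_pictures), f'Cannot get index {idx}, there are only {len(all_pictures)} pictures'
        # 'input.1' is the name of the model input
        result = list(net.infer(inputs={'input.1': all_pictures[idx]}).values())
        # The position of the highest score gives the predicted class
        result = labels_names[np.argmax(result[0])]
        predicted_labels.append(result)
    return predicted_labels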
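
The function above keeps only the single best class. When the runner-up classes are also of interest, the raw scores can be turned into probabilities and ranked. The sketch below assumes the model output is a vector of unnormalized scores (logits), which is not confirmed by this tutorial; `top_k_labels` is a hypothetical helper, and the scores in the example are invented.

```python
import math


def top_k_labels(scores, names, k=3):
    """Return the k most likely (label, probability) pairs for raw scores."""
    # Softmax with the maximum subtracted for numerical stability
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank the labels by probability, highest first
    ranked = sorted(zip(names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up scores for three classes, for illustration only
print(top_k_labels([0.1, 2.0, -1.0], ["airplane", "automobile", "bird"], k=2))
```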

In [None]:
indexes_to_infer = [0, 1, 2]  # indexes of the pictures to plot and classify

plot_pictures(indexes_to_infer)

results_quantized = infer_on_pictures(quantized_net, indexes_to_infer)

print(f"Labels predicted by the quantized model: {results_quantized}.")
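
Because Simplified mode gives no control over accuracy, it can be worth comparing the predictions with the ground-truth labels on a handful of pictures. The sketch below computes the fraction of matching labels; `accuracy` is a hypothetical helper, and the labels in the example are invented rather than actual model output.

```python
def accuracy(predicted, expected):
    """Return the fraction of positions where the two label lists agree."""
    if len(predicted) != len(expected):
        raise ValueError("label lists must have the same length")
    if not predicted:
        raise ValueError("label lists must not be empty")
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(predicted)


# Hypothetical labels, not actual model output
print(accuracy(["cat", "ship", "dog"], ["cat", "ship", "ship"]))
```

In this notebook, the ground-truth names could be obtained as `[labels_names[all_labels[i]] for i in indexes_to_infer]`. A result on three pictures is only a spot check and not an accuracy measurement.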