# Quantization of Image Classification Models

This tutorial demonstrates how to apply INT8 quantization to an image classification model using the [Post-training Optimization Tool API](../../compression/api/README.md). The MobileNet V2 model from Torchvision, pretrained on ImageNet, is used as an example and is evaluated on the Tiny ImageNet dataset. The code is designed to be extendable to custom models and datasets. The tutorial consists of the following steps:
- Install OpenVINO and the required tools and packages using pip
- Prepare the model for quantization
- Prepare the ImageNet-tiny dataset
- Define data loading and accuracy validation functionality
- Run optimization pipeline
- Compare accuracy of the original and quantized models
- Compare performance of the original and quantized models

In [None]:
import os
import sys
from pathlib import Path
from zipfile import ZipFile

import numpy as np
import torch
from addict import Dict

sys.path.append('../utils')
from notebook_utils import download_file

## Prepare the Dataset

Tiny ImageNet, which contains 200 classes, is used. Note that MobileNet V2 was trained on the full ImageNet dataset with 1000 classes and with different preprocessing, so the accuracy of the floating-point model is expected to be lower than on the full ImageNet validation set.

The full ImageNet dataset can also be downloaded and used for validation instead.

In [None]:
# Set the data and model directories
DATA_DIR = 'data'
MODEL_DIR = 'model'

os.makedirs(DATA_DIR, exist_ok=True)
os.makedirs(MODEL_DIR, exist_ok=True)

In [None]:
download_path_data = 'http://cs231n.stanford.edu/tiny-imagenet-200.zip'
data_name = 'tiny-imagenet-200.zip'

download_file(download_path_data, directory=DATA_DIR, show_progress=True)
with ZipFile(f'{DATA_DIR}/{data_name}', 'r') as zip_ref:
    zip_ref.extractall(DATA_DIR)
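
The data loader defined later in this tutorial reads `val_annotations.txt`, takes the first two tab-separated fields of each line (file name and class id), and assigns integer labels in order of first appearance. The sketch below reproduces that mapping on a few illustrative lines. The sample lines and the helper name `assign_labels` are assumptions made for this example, not content copied from the dataset. Labels assigned this way are indices into the 200 Tiny ImageNet classes as they are encountered, so they are not the 1000-class output indices of the Torchvision model.

```python
# Illustrative lines in the assumed val_annotations.txt layout:
# file name, class id, then four bounding-box numbers, separated by tabs.
SAMPLE_LINES = [
    "val_0.JPEG\tn03444034\t0\t32\t44\t62",
    "val_1.JPEG\tn04067472\t52\t55\t57\t59",
    "val_2.JPEG\tn03444034\t4\t4\t60\t55",
]


def assign_labels(lines):
    """Map each class id to an integer in order of first appearance."""
    class_to_label = {}
    labels = {}
    for line in lines:
        file_name, class_id = line.rstrip("\n").split("\t")[:2]
        # setdefault keeps the existing label or stores the next free integer
        label = class_to_label.setdefault(class_id, len(class_to_label))
        labels[file_name] = label
    return labels, class_to_label


labels, class_to_label = assign_labels(SAMPLE_LINES)
print(labels)
print(class_to_label)
```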

## Prepare the Model
The model preparation stage has the following steps:
- Download PyTorch model from Torchvision repository
- Convert it to ONNX format
- Run OpenVINO Model Optimizer tool to convert ONNX to OpenVINO Intermediate Representation (IR)



In [None]:
import torchvision.models as models

# Export the model to ONNX format
mobilenet_v2 = models.mobilenet_v2(pretrained=True)
dummy_input = torch.randn(1, 3, 224, 224)

onnx_model_path = Path(MODEL_DIR) / 'mobilenet.onnx'
ir_model_xml = onnx_model_path.with_suffix('.xml')
ir_model_bin = onnx_model_path.with_suffix('.bin')

torch.onnx.export(mobilenet_v2, dummy_input, onnx_model_path, verbose=True)

# Run the OpenVINO Model Optimizer tool to convert ONNX to OpenVINO IR
!mo --framework=onnx --data_type=FP16 --mean_values=[123.675,116.28,103.53] --input_shape=[1,3,224,224] --scale_values=[58.624,57.12,57.375] --reverse_input_channels -m $onnx_model_path  --output_dir $MODEL_DIR
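
To see where the `--mean_values` and `--scale_values` numbers come from, the sketch below rescales the normalization constants commonly used for Torchvision ImageNet models from the `[0, 1]` range to the `[0, 255]` range, and emulates the embedded preprocessing in NumPy. It assumes that channel reversal is applied first, then mean subtraction, then scaling. The function name `normalize_bgr` is hypothetical. Note that the first scale value passed to `mo` above, 58.624, differs slightly from 0.229 × 255 = 58.395.

```python
import numpy as np

# Normalization constants used for Torchvision ImageNet models (inputs in [0, 1]).
MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])

# Rescale to the [0, 255] pixel range expected by the IR input.
mean_255 = MEAN * 255
std_255 = STD * 255


def normalize_bgr(image_bgr):
    """Emulate channel reversal, mean subtraction and scaling on an HWC image."""
    image_rgb = image_bgr[..., ::-1].astype(np.float32)  # BGR -> RGB
    return (image_rgb - mean_255) / std_255


dummy = np.full((224, 224, 3), 128, dtype=np.uint8)
normalized = normalize_bgr(dummy)
print(mean_255, std_255, normalized.shape)
```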

## Define Data Loader
In this step, the `DataLoader` interface from the POT API is implemented. OpenCV is used to read and preprocess the images.

In [None]:
from cv2 import imread, resize as cv2_resize
from compression.api import Metric, DataLoader


def resize(image, params):
    # cv2.resize expects the target size as (width, height).
    dsize = params['width'], params['height']
    return cv2_resize(image, dsize)


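def crop(image, params):
    # Crop the central region covering `central_fraction` of each side.
    height, width = image.shape[:2]

    dst_height = int(height * params['central_fraction'])
    dst_width = int(width * params['central_fraction'])

    # Upscale first if the requested crop is larger than the image
    # (only possible when central_fraction > 1).
    if height < dst_height or width < dst_width:
        resized = np.array([width, height], dtype=float)
        if width < dst_width:
            resized *= dst_width / width
        if height < dst_height:
            resized *= dst_height / height
        image = cv2_resize(image, tuple(int(v) for v in np.ceil(resized)))
        # Use the dimensions of the resized image for the crop offsets.
        height, width = image.shape[:2]

    top_left_y = (height - dst_height) // 2
    top_left_x = (width - dst_width) // 2
    return image[top_left_y:top_left_y + dst_height, top_left_x:top_left_x + dst_width]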


PREPROC_FNS = {'resize': resize, 'crop': crop}


# Custom DataLoader class implementation that is required for
# the proper reading of Imagenet images and annotations.
class ImageNetDataLoader(DataLoader):

    def __init__(self, config):
        if not isinstance(config, Dict):
            config = Dict(config)
        super().__init__(config)
        self._annotations, self._img_ids = self._read_img_ids_annotations(config)

    def __len__(self):
        return len(self._img_ids)

    def __getitem__(self, index):
        if index >= len(self):
            raise IndexError

        annotation = (index, self._annotations[self._img_ids[index]])\
            if self._annotations else (index, None)
        return annotation, self._read_image(self._img_ids[index])

    # Methods specific to the current implementation
    @staticmethod
    def _read_img_ids_annotations(dataset):
        """ Parses annotation file or directory with images to collect image names and annotations.
        :param dataset: dataset config
        :returns dictionary with annotations
                 list of image ids
        """
        annotations, annotations_decode = {}, {}
        img_ids = []
        if dataset.annotation_file:
            with open(dataset.annotation_file, encoding='utf-8') as f:
                for line in f:
                    line = line.split("\t")
                    img_id, annotation = line[0], line[1]
                    try:
                        annotation = annotations_decode[annotation]
                    except KeyError:
                        annotations_decode[annotation] = len(annotations_decode)
                        annotation = annotations_decode[annotation]
                      
                    annotations[img_id] = annotation + 1 if dataset.has_background else annotation
                    img_ids.append(img_id)
        else:
            img_ids = sorted(os.listdir(dataset.data_source))

        return annotations, img_ids

    def _read_image(self, index):
        """ Reads an image from the data source directory.
        :param index: file name of the image to read
        :return ndarray representation of the image in CHW layout
        """
        image = imread(os.path.join(self.config.data_source, index))
        image = self._preprocess(image)
        return image.transpose(2, 0, 1)

    def _preprocess(self, image):
        """ Does preprocessing of an image according to the preprocessing config.
        :param image: ndarray image
        :return processed image
        """
        for prep_params in self.config.preprocessing:
            image = PREPROC_FNS[prep_params.type](image, prep_params)
        return image
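
The geometry of the preprocessing can be checked without OpenCV. The sketch below is a NumPy-only approximation of the central crop, assuming a 64×64 input image (the Tiny ImageNet image size) and the `central_fraction` of 0.875 used later in the dataset configuration. The helper name `central_crop` is hypothetical, and the resize step is omitted.

```python
import numpy as np


def central_crop(image, central_fraction):
    """Keep the centered region spanning central_fraction of each side."""
    height, width = image.shape[:2]
    dst_height = int(height * central_fraction)
    dst_width = int(width * central_fraction)
    top = (height - dst_height) // 2
    left = (width - dst_width) // 2
    return image[top:top + dst_height, left:left + dst_width]


# Assumed input: a 64x64 image in HWC layout.
image = np.zeros((64, 64, 3), dtype=np.uint8)
cropped = central_crop(image, 0.875)
print(cropped.shape)  # (56, 56, 3)

# The data loader finally converts HWC to CHW, as the model expects.
chw = cropped.transpose(2, 0, 1)
print(chw.shape)  # (3, 56, 56)
```

In the tutorial pipeline, the 56×56 crop is then resized to 224×224 before the layout change.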

## Define Accuracy Metric Calculation
In this step, the `Metric` interface is implemented for the top-1 accuracy metric. It is used to validate the accuracy of the quantized model.

In [None]:
# Custom implementation of classification accuracy metric.
class Accuracy(Metric):

    # Required methods
    def __init__(self, top_k=1):
        super().__init__()
        self._top_k = top_k
        self._name = 'accuracy@top{}'.format(self._top_k)
        self._matches = []

    @property
    def value(self):
        """ Returns accuracy metric value for the last model output. """
        return {self._name: self._matches[-1]}

    @property
    def avg_value(self):
        """ Returns accuracy metric value for all model outputs. """
        return {self._name: np.ravel(self._matches).mean()}

    def update(self, output, target):
        """ Updates prediction matches.
        :param output: model output
        :param target: annotations
        """
        if len(output) > 1:
            raise Exception('The accuracy metric cannot be calculated '
                            'for a model with multiple outputs')
        if isinstance(target, dict):
            target = list(target.values())
        predictions = np.argsort(output[0], axis=1)[:, -self._top_k:]
        match = [float(t in predictions[i]) for i, t in enumerate(target)]

        self._matches.append(match)

    def reset(self):
        """ Resets collected matches """
        self._matches = []

    def get_attributes(self):
        """
        Returns a dictionary of metric attributes {metric_name: {attribute_name: value}}.
        Required attributes: 'direction': 'higher-better' or 'higher-worse'
                             'type': metric type
        """
        return {self._name: {'direction': 'higher-better',
                             'type': 'accuracy'}}
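
The matching logic inside `update` can be tried in isolation. The following sketch applies the same `argsort`-based top-k check to a small hand-made batch. The function name `top_k_matches` and the sample values are illustrative only.

```python
import numpy as np


def top_k_matches(logits, targets, top_k=1):
    """Return 1.0 for each sample whose target is among the top_k predictions."""
    # argsort is ascending, so the last top_k columns hold the best classes
    predictions = np.argsort(logits, axis=1)[:, -top_k:]
    return [float(t in predictions[i]) for i, t in enumerate(targets)]


logits = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2]])
targets = [1, 1]

top1 = top_k_matches(logits, targets, top_k=1)
top2 = top_k_matches(logits, targets, top_k=2)
print(top1, np.mean(top1))  # [1.0, 0.0] 0.5
print(top2, np.mean(top2))  # [1.0, 1.0] 1.0
```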

## Run Quantization Pipeline and Compare the Accuracy of the Original and Quantized Models
Here we define a configuration for our quantization pipeline and run it. 

**Note**: We use the built-in `IEEngine` implementation of the `Engine` interface from the POT API for model inference. `IEEngine` is built on top of the OpenVINO Python API and provides basic inference functionality for simple models, such as ImageNet-pretrained classification models. If your model or models require a more complicated inference flow, create your own implementation of the `Engine` interface, for example by inheriting from `IEEngine` and extending it.

In [None]:
from compression.graph import load_model, save_model
from compression.graph.model_utils import compress_model_weights
from compression.engines.ie_engine import IEEngine
from compression.pipeline.initializer import create_pipeline

model_config = Dict({
    'model_name': 'mobilenetv2',
    'model': ir_model_xml,
    'weights': ir_model_bin
})
engine_config = Dict({
    'device': 'CPU',
    'stat_requests_number': 2,
    'eval_requests_number': 2
})
dataset_config = {
    'data_source': os.path.join(DATA_DIR, 'tiny-imagenet-200/val/images'),
    'annotation_file': os.path.join(DATA_DIR, 'tiny-imagenet-200/val/val_annotations.txt'),
    'has_background': False,
    'preprocessing': [
        {
            'type': 'crop',
            'central_fraction': 0.875
        },
        {
            'type': 'resize',
            'width': 224,
            'height': 224
        }
    ],
}
algorithms = [
    {
        'name': 'DefaultQuantization',
        'params': {
            'target_device': 'CPU',
            'preset': 'performance',
            'stat_subset_size': 300
        }
    }
]

# Steps 1-7: Model optimization
# Step 1: Load the model.
model = load_model(model_config)

# Step 2: Initialize the data loader.
data_loader = ImageNetDataLoader(dataset_config)

# Step 3 (Optional. Required for AccuracyAwareQuantization): Initialize the metric.
metric = Accuracy(top_k=1)

# Step 4: Initialize the engine for metric calculation and statistics collection.
engine = IEEngine(engine_config, data_loader, metric)

# Step 5: Create a pipeline of compression algorithms.
pipeline = create_pipeline(algorithms, engine)

# Step 6: Execute the pipeline.
compressed_model = pipeline.run(model)

# Step 7 (Optional): Compress model weights to quantized precision
#                    in order to reduce the size of the final .bin file.
compress_model_weights(compressed_model)

# Step 8: Save the compressed model to the desired path.
compressed_model_paths = save_model(
    model=compressed_model, save_path=MODEL_DIR, model_name="quantized_mobilenet"
)
compressed_model_xml = compressed_model_paths[0]["model"]

# Step 9: Compare accuracy of the original and quantized models.
metric_results = pipeline.evaluate(model)
if metric_results:
    for name, value in metric_results.items():
        print('Accuracy of the original model: {: <27s}: {}'.format(name, value))

metric_results = pipeline.evaluate(compressed_model)
if metric_results:
    for name, value in metric_results.items():
        print('Accuracy of the optimized model: {: <27s}: {}'.format(name, value))
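
Step 3 in the cell above notes that the metric is required only for `AccuracyAwareQuantization`. As a rough sketch, and assuming the POT version in use accepts the `maximal_drop` parameter, the algorithm list might be swapped for something like the following. The variable name `accuracy_aware_algorithms` and the threshold value are illustrative and are not taken from this tutorial.

```python
# Hypothetical alternative to the `algorithms` list defined above.
accuracy_aware_algorithms = [
    {
        'name': 'AccuracyAwareQuantization',
        'params': {
            'target_device': 'CPU',
            'preset': 'performance',
            'stat_subset_size': 300,
            'maximal_drop': 0.01  # illustrative accuracy-drop threshold
        }
    }
]

# The list would be passed to create_pipeline in place of `algorithms`:
# pipeline = create_pipeline(accuracy_aware_algorithms, engine)
print(accuracy_aware_algorithms[0]['name'])
```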

## Compare Performance of the Original and Quantized Models

Finally, we measure the inference performance of the FP16 and INT8 models. To do this, we use [Benchmark Tool](https://docs.openvinotoolkit.org/latest/openvino_inference_engine_tools_benchmark_tool_README.html), OpenVINO's inference performance measurement tool.

**Note**: For more accurate performance results, we recommend running `benchmark_app` in a terminal/command prompt after closing other applications. Run `benchmark_app -m model.xml -d CPU` to benchmark async inference on CPU for one minute. Change `CPU` to `GPU` to benchmark on GPU. Run `benchmark_app --help` to see an overview of all command-line options.


In [None]:
# Inference FP16 model (IR)
!benchmark_app -m $ir_model_xml -d CPU -api async

In [None]:
# Inference INT8 model (IR)
!benchmark_app -m $compressed_model_xml -d CPU -api async