# Quantize the Open Model Zoo resnet-50-tf model
Quantization accelerates a trained model by reducing the numeric precision of its calculations.  Lower-precision arithmetic is faster, and because both the data type and the model weights are smaller, less memory is needed and less data has to be transferred.  Although lower precision may reduce model accuracy, a model that uses 32-bit floating-point precision (FP32) can typically be quantized to 8-bit integers (INT8) with results good enough to justify the trade-off between accuracy and speed.  For benchmarking results that show how quantization can accelerate models, see [INT8 vs FP32 Comparison on Select Networks and Platforms](https://docs.openvino.ai/latest/openvino_docs_performance_int8_vs_fp32.html#doxid-openvino-docs-performance-int8-vs-fp32).
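
To make the precision trade-off concrete, here is a minimal NumPy sketch of symmetric per-tensor INT8 quantization. It only illustrates the arithmetic; it is not how POT's DefaultQuantization algorithm is implemented, and the helper names `quantize_int8` and `dequantize` are hypothetical.

```python
import numpy as np


def quantize_int8(values: np.ndarray):
    """Symmetric per-tensor quantization of FP32 values to INT8 (illustrative only)."""
    max_abs = float(np.max(np.abs(values)))
    # one scale for the whole tensor maps [-max_abs, max_abs] onto [-127, 127]
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = np.clip(np.round(values / scale), -127, 127).astype(np.int8)
    return quantized, scale


def dequantize(quantized: np.ndarray, scale: float) -> np.ndarray:
    """Map INT8 values back to approximate FP32 values."""
    return quantized.astype(np.float32) * scale


# Random stand-in for a weight tensor; not taken from any real model.
rng = np.random.default_rng(seed=0)
weights = rng.normal(0.0, 0.5, size=1000).astype(np.float32)

q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)
max_error = float(np.max(np.abs(weights - restored)))

print(f"FP32 size: {weights.nbytes} bytes, INT8 size: {q_weights.nbytes} bytes")
print(f"max round-trip error: {max_error:.6f} (scale={scale:.6f})")
```

The INT8 tensor takes a quarter of the storage, and the round-trip error of each value is bounded by roughly half the scale, which is the accuracy cost that quantization trades for speed.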

[Intel Distribution of OpenVINO toolkit](https://software.intel.com/openvino-toolkit) includes the [Post-Training Optimization Tool (POT)](https://docs.openvino.ai/latest/pot_README.html) to automate quantization.  For models available from the [Open Model Zoo](https://github.com/openvinotoolkit/open_model_zoo), the [`omz_quantizer`](../104-model-tools/104-model-tools.ipynb) tool is available to automate running POT using its [DefaultQuantization](https://docs.openvino.ai/latest/pot_compression_algorithms_quantization_default_README.html#doxid-pot-compression-algorithms-quantization-default-r-e-a-d-m-e) 8-bit quantization algorithm to quantize models down to INT8 precision.

This Jupyter* Notebook goes step by step through the workflow of downloading the [resnet-50-tf](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-50-tf) model from the Open Model Zoo, quantizing it, and then checking and benchmarking the results.  The workflow consists of the following steps:
1. Download and set up the [Imagenette](https://github.com/fastai/imagenette) (subset of [ImageNet](http://www.image-net.org/)) validation dataset to be used by `omz_quantizer`
2. Download model from the Open Model Zoo
3. Convert model to FP32 IR files
4. Quantize FP32 model to create INT8 IR files
5. Run inference on original and quantized model
6. Check accuracy before and after quantization
7. Benchmark before and after quantization

While performing the steps above, the following [OpenVINO tools](../104-model-tools/104-model-tools.ipynb) will be used to download, convert, quantize, check accuracy, and benchmark the model:
- `omz_downloader` - Download model from the Open Model Zoo
- `omz_converter` - Convert an Open Model Zoo model
- `omz_quantizer` - Quantize an Open Model Zoo model
- `accuracy_check` - Check the accuracy of models using a validation dataset
- `benchmark_app` - Benchmark models

## About the model
This notebook uses the resnet-50-tf model, which is a TensorFlow* implementation of ResNet-50, an image classification model that has been trained on the ImageNet dataset. The input to the converted model is a 224x224 BGR image.  The output of the model is 1001 prediction probabilities in the range 0.0-1.0: one for each of the 1000 classes, plus one for background.
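
As a rough illustration of how such an output vector can be read, the sketch below picks the highest-scoring entries from a synthetic 1001-element probability vector. The helper `top_k_classes` and the synthetic data are hypothetical. Whether an index offset is needed to look up a class name depends on where the background entry sits in the output, which is not stated here, so no offset is applied.

```python
import numpy as np


def top_k_classes(probabilities: np.ndarray, k: int = 5):
    """Return (index, probability) pairs for the k highest scores, best first."""
    top_indices = np.argsort(probabilities)[::-1][:k]
    return [(int(i), float(probabilities[i])) for i in top_indices]


# Synthetic stand-in for a real model output: 1001 scores that sum to 1.0.
rng = np.random.default_rng(seed=0)
logits = rng.normal(size=1001)
logits[264] = 10.0  # make one arbitrary entry dominant for the demo
probabilities = np.exp(logits - logits.max())
probabilities /= probabilities.sum()

top3 = top_k_classes(probabilities, k=3)
for index, probability in top3:
    print(f"index {index}: {probability:.4f}")
```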

For more details on the resnet-50-tf model, see the Open Model Zoo [model](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-50-tf), the [paper](https://arxiv.org/abs/1512.03385), and the [repository](https://github.com/tensorflow/models/tree/v2.2.0/official/r1/resnet).

## Imports

In [None]:
# necessary imports
import glob
import os
import shutil
import sys
import tarfile
from pathlib import Path
from subprocess import PIPE, STDOUT, Popen

import cv2
import matplotlib.pyplot as plt
import numpy as np
from openvino.inference_engine import IECore

sys.path.append("../utils")
import notebook_utils as nbutils

## Settings

By default, this notebook downloads the model, dataset, etc. to subdirectories of the directory where this notebook is located.  The following variables may be used to set file locations:
* `OMZ_MODEL_NAME`: Model name as it appears on the Open Model Zoo
* `DATA_DIR`: Directory where dataset will be downloaded and set up
* `MODEL_DIR`: Models will be downloaded into the `intel` and `public` folders in this directory
* `OUTPUT_DIR`: Directory used to store any output and other downloaded files (e.g. configuration files for running accuracy_check)

In [None]:
# base settings
OMZ_MODEL_NAME = "resnet-50-tf"
DATA_DIR = Path("data")
MODEL_DIR = Path("model")
OUTPUT_DIR = Path("output")
DATASET_DIR = DATA_DIR / "imagenette"
LABELS_PATH = DATASET_DIR / "imagenet_2012.txt"

# different model precisions location
MODEL_PUBLIC_DIR = MODEL_DIR / "public" / OMZ_MODEL_NAME
MODEL_FP32_DIR = MODEL_PUBLIC_DIR / "FP32"
MODEL_FP32INT8_DIR = MODEL_PUBLIC_DIR / "FP32-INT8"

# create directories if they do not already exist
DATA_DIR.mkdir(exist_ok=True)
MODEL_DIR.mkdir(exist_ok=True)
OUTPUT_DIR.mkdir(exist_ok=True)

## Helper functions
The `run_command_line()` helper function is provided to aid filtering the output of some of the commands that will be run.

In [None]:
def run_command_line(cmd: str, filter=None):
    """
    runs the given command-line outputting lines as they become available to show progress in ~realtime.
    If a filter is provided, it will be called with each line before printing the result from calling the filter
    :param cmd: String containing complete command-line to run
    :param filter: Optional filter called per-line before printing
    :return: none
    """
    proc = Popen(cmd.split(), stdout=PIPE, stderr=STDOUT, universal_newlines=True)
    # read until EOF so that output written just before the process exits is not lost
    for line in proc.stdout:
        if filter is not None:
            line = filter(line)
        if line is not None:
            sys.stdout.write(line)
    proc.wait()

## Download and set up the validation dataset
Instead of using the very large [ImageNet](http://www.image-net.org/) dataset, this notebook uses the smaller [Imagenette 320px](https://github.com/fastai/imagenette) dataset, which contains 10 classes with lower-resolution images.  The Imagenette dataset will be downloaded and arranged to look just like ImageNet so that it can be used by the `omz_quantizer` and `accuracy_check` tools.  Any ImageNet dataset (or subset of ImageNet) may be used when following the steps in the notebook, but it must be set up as described on the Open Model Zoo [datasets.md: ImageNet](https://github.com/openvinotoolkit/open_model_zoo/blob/master/data/datasets.md#imagenet) page.
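
The annotation file produced by the setup pairs each renamed image file with a numeric class index, one pair per line. The sketch below shows that layout with a simplified round-robin interleave across classes. The helper `build_annotation_lines`, the class names, the file names, and the indices are all hypothetical, and the ordering differs slightly from the setup function in this notebook, which takes files from the end of each list.

```python
from itertools import zip_longest


def build_annotation_lines(files_by_class, class_index):
    """Interleave files across classes and emit '<file name> <class index>' lines."""
    lines = []
    counter = 0
    # zip_longest takes one file from every class per round,
    # padding classes that have run out of files with None
    for round_files in zip_longest(*files_by_class.values()):
        for class_id, file_name in zip(files_by_class, round_files):
            if file_name is None:
                continue
            new_name = f"val_{counter:08}_{file_name}"
            lines.append(f"{new_name} {class_index[class_id]}")
            counter += 1
    return lines


# Hypothetical class directories, file names, and indices, for illustration only.
files_by_class = {
    "class_a": ["a1.JPEG", "a2.JPEG"],
    "class_b": ["b1.JPEG"],
}
class_index = {"class_a": 0, "class_b": 1}

annotation_lines = build_annotation_lines(files_by_class, class_index)
for line in annotation_lines:
    print(line)
```

Interleaving matters because a tool that reads only the first N entries (for example when subsampling) would otherwise see images from just one or two classes.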

In [None]:
def set_up_imagenette_dataset(output_dir):
    output_dir.mkdir(exist_ok=True, parents=True)

    img_val_path = output_dir / "ILSVRC2012_img_val"
    img_val_path.mkdir(exist_ok=True, parents=True)
    img_val_ann_path = output_dir / "val.txt"

    data_tgzname = "imagenette2-320.tgz"
    data_url = f"https://s3.amazonaws.com/fast-ai-imageclas/{data_tgzname}"
    data_tgzpath = nbutils.download_file(data_url, data_tgzname, output_dir)

    # uncompress files
    with tarfile.open(data_tgzpath, "r:gz") as tar_ref:
        tar_ref.extractall(path=output_dir)

    # download the class labels
    labels_url = f"https://github.com/openvinotoolkit/open_model_zoo/raw/master/data/dataset_classes/{LABELS_PATH.name}"
    nbutils.download_file(labels_url, LABELS_PATH.name, output_dir)

    # load labels for lookup
    with open(LABELS_PATH) as labels_file:
        labels = [line.rstrip() for line in labels_file]

    # move image files and generate annotation file
    dir_dict = {}
    winid_map = {}
    val_path = output_dir / data_tgzpath.stem / "val"
    for root, dirs, files in os.walk(val_path):
        # match each winid directory
        if Path(root).name != "val":
            file_list = [Path(root) / fname for fname in files]
            winid = Path(root).name
            dir_dict[winid] = file_list
            label_idx = [i for i, item in enumerate(labels) if item.startswith(winid)][0]
            winid_map[winid] = label_idx

    # simple round-robin shuffle: move image files, renamed with a "val_<n>_" prefix, and write the annotations file
    pop_count = len(dir_dict)
    total_files = 0
    with open(img_val_ann_path, "w") as ann_file:
        while pop_count > 0:
            pop_count = 0
            for winid in dir_dict:
                if len(dir_dict[winid]) > 0:
                    src_path = dir_dict[winid].pop()
                    dst_path = img_val_path / f"val_{total_files:08}_{src_path.stem}.JPEG"
                    shutil.move(src_path, dst_path)
                    ann_file.write(f"{dst_path.name} {winid_map[winid]}\n")
                    pop_count += 1
                    total_files += 1


if not LABELS_PATH.exists():
    set_up_imagenette_dataset(DATASET_DIR)

## Download model
The OpenVINO tool [`omz_downloader`](../104-model-tools/104-model-tools.ipynb) is used to automatically download files from the Open Model Zoo.

> **NOTE**: If model IR files are available from the Open Model Zoo, then the downloaded models will appear in the `intel` subdirectory.  If no model IR files are available, then the downloaded models will appear in the `public` directory.

In [None]:
!omz_downloader --name $OMZ_MODEL_NAME --output $MODEL_DIR

## Convert model to IR files

The public models from the Open Model Zoo are made available in their native framework file format and must be converted to OpenVINO Intermediate Representation (IR) files before running inference.  The OpenVINO tool [`omz_converter`](../104-model-tools/104-model-tools.ipynb) is used to convert Open Model Zoo models to the IR files necessary to run inference.

> **NOTE**: For models that are downloaded from the Open Model Zoo already as IR files, the converter utility will not do any conversion and will output the message "Skipping <model_name> (no conversions defined)".
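
Before moving on, it can help to confirm that conversion produced both IR files: the `.xml` file that describes the network and the `.bin` file that holds the weights. The following is a small sketch of such a check. The helper `ir_files_present` is hypothetical, and the demonstration runs in a temporary directory that mimics this notebook's default layout rather than touching real model files.

```python
import tempfile
from pathlib import Path


def ir_files_present(model_dir: Path, model_name: str) -> bool:
    """Return True when both the .xml and .bin IR files exist in model_dir."""
    xml_path = model_dir / f"{model_name}.xml"
    bin_path = model_dir / f"{model_name}.bin"
    return xml_path.exists() and bin_path.exists()


# Demonstration in a throwaway directory, so no real model files are needed.
with tempfile.TemporaryDirectory() as tmp:
    fp32_dir = Path(tmp) / "public" / "resnet-50-tf" / "FP32"
    fp32_dir.mkdir(parents=True)
    before = ir_files_present(fp32_dir, "resnet-50-tf")

    # create empty placeholder files standing in for converter output
    (fp32_dir / "resnet-50-tf.xml").touch()
    (fp32_dir / "resnet-50-tf.bin").touch()
    after = ir_files_present(fp32_dir, "resnet-50-tf")

print(f"before conversion: {before}, after conversion: {after}")
```

In the notebook itself, the same check could be called with `MODEL_FP32_DIR` and `OMZ_MODEL_NAME` once the conversion cell has run.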

In [None]:
!omz_converter --name $OMZ_MODEL_NAME --precisions FP32 --download_dir $MODEL_DIR  --output $MODEL_DIR

## Quantize the model to INT8
For models downloaded from the Open Model Zoo, the [`omz_quantizer`](../104-model-tools/104-model-tools.ipynb) tool is used to quantize the model to a lower precision (e.g. quantize FP32 to INT8 precision).

In [None]:
def filter_omz_quantizer_output(line):
    if (line.startswith("Quantization command") or line.startswith("Moving") or line.startswith("INFO")):
        return line
    return None


cmd = f"omz_quantizer --name {OMZ_MODEL_NAME} --model_dir {MODEL_DIR}  --output {MODEL_DIR}  --dataset_dir {DATASET_DIR} --precisions FP32-INT8"
run_command_line(cmd, filter_omz_quantizer_output)

## Run the model
Now that the model has been quantized, we will run inference using both the original FP32 model and the new INT8 quantized model to see their results.  First we will run the FP32 model.

In [None]:
def run_inference(model_base_path, image_path):
    """
    Runs inference on an image using the given model and then displays the results
    :param model_base_path: String containing path and file name of model excluding the extension (e.g. ".xml")
    :param image_path: String containing full path to the input image
    :return: none
    """
    # Load the model
    ie = IECore()

    # create the network from the model
    net = ie.read_network(
        model=f"{model_base_path}.xml", weights=f"{model_base_path}.bin"
    )
    exec_net = ie.load_network(network=net, device_name="CPU")

    input_key = next(iter(exec_net.input_info))
    output_key = next(iter(exec_net.outputs.keys()))

    # Load image
    image = nbutils.load_image(image_path)
    # N,C,H,W = batch size, number of channels, height, width
    N, C, H, W = exec_net.input_info[input_key].tensor_desc.dims
    # The network expects images in BGR format, same as OpenCV so just resize
    input_image = cv2.resize(src=image, dsize=(W, H))
    # reshape image to network input shape ([H,W,C]->[N,C,H,W])
    input_image = np.expand_dims(input_image.transpose(2, 0, 1), 0)
    # display original image (imshow requires RGB format, so convert BGR->RGB)
    plt.imshow(nbutils.to_rgb(image))

    # Run inference, result = [1,1001] with confidence level for each of the 1000 
    #  classes and +1 for background.  The class with the highest confidence is 
    #  used to output the final result.
    result = exec_net.infer(inputs={input_key: input_image})[output_key][0]
    label_id = np.argmax(result)
    conf = round(result[label_id] * 100, 2)
    print(f"label_id={label_id}, conf={conf} %")

    # Convert the inference result to a class name using the labels file
    with open(LABELS_PATH) as f:
        labels = [line.rstrip() for line in f]

    print(f"Image contains a '{labels[label_id]}', with {conf}% confidence")


# find known image
files = glob.glob(f"{DATASET_DIR}/**/*n02102040_2051.JPEG", recursive=True)
test_input_image = files[0]

run_inference(f"{MODEL_FP32_DIR}/{OMZ_MODEL_NAME}", test_input_image)

Now, we run the INT8 model and can compare the results to the FP32 results above.
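
One way to make that comparison more concrete is to compare the two output vectors numerically. The sketch below is only an illustration: `run_inference()` as written prints its result rather than returning it, so the arrays here are synthetic stand-ins, and the helper `compare_outputs` is hypothetical.

```python
import numpy as np


def compare_outputs(fp32_probs: np.ndarray, int8_probs: np.ndarray) -> dict:
    """Summarize how closely two probability vectors agree."""
    return {
        # do both models pick the same class?
        "same_top1": bool(np.argmax(fp32_probs) == np.argmax(int8_probs)),
        # how much did the winning confidence change?
        "top1_conf_delta": float(abs(fp32_probs.max() - int8_probs.max())),
        # largest change across all classes
        "max_abs_diff": float(np.max(np.abs(fp32_probs - int8_probs))),
    }


# Synthetic three-class outputs, not real inference results.
fp32_probs = np.array([0.02, 0.90, 0.08])
int8_probs = np.array([0.03, 0.88, 0.09])

summary = compare_outputs(fp32_probs, int8_probs)
print(summary)
```

A matching top-1 class with a small confidence change on a single image is encouraging but not conclusive; checking accuracy over many images gives a more reliable picture.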

In [None]:
run_inference(f"{MODEL_FP32INT8_DIR}/{OMZ_MODEL_NAME}", test_input_image)

## Set up to run accuracy_check
We will check the accuracy of both the FP32 and INT8 models using [OpenVINO's Accuracy Checker Tool](https://docs.openvino.ai/latest/omz_tools_accuracy_checker.html), [`accuracy_check`](../104-model-tools/104-model-tools.ipynb).  For each model, the Open Model Zoo includes the necessary `accuracy-check.yml` configuration file, along with the global [`dataset_definitions.yml`](https://github.com/openvinotoolkit/open_model_zoo/blob/master/data/dataset_definitions.yml) file, both of which are needed to run the `accuracy_check` tool.

In [None]:
# retrieve files needed by accuracy_check
OMZ_GITHUB_URL = "https://github.com/openvinotoolkit/open_model_zoo/raw/master"
dataset_def_yml = "dataset_definitions.yml"
dataset_def_yml_url = f"{OMZ_GITHUB_URL}/data/{dataset_def_yml}"
model_acheck_yml = "accuracy-check.yml"
model_acheck_yml_url = (
    f"{OMZ_GITHUB_URL}/models/public/{OMZ_MODEL_NAME}/{model_acheck_yml}"
)

model_acheck_yml_path = nbutils.download_file(
    model_acheck_yml_url, model_acheck_yml, OUTPUT_DIR
)

dataset_def_yml_path = nbutils.download_file(
    dataset_def_yml_url, dataset_def_yml, OUTPUT_DIR
)

## Check accuracy of the model before and after quantization
Now we will run `accuracy_check` for both the original FP32 and the new quantized INT8 models to compare accuracies.  First we will check the accuracy of the FP32 model.

> **NOTE**: In this notebook, we run `accuracy_check` on a subset of the images in the dataset, which takes less time.  For a more accurate check, use all of the images by omitting the "-ss <number>" command line argument.

> **NOTE**: The higher the percentage reported by `accuracy_check` the better; however, most models are not 100% accurate.  For reference on what to expect from the model, the details for [resnet-50-tf](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-50-tf) on the Open Model Zoo include the accuracy of the original trained model.
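
For intuition about what such a percentage means, the sketch below computes top-k accuracy with NumPy on a tiny synthetic batch. It assumes the reported metric is a top-k classification accuracy, which the notebook does not show explicitly; the helper `top_k_accuracy` is hypothetical and is not how `accuracy_check` is implemented.

```python
import numpy as np


def top_k_accuracy(scores: np.ndarray, labels: np.ndarray, k: int = 1) -> float:
    """Fraction of rows whose true label is among the k highest-scoring classes."""
    top_k = np.argsort(scores, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())


# Synthetic scores for four images and three classes, for illustration only.
scores = np.array([
    [0.10, 0.70, 0.20],  # highest score: class 1
    [0.50, 0.30, 0.20],  # highest score: class 0
    [0.20, 0.30, 0.50],  # highest score: class 2
    [0.40, 0.35, 0.25],  # highest score: class 0
])
labels = np.array([1, 0, 1, 2])

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(f"top-1: {top1:.2%}, top-2: {top2:.2%}")
```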

In [None]:
# set to '-ss <number>' to use only <number> of images, or set '' to use all images
num_subsamples = "-ss 300"

cmd = f"accuracy_check -tf dlsdk -td CPU -s {DATASET_DIR} -d {dataset_def_yml_path} -c {model_acheck_yml_path} -m {MODEL_FP32_DIR} {num_subsamples}"
run_command_line(cmd)

Now, we check the accuracy of the INT8 model and can compare the results to the FP32 results above.

In [None]:
cmd = f"accuracy_check -tf dlsdk -td CPU -s {DATASET_DIR} -d {dataset_def_yml_path} -c {model_acheck_yml_path} -m {MODEL_FP32INT8_DIR} {num_subsamples}"
run_command_line(cmd)

## Benchmark the model before and after quantization
Finally, we will measure the inference performance of the FP32 and INT8 models using [OpenVINO's Benchmark Tool](https://docs.openvinotoolkit.org/latest/openvino_inference_engine_tools_benchmark_tool_README.html), [`benchmark_app`](../104-model-tools/104-model-tools.ipynb).
  
> **NOTE**: In this notebook, we run `benchmark_app` for 15 seconds ("-t <time_seconds>" argument) to give a quick indication of performance. For more accurate results, we recommend running `benchmark_app` for 60 seconds in a terminal/command prompt after closing other applications.
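
To turn two benchmark runs into a single speedup figure, the throughput values can be extracted and divided. The sketch below assumes the tool's summary contains a line of the form `Throughput: <number> FPS`; that format is an assumption, the sample text is made up rather than measured, and the helper `parse_throughput` is hypothetical.

```python
import re

# Assumed summary-line format; adjust the pattern if the real output differs.
THROUGHPUT_PATTERN = re.compile(r"Throughput:\s*([0-9]+(?:\.[0-9]+)?)\s*FPS")


def parse_throughput(output: str) -> float:
    """Extract the frames-per-second value from benchmark output text."""
    match = THROUGHPUT_PATTERN.search(output)
    if match is None:
        raise ValueError("no throughput line found")
    return float(match.group(1))


# Made-up sample text, not real measurements.
fp32_output = "Latency: 20.00 ms\nThroughput: 50.00 FPS\n"
int8_output = "Latency: 8.00 ms\nThroughput: 125.00 FPS\n"

speedup = parse_throughput(int8_output) / parse_throughput(fp32_output)
print(f"INT8 speedup over FP32: {speedup:.2f}x")
```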

In [None]:
def filter_benchmark_output(line):
    if not (line.startswith(r"[") or line.startswith("  ") or len(line.rstrip()) < 1):
        return line
    return None


# time to run benchmark
time_secs = 15

cmd = f"benchmark_app -m {MODEL_FP32_DIR}/{OMZ_MODEL_NAME}.xml -d CPU -api async -t {time_secs}"
run_command_line(cmd, filter_benchmark_output)
print()
cmd = f"benchmark_app -m {MODEL_FP32INT8_DIR}/{OMZ_MODEL_NAME}.xml -d CPU -api async -t {time_secs}"
run_command_line(cmd, filter_benchmark_output)

## Cleanup
Optionally, all the downloaded and generated files may be removed by setting `do_cleanup` to `True`.

In [None]:
do_cleanup = False
if do_cleanup:
    shutil.rmtree(DATASET_DIR)
    shutil.rmtree(MODEL_DIR)
    shutil.rmtree(OUTPUT_DIR)