# Quantize the Open Model Zoo resnet-50-tf model
Quantization accelerates a trained model by reducing the precision of its calculations.  The speedup comes from two sources: lower-precision arithmetic is faster, and the smaller data type shrinks the model weights, so less memory is needed and less data has to be transferred.  Although lower precision may reduce model accuracy, a model using 32-bit floating-point precision (FP32) can typically be quantized to 8-bit integers (INT8) with results that make the trade-off between accuracy and speed worthwhile.  For benchmarking results that show how quantization can accelerate models, see [INT8 vs FP32 Comparison on Select Networks and Platforms](https://docs.openvino.ai/latest/openvino_docs_performance_int8_vs_fp32.html#doxid-openvino-docs-performance-int8-vs-fp32).
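
As a rough illustration of the idea, the sketch below maps a few FP32 values to INT8 using one shared symmetric scale and then maps them back.  This is a simplified scheme chosen for the example, not the algorithm used later in this notebook; the helper names and the sample values are assumptions.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers using one shared scale (hypothetical helper)."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # round to the nearest integer step and clip to the representable range
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the integers back to approximate float values."""
    return [q * scale for q in quantized]


weights = [0.42, -1.27, 0.003, 0.9]  # made-up sample values
q_weights, scale = quantize_symmetric(weights)
restored = dequantize(q_weights, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(q_weights)
print(f"scale={scale:.4f}, max_error={max_error:.4f}")
```

Each value now fits in one byte instead of four, at the cost of a rounding error of at most half a quantization step.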

[Intel Distribution of OpenVINO toolkit](https://software.intel.com/openvino-toolkit) includes the [Post-Training Optimization Tool (POT)](https://docs.openvino.ai/latest/pot_README.html) to automate quantization.  For models available from [Open Model Zoo](https://github.com/openvinotoolkit/open_model_zoo), the [`omz_quantizer`](https://pypi.org/project/openvino-dev/) tool is available to automate running POT using its [DefaultQuantization](https://docs.openvino.ai/latest/pot_compression_algorithms_quantization_default_README.html#doxid-pot-compression-algorithms-quantization-default-r-e-a-d-m-e) 8-bit quantization algorithm to quantize models down to INT8 precision.

This Jupyter* Notebook goes step by step through the workflow of downloading the [resnet-50-tf](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-50-tf) model from the Open Model Zoo, quantizing it, and then checking and benchmarking the results.  The workflow consists of the following steps:
1. Download and set up the [Imagenette](https://github.com/fastai/imagenette) (subset of [ImageNet](http://www.image-net.org/)) validation dataset to be used by `omz_quantizer`
2. Download model
3. Convert model to FP32 IR files
4. Quantize FP32 model to create INT8 IR files
5. Run inference on original and quantized model
6. Check accuracy before and after quantization
7. Benchmark before and after quantization

While performing the steps above, the following [OpenVINO tools](https://pypi.org/project/openvino-dev/) will be used to download, convert, quantize, check accuracy, and benchmark the model:
- `omz_downloader` - Download model from the Open Model Zoo
- `omz_converter` - Convert an Open Model Zoo model
- `omz_quantizer` - Quantize an Open Model Zoo model
- `accuracy_check` - Check the accuracy of models using a validation dataset
- `benchmark_app` - Benchmark models
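
Before starting, it can help to confirm that these command-line tools are reachable from the notebook's environment.  The following is a minimal sketch using only the standard library; the `find_tools` helper is a name made up for this example.

```python
import shutil

TOOLS = [
    "omz_downloader",
    "omz_converter",
    "omz_quantizer",
    "accuracy_check",
    "benchmark_app",
]


def find_tools(names):
    """Return a dict mapping each tool name to its path on PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}


tool_paths = find_tools(TOOLS)
for name, path in tool_paths.items():
    print(f"{name}: {path or 'NOT FOUND'}")
```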

## Imports

In [None]:
# necessary imports
import glob
import os
import shutil
import sys
import tarfile
from pathlib import Path
from subprocess import PIPE, STDOUT, Popen

import cv2
import matplotlib.pyplot as plt
import numpy as np
from openvino.inference_engine import IECore

sys.path.append("../utils")
import notebook_utils as nbutils

print("Imports complete.")

## Settings

By default, this notebook downloads the model, dataset, etc. to subdirectories of the directory where this notebook is located.  The following variables may be used to set file locations:
* `OMZ_MODEL_NAME`: Model name as it appears on the Open Model Zoo
* `DATA_DIR`: Directory where dataset will be downloaded and set up
* `MODEL_DIR`: Models will be downloaded into the `intel` and `public` folders in this directory
* `OUTPUT_DIR`: Directory used to store any output and other downloaded files (e.g. configuration files for running accuracy_check)

In [None]:
# base settings
OMZ_MODEL_NAME = "resnet-50-tf"
DATA_DIR = Path("data")
MODEL_DIR = Path("model")
OUTPUT_DIR = Path("output")
DATASET_DIR = DATA_DIR / "imagenette"
LABELS_PATH = DATASET_DIR / "imagenet_2012.txt"

# different model precisions location
MODEL_PUBLIC_DIR = MODEL_DIR / "public" / OMZ_MODEL_NAME
MODEL_FP32_DIR = MODEL_PUBLIC_DIR / "FP32"
MODEL_FP32INT8_DIR = MODEL_PUBLIC_DIR / "FP32-INT8"

# create directories if they do not already exist
DATA_DIR.mkdir(exist_ok=True)
MODEL_DIR.mkdir(exist_ok=True)
OUTPUT_DIR.mkdir(exist_ok=True)

print(f"OMZ_MODEL_NAME={OMZ_MODEL_NAME}")
print(f"MODEL_PUBLIC_DIR={MODEL_PUBLIC_DIR}")
print(f"DATASET_DIR={DATASET_DIR}")
print(f"OUTPUT_DIR={OUTPUT_DIR}")
print(f"LABELS_PATH={LABELS_PATH}")

## Helper functions
The `run_command_line()` helper function is provided to aid filtering the output of some of the commands that will be run.

In [None]:
def run_command_line(cmd: str, filter=None):
    """
    Runs the given command line, printing output lines as they become available to show progress in near real time.
    If a filter is provided, it is called with each line and its return value is printed (None suppresses the line).
    :param cmd: String containing the complete command line to run
    :param filter: Optional filter called per line before printing
    :return: None
    """
    print(f"running command: {cmd}")
    proc = Popen(cmd.split(), stdout=PIPE, stderr=STDOUT, universal_newlines=True)
    # read until EOF so that output written just before the process exits is not lost
    for line in proc.stdout:
        if filter is not None:
            line = filter(line)
        if line is not None:
            sys.stdout.write(line)
    # wait for the process to finish once its output stream is closed
    proc.wait()


print("helper functions defined.")
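
The read-filter-print pattern can be tried in isolation.  The sketch below is a stand-alone variant that collects the kept lines instead of printing them; `stream_lines`, `drop_bracketed`, and the two-line demo command are assumptions made for this example and are not part of the notebook's helpers.

```python
import subprocess
import sys


def stream_lines(args, line_filter=None):
    """Run args and return the output lines that survive the optional filter."""
    kept = []
    with subprocess.Popen(
        args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as proc:
        # iterating over stdout yields lines until the stream reaches EOF
        for line in proc.stdout:
            if line_filter is not None:
                line = line_filter(line)
            if line is not None:
                kept.append(line)
    return kept


def drop_bracketed(line):
    """Hypothetical filter: suppress lines that start with '['."""
    return None if line.startswith("[") else line


# a tiny child process that prints one log-style line and one result-style line
demo_cmd = [sys.executable, "-c", "print('[ INFO ] setup'); print('result line')"]
kept_lines = stream_lines(demo_cmd, drop_bracketed)
print(kept_lines)
```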

## Download and set up the validation dataset
Instead of using the very large [ImageNet](http://www.image-net.org/) dataset, this notebook uses the smaller [Imagenette 320px](https://github.com/fastai/imagenette) dataset, which contains 10 classes with lower-resolution images.  The Imagenette dataset will be downloaded and arranged to look just like ImageNet so that it can be used by the `omz_quantizer` and `accuracy_check` tools.  Any ImageNet dataset (or subset of ImageNet) may be used when following the steps in this notebook; however, it must be set up as described on the Open Model Zoo [datasets.md: ImageNet](https://github.com/openvinotoolkit/open_model_zoo/blob/master/data/datasets.md#imagenet) page.

In [None]:
def set_up_imagenette_dataset(output_dir):
    output_dir.mkdir(exist_ok=True, parents=True)

    img_val_path = output_dir / "ILSVRC2012_img_val"
    img_val_path.mkdir(exist_ok=True, parents=True)
    img_val_ann_path = output_dir / "val.txt"

    print("Downloading dataset...")
    data_tgzname = "imagenette2-320.tgz"
    data_url = f"https://s3.amazonaws.com/fast-ai-imageclas/{data_tgzname}"
    data_tgzpath = nbutils.download_file(data_url, data_tgzname, output_dir)
    print("Done.")

    # uncompress files
    print(f"Extracting dataset from {data_tgzpath.relative_to(Path.cwd())}...")
    # the context manager closes the archive even if extraction fails
    with tarfile.open(data_tgzpath, "r:gz") as tar_ref:
        tar_ref.extractall(path=output_dir)
    print("Done.")

    # download the class labels
    print("Downloading class labels...")
    labels_url = f"https://github.com/openvinotoolkit/open_model_zoo/raw/master/data/dataset_classes/{LABELS_PATH.name}"
    nbutils.download_file(labels_url, LABELS_PATH.name, output_dir)
    print("Done.")

    # load labels for lookup
    with open(LABELS_PATH) as labels_file:
        labels = [line.rstrip() for line in labels_file]

    # move image files and generate annotation file
    print("Moving image files and creating annotation file...")
    dir_dict = {}
    winid_map = {}
    val_path = output_dir / data_tgzpath.stem / "val"
    for root, dirs, files in os.walk(val_path):
        # match each winid directory
        if Path(root).name != "val":
            file_list = [Path(root) / fname for fname in files]
            winid = Path(root).name
            dir_dict[winid] = file_list
            label_idx = [i for i, item in enumerate(labels) if item.startswith(winid)][
                0
            ]
            winid_map[winid] = label_idx
            print(
                f"dir {root}, # files={len(files)}, label_idx={label_idx}, {labels[label_idx]}"
            )

    # simple shuffle and move image files prefixed with "val_n_" and create annotations file
    pop_count = len(dir_dict)
    total_files = 0
    # the context manager closes the annotation file even if a move fails
    with open(img_val_ann_path, "w") as ann_file:
        while pop_count > 0:
            pop_count = 0
            for winid in dir_dict:
                if len(dir_dict[winid]) > 0:
                    src_path = dir_dict[winid].pop()
                    dst_path = img_val_path / f"val_{total_files:08}_{src_path.stem}.JPEG"
                    # print(f"winid={winid}, moving {src_path} to {dst_path}")
                    shutil.move(src_path, dst_path)
                    ann_file.write(f"{dst_path.name} {winid_map[winid]}\n")
                    pop_count += 1
                    total_files += 1

    print(f"moved {total_files} files")
    print("Done.")
    print("set up of imagenette dataset is complete.")


if not LABELS_PATH.exists():
    set_up_imagenette_dataset(DATASET_DIR)
else:
    print(f"{LABELS_PATH} exists, skipping setting up dataset")
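
The round-robin ordering and the `<file name> <class index>` annotation format produced by the set-up code can be seen on a toy example.  The sketch below mirrors that loop without touching any files; the `interleave_annotations` helper, the file stems, and the class indices are made up for illustration.

```python
def interleave_annotations(files_by_class, class_index):
    """Take one file per class in turn and build 'name index' annotation lines."""
    pending = {wnid: list(files) for wnid, files in files_by_class.items()}
    lines = []
    moved = True
    while moved:
        moved = False
        for wnid, files in pending.items():
            if files:
                stem = files.pop()
                # the running counter keeps names unique and records the order
                lines.append(f"val_{len(lines):08}_{stem}.JPEG {class_index[wnid]}")
                moved = True
    return lines


# illustrative data: two classes with an uneven number of images
files_by_class = {"n01440764": ["a", "b"], "n02102040": ["c"]}
class_index = {"n01440764": 0, "n02102040": 217}

annotation_lines = interleave_annotations(files_by_class, class_index)
print("\n".join(annotation_lines))
```

Interleaving the classes means that any leading subset of the images covers several classes rather than a single one.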

## Download model
The OpenVINO tool [`omz_downloader`](https://pypi.org/project/openvino-dev/) is used to automatically download files from the Open Model Zoo.  To see the complete list of available models, the following command is used:
```bash
omz_downloader --print_all
```

The format of the command for `omz_downloader` to download a model is:
```bash
omz_downloader --name <model_name> --output <path_to_downloaded_models_dir>
```
The input arguments are as follows:
- **--name** : The name of the model to download. **Note**: The name must be one of the model names listed from running the command: `omz_downloader --print_all`
- **--output** : The top directory where models will be stored after they are downloaded 

> **NOTE**: If model IR files are available from the Open Model Zoo, then the downloaded models will appear in the `intel` subdirectory.  If no model IR files are available, then the downloaded models will appear in the `public` directory.

In [None]:
!omz_downloader --name $OMZ_MODEL_NAME --output $MODEL_DIR

print("All files that were downloaded:")
print(*glob.glob(f"{MODEL_DIR}/**", recursive=True), sep="\n")
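
Because a downloaded model may land in either the `intel` or the `public` subdirectory, a small lookup can locate it.  This is a minimal sketch run against a throwaway directory tree; `find_downloaded_model` is a hypothetical helper, not part of the OpenVINO tools.

```python
import tempfile
from pathlib import Path


def find_downloaded_model(model_dir, model_name):
    """Return <model_dir>/<intel|public>/<model_name> if it exists, else None."""
    for subdir in ("intel", "public"):
        candidate = Path(model_dir) / subdir / model_name
        if candidate.is_dir():
            return candidate
    return None


# demonstration on a temporary tree instead of a real download
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "public" / "resnet-50-tf").mkdir(parents=True)
    found = find_downloaded_model(tmp, "resnet-50-tf")
    found_parts = found.parts[-2:]
    missing = find_downloaded_model(tmp, "no-such-model")

print(found_parts, missing)
```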

## Convert model to IR files

The public models from the Open Model Zoo are made available in their native framework file format and must be converted to OpenVINO Intermediate Representation (IR) files before running inference.  The OpenVINO tool [`omz_converter`](https://pypi.org/project/openvino-dev/) is used to convert Open Model Zoo models to the IR files necessary to run inference.

The format to run the `omz_converter` command is:
```bash
omz_converter --name <model_name> --precisions <precision1,precision2,...> \
    --download_dir <path_to_downloaded_models_dir> --output <path_to_output_models_dir>
```
The input arguments are as follows:
- **--name** : The name of the model to convert. It must be one of the models listed from running the command: `omz_downloader --print_all`
- **--precisions** : The precisions (e.g. "FP16", "FP32", etc.) of the model to create
- **--download_dir** : The top directory where models were originally downloaded by `omz_downloader`  
- **--output** : The top directory where models will be stored after they are converted 


> **NOTE**: For models that are downloaded from the Open Model Zoo already as IR files, the converter utility will not do any conversion and will output the message "Skipping <model_name> (no conversions defined)".

In [None]:
!omz_converter --name $OMZ_MODEL_NAME --precisions FP32 --download_dir $MODEL_DIR  --output $MODEL_DIR

print("All IR files that were created:")
print(*glob.glob(f"{MODEL_FP32_DIR}/**", recursive=True), sep="\n")
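
An IR model is a pair of files, `<model>.xml` and `<model>.bin`, so a quick check for both can catch a failed conversion early.  The sketch below runs against a temporary directory; `missing_ir_files` is a name invented for this example.

```python
import tempfile
from pathlib import Path


def missing_ir_files(ir_dir, model_name):
    """Return the names of the expected IR files that are not present in ir_dir."""
    expected = [Path(ir_dir) / f"{model_name}{ext}" for ext in (".xml", ".bin")]
    return [path.name for path in expected if not path.is_file()]


# demonstration: only the .xml file exists, so the .bin file is reported
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "resnet-50-tf.xml").touch()
    missing_demo = missing_ir_files(tmp, "resnet-50-tf")

print(missing_demo)
```

In this notebook, the same check could be applied to `MODEL_FP32_DIR` with `OMZ_MODEL_NAME` after the conversion cell has run.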

## Quantize the model to INT8
For models downloaded from the Open Model Zoo, the [`omz_quantizer`](https://pypi.org/project/openvino-dev/) tool is used to quantize the model to a lower precision (e.g. quantize FP32 to INT8 precision).

The format to run the `omz_quantizer` command is:
```bash
omz_quantizer --name <model_name> --model_dir <path_to_models_dir> --precisions <precision> \
    --dataset_dir <path_to_dataset_dir> --output <path_to_output_models_dir>
```
The input arguments are as follows:
- **--name** : The name of the model to quantize. It must be one of the models listed from running the command: `omz_downloader --print_all`
- **--model_dir** : The top directory where models were stored after conversion by `omz_converter`  
- **--output** : The top directory where models will be stored after they are quantized 
- **--dataset_dir** : The path to the dataset top directory
- **--precisions** : The precision (e.g. "FP32-INT8") of the quantized model
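
The same invocation can also be assembled in Python as an argument list, which avoids problems with paths that contain spaces when the list is later passed to `subprocess.run`.  This is a sketch that only builds and prints the command using the flags listed above; `build_quantizer_args` is a hypothetical helper.

```python
import shlex


def build_quantizer_args(name, model_dir, dataset_dir, output_dir, precision="FP32-INT8"):
    """Build the omz_quantizer argument list from the documented flags."""
    return [
        "omz_quantizer",
        "--name", name,
        "--model_dir", str(model_dir),
        "--output", str(output_dir),
        "--dataset_dir", str(dataset_dir),
        "--precisions", precision,
    ]


quantizer_args = build_quantizer_args(
    "resnet-50-tf", "model", "data/imagenette", "model"
)
# shlex.quote makes the printed form safe to paste into a POSIX shell
print(" ".join(shlex.quote(arg) for arg in quantizer_args))
```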

In [None]:
!omz_quantizer --name $OMZ_MODEL_NAME --model_dir $MODEL_DIR  --output $MODEL_DIR  --dataset_dir $DATASET_DIR --precisions FP32-INT8

print("All FP32-INT8 IR files that were created:")
print(*glob.glob(f"{MODEL_FP32INT8_DIR}/**", recursive=True), sep="\n")

## Run the model
Now that the model has been quantized, we will run inference using both the original FP32 model and the new INT8 quantized model to see their results.

In [None]:
def run_inference(model_base_path, image_path):
    """
    Runs inference on an image using the given model and then displays the results
    :param model_base_path: String containing path and file name of model excluding the extension (e.g. ".xml")
    :param image_path: String containing full path to the input image
    :return: None
    """
    # Load the model
    ie = IECore()

    # create the network from the model
    net = ie.read_network(
        model=f"{model_base_path}.xml", weights=f"{model_base_path}.bin"
    )
    exec_net = ie.load_network(network=net, device_name="CPU")

    input_key = next(iter(exec_net.input_info))
    output_key = next(iter(exec_net.outputs.keys()))
    print(f"model loaded from: {model_base_path}/{net.name}")

    # Load image
    image = nbutils.load_image(image_path)
    # N,C,H,W = batch size, number of channels, height, width
    N, C, H, W = exec_net.input_info[input_key].tensor_desc.dims
    # The network expects images in BGR format, same as OpenCV so just resize
    input_image = cv2.resize(src=image, dsize=(W, H))
    # reshape image to network input shape ([H,W,C]->[N,C,H,W])
    input_image = np.expand_dims(input_image.transpose(2, 0, 1), 0)
    # display original image (imshow requires RGB format, so convert BGR->RGB)
    plt.imshow(nbutils.to_rgb(image))

    # Run inference, result = [1,1001] with confidence level for each of the 1000 
    #  classes and +1 for background.  The class with the highest confidence is 
    #  used to output the final result.
    result = exec_net.infer(inputs={input_key: input_image})[output_key][0]
    label_id = np.argmax(result)
    conf = round(result[label_id] * 100, 2)
    print(f"label_id={label_id}, conf={conf} %")

    # Convert the inference result to a class name using the labels file
    with open(LABELS_PATH) as f:
        labels = [line.rstrip() for line in f]

    print(f"Image contains a '{labels[label_id]}', with {conf}% confidence")


# find known image
files = glob.glob(f"{DATASET_DIR}/**/*n02102040_2051.JPEG", recursive=True)
test_input_image = files[0]

run_inference(f"{MODEL_FP32_DIR}/{OMZ_MODEL_NAME}", test_input_image)

In [None]:
run_inference(f"{MODEL_FP32INT8_DIR}/{OMZ_MODEL_NAME}", test_input_image)
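
To compare the two models beyond the single top prediction, the highest-scoring classes can be listed side by side.  The sketch below uses short made-up score lists in place of real network output; `top_k` is a helper name chosen for this example.

```python
def top_k(scores, k=3):
    """Return [(class_index, score), ...] for the k highest scores, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:k]]


# made-up confidences for four classes; real results have one entry per class
fp32_scores = [0.01, 0.05, 0.90, 0.04]
int8_scores = [0.02, 0.06, 0.87, 0.05]

fp32_top = top_k(fp32_scores, k=2)
int8_top = top_k(int8_scores, k=2)
same_top1 = fp32_top[0][0] == int8_top[0][0]

print(f"FP32 top-2: {fp32_top}")
print(f"INT8 top-2: {int8_top}")
print(f"same top-1 class: {same_top1}")
```

A small shift in confidence with the same top-ranked class is the outcome one would hope for after quantization.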

## Set up to run accuracy_check
We will check the accuracy of the FP32 and INT8 models using [OpenVINO's Accuracy Checker Tool](https://docs.openvino.ai/latest/omz_tools_accuracy_checker.html), [`accuracy_check`](https://pypi.org/project/openvino-dev/).  For each model, the Open Model Zoo includes the necessary `accuracy-check.yml` configuration and the global [`dataset_definitions.yml`](https://github.com/openvinotoolkit/open_model_zoo/blob/master/data/dataset_definitions.yml) files needed to run the `accuracy_check` tool.

In [None]:
# retrieve files needed by accuracy_check
OMZ_GITHUB_URL = "https://github.com/openvinotoolkit/open_model_zoo/raw/master"
dataset_def_yml = "dataset_definitions.yml"
dataset_def_yml_url = f"{OMZ_GITHUB_URL}/data/{dataset_def_yml}"
model_acheck_yml = "accuracy-check.yml"
model_acheck_yml_url = (
    f"{OMZ_GITHUB_URL}/models/public/{OMZ_MODEL_NAME}/{model_acheck_yml}"
)

print(f"Downloading {OMZ_MODEL_NAME}/accuracy-check.yml ...")
model_acheck_yml_path = nbutils.download_file(
    model_acheck_yml_url, model_acheck_yml, OUTPUT_DIR
)
print("Done.\n")

print(f"Downloading {dataset_def_yml} ...")
dataset_def_yml_path = nbutils.download_file(
    dataset_def_yml_url, dataset_def_yml, OUTPUT_DIR
)
print("Done.")
print("set up to run accuracy_check is complete.")

## Check accuracy of the model before and after quantization
Now we will run `accuracy_check` for both the original FP32 and the new quantized INT8 models to compare accuracies.

The format to run the `accuracy_check` command is:
```bash
accuracy_check -tf <framework> -td <device> -s <path_to_dataset> -d <path_to_dataset_definitions_yml> \
    -c <path_to_model_configuration_yml> -m <path_to_model_ir_files> -ss <number_of_subsamples>
```

The input arguments are as follows:
- **-tf** : The name of the framework (`dlsdk` refers to OpenVINO) to use from the model's configuration .yml
- **-td** : The device ("CPU", "GPU", etc.) to use when running inference
- **-s** : The path to the dataset files
- **-d** : The path to the dataset definitions .yml file
- **-c** : The path to the model's configuration .yml file
- **-m** : The path to the model's IR files (directory holding `<model>.xml` and `<model>.bin`)
- **-ss** : The number of images to use from the dataset.  Default is all of them.

> **NOTE**: In this notebook, we run `accuracy_check` on a subset of the images in the dataset, which takes less time.  For a more accurate check, use all images by omitting the `-ss <number>` command-line argument.

> **NOTE**: The higher the percentage reported by `accuracy_check` the better; however, most models are not 100% accurate.  For reference on what to expect from the model, the details for [resnet-50-tf](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-50-tf) on the Open Model Zoo include the accuracy of the original trained model.

In [None]:
# set to '-ss <number>' to use only <number> of images, or set '' to use all images
num_subsamples = "-ss 300"

print(f"Checking accuracy of FP32 model {MODEL_FP32_DIR} ...")
cmd = f"accuracy_check -tf dlsdk -td CPU -s {DATASET_DIR} -d {dataset_def_yml_path} -c {model_acheck_yml_path} -m {MODEL_FP32_DIR} {num_subsamples}"
run_command_line(cmd)
print("Done.\n")

In [None]:
print(f"Checking accuracy of FP32-INT8 model {MODEL_FP32INT8_DIR} ...")
cmd = f"accuracy_check -tf dlsdk -td CPU -s {DATASET_DIR} -d {dataset_def_yml_path} -c {model_acheck_yml_path} -m {MODEL_FP32INT8_DIR} {num_subsamples}"
run_command_line(cmd)
print("Done.")
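
To make the comparison concrete, the sketch below computes a top-1 accuracy and the drop between two sets of predictions.  It assumes the percentage of interest is a top-1 accuracy; the labels and predictions are invented, and `top1_accuracy` is a hypothetical helper rather than part of `accuracy_check`.

```python
def top1_accuracy(predicted, expected):
    """Return the percentage of predictions that match the expected labels."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("at least one sample is required")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return 100.0 * correct / len(expected)


# invented class indices for five images
expected_labels = [0, 217, 482, 491, 497]
fp32_predictions = [0, 217, 482, 491, 497]
int8_predictions = [0, 217, 482, 491, 566]

fp32_accuracy = top1_accuracy(fp32_predictions, expected_labels)
int8_accuracy = top1_accuracy(int8_predictions, expected_labels)
accuracy_drop = fp32_accuracy - int8_accuracy

print(f"FP32: {fp32_accuracy:.2f}%, INT8: {int8_accuracy:.2f}%, drop: {accuracy_drop:.2f}")
```

With only a few hundred images, a single misclassified image moves the percentage noticeably, which is why the full dataset gives a more reliable comparison.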

## Benchmark the model before and after quantization
Finally, we will measure the inference performance of the FP32 and INT8 models using [OpenVINO's Benchmark Tool](https://docs.openvinotoolkit.org/latest/openvino_inference_engine_tools_benchmark_tool_README.html), [`benchmark_app`](https://pypi.org/project/openvino-dev/).

The format to run the `benchmark_app` command is:
```bash
benchmark_app -m <path_to_model_xml_file> -d <device> -api <api_mode> -t <time_in_seconds>
```
The input arguments are as follows:
- **-m** : The path to the model's `<model>.xml` IR file (`<model>.bin` will also be read)
- **-d** : The device ("CPU", "GPU", etc.) to run on during benchmarking
- **-api** : The API mode to use when running inference: [`async`](https://docs.openvino.ai/latest/openvino_inference_engine_tools_benchmark_tool_README.html#asynchronous-api) (default) for asynchronous throughput-oriented measurement or [`sync`](https://docs.openvino.ai/latest/openvino_inference_engine_tools_benchmark_tool_README.html#synchronous-api) for synchronous (latency-oriented) measurement
- **-t** : The number of seconds to run benchmarking.  Default is 60 seconds.
    
> **NOTE**: In this notebook, we run `benchmark_app` for 15 seconds to give a quick indication of performance. For more accurate performance, we recommend running `benchmark_app` for 60 seconds in a terminal/command prompt after closing other applications.  

In [None]:
def filter_benchmark_output(line):
    if not (line.startswith(r"[") or line.startswith("  ") or len(line.rstrip()) < 1):
        return line
    return None


# time to run benchmark
time_secs = 15

print(f"Benchmarking FP32 model {MODEL_FP32_DIR} over {time_secs} seconds ..")
cmd = f"benchmark_app -m {MODEL_FP32_DIR}/{OMZ_MODEL_NAME}.xml -d CPU -api async -t {time_secs}"
run_command_line(cmd, filter_benchmark_output)
print("Done.\n")

print(f"Benchmarking FP32-INT8 {MODEL_FP32INT8_DIR} over {time_secs} seconds ..")
cmd = f"benchmark_app -m {MODEL_FP32INT8_DIR}/{OMZ_MODEL_NAME}.xml -d CPU -api async -t {time_secs}"
run_command_line(cmd, filter_benchmark_output)
print("Done.")
print("benchmarking complete.")
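
The effect of the output filter is easiest to see on a handful of lines.  The sketch below repeats the filter so that it is self-contained and applies it to invented sample lines; these lines only imitate the general shape of tool output and are not actual `benchmark_app` results.

```python
def filter_benchmark_output(line):
    """Keep a line unless it starts with '[', starts with two spaces, or is blank."""
    if not (line.startswith("[") or line.startswith("  ") or len(line.rstrip()) < 1):
        return line
    return None


# invented sample lines for illustration only
sample_output = [
    "[ INFO ] a log-style line\n",
    "  an indented detail line\n",
    "\n",
    "Throughput: 123.45 FPS\n",
]

kept = [line for line in sample_output if filter_benchmark_output(line) is not None]
print("".join(kept), end="")
```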

## Cleanup
Optionally, all the downloaded and generated files may be removed by setting `do_cleanup` to `True`.

In [None]:
do_cleanup = False
if do_cleanup:
    shutil.rmtree(DATASET_DIR)
    shutil.rmtree(MODEL_DIR)
    shutil.rmtree(OUTPUT_DIR)
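
If some of the directories were never created (for example, when only part of the notebook was run), removing them unconditionally raises an error.  The following is a minimal sketch of a more forgiving cleanup, demonstrated on a temporary directory; `remove_dirs` is a hypothetical helper.

```python
import shutil
import tempfile
from pathlib import Path


def remove_dirs(paths):
    """Remove each directory that exists and return the names of those removed."""
    removed = []
    for path in map(Path, paths):
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(path.name)
    return removed


# demonstration: one directory exists, the other was never created
with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "output"
    existing.mkdir()
    removed_demo = remove_dirs([existing, Path(tmp) / "never_created"])

print(removed_demo)
```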