# Quantize Open Model Zoo object detection models
Quantization accelerates a trained model by reducing the precision used in its calculations.  The speedup has two sources: lower-precision calculations are faster, and the smaller data type means that both the data and the model weights need less memory and less data has to be transferred.  Although lower precision may reduce model accuracy, a model using 32-bit floating-point precision (FP32) can typically be quantized to 8-bit integers (INT8) with results good enough to justify the trade-off between accuracy and speed.  For benchmark results showing how quantization can accelerate models, see [INT8 vs FP32 Comparison on Select Networks and Platforms](https://docs.openvino.ai/latest/openvino_docs_performance_int8_vs_fp32.html#doxid-openvino-docs-performance-int8-vs-fp32).
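
To make the precision trade-off concrete, here is a minimal sketch of symmetric, per-tensor INT8 quantization written with NumPy.  It is an illustration only and is not the algorithm used by the OpenVINO tools; the function names `quantize_int8` and `dequantize` are hypothetical.

```python
import numpy as np


def quantize_int8(values):
    """Map FP32 values to INT8 using one symmetric scale (illustrative only)."""
    max_abs = float(np.max(np.abs(values)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = np.clip(np.round(values / scale), -127, 127).astype(np.int8)
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate FP32 values from INT8 values and their scale."""
    return quantized.astype(np.float32) * scale


weights = np.array([-1.0, -0.5, 0.0, 0.25, 1.0], dtype=np.float32)
q_weights, q_scale = quantize_int8(weights)
restored = dequantize(q_weights, q_scale)

print("INT8 values:", q_weights)
print("bytes FP32 vs INT8:", weights.nbytes, "vs", q_weights.nbytes)
print("max round-trip error:", float(np.max(np.abs(weights - restored))))
```

The INT8 copy needs a quarter of the memory, and the round-trip error stays below one quantization step, which is the accuracy cost paid for the smaller data type.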

The [Intel Distribution of OpenVINO toolkit](https://software.intel.com/openvino-toolkit) includes the [Post-Training Optimization Tool (POT)](https://docs.openvino.ai/latest/pot_README.html) to automate quantization.  For models available from the [Open Model Zoo](https://github.com/openvinotoolkit/open_model_zoo), the [`omz_quantizer`](../104-model-tools/104-model-tools.ipynb) tool automates running POT with its [DefaultQuantization](https://docs.openvino.ai/latest/pot_compression_algorithms_quantization_default_README.html#doxid-pot-compression-algorithms-quantization-default-r-e-a-d-m-e) 8-bit quantization algorithm to quantize models down to INT8 precision.

This Jupyter* Notebook will go step-by-step through the workflow of downloading either the [ssd_mobilenet_v1_coco](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/ssd_mobilenet_v1_coco) or the [yolo-v4-tf](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/yolo-v4-tf) model from the Open Model Zoo through quantization and then checking and benchmarking the results.  The workflow consists of following these steps:
1. Download and set up the [Common Objects in Context (COCO)](https://cocodataset.org/) validation dataset to be used by `omz_quantizer`
2. Download model from the Open Model Zoo
3. Convert model to FP32 IR files
4. Quantize FP32 model to create INT8 IR files
5. Run inference on original and quantized model
6. Check accuracy before and after quantization
7. Benchmark before and after quantization

While performing the steps above, the following [OpenVINO tools](../104-model-tools/104-model-tools.ipynb) will be used to download, convert, quantize, check accuracy, and benchmark the model:
- `omz_downloader` - Download model from the Open Model Zoo
- `omz_converter` - Convert an Open Model Zoo model
- `omz_quantizer` - Quantize an Open Model Zoo model
- `accuracy_check` - Check the accuracy of models using a validation dataset
- `benchmark_app` - Benchmark models

## About the models
This notebook is configurable to work with either of the two Open Model Zoo object detection models: ssd_mobilenet_v1_coco (the default) or yolo-v4-tf.

### About the ssd_mobilenet_v1_coco model
The ssd_mobilenet_v1_coco model is a [Single-Shot multi-box Detection (SSD) network](https://arxiv.org/abs/1801.04381) that has been trained on the COCO dataset to perform object detection.  
The input to the converted model is a 300x300 BGR image.  The output of the model is an array of detection information for up to 100 objects giving the:
- image_id: image identifier of the image within the batch
- label: class identifier in the range of 1-91 for each class, plus one for background
- confidence: the prediction probability in the range of 0.0-1.0 for label
- (x_min, y_min): coordinates in normalized format (range 0.0-1.0) of the top-left of the bounding box
- (x_max, y_max): coordinates in normalized format (range 0.0-1.0) of the bottom-right of the bounding box
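
As a rough illustration of the output format listed above, the following sketch converts a made-up `[1, 1, N, 7]` result into pixel coordinates.  The tensor values, the image size, and the 0.5 confidence threshold are assumptions chosen for this example and do not come from the model.

```python
import numpy as np

# Hypothetical output tensor with shape [1, 1, N, 7]; all values are made up
ssd_output = np.array(
    [[[
        [0, 5, 0.92, 0.125, 0.25, 0.5, 0.75],  # confident detection
        [0, 17, 0.30, 0.0, 0.0, 0.25, 0.5],    # low-confidence detection
        [-1, 0, 0.0, 0.0, 0.0, 0.0, 0.0],      # negative image_id: skipped
    ]]],
    dtype=np.float32,
)

image_width, image_height = 640, 480  # assumed size of the original image
confidence_threshold = 0.5            # assumed threshold for this sketch

boxes = []
for image_id, label, confidence, x_min, y_min, x_max, y_max in ssd_output[0, 0]:
    if image_id < 0 or confidence < confidence_threshold:
        continue
    # Normalized coordinates are scaled by the original image size
    boxes.append(
        {
            "label": int(label),
            "confidence": float(confidence),
            "xmin": int(x_min * image_width),
            "ymin": int(y_min * image_height),
            "xmax": int(x_max * image_width),
            "ymax": int(y_max * image_height),
        }
    )

print(boxes)
```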

For more details on the ssd_mobilenet_v1_coco model, see the Open Model Zoo [model](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/ssd_mobilenet_v1_coco) and the [paper](https://arxiv.org/abs/1801.04381).

### About the yolo-v4-tf model
The yolo-v4-tf model is a YOLO v4 real-time object detection model that was implemented in a Keras* framework and converted to a TensorFlow* framework.  The model was trained on the [Common Objects in Context (COCO)](https://cocodataset.org/#home) dataset with 80 classes.  The input to the converted model is a 608x608 BGR image.  The outputs of the model are arrays of detection boxes contained in the three output layers:
- StatefulPartitionedCall/model/conv2d_93/BiasAdd/Add: 76x76 
- StatefulPartitionedCall/model/conv2d_101/BiasAdd/Add: 38x38
- StatefulPartitionedCall/model/conv2d_109/BiasAdd/Add: 19x19

Each output layer contains an NxN array for different sized detection boxes within the original image.  Each detection box contains the following information:
- (x, y) - raw coordinates of box center, must apply [sigmoid function](https://en.wikipedia.org/wiki/Sigmoid_function) to get relative to the cell coordinates
- h, w - raw height and width of box, must apply [exponential function](https://en.wikipedia.org/wiki/Exponential_function) and multiply by corresponding anchors to get absolute height and width values
- box_score - confidence of detection box, must apply [sigmoid function](https://en.wikipedia.org/wiki/Sigmoid_function) to get confidence in 0.0-1.0 range
- class_no[80] - array of probability distribution over the 80 classes in logits format, must apply [sigmoid function](https://en.wikipedia.org/wiki/Sigmoid_function) and multiply by obtained confidence value to get confidence for each class
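
The decoding steps listed above can be sketched for a single detection box as follows.  The raw values, the cell position, and the chosen class index are made up for illustration; the anchor pair (142, 110) is the first pair this notebook uses for the 19x19 layer.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


# Hypothetical raw values for one box in cell (row=9, col=4) of the 19x19 layer
grid_size = 19
input_size = 608
row, col = 9, 4
raw_x, raw_y, raw_w, raw_h, raw_score = 0.0, 0.0, 0.0, 0.0, 2.0
raw_classes = np.full(80, -4.0)
raw_classes[16] = 3.0  # pretend class index 16 has the strongest logit
anchor_w, anchor_h = 142.0, 110.0

# Box center: sigmoid gives the offset inside the cell, then normalize by grid size
center_x = (col + sigmoid(raw_x)) / grid_size
center_y = (row + sigmoid(raw_y)) / grid_size

# Box size: exponential scaled by the anchor, normalized by the network input size
box_w = np.exp(raw_w) * anchor_w / input_size
box_h = np.exp(raw_h) * anchor_h / input_size

# Confidence: box score multiplied by the best class probability
box_score = sigmoid(raw_score)
class_probs = sigmoid(raw_classes)
class_id = int(np.argmax(class_probs))
confidence = float(box_score * class_probs[class_id])

print(f"center=({center_x:.3f}, {center_y:.3f}) size=({box_w:.3f}, {box_h:.3f})")
print(f"class_id={class_id} confidence={confidence:.3f}")
```

All coordinates here are relative to the image (range 0.0-1.0); multiplying by the original image width and height gives pixel values.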

To reduce the results from the three output layers into distinct objects within the original image, the "intersection over union" (also known as the [Jaccard index](https://en.wikipedia.org/wiki/Jaccard_index)) algorithm is typically used to combine overlapping detection boxes with the same class into a single box containing the detected object.
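
As a small worked example of intersection over union, the sketch below computes the value for two overlapping boxes and for two disjoint boxes.  The function name `intersection_over_union` and the box coordinates are chosen for illustration only.

```python
def intersection_over_union(box_a, box_b):
    """Boxes are (xmin, ymin, xmax, ymax) tuples."""
    overlap_w = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0])
    overlap_h = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1])
    if overlap_w <= 0 or overlap_h <= 0:
        return 0.0
    overlap = overlap_w * overlap_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return overlap / (area_a + area_b - overlap)


box_1 = (0, 0, 100, 100)
box_2 = (50, 0, 150, 100)     # overlaps the right half of box_1
box_3 = (200, 200, 300, 300)  # does not touch box_1

# Overlap is 50 * 100 = 5000, union is 10000 + 10000 - 5000 = 15000
print(intersection_over_union(box_1, box_2))
print(intersection_over_union(box_1, box_3))
```

With a threshold of 0.4, for example, `box_1` and `box_2` would both be kept because their value of one third is below the threshold.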

For more details on the yolo-v4-tf model, see the Open Model Zoo [model](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/yolo-v4-tf), the paper ["YOLOv4: Optimal Speed and Accuracy of Object Detection"](https://arxiv.org/abs/2004.10934), and the [repository](https://github.com/david8862/keras-YOLOv3-model-set).

## Imports

In [None]:
# necessary imports
import shutil
import sys
import zipfile
from pathlib import Path
from subprocess import PIPE, STDOUT, Popen

import cv2
import matplotlib.pyplot as plt
import numpy as np
from openvino.inference_engine import IECore

sys.path.append("../utils")
import notebook_utils as nbutils

## Settings

By default, this notebook will run using the ssd_mobilenet_v1_coco object detection model.  The `OMZ_MODEL_NAME` variable may be set to select the model to use as follows:
* `OMZ_MODEL_NAME`: Set to `OMZ_MODEL_NAME_YOLO` to use the yolo-v4-tf model or set to `OMZ_MODEL_NAME_SSD_MOBILNET` to use the ssd_mobilenet_v1_coco model

The variable `USING_YOLOV4_MODEL` is set according to the value of `OMZ_MODEL_NAME` and is used later in the code where the two models require different processing (e.g. post-processing inference results).

By default, this notebook downloads the model, dataset, etc. to subdirectories of the directory where this notebook is located.  The following variables may be used to set file locations:
* `OMZ_MODEL_NAME`: Model name as it appears on the Open Model Zoo
* `DATA_DIR`: Directory where dataset will be downloaded and set up
* `MODEL_DIR`: Models will be downloaded into the `intel` and `public` folders in this directory
* `OUTPUT_DIR`: Directory used to store any output and other downloaded files (e.g. configuration files for running accuracy_check)

In [None]:
# base settings
OMZ_MODEL_NAME_YOLO = "yolo-v4-tf"
OMZ_MODEL_NAME_SSD_MOBILNET = "ssd_mobilenet_v1_coco"

OMZ_MODEL_NAME = OMZ_MODEL_NAME_SSD_MOBILNET
# OMZ_MODEL_NAME = OMZ_MODEL_NAME_YOLO

USING_YOLOV4_MODEL = bool(OMZ_MODEL_NAME == OMZ_MODEL_NAME_YOLO)

DATA_DIR = Path("data")
MODEL_DIR = Path("model")
OUTPUT_DIR = Path("output")
DATASET_DIR = DATA_DIR / "coco"

if USING_YOLOV4_MODEL:
    LABELS_PATH = DATASET_DIR / "coco_80cl.txt"
else:
    LABELS_PATH = DATASET_DIR / "coco_91cl_bkgr.txt"

TEST_INPUT_IMAGE = DATASET_DIR / "val2017/000000005477.jpg"  # airplane
# TEST_INPUT_IMAGE = DATASET_DIR / "val2017/000000000285.jpg"  # bear
# TEST_INPUT_IMAGE = DATASET_DIR / "val2017/000000007108.jpg"   # elephants

# different model precisions location
MODEL_PUBLIC_DIR = MODEL_DIR / "public" / OMZ_MODEL_NAME
MODEL_FP32_DIR = MODEL_PUBLIC_DIR / "FP32"
MODEL_FP32INT8_DIR = MODEL_PUBLIC_DIR / "FP32-INT8"

# create directories if they do not already exist
DATA_DIR.mkdir(exist_ok=True)
MODEL_DIR.mkdir(exist_ok=True)
OUTPUT_DIR.mkdir(exist_ok=True)

## Helper functions
The `run_command_line()` helper function is provided to filter the output of some of the commands that will be run.  The two functions `parse_yolo_region()` and `filter_yolo_detections()` are used during post-processing of the yolo-v4-tf model.
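
The per-line filtering idea can be sketched on its own as follows.  The filter `keep_info_lines` and the stand-in command are hypothetical and exist only for this illustration; a filter returns the line to print it, or `None` to drop it.

```python
import sys
from subprocess import PIPE, STDOUT, Popen


def keep_info_lines(line):
    """Hypothetical filter: keep lines starting with "INFO", drop the rest."""
    return line if line.startswith("INFO") else None


# Stand-in command that prints three lines; any command-line tool could be used
command = [sys.executable, "-c", "print('DEBUG a'); print('INFO b'); print('WARN c')"]

kept_lines = []
with Popen(command, stdout=PIPE, stderr=STDOUT, universal_newlines=True) as proc:
    # Read line by line until end-of-file so progress appears as it is produced
    for line in proc.stdout:
        line = keep_info_lines(line)
        if line is not None:
            kept_lines.append(line)
            sys.stdout.write(line)
```

Only the `INFO b` line is printed; the other two lines are discarded by the filter.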

In [None]:
def run_command_line(cmd: str, filter=None):
    """
    Runs the given command line, printing output lines as they become available to show progress in near real time.
    If a filter is provided, it is called with each line and the value it returns is printed (nothing is printed if it returns None).
    :param cmd: String containing complete command-line to run
    :param filter: Optional filter called per-line before printing
    :return: none
    """
    proc = Popen(cmd.split(), stdout=PIPE, stderr=STDOUT, universal_newlines=True)
    # read until end-of-file so that output written just before the process exits is not lost
    for line in proc.stdout:
        if filter is not None:
            line = filter(line)
        if line is not None:
            sys.stdout.write(line)
    proc.wait()


def parse_yolo_region(
    output_blob,
    layer_name,
    input_width,
    input_height,
    orig_width,
    orig_height,
    threshold,
):
    """
    Parse the YOLO inference output for a layer/region, returning a list of detections
    with coordinates within the original input image
    :param output_blob: Output blob containing inference layer's results
    :param layer_name: Name of output layer
    :param input_width: Inference input's width
    :param input_height: Inference input's height
    :param orig_width: Original input image's width
    :param orig_height: Original input image's height
    :param threshold: Confidence threshold for determining an object detection
    :return: List of all detections for the layer's output_blob
    """
    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # YOLOV4 parameters
    num_channels = 3
    num_coords = 4
    num_classes = 80
    # anchors needed for all the output layers
    anchors = {
        "conv2d_93/BiasAdd": [12.0, 16.0, 19.0, 36.0, 40.0, 28.0],  # layer conv2d_93/BiasAdd
        "conv2d_101/BiasAdd": [36.0, 75.0, 76.0, 55.0, 72.0, 146.0],  # layer conv2d_101/BiasAdd
        "conv2d_109/BiasAdd": [142.0, 110.0, 192.0, 243.0, 459.0, 401.0],  # layer conv2d_109/BiasAdd
    }
    anchor_key = [key for key in anchors.keys() if key in layer_name]
    layer_anchors = []
    if len(anchor_key) > 0:
        layer_anchors = anchors[anchor_key[0]]
    else:
        return []
    # ------------------------------------------ Extracting layer parameters ---------------------------------------
    objects = []
    bbox_size = num_coords + 1 + num_classes
    # output_blob layout is [N, C, H, W]
    output_height = output_blob.shape[2]
    output_width = output_blob.shape[3]
    # ------------------------------------------- Parsing YOLO Region output ---------------------------------------
    for row, col, n in np.ndindex(output_height, output_width, num_channels):
        # Getting raw values for each detection bounding bbox
        bbox = output_blob[0, n * bbox_size : (n + 1) * bbox_size, row, col]
        x, y = sigmoid(bbox[:2])
        width, height = bbox[2:4]
        object_probability = sigmoid(bbox[4])
        class_probabilities = sigmoid(bbox[5:])
        if object_probability < threshold:
            continue
        # Process raw value
        x = (col + x) / output_width
        y = (row + y) / output_height
        # np.exp overflows to inf (it does not raise) for very large raw values, so skip such boxes
        with np.errstate(over="ignore"):
            width = np.exp(width)
            height = np.exp(height)
        if not (np.isfinite(width) and np.isfinite(height)):
            continue
        width = width * layer_anchors[2 * n] / input_width
        height = height * layer_anchors[2 * n + 1] / input_height

        class_id = np.argmax(class_probabilities)
        confidence = class_probabilities[class_id] * object_probability
        if confidence < threshold:
            continue

        # translate coordinates within original image
        xmin = max(int((x - width / 2) * (orig_width)), 0)
        ymin = max(int((y - height / 2) * (orig_height)), 0)
        xmax = min(int((x + width / 2) * (orig_width)), orig_width)
        ymax = min(int((y + height / 2) * (orig_height)), orig_height)

        objects.append(
            dict(
                xmin=xmin, ymin=ymin, xmax=xmax, ymax=ymax,
                confidence=confidence.item(),
                class_id=int(class_id.item()),
            )
        )
    return objects


def filter_yolo_detections(detections, iou_threshold):
    """
    Filters the object detections using intersection over union to identify unique detections
    :param detections: List of detections
    :param iou_threshold: Intersection over union value above which the lower-confidence box of the same class is discarded
    :return: List of the remaining detections, sorted by descending confidence
    """
    def iou(box_1, box_2):
        width_of_overlap_area = min(box_1["xmax"], box_2["xmax"]) - max(
            box_1["xmin"], box_2["xmin"]
        )
        height_of_overlap_area = min(box_1["ymax"], box_2["ymax"]) - max(
            box_1["ymin"], box_2["ymin"]
        )
        if width_of_overlap_area < 0 or height_of_overlap_area < 0:
            area_of_overlap = 0
        else:
            area_of_overlap = width_of_overlap_area * height_of_overlap_area
        box_1_area = (box_1["ymax"] - box_1["ymin"]) * (box_1["xmax"] - box_1["xmin"])
        box_2_area = (box_2["ymax"] - box_2["ymin"]) * (box_2["xmax"] - box_2["xmin"])
        area_of_union = box_1_area + box_2_area - area_of_overlap
        if area_of_union == 0:
            return 0
        return area_of_overlap / area_of_union

    detections = sorted(detections, key=lambda obj: obj["confidence"], reverse=True)
    for i in range(len(detections)):
        if detections[i]["confidence"] == 0:
            continue
        for j in range(i + 1, len(detections)):
            # We perform IOU only on objects of same class
            if detections[i]["class_id"] != detections[j]["class_id"]:
                continue

            if iou(detections[i], detections[j]) > iou_threshold:
                detections[j]["confidence"] = 0

    return [det for det in detections if det["confidence"] > 0]

## Download and set up the validation dataset
The [Common Objects in Context (COCO)](https://cocodataset.org/#home) dataset will be downloaded to be used by the `omz_quantizer` and `accuracy_check` tools.  The COCO dataset must be set up as described on the Open Model Zoo [datasets.md: COCO](https://github.com/openvinotoolkit/open_model_zoo/blob/master/data/datasets.md#common-objects-in-context-coco) page.
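
For orientation, the sketch below shows the general layout of a COCO instances annotation file using a tiny hand-written stand-in rather than the real `instances_val2017.json`.  The image entry and box values are made up, and the helper `to_corners` is hypothetical; COCO stores each `bbox` as `[x, y, width, height]` measured from the top-left corner.

```python
from collections import Counter

# Tiny stand-in for an instances annotation file (values are made up)
coco_annotations = {
    "images": [{"id": 1, "file_name": "000000000001.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "person"}, {"id": 18, "name": "dog"}],
    "annotations": [
        {"image_id": 1, "category_id": 1, "bbox": [10.0, 20.0, 100.0, 200.0]},
        {"image_id": 1, "category_id": 18, "bbox": [300.0, 150.0, 120.0, 90.0]},
        {"image_id": 1, "category_id": 1, "bbox": [400.0, 30.0, 80.0, 160.0]},
    ],
}


def to_corners(bbox):
    """Convert a COCO [x, y, width, height] box to (xmin, ymin, xmax, ymax)."""
    x, y, width, height = bbox
    return (x, y, x + width, y + height)


# Count annotated objects per class name
category_names = {cat["id"]: cat["name"] for cat in coco_annotations["categories"]}
objects_per_class = Counter(
    category_names[ann["category_id"]] for ann in coco_annotations["annotations"]
)

print(objects_per_class)
print(to_corners(coco_annotations["annotations"][0]["bbox"]))
```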

In [None]:
def set_up_coco_dataset(output_dir):
    output_dir.mkdir(exist_ok=True, parents=True)

    # download zip files
    data_zipname = "val2017.zip"
    data_url = f"http://images.cocodataset.org/zips/{data_zipname}"
    data_zippath = nbutils.download_file(data_url, data_zipname, output_dir)

    annotations_zipname = "annotations_trainval2017.zip"
    annotations_url = f"http://images.cocodataset.org/annotations/{annotations_zipname}"
    annotations_zippath = nbutils.download_file(
        annotations_url, annotations_zipname, output_dir
    )

    # unzip zip files
    with zipfile.ZipFile(data_zippath, "r") as zip_ref:
        zip_ref.extractall(path=output_dir)

    # extract only the required annotation files, dropping their directory prefix
    required_files = ["instances_val2017.json", "person_keypoints_val2017.json"]
    with zipfile.ZipFile(annotations_zippath, "r") as zip_ref:
        for zip_info in zip_ref.infolist():
            if any(fn in zip_info.filename for fn in required_files):
                with zip_ref.open(zip_info) as zipped_file, open(
                    output_dir / Path(zip_info.filename).name, "wb"
                ) as disk_file:
                    shutil.copyfileobj(zipped_file, disk_file)

    # download the class labels
    labels_url = f"https://github.com/openvinotoolkit/open_model_zoo/raw/master/data/dataset_classes/{LABELS_PATH.name}"
    nbutils.download_file(labels_url, LABELS_PATH.name, output_dir)


if not LABELS_PATH.exists():
    set_up_coco_dataset(DATASET_DIR)

## Download model
The OpenVINO tool [`omz_downloader`](../104-model-tools/104-model-tools.ipynb) is used to automatically download files from the Open Model Zoo.

> **NOTE**: If model IR files are available from the Open Model Zoo, then the downloaded models will appear in the `intel` subdirectory.  If no model IR files are available, then the downloaded models will appear in the `public` directory.

In [None]:
!omz_downloader --name $OMZ_MODEL_NAME --output $MODEL_DIR

## Convert model to IR files

The public models from the Open Model Zoo are made available in their native framework file format and must be converted to OpenVINO Intermediate Representation (IR) files before running inference.  The OpenVINO tool [`omz_converter`](../104-model-tools/104-model-tools.ipynb) is used to convert Open Model Zoo models to the IR files necessary to run inference.

> **NOTE**: For models that are downloaded from the Open Model Zoo already as IR files, the converter utility will not do any conversion and will output the message "Skipping <model_name> (no conversions defined)".

In [None]:
!omz_converter --name $OMZ_MODEL_NAME --precisions FP32 --download_dir $MODEL_DIR  --output $MODEL_DIR

## Quantize the model to INT8
For models downloaded from the Open Model Zoo, the [`omz_quantizer`](../104-model-tools/104-model-tools.ipynb) tool is used to quantize the model to a lower precision (e.g. quantize FP32 to INT8 precision).

In [None]:
def filter_omz_quantizer_output(line):
    if (line.startswith("Quantization command") 
            or line.startswith("Moving") 
            or line.startswith("INFO")):
        return line
    return None


cmd = f"omz_quantizer --name {OMZ_MODEL_NAME} --model_dir {MODEL_DIR}  --output {MODEL_DIR}  --dataset_dir {DATASET_DIR} --precisions FP32-INT8"
run_command_line(cmd, filter_omz_quantizer_output)

## Run the model
Now that the model has been quantized, we will run inference using both the original FP32 model and the new INT8 quantized model to see their results.  First we will run the FP32 model.

> **NOTE**: Post-processing the inference results from the [yolo-v4-tf](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/yolo-v4-tf) model requires more than one step to gather detection results from all the output layers and then filter them into individual detections.  The code here that performs the necessary steps was derived from the [YOLOV4 code](https://github.com/openvinotoolkit/open_model_zoo/blob/master/demos/common/python/models/yolo.py) used by the Open Model Zoo [object_detection_demo](https://github.com/openvinotoolkit/open_model_zoo/tree/master/demos/object_detection_demo/python).

In [None]:
def annotate_image(image, detections):
    """
    Annotate image with detections by adding a label and box around each detection
    :param image: Input image to add annotations to
    :param detections: List of detections
    :return: none
    """
    # Convert the inference result to a class name using the labels file
    with open(LABELS_PATH) as f:
        labels = [line.rstrip() for line in f]

    # Draw boxes around each detected object with labels
    for det in detections:
        class_id = det["class_id"]
        conf = det["confidence"]
        det_label = labels[class_id]
        color = (
            min(class_id * 12.5, 255),
            min(class_id * 7, 255),
            min(class_id * 5, 255),
        )
        cv2.rectangle(
            image, (det["xmin"], det["ymin"]), (det["xmax"], det["ymax"]), (color), 2
        )
        cv2.putText(
            image,
            "#" + det_label + " " + str(round(conf * 100, 1)) + " %",
            (det["xmin"], det["ymin"] - 7),
            cv2.FONT_HERSHEY_COMPLEX,
            1,
            color,
            2,
        )


def yolo_apply_results_to_image(input_image, results, image):
    """
    Post-process YOLO inference results and apply to the original input image
    :param input_image: Input image
    :param results: Inference results
    :param image: Original image to annotate
    :return: none
    """
    N, C, input_H, input_W = input_image.shape

    orig_H = image.shape[0]
    orig_W = image.shape[1]

    # Parse inference results
    detections = []
    for layer_name, out_blob in results.items():
        detections += parse_yolo_region(
            out_blob, layer_name, input_W, input_H, orig_W, orig_H, 0.5
        )

    filtered = filter_yolo_detections(detections, 0.4)

    # Annotate image with results
    annotate_image(image, filtered)


def ssd_mobilenet_apply_results_to_image(input_image, results, image):
    """
    Post-process SSD MobileNet inference results and apply to the original input image
    :param input_image: Input image
    :param results: Inference results
    :param image: Original image to annotate
    :return: none
    """

    # Process results into detections used for annotating the image
    # inference result layer [1,1,N,7] for N detections with 7 parameters [image_id, label, conf, x_min, y_min, x_max, y_max]
    img_H = image.shape[0]
    img_W = image.shape[1]
    detections = []
    for res in next(iter(results.values()))[0][0]:
        image_id = res[0]
        if image_id >= 0:
            class_id = int(res[1])
            conf = res[2]
            xmin = int(res[3] * img_W)
            ymin = int(res[4] * img_H)
            xmax = int(res[5] * img_W)
            ymax = int(res[6] * img_H)
            detections.append(
                dict(
                    xmin=xmin, ymin=ymin, xmax=xmax, ymax=ymax,
                    confidence=conf, class_id=class_id,
                )
            )

    # Annotate image with results
    annotate_image(image, detections)


def run_inference(model_base_path, image_path):
    """
    Runs inference on an image using the given model and then displays the results
    :param model_base_path: String containing path and file name of model excluding the extension (i.e. ".xml")
    :param image_path: String containing full path to the input image
    :return: none
    """
    # Load the model
    ie = IECore()

    # Create the network from the model
    net = ie.read_network(
        model=f"{model_base_path}.xml", weights=f"{model_base_path}.bin"
    )
    exec_net = ie.load_network(network=net, device_name="CPU")

    input_key = next(iter(exec_net.input_info))

    # Load image
    image = nbutils.load_image(image_path)
    # N,C,H,W = batch size, number of channels, height, width
    N, C, H, W = exec_net.input_info[input_key].tensor_desc.dims
    # The network expects images in BGR format, same as OpenCV so just resize
    input_image = cv2.resize(src=image, dsize=(W, H))
    # reshape image to network input shape ([H,W,C]->[N,C,H,W])
    input_image = np.expand_dims(input_image.transpose(2, 0, 1), 0)

    # Run inference; for the SSD model the result is [1,1,N,7] for N detections with 7 parameters
    # [image_id, label, conf, x_min, y_min, x_max, y_max]; the YOLO model returns one blob per output layer
    results = exec_net.infer(inputs={input_key: input_image})

    # Annotate image with results
    if USING_YOLOV4_MODEL:
        yolo_apply_results_to_image(input_image, results, image)
    else:
        ssd_mobilenet_apply_results_to_image(input_image, results, image)

    # Display annotated image (imshow requires RGB format, so convert BGR->RGB)
    plt.imshow(nbutils.to_rgb(image))


run_inference(f"{MODEL_FP32_DIR}/{OMZ_MODEL_NAME}", str(TEST_INPUT_IMAGE))

Now, we run the INT8 model and can compare the results to the FP32 results above.

In [None]:
run_inference(f"{MODEL_FP32INT8_DIR}/{OMZ_MODEL_NAME}", str(TEST_INPUT_IMAGE))

## Set up to run accuracy_check
We will check the accuracy of both the FP32 and INT8 models using [OpenVINO's Accuracy Checker Tool](https://docs.openvino.ai/latest/omz_tools_accuracy_checker.html), [`accuracy_check`](../104-model-tools/104-model-tools.ipynb).  For each model, the Open Model Zoo includes the necessary `accuracy-check.yml` configuration file, along with the global [`dataset_definitions.yml`](https://github.com/openvinotoolkit/open_model_zoo/blob/master/data/dataset_definitions.yml) file, both of which are needed to run the `accuracy_check` tool.

In [None]:
# retrieve files needed by accuracy_check
OMZ_GITHUB_URL = "https://github.com/openvinotoolkit/open_model_zoo/raw/master"
dataset_def_yml = "dataset_definitions.yml"
dataset_def_yml_url = f"{OMZ_GITHUB_URL}/data/{dataset_def_yml}"
model_acheck_yml = "accuracy-check.yml"
model_acheck_yml_url = (
    f"{OMZ_GITHUB_URL}/models/public/{OMZ_MODEL_NAME}/{model_acheck_yml}"
)

model_acheck_yml_path = nbutils.download_file(
    model_acheck_yml_url, model_acheck_yml, OUTPUT_DIR
)

dataset_def_yml_path = nbutils.download_file(
    dataset_def_yml_url, dataset_def_yml, OUTPUT_DIR
)

## Check accuracy of the model before and after quantization
Now we will run `accuracy_check` for both the original FP32 and the new quantized INT8 models to compare accuracies.  First we will check the accuracy of the FP32 model.

> **NOTE**: In this notebook, we run accuracy_check on a subset of the images in the dataset which takes less time.  For a more accurate check, all images should be used which may be done by not specifying the "-ss <number>" command line argument.

> **NOTE**: The higher the percentage reported by `accuracy_check` the better; however, most models are not 100% accurate.  For reference on what to expect from the model, the details for [ssd_mobilenet_v1_coco](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/ssd_mobilenet_v1_coco) on the Open Model Zoo include the accuracy of the original trained model.

In [None]:
# set to '-ss <number>' to use only <number> of images, or set '' to use all images
num_subsamples = "-ss 300"

cmd = f"accuracy_check -tf dlsdk -td CPU -s {DATASET_DIR} -d {dataset_def_yml_path} -c {model_acheck_yml_path} -m {MODEL_FP32_DIR} {num_subsamples}"
run_command_line(cmd)

Now, we check the accuracy of the INT8 model and can compare the results to the FP32 results above.

In [None]:
cmd = f"accuracy_check -tf dlsdk -td CPU -s {DATASET_DIR} -d {dataset_def_yml_path} -c {model_acheck_yml_path} -m {MODEL_FP32INT8_DIR} {num_subsamples}"
run_command_line(cmd)

## Benchmark the model before and after quantization
Finally, we will measure the inference performance of the FP32 and INT8 models using [OpenVINO's Benchmark Tool](https://docs.openvinotoolkit.org/latest/openvino_inference_engine_tools_benchmark_tool_README.html), [`benchmark_app`](../104-model-tools/104-model-tools.ipynb).
  
> **NOTE**: In this notebook, we run benchmark_app for 15 seconds ("-t <time_seconds>" argument) to give a quick indication of performance. For more accurate performance, we recommend running benchmark_app for 60 seconds in a terminal/command prompt after closing other applications.
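
One way to compare the two runs is to extract the reported throughput from each and compute the ratio.  The sketch below assumes that the benchmark output contains a summary line of the form `Throughput: <number> FPS`; the sample text and its numbers are placeholders and are not measured results, and the function name `parse_throughput` is hypothetical.

```python
import re


def parse_throughput(output_text):
    """Return the FPS value from a "Throughput: <number> FPS" line, or None if absent."""
    match = re.search(r"Throughput:\s*([0-9]+(?:\.[0-9]+)?)\s*FPS", output_text)
    return float(match.group(1)) if match else None


# Placeholder output text; these numbers are NOT real benchmark results
fp32_output = "Count: 1500 iterations\nThroughput: 100.00 FPS\n"
int8_output = "Count: 4500 iterations\nThroughput: 300.00 FPS\n"

fp32_fps = parse_throughput(fp32_output)
int8_fps = parse_throughput(int8_output)
speedup = int8_fps / fp32_fps

print(f"FP32: {fp32_fps} FPS, INT8: {int8_fps} FPS, speedup: {speedup:.2f}x")
```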

In [None]:
def filter_benchmark_output(line):
    if not (line.startswith(r"[") or line.startswith("  ") or len(line.rstrip()) < 1):
        return line
    return None


# time to run benchmark
time_secs = 15

cmd = f"benchmark_app -m {MODEL_FP32_DIR}/{OMZ_MODEL_NAME}.xml -d CPU -api async -t {time_secs}"
run_command_line(cmd, filter_benchmark_output)
print()
cmd = f"benchmark_app -m {MODEL_FP32INT8_DIR}/{OMZ_MODEL_NAME}.xml -d CPU -api async -t {time_secs}"
run_command_line(cmd, filter_benchmark_output)

## Cleanup
Optionally, all the downloaded and generated files may be removed by setting `do_cleanup` to `True`.

In [None]:
do_cleanup = False
if do_cleanup:
    shutil.rmtree(DATASET_DIR)
    shutil.rmtree(MODEL_DIR)
    shutil.rmtree(OUTPUT_DIR)