# Convert and Optimize YOLOv7 with OpenVINO™

The YOLOv7 algorithm is making big waves in the computer vision and machine learning communities.
It is a real-time object detection algorithm that takes an image as input
and predicts bounding boxes and class probabilities for each object in the image.

YOLO stands for “You Only Look Once” and is a popular family of real-time object detection algorithms.
The original YOLO object detector was first released in 2016. Since then, different versions and variants of YOLO have been proposed, each providing a significant increase in performance and efficiency.
YOLOv7 is the next stage in the evolution of the YOLO model family, providing greatly improved real-time object detection accuracy without increasing inference cost.
More details about its implementation can be found in the original model [paper](https://arxiv.org/abs/2207.02696) and [repository](https://github.com/WongKinYiu/yolov7).

Real-time object detection is often used as a key component in computer vision systems. 
Applications that use real-time object detection models include video analytics, robotics, autonomous vehicles, multi-object tracking and object counting, medical image analysis, and many others.

This short tutorial demonstrates step by step how to convert a PyTorch YOLOv7 model to OpenVINO IR and optimize it using the NNCF post-training quantization (PTQ) API.

The tutorial consists of the following steps:
- Prepare PyTorch model
- Download and prepare dataset
- Validate original model
- Convert PyTorch model to ONNX
- Convert ONNX model to OpenVINO IR
- Validate converted model
- Prepare and run optimization pipeline
- Compare accuracy of the FP32 and quantized models
- Compare performance of the FP32 and quantized models

## Get PyTorch model

Generally, a PyTorch model is an instance of the `torch.nn.Module` class, initialized by a state dictionary with model weights.
We will use the YOLOv7 tiny model pretrained on the COCO dataset, which is available in this [repo](https://github.com/WongKinYiu/yolov7).
Typical steps for getting a pretrained model:
1. Create an instance of the model class
2. Load the checkpoint state dict, which contains the pretrained model weights
3. Switch the model to evaluation mode so that some operations run in inference mode

In our case, the model authors already provide a tool that converts the model to ONNX, so it is not necessary to do these steps manually.

## Prerequisites

In [None]:
import sys
from pathlib import Path
sys.path.append("../utils")
from notebook_utils import download_file

In [None]:
# Download YOLOv7 code
if not Path('yolov7').exists():
    !git clone https://github.com/WongKinYiu/yolov7
%cd yolov7

In [None]:
# Download pretrained model weights
MODEL_LINK = "https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-tiny.pt"
DATA_DIR = Path("data/")
MODEL_DIR = Path("model/")
MODEL_DIR.mkdir(exist_ok=True)
DATA_DIR.mkdir(exist_ok=True)

download_file(MODEL_LINK, directory=MODEL_DIR, show_progress=True)


## Check model inference

The `detect.py` script runs PyTorch model inference and saves the resulting image.

In [None]:
!python detect.py --weights model/yolov7-tiny.pt --conf 0.25 --img-size 640 --source inference/images/horses.jpg


In [None]:
# visualize prediction result
from PIL import Image
Image.open('runs/detect/exp/horses.jpg')


## Export to ONNX

To obtain the ONNX model, we will use the `export.py` script. Let's check its arguments.

In [None]:
!python export.py --help


The most important parameters:
* `--weights` - path to model weights checkpoint
* `--img-size` - size of input image for ONNX tracing

As ONNX is a less flexible format than PyTorch, there is also an option to configure parameters for the results postprocessing included in the model:
* `--end2end` - export full model to ONNX including postprocessing
* `--grid` - export Detect layer as part of model
* `--topk-all` - top-k elements for all images
* `--iou-thres` - intersection over union threshold for NMS
* `--conf-thres` - minimal confidence threshold
* `--max-wh` - max bounding box width and height for NMS
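
To make the roles of the confidence threshold, the IoU threshold, and the top-k limit more concrete, here is a minimal pure-Python sketch of greedy non-maximum suppression. It is an illustration only: the `iou` and `nms` helpers and the sample boxes are hypothetical, the suppression is class-agnostic, and this is not the implementation that `export.py` embeds in the model. The default values mirror the ones printed by the export script (`conf_thres=0.25`, `iou_thres=0.45`, `topk_all=100`).

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, conf_thres=0.25, iou_thres=0.45, topk=100):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    # Drop low-confidence boxes, then visit the rest from best to worst
    order = sorted((i for i, s in enumerate(scores) if s >= conf_thres),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box
        if all(iou(boxes[i], boxes[j]) <= iou_thres for j in keep):
            keep.append(i)
        if len(keep) == topk:
            break
    return keep


# Hypothetical detections: two near-duplicates, one separate box, one weak box
boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 260, 260), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.1]
kept = nms(boxes, scores)
print(kept)
```

The near-duplicate box is suppressed because its IoU with the best box exceeds `iou_thres`, and the weak box never enters the comparison because it falls below `conf_thres`.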

In [34]:
!python export.py --weights model/yolov7-tiny.pt --grid


Import onnx_graphsurgeon failure: No module named 'onnx_graphsurgeon'
Namespace(batch_size=1, conf_thres=0.25, device='cpu', dynamic=False, dynamic_batch=False, end2end=False, fp16=False, grid=True, img_size=[640, 640], include_nms=False, int8=False, iou_thres=0.45, max_wh=None, simplify=False, topk_all=100, weights='model/yolov7-tiny.pt')
YOLOR 🚀 v0.1-115-g072f76c torch 1.13.0+cu117 CPU

Fusing layers... 
Model Summary: 200 layers, 6219709 parameters, 6219709 gradients
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]

Starting TorchScript export with torch 1.13.0+cu117...
  if self.grid[i].shape[2:4] != x[i].shape[2:4]:
TorchScript export success, saved as model/yolov7-tiny.torchscript.pt
CoreML export failure: No module named 'coremltools'

Starting TorchScript-Lite export with torch 1.13.0+cu117...
TorchScript-Lite export success, saved as model/yolov7-tiny.torchscript.ptl

Starting ONNX export with onnx 1.11.0...
  if augment:
  if profile:
  if profile:
ONNX 

## Convert ONNX Model to OpenVINO Intermediate Representation
While ONNX models are directly supported by OpenVINO™, it can be useful to convert them to IR format to take advantage of OpenVINO optimization tools and features.
The `mo.convert_model` function can be used to convert the model using OpenVINO Model Optimizer capabilities.
It returns an instance of the OpenVINO Model class, which is ready to use in the Python interface and can be serialized to IR for future execution.

In [None]:
from openvino.tools import mo
from openvino.runtime import serialize

model = mo.convert_model(input_model='model/yolov7-tiny.onnx')
# serialize model for saving IR
serialize(model, 'model/yolov7-tiny.xml')


## Verify model inference

In [None]:
import cv2
import numpy as np
from utils.datasets import letterbox
from utils.general import scale_coords
from utils.plots import plot_one_box

def preprocess_image(img0, img_size=640, stride=32):
    # Resize and pad to a fixed img_size x img_size input; auto=False keeps the padded shape static
    img, ratio, dwdh = letterbox(img0, new_shape=img_size, stride=stride, auto=False)

    # Convert
    img = img[:, :, ::-1].transpose(2, 0, 1)  # BGR to RGB, HWC to CHW (3x640x640)
    img = np.ascontiguousarray(img)
    # letterbox returns a (width, height) ratio pair; both values are equal here, so keep one
    return img, ratio[0], dwdh


def prepare_input_tensor(image):
    img = image.astype(np.float32)  # uint8 to fp32
    img /= 255.0  # 0 - 255 to 0.0 - 1.0
    if img.ndim == 3:
        img = np.expand_dims(img, 0)  # add batch dimension
    return img
names = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
         'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
         'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
         'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
         'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
         'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
         'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
         'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
         'hair drier', 'toothbrush']
colors = {name: [np.random.randint(0, 255) for _ in range(3)]
          for i, name in enumerate(names)}


In [None]:
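def get_boxes(output, dwdh, ratio, flatten=False):
    # Each output row is expected as (batch_id, x0, y0, x1, y1, cls_id, score)
    # An empty output yields no batches instead of failing on np.max
    batch_ids = int(np.max(output[:, 0])) if len(output) else -1
    total_boxes, total_labels, total_scores = [], [], []
    total_predictions = []
    for batch_id in range(batch_ids + 1):
        boxes, scores, labels = [], [], []
        predictions = []
        # keep only the detections that belong to the current image
        batch_elem_out = output[output[:, 0] == batch_id]
        for (_, x0, y0, x1, y1, cls_id, score) in batch_elem_out:
            box = np.array([x0,y0,x1,y1])
            box -= np.array(dwdh*2)  # remove letterbox padding
            box /= ratio  # rescale to the original image size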
            score = float(score)
            if not flatten:
                boxes.append(box)
                scores.append(score)
                labels.append(cls_id)
            else:
                predictions.append(np.array([*box, score, cls_id]))
        total_boxes.append(boxes)
        total_labels.append(labels)
        total_scores.append(scores)
        total_predictions.append(predictions)
    if not flatten:
        return total_boxes, total_scores, total_labels
    return total_predictions


def draw_boxes(images, total_boxes, total_scores, total_labels):
    images_with_boxes = []
    for img, boxes, scores, labels in zip(images, total_boxes, total_scores, total_labels):
        for box, score, cls_id in zip(boxes, scores, labels):
            box = box.round().astype(np.int32).tolist()
            cls_id = int(cls_id)
            score = round(float(score),3)
            name = names[cls_id]
            color = colors[name]
            name += ' '+str(score)
            img = cv2.rectangle(img, box[:2], box[2:], color, 2)
            img = cv2.putText(img, name,(box[0], box[1] - 2), cv2.FONT_HERSHEY_SIMPLEX, 0.75, [225, 255, 255], thickness=2)
        # keep images without detections too, so indices match the input list
        images_with_boxes.append(img)
    # return only after every image has been processed
    return images_with_boxes
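
`get_boxes` maps predictions back to the original image by subtracting the letterbox padding `dwdh` and dividing by the scale `ratio`. The standalone sketch below illustrates that arithmetic in plain Python. The helpers `letterbox_params`, `to_letterboxed`, and `to_original` are hypothetical names written for this illustration (they are not functions from the YOLOv7 repository), and the 1280x480 image size is made up.

```python
def letterbox_params(src_hw, dst_hw=(640, 640)):
    """Scale ratio and per-side padding (dw, dh) for an aspect-preserving resize."""
    src_h, src_w = src_hw
    dst_h, dst_w = dst_hw
    ratio = min(dst_h / src_h, dst_w / src_w)
    new_w, new_h = round(src_w * ratio), round(src_h * ratio)
    # Remaining space is split evenly between the two sides
    dw, dh = (dst_w - new_w) / 2, (dst_h - new_h) / 2
    return ratio, (dw, dh)


def to_letterboxed(box, ratio, dwdh):
    """Forward mapping: scale by the ratio, then shift by the padding."""
    dw, dh = dwdh
    x0, y0, x1, y1 = box
    return (x0 * ratio + dw, y0 * ratio + dh, x1 * ratio + dw, y1 * ratio + dh)


def to_original(box, ratio, dwdh):
    """Inverse mapping: subtract the padding, then divide by the ratio."""
    dw, dh = dwdh
    x0, y0, x1, y1 = box
    return ((x0 - dw) / ratio, (y0 - dh) / ratio, (x1 - dw) / ratio, (y1 - dh) / ratio)


ratio, dwdh = letterbox_params((480, 1280))  # hypothetical 1280x480 image
box = (100.0, 50.0, 300.0, 200.0)            # box in original image coordinates
restored = to_original(to_letterboxed(box, ratio, dwdh), ratio, dwdh)
print(ratio, dwdh, restored)
```

For this image the width is the limiting side, so the ratio is 0.5 and all padding goes to the top and bottom.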
        

In [None]:
from openvino.runtime import Core
core = Core()
# read converted model
model = core.read_model('model/yolov7-tiny.xml')
# load model on CPU device
compiled_model = core.compile_model(model, 'CPU')
# prepare output blob for getting results
output = compiled_model.output(0)


In [None]:
# read input data example
img = cv2.imread('inference/images/horses.jpg')
# preprocess input data
im, ratio, dwdh = preprocess_image(img)
# normalize to 0.0 - 1.0 and add batch dimension
input_tensor = prepare_input_tensor(im)
# run inference and get result for desired output
result = compiled_model([input_tensor])[output]
# postprocess result
boxes, scores, labels = get_boxes(result, dwdh, ratio)
# draw boxes on a copy so the source image stays unchanged
images_with_boxes = draw_boxes([img.copy()], boxes, scores, labels)

In [None]:
# visualize results
Image.fromarray(images_with_boxes[0][:, :, ::-1])


## Verify model accuracy

### Download dataset

YOLOv7 tiny is pretrained on the COCO dataset, so we need to download that dataset to evaluate model accuracy. According to the instructions provided in the model repo, we also need to download the annotations in the format prepared by the model author in order to use the original model evaluation scripts.

In [None]:
from zipfile import ZipFile

sys.path.append("../../utils")
from notebook_utils import download_file

data_url = "http://images.cocodataset.org/zips/val2017.zip"
labels_url = "https://github.com/ultralytics/yolov5/releases/download/v1.0/coco2017labels-segments.zip"

out_dir = Path('.')

download_file(data_url, directory=out_dir, show_progress=True)
download_file(labels_url, directory=out_dir, show_progress=True)
with ZipFile('coco2017labels-segments.zip' , "r") as zip_ref:
    zip_ref.extractall(out_dir)
with ZipFile('val2017.zip' , "r") as zip_ref:
    zip_ref.extractall(out_dir / 'coco/images')
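
Because the validation archive is large, re-running the cell above repeats the extraction every time. One possible way to avoid that, shown here as a sketch rather than as part of the original pipeline, is to leave a marker file after a successful extraction. The `extract_once` helper and the marker file name are hypothetical, and the demonstration uses a tiny throwaway archive instead of the real COCO files.

```python
import tempfile
from pathlib import Path
from zipfile import ZipFile


def extract_once(archive: Path, target_dir: Path) -> bool:
    """Extract `archive` into `target_dir` unless a marker shows it was done before.

    Returns True if extraction happened, False if it was skipped.
    """
    marker = target_dir / f".{archive.name}.extracted"  # hypothetical marker name
    if marker.exists():
        return False
    target_dir.mkdir(parents=True, exist_ok=True)
    with ZipFile(archive, "r") as zip_ref:
        zip_ref.extractall(target_dir)
    marker.touch()  # written only after extraction finished without errors
    return True


# Demonstration on a small temporary archive
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    archive = tmp / "demo.zip"
    with ZipFile(archive, "w") as zf:
        zf.writestr("images/000000000001.txt", "placeholder")
    first = extract_once(archive, tmp / "out")
    second = extract_once(archive, tmp / "out")
    extracted_ok = (tmp / "out" / "images" / "000000000001.txt").exists()
print(first, second, extracted_ok)
```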

### Create dataloader

In [None]:
from collections import namedtuple
import yaml
from utils.datasets import create_dataloader
from utils.general import (
    coco80_to_coco91_class, check_dataset, check_file, check_img_size, check_requirements,
    box_iou, non_max_suppression, scale_coords, xyxy2xywh, xywh2xyxy, set_logging, increment_path, colorstr
)

# read dataset config
data = 'data/coco.yaml'
with open(data) as f:
    data = yaml.load(f, Loader=yaml.SafeLoader)

# Dataloader
task = 'val'  # path to train/val/test images
Option = namedtuple('Options', ['single_cls']) # imitation of commandline provided options for single class evaluation
opt = Option(False)
dataloader = create_dataloader(data[task], 640, 1, 32, opt, pad=0.5, rect=True,
                                       prefix=colorstr(f'{task}: '))[0]


### Define validation function

We will reuse the validation metrics provided in the model repo, adapted to our case (extra steps removed). The original model evaluation procedure can be found in this [file](https://github.com/WongKinYiu/yolov7/blob/main/test.py).

In [None]:

import json
import os
from pathlib import Path

import numpy as np
import torch

from tqdm.notebook import tqdm


from utils.metrics import ap_per_class, ConfusionMatrix


# test function        
def test(data,
         model,
         dataloader,
         single_cls=False,
         save_dir=Path(''),  # for saving images
         is_coco=False,
         v5_metric=False):

    check_dataset(data)  # check
    nc = 1 if single_cls else int(data['nc'])  # number of classes
    iouv = torch.linspace(0.5, 0.95, 10)  # iou vector for mAP@0.5:0.95
    niou = iouv.numel()
    opt = Option(False)
    
    seen = 0
    confusion_matrix = ConfusionMatrix(nc=nc)
    coco91class = coco80_to_coco91_class()
    s = ('%20s' + '%12s' * 6) % ('Class', 'Images', 'Labels', 'P', 'R', 'mAP@.5', 'mAP@.5:.95')
    p, r, f1, mp, mr, map50, map, t0, t1 = 0., 0., 0., 0., 0., 0., 0., 0., 0.
    jdict, stats, ap, ap_class, wandb_images = [], [], [], [], []
    for batch_i, (img, targets, paths, shapes) in enumerate(tqdm(dataloader, desc=s)):
        img, ratio, dwdh = preprocess_image(np.transpose(img[0].numpy(), (1, 2, 0)))
        img = prepare_input_tensor(img)  # normalize and add batch dimension
        nb, _, height, width = img.shape  # batch size, channels, height, width
        # Run model
        out = model([img])[model.output(0)]  # inference outputs
        out = get_boxes(out, dwdh, ratio, flatten=True)
        out = torch.from_numpy(np.array(out))
        targets[:, 2:] *= torch.Tensor([width, height, width, height])  # to pixels
        lb = []

        # Statistics per image
        for si, pred in enumerate(out):
            labels = targets[targets[:, 0] == si, 1:]
            nl = len(labels)
            tcls = labels[:, 0].tolist() if nl else []  # target class
            seen += 1

            if len(pred) == 0:
                if nl:
                    stats.append((torch.zeros(0, niou, dtype=torch.bool), torch.Tensor(), torch.Tensor(), tcls))
                continue

            # Predictions
            predn = pred.clone()

            # Assign all predictions as incorrect
            correct = torch.zeros(pred.shape[0], niou, dtype=torch.bool)
            if nl:
                detected = []  # target indices
                tcls_tensor = labels[:, 0]

                # target boxes
                tbox = xywh2xyxy(labels[:, 1:5])
                scale_coords(img[si].shape[1:], tbox, shapes[si][0], shapes[si][1])  # native-space labels

                # Per target class
                for cls in torch.unique(tcls_tensor):
                    ti = (cls == tcls_tensor).nonzero(as_tuple=False).view(-1)  # target indices
                    pi = (cls == pred[:, 5]).nonzero(as_tuple=False).view(-1)  # prediction indices

                    # Search for detections
                    if pi.shape[0]:
                        # Prediction to target ious
                        ious, i = box_iou(predn[pi, :4], tbox[ti]).max(1)  # best ious, indices

                        # Append detections
                        detected_set = set()
                        for j in (ious > iouv[0]).nonzero(as_tuple=False):
                            d = ti[i[j]]  # detected target
                            if d.item() not in detected_set:
                                detected_set.add(d.item())
                                detected.append(d)
                                correct[pi[j]] = ious[j] > iouv  # iou_thres is 1xn
                                if len(detected) == nl:  # all targets already located in image
                                    break

            # Append statistics (correct, conf, pcls, tcls)
            stats.append((correct.cpu(), pred[:, 4].cpu(), pred[:, 5].cpu(), tcls))

    # Compute statistics
    stats = [np.concatenate(x, 0) for x in zip(*stats)]  # to numpy
    if len(stats) and stats[0].any():
        p, r, ap, f1, ap_class = ap_per_class(*stats, v5_metric=v5_metric, save_dir=save_dir, names=names)
        ap50, ap = ap[:, 0], ap.mean(1)  # AP@0.5, AP@0.5:0.95
        mp, mr, map50, map = p.mean(), r.mean(), ap50.mean(), ap.mean()
        nt = np.bincount(stats[3].astype(np.int64), minlength=nc)  # number of targets per class
    else:
        nt = torch.zeros(1)

    return mp, mr, map50, map
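
The `correct` matrix in `test` records, for every prediction, whether it counts as a hit at each of the ten IoU thresholds from 0.5 to 0.95. The following sketch illustrates that bookkeeping with plain Python lists. The helper names and the sample IoU values are hypothetical, and the sketch stops at precision and recall for a single threshold; it does not compute the area under the precision-recall curve that `ap_per_class` uses for AP.

```python
# IoU thresholds 0.50, 0.55, ..., 0.95, the same grid as torch.linspace(0.5, 0.95, 10)
iouv = [round(0.5 + 0.05 * k, 2) for k in range(10)]


def correct_row(best_iou):
    """One row of the `correct` matrix: is the prediction a hit at each threshold?"""
    return [best_iou > thr for thr in iouv]


def precision_recall(hits, n_targets):
    """Precision and recall for a list of predictions at a single IoU threshold."""
    tp = sum(hits)
    precision = tp / len(hits) if hits else 0.0
    recall = tp / n_targets if n_targets else 0.0
    return precision, recall


# Hypothetical best IoUs of three predictions of one class; the image has 4 targets
best_ious = [0.92, 0.63, 0.41]
correct = [correct_row(v) for v in best_ious]
p50, r50 = precision_recall([row[0] for row in correct], n_targets=4)  # IoU 0.50
p75, r75 = precision_recall([row[5] for row in correct], n_targets=4)  # IoU 0.75
print(p50, r50, p75, r75)
```

Raising the threshold from 0.50 to 0.75 turns the second prediction from a hit into a miss, which is why metrics averaged over 0.5:0.95 are stricter than the ones at 0.5.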

In [None]:
result = test(data=data, model=compiled_model, dataloader=dataloader)

In [None]:
result

## Optimize model using NNCF Post-training Quantization API

In [None]:
import nncf

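def transform_fn(data_batch):
    # the dataloader yields (images, targets, paths, shapes); only images are needed
    img = data_batch[0]
    # CHW tensor to HWC array, then the same preprocessing as for inference
    img, _, _ = preprocess_image(np.transpose(img[0].numpy(), (1, 2, 0)))
    return prepare_input_tensor(img)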

quantization_dataset = nncf.Dataset(dataloader, transform_fn)

In [None]:
quantized_model = nncf.quantize(model, quantization_dataset, preset=nncf.QuantizationPreset.MIXED)

serialize(quantized_model, 'model/yolov7-tiny_int8.xml')
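
For intuition about what 8-bit quantization does to values, the sketch below quantizes single numbers in plain Python. The `MIXED` preset is documented by NNCF as symmetric quantization for weights and asymmetric quantization for activations, so both variants are shown. This is a simplified illustration under assumed value ranges: the helper names are hypothetical, and NNCF's actual algorithm (range estimation on the calibration dataset, per-channel scales, and so on) is more involved.

```python
def quantize_symmetric(x, max_abs, bits=8):
    """Signed quantization with zero-point 0 (the scheme MIXED applies to weights)."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    scale = max_abs / qmax
    q = max(-qmax, min(qmax, round(x / scale)))  # clip to the representable range
    return q, q * scale  # integer code and dequantized value


def quantize_asymmetric(x, lo, hi, bits=8):
    """Unsigned quantization with a zero-point (the scheme MIXED applies to activations)."""
    qmax = 2 ** bits - 1  # 255 for 8 bits
    scale = (hi - lo) / qmax
    zero_point = round(-lo / scale)
    q = max(0, min(qmax, round(x / scale) + zero_point))
    return q, (q - zero_point) * scale


# Assumed ranges: weights in [-1.27, 1.27], activations in [0.0, 5.1]
w_q, w_hat = quantize_symmetric(0.5, max_abs=1.27)
a_q, a_hat = quantize_asymmetric(2.0, lo=0.0, hi=5.1)
print(w_q, w_hat, a_q, a_hat)
```

Values outside the assumed range are clipped, which is one reason the calibration dataset should be representative of real inputs.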

## Validate quantized model inference

In [None]:
compiled_int8_model = core.compile_model(quantized_model, 'CPU')
result = compiled_int8_model([prepare_input_tensor(im)])[compiled_int8_model.output(0)]
boxes, scores, labels = get_boxes(result, dwdh, ratio)
images_with_boxes = draw_boxes([img.copy()], boxes, scores, labels)
Image.fromarray(images_with_boxes[0][:, :, ::-1])

## Validate quantized model accuracy

In [None]:
int8_result = test(data=data, model=compiled_int8_model, dataloader=dataloader)

In [None]:
int8_result

## Compare Performance of the Original and Quantized Models
Finally, use [Benchmark Tool](https://docs.openvino.ai/latest/openvino_inference_engine_tools_benchmark_tool_README.html) to measure the inference performance of the `FP32` and `INT8` models.

> NOTE: For more accurate performance, it is recommended to run `benchmark_app` in a terminal/command prompt after closing other applications. Run `benchmark_app -m model.xml -d CPU` to benchmark async inference on CPU for one minute. Change `CPU` to `GPU` to benchmark on GPU. Run `benchmark_app --help` to see an overview of all command-line options.

In [None]:
# Inference FP32 model (OpenVINO IR)
!benchmark_app -m model/yolov7-tiny.xml -d CPU -api async

In [None]:

# Inference INT8 model (OpenVINO IR)
!benchmark_app -m model/yolov7-tiny_int8.xml -d CPU -api async
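
For a quick in-notebook comparison, a small timing harness can complement `benchmark_app`. The sketch below is a generic helper written for illustration (the `benchmark` function is hypothetical): it measures synchronous calls, so its numbers are not directly comparable to the async throughput reported by `benchmark_app`. A stand-in workload is used here; in the notebook, the callable could wrap an inference call on a compiled model.

```python
import statistics
import time


def benchmark(fn, n_iter=50, warmup=5):
    """Time `fn()` and return (median latency in ms, throughput in calls per second)."""
    for _ in range(warmup):  # warm-up calls are not measured
        fn()
    latencies = []
    start = time.perf_counter()
    for _ in range(n_iter):
        t0 = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - t0) * 1000)
    total = time.perf_counter() - start
    return statistics.median(latencies), n_iter / total


# Stand-in workload used only to demonstrate the harness
median_ms, fps = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"median latency: {median_ms:.3f} ms, throughput: {fps:.1f} calls/s")
```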