## YOLOv5 Object Detection Inference and OpenVINO Optimization

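This example script demonstrates how to train an object detection model on a custom dataset with YOLOv5, and then walks through the detailed steps of running forward inference with the trained model.
Finally, it demonstrates compiling and optimizing the YOLOv5 model with Intel OpenVINO.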

Kernel: conda_pytorch_latest_p36
PyTorch version: 1.10.0 (as printed by the first cell below)

#### Step 1: Train an object detection model on a custom dataset with the official YOLOv5

- Dataset path: s3://neo-models-zoo/datasets/widerperson.zip
- Dataset description: http://www.cbsr.ia.ac.cn/users/sfzhang/widerperson/
- YOLOv5 repo: https://github.com/ultralytics/yolov5.git
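The training command below passes `--data factory.yaml`, but the notebook does not show that file. A YOLOv5 dataset YAML lists the dataset root, the train/val image directories, the class count `nc` and the class `names`. The sketch below generates such a file from the 14 class names used in Step 3 (which also prints `number of classes = 14`). The `root`, `train` and `val` paths and the helper name `build_dataset_yaml` are hypothetical placeholders, not the author's actual setup.

```python
# Class names in the order used by the inference cells in Step 3 (list index == class id)
CLASS_NAMES = [
    "2D_CODE", "Caution", "3C", "EAC", "UL2", "WEEE", "KC",
    "ATEX", "FM2", "Failsafe", "RCM", "FM", "CE", "UL1",
]


def build_dataset_yaml(root, train, val, names):
    """Return YOLOv5-style dataset YAML text. All paths are hypothetical placeholders."""
    lines = [
        f"path: {root}  # dataset root (assumed)",
        f"train: {train}  # training images, relative to path (assumed)",
        f"val: {val}  # validation images, relative to path (assumed)",
        f"nc: {len(names)}  # number of classes",
        "names: [" + ", ".join(f"'{n}'" for n in names) + "]",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = build_dataset_yaml("../datasets/factory", "images/train", "images/val", CLASS_NAMES)
    print(text)
    # To use it, save the text inside the yolov5 repo (location assumed), e.g.
    # pathlib.Path("yolov5/data/factory.yaml").write_text(text)
```

The order of `names` matters: the class index predicted by the model is looked up in this list, so it must match the `class_names` list used when drawing results.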

In [136]:
import torch
print(torch.__version__)

1.10.0+cu102


In [137]:
! rm -rf yolov5
! rm -rf datasets
! git clone https://github.com/ultralytics/yolov5.git

Cloning into 'yolov5'...
remote: Enumerating objects: 10036, done.
remote: Total 10036 (delta 0), reused 0 (delta 0), pack-reused 10036
Receiving objects: 100% (10036/10036), 10.38 MiB | 45.42 MiB/s, done.
Resolving deltas: 100% (6943/6943), done.


In [138]:
! cd yolov5 && pip install -r requirements.txt





In [None]:
# Train a YOLOv5m detection model, initialized from the pretrained yolov5m.pt weights
! cd yolov5 && python3 train.py --data factory.yaml --weights yolov5m.pt --img 640 --epochs 30

train: weights=yolov5m.pt, cfg=, data=factory.yaml, hyp=data/hyps/hyp.scratch.yaml, epochs=30, batch_size=16, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, evolve=None, bucket=, cache=None, image_weights=False, device=, multi_scale=False, single_cls=False, adam=False, sync_bn=False, workers=8, project=runs/train, name=exp, exist_ok=False, quad=False, linear_lr=False, label_smoothing=0.0, patience=100, freeze=0, save_period=-1, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
github: up to date with https://github.com/Gaowei-Xu/yolov5 ✅
YOLOv5 🚀 v6.0-111-g040e0f9 torch 1.10.0+cu102 CUDA:0 (Tesla T4, 15110MiB)

hyperparameters: lr0=0.01, lrf=0.1, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degree


     Epoch   gpu_mem       box       obj       cls    labels  img_size
      8/29      7.3G   0.06803   0.04191   0.04419        87       640: 100%|███
               Class     Images     Labels          P          R     mAP@.5 mAP@
                 all         87        634      0.611      0.399      0.296      0.111

     Epoch   gpu_mem       box       obj       cls    labels  img_size
      9/29      7.3G   0.06712     0.041   0.04095        98       640: 100%|███
               Class     Images     Labels          P          R     mAP@.5 mAP@
                 all         87        634      0.349      0.673      0.543      0.247

     Epoch   gpu_mem       box       obj       cls    labels  img_size
     13/29      7.3G   0.05639    0.0386   0.03009        68       640: 100%|███
               Class     Images     Labels          P          R     mAP@.5 mAP@
                 all         87        634      0.544      0.702      0.661      0.297

     Epoch   gpu_mem       box      

#### Step 2: Convert the YOLOv5 model to OpenVINO IR format

Reference: https://docs.openvino.ai/latest/openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html

In [151]:
! pip install onnx==1.10.0
! pip install openvino-dev==2021.4.2



In [152]:
! cd yolov5 && python3 export.py --weights runs/train/exp/weights/best.pt --img 640 --batch 1 --include openvino

export: data=data/coco128.yaml, weights=['runs/train/exp/weights/best.pt'], imgsz=[640], batch_size=1, device=cpu, half=False, inplace=False, train=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=12, verbose=False, workspace=4, nms=False, agnostic_nms=False, topk_per_class=100, topk_all=100, iou_thres=0.45, conf_thres=0.25, include=['openvino']
YOLOv5 🚀 v6.0-111-g040e0f9 torch 1.10.0+cu102 CPU

Fusing layers... 
Model Summary: 290 layers, 20905467 parameters, 0 gradients, 48.1 GFLOPs

PyTorch: starting from runs/train/exp/weights/best.pt (42.3 MB)

ONNX: starting export with onnx 1.10.0...
  if self.onnx_dynamic or self.grid[i].shape[2:4] != x[i].shape[2:4]:
ONNX: export success, saved as runs/train/exp/weights/best.onnx (84.1 MB)
ONNX: run --dynamic ONNX model inference with: 'python detect.py --weights runs/train/exp/weights/best.onnx'

OpenVINO: starting export with openvino 2021.4.

In [4]:
! ls /home/ec2-user/SageMaker/yolov5/runs/train/exp/weights/best_openvino_model
! du -h /home/ec2-user/SageMaker/yolov5/runs/train/exp/weights/best_openvino_model

best.bin  best.mapping	best.xml
81M	/home/ec2-user/SageMaker/yolov5/runs/train/exp/weights/best_openvino_model
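In this YOLOv5 version the OpenVINO export writes `best.onnx` first (see the export log) and then hands it to the OpenVINO Model Optimizer (`mo`), which produces the `best.xml` / `best.bin` pair listed above. A sketch of running that conversion by hand, assuming the `mo` entry point installed by `openvino-dev==2021.4.2` is on the `PATH`; the paths mirror this notebook, while `build_mo_command` and `run_mo` are hypothetical helper names.

```python
import shlex
import subprocess
from pathlib import Path


def build_mo_command(onnx_path, output_dir, data_type="FP32"):
    """Build a Model Optimizer command line for an ONNX model.

    --input_model, --output_dir and --data_type are Model Optimizer options
    available in openvino-dev 2021.4.
    """
    return [
        "mo",
        "--input_model", str(onnx_path),
        "--output_dir", str(output_dir),
        # FP16 stores weights at half precision (roughly half the .bin size);
        # accuracy should be re-checked after switching
        "--data_type", data_type,
    ]


def run_mo(cmd):
    """Run the Model Optimizer; raises CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    weights_dir = Path("yolov5/runs/train/exp/weights")
    cmd = build_mo_command(weights_dir / "best.onnx", weights_dir / "best_openvino_model")
    # shlex.join needs Python 3.8+, so quote manually to stay usable on the p36 kernel
    print(" ".join(shlex.quote(part) for part in cmd))
    # run_mo(cmd)  # uncomment to actually run the conversion
```

Calling `mo` directly makes it easy to try options such as `data_type="FP16"`, which `export.py` in this version does not expose for OpenVINO.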


#### Step 3: Forward inference test

In [5]:
import cv2
import numpy as np


def letterbox(im, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True, stride=32):
    # Resize and pad image while meeting stride-multiple constraints
    shape = im.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scaleup:  # only scale down, do not scale up (for better val mAP)
        r = min(r, 1.0)

    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
    if auto:  # minimum rectangle
        dw, dh = np.mod(dw, stride), np.mod(dh, stride)  # wh padding
    elif scaleFill:  # stretch
        dw, dh = 0.0, 0.0
        new_unpad = (new_shape[1], new_shape[0])
        ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios

    dw /= 2  # divide padding into 2 sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return im, ratio, (dw, dh)
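With `auto=False`, as in the inference cell below, `letterbox` scales by the largest ratio that fits both dimensions and splits the leftover space equally between the two sides. The arithmetic can be checked without OpenCV. The 3000×4000 (width × height) image size used here is an assumption, chosen because it is consistent with the `(0.16, 0.16) (80.0, 0.0)` printed later; `letterbox_params` is a hypothetical helper that mirrors only the `auto=False, scaleup=True` path.

```python
def letterbox_params(shape, new_shape=(640, 640)):
    """Scale ratio and per-side padding applied by letterbox(..., auto=False, scaleup=True).

    shape and new_shape are (height, width). Returns (r, (pad_w, pad_h)), where
    pad_w is added on the left and on the right, pad_h on the top and on the bottom.
    """
    h, w = shape
    r = min(new_shape[0] / h, new_shape[1] / w)  # largest scale that fits both dimensions
    new_w, new_h = int(round(w * r)), int(round(h * r))
    pad_w = (new_shape[1] - new_w) / 2
    pad_h = (new_shape[0] - new_h) / 2
    return r, (pad_w, pad_h)


if __name__ == "__main__":
    # Portrait 3000x4000 photo: height limits the scale (640 / 4000 = 0.16), the image
    # becomes 480x640, and the remaining 160 px of width is split into 80 px per side
    print(letterbox_params((4000, 3000)))  # (0.16, (80.0, 0.0))
```

Undoing the transform for a predicted coordinate is the reverse: subtract the padding, then divide by `r`, which is what the drawing cell at the end of this step does.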

In [7]:
import cv2
import matplotlib.pyplot as plt
import numpy as np
import time
from openvino.inference_engine import IECore

ie = IECore()

net = ie.read_network(
    model="/home/ec2-user/SageMaker/yolov5/runs/train/exp/weights/best_openvino_model/best.xml",
    weights="/home/ec2-user/SageMaker/yolov5/runs/train/exp/weights/best_openvino_model/best.bin",
)
exec_net = ie.load_network(net, "CPU")

output_layer_ir = next(iter(exec_net.outputs))
input_layer_ir = next(iter(exec_net.input_info))


# load an image
# cv2.imread returns the image in BGR channel order; it is converted to RGB below
image = cv2.imread("./datasets/factory/images/IMG_00000.jpg")

# N,C,H,W = batch size, number of channels, height, width
N, C, H, W = net.input_info[input_layer_ir].tensor_desc.dims

# Resize while keeping the aspect ratio and pad to the network input size (letterbox)
# resized_image = cv2.resize(image, (W, H))
im, ratio, (dw, dh) = letterbox(image, new_shape=(H, W), stride=32, auto=False)

# Reshape to network input shape
input_image = im.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB
input_image = np.expand_dims(input_image, 0)
input_image = (input_image / 255.0).astype(np.float32)  # scale to [0, 1] as FP32

# do inference
t1 = time.time()
result = exec_net.infer(inputs={input_layer_ir: input_image})
t2 = time.time()

print("t2 - t1 = {} seconds".format(t2 - t1))
# print(result.keys())
print(result["output"])
print(result["output"].shape)



t2 - t1 = 0.19247078895568848 seconds
[[[1.0079020e+01 4.5858784e+00 5.3581104e+00 ... 1.4807056e-01
   2.6016332e-02 3.1623468e-02]
  [1.7636999e+01 6.3849373e+00 9.9057484e+00 ... 1.4969431e-01
   2.4869159e-02 3.1803496e-02]
  [2.4440678e+01 7.6431808e+00 1.3563699e+01 ... 1.4739256e-01
   1.8786799e-02 2.1544196e-02]
  ...
  [5.6790222e+02 6.3302032e+02 6.5908472e+02 ... 2.7607130e-02
   2.9355656e-02 3.9201073e-02]
  [6.0403583e+02 6.4407703e+02 6.2858954e+02 ... 1.7066147e-02
   3.6628943e-02 3.0496929e-02]
  [6.2628082e+02 6.4488672e+02 5.8874469e+02 ... 1.3586187e-02
   4.2572465e-02 5.0827343e-02]]]
(1, 25200, 19)
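The `(1, 25200, 19)` shape follows from the YOLOv5 detection head: three output levels with strides 8, 16 and 32, three anchors per grid cell, and for every prediction four box values (center x, center y, width, height in 640×640 input pixels), one objectness score and one score per class. The strides and anchor count are assumed here to be the defaults of the standard YOLOv5 models; the notebook itself does not print them.

```python
def yolov5_num_predictions(img_size=640, strides=(8, 16, 32), anchors_per_level=3):
    """Rows in the raw YOLOv5 output: one per anchor per grid cell on each level."""
    return sum(anchors_per_level * (img_size // s) ** 2 for s in strides)


def yolov5_row_length(num_classes):
    """Columns per row: box (cx, cy, w, h) + objectness + one score per class."""
    return 4 + 1 + num_classes


if __name__ == "__main__":
    print(yolov5_num_predictions(640))  # 3 * (80*80 + 40*40 + 20*20) = 25200
    print(yolov5_row_length(14))        # 4 + 1 + 14 = 19
```

This is why `non_max_suppression` below reads the objectness from column 4 and the class scores from columns 5 onward, and derives the number of classes as `prediction.shape[2] - 5`.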


In [8]:
print(ratio, (dw, dh))

(0.16, 0.16) (80.0, 0.0)


In [9]:
import numpy as np


def xywh2xyxy(x):
    # Convert nx4 boxes from [x, y, w, h] to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right
    y = np.copy(x)
    y[:, 0] = x[:, 0] - x[:, 2] / 2  # top left x
    y[:, 1] = x[:, 1] - x[:, 3] / 2  # top left y
    y[:, 2] = x[:, 0] + x[:, 2] / 2  # bottom right x
    y[:, 3] = x[:, 1] + x[:, 3] / 2  # bottom right y
    return y


def nms(boxes, scores, iou_thresh):
    """
    Args:
        boxes (np.ndarray[N, 4]): boxes to perform NMS on. They
            are expected to be in ``(x1, y1, x2, y2)`` format with ``0 <= x1 < x2`` and
            ``0 <= y1 < y2``.
        scores (np.ndarray[N]): scores for each one of the boxes
        iou_thresh (float): discards all overlapping boxes with IoU > iou_thresh

    Returns:
        np.ndarray: int64 array with the indices of the elements that have been kept
        by NMS, sorted in decreasing order of scores
    """
    # (x1, y1) and (x2, y2) are the top-left and bottom-right corners of each box
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]

    # Area of every candidate box (pixel convention: +1 on width and height)
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)

    # Indices that sort the scores in descending order (argsort is ascending, ::-1 reverses it);
    # these index into the original arrays, they are not the sorted values
    order = scores.argsort()[::-1]

    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Corners of the intersection between the current highest-scoring box and all remaining boxes:
        # max of the top-left corners, min of the bottom-right corners (vectors via NumPy broadcasting)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])

        # Intersection area; w or h is negative when the boxes do not overlap, so clamp at 0
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h

        # Overlap (IoU)
        ovr = inter / (areas[i] + areas[order[1:]] - inter)

        # Keep only the boxes whose overlap does not exceed the threshold
        inds = np.where(ovr <= iou_thresh)[0]
        # inds index into order[1:], so add 1 to index into order
        order = order[inds + 1]

    return np.array(keep, dtype=np.int64)



def non_max_suppression(prediction, conf_thres=0.25, iou_thres=0.45, classes=None, agnostic=False, multi_label=False,
                        labels=(), max_det=300):
    """Runs Non-Maximum Suppression (NMS) on inference results

    Returns:
         list of detections, on (n,6) tensor per image [xyxy, conf, cls]
    """

    nc = prediction.shape[2] - 5          # number of classes
    xc = prediction[..., 4] > conf_thres  # candidates

    print("number of classes = {}".format(nc))
    print("number of candidates = {}".format(xc))

    # Checks
    assert 0 <= conf_thres <= 1, f'Invalid Confidence threshold {conf_thres}, valid values are between 0.0 and 1.0'
    assert 0 <= iou_thres <= 1, f'Invalid IoU {iou_thres}, valid values are between 0.0 and 1.0'

    # Settings
    min_wh, max_wh = 2, 4096  # (pixels) minimum and maximum box width and height
    max_nms = 30000           # maximum number of boxes into torchvision.ops.nms()
    time_limit = 10.0         # seconds to quit after
    redundant = True          # require redundant detections
    multi_label &= nc > 1     # multiple labels per box (adds 0.5ms/img)

    t = time.time()
    output = [np.zeros((0, 6))] * prediction.shape[0]
    for xi, x in enumerate(prediction):  # image index, image inference
        # Apply constraints
        # x[((x[..., 2:4] < min_wh) | (x[..., 2:4] > max_wh)).any(1), 4] = 0  # width-height
        x = x[xc[xi]]  # confidence

        # If none remain process next image
        if not x.shape[0]:
            continue

        # Compute conf
        x[:, 5:] *= x[:, 4:5]  # conf = obj_conf * cls_conf

        # Box (center x, center y, width, height) to (x1, y1, x2, y2)
        box = xywh2xyxy(x[:, :4])

        # Detections matrix nx6 (xyxy, conf, cls)        
        conf = x[:, 5:].max(axis=1, keepdims=True)
        j = x[:, 5:].argmax(axis=1)
        j = np.expand_dims(j, axis=-1)
        
        x = np.concatenate((box, conf, j), axis=-1)
        x = x[np.where(conf[:, 0] > conf_thres)]

        # Filter by class
        if classes is not None:
            x = x[(x[:, 5:6] == np.array(classes)).any(axis=1)]

        # Check shape
        n = x.shape[0]  # number of boxes
        if not n:  # no boxes
            continue
        elif n > max_nms:  # excess boxes
            x = x[x[:, 4].argsort()[::-1][:max_nms]]  # sort by confidence (descending)

        # Batched NMS
        c = x[:, 5:6] * (0 if agnostic else max_wh)  # classes
        boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores
        
        i = nms(boxes, scores, iou_thres)  # NMS
        
        if i.shape[0] > max_det:  # limit detections
            i = i[:max_det]

        output[xi] = x[i]
        if (time.time() - t) > time_limit:
            print(f'WARNING: NMS time limit {time_limit}s exceeded')
            break  # time limit exceeded

    return output
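A small, self-contained check of the two ideas used above. First, the intersection of two boxes takes the maximum of the top-left corners and the minimum of the bottom-right corners. Second, `non_max_suppression` runs NMS for all classes in one call by shifting every box by `class_id * max_wh`, so boxes of different classes can never overlap. This sketch uses the continuous-coordinate IoU (no `+1`), which differs slightly from the pixel convention in `nms`; `box_iou`, `greedy_nms` and the boxes and scores are made up for illustration.

```python
import numpy as np


def box_iou(box, others):
    """IoU between one (x1, y1, x2, y2) box and an (M, 4) array of boxes (no +1 convention)."""
    # Intersection: max of the top-left corners, min of the bottom-right corners
    ix1 = np.maximum(box[0], others[:, 0])
    iy1 = np.maximum(box[1], others[:, 1])
    ix2 = np.minimum(box[2], others[:, 2])
    iy2 = np.minimum(box[3], others[:, 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (area + areas - inter)


def greedy_nms(boxes, scores, iou_thresh):
    """Keep the best box, drop remaining boxes overlapping it by more than iou_thresh, repeat."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(best)
        order = rest[box_iou(boxes[best], boxes[rest]) <= iou_thresh]
    return np.array(keep, dtype=np.int64)


if __name__ == "__main__":
    # Boxes 0 and 1 overlap heavily (IoU = 8100 / 11900, about 0.68); box 2 is far away
    boxes = np.array([[0, 0, 100, 100], [10, 10, 110, 110], [200, 200, 300, 300]], dtype=float)
    scores = np.array([0.9, 0.8, 0.7])
    classes = np.array([0, 1, 0])
    max_wh = 4096  # same offset constant as non_max_suppression()

    print(greedy_nms(boxes, scores, 0.45))           # class-agnostic: [0 2]
    offset = classes[:, None] * max_wh               # shift each box by class_id * max_wh
    print(greedy_nms(boxes + offset, scores, 0.45))  # per class: [0 1 2]
```

With the offset, box 1 (class 1) no longer competes with box 0 (class 0) and survives, which is the behavior `agnostic=False` selects in `non_max_suppression`.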

In [10]:
pred = result["output"]
conf_thres = 0.40
iou_thres = 0.45
classes = None
agnostic_nms = False
max_det = 1000
print(pred.shape)

pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)
print(pred[0].shape)


(1, 25200, 19)
number of classes = 14
number of candidates = [[False False False ... False False False]]
(3, 6)


In [11]:
class_names = [
    "2D_CODE", "Caution", "3C", "EAC", "UL2", "WEEE", "KC", 
    "ATEX", "FM2", "Failsafe", "RCM", "FM", "CE", "UL1"]

colors = {
    "2D_CODE": (255, 51, 51), 
    "Caution": (255, 0, 255), 
    "3C": (255, 128, 0), 
    "EAC": (0, 153, 0), 
    "UL2": (200, 153, 89), 
    "WEEE": (10, 29, 199), 
    "KC": (49, 48, 153), 
    "ATEX": (40, 210, 144), 
    "FM2": (182, 255, 40), 
    "Failsafe": (100, 92, 49), 
    "RCM": (255, 80, 153), 
    "FM": (255, 20, 20), 
    "CE": (190, 153, 153), 
    "UL1": (100, 153, 153), 
}

full_path = "./datasets/factory/images/IMG_00000.jpg"
vis_image = cv2.imread(full_path, cv2.IMREAD_COLOR)
height, width, channels = vis_image.shape

scale_w, scale_h = ratio

detections = pred[0]

for det in detections:
    x_min = (det[0] - dw) / scale_w
    y_min = (det[1] - dh) / scale_h
    x_max = (det[2] - dw) / scale_w
    y_max = (det[3] - dh) / scale_h
    
    color = colors[class_names[int(det[5])]]
    color = (color[2], color[1], color[0])  # RGB -> BGR, the channel order OpenCV draws in
    label_info = '{} {:.3f}'.format(class_names[int(det[5])], det[4])
    vis_image = cv2.rectangle(vis_image, (int(x_min), int(y_min)), (int(x_max), int(y_max)), color, 5)
    vis_image = cv2.putText(vis_image, label_info, (int(x_min), int(y_min)-10), cv2.FONT_HERSHEY_SIMPLEX, 1.0, color, 2, cv2.LINE_AA)
    
cv2.imwrite("./result.png", vis_image)


True
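The drawing loop above maps each detection back to original-image coordinates one box at a time and does not clip boxes that extend into the padded border. A vectorized variant with clipping is sketched below. It is written for this notebook (YOLOv5 itself has a similar helper, `scale_coords` in `utils/general.py`, in versions of this era, but this is not a copy of it); `unletterbox_boxes` is a hypothetical name, and the 3000×4000 image size is an assumption consistent with the printed ratio and padding.

```python
import numpy as np


def unletterbox_boxes(boxes, ratio, pad, orig_shape):
    """Map (N, 4) xyxy boxes from letterboxed-input pixels back to the original image.

    ratio: (scale_w, scale_h) and pad: (dw, dh) as returned by letterbox();
    orig_shape: (height, width) of the original image.
    """
    out = np.asarray(boxes, dtype=np.float64).copy()
    out[:, [0, 2]] = (out[:, [0, 2]] - pad[0]) / ratio[0]  # undo horizontal padding, then scale
    out[:, [1, 3]] = (out[:, [1, 3]] - pad[1]) / ratio[1]  # undo vertical padding, then scale
    # Boxes that reach into the padding would land outside the image; clip them
    out[:, [0, 2]] = out[:, [0, 2]].clip(0, orig_shape[1])
    out[:, [1, 3]] = out[:, [1, 3]].clip(0, orig_shape[0])
    return out


if __name__ == "__main__":
    # Made-up detections: the first covers the whole non-padded area of the 640x640 input,
    # the second starts inside the left padding and ends below the image
    dets = np.array([[80.0, 0.0, 560.0, 640.0], [70.0, 10.0, 200.0, 650.0]])
    print(unletterbox_boxes(dets, (0.16, 0.16), (80.0, 0.0), (4000, 3000)))
```

In the drawing cell this would replace the per-box arithmetic with `unletterbox_boxes(pred[0][:, :4], ratio, (dw, dh), vis_image.shape[:2])`.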