# Vehicle Detection And Recognition with OpenVINO

In this notebook, we use a detection model and a classification model together with OpenVINO. Both come from [Open Model Zoo](https://github.com/openvinotoolkit/open_model_zoo): one of the [Object Detection Models](https://docs.openvino.ai/2020.2/usergroup3.html) and one of the [Object Recognition Models](https://docs.openvino.ai/2020.2/usergroup4.html). The detection model locates the vehicles in an image. Each detected vehicle is then cropped out and passed to the classification model, which recognizes its attributes. The pipeline is shown here:
<img align='center' src="data/vehicle-inference-flow.png" alt="drawing" width="1000"/>

Finally, we will get the result:

<img align='center' src="data/vehicle-result.png" alt="drawing" width="300"/>

# Imports

We need to import some basic packages.

In [None]:
import os
import sys
import time
from typing import Tuple, List

import cv2
import numpy as np
import matplotlib.pyplot as plt
from openvino.inference_engine import IECore
from openvino.inference_engine.ie_api import ExecutableNetwork

# Download Models

We need to download the pretrained models before we can continue. We use `omz_downloader`, a command-line tool installed by the `openvino-dev` package.

> Note: To use a different model, change the model name. To use a different precision, change the precision value.

In [None]:
# Directory where model will be downloaded
base_model_dir = "model"
# Model name as named in Open Model Zoo
detection_model_name = "vehicle-detection-0200"
recognition_model_name = "vehicle-attributes-recognition-barrier-0039"
# Selected precision (FP32, FP16, FP16-INT8)
precision = "FP32"

# Paths used to check whether the models already exist
detection_model_path = (
    f"{base_model_dir}/intel/{detection_model_name}/{precision}/{detection_model_name}.xml"
)
recognition_model_path = (
    f"{base_model_dir}/intel/{recognition_model_name}/{precision}/{recognition_model_name}.xml"
)
# Download detection model
if not os.path.exists(detection_model_path):
    download_command = f"omz_downloader " \
                       f"--name {detection_model_name} " \
                       f"--precision {precision} " \
                       f"--output_dir {base_model_dir}"
    ! $download_command
# Download recognition model
if not os.path.exists(recognition_model_path):
    download_command = f"omz_downloader " \
                       f"--name {recognition_model_name} " \
                       f"--precision {precision} " \
                       f"--output_dir {base_model_dir}"
    ! $download_command
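
The existence check above relies on the directory layout used in this notebook: `<output_dir>/intel/<model_name>/<precision>/<model_name>.xml`. A minimal sketch of a helper that builds this path is shown below. The name `model_xml_path` is hypothetical and not part of Open Model Zoo, and the layout is assumed from the paths in the cell above.

```python
from pathlib import Path


def model_xml_path(base_dir: str, model_name: str, precision: str) -> Path:
    """Build the expected location of a downloaded model's .xml file."""
    # Layout assumed from the paths used in this notebook:
    # <base_dir>/intel/<model_name>/<precision>/<model_name>.xml
    return Path(base_dir) / "intel" / model_name / precision / f"{model_name}.xml"


xml_path = model_xml_path("model", "vehicle-detection-0200", "FP32")
# The weights file has the same name with a .bin suffix
bin_path = xml_path.with_suffix(".bin")

print(xml_path.as_posix())
print(bin_path.name)
```

Building both paths from one function keeps the check and the download directory in sync when the model name or precision changes.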

# Load Models

This notebook needs a detection model and a recognition model. After downloading the models, we initialize the inference engine (`IECore`) and use `read_network` to read the network architecture and weights from the *.xml and *.bin files. Then we load the network on the "CPU" device using `load_network`.

In [None]:
# Initialize inference engine
ie_core = IECore()

def model_init(model: str) -> Tuple:
    """
    Read the network and weights from file, load the
    model on the CPU and get input and output names of nodes

    :param: model: path to the model architecture file (*.xml)
    :returns:
            input_keys: names of the input nodes
            output_keys: names of the output nodes
            exec_net: executable network loaded on the device
            net: network read from file
    """

    # Read the network and corresponding weights from file
    net = ie_core.read_network(model)
    # load the model on the CPU (you can use GPU or MYRIAD as well)
    exec_net = ie_core.load_network(net, "CPU")
    # Get input and output names of nodes
    input_keys = list(exec_net.input_info)
    output_keys = list(exec_net.outputs.keys())
    return input_keys, output_keys, exec_net, net

### Get attributes from model

We use `net.input_info[input_key].tensor_desc.dims` to get the input shape of each model. The last two entries of the shape are the input height and width.

In [None]:
# de -> detection
# re -> recognition
# Detection model initialization
input_key_de, output_keys_de, exec_net_de, net_de = model_init(detection_model_path)
# Recognition model initialization
input_key_re, output_keys_re, exec_net_re, net_re = model_init(recognition_model_path)

# Get input size - Detection
height_de, width_de = net_de.input_info[input_key_de[0]].tensor_desc.dims[2:]
# Get input size - Recognition
height_re, width_re = net_re.input_info[input_key_re[0]].tensor_desc.dims[2:]

### Helper function

1. The `plt_show` function shows an image inline.
2. The `softmax` function is a generalization of the logistic function to multiple dimensions. It normalizes the output of a network into a probability distribution over the predicted output classes.

In [None]:
def plt_show(raw_image):
    """
    Use matplotlib to show an image inline

    :param: raw_image: image array
    """
    plt.figure(figsize=(10, 6))
    plt.axis("off")
    plt.imshow(raw_image)

def softmax(x: np.ndarray) -> np.ndarray:
    """
    Normalize logits to confidence values that sum to 1 over the whole array

    :param: x: array of logits
    :returns: array of the same shape holding the softmax probabilities
    """
    exp = np.exp(x)
    return exp / np.sum(exp, axis=None)
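
The `softmax` helper above exponentiates the raw values directly and normalizes over the whole array. As a hedged alternative, the sketch below subtracts the maximum before exponentiating, a common way to avoid overflow for large logits, and takes an explicit axis. The name `stable_softmax` is hypothetical and is not used elsewhere in this notebook.

```python
import numpy as np


def stable_softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Softmax along one axis, shifted by the maximum for numerical stability."""
    # Subtracting the per-axis maximum does not change the result,
    # but keeps np.exp from overflowing on large inputs
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


logits = np.array([[1.0, 2.0, 3.0], [1000.0, 1000.0, 1000.0]])
probs = stable_softmax(logits, axis=1)
print(probs.sum(axis=1))  # each row sums to 1
```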

### Read and show a test image

The detection model expects an input of shape `[1, 3, 256, 256]`, so we resize the image to `256 x 256`, move the channel axis to the front with `transpose`, and add the batch dimension with the `expand_dims` function.

In [None]:
# Read an image
image_de = cv2.imread("data/cars.jpg")
# Resize to [256, 256, 3]
resized_image_de = cv2.resize(image_de, (width_de, height_de))
# Transpose to [3, 256, 256] and expand to [1, 3, 256, 256]
input_image_de = np.expand_dims(resized_image_de.transpose(2, 0, 1), 0)
# Show image
plt_show(cv2.cvtColor(image_de, cv2.COLOR_BGR2RGB))
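
The layout change in the cell above can be checked without loading a real image. The sketch below uses a zero-filled array as a stand-in for the resized image (an assumption for illustration only) and prints the shape after each step.

```python
import numpy as np

# Stand-in for an image already resized to the model input size: (H, W, C)
hwc_image = np.zeros((256, 256, 3), dtype=np.uint8)

# Move the channel axis to the front: (H, W, C) -> (C, H, W)
chw_image = hwc_image.transpose(2, 0, 1)

# Add the batch dimension: (C, H, W) -> (1, C, H, W)
nchw_image = np.expand_dims(chw_image, 0)

print(hwc_image.shape, chw_image.shape, nchw_image.shape)
```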

# Use detection network to detect vehicles

![pipeline](data/vehicle-inference-flow.png)

As the flowchart shows, we need to crop out single vehicles and send them to the recognition model. First, we use the `infer` function to get the detection result.

The detection model output has the format [image_id, label, conf, x_min, y_min, x_max, y_max], where:

- image_id - ID of the image in the batch
- label - predicted class ID (0 - vehicle)
- conf - confidence for the predicted class
- (x_min, y_min) - coordinates of the top left bounding box corner
- (x_max, y_max) - coordinates of the bottom right bounding box corner

We need to remove the unused leading dimensions and filter out the empty (all-zero) results.

In [None]:
# Run inference on the detection network
result = exec_net_de.infer(inputs={input_key_de[0]: input_image_de})

# Get the result
boxes = result[next(iter(output_keys_de))]
# Remove the dimensions at axes 0 and 1
boxes = np.squeeze(boxes, (0, 1))
# Remove zero-only boxes
boxes = boxes[~np.all(boxes == 0, axis=1)]
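
To see what the squeeze and the zero-row filter do, the sketch below applies the same two steps to a small synthetic array. The `[1, 1, N, 7]` layout and the values are assumptions for illustration, not real model output. A confidence filter on column 2 (`conf`) is added as a further step.

```python
import numpy as np

# Synthetic detection output, assumed layout [1, 1, N, 7] with rows of
# [image_id, label, conf, x_min, y_min, x_max, y_max]
raw = np.zeros((1, 1, 4, 7), dtype=np.float32)
raw[0, 0, 0] = [0, 0, 0.9, 0.1, 0.2, 0.5, 0.6]
raw[0, 0, 1] = [0, 0, 0.3, 0.4, 0.4, 0.8, 0.9]

# Drop the two leading singleton dimensions: [1, 1, N, 7] -> [N, 7]
boxes = np.squeeze(raw, (0, 1))
# Keep only rows that contain at least one non-zero value
boxes = boxes[~np.all(boxes == 0, axis=1)]
# Keep only rows whose confidence exceeds the threshold
confident = boxes[boxes[:, 2] > 0.6]

print(boxes.shape, confident.shape)
```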

### Detection Processing

In this function, we filter out low-confidence results and use the remaining boxes to draw rectangles on the image.

In [None]:
def convert_result_to_image(bgr_image, resized_image, boxes, threshold=0.6) -> np.ndarray:
    """
    Use Detection model boxes to draw rectangles and plot the result
    
    :param: bgr_image: raw image
    :param: resized_image: resized image
    :param: boxes: detection model returns rectangle position
    :param: threshold: confidence threshold
    :returns: rgb_image: processed image
    """
    # Define colors for boxes and descriptions
    colors = {"red": (255, 0, 0), "green": (0, 255, 0)}

    # Fetch image shapes to calculate ratio
    (real_y, real_x), (resized_y, resized_x) = bgr_image.shape[:2], resized_image.shape[:2]
    ratio_x, ratio_y = real_x / resized_x, real_y / resized_y

    # Convert base image from bgr to rgb format
    rgb_image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)

    # Drop image_id and label; each row is now: conf, x_min, y_min, x_max, y_max
    boxes = boxes[:, 2:]
    # Iterate through non-zero boxes
    for box in boxes:
        # Pick confidence factor from first place in array
        conf = box[0]
        if conf > threshold:
            # Convert float to int and multiply corner position of each box by x and y ratio
            # In case that bounding box is found at the top of the image,
            # we position upper box bar little bit lower to make it visible on image
            (x_min, y_min, x_max, y_max) = [
                int(max(corner_position * ratio_y * resized_y, 10)) if idx % 2
                else int(corner_position * ratio_x * resized_x)
                for idx, corner_position in enumerate(box[1:])  # skip the confidence value
            ]
            
            print(f"x_min:{x_min}, y_min:{y_min}, x_max:{x_max}, y_max:{y_max}")
            
            # Draw box based on position, parameters in rectangle function are: image, start_point, end_point, color, thickness
            rgb_image = cv2.rectangle(rgb_image, (x_min, y_min), (x_max, y_max), colors["red"], 2)

    return rgb_image

In [None]:
plt_show(convert_result_to_image(image_de, resized_image_de, boxes))
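
The coordinate arithmetic in `convert_result_to_image` multiplies each corner by `ratio * 256`, which for a `256 x 256` resized image equals the width or height of the original image. In other words, it treats the box corners as fractions of the image size. The sketch below isolates that step under the same assumption. `scale_box` is a hypothetical helper, not part of the notebook.

```python
from typing import Sequence, Tuple


def scale_box(
    box: Sequence[float], image_width: int, image_height: int, min_y: int = 10
) -> Tuple[int, int, int, int]:
    """Convert fractional (x_min, y_min, x_max, y_max) to pixel coordinates."""
    x_min, y_min, x_max, y_max = box
    return (
        int(x_min * image_width),
        # y values are kept at least min_y pixels from the top edge,
        # mirroring the max(..., 10) in the function above
        int(max(y_min * image_height, min_y)),
        int(x_max * image_width),
        int(max(y_max * image_height, min_y)),
    )


pixel_box = scale_box((0.25, 0.0, 0.5, 0.5), image_width=1280, image_height=720)
print(pixel_box)  # (320, 10, 640, 360)
```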

### Recognize the vehicle's attributes

We choose one of the boxes and crop the vehicle to test the recognition model. As before, we resize the input image and run inference on it.

In [None]:
# Crop the image with [y_min:y_max, x_min:x_max]
# Three vehicles in cars.jpg
# test_car = resized_image_de[100:164, 0:100]
# test_car = resized_image_de[130:215, 166:254]
test_car = resized_image_de[92:152, 126:209]

# resize image to input_size
resized_image_re = cv2.resize(test_car, (width_re, height_re))
input_image_re = np.expand_dims(resized_image_re.transpose(2, 0, 1), 0)
plt_show(cv2.cvtColor(resized_image_re, cv2.COLOR_BGR2RGB))

##### Recognition processing

The result contains scores for colors (white, gray, yellow, red, green, blue, black) and types (car, bus, truck, van). We compute the probability of each attribute and choose the one with the highest probability as the result.

In [None]:
def vehicle_recognition(exec_net_re, input_size, raw_image):
    """
    Vehicle attributes recognition: take a single vehicle image and return its attributes

    :param: exec_net_re: recognition net
    :param: input_size: recognition input size as (width, height)
    :param: raw_image: single vehicle image
    :returns: attr_color: predicted color
              attr_type: predicted type
    """
    # vehicle's attribute
    colors = ['White', 'Gray', 'Yellow', 'Red', 'Green', 'Blue', 'Black']
    types = ['Car', 'Bus', 'Truck', 'Van']
    
    # resize image to input size
    resized_image_re = cv2.resize(raw_image, input_size)
    input_image_re = np.expand_dims(resized_image_re.transpose(2, 0, 1), 0)
    plt_show(cv2.cvtColor(resized_image_re, cv2.COLOR_BGR2RGB))
    
    # Run inference on the recognition net
    result = exec_net_re.infer(inputs={input_key_re[0]: input_image_re})
    
    # predict result
    predict_colors = result['color']
    # Remove the dimensions at axes 2 and 3
    predict_colors = np.squeeze(predict_colors, (2, 3))
    predict_types = result['type']
    predict_types = np.squeeze(predict_types, (2, 3))

    attr_color, attr_type = (colors[np.argmax(softmax(predict_colors))],
                             types[np.argmax(softmax(predict_types))])
    return attr_color, attr_type

In [None]:
print(f"Attributes:{vehicle_recognition(exec_net_re, (72, 72), test_car)}")
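
The last step of `vehicle_recognition` maps the highest-scoring index to a label. The sketch below reproduces only that step on synthetic scores. The shapes `(1, 7, 1, 1)` and `(1, 4, 1, 1)` are assumed from the squeeze over axes 2 and 3 in the function above, and `decode_attributes` is a hypothetical name. Because softmax preserves the ordering of its inputs, taking `argmax` of the raw scores selects the same index as taking it after softmax.

```python
import numpy as np

COLORS = ['White', 'Gray', 'Yellow', 'Red', 'Green', 'Blue', 'Black']
TYPES = ['Car', 'Bus', 'Truck', 'Van']


def decode_attributes(color_scores: np.ndarray, type_scores: np.ndarray):
    """Map the highest-scoring entry of each output to its label."""
    # np.argmax works on the flattened array, so the singleton
    # dimensions do not need to be removed first
    color_idx = int(np.argmax(color_scores))
    type_idx = int(np.argmax(type_scores))
    return COLORS[color_idx], TYPES[type_idx]


# Synthetic scores standing in for the 'color' and 'type' outputs
color_scores = np.zeros((1, 7, 1, 1), dtype=np.float32)
color_scores[0, 3, 0, 0] = 5.0
type_scores = np.zeros((1, 4, 1, 1), dtype=np.float32)
type_scores[0, 2, 0, 0] = 2.0

attributes = decode_attributes(color_scores, type_scores)
print(attributes)  # ('Red', 'Truck')
```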

### Combine two models

Congratulations! Now we can use the detection model to crop single vehicles and recognize each vehicle's attributes.

In [None]:
def convert_result_to_image(exec_net_re, bgr_image, resized_image, boxes, threshold=0.6):
    """
    Use Detection model boxes to draw rectangles, recognize each
    vehicle's attributes and plot the result
    
    :param: exec_net_re: recognition net
    :param: bgr_image: raw image
    :param: resized_image: resized image
    :param: boxes: detection model returns rectangle position
    :param: threshold: confidence threshold
    :returns: rgb_image: processed image
    """
    # Define colors for boxes and descriptions
    colors = {"red": (255, 0, 0), "green": (0, 255, 0)}

    # Fetch image shapes to calculate ratio
    (real_y, real_x), (resized_y, resized_x) = bgr_image.shape[:2], resized_image.shape[:2]
    ratio_x, ratio_y = real_x / resized_x, real_y / resized_y

    # Convert base image from bgr to rgb format
    rgb_image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)
    
    # original boxes are: image_id, label, conf, x_min, y_min, x_max, y_max
    boxes = boxes[:, 2:]
    # Iterate through non-zero boxes
    for box in boxes:
        # Pick confidence factor from first place in array
        conf = box[0]
        if conf > threshold:
            # Convert float to int and multiply corner position of each box by x and y ratio
            # Besides, we need to multiply the position scale ratio 
            (x_min, y_min, x_max, y_max) = [
                int(max(corner_position * ratio_y * resized_y, 10)) if idx % 2
                else int(corner_position * ratio_x * resized_x)
                for idx, corner_position in enumerate(box[1:])
            ]
            
            # Do vehicle recognition inference
            # The box is in original-image coordinates, so crop from the original image
            attr_color, attr_type = vehicle_recognition(exec_net_re, (72, 72),
                                                        bgr_image[y_min:y_max, x_min:x_max])
            
            # close the vehicle window
            plt.close()
            
            # Draw box based on position
            # Parameters in rectangle function are: image, start_point, end_point, color, thickness
            rgb_image = cv2.rectangle(rgb_image, (x_min, y_min), (x_max, y_max), colors["red"], 2)
            
            # Draw vehicle attribute 
            # parameters in putText function are: img, text, org, fontFace, fontScale, color, thickness, lineType
            rgb_image = cv2.putText(rgb_image,
                                    f"A {attr_color} {attr_type}",
                                    (x_min, y_min - 10),
                                    cv2.FONT_HERSHEY_SIMPLEX,
                                    0.5,
                                    colors["green"],
                                    1,
                                    cv2.LINE_AA)
    return rgb_image

In [None]:
plt_show(convert_result_to_image(exec_net_re, image_de, resized_image_de, boxes))
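
A detected box can touch or extend past the image border, and an empty crop cannot be resized for the recognition model. One possible safeguard, sketched below, is to clamp each box to the image bounds and skip boxes that end up empty. `clamp_box` is a hypothetical helper and is not part of the notebook.

```python
from typing import Optional, Tuple


def clamp_box(
    x_min: int, y_min: int, x_max: int, y_max: int, width: int, height: int
) -> Optional[Tuple[int, int, int, int]]:
    """Clamp a pixel box to the image bounds; return None if nothing is left."""
    x_min, x_max = max(0, x_min), min(width, x_max)
    y_min, y_max = max(0, y_min), min(height, y_max)
    # A box with no area would produce an empty crop
    if x_max <= x_min or y_max <= y_min:
        return None
    return x_min, y_min, x_max, y_max


inside = clamp_box(-5, 10, 700, 300, width=640, height=480)
outside = clamp_box(650, 10, 700, 300, width=640, height=480)
print(inside, outside)  # (0, 10, 640, 300) None
```

In the loop above, such a check would go right after the pixel coordinates are computed, skipping the recognition call whenever the result is `None`.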