# PaddlePaddle Image Classification with OpenVINO
This demo shows how to run a MobileNetV3 Large PaddlePaddle model using OpenVINO Runtime. Instead of exporting the PaddlePaddle model to ONNX and converting it to Intermediate Representation (IR) format with Model Optimizer, OpenVINO can now read the Paddle model directly, without conversion.

## Download the MobileNetV3_large_x1_0 Model
Download the pre-trained model directly from the server. More details about the pre-trained model can be found in the PaddleClas documentation below.

Source: https://github.com/PaddlePaddle/PaddleClas/blob/release/2.2/deploy/lite/readme_en.md

In [None]:
from pathlib import Path
import os
import urllib.request
import tarfile

mobilenet_url = "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV3_large_x1_0_infer.tar"
mobilenetv3_model_path = Path("model/MobileNetV3_large_x1_0_infer/inference.pdmodel")
if mobilenetv3_model_path.is_file():
    print("Model MobileNetV3_large_x1_0 already exists")
else:
    # Download the model from the server, and untar it.
    print("Downloading the MobileNetV3_large_x1_0_infer model (20Mb)... May take a while...")
    # Create the target directory; exist_ok avoids an error if it is already there
    os.makedirs("model", exist_ok=True)
    urllib.request.urlretrieve(mobilenet_url, "model/MobileNetV3_large_x1_0_infer.tar")
    print("Model Downloaded")

    # extractall() returns None and raises on failure,
    # so verify the result by checking for the extracted model file
    with tarfile.open("model/MobileNetV3_large_x1_0_infer.tar") as file:
        file.extractall("model")
    if mobilenetv3_model_path.is_file():
        print("Model Extracted to \"model/MobileNetV3_large_x1_0_infer\".")
    else:
        print("Error Extracting the model. Please check the network.")

## Define the callback function for postprocessing

In [3]:
def callback(infer_request, i) -> None:
    """
    Define the callback function for postprocessing

    Note: json, np, time and the global `start` timestamp are defined in
    later cells, before this callback is first invoked.

    :param: infer_request: the infer_request object
            i: the iteration of inference
    :returns:
            None
    """
    with open("utils/imagenet_class_index.json") as f:
        imagenet_classes = json.load(f)
    predictions = next(iter(infer_request.results.values()))
    # Sort class indices by descending score
    indices = np.argsort(-predictions[0])
    if i == 0:
        # Calculate the first inference time
        latency = time.time() - start
        print("latency:", latency)
        # Print the top-5 classes; a separate loop variable avoids shadowing `i`
        for rank in range(5):
            class_id = indices[rank]
            print(
                "Class name:", "'" + imagenet_classes[str(class_id)][1] + "'",
                ", probability:", predictions[0][class_id])

## Read the model file 

In [4]:
import openvino.runtime as ov

# Initialize Inference Engine with Core()
ie = ov.Core()
# MobileNetV3_large_x1_0
model = ie.read_model("model/MobileNetV3_large_x1_0_infer/inference.pdmodel")
# Get information about the input and output layers
input_layer = model.input(0)
output_layer = model.output(0)

## Integrate preprocessing steps into the execution graph with Preprocessing API
If your input data does not fit the model input tensor perfectly, additional operations are needed to transform the data into the format expected by the model. These operations are known as “preprocessing”.
Preprocessing steps are integrated into the execution graph and performed on the selected device(s) (CPU/GPU/VPU/etc.) rather than always executed on CPU. This improves utilization on the selected device(s).

Overview of Preprocessing API: https://docs.openvino.ai/latest/openvino_docs_OV_Runtime_UG_Preprocessing_Overview.html
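
As a rough illustration of the arithmetic behind the mean, scale and layout steps configured in this notebook, the following plain-Python sketch normalizes a single pixel and reorders a tiny image from NHWC-style (height, width, channel) to NCHW-style (channel, height, width). The helper names `normalize_pixel` and `hwc_to_chw` are hypothetical and are not part of the OpenVINO API; the mean and scale values are the ones used in this demo, and the division by 255 mirrors what the notebook does before inference.

```python
# Mean and scale values passed to .mean() and .scale() in this notebook
MEAN = [0.485, 0.456, 0.406]
SCALE = [0.229, 0.224, 0.225]


def normalize_pixel(pixel_uint8):
    """Normalize one 3-channel pixel given as 0-255 integers (hypothetical helper)."""
    # The notebook divides the image by 255 before handing it to the model
    scaled = [value / 255 for value in pixel_uint8]
    # Subtract the per-channel mean, then divide by the per-channel scale
    return [(v - m) / s for v, m, s in zip(scaled, MEAN, SCALE)]


def hwc_to_chw(image):
    """Reorder a nested list from H x W x C to C x H x W (hypothetical helper)."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [
        [[image[y][x][c] for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]


white = normalize_pixel([255, 255, 255])
print([round(v, 3) for v in white])

tiny_image = [[[1, 2, 3], [4, 5, 6]]]  # 1 row, 2 columns, 3 channels
print(hwc_to_chw(tiny_image))
```

With the Preprocessing API, these operations become part of the model graph, so the application only has to supply the image in its original layout and size.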

In [5]:
import cv2
import numpy as np
from openvino.preprocess import PrePostProcessor
from openvino.runtime import Layout, Type
from openvino.preprocess import ResizeAlgorithm
from openvino.runtime import AsyncInferQueue, PartialShape

filename = "../001-hello-world/data/coco.jpg"
test_image = cv2.imread(filename) 
test_image = np.expand_dims(test_image, 0) / 255
_, h, w, _ = test_image.shape

# Adjust model input shape to improve the performance
model.reshape({input_layer.any_name: PartialShape([1, 3, 224, 224])})
ppp = PrePostProcessor(model)
# Set input tensor information:
# - input() provides information about a single model input
# - layout of data is 'NHWC'
# - set static spatial dimensions to input tensor to resize from
ppp.input().tensor() \
    .set_spatial_static_shape(h, w) \
    .set_layout(Layout('NHWC')) 
inputs = model.inputs
# Here we assume the model has 'NCHW' layout for input
ppp.input().model().set_layout(Layout('NCHW'))
# Do preprocessing:
# - apply linear resize from tensor spatial dims to model spatial dims
# - Subtract mean from each channel
# - Divide each channel by the corresponding scale value
ppp.input().preprocess() \
    .resize(ResizeAlgorithm.RESIZE_LINEAR, 224, 224) \
    .mean([0.485, 0.456, 0.406]) \
    .scale([0.229, 0.224, 0.225])
# Set output tensor information:
# - precision of tensor is supposed to be 'f32'
ppp.output().tensor().set_element_type(Type.f32)
# Apply preprocessing to modify the original 'model'
model = ppp.build()

## Run Inference
Use “AUTO” as the device name to delegate device selection to OpenVINO. The Auto device plugin internally recognizes and selects devices from among Intel CPU and GPU depending on the device capabilities and the characteristics of the model(s) (for example, precision). Then it assigns inference requests to the best device.
AUTO starts inference immediately on the CPU and then transparently shifts to the GPU (or VPU) once it is ready, dramatically reducing time to first inference.

In [None]:
import time
from IPython.display import Image
import json

# Check the available devices in your system
devices = ie.available_devices
for device in devices:
    device_name = ie.get_property(device_name=device, name="FULL_DEVICE_NAME")
    print(f"{device}: {device_name}")

# Load model to a device selected by AUTO from the available devices list
compiled_model = ie.compile_model(model=model, device_name="AUTO")
# Create infer request queue
infer_queue = AsyncInferQueue(compiled_model)
infer_queue.set_callback(callback)
start = time.time()
# Do inference
infer_queue.start_async({input_layer.any_name: test_image}, 0)
infer_queue.wait_all()
Image(filename=filename) 
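
The callback used here ranks the model's scores and prints the five best classes. A minimal, dependency-free sketch of that ranking step is shown below; the function name `top_k` and the six-class score list are hypothetical and only serve to illustrate the idea, whereas the notebook itself uses `np.argsort` on the real output tensor.

```python
def top_k(scores, k=5):
    """Return (index, score) pairs for the k highest scores, best first."""
    # Sort the indices by their score in descending order
    ranked = sorted(range(len(scores)), key=lambda idx: scores[idx], reverse=True)
    return [(idx, scores[idx]) for idx in ranked[:k]]


# Hypothetical scores for a six-class model
scores = [0.05, 0.60, 0.10, 0.02, 0.20, 0.03]
for idx, score in top_k(scores, k=3):
    print(f"class index {idx}: probability {score:.2f}")
```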

## Performance Hints: Latency and Throughput
Throughput and latency are some of the most critical factors that influence the overall performance of an application.
<img width="70%" src="https://raw.githubusercontent.com/OpenVINO-dev-contest/models/main/images/Latency%20VS%20Throughput.png">

- **Latency** measures the inference time (ms) required to process a single input, or the first inference.
- **Throughput** is the number of inputs processed divided by the total processing time.
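
The two definitions above can be turned into a small worked example. The sketch below uses made-up numbers (100 inputs, 20 ms per inference, 0.5 s total wall-clock time because several requests are assumed to run in parallel); the function names are hypothetical helpers, not OpenVINO APIs.

```python
def mean_latency_ms(latencies_s):
    """Average per-input latency in milliseconds."""
    return 1000 * sum(latencies_s) / len(latencies_s)


def throughput_fps(num_inputs, elapsed_s):
    """Number of inputs processed per second of wall-clock time."""
    return num_inputs / elapsed_s


# Hypothetical measurements: each inference takes 20 ms, but requests overlap,
# so 100 inputs complete in 0.5 s of wall-clock time.
latencies = [0.020] * 100
elapsed = 0.5

print(f"latency: {mean_latency_ms(latencies):.1f} ms")
print(f"throughput: {throughput_fps(len(latencies), elapsed):.1f} FPS")
```

Under these assumed numbers, processing the inputs one at a time would give 1 / 0.020 = 50 FPS, so the higher throughput comes from overlapping requests rather than from a shorter per-input latency.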

OpenVINO performance hints are a new way to configure performance with portability in mind. Rather than mapping application needs to low-level performance settings and maintaining application logic to configure each possible device separately, performance hints let the device configure itself.

High-level Performance Hints: https://docs.openvino.ai/latest/openvino_docs_OV_UG_Performance_Hints.html



<br/>
 
**Run Inference with "LATENCY" Performance Hint**

It is possible to define application-specific performance settings with a config key, letting the device adjust to achieve better **"LATENCY"** oriented performance.

In [None]:
# AUTO sets device config based on hints
compiled_model = ie.compile_model(model=model, device_name="AUTO", config={"PERFORMANCE_HINT": "LATENCY"})
# Use the AsyncInferQueue Python API to boost performance in async mode
infer_queue = AsyncInferQueue(compiled_model)
infer_queue.set_callback(callback)
start = time.time()
# Run inference 100 times to get the average FPS
for i in range(100):
    infer_queue.start_async({input_layer.any_name: test_image}, i)
infer_queue.wait_all()
end = time.time()
# Calculate the average FPS
fps = 100 / (end - start)
print("fps:", fps)

 <br/>
 
**Run Inference with "THROUGHPUT" Performance Hint**

It is possible to define application-specific performance settings with a config key, letting the device adjust to achieve better **"THROUGHPUT"** performance.

In [None]:
# AUTO sets device config based on hints
compiled_model = ie.compile_model(model=model, device_name="AUTO", config={"PERFORMANCE_HINT": "THROUGHPUT"})
infer_queue = AsyncInferQueue(compiled_model)
infer_queue.set_callback(callback)
start = time.time()
for i in range(100):
    infer_queue.start_async({input_layer.any_name: test_image}, i)
infer_queue.wait_all()
end = time.time()
# Calculate the average FPS
fps = 100 / (end - start)
print("fps:", fps)