# Convert and Optimize Retinaface with OpenVINO™

RetinaFace is a widely used deep learning model for face detection. It was introduced in the paper "RetinaFace: Single-stage Dense Face Localisation in the Wild" by researchers from Imperial College London and their collaborators, and it is frequently used as the detection stage of face analysis pipelines.

The repository used in this tutorial is a PyTorch implementation of RetinaFace. It comes with pre-trained models (MobileNet-0.25 and ResNet50 backbones) that can be fine-tuned for particular face detection tasks.

RetinaFace is a multi-task model: in a single forward pass it classifies face versus background, regresses face bounding boxes, and locates five facial landmarks per face. The model reports high accuracy on well-known benchmark datasets such as WIDER FACE and FDDB. More details about the implementation can be found in the original model [repository](https://github.com/biubug6/Pytorch_Retinaface).

This tutorial demonstrates step by step how to run and optimize PyTorch_Retinaface with OpenVINO.

The tutorial consists of the following steps:
- Clone the original PyTorch Retinaface repository
- Install the necessary libraries
- Load the PyTorch Retinaface model
- Validate the original model
- Convert the PyTorch model to ONNX
- Verify the converted ONNX model
- Convert the ONNX model to OpenVINO IR
- Validate the converted model



# Cloning the original PyTorch Retinaface repository

In [None]:
!git clone https://github.com/biubug6/Pytorch_Retinaface.git

The git clone command is a Git command used to create a copy of a repository on a local machine. The repository is identified by its URL, which is provided as an argument to the command. In this case, the URL is https://github.com/biubug6/Pytorch_Retinaface.git, which points to the PyTorch Retinaface repository on GitHub.

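After the command runs, the repository is available in a `Pytorch_Retinaface` subdirectory of the notebook's current working directory. The model code is imported from there. The pre-trained weights are not part of the repository and must be downloaded separately by following the links in its README.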

# Installing the necessary libraries

In [None]:
%pip install -q torch torchvision onnx onnxruntime

import sys
sys.path.insert(0, "Pytorch_Retinaface")  # make the cloned repository importable

import torch
import onnx
import onnxruntime
from models.retinaface import RetinaFace

`%pip install -q ...:` installs the Python packages used in this notebook into the current kernel's environment.

`sys.path.insert(0, "Pytorch_Retinaface"):` adds the cloned repository to the module search path so that its packages (`models`, `data`, `layers`, `utils`) can be imported.

`import torch:` imports PyTorch, a popular deep learning library used for building and training deep neural networks.

`import onnx:` imports ONNX, an open-source format for representing deep learning models. It allows interoperability between different deep learning frameworks.

`import onnxruntime:` imports ONNX Runtime, a runtime engine for ONNX models. It runs inference on different platforms.

`from models.retinaface import RetinaFace:` imports the RetinaFace class from the repository's models.retinaface module. RetinaFace is the model architecture used here for face detection in images.

The repository does not ship a `utils.onnx_export` module, so no export helper is imported; the export later in this tutorial uses `torch.onnx.export` directly.

After importing the necessary modules, the notebook can perform the following tasks:
- Convert a PyTorch model to the ONNX format using torch.onnx.export.
- Load an ONNX model using onnxruntime.InferenceSession.
- Run inference on an input image using the loaded ONNX model.

# Load the PyTorch Retinaface model

In [None]:
from data import cfg_re50  # ResNet50 configuration dict from the repository's data package
model = RetinaFace(cfg=cfg_re50, phase='test').eval()
state_dict = torch.load('weights/Resnet50_Final.pth', map_location=torch.device('cpu'))
model.load_state_dict({k.removeprefix('module.'): v for k, v in state_dict.items()})

`from data import cfg_re50:` imports the ResNet50 configuration from the repository's data package. The `cfg` argument of RetinaFace expects this Python dictionary, not a path to a YAML file. It holds the settings that must match the checkpoint, such as the anchor sizes, the feature map strides, and the box variances.

`model = RetinaFace(cfg=cfg_re50, phase='test').eval():` creates an instance of the RetinaFace class, which is defined in the models.retinaface module. The phase argument is set to 'test' so that the forward pass returns softmax class probabilities instead of raw scores. Calling `.eval()` switches layers such as batch normalization to inference behavior.

`state_dict = torch.load('weights/Resnet50_Final.pth', map_location=torch.device('cpu')):` reads the saved PyTorch checkpoint from the weights/Resnet50_Final.pth file into memory. The map_location argument specifies the device the tensors are mapped to, in this case the CPU.

`model.load_state_dict({...}):` copies the weights into the model. The released checkpoint is assumed to have been saved from a `torch.nn.DataParallel` wrapper, in which case every key starts with `module.`; the dictionary comprehension strips that prefix and leaves other keys unchanged (`str.removeprefix` requires Python 3.9 or newer).

After executing these commands, the model object is initialized with the ResNet50 configuration and the pre-trained weights, and it is ready to be used for face detection on input images.
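Checkpoints saved from a model wrapped in `torch.nn.DataParallel` carry a `module.` prefix on every parameter name, which makes `load_state_dict` fail on an unwrapped model. The following is a minimal sketch of stripping such a prefix. It uses a plain dictionary as a stand-in for a checkpoint, and the key names are hypothetical, not taken from the actual weights file.

```python
from collections import OrderedDict


def strip_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with `prefix` removed from keys that start with it."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Stand-in for a checkpoint: integers instead of tensors, hypothetical key names
checkpoint = OrderedDict([
    ("module.body.conv1.weight", 1),
    ("module.fpn.output1.0.weight", 2),
    ("ssh1.conv3X3.0.weight", 3),  # already without prefix, must stay unchanged
])

cleaned = strip_prefix(checkpoint)
print(list(cleaned))
```

Because keys without the prefix pass through untouched, the same helper is safe to apply to checkpoints saved with or without the wrapper.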

# Validate the original model

In [None]:
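import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from data import cfg_re50
from layers.functions.prior_box import PriorBox
from utils.box_utils import decode
from utils.nms.py_cpu_nms import py_cpu_nms

# Define the input image and resize it to the network input size
image = Image.open("test_image.jpg").convert("RGB").resize((640, 640))

# Preprocess the image as the repository's detect.py does:
# BGR channel order, per-channel mean subtraction, CHW layout, batch dimension
img = np.asarray(image, dtype=np.float32)[:, :, ::-1] - np.array([104, 117, 123], dtype=np.float32)
input_image = torch.from_numpy(np.ascontiguousarray(img.transpose(2, 0, 1))).unsqueeze(0)

# Run the model; in 'test' phase it returns three tensors
with torch.no_grad():
    loc, conf, landms = model(input_image)

# Decode the box regressions against the prior boxes and scale them to pixels
priors = PriorBox(cfg_re50, image_size=(640, 640)).forward()
boxes = (decode(loc.squeeze(0), priors, cfg_re50['variance']) * 640).numpy()
scores = conf.squeeze(0)[:, 1].numpy()

# Keep confident detections and remove overlapping ones with NMS
mask = scores > 0.6
dets = np.hstack((boxes[mask], scores[mask, None])).astype(np.float32)
dets = dets[py_cpu_nms(dets, 0.4)]

# Visualize the results
plt.imshow(image)
for x1, y1, x2, y2, score in dets:
    plt.gca().add_patch(plt.Rectangle((x1, y1), x2 - x1, y2 - y1, fill=False, edgecolor='r', linewidth=2))
plt.show()

`image = Image.open("test_image.jpg").convert("RGB").resize((640, 640)):` loads the input image from the file system, forces three color channels, and resizes it to 640x640 pixels.

`img = ...` and `input_image = ...:` preprocess the image in the way the repository's `detect.py` script does. The channels are reversed from RGB to BGR, the per-channel means (104, 117, 123) are subtracted, and the array is transposed from height-width-channel to channel-height-width layout. `unsqueeze(0)` adds a batch dimension, so the input tensor has shape (1, 3, 640, 640), where 1 is the batch size, 3 is the number of color channels, and 640 x 640 is the size of the input image.

`loc, conf, landms = model(input_image):` feeds the preprocessed tensor into the model. In 'test' phase the model does not return a list of detections. It returns three tensors: box regressions of shape (1, N, 4), face/background probabilities of shape (1, N, 2), and landmark regressions of shape (1, N, 10), where N is the number of prior (anchor) boxes, not the number of faces. The `torch.no_grad()` context disables gradient tracking, which is not needed for inference.

`priors = PriorBox(...)` and `boxes = decode(...):` generate the prior boxes for a 640x640 input and convert the regressions into corner coordinates (x1, y1, x2, y2). The decoded coordinates are normalized to the range 0 to 1, so they are multiplied by 640 to obtain pixels. `scores` takes the probability of the face class for each prior.

`mask = scores > 0.6` and `py_cpu_nms(dets, 0.4):` keep only confident candidates and apply non-maximum suppression with an IoU threshold of 0.4, so that each face is represented by a single box. The two thresholds are example values and can be tuned.

`plt.imshow(image):` displays the input image using Matplotlib's imshow function.

`for x1, y1, x2, y2, score in dets: [...]:` iterates over the remaining detections. The plt.gca().add_patch(plt.Rectangle(...)) call draws a rectangle around each detected face using its bounding box coordinates.

Finally, the `plt.show()` method is called to display the input image with the detected faces.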
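A single-stage detector usually produces many overlapping candidate boxes for the same face, so a non-maximum suppression (NMS) step is needed before drawing one rectangle per face. The following is a minimal pure-Python sketch of greedy NMS for illustration; it is not the repository's implementation, and the boxes, scores, and the 0.4 threshold are made-up example values.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.4):
    """Greedy NMS: visit boxes by descending score, drop those overlapping a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping candidates for one face, plus a separate face
boxes = [(100, 100, 200, 200), (105, 105, 205, 205), (300, 300, 380, 380)]
scores = [0.95, 0.80, 0.90]

kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.82, so only the higher-scoring one survives, while the third box is kept because it does not overlap any kept box.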

# Convert the PyTorch model to ONNX

In [None]:
%pip install -q onnxoptimizer
import onnxoptimizer
from onnxruntime.quantization import quantize_dynamic, QuantType

# Export the PyTorch model to ONNX format
temp_input = torch.randn(1, 3, 640, 640)
input_names = ["input"]
output_names = ["boxes", "scores", "landmarks"]
onnx_model_path = "retinaface.onnx"
torch.onnx.export(model, temp_input, onnx_model_path, verbose=False, input_names=input_names, output_names=output_names)

# Optimize the ONNX model
onnx_model = onnx.load(onnx_model_path)
passes = ["extract_constant_to_initializer", "eliminate_unused_initializer"]
optimized_model = onnxoptimizer.optimize(onnx_model, passes)
onnx.save(optimized_model, onnx_model_path)

# Quantize the ONNX model into a separate file so that the FP32 model stays available
quantized_model_path = "retinaface_int8.onnx"
quantize_dynamic(onnx_model_path, quantized_model_path, weight_type=QuantType.QUInt8)

**Exporting the PyTorch model to ONNX format:** The PyTorch model, which has been loaded in memory, is exported to the ONNX format. This is done using the torch.onnx.export function. The function takes in the PyTorch model, a temporary input tensor (used for tracing the model), the path where the ONNX model will be saved, input and output names, and other optional parameters. The three output names are assigned, in order, to the three tensors the model returns: box regressions, class probabilities, and landmark regressions.

**Optimizing the ONNX model:** The generated ONNX model may have redundancies and inefficiencies. The optimizer used to be available as `onnx.optimizer`, but it has been moved out of the onnx package into the separate `onnxoptimizer` package, which is installed in the first line of the cell. The onnxoptimizer.optimize function takes in the ONNX model and a list of optimization passes to apply. In this case, the two passes used are `extract_constant_to_initializer` and `eliminate_unused_initializer`. Once the model is optimized, it is saved to the same path as the original ONNX model.

**Quantizing the ONNX model:** Quantization is a technique used to reduce the size of the model and, depending on the hardware, improve its inference speed. In this code block, dynamic quantization is applied to the ONNX model using the quantize_dynamic function from ONNX Runtime. The function takes the path of the input model and the path of the output model, writes the quantized model to disk, and returns nothing; it has no optimization level parameter. `weight_type=QuantType.QUInt8` stores the weights as unsigned 8-bit integers. The quantized model is saved under a separate name because its outputs differ slightly from the FP32 model, and the following steps convert and validate the FP32 model.
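To make the idea of 8-bit quantization concrete, here is a small pure-Python sketch of affine quantization of a list of weights to unsigned 8-bit integers and back. It only illustrates the arithmetic; it is not how ONNX Runtime implements quantization internally, and the weight values are made up.

```python
def quantize_uint8(values):
    """Map floats to integers in [0, 255] with a scale and a zero point."""
    lo = min(min(values), 0.0)  # the range must contain 0 so that 0.0 is exactly representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # avoid a zero scale for all-zero input
    zero_point = round(-lo / scale)
    quantized = [min(255, max(0, round(v / scale) + zero_point)) for v in values]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate floats from the integer representation."""
    return [(q - zero_point) * scale for q in quantized]


weights = [-0.51, -0.02, 0.0, 0.13, 0.77]  # made-up example weights
q, scale, zero_point = quantize_uint8(weights)
restored = dequantize(q, scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(q, scale, zero_point)
print(max_error)
```

Each weight now needs one byte instead of four, and the round trip changes every value by no more than about one quantization step (`scale`). This small error is the reason a quantized model is not expected to match the FP32 model within tight tolerances.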

# Verify the converted ONNX model

In [None]:
ort_session = onnxruntime.InferenceSession(onnx_model_path, providers=["CPUExecutionProvider"])
ort_inputs = {ort_session.get_inputs()[0].name: temp_input.numpy()}
ort_outputs = ort_session.run(None, ort_inputs)
boxes, scores, landmarks = ort_outputs

print(f"Boxes shape: {boxes.shape}")
print(f"Scores shape: {scores.shape}")
print(f"Landmarks shape: {landmarks.shape}")


The above code is used to perform inference on the ONNX model using the ONNX Runtime (ORT) package.

First, an `InferenceSession` object is created using the ONNX model path. This object is used to run the inference on the model. Next, the input tensor is passed to the model using a dictionary with the input name as the key and the input tensor as the value. Here, the `temp_input` tensor is converted to a numpy array using the `.numpy()` method and passed as the value of the input tensor dictionary.

The `run` method of the `InferenceSession` object is then called with `None` as the list of requested output names, which means that all outputs are returned, and with the input tensor dictionary as the input. It returns a list of output tensors in the order they were specified when the model was exported. In this case, the output tensors are `boxes`, `scores`, and `landmarks`. Their `.shape` attributes are then printed to check the output dimensions.

Overall, the above code performs inference on the RetinaFace model using the ONNX Runtime and prints the shapes of the output tensors.

That's it! You should now have the PyTorch Retinaface model converted to ONNX format and verified using the ONNX Runtime.
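The middle dimension of the printed shapes is the number of prior (anchor) boxes, which depends only on the input size. The sketch below reproduces that number with plain arithmetic. It assumes the settings of the repository's default configurations, namely feature map strides of 8, 16, and 32 with two anchors per feature map cell; treat these values as assumptions and check them against the configuration in use.

```python
from math import ceil


def count_priors(image_size, strides=(8, 16, 32), anchors_per_cell=2):
    """Number of prior boxes for an input of size (height, width)."""
    height, width = image_size
    return sum(ceil(height / s) * ceil(width / s) * anchors_per_cell for s in strides)


n = count_priors((640, 640))
print(n)  # 16800

# Shapes expected under these assumptions: 4 box values,
# 2 class probabilities, and 5 landmarks x 2 coordinates per prior
print((1, n, 4), (1, n, 2), (1, n, 10))
```

For a 640x640 input this gives 80x80, 40x40, and 20x20 feature maps, that is (6400 + 1600 + 400) x 2 = 16800 priors.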

# Convert ONNX model to OpenVINO IR

In [None]:
# Install OpenVINO (version 2023.1 or newer provides convert_model and save_model)
%pip install -q "openvino>=2023.1"


# Convert the ONNX model to an OpenVINO IR model:
import openvino as ov

onnx_model_path = "retinaface.onnx"
ir_model_path = "retinaface.xml"

ov_model = ov.convert_model(onnx_model_path)
# Keep FP32 weights so that the outputs can be compared closely with PyTorch
ov.save_model(ov_model, ir_model_path, compress_to_fp16=False)


The first line installs OpenVINO from PyPI. The apt repository of the 2021 releases and the `openvino.inference_engine` API belong to older OpenVINO versions; in addition, the `export` method of an executable network writes a device-specific compiled blob rather than an IR model, so it cannot be used to produce the `.xml` and `.bin` files.

Next, the OpenVINO Python API is imported. The `onnx_model_path` variable is set to the path of the ONNX model file created earlier. The `ir_model_path` variable is set to the desired path for the output IR model file.

The `ov.convert_model` function reads the ONNX file and returns an in-memory `ov.Model` object. The `ov.save_model` function then serializes it as OpenVINO IR: the network topology is written to `retinaface.xml` and the weights to `retinaface.bin` in the same directory. By default `save_model` compresses the weights to FP16; `compress_to_fp16=False` keeps them in FP32. The target device is not chosen at this stage; it is selected later, when the model is compiled for inference.
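An OpenVINO IR model consists of two files that belong together, an `.xml` file with the topology and a `.bin` file with the weights. Before moving on, it can help to confirm that both were written. The sketch below is a small standard-library check; the helper name `missing_ir_files` is hypothetical, and the demonstration uses a temporary directory with placeholder files rather than a real model.

```python
import tempfile
from pathlib import Path


def missing_ir_files(xml_path):
    """Return the names of the IR files (.xml and matching .bin) that do not exist."""
    xml = Path(xml_path)
    pair = (xml, xml.with_suffix(".bin"))
    return [p.name for p in pair if not p.is_file()]


# Demonstration with placeholder files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    xml_path = Path(tmp) / "retinaface.xml"
    xml_path.write_text("<net/>")                      # placeholder topology
    missing_before = missing_ir_files(xml_path)        # the .bin file is not there yet
    xml_path.with_suffix(".bin").write_bytes(b"")      # placeholder weights
    missing_after = missing_ir_files(xml_path)

print(missing_before)  # ['retinaface.bin']
print(missing_after)   # []
```

In the notebook, calling `missing_ir_files("retinaface.xml")` after the conversion cell should return an empty list.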

# Validate the converted model

To validate the converted OpenVINO model, we can run inference on it with OpenVINO and compare the output to the output of the original PyTorch model for the same input.

In [None]:
import numpy as np
import openvino as ov  # requires OpenVINO 2023.1 or newer
from PIL import Image

# Load the OpenVINO IR model; the weights are read from retinaface.bin next to the .xml file
core = ov.Core()
ov_model = core.read_model("retinaface.xml")

# Compile the model for the device
compiled_model = core.compile_model(ov_model, device_name="CPU")

# Define the input image and resize it to the network input size
image = Image.open("test_image.jpg").convert("RGB").resize((640, 640))

# Preprocess the image: BGR order, mean subtraction, CHW layout, batch dimension
img = np.asarray(image, dtype=np.float32)[:, :, ::-1] - np.array([104, 117, 123], dtype=np.float32)
input_image = np.ascontiguousarray(img.transpose(2, 0, 1))[np.newaxis]

# Run inference on the OpenVINO model
results = compiled_model({"input": input_image})

# Get the output tensors
boxes = results[compiled_model.output("boxes")]
scores = results[compiled_model.output("scores")]
landmarks = results[compiled_model.output("landmarks")]

# Print the output shapes
print(f"Boxes shape: {boxes.shape}")
print(f"Scores shape: {scores.shape}")
print(f"Landmarks shape: {landmarks.shape}")

# Run inference on the original PyTorch model, which is still in memory from the earlier cells
with torch.no_grad():
    output = model(torch.from_numpy(input_image))

# Compare the output tensors
np.testing.assert_allclose(boxes, output[0].numpy(), rtol=1e-03, atol=1e-05)
np.testing.assert_allclose(scores, output[1].numpy(), rtol=1e-03, atol=1e-05)
np.testing.assert_allclose(landmarks, output[2].numpy(), rtol=1e-03, atol=1e-05)


Here, we first read the OpenVINO IR model with `core.read_model` and compile it for the CPU device with `core.compile_model`. Then, we load the input image and preprocess it into a (1, 3, 640, 640) float32 array. We run inference by calling the compiled model with a dictionary that maps the input name to the input array.

Next, we retrieve the output tensors from the results by their names and print their shapes. Finally, we run the original PyTorch model, which is still in memory, on the same input. We compare the raw output tensors from the two models using the `np.testing.assert_allclose` function, which raises an `AssertionError` if any pair of elements differs by more than `atol + rtol * |expected|`. If the check fails by a small margin on a particular machine, the tolerances may need to be loosened slightly.

If the output tensors from the OpenVINO model and the original PyTorch model agree within the tolerance, then we can conclude that the conversion process was successful and the OpenVINO model is a valid replacement for the original PyTorch model.
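The tolerance rule behind `np.testing.assert_allclose` is easy to misread, so here is a pure-Python sketch of the same criterion: two values match when `|actual - desired| <= atol + rtol * |desired|`. The helper name `allclose_report` and the numbers are made up for illustration; they are not outputs of the model.

```python
def allclose_report(actual, desired, rtol=1e-3, atol=1e-5):
    """Return (all_within_tolerance, largest_absolute_difference)."""
    all_ok = True
    worst = 0.0
    for a, d in zip(actual, desired):
        diff = abs(a - d)
        if diff > atol + rtol * abs(d):
            all_ok = False
        worst = max(worst, diff)
    return all_ok, worst


reference = [0.5, -1.25, 100.0]       # stands in for the PyTorch output
close = [0.5002, -1.2505, 100.05]     # small deviations, all within tolerance
far = [0.5002, -1.2505, 100.2]        # last value is off by more than 0.1%

print(allclose_report(close, reference))
print(allclose_report(far, reference))
```

Because the allowed difference scales with the magnitude of the expected value, a deviation of 0.05 passes for a value of 100, while values close to zero are governed almost entirely by `atol`.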