# Accelerating and scaling inference with ONNX on CPU
## 01 - Getting started
#### By Ramon Lins
------------------

**Table of contents**
* [Introduction](#introduction)
* [Setup](#setup)
* [Tutorial](#tutorial)
* [Visualization](#zetane)
* [Optional](#option)

Reference:

- Tutorial
    > https://pytorch.org/tutorials/advanced/super_resolution_with_onnxruntime.html
    
    > https://pytorch.org/docs/master/onnx.html

- Setup
    > https://onnxruntime.ai/

- ONNX
    > https://onnxruntime.ai/docs/tutorials/export-pytorch-model.html
    
    > https://pytorch.org/docs/master/onnx.html

- ONNXRuntime
    > https://onnxruntime.ai/docs/tutorials/
    
    > https://github.com/microsoft/onnxruntime
    
    > https://onnxruntime.ai/docs/tutorials/accelerate-pytorch/pytorch.html

- Visualization
    > https://github.com/onnx/tutorials/blob/main/tutorials/VisualizingAModel.md

- Comparison
    > https://github.com/onnx/tutorials/blob/main/tutorials/CorrectnessVerificationAndPerformanceComparison.ipynb

- Optional
    > https://github.com/onnx/onnx-docker/blob/master/onnx-ecosystem/converter_scripts/float32_float16_onnx.ipynb
    
    > https://github.com/onnx/onnx-docker

<a id="introduction"></a>
### Introduction


ONNX is an open format for representing machine learning models, and ONNX Runtime is an open source
engine designed to accelerate machine learning across a wide variety of frameworks, operating systems,
and hardware platforms.

The main objective of this task is to use the ONNX engine (ONNX Runtime) to optimize the patch-based
density model, a customized VGG-16 network, in order to reduce latency.

<a id="setup"></a>
### Setup

Create an `environment.yml` with:
```
name: base
channels:
  - pytorch-lts
dependencies:
  - python=3.7.*
  - pytorch=1.8.2
  - torchvision=0.9.2
  - cudatoolkit=10.2
  - pip
  - pip:
      - onnx==1.12.0
      - onnxruntime==1.11.1
```
Next, run the command:
```bash
conda env create
```


In [70]:
#!conda env create

If an environment already exists, install ONNX directly:

In [71]:
%pip install onnx==1.12.0 onnxruntime==1.11.1

Note: you may need to restart the kernel to use updated packages.


<a id="tutorial"></a>
### Tutorial


In [72]:
import io
import numpy as np

import torch.nn.init as init
import torch.utils.model_zoo as model_zoo
import torch.onnx

from torch import nn

In [73]:
class SuperResolutionNet(nn.Module):
    def __init__(self, upscale_factor: int):
        """Super Resolution Network for increasing the resolution of images

        Args:
            upscale_factor (int): The factor by which the image resolution is increased
        """
        super(SuperResolutionNet, self).__init__()
        in_channels = 1 # number of input channels (the network processes a single channel)
        num_filters = 64 # number of filters
        kernel_size_in = 5 # 5x5 kernel for input convolution
        kernel_size_hl = 3 # 3x3 kernel for hidden layer convolution
        stride = 1 # stride of the convolution
        padding_in = 2 # padding for input convolution
        padding_hl = 1 # padding for hidden layers

        self.relu = nn.ReLU()
        self.conv1 = nn.Conv2d(in_channels, num_filters, kernel_size_in, stride, padding_in)
        self.conv2 = nn.Conv2d(num_filters, num_filters, kernel_size_hl, stride, padding_hl)
        self.conv3 = nn.Conv2d(num_filters, num_filters//2, kernel_size_hl, stride, padding_hl)
        self.conv4 = nn.Conv2d(num_filters//2, upscale_factor**2, kernel_size_hl, stride, padding_hl)
        self.pixel_shuffle = nn.PixelShuffle(upscale_factor)

        self._initialize_weights()

    def forward(self, x):
        """forward operation

        Args:
            x (tensor): input image of shape (batch_size, 1, H, W)

        Returns:
            tensor: output image of shape (batch_size, 1, H*upscale_factor, W*upscale_factor)
        """
        x = self.relu(self.conv1(x))
        x = self.relu(self.conv2(x))
        x = self.relu(self.conv3(x))
        
        return self.pixel_shuffle(self.conv4(x))
    
    def _initialize_weights(self):
        # initialize the network weights using orthogonal initialization
        init.orthogonal_(self.conv1.weight, init.calculate_gain('relu'))
        init.orthogonal_(self.conv2.weight, init.calculate_gain('relu'))
        init.orthogonal_(self.conv3.weight, init.calculate_gain('relu'))
        init.orthogonal_(self.conv4.weight)


model = SuperResolutionNet(upscale_factor=3)
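
The last layer, `nn.PixelShuffle`, is what turns the `upscale_factor**2` channels produced by `conv4` into a single channel with a larger spatial size. The following is a minimal NumPy sketch of that rearrangement, assuming the channel-to-space layout described in the PyTorch documentation for `nn.PixelShuffle`. The helper name `pixel_shuffle_np` is hypothetical and is not part of the network above; it is only meant to make the shape change explicit.

```python
import numpy as np


def pixel_shuffle_np(x: np.ndarray, upscale_factor: int) -> np.ndarray:
    """Rearrange (N, C*r*r, H, W) into (N, C, H*r, W*r).

    Hypothetical helper for illustration; the network itself uses nn.PixelShuffle.
    """
    n, c_in, h, w = x.shape
    r = upscale_factor
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by upscale_factor**2")
    c_out = c_in // (r * r)
    # Split the channel axis into (c_out, r, r): axes are (n, c, i, j, h, w)
    x = x.reshape(n, c_out, r, r, h, w)
    # Interleave the r x r offsets with the spatial axes: (n, c, h, i, w, j)
    x = x.transpose(0, 1, 4, 2, 5, 3)
    # Merge (h, i) and (w, j) into the enlarged height and width
    return x.reshape(n, c_out, h * r, w * r)


# conv4 emits upscale_factor**2 = 9 channels when upscale_factor=3
features = np.zeros((1, 9, 224, 224), dtype=np.float32)
print(pixel_shuffle_np(features, 3).shape)  # (1, 1, 672, 672)
```

With `upscale_factor=3`, a 224x224 input therefore yields a 672x672 output, which is the shape to expect from the exported model as well.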

Export the ONNX model

In [74]:
# Load pretrained model weights
model_url = 'https://s3.amazonaws.com/pytorch/test_data/export/superres_epoch100-44c6958e.pth'
batch_size = 1

# pretrained model weights
model.load_state_dict(model_zoo.load_url(model_url))

# evaluation mode
model.eval()

# input image to test onnx model
torch_input = torch.randn(batch_size, 1, 224, 224, requires_grad=True)
torch_output = model(torch_input)

# export the model to onnx
torch.onnx.export(
    model, # model being run
    torch_input, # model input (or a tuple for multiple inputs)
    "superres.onnx", # where to save the model (can be a file or file-like object)
    export_params=True, # store the trained parameter weights inside the model file
    opset_version=12, # the ONNX opset version to export the model to
    do_constant_folding=True, # whether to execute constant folding
    input_names=['input'], # the model's input names
    output_names=['output'], # the model's output names
    dynamic_axes={'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}}) # variable length axes

Before verifying the model's output with ONNX Runtime, we check the ONNX model with ONNX's API. First, `onnx.load("superres.onnx")` loads the saved model and returns an `onnx.ModelProto` structure (a top-level file/container format for bundling an ML model; for more information, see the onnx.proto documentation). Then, `onnx.checker.check_model(onnx_model)` verifies the model's structure and confirms that the model has a valid schema. The validity of the ONNX graph is verified by checking the model's version, the graph's structure, as well as the nodes and their inputs and outputs.

In [75]:
import onnx

onnx_model = onnx.load("superres.onnx")
onnx.checker.check_model(onnx_model)

Compute the output using ONNX Runtime.

To run the model, it is necessary to create an inference session. Once it is created, the model is evaluated using the `run()` API.

In [76]:
import onnxruntime

# run on CPU explicitly
session = onnxruntime.InferenceSession("superres.onnx", providers=["CPUExecutionProvider"])

def to_numpy(tensor):
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

# inference
onnx_inputs = {session.get_inputs()[0].name: to_numpy(torch_input)}
onnx_output = session.run(None, onnx_inputs)

# compare ONNX Runtime results with PyTorch results
# np.allclose returns a bool, whereas np.testing.assert_allclose returns None
# and raises an AssertionError on mismatch, so it cannot serve as an if condition
if np.allclose(to_numpy(torch_output), onnx_output[0], rtol=1e-03, atol=1e-05):
    print("ONNX and PyTorch results match!")
else:
    print("ONNX and PyTorch results do not match!")
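
When the comparison reports a mismatch, it helps to know how far apart the two outputs are. The sketch below spells out the element-wise test that NumPy's closeness checks apply, `|actual - expected| <= atol + rtol * |expected|`, and returns a small summary. The function `compare_outputs` and its dictionary keys are hypothetical names introduced here for illustration.

```python
import numpy as np


def compare_outputs(actual, expected, rtol=1e-3, atol=1e-5):
    """Summarize how closely two arrays agree under the rtol/atol test.

    Hypothetical helper; mirrors |actual - expected| <= atol + rtol * |expected|.
    """
    actual = np.asarray(actual, dtype=np.float64)
    expected = np.asarray(expected, dtype=np.float64)
    abs_diff = np.abs(actual - expected)
    tolerance = atol + rtol * np.abs(expected)
    return {
        "match": bool(np.all(abs_diff <= tolerance)),
        "max_abs_diff": float(abs_diff.max()),
        "num_mismatched": int(np.sum(abs_diff > tolerance)),
    }


# Synthetic data: a tiny perturbation stays within tolerance
reference = np.array([1.0, 2.0, 3.0])
print(compare_outputs(reference + 1e-6, reference)["match"])  # True
```

In the notebook this could be called as `compare_outputs(to_numpy(torch_output), onnx_output[0])`, using the same tolerances as the comparison cell.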


Test using images

In [77]:
from PIL import Image
import torchvision.transforms as transforms

img = Image.open("/home/ramon/Git/utils/img/cat.jpg")

resize = transforms.Resize([224, 224])
img_rs = resize(img)

img_ycbcr = img_rs.convert('YCbCr')
img_y, img_cb, img_cr = img_ycbcr.split()

to_tensor = transforms.ToTensor()
img_y = to_tensor(img_y)
img_y.unsqueeze_(0)


tensor([[[[0.2157, 0.1961, 0.1922,  ..., 0.5294, 0.5569, 0.5725],
          [0.2039, 0.1922, 0.1922,  ..., 0.5333, 0.5529, 0.5686],
          [0.2000, 0.1843, 0.1843,  ..., 0.5216, 0.5373, 0.5490],
          ...,
          [0.6667, 0.6745, 0.6392,  ..., 0.6902, 0.6667, 0.6078],
          [0.6392, 0.6431, 0.6235,  ..., 0.8000, 0.7608, 0.6745],
          [0.6392, 0.6353, 0.6510,  ..., 0.8118, 0.7686, 0.6667]]]])

In [78]:
# inference
onnx_inputs = {session.get_inputs()[0].name: to_numpy(img_y)}
onnx_output = session.run(None, onnx_inputs)
img_out_y = onnx_output[0]


In [79]:
img_out_y = Image.fromarray(np.uint8((img_out_y[0] * 255.0).clip(0, 255)[0]), mode='L')

# build the output image following the post-processing step from the PyTorch implementation
final_img = Image.merge(
    "YCbCr", [
        img_out_y,
        img_cb.resize(img_out_y.size, Image.BICUBIC),
        img_cr.resize(img_out_y.size, Image.BICUBIC),
    ]).convert("RGB")

# Save the image; we will compare it with the output image from the mobile device
final_img.save("./out/cat.jpg")


Running time comparison

In [80]:
# PyTorch inference
torch_input = torch.randn(batch_size, 1, 224, 224, requires_grad=True)
torch_output = model(torch_input)

In [81]:
# ONNX Runtime inference
onnx_inputs = {session.get_inputs()[0].name: to_numpy(torch_input)}
onnx_output = session.run(None, onnx_inputs)
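
The two cells above run each backend once, so a timing harness is needed to turn them into a latency comparison. The following is a minimal sketch using only the standard library, assuming that a few warm-up calls followed by repeated wall-clock measurements give a fair estimate. The helper `time_inference`, its parameters, and the returned keys are hypothetical names, not part of PyTorch or ONNX Runtime.

```python
import statistics
import time
from typing import Callable, Dict


def time_inference(fn: Callable[[], object],
                   n_warmup: int = 5,
                   n_runs: int = 50) -> Dict[str, float]:
    """Call fn repeatedly and return latency statistics in milliseconds.

    Hypothetical helper for illustration only.
    """
    # Warm-up calls are excluded from the measurements
    for _ in range(n_warmup):
        fn()

    samples_ms = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1e3)

    return {
        "mean_ms": statistics.mean(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "min_ms": min(samples_ms),
    }


# Stand-in workload so the sketch runs on its own
stats = time_inference(lambda: sum(range(10_000)), n_warmup=2, n_runs=10)
print(sorted(stats))  # ['mean_ms', 'median_ms', 'min_ms']
```

In this notebook the same helper could wrap each backend, for example `time_inference(lambda: session.run(None, onnx_inputs))` for ONNX Runtime and `time_inference(lambda: model(torch_input))` inside a `with torch.no_grad():` block for PyTorch, so that gradient tracking does not skew the PyTorch numbers.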