Copyright (c) Microsoft Corporation. All rights reserved.  
Licensed under the MIT License.

# Mobilenet v2 Quantization with ONNX Runtime on CPU

In this tutorial, we load a MobileNet v2 model pretrained with [PyTorch](https://pytorch.org/), export it to ONNX, quantize it and run it with ONNX Runtime, and then convert the ONNX models to ORT format for ONNX Runtime Mobile.

## 0. Prerequisites ##

If you have Jupyter Notebook, you can run this notebook directly with it. You may need to install or upgrade [PyTorch](https://pytorch.org/), [OnnxRuntime](https://microsoft.github.io/onnxruntime/), and other required packages.

Otherwise, you can set up a new environment. First, install [Anaconda](https://www.anaconda.com/distribution/). Then open an Anaconda prompt window and run the following commands:

```console
conda create -n cpu_env python=3.8
conda activate cpu_env
conda install jupyter
jupyter notebook
```
The last command launches Jupyter Notebook. Open this notebook in the browser to continue.

### 0.1 Install packages
Let's install the packages needed for the tutorial: PyTorch 1.8, ONNX Runtime 1.8, and the latest ONNX and Pillow.

In [1]:
# Install or upgrade PyTorch 1.8.0 and OnnxRuntime 1.8 for CPU-only.
import sys
!{sys.executable} -m pip install --upgrade torch==1.8.0 torchvision==0.9.0 torchaudio===0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
!{sys.executable} -m pip install --upgrade onnxruntime==1.8.0
!{sys.executable} -m pip install --upgrade onnx
!{sys.executable} -m pip install --upgrade pillow

Looking in links: https://download.pytorch.org/whl/torch_stable.html
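
After the install cell finishes, it can help to confirm which versions actually ended up in the environment. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is a hypothetical one chosen for illustration, and the package names are the ones passed to `pip` above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Package names taken from the pip commands in the install cell.
for name in ("torch", "torchvision", "onnxruntime", "onnx", "pillow"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If a package shows up as `not installed`, the notebook kernel is probably running from a different environment than the one `pip` installed into.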


# 1 Download pretrained model and export to ONNX

In this step, we load a pretrained mobilenet v2 model, and export it to ONNX.

### 1.1 Load the pretrained model
Use the API that torchvision provides to load the mobilenet_v2 model.

In [1]:
from torchvision import models, datasets, transforms as T
# Load the pretrained model; it is used for the forward pass and the ONNX export in section 1.2
mobilenet_v2 = models.mobilenet_v2(pretrained=True)

# import patching model rather than mobilenet_v2
#    - do I need to simplify it?
#        * all I need is model._run_step(x) which outputs a tuple containing mu and sigma
#        * I am pretty sure it is the customVAE module that needs to be changed
#        * Will I need to create an entire new model module?
#    - if so, how do I do this?
#    - is there a way to load in the weights for the model and change the architecture
#             to only call _run_step(x) and output that?
#    - also, we only need the weights for the decoder. I am assuming if we could shrink the
#             model somehow, we would reduce the size of it on mobile too.


from vae5 import *    # I need to locally import this file and its dependencies

local_checkpoint_path = ""  # TODO: set this to the local checkpoint file before running this cell

patch_model = customVAE.load_from_checkpoint(checkpoint_path=local_checkpoint_path)

### 1.2 Export the model to ONNX
Use the PyTorch ONNX export API to export the model.

In [2]:
import torch
image_height = 224
image_width = 224
x = torch.randn(1, 3, image_height, image_width, requires_grad=True) # This should be used as a reference for the raw (preprocessed) input to the model
# That being said, how much of the preprocessing needs to be changed
# I am assuming that we will be using a batch size of 1
# as of now, our dataloader takes in file names and outputs an array that is used to point to individual images
# I am assuming that we will have to make this shape be the same as the shape of the output of __getitem__ in 
#    the dataloader which is a result of _img_to_tensor() of a 64x64 sized image patch
# Is the dataloader.__getitem__ used as the input for each item in a batch?





torch_out = mobilenet_v2(x)

# Export the model
torch.onnx.export(mobilenet_v2,              # model being run, we need to get the real one
                  x,                         # model input (or a tuple for multiple inputs)
                  "mobilenet_v2_float.onnx", # where to save the model (can be a file or file-like object)
                  export_params=True,        # store the trained parameter weights inside the model file
                  opset_version=12,          # the ONNX opset version to export the model to
                  do_constant_folding=True,  # whether to execute constant folding for optimization
                  input_names = ['input'],   # the model's input names
                  output_names = ['output']) # the model's output names


[W NNPACK.cpp:51] Could not initialize NNPACK! Reason: Unsupported hardware.


### 1.3 Sample Execution with ONNXRuntime

Run a sample with the full-precision ONNX model. First, implement the preprocessing.

In [3]:
from PIL import Image
import numpy as np
import onnxruntime
import torch

# I will likely need to perform the same transformations on each patch that are done in inference.py
# these transforms are done in the dataloader inside of the _img_to_tensor() function
# What exactly do these transformations do? What do I need to replicate on mobile?
# What exactly does transforms.Compose do?
#     - after looking at documentation, it looks like it is used to make transformations on images before
#             feeding them into the model

def preprocess_image(image_path, height, width, channels=3):
    # Force 3-channel RGB so grayscale or RGBA files do not break the transpose below
    image = Image.open(image_path).convert("RGB")
    # Image.ANTIALIAS was an alias of LANCZOS and is removed in Pillow 10
    image = image.resize((width, height), Image.LANCZOS)
    image_data = np.asarray(image).astype(np.float32)
    image_data = image_data.transpose([2, 0, 1]) # transpose to CHW
    # ImageNet per-channel mean and standard deviation (RGB order)
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    for channel in range(image_data.shape[0]):
        image_data[channel, :, :] = (image_data[channel, :, :] / 255 - mean[channel]) / std[channel]
    image_data = np.expand_dims(image_data, 0) # add the batch dimension: NCHW
    return image_data
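
To make the normalization step concrete, here is a small worked example on a single pixel. It is a sketch in plain Python rather than NumPy, and the helper name `normalize_pixel` is hypothetical; the mean and standard deviation are the ImageNet statistics that `preprocess_image` applies per channel.

```python
# ImageNet per-channel statistics in RGB order (the values preprocess_image applies).
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple((value / 255 - m) / s for value, m, s in zip(rgb, MEAN, STD))


# A saturated red channel, a mid-gray green channel, and an empty blue channel.
r, g, b = normalize_pixel((255, 128, 0))
print(round(r, 3), round(g, 3), round(b, 3))
```

Values above the channel mean come out positive and values below it come out negative, which is the input range the pretrained model saw during training.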

#### Download the ImageNet labels and load them

In [4]:
# Download ImageNet labels
!curl -o imagenet_classes.txt https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt

# Read the categories
with open("imagenet_classes.txt", "r") as f:
    categories = [s.strip() for s in f.readlines()]



#### Run the example with ONNXRuntime

In [5]:
session_fp32 = onnxruntime.InferenceSession("mobilenet_v2_float.onnx")

def softmax(x):
    """Compute softmax values for the set of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

def run_sample(session, image_file, categories):
    output = session.run([], {'input':preprocess_image(image_file, image_height, image_width)})[0]
    output = output.flatten()
    output = softmax(output) # this is optional
    top5_catid = np.argsort(-output)[:5]
    for catid in top5_catid:
        print(categories[catid], output[catid])

run_sample(session_fp32, 'cat.jpg', categories)

tabby 0.70533055
Egyptian cat 0.15252899
tiger cat 0.12229194
lynx 0.0043019513
plastic bag 0.0034168828
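
The `softmax` helper above subtracts the maximum score before exponentiating. The sketch below shows why that shift matters and how the top-k selection works, using plain Python lists instead of NumPy; the names `softmax_list` and `top_k` are hypothetical and only serve this illustration.

```python
import math


def softmax_list(scores):
    """Numerically stable softmax over a flat list of scores."""
    shift = max(scores)                       # largest exponent becomes exp(0) = 1
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, k):
    """Return (index, probability) pairs for the k largest probabilities."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in order[:k]]


# math.exp(1000.0) on its own raises OverflowError; the shifted version does not.
logits = [1000.0, 999.0, 990.0]
probs = softmax_list(logits)
print(top_k(probs, 2))
```

Subtracting a constant from every score leaves the softmax result unchanged, so the shift affects only numerical safety, not the ranking or the probabilities.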




# 2 Quantize the model with ONNXRuntime 
In this step, we load the full-precision model and quantize it with the ONNX Runtime quantization tool. We then compare the size of the full-precision and quantized models, and finally run the same sample with the quantized model.
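
For intuition about what 8-bit quantization does to a tensor, the sketch below implements a simple asymmetric (affine) uint8 mapping: a value range is turned into a `scale` and a `zero_point`, and floats are rounded to integers in [0, 255]. This is a simplified illustration under that assumption, not ONNX Runtime's actual implementation, and all function names are hypothetical. In static quantization, the value ranges for activations come from running calibration data through the model.

```python
def quant_params(rmin, rmax, qmin=0, qmax=255):
    """Derive scale and zero point for an asymmetric uint8 mapping."""
    # Widen the range to include 0.0 so that zero is represented exactly.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    scale = (rmax - rmin) / (qmax - qmin) or 1.0  # fall back to 1.0 for an all-zero range
    zero_point = int(round(qmin - rmin / scale))
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map a float to the nearest representable integer, clamped to [qmin, qmax]."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map a quantized integer back to an approximate float."""
    return (q - zero_point) * scale


# Assume calibration observed values between -1.0 and 3.0 for some tensor.
scale, zero_point = quant_params(-1.0, 3.0)
for x in (-1.0, 0.0, 0.5, 3.0):
    q = quantize(x, scale, zero_point)
    print(x, q, round(dequantize(q, scale, zero_point), 4))
```

The round trip is lossy: each value can move by up to half a `scale` step, which is why the quantized model's scores differ slightly from the full-precision ones.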

## 2.1 Implement a CalibrationDataReader
CalibrationDataReader takes in calibration data and generates input for the model

In [6]:
from onnxruntime.quantization import quantize_static, CalibrationDataReader, QuantType
import os

def preprocess_func(images_folder, height, width, size_limit=0):
    # Sort for a reproducible order; os.listdir returns names in arbitrary order
    image_names = sorted(os.listdir(images_folder))
    if size_limit > 0 and len(image_names) >= size_limit:
        batch_filenames = image_names[:size_limit]
    else:
        batch_filenames = image_names
    unconcatenated_batch_data = []

    for image_name in batch_filenames:
        image_filepath = os.path.join(images_folder, image_name)
        image_data = preprocess_image(image_filepath, height, width)
        unconcatenated_batch_data.append(image_data)
    # Stack N arrays of shape (1, C, H, W) into one array of shape (N, 1, C, H, W)
    batch_data = np.stack(unconcatenated_batch_data, axis=0)
    return batch_data


class MobilenetDataReader(CalibrationDataReader):
    def __init__(self, calibration_image_folder):
        self.image_folder = calibration_image_folder
        self.preprocess_flag = True
        self.enum_data_dicts = []
        self.datasize = 0

    def get_next(self):
        if self.preprocess_flag:
            self.preprocess_flag = False
            # preprocess_image returns NCHW arrays, so name the variables accordingly
            nchw_data_list = preprocess_func(self.image_folder, image_height, image_width, size_limit=0)
            self.datasize = len(nchw_data_list)
            self.enum_data_dicts = iter([{'input': nchw_data} for nchw_data in nchw_data_list])
        return next(self.enum_data_dicts, None)
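
The contract the reader has to satisfy is small: each call to `get_next()` returns one feed dictionary keyed by input name, and `None` once the data is exhausted. The sketch below mirrors that contract with a stand-in class that does not depend on ONNX Runtime or on image files; `ListDataReader` and `drain` are hypothetical names used only to show the lazy-iterator pattern.

```python
class ListDataReader:
    """Stand-in with the same get_next() contract as the reader above."""

    def __init__(self, samples, input_name="input"):
        self._samples = samples
        self._input_name = input_name
        self._iterator = None  # created lazily on the first get_next() call

    def get_next(self):
        if self._iterator is None:
            self._iterator = iter({self._input_name: s} for s in self._samples)
        return next(self._iterator, None)


def drain(reader):
    """Consume a reader the way a calibration loop would: stop at the first None."""
    feeds = []
    while True:
        feed = reader.get_next()
        if feed is None:
            break
        feeds.append(feed)
    return feeds


reader = ListDataReader(["sample_0", "sample_1", "sample_2"])
print(len(drain(reader)))   # number of feeds handed out
print(reader.get_next())    # stays None once the data is exhausted
```

Because the iterator is consumed as it is read, a reader instance can be used for one calibration pass only; create a new instance to calibrate again.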

## 2.2 Quantize the model

Because we cannot upload a full calibration data set for copyright reasons, we only demonstrate with a few example images. In practice, use your own calibration data set.

In [7]:
# change it to your real calibration data set
calibration_data_folder = "calibration_imagenet"
dr = MobilenetDataReader(calibration_data_folder)

quantize_static('mobilenet_v2_float.onnx',
                'mobilenet_v2_uint8.onnx',
                dr)

print('ONNX full precision model size (MB):', os.path.getsize("mobilenet_v2_float.onnx")/(1024*1024))
print('ONNX quantized model size (MB):', os.path.getsize("mobilenet_v2_uint8.onnx")/(1024*1024))

ONNX full precision model size (MB): 13.324143409729004
ONNX quantized model size (MB): 3.417329788208008
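
The two sizes can be turned into a compression ratio. Moving weights from 32-bit floats to 8-bit integers gives at most a 4x reduction, so a ratio slightly below 4 is what one would expect, presumably because quantization parameters and any tensors left in float add some overhead. The helper name `compression_summary` is hypothetical; the inputs are the sizes printed above.

```python
def compression_summary(float_mb, quant_mb):
    """Return (ratio, percent_saved) for two model sizes given in MB."""
    ratio = float_mb / quant_mb
    percent_saved = 100.0 * (1.0 - quant_mb / float_mb)
    return ratio, percent_saved


# Sizes reported by the quantization cell above.
ratio, saved = compression_summary(13.324143409729004, 3.417329788208008)
print(f"{ratio:.2f}x smaller, {saved:.1f}% of the file size saved")
```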


## 2.3 Run the model with OnnxRuntime

In [8]:
session_quant = onnxruntime.InferenceSession("mobilenet_v2_uint8.onnx")
run_sample(session_quant, 'cat.jpg', categories)

tabby 0.84019744
tiger cat 0.09355527
Egyptian cat 0.056807544
lynx 0.0031460503
quilt 0.0012816631
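
One quick way to judge the effect of quantization on this sample is to compare the two top-5 lists. The sketch below hard-codes the scores printed by the full-precision and quantized runs and checks whether the top prediction matches and which labels both lists share; `compare_top_k` is a hypothetical helper, and a single image is of course no substitute for evaluating accuracy on a validation set.

```python
# (label, probability) pairs copied from the two run_sample outputs.
fp32_top5 = [("tabby", 0.70533055), ("Egyptian cat", 0.15252899),
             ("tiger cat", 0.12229194), ("lynx", 0.0043019513),
             ("plastic bag", 0.0034168828)]
uint8_top5 = [("tabby", 0.84019744), ("tiger cat", 0.09355527),
              ("Egyptian cat", 0.056807544), ("lynx", 0.0031460503),
              ("quilt", 0.0012816631)]


def compare_top_k(reference, candidate):
    """Summarize agreement between two ranked (label, probability) lists."""
    ref_labels = [label for label, _ in reference]
    cand_labels = [label for label, _ in candidate]
    return {
        "top1_match": ref_labels[0] == cand_labels[0],
        "shared_labels": sorted(set(ref_labels) & set(cand_labels)),
    }


summary = compare_top_k(fp32_top5, uint8_top5)
print(summary["top1_match"], len(summary["shared_labels"]), "of 5 labels shared")
```

Here the top prediction is unchanged and four of the five labels appear in both lists; only the order of the second and third entries and the last entry differ.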



# 3 Convert the models to ORT format

This step is optional. We convert `mobilenet_v2_float.onnx` and `mobilenet_v2_uint8.onnx` to ORT format so that they can be used in mobile applications.

If you intend to run these models using ONNXRuntime Mobile Execution Providers such as [NNAPI Execution Provider](https://www.onnxruntime.ai/docs/reference/execution-providers/NNAPI-ExecutionProvider.html) or [CoreML Execution Provider](https://www.onnxruntime.ai/docs/reference/execution-providers/CoreML-ExecutionProvider.html), please set the `optimization_level` of the conversion to `basic`. If you intend to run these models using CPU only, please set the `optimization_level` of the conversion to `all`. 

For further details, please see [Converting ONNX models to ORT format](https://www.onnxruntime.ai/docs/how-to/mobile/model-conversion.html).
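
The choice of optimization level can be captured in a small helper that builds the conversion command for a given target. This is a sketch: the mapping simply restates the guidance above, the helper name and the target keys (`nnapi`, `coreml`, `cpu`) are hypothetical, and only the module name and the `--optimization_level` flag used in this notebook are assumed to exist.

```python
import sys

# Mapping restating the guidance above: mobile execution providers use "basic", CPU only uses "all".
OPTIMIZATION_LEVEL = {"nnapi": "basic", "coreml": "basic", "cpu": "all"}


def build_convert_command(model_dir, target="cpu"):
    """Build the argument list for converting all ONNX models in model_dir to ORT format."""
    level = OPTIMIZATION_LEVEL[target.lower()]
    return [sys.executable, "-m", "onnxruntime.tools.convert_onnx_models_to_ort",
            "--optimization_level", level, model_dir]


# Print the command without the interpreter path for readability.
print(" ".join(build_convert_command("./", target="nnapi")[1:]))
```

The returned list can be passed to `subprocess.run(..., check=True)` to run the conversion outside a notebook cell.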

In [15]:
!{sys.executable} -m onnxruntime.tools.convert_onnx_models_to_ort --optimization_level basic ./


The following converted models are in the same directory:
* mobilenet_v2_float.ort
* mobilenet_v2_uint8.ort

The above models are used in [ONNX Runtime Mobile image classification Android sample application](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/mobile/examples/image_classification/android).

Please note that the quantization process generates temporary ONNX model files, which are converted to ORT format as well. Please ignore these files.