Copyright (c) Microsoft Corporation. All rights reserved.  
Licensed under the MIT License.

# Mobilenet v2 Quantization with ONNX Runtime on CPU

In this tutorial, we will load a MobileNet V2 model pretrained with [PyTorch](https://pytorch.org/), export the model to ONNX, and then quantize it and run it with ONNX Runtime.

## 0. Prerequisites ##

If you have Jupyter Notebook, you can run this notebook directly with it. You may need to install or upgrade [PyTorch](https://pytorch.org/), [OnnxRuntime](https://microsoft.github.io/onnxruntime/), and other required packages.

Otherwise, you can set up a new environment. First, install [Anaconda](https://www.anaconda.com/distribution/). Then open an Anaconda prompt window and run the following commands:

```console
conda create -n cpu_env python=3.8
conda activate cpu_env
conda install jupyter
jupyter notebook
```
The last command launches Jupyter Notebook, and we can open this notebook in a browser to continue.

### 0.1 Install packages
Let's install the necessary packages to start the tutorial. We will install PyTorch 1.8, OnnxRuntime 1.7, and the latest ONNX and Pillow.

In [None]:
# Install or upgrade PyTorch 1.8.0 and OnnxRuntime 1.7 for CPU-only.
import sys
!{sys.executable} -m pip install --upgrade torch==1.8.0+cpu torchvision==0.9.0+cpu torchaudio===0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
!{sys.executable} -m pip install --upgrade onnxruntime==1.7.0
!{sys.executable} -m pip install --upgrade onnx
!{sys.executable} -m pip install --upgrade pillow
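
After the installs finish, it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch that uses only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_versions` and the package tuple are illustrative assumptions, not part of the original notebook.

```python
from importlib import metadata

# Packages this tutorial installs; edit the tuple to match your environment.
PACKAGES = ("torch", "torchvision", "onnxruntime", "onnx", "pillow")

def installed_versions(packages=PACKAGES):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

for name, version in installed_versions().items():
    print(f"{name:12s} {version or 'not installed'}")
```

If a package is reported as not installed, rerun the corresponding `pip install` line above before continuing.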

## 1. Download pretrained model and export to ONNX ##

### 1.1 Load the pretrained model

In [None]:
from torchvision import models, datasets, transforms as T
mobilenet_v2 = models.mobilenet_v2(pretrained=True)

### 1.2 Export the model to ONNX

In this step, we export the pretrained model to ONNX with `torch.onnx.export`. The exporter traces the model with a random dummy input of shape (1, 3, 224, 224), so the exported graph has a fixed batch size of 1 unless the commented-out `dynamic_axes` argument is enabled.

In [None]:
import torch
image_height = 224
image_width = 224
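mobilenet_v2.eval()  # inference mode, so the dummy forward pass does not update batch-norm statistics
x = torch.randn(1, 3, image_height, image_width, requires_grad=True)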
torch_out = mobilenet_v2(x)

# Export the model
torch.onnx.export(mobilenet_v2,              # model being run
                  x,                         # model input (or a tuple for multiple inputs)
                  "mobilenet_v2.onnx",       # where to save the model (can be a file or file-like object)
                  export_params=True,        # store the trained parameter weights inside the model file
                  opset_version=12,          # the ONNX opset version to export the model to
                  do_constant_folding=True,  # whether to execute constant folding for optimization
                  input_names = ['input'],   # the model's input names
                  output_names = ['output']) # the model's output names
                  #dynamic_axes={'input' : {0 : 'batch_size'},    # variable length axes
                  #              'output' : {0 : 'batch_size'}})

### 1.3 Sample Execution with ONNXRuntime

Run a sample with the full-precision ONNX model.
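
The `preprocess_image` function in this step scales each pixel to the range [0, 1] and then normalizes every channel with the ImageNet mean and standard deviation that torchvision's pretrained models expect. As a rough illustration of that arithmetic on a single pixel, here is a small pure-Python sketch; `normalize_pixel` is a hypothetical helper used only for this example.

```python
# ImageNet per-channel statistics (RGB order), the values that
# preprocess_image builds as `mean` and `std`.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb):
    """Map one 8-bit RGB pixel to the normalized values fed to the model."""
    return tuple((value / 255 - m) / s for value, m, s in zip(rgb, MEAN, STD))

print(normalize_pixel((0, 0, 0)))        # darkest pixel: all values negative
print(normalize_pixel((255, 255, 255)))  # brightest pixel: all values positive
```

The model input is therefore roughly centered around zero, which is why feeding raw 0 to 255 pixel values would give poor predictions.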

In [9]:
from PIL import Image
import numpy as np
import onnxruntime
import torch

def preprocess_image(image_path, height, width, channels=3):
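    image = Image.open(image_path).convert("RGB")  # ensure exactly 3 channels
    image = image.resize((width, height), Image.LANCZOS)  # Image.ANTIALIAS was removed in Pillow 10
    image_data = np.asarray(image).astype(np.float32)
    image_data = image_data.transpose([2, 0, 1]) # transpose to CHW
    mean = np.array([0.485, 0.456, 0.406])  # ImageNet per-channel mean (RGB)
    std = np.array([0.229, 0.224, 0.225])   # ImageNet per-channel std (RGB)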
    for channel in range(image_data.shape[0]):
        image_data[channel, :, :] = (image_data[channel, :, :] / 255 - mean[channel]) / std[channel]
    image_data = np.expand_dims(image_data, 0)
    return image_data

def softmax(x):
    """Compute softmax values for a set of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

def run_sample(session, image_file, categories):
    output = session.run([], {'input':preprocess_image(image_file, image_height, image_width)})[0]
    output = output.flatten()
    output = softmax(output) # this is optional
    top5_catid = np.argsort(-output)[:5]
    for catid in top5_catid:
        print(categories[catid], output[catid])

# Download ImageNet labels
!curl -o imagenet_classes.txt https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt

# Read the categories
with open("imagenet_classes.txt", "r") as f:
    categories = [s.strip() for s in f.readlines()]
        
session_fp32 = onnxruntime.InferenceSession("mobilenet_v2.onnx")

run_sample(session_fp32, 'cat.jpg', categories)

cat.jpg
tabby 0.72574764
Egyptian cat 0.13060525
tiger cat 0.1276467
lynx 0.0035989317
tiger 0.0032379038
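
The `softmax` helper above subtracts the maximum score before exponentiating. This does not change the result, but it keeps the exponentials from overflowing. Because softmax preserves the ordering of the scores, the top-5 classes are the same with or without it, which is why the call is marked optional. A minimal pure-Python sketch of both points, using hypothetical logits:

```python
import math

def stable_softmax(scores):
    """Softmax over a list of scores, shifted by the maximum to avoid overflow."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probabilities, k=5):
    """Return (index, probability) pairs for the k largest probabilities."""
    order = sorted(range(len(probabilities)), key=lambda i: -probabilities[i])
    return [(i, probabilities[i]) for i in order[:k]]

# Hypothetical logits; math.exp(1002.0) on its own would raise OverflowError.
logits = [1000.0, 1002.0, 1001.0]
probs = stable_softmax(logits)
print(top_k(probs, k=2))
```

Shifting every score by the same constant cancels out in the ratio, so `stable_softmax([1000.0, 1002.0, 1001.0])` and `stable_softmax([0.0, 2.0, 1.0])` return the same probabilities.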




## 2. Quantize the model with ONNXRuntime ##
In this step, we load the full-precision model and quantize it with the ONNXRuntime static quantization tool, using a folder of calibration images to determine activation ranges. We then compare the file sizes of the full-precision and quantized models and, finally, run the same sample with the quantized model.
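
To give an intuition for what 8-bit quantization does to a tensor, the sketch below shows a basic asymmetric linear mapping: a value range observed on calibration data is turned into a `scale` and `zero_point`, floats are rounded to integers in [0, 255], and dequantizing recovers an approximation. This is only an illustration under simplifying assumptions (one range per tensor, min/max calibration); the function names and the sample data are hypothetical, and the scheme ONNXRuntime actually applies may differ in its details.

```python
def quantization_params(rmin, rmax, qmin=0, qmax=255):
    """Derive (scale, zero_point) for an asymmetric 8-bit mapping."""
    # Widen the range to include 0.0 so that zero is represented exactly.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    scale = (rmax - rmin) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero range
    zero_point = round(qmin - rmin / scale)
    return scale, zero_point

def quantize(values, scale, zero_point, qmin=0, qmax=255):
    """Round each float to its nearest integer level, clamped to [qmin, qmax]."""
    return [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]

def dequantize(qvalues, scale, zero_point):
    """Map integer levels back to approximate float values."""
    return [(q - zero_point) * scale for q in qvalues]

# Hypothetical activation batches standing in for calibration data.
calibration_batches = [[-1.0, 0.25, 0.5], [0.0, 1.5, 3.0]]
rmin = min(min(batch) for batch in calibration_batches)
rmax = max(max(batch) for batch in calibration_batches)
scale, zero_point = quantization_params(rmin, rmax)

sample = [-1.0, 0.0, 0.7, 3.0]
restored = dequantize(quantize(sample, scale, zero_point), scale, zero_point)
print(scale, zero_point, restored)
```

In this example each restored value differs from the original by at most half a quantization step (`scale / 2`). Values outside the calibrated range would be clamped, which is why the calibration images should be representative of real inputs.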

In [17]:
from onnxruntime.quantization import quantize_static, CalibrationDataReader, QuantType
import os

def preprocess_func(images_folder, height, width, size_limit=0):
    image_names = os.listdir(images_folder)
    if size_limit > 0 and len(image_names) >= size_limit:
        batch_filenames = [image_names[i] for i in range(size_limit)]
    else:
        batch_filenames = image_names
    unconcatenated_batch_data = []

    for image_name in batch_filenames:
        image_filepath = images_folder + '/' + image_name
        image_data = preprocess_image(image_filepath, height, width)
        unconcatenated_batch_data.append(image_data)
    batch_data = np.stack(unconcatenated_batch_data, axis=0)  # shape (N, 1, 3, height, width)
    return batch_data


class MobilenetDataReader(CalibrationDataReader):
    def __init__(self, calibration_image_folder):
        self.image_folder = calibration_image_folder
        self.preprocess_flag = True
        self.enum_data_dicts = []
        self.datasize = 0

    def get_next(self):
        if self.preprocess_flag:
            self.preprocess_flag = False
            # preprocess_image returns NCHW tensors of shape (1, 3, height, width)
            nchw_data_list = preprocess_func(self.image_folder, image_height, image_width, size_limit=0)
            self.datasize = len(nchw_data_list)
            self.enum_data_dicts = iter([{'input': nchw_data} for nchw_data in nchw_data_list])
        return next(self.enum_data_dicts, None)

# change it to your real calibration data set
calibration_data_folder = "calibration_imagenet"
dr = MobilenetDataReader(calibration_data_folder)

quantize_static('mobilenet_v2.onnx',
                'mobilenet_v2.quant.onnx',
                dr)

print('ONNX full precision model size (MB):', os.path.getsize("mobilenet_v2.onnx")/(1024*1024))
print('ONNX quantized model size (MB):', os.path.getsize("mobilenet_v2.quant.onnx")/(1024*1024))

session_quant = onnxruntime.InferenceSession("mobilenet_v2.quant.onnx")
run_sample(session_quant, 'cat.jpg', categories)
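
The two `print` statements above report each model's size separately. Since quantization replaces 32-bit floating-point weights with 8-bit integers, the weights alone can shrink by up to about 4x. The file on disk usually shrinks somewhat less, because it also stores the graph structure, the quantization parameters, and any tensors that were left unquantized. The sketch below wraps the comparison in two hypothetical helpers, `size_mb` and `size_reduction`, and demonstrates them on dummy files rather than on the real models.

```python
import os
import tempfile

def size_mb(path):
    """File size in MB, using the same 1024*1024 divisor as above."""
    return os.path.getsize(path) / (1024 * 1024)

def size_reduction(original_path, quantized_path):
    """Return (original MB, quantized MB, original/quantized ratio)."""
    original, quantized = size_mb(original_path), size_mb(quantized_path)
    return original, quantized, original / quantized

# Demonstration with two dummy files standing in for the real models:
# 4 bytes per float32 weight versus 1 byte per int8 weight.
with tempfile.TemporaryDirectory() as tmp:
    fp32_path = os.path.join(tmp, "fp32.bin")
    int8_path = os.path.join(tmp, "int8.bin")
    with open(fp32_path, "wb") as f:
        f.write(bytes(4 * 1024 * 1024))
    with open(int8_path, "wb") as f:
        f.write(bytes(1 * 1024 * 1024))
    demo = size_reduction(fp32_path, int8_path)
    print("%.1f MB -> %.1f MB (%.1fx smaller)" % demo)
```

For the models in this tutorial, the call would be `size_reduction("mobilenet_v2.onnx", "mobilenet_v2.quant.onnx")`.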


