© 2020 Neuralmagic, Inc., Confidential // Neural Magic Evaluation License Agreement
# Post Training Quantization with an ONNX Model

This notebook walks step by step through post training quantization of an ONNX model using tools in the neuralmagicML package. You will:
- Set up the environment
- Download an example ONNX model
- Load a calibration dataset
- Use `neuralmagicML.onnx.quantization` to quantize the model and save it to a new ONNX file
- Validate the accuracy of your new quantized model

## Step 1 - Setting Up the Environment

This step checks your environment setup so that the rest of the notebook runs smoothly.
Before running the cell, install the neuralmagicML package by running the following command from the parent of the package directory:

`pip install neuralmagicML-python/`


In [None]:
notebook_name = "quantize_post_training_onnx"
print("checking setup for {}...".format(notebook_name))

# filter because of tensorboard future warnings
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

try:
    # make sure neuralmagicML is installed
    import neuralmagicML
except Exception as ex:
    # chain the original error so the root cause stays visible in the traceback
    raise Exception(
        "please install neuralmagicML using the setup.py file before continuing"
    ) from ex

# torch is required for loading the MNIST dataset
from neuralmagicML.utilsnb import check_pytorch_notebook_setup
check_pytorch_notebook_setup()

## Step 2 - Downloading a Sample ONNX Model
To run the quantization example, you need an ONNX model to quantize. The cell below downloads an ONNX-encoded MNIST model from the Neural Magic Model Repo. To quantize another model, point the code to your own ONNX file or to another ONNX model from the Neural Magic Model Repo.

In [None]:
import os
from neuralmagicML.utils import available_models, clean_path

save_dir = clean_path(os.path.join(".", notebook_name, "mnist"))

model = [model for model in available_models() if model.arch_display == 'MnistNet'][0]

model_path = model.download_onnx_file(save_dir=save_dir)
print("Model downloaded to {}".format(model_path))
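
If you point the notebook at your own ONNX file instead, it can help to validate the path before handing it to the later steps. The following is a minimal sketch using only the Python standard library; `resolve_onnx_path` is a hypothetical helper and not part of neuralmagicML.

```python
import os


def resolve_onnx_path(path):
    """Return an absolute path to an existing .onnx file, or raise an error.

    Hypothetical helper for illustration; not part of neuralmagicML.
    """
    full_path = os.path.abspath(os.path.expanduser(path))
    if not os.path.isfile(full_path):
        raise FileNotFoundError("no file found at {}".format(full_path))
    if not full_path.lower().endswith(".onnx"):
        raise ValueError("expected a .onnx file, got {}".format(full_path))
    return full_path
```

The returned value could then be assigned to `model_path` in place of the downloaded file.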

## Step 3 - Load a Calibration Dataset
To quantize your model, you need a calibration dataset that is representative of the model's inputs. The neuralmagicML post training quantization tool takes a `DataLoader` (`neuralmagicML.onnx.utils.data.DataLoader`) that is used to calibrate the model.

The cell below downloads the MNIST validation dataset from the PyTorch datasets repo to use as your calibration dataset and loads it into a `DataLoader` object.

In [None]:
from neuralmagicML.pytorch.datasets import MNISTDataset
from neuralmagicML.onnx.utils import DataLoader

mnist_dataset = MNISTDataset(root=save_dir, train=False)

# Format every data point into a dictionary of input name to array
input_dict = [{"input": img.numpy()} for (img, _) in mnist_dataset]
# Labels are not needed for calibration, so None is passed in their place
data_loader = DataLoader(input_dict, None, 1)
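
To illustrate why the calibration data should be representative, here is a minimal sketch of min/max calibration for 8-bit affine quantization, written in plain Python. It is a generic illustration of the idea and is not claimed to be how neuralmagicML calibrates a model; all function names are hypothetical.

```python
def calibrate_range(batches):
    """Track the smallest and largest value seen across all calibration batches."""
    lo, hi = float("inf"), float("-inf")
    for batch in batches:
        lo = min(lo, min(batch))
        hi = max(hi, max(batch))
    return lo, hi


def affine_params(lo, hi, num_bits=8):
    """Derive a scale and zero point that map [lo, hi] onto unsigned integers."""
    qmax = 2 ** num_bits - 1
    # Widen the range to include zero so that 0.0 maps exactly to an integer
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = int(round(-lo / scale))
    return scale, zero_point


def quantize(value, scale, zero_point, num_bits=8):
    """Map a float to its integer code, clamping to the representable range."""
    q = int(round(value / scale)) + zero_point
    return max(0, min(2 ** num_bits - 1, q))


def dequantize(q, scale, zero_point):
    """Map an integer code back to an approximate float value."""
    return (q - zero_point) * scale


# Toy calibration data standing in for recorded activation values
batches = [[0.0, 0.5, 1.0], [-1.0, 0.25, 2.0]]
lo, hi = calibrate_range(batches)
scale, zero_point = affine_params(lo, hi)
print("range=({}, {}), scale={:.5f}, zero_point={}".format(lo, hi, scale, zero_point))
```

In this sketch, values outside the calibrated range are clamped, so calibration data that misses typical inputs would lead to larger quantization error at inference time.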

## Step 4 - Quantize the Model
Run the code below to use the `quantize_model_post_training` function from `neuralmagicML.onnx.quantization` to save a quantized version of your model.

In [None]:
from neuralmagicML.onnx.quantization import quantize_model_post_training

quantized_model_path = os.path.join(save_dir, "model-quant.onnx")

print("Calibrating...")

quantize_model_post_training(
    model_path,
    data_loader,
    output_model_path=quantized_model_path,
    static=True
)

print("Quantized model saved to {}".format(quantized_model_path))
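
Quantized models typically store weights as 8-bit integers rather than 32-bit floats, so the saved file is usually smaller than the original. A quick way to check is to compare file sizes. The sketch below uses only the standard library; `file_size_mb` and `report_size_reduction` are hypothetical helper names, and the actual reduction depends on the model.

```python
import os


def file_size_mb(path):
    """Return the size of a file in mebibytes."""
    return os.path.getsize(path) / (1024 * 1024)


def report_size_reduction(original_path, quantized_path):
    """Print both file sizes and return the original-to-quantized size ratio."""
    original_mb = file_size_mb(original_path)
    quantized_mb = file_size_mb(quantized_path)
    # Guard against division by zero for an empty quantized file
    ratio = original_mb / quantized_mb if quantized_mb else float("inf")
    print(
        "original: {:.3f} MB, quantized: {:.3f} MB, ratio: {:.2f}x".format(
            original_mb, quantized_mb, ratio
        )
    )
    return ratio
```

In this notebook, the call would be `report_size_reduction(model_path, quantized_model_path)`.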

## Step 5 - Validate the Quantized Model
After quantizing your model, it is important to check its accuracy. Run the following code to test the quantized model against the MNIST validation dataset. The quantized model should still reach an accuracy above 99%.

In [None]:
import numpy as np
from tqdm.auto import tqdm
from neuralmagicML.onnx.utils import ORTModelRunner

print("Validating Quantized MNIST Model...")

onnx_inference_session = ORTModelRunner(quantized_model_path, batch_size=1)
correct_predictions = 0

# Rebuild data_loader, this time with labels for checking predictions
labels = [{"output": np.array(label)} for _, label in mnist_dataset]
data_loader = DataLoader(input_dict, labels, 1)
for batch, label in tqdm(data_loader):
    outputs, _ = onnx_inference_session.batch_forward(batch)
    prediction = np.argmax(outputs["output_0"])
    if prediction == label["output"]:
        correct_predictions += 1

print(
    "{} / {} samples correctly labeled".format(
        correct_predictions, len(mnist_dataset)
    )
)
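
To compare the result against the 99% figure, it helps to express it as a top-1 accuracy rather than a raw count. The following is a minimal, library-free sketch of that calculation; `argmax` and `top1_accuracy` are hypothetical helpers written for illustration.

```python
def argmax(values):
    """Return the index of the largest value in a sequence."""
    return max(range(len(values)), key=lambda i: values[i])


def top1_accuracy(logits_per_sample, labels):
    """Fraction of samples whose highest-scoring class matches the label."""
    if len(logits_per_sample) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    correct = sum(
        1 for logits, label in zip(logits_per_sample, labels)
        if argmax(logits) == label
    )
    return correct / len(labels)


# Toy example: three samples, two classes
toy_logits = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
toy_labels = [1, 0, 0]
print("accuracy: {:.1f}%".format(100.0 * top1_accuracy(toy_logits, toy_labels)))
```

With the variables from the validation cell, the equivalent percentage is `100.0 * correct_predictions / len(mnist_dataset)`.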
    

## Congratulations - You have completed the Post Training Quantization Notebook
### Next Step

Run your model (ONNX file) through the Neural Magic Inference Engine. The following is an example of code that you can run in your Python console. Be sure to enter your ONNX file path and batch size.

```
from neuralmagic import create_model
model = create_model(onnx_file_path='some/path/to/model.onnx', batch_size=1)
# input_batch must be defined beforehand with your own input data
out = model.forward(input_batch)
print(out)
```