## Lab01_02: Using Intel® Neural Compressor to quantize a model

### Prerequisites: 
- onnx
- onnxruntime 
- onnxruntime-extensions
- neural_compressor 
- numpy 

### Objective:
- Learn the basics of the quantization pipeline (API, dataset, configuration file, quantization type)
- Evaluate the accuracy difference between dynamic quantization and accuracy-aware quantization
- Quantize ResNet50 exported from Lab01_01 to produce quantized Int8 models

In [1]:
import numpy as np
import onnx
from neural_compressor.experimental import Quantization, common
from neural_compressor import options

# Use ONNX Runtime's basic graph optimization level
options.onnxrt.graph_optimization.level = 'ENABLE_BASIC'

Load the previously generated Float32 ONNX model

In [2]:
model = onnx.load("../../models/resnet50.onnx")
print("Model loaded!")

Model loaded!


Here, we'll try two common quantization schemes:
- Dynamic Quantization, which does not require a dataset and therefore needs less code and configuration
- Accuracy Aware Quantization, which does require a dataset but typically results in better accuracy
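
One way to picture why the accuracy-aware scheme needs a dataset is as a search loop: candidate quantization settings are evaluated on labeled data, and the first one whose accuracy stays within a tolerance of the FP32 baseline is kept. The sketch below is a simplified illustration of that idea, not Neural Compressor's actual tuning strategy. The function name `accuracy_aware_search`, the 1% relative tolerance, the candidate names, and the toy accuracy values are all assumptions made up for the example.

```python
def accuracy_aware_search(candidates, evaluate, baseline_accuracy, max_relative_loss=0.01):
    """Return (candidate, accuracy) for the first candidate within the allowed loss.

    Returns (None, None) if no candidate meets the accuracy goal.
    """
    threshold = baseline_accuracy * (1.0 - max_relative_loss)
    for candidate in candidates:
        accuracy = evaluate(candidate)  # this step is what requires a labeled dataset
        if accuracy >= threshold:
            return candidate, accuracy
    return None, None


# Made-up accuracies standing in for real evaluation runs (not measurements).
toy_accuracy = {
    "all_int8": 0.741,
    "first_conv_fp32": 0.758,
    "all_fp32": 0.765,
}

choice, accuracy = accuracy_aware_search(
    candidates=list(toy_accuracy),
    evaluate=toy_accuracy.get,
    baseline_accuracy=0.765,
)
print(f"selected: {choice} (accuracy {accuracy})")
```

Dynamic quantization skips this loop entirely, which is why it needs less configuration but offers no accuracy guarantee.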

>Load each configuration `yaml` file and run the quantization function, then compare the two results.
> - Dynamic Quantization = **resnet50_v1_5_dynamicq.yaml**
> - Accuracy Aware Quantization = **resnet50_v1_5_qdq.yaml**
> - The dataset and label file are located in the resources/ folder
> - Edit **resnet50_v1_5_qdq.yaml** so that it references the file paths of both the dataset folder and the label file

> Dynamic Quantization does not require a sample dataset and label file for calibration, but we reference them here to evaluate how well the quantization scheme performed.

> The static quantization method requires a sample dataset and label file to calibrate the model.
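
The role of calibration can be illustrated with plain NumPy: a scheme that calibrates on a dataset fixes the quantization range ahead of time, while dynamic quantization derives the range from each tensor at inference time. The sketch below shows generic affine uint8 quantization under that assumption. It is not Neural Compressor's implementation, and the helper names (`affine_params`, `quantize_uint8`, `dequantize`, `round_trip_error`) and the random data are hypothetical.

```python
import numpy as np

QMIN, QMAX = 0, 255  # uint8 range


def affine_params(lo, hi):
    """Return (scale, zero_point) mapping the float range [lo, hi] onto [QMIN, QMAX]."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # keep 0.0 exactly representable
    scale = (hi - lo) / (QMAX - QMIN) or 1.0  # avoid a zero scale for an all-zero range
    zero_point = int(round(QMIN - lo / scale))
    return scale, zero_point


def quantize_uint8(x, scale, zero_point):
    q = np.round(x / scale) + zero_point
    return np.clip(q, QMIN, QMAX).astype(np.uint8)


def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale


def round_trip_error(x, scale, zero_point):
    """Largest absolute error after quantizing and dequantizing x."""
    restored = dequantize(quantize_uint8(x, scale, zero_point), scale, zero_point)
    return float(np.abs(restored - x).max())


rng = np.random.default_rng(0)
calibration = [rng.normal(size=1000).astype(np.float32) for _ in range(8)]
runtime_input = rng.normal(size=1000).astype(np.float32)

# Static: the range is fixed in advance from the calibration samples.
static_params = affine_params(
    min(float(batch.min()) for batch in calibration),
    max(float(batch.max()) for batch in calibration),
)

# Dynamic: the range is taken from the tensor seen at inference time.
dynamic_params = affine_params(float(runtime_input.min()), float(runtime_input.max()))

for name, (scale, zero_point) in [("static", static_params), ("dynamic", dynamic_params)]:
    error = round_trip_error(runtime_input, scale, zero_point)
    print(f"{name}: scale={scale:.5f} zero_point={zero_point} max_error={error:.5f}")
```

The static parameters are only as good as the calibration samples: values outside the calibrated range are clipped, which is why a representative dataset matters.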



In [3]:
quantize = Quantization("resnet50_v1_5_dynamicq.yaml") # Configuration .yaml file

>Quantize the model and save it to the models folder

>Use the [quantize() and save()](https://github.com/intel/neural-compressor/tree/master/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq#code-update) APIs

Refer to the slide deck for additional help.

In [4]:
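# Wrap the loaded ONNX model so Neural Compressor can consume it
quantize.model = common.Model(model)
# Run quantization as described by the YAML configuration
q_model = quantize()
# Save the quantized model under the models/ folder
q_model.save("../../models/resnet50_int8.onnx")
print("Int8 Quantized model saved!")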

2022-09-27 20:02:36 [INFO] Because both eval_dataloader_cfg and user-defined eval_func are None, automatically setting 'tuning.exit_policy.performance_only = True'.
2022-09-27 20:02:36 [INFO] Generate a fake evaluation function.
2022-09-27 20:02:37 [INFO] Get FP32 model baseline.
2022-09-27 20:02:37 [INFO] Save tuning history to C:\Users\Intel\Desktop\NEW\Intel-Innovation-22_CL010_Lab\Lab01\Lab01_02\nc_workspace\2022-09-27_20-02-30\./history.snapshot.
2022-09-27 20:02:37 [INFO] FP32 baseline is: [Accuracy: 1.0000, Duration (seconds): 0.0000]
2022-09-27 20:02:40 [INFO] |********Mixed Precision Statistics********|
2022-09-27 20:02:40 [INFO] +-------------------------+--------+-------+
2022-09-27 20:02:40 [INFO] |         Op Type         | Total  |  INT8 |
2022-09-27 20:02:40 [INFO] +-------------------------+--------+-------+
2022-09-27 20:02:40 [INFO] |          MatMul         |   1    |   1   |
2022-09-27 20:02:40 [INFO] |           Conv          |   53   |   53  |
2022-09-27 20:02:40 

Int8 Quantized model saved!
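
To cross-check the Mixed Precision Statistics table in the log, one option is to tally the operator types in the saved model and compare file sizes against the FP32 original. The following is a minimal sketch that assumes the two model paths used in the cells above. The helpers `count_op_types` and `size_mb` are hypothetical names introduced here, and the inspection only runs if both files exist.

```python
import os
from collections import Counter


def count_op_types(model):
    """Count graph nodes per operator type for an ONNX ModelProto-like object."""
    return Counter(node.op_type for node in model.graph.node)


def size_mb(path):
    """Return the file size in mebibytes."""
    return os.path.getsize(path) / (1024 * 1024)


if __name__ == "__main__":
    paths = {
        "FP32": "../../models/resnet50.onnx",
        "INT8": "../../models/resnet50_int8.onnx",
    }
    if all(os.path.exists(path) for path in paths.values()):
        import onnx  # imported lazily so the helpers work without onnx installed

        for label, path in paths.items():
            ops = count_op_types(onnx.load(path))
            print(f"{label}: {size_mb(path):.1f} MB")
            for op_type, count in ops.most_common():
                print(f"  {op_type}: {count}")
    else:
        print("Model files not found; run the cells above first.")
```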
