# Achieve better inference performance by quantizing a pre-trained model from Model Zoo for Intel(R) Architecture with LPOT

This sample shows how to use the Intel(R) Low Precision Optimization Tool ([LPOT](https://github.com/intel/lpot)) to quantize a pre-trained model from Model Zoo for Intel(R) Architecture and achieve better inference performance.


## Prerequisites
### Import the required Python packages and check their versions

Make sure TensorFlow is version **2.3** or later. LPOT and matplotlib are also required.

In [None]:
import tensorflow as tf
print("Tensorflow version {}".format(tf.__version__))

import lpot
print("LPOT version {}".format(lpot.__version__))

import matplotlib.pyplot as plt
import numpy as np
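
The version requirement can also be checked in code rather than by reading the printed output. The following is a minimal sketch: `version_at_least` is a hypothetical helper, not part of TensorFlow or LPOT, and it compares only the leading numeric components of each version string.

```python
import re


def version_at_least(version, minimum):
    """Return True if `version` >= `minimum`, comparing numeric components only."""
    def parse(text):
        parts = []
        for piece in text.split("."):
            match = re.match(r"\d+", piece)
            if not match:
                break  # stop at the first non-numeric component
            parts.append(int(match.group()))
        return tuple(parts)

    a, b = parse(version), parse(minimum)
    # Pad the shorter tuple with zeros so "2.3" and "2.3.0" compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


print(version_at_least("2.3.0", "2.3"))
print(version_at_least("2.2.1", "2.3"))
```

In the notebook, this could be used as `assert version_at_least(tf.__version__, "2.3")`.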

### Download pre-trained Model from Model Zoo for Intel(R) Architecture

Download the pre-trained TensorFlow FP32 ResNet50 model.

In [None]:
!rm -rf resnet50_fp32_pretrained_model.pb
!wget https://storage.googleapis.com/intel-optimized-tensorflow/models/v1_8/resnet50_fp32_pretrained_model.pb

Check that the model file exists:

In [None]:
!ls -la resnet50_fp32_pretrained_model.pb
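
A download can fail silently and leave a missing or empty file. As a hedged sketch, the same check can be done in Python so that a problem is reported with a clear message; `check_model_file` is a hypothetical helper introduced only for this illustration.

```python
import os


def check_model_file(path):
    """Return the file size in bytes; raise if the file is missing or empty."""
    if not os.path.isfile(path):
        raise FileNotFoundError("Model file not found: {}".format(path))
    size = os.path.getsize(path)
    if size == 0:
        raise ValueError("Model file is empty: {}".format(path))
    return size


MODEL_PATH = "resnet50_fp32_pretrained_model.pb"
try:
    print("{}: {} bytes".format(MODEL_PATH, check_model_file(MODEL_PATH)))
except (FileNotFoundError, ValueError) as err:
    # Report the problem instead of stopping the notebook.
    print(err)
```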

### Prepare the ImageNet dataset

The pre-trained ResNet50 models in Model Zoo for Intel(R) Architecture are trained on [ImageNet](http://www.image-net.org/index). LPOT uses the same dataset to quantize the model and to measure the accuracy of the result.

Download and convert the ImageNet dataset by following these [instructions](https://github.com/IntelAI/models/blob/master/datasets/imagenet/README.md). After running the conversion script, you should have a directory containing the ImageNet dataset in TFRecord format, like:
```
  tf_records/
            train
            validation
```
In this sample, we use the validation dataset for calibration & evaluation with LPOT. 

We copy the folder **validation** to the local folder as **tf_2012_val**. 

Check if the folder exists:

In [None]:
!ls -la tf_2012_val

## Quantize FP32 Resnet50 Model by LPOT
The LPOT API uses stdout to collect information at runtime. To avoid problems with capturing stdout in a Jupyter notebook, we provide a separate Python script "**lpot_quantize_model.py**" that runs all of the LPOT quantization steps.

Please refer to [introduction](https://github.com/intel/lpot/blob/master/docs/introduction.md) for more details.


Quantizing the FP32 ResNet50 model takes the following steps:



### 1. Edit a YAML File

#### Create a YAML File

We copy [resnet50_v1.yaml](https://github.com/intel/lpot/blob/master/examples/tensorflow/image_recognition/resnet50_v1.yaml) to the local folder.

#### Update for Dataset

We use the pre-defined **Dataloader** in LPOT. Please set the ImageNet folder to **tf_2012_val** in a YAML file.

1. Calibration dataset

Set the calibration dataset folder in the ImageNet section:
```
quantization:
  ...
  Imagenet:
    root: tf_2012_val
  ...
  
```

2. Evaluation dataset

Set the evaluation dataset folder for accuracy and performance sections:

```
evaluation: 
  accuracy:
  ...
    Imagenet:
      root: tf_2012_val
  ...    
          

  performance:
  ...
    Imagenet:
      root: tf_2012_val
  ...    
          
```
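
Because the same folder has to be set in three places, it is easy to miss one. The sketch below updates every `root` entry found under an `Imagenet` key in an already loaded configuration. It works on a plain nested dictionary; loading and saving the YAML file (for example with PyYAML) is left out. The dictionary is a simplified stand-in for the elided parts of the file, not the real layout of resnet50_v1.yaml, and `set_dataset_root` is a hypothetical helper.

```python
def set_dataset_root(node, dataset_key, root):
    """Set node[...][dataset_key]['root'] = root everywhere; return the count."""
    updated = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key == dataset_key and isinstance(value, dict):
                value["root"] = root
                updated += 1
            else:
                updated += set_dataset_root(value, dataset_key, root)
    elif isinstance(node, list):
        for item in node:
            updated += set_dataset_root(item, dataset_key, root)
    return updated


# Simplified stand-in for the loaded YAML content (assumed structure).
config = {
    "quantization": {"Imagenet": {"root": "/path/to/calibration"}},
    "evaluation": {
        "accuracy": {"Imagenet": {"root": "/path/to/evaluation"}},
        "performance": {"Imagenet": {"root": "/path/to/evaluation"}},
    },
}

count = set_dataset_root(config, "Imagenet", "tf_2012_val")
print("updated entries:", count)
```

Checking that the returned count matches the number of sections you expected to change is a cheap way to catch a missed entry.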

#### Check YAML file

In [None]:
!cat resnet50_v1.yaml

### 2. Define a Tuning Function
The auto_tune function calls the LPOT Quantization APIs and generates a frozen quantized model (INT8 model).

Moreover, we define a function to save the model as a PB file.

Check the code in "**lpot_quantize_model.py**".

In [None]:
!cat lpot_quantize_model.py

### 3. Run the Script to Quantize the Model

Execute "**lpot_quantize_model.py**" to run the whole quantization process.

Note: this takes about 0.5 to 2 hours.

In [None]:
!python lpot_quantize_model.py

When the script finishes, it produces the quantized model file "**resnet50_int8_model.pb**".

## Compare Quantized Model

Model Zoo for Intel(R) Architecture provides an inference script [launch_benchmark.py](https://github.com/IntelAI/models/blob/master/benchmarks/launch_benchmark.py) to measure the throughput, latency and accuracy of the FP32 and INT8 models.

For accuracy, we use the dataset defined in the YAML file.

We provide a bash script **local_banchmark.sh** as a wrapper around **launch_benchmark.py** to benchmark the model.

Three parameters are required:

1. Dataset path: It must be the **relative path** of the dataset.

2. Model file

3. Precision: [fp32 | int8]

Three result files are generated:

1. [Precision]_throughput.txt
   
2. [Precision]_latency.txt
    
3. [Precision]_accuracy.txt

In [None]:
!cat local_banchmark.sh

### FP32 Model

Run local_banchmark.sh to measure the throughput, latency and accuracy of the FP32 model **resnet50_fp32_pretrained_model.pb**.

The first argument **tf_2012_val** is the relative path of the dataset. Change it to match the location of your own dataset.

In [None]:
!bash ./local_banchmark.sh tf_2012_val resnet50_fp32_pretrained_model.pb fp32 

### INT8 Model

Run local_banchmark.sh to measure the throughput, latency and accuracy of the INT8 model **resnet50_int8_model.pb**.

The results are saved in separate text files.

In [None]:
!bash ./local_banchmark.sh tf_2012_val resnet50_int8_model.pb int8 

### Convert to JSON format

We provide a script **format2json.py** to convert the result files into JSON format.

In [None]:
!python format2json.py
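
The analysis code in the next section reads `fp32.json` and `int8.json` and looks up the keys `throughput`, `latency` and `accuracy` in each. The sketch below writes and reloads a file with that shape. It does not reproduce how **format2json.py** parses the text files; `write_result_json` is a hypothetical helper, and the numbers are placeholders rather than measurements. Accuracy is stored as a fraction because the plotting code multiplies it by 100.

```python
import json
import os
import tempfile


def write_result_json(path, throughput, latency, accuracy):
    """Write one benchmark result using the keys the analysis code expects."""
    result = {
        "throughput": float(throughput),
        "latency": float(latency),
        "accuracy": float(accuracy),  # fraction in [0, 1], not a percentage
    }
    with open(path, "w") as f:
        json.dump(result, f, indent=2)
    return result


# Write to a temporary folder so real result files are not overwritten.
example_path = os.path.join(tempfile.mkdtemp(), "example.json")
write_result_json(example_path, throughput=100.0, latency=0.01, accuracy=0.75)

with open(example_path) as f:
    loaded = json.load(f)
print(loaded)
```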

### Analyze the performance data

In [None]:
import json

def autolabel(ax, rects):
    """
    Attach a text label above each bar displaying its height
    """
    for rect in rects:
        height = rect.get_height()
        ax.text(rect.get_x() + rect.get_width() / 2., 1.05 * height,
                '%0.5f' % float(height),
                ha='center', va='bottom')

def draw_bar(x, t, y, subplot, color, x_lab, y_lab, width=0.2):
    plt.subplot(subplot)
    plt.xticks(x, t)
    ax1 = plt.gca()
    ax1.set_xlabel(x_lab)
    ax1.set_ylabel(y_lab, color=color)
    rects1 = ax1.bar(x, y, color=color, width=width)
    ax1.tick_params(axis='y', labelcolor=color)
    autolabel(ax1, rects1)

def load_res(json_file):
    with open(json_file) as f:
        data = json.load(f)
        return data

res_32 = load_res('fp32.json')
res_8 = load_res('int8.json')
   
accuracys = [res_32['accuracy'], res_8['accuracy']]
throughputs = [res_32['throughput'], res_8['throughput']]             
latencys = [res_32['latency'], res_8['latency']]

print('throughputs', throughputs)
print('latencys', latencys)
print('accuracys', accuracys)

accuracys_perc = [accu*100 for accu in accuracys]

t = ['FP32', 'INT8']
x = [0, 1]
plt.figure(figsize=(16,6))
draw_bar(x, t, throughputs, 131, 'tab:green', 'Throughput (fps)', '', width=0.4)
draw_bar(x, t, latencys, 132, 'tab:blue', 'Latency (s)', '', width=0.4)
draw_bar(x, t, accuracys_perc, 133, '#28a99d', 'Accuracy (%)', '', width=0.4)
plt.show()

### Performance comparison between FP32 and INT8



In [None]:
throughputs_times = [1, throughputs[1]/throughputs[0]]
latencys_times = [1, latencys[1]/latencys[0]]
accuracys_times = [0, accuracys_perc[1] - accuracys_perc[0]]

print('throughputs_times', throughputs_times)
print('latencys_times', latencys_times)
print('accuracys_times', accuracys_times)

plt.figure(figsize=(16,6))
draw_bar(x, t, throughputs_times, 131, 'tab:green', 'Throughput Comparison (higher is better)', '', width=0.2)
draw_bar(x, t, latencys_times, 132, 'tab:blue', 'Latency Comparison (lower is better)', '', width=0.2)
draw_bar(x, t, accuracys_times, 133, '#28a99d', 'Accuracy Change (%), INT8 minus FP32', '', width=0.2)
plt.show()
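
The three ratios computed above can be gathered into one function, which makes the meaning of each number explicit. This is a minimal sketch: `compare_results` is a hypothetical helper, and the input values are made-up placeholders, not benchmark results.

```python
def compare_results(fp32, int8):
    """Summarize INT8 results relative to FP32 results."""
    return {
        # Above 1 means INT8 processes more images per second.
        "throughput_speedup": int8["throughput"] / fp32["throughput"],
        # Below 1 means INT8 answers faster.
        "latency_ratio": int8["latency"] / fp32["latency"],
        # Negative means INT8 lost accuracy, in percentage points.
        "accuracy_delta_pct": (int8["accuracy"] - fp32["accuracy"]) * 100,
    }


# Placeholder inputs for illustration only.
fp32_example = {"throughput": 100.0, "latency": 0.02, "accuracy": 0.75}
int8_example = {"throughput": 250.0, "latency": 0.01, "accuracy": 0.74}

summary = compare_results(fp32_example, int8_example)
for name, value in summary.items():
    print("{}: {:.3f}".format(name, value))
```

With the notebook's own data, the call would be `compare_results(res_32, res_8)`.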

In [None]:
print("[CODE_SAMPLE_COMPLETED_SUCCESFULLY]")