# Intel® Low Precision Optimization Tool (LPOT) Sample for TensorFlow

## Agenda
- Train a CNN Model Based on Keras
- Quantize Keras Model by LPOT
- Compare Quantized Model

### LPOT Release and Sample 

This sample code is kept up to date with the LPOT release in the latest oneAPI release.

To get the sample code for an older oneAPI release, check out the matching sample code release by git tag.

1. Check tags

```
git pull
git tag

2021.1-beta08
2021.1-beta09
2021.1-beta10

```

2. Check out an old code release
```
git checkout 2021.1-beta10
```

Import the Python packages and check their versions.

Make sure that TensorFlow is **2.2** or newer, LPOT is **1.0** or **1.1**, and matplotlib is installed.

Note: LPOT was previously named **ilit**. The following script also supports the old package name **ilit**.

In [None]:
import tensorflow as tf
print("Tensorflow version {}".format(tf.__version__))

try:
    import lpot
    print("LPOT version {}".format(lpot.__version__))
except ImportError:
    # Fall back to the old package name
    import ilit as lpot
    print("iLiT version {}".format(lpot.__version__))

import matplotlib.pyplot as plt
import numpy as np
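
The version requirement above can also be checked in code. The following is a minimal sketch: `version_tuple` and `meets_minimum` are hypothetical helper names (not part of TensorFlow or LPOT), and the parsing only looks at the leading numeric components of a version string such as `2.4.1`. Comparing tuples of integers avoids the pitfall of comparing strings, where `"2.10"` would sort before `"2.2"`.

```python
import re

def version_tuple(version, parts=2):
    """Return the first `parts` numeric components of a version string."""
    numbers = [int(n) for n in re.findall(r"\d+", version)[:parts]]
    numbers += [0] * (parts - len(numbers))  # pad short versions such as "2"
    return tuple(numbers)

def meets_minimum(version, minimum):
    """True if `version` is at least `minimum` (major.minor comparison)."""
    return version_tuple(version) >= version_tuple(minimum)

print(meets_minimum("2.10.0", "2.2"))  # True
print(meets_minimum("2.1.3", "2.2"))   # False
```

In the notebook, this could be called as `meets_minimum(tf.__version__, "2.2")`.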

## Train a CNN Model Based on Keras

The script '**alexnet.py**' provides the functions to train a CNN model.

### Dataset
Use the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset to recognize handwritten digits.
Load the dataset.

In [None]:
import alexnet
 
data = alexnet.read_data()
x_train, y_train, label_train, x_test, y_test, label_test = data
print('train', x_train.shape, y_train.shape, label_train.shape)
print('test', x_test.shape, y_test.shape, label_test.shape)


### Build Model

Build an AlexNet-like CNN model with the Keras API on TensorFlow.
Print the model structure with the Keras API `summary()`.

In [None]:
classes = 10
width = 28
channels = 1

model = alexnet.create_model(width, channels, classes)

model.summary()

### Train the Model with the Dataset

Set **epochs** to **3**.

In [None]:
epochs = 3

alexnet.train_mod(model, data, epochs)

### Freeze and Save Model to Single PB

Set the input node name to "**x**".

In [None]:
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

def save_frezon_pb(model, mod_path):
    # Convert Keras model to ConcreteFunction
    full_model = tf.function(lambda x: model(x))
    concrete_function = full_model.get_concrete_function(
        x=tf.TensorSpec(model.inputs[0].shape, model.inputs[0].dtype))

    # Get frozen ConcreteFunction
    frozen_model = convert_variables_to_constants_v2(concrete_function)

    # Generate frozen pb
    tf.io.write_graph(graph_or_graph_def=frozen_model.graph,
                      logdir=".",
                      name=mod_path,
                      as_text=False)
fp32_frezon_pb_file = "fp32_frezon.pb"
save_frezon_pb(model, fp32_frezon_pb_file)

In [None]:
!ls -la fp32_frezon.pb

## Quantize FP32 Model by LPOT

LPOT quantizes the model and uses a validation dataset for tuning.
It returns a frozen quantized model based on int8.

The Python script "**lpot_quantize_model.py**" calls LPOT to run the whole quantization job.
The following code samples explain that script.
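
To make "quantized model based on int8" more concrete, here is a small worked example of symmetric linear quantization. It is an illustration of the general idea only, not LPOT's calibration or tuning algorithm; `quantize_symmetric` and `dequantize` are hypothetical names, and the weight values are arbitrary.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers using one shared scale factor."""
    qmax = 2 ** (num_bits - 1) - 1               # 127 for int8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the representable range
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the integers."""
    return [q * scale for q in quantized]

weights = [-1.0, -0.5, 0.0, 0.25, 1.0]           # arbitrary example values
q, scale = quantize_symmetric(weights)
restored = dequantize(q, scale)
errors = [abs(w - r) for w, r in zip(weights, restored)]

print("int8 values:", q)
print("max rounding error:", max(errors))
```

Each restored value differs from the original by at most about half of `scale`, which is why a tuning step that checks accuracy on validation data is useful after quantization.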

### Define Dataloader

The class **Dataloader** provides an `__iter__` function that returns images and labels in batches of `batch_size`.
We use the MNIST test split (`x_test`, `label_test`) as the validation data.

In [None]:
import mnist_dataset
import math


class Dataloader(object):
  def __init__(self, batch_size):
    self.batch_size = batch_size


  def __iter__(self):
    x_train, y_train, label_train, x_test, y_test,label_test = mnist_dataset.read_data()
    batch_nums = math.ceil(len(x_test)/self.batch_size)

    for i in range(batch_nums-1):
        begin = i*self.batch_size
        end = (i+1)*self.batch_size
        yield x_test[begin: end], label_test[begin: end]

    begin = (batch_nums-1)*self.batch_size
    yield x_test[begin:], label_test[begin:]
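
The batching logic above can be tried on a small synthetic list without loading MNIST. This is a sketch under assumptions: `iter_batches` is a hypothetical standalone function, not part of the sample scripts. It relies on Python slicing being safe past the end of a sequence, so the last, smaller batch needs no special case; for non-empty input it yields the same batches as the class above.

```python
import math

def iter_batches(samples, labels, batch_size):
    """Yield (samples, labels) slices of at most `batch_size` items."""
    batch_nums = math.ceil(len(samples) / batch_size)
    for i in range(batch_nums):
        begin = i * batch_size
        end = begin + batch_size   # slicing past the end just stops early
        yield samples[begin:end], labels[begin:end]

samples = list(range(10))
labels = [s % 2 for s in samples]
sizes = [len(x) for x, _ in iter_batches(samples, labels, batch_size=4)]
print(sizes)  # [4, 4, 2]
```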

### Define Load FP32 Model
Load the FP32 model saved in the previous step.

It is defined as `alexnet.load_pb(in_model)`.

In [None]:
!cat alexnet.py

### Define Yaml File

We define alexnet.yaml to store the parameters that LPOT needs.
In this case, we only need to set the input and output names to match the FP32 model.

The input node name is '**x**'.

The output node name is '**Identity**'.

In [None]:
!cat alexnet.yaml

### Define Tuning Function
We follow the template to create the tuning function. The function returns a frozen quantized model (int8 model).

In [None]:

def auto_tune(input_graph_path, yaml_config, batch_size):    
    fp32_graph = alexnet.load_pb(input_graph_path)
    quan = lpot.Quantization(yaml_config)
    dataloader = Dataloader(batch_size)
    assert(dataloader)
    q_model = quan(
                        fp32_graph,
                        q_dataloader=dataloader,
                        eval_func=None,
                        eval_dataloader=dataloader)
    return q_model


def save_int8_frezon_pb(q_model, path):
    from tensorflow.python.platform import gfile
    # The context manager closes (and flushes) the file after writing
    with gfile.GFile(path, 'wb') as f:
        f.write(q_model.as_graph_def().SerializeToString())
    print("Save to {}".format(path))

yaml_file = "alexnet.yaml"
batch_size = 200
int8_pb_file = "alexnet_int8_model.pb"

### Call Function to Quantize the Model

Show the code in "**lpot_quantize_model.py**".

In [None]:
!cat  lpot_quantize_model.py

We execute "**lpot_quantize_model.py**" to run the whole process of quantizing a model.

In [None]:
# `!export` only affects a temporary subshell; `%env` sets the variable for the notebook process
%env TF_ENABLE_MKL_NATIVE_FORMAT=0

In [None]:
!TF_ENABLE_MKL_NATIVE_FORMAT=0 python lpot_quantize_model.py

We get the quantized model file "**alexnet_int8_model.pb**".

## Compare Quantized Model

The script **profiling_lpot.py** tests the performance of a PB model.

Performance data measured inside the Jupyter notebook process is not reliable, so we run the script as a separate process.

Let's look at **profiling_lpot.py**.

In [None]:
!cat profiling_lpot.py
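
As a rough illustration of how latency and throughput figures are commonly derived, the sketch below times a callable after a few warm-up runs. It is not the content of **profiling_lpot.py**: `measure` and `fake_infer` are hypothetical names, the stand-in workload does no real inference, and the definitions used here (seconds per batch, samples per second) are assumptions. Only the key names `latency` and `throughput` are taken from the JSON fields read later in this notebook.

```python
import json
import time

def measure(run_batch, batch_size, iterations=20, warmup=5):
    """Time `run_batch` and return latency (s/batch) and throughput (samples/s)."""
    for _ in range(warmup):                 # warm-up runs are not timed
        run_batch()
    start = time.perf_counter()
    for _ in range(iterations):
        run_batch()
    elapsed = time.perf_counter() - start
    return {
        "latency": elapsed / iterations,
        "throughput": batch_size * iterations / elapsed,
    }

def fake_infer():
    """Stand-in workload; a real script would run the model on one batch."""
    sum(i * i for i in range(10_000))

result = measure(fake_infer, batch_size=32)
print(json.dumps(result))
```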

Execute the **profiling_lpot.py** with FP32 model file:

In [None]:
!python profiling_lpot.py --input-graph=./fp32_frezon.pb --omp-num-threads=4 --num-inter-threads=1 --num-intra-threads=4 --index=32

Execute the **profiling_lpot.py** with int8 model file:

In [None]:
!python profiling_lpot.py --input-graph=./alexnet_int8_model.pb --omp-num-threads=4 --num-inter-threads=1 --num-intra-threads=4 --index=8

In [None]:
!cat 32.json
!echo " "
!cat 8.json

Run the following functions to load and show the performance data from 32.json and 8.json.

In [None]:
import json

def autolabel(ax, rects):
    """
    Attach a text label above each bar displaying its height
    """
    for rect in rects:
        height = rect.get_height()
        ax.text(rect.get_x() + rect.get_width()/2., 1.05*height,
                '%0.2f' % float(height),
                ha='center', va='bottom')

def draw_bar(x, t, y, subplot, color, x_lab, y_lab, width=0.2):
    plt.subplot(subplot)
    plt.xticks(x, t)
    ax1 = plt.gca()
    ax1.set_xlabel(x_lab)
    ax1.set_ylabel(y_lab, color=color)
    rects1 = ax1.bar(x, y, color=color, width=width)
    ax1.tick_params(axis='y', labelcolor=color)
    autolabel(ax1, rects1)

def load_res(json_file):
    with open(json_file) as f:
        data = json.load(f)
        return data

res_32 = load_res('32.json')
res_8 = load_res('8.json')
   
accuracys = [res_32['accuracy'], res_8['accuracy']]
throughputs = [res_32['throughput'], res_8['throughput']]             
latencys = [res_32['latency'], res_8['latency']]

print('throughputs', throughputs)
print('latencys', latencys)
print('accuracys', accuracys)

accuracys_perc = [accu*100 for accu in accuracys]

t = ['FP32', 'INT8']
x = [0, 1]
plt.figure(figsize=(16,6))
draw_bar(x, t, throughputs, 131, 'tab:green', 'Throughput (fps)', '', width=0.2)
draw_bar(x, t, latencys, 132, 'tab:blue', 'Latency (s)', '', width=0.2)
draw_bar(x, t, accuracys_perc, 133, '#28a99d', 'Accuracy (%)', '', width=0.2)
plt.show()

### FP32 vs INT8

Compare the INT8 results with the FP32 results, using FP32 as the baseline.

In [None]:
throughputs_times = [1, throughputs[1]/throughputs[0]]
latencys_times = [1, latencys[1]/latencys[0]]
accuracys_times = [0, accuracys_perc[1] - accuracys_perc[0]]

print('throughputs_times', throughputs_times)
print('latencys_times', latencys_times)
print('accuracys_times', accuracys_times)

plt.figure(figsize=(16,6))
draw_bar(x, t, throughputs_times, 131, 'tab:green', 'Throughput Comparison (higher is better)', '', width=0.2)
draw_bar(x, t, latencys_times, 132, 'tab:blue', 'Latency Comparison (lower is better)', '', width=0.2)
draw_bar(x, t, accuracys_times, 133, '#28a99d', 'Accuracy Change (%)', '', width=0.2)
plt.show()
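
The same relative metrics can be gathered in one helper that takes the two result dictionaries. This is a sketch: `compare_results` is a hypothetical name, and the numbers below are placeholders chosen for illustration, not measured results from this sample.

```python
def compare_results(fp32, int8):
    """Express INT8 results relative to the FP32 baseline."""
    return {
        # > 1 means INT8 processes more samples per second
        "throughput_ratio": int8["throughput"] / fp32["throughput"],
        # < 1 means INT8 is faster
        "latency_ratio": int8["latency"] / fp32["latency"],
        # negative means INT8 lost accuracy, in percentage points
        "accuracy_change_pct": (int8["accuracy"] - fp32["accuracy"]) * 100,
    }

# Placeholder values only; replace with the contents of 32.json and 8.json
fp32 = {"accuracy": 0.98, "throughput": 1000.0, "latency": 0.004}
int8 = {"accuracy": 0.97, "throughput": 2000.0, "latency": 0.002}

summary = compare_results(fp32, int8)
for name, value in summary.items():
    print("{}: {:.2f}".format(name, value))
```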

## Sample Running is Finished

In [None]:
print("[CODE_SAMPLE_COMPLETED_SUCCESFULLY]")

### Summary
Performance Improvement:

- Quantizing the model from FP32 to INT8.
- Intel® Deep Learning Boost speeds up INT8 inference if your CPU is a Second Generation Intel® Xeon® Scalable Processor, which supports it.