# Quantize InceptionV3 with Intel® Extension for TensorFlow* on Intel® Xeon®

## Introduction


This example shows an end-to-end pipeline:

1. Train an InceptionV3 model on a flower photo dataset using transfer learning.

2. Run calibration with Intel® Neural Compressor.

3. Quantize the model and accelerate inference with Intel® Extension for TensorFlow* on CPU.

This example runs on Intel® CPUs that support Intel® AVX-512 Vector Neural Network Instructions (VNNI) or Intel® Advanced Matrix Extensions (AMX). The performance gain is larger on CPUs with AMX.
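
To see which of these instruction sets a Linux machine reports, one option is to scan the `flags` lines of `/proc/cpuinfo`. The sketch below is a minimal helper, not part of the original example. It assumes the Linux flag names `avx512_vnni` and `amx_int8`, and the function name `detect_int8_isa` is made up for illustration.

```python
def detect_int8_isa(cpuinfo_text):
    """Report whether VNNI / AMX INT8 flags appear in /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        # On x86 Linux, each logical CPU has a line such as "flags : fpu vme ..."
        if line.startswith("flags") and ":" in line:
            flags.update(line.split(":", 1)[1].split())
    return {
        "vnni": "avx512_vnni" in flags,  # assumed Linux flag name for AVX-512 VNNI
        "amx": "amx_int8" in flags,      # assumed Linux flag name for AMX INT8
    }


try:
    with open("/proc/cpuinfo") as f:
        print(detect_int8_isa(f.read()))
except OSError:
    print("/proc/cpuinfo is not available on this platform")
```

If both values are `False`, the quantized model should still run, but without the INT8 acceleration this example targets.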

## Import Dependent Libraries

In [None]:
import os
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import neural_compressor as inc
print("neural_compressor version {}, need >=2.0".format(inc.__version__))

import tensorflow as tf
print("tensorflow {}, need ==2.15.1".format(tf.__version__))

import intel_extension_for_tensorflow as itex
print("intel_extension_for_tensorflow version {}, need >=1.1.0".format(itex.__version__))
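
The prints above only display the installed versions next to the requirement. If you prefer a hard check, a small helper along the following lines could be used. It is a sketch: `version_tuple` and `meets` are hypothetical names, and only the first three numeric components of a version string are compared.

```python
import re


def version_tuple(version):
    """Turn '2.15.1', '2.0' or '1.1.0rc1' into a 3-tuple of integers."""
    parts = [int(p) for p in re.findall(r"\d+", version)[:3]]
    return tuple(parts + [0] * (3 - len(parts)))  # pad '2.0' to (2, 0, 0)


def meets(installed, required, exact=False):
    """Return True if `installed` satisfies `required` (>= by default)."""
    if exact:
        return version_tuple(installed) == version_tuple(required)
    return version_tuple(installed) >= version_tuple(required)


# Possible usage in the notebook (the modules are imported in the cell above):
#   assert meets(inc.__version__, "2.0")
#   assert meets(tf.__version__, "2.15.1", exact=True)
#   assert meets(itex.__version__, "1.1.0")
```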

## Dataset

We use a dataset of several thousand flower photos. The dataset contains five sub-folders, one per class:

```
flower_photos/
  daisy/
  dandelion/
  roses/
  sunflowers/
  tulips/
```

1. Download the dataset from the internet and extract it.

In [None]:
!wget -r -nc -P ./ http://download.tensorflow.org/example_images/flower_photos.tgz -O flower_photos.tgz
!tar -zxvf flower_photos.tgz

2. Create the training and validation datasets.

In [None]:
WIDTH=224
HEIGHT=224
BATCH_SIZE=32

dataset_folder = './flower_photos/'

image_size = (WIDTH, HEIGHT)


def process(image, label):
    # scale pixel values from [0, 255] to [0, 1]
    image = tf.cast(image / 255.0, tf.float32)
    return image, label

train_dataset, val_dataset = tf.keras.preprocessing.image_dataset_from_directory(
    dataset_folder,
    validation_split=0.2,
    subset="both",
    seed=100,
    image_size=image_size,
    batch_size=BATCH_SIZE,
    label_mode= "categorical"
)

class_names = train_dataset.class_names
class_num = len(class_names)
print("Class Num={}".format(class_num))

train_dataset = train_dataset.map(process)
val_dataset = val_dataset.map(process)
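
Before training, it can be useful to confirm that every class folder actually contains images. The following standard-library sketch counts image files per sub-folder. The helper name `count_images_per_class` and the list of extensions are assumptions, not part of the original example.

```python
import os


def count_images_per_class(root, extensions=(".jpg", ".jpeg", ".png")):
    """Map each class sub-folder of `root` to its number of image files."""
    counts = {}
    for entry in sorted(os.listdir(root)):
        class_dir = os.path.join(root, entry)
        if not os.path.isdir(class_dir):
            continue  # ignore stray files at the top level
        counts[entry] = sum(
            1 for name in os.listdir(class_dir)
            if name.lower().endswith(extensions)
        )
    return counts


# Only runs when the dataset has already been downloaded and extracted.
if os.path.isdir("./flower_photos/"):
    for name, n in count_images_per_class("./flower_photos/").items():
        print("{}\t{}".format(name, n))
```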

## Transfer Learning

### Build Model

We download a pre-trained InceptionV3 FP32 model through the Keras API.

We freeze the pre-trained FP32 layers and add one GlobalAveragePooling2D layer and three Dense layers. The final tf.keras.layers.Dense layer has one output per class and uses the **softmax** activation function.

During training, only the added layers are updated. Because the pre-trained layers already act as a feature extractor, the model can be trained on the custom dataset in a short time.
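
To get a feel for how small the trainable part is, the parameters of the added head can be counted by hand. This is a back-of-the-envelope sketch. It assumes the pooled InceptionV3 feature vector has 2048 values and that the head is Dense(256), Dense(256), Dense(class_num) with five classes. The helper names are illustrative.

```python
def dense_params(inputs, units):
    """Weights plus biases of one fully connected layer."""
    return inputs * units + units


def head_params(feature_dim, hidden_units, class_num):
    """Total parameters of a stack of Dense layers ending in class_num outputs."""
    total = 0
    width = feature_dim
    for units in list(hidden_units) + [class_num]:
        total += dense_params(width, units)
        width = units
    return total


# GlobalAveragePooling2D has no parameters, so only the Dense layers count.
print(head_params(2048, [256, 256], 5))  # 591621
```

The `model.summary()` call in the next cell reports the trainable and non-trainable parameter counts of the real model, which you can compare against this estimate.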

In [None]:
def build_model(w, h, class_num):    
    base_model=tf.keras.applications.InceptionV3(weights='imagenet',include_top=False)
    x = base_model.output
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dense(256, activation='relu')(x)
    x = tf.keras.layers.Dense(256, activation='relu')(x)   
    predictions = tf.keras.layers.Dense(class_num, activation='softmax')(x)

    # this is the model we will train
    model = tf.keras.Model(inputs=base_model.input, outputs=predictions)

    # first: train only the top layers (which were randomly initialized)
    # i.e. freeze all convolutional InceptionV3 layers
    for layer in base_model.layers:
        layer.trainable = False

    # show the trainable flag of the last 10 layers
    for layer in model.layers[-10:]:
        print("{}\t{}".format(layer.trainable, layer.name))

    # compile the model (should be done *after* setting layers to non-trainable)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), metrics=['accuracy'], loss='categorical_crossentropy')
    model.summary()
    return model

model = build_model(WIDTH, HEIGHT, class_num)

### Training Model

Train the model for 2 epochs.

In [None]:
def train_model(model, epochs=1):
    hist = model.fit(train_dataset, epochs = epochs, validation_data = val_dataset)
    result = model.evaluate(val_dataset)
    
epochs=2
train_model(model, epochs)

### Save Model

In [None]:
def save_model(model, model_path):    
    model.save(model_path)
    print("Save model to {}".format(model_path))
    
model_fp32_path="model_keras.fp32"
save_model(model, model_fp32_path)

## Quantize Model by Intel® Neural Compressor

### YAML File

To support quantization by Intel® Extension for TensorFlow* with oneDNN Graph, set the **framework** to **tensorflow_itex**.

This is the only setting specific to Intel® Extension for TensorFlow*. The rest of the configuration is the same as in the legacy setup.

The mandatory items are framework, evaluation, accuracy_criterion and exit_policy.

The tuning target is an accuracy loss of less than **1%**. You can edit the file in Jupyter Notebook.
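
The accuracy target can be read as a simple inequality between the FP32 and INT8 accuracies. The sketch below assumes the 1% is a relative loss. Whether `inceptionv3.yaml` uses a relative or an absolute criterion depends on the file itself, and the function names here are illustrative.

```python
def relative_accuracy_loss(fp32_accuracy, int8_accuracy):
    """Fraction of the FP32 accuracy that is lost after quantization."""
    if fp32_accuracy <= 0:
        raise ValueError("fp32_accuracy must be positive")
    return (fp32_accuracy - int8_accuracy) / fp32_accuracy


def meets_target(fp32_accuracy, int8_accuracy, max_relative_loss=0.01):
    """True when the quantized model stays within the allowed loss."""
    return relative_accuracy_loss(fp32_accuracy, int8_accuracy) < max_relative_loss


print(meets_target(0.900, 0.895))  # True: the loss is about 0.56%
print(meets_target(0.900, 0.880))  # False: the loss is about 2.2%
```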

In [None]:
!cat inceptionv3.yaml

### Custom Dataset

The custom dataset class must provide two methods: `__len__()` and `__getitem__()`.

This example uses the metric function built into Intel® Neural Compressor, so the dataset must follow the format that the default metric function expects: each label is a class index rather than a categorical (one-hot) vector.
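
The difference between the two label formats, and the two-method dataset protocol, can be illustrated without TensorFlow. The toy class below is hypothetical. It stores `(image, one-hot label)` pairs and serves `(image, class index)` pairs through `__getitem__()` and `__len__()`.

```python
def one_hot_to_index(one_hot):
    """Return the position of the largest entry, i.e. the class index."""
    return max(range(len(one_hot)), key=lambda i: one_hot[i])


class IndexLabelDataset(object):
    """Toy dataset: converts one-hot labels to class indices once, up front."""

    def __init__(self, samples):
        self.samples = [(image, one_hot_to_index(label)) for image, label in samples]

    def __getitem__(self, index):
        # return (image, class index) by position
        return self.samples[index]

    def __len__(self):
        # return the dataset size as an integer
        return len(self.samples)


toy = IndexLabelDataset([("img0", [0, 0, 1, 0, 0]), ("img1", [1, 0, 0, 0, 0])])
print(len(toy), toy[0], toy[1])  # 2 ('img0', 2) ('img1', 0)
```

The notebook's own `Dataset` class below does not need this conversion, because it loads the data without `label_mode="categorical"` and therefore already receives integer labels.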

In [None]:
def process(image, label):
    # scale pixel values from [0, 255] to [0, 1]
    image = tf.cast(image / 255.0, tf.float32)
    return image, label

class Dataset(object):
    def __init__(self):
        # load the dataset into memory as a list: [(image, label), (image, label), ...]
        dataset_folder = 'flower_photos/'
        image_size = (224, 224)

        train_dataset, val_dataset = tf.keras.preprocessing.image_dataset_from_directory(
            dataset_folder,
            validation_split=0.2,
            subset="both",
            seed=100,
            image_size=image_size,
            batch_size=1
        )

        class_names = train_dataset.class_names
        class_num = len(class_names)

        self.train_dataset = list(train_dataset.map(process))
        self.train_dataset = [(tf.reshape(images, [224, 224, 3]), labels) for images, labels in self.train_dataset]
    
    
    def __getitem__(self, index):
        # return (image, label) by index       
        return self.train_dataset[index]

    def __len__(self):
        # return dataset size as integer
        return len(self.train_dataset)

### Quantize by Intel® Neural Compressor API

Create the dataloaders from the custom dataset defined above, then call the Intel® Neural Compressor API to quantize the FP32 model.

The execution time depends on the dataset size and the accuracy target.
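
As background on what calibration estimates, the sketch below shows symmetric INT8 quantization of a handful of values in plain Python. It is a simplified illustration, not the algorithm Intel® Neural Compressor uses internally. The function names and the max-abs calibration rule are assumptions.

```python
def calibrate_scale(samples, qmax=127):
    """Choose a scale so the largest observed magnitude maps to qmax."""
    max_abs = max(abs(x) for x in samples)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, qmax=127):
    """Round to the nearest integer step and clamp to [-qmax, qmax]."""
    return [max(-qmax, min(qmax, int(round(v / scale)))) for v in values]


def dequantize(q_values, scale):
    """Map integer codes back to approximate real values."""
    return [q * scale for q in q_values]


activations = [-2.0, 0.0, 0.6, 2.0]   # stand-in for values seen during calibration
scale = calibrate_scale(activations)
q = quantize(activations, scale)
print(q)  # [-127, 0, 38, 127]
print(dequantize(q, scale))
```

The rounding error per value is at most half a step (`scale / 2`). A more representative calibration set gives a better scale, which is one reason the run time depends on the dataset size.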

#### Execute

In [None]:
from neural_compressor.experimental import Quantization, common
from tensorflow.core.protobuf import rewriter_config_pb2


infer_config = tf.compat.v1.ConfigProto()
infer_config.graph_options.rewrite_options.constant_folding = rewriter_config_pb2.RewriterConfig.OFF
session = tf.compat.v1.Session(config=infer_config)
tf.compat.v1.keras.backend.set_session(session)

def auto_tune(input_graph_path, yaml_config, batch_size, int8_pb_file):
    quantizer = Quantization(yaml_config)
    dataset = Dataset()
    quantizer.calib_dataloader = common.DataLoader(dataset, batch_size=batch_size)
    quantizer.eval_dataloader = common.DataLoader(dataset, batch_size=batch_size)
    quantizer.model = common.Model(input_graph_path)
    q_model = quantizer.fit()

    return q_model


yaml_file = "inceptionv3.yaml"
batch_size = 32
model_fp32_path="model_keras.fp32"
int8_pb_file = "model_pb.int8"
q_model = auto_tune(model_fp32_path, yaml_file, batch_size, int8_pb_file)
q_model.save(int8_pb_file)

## Test the Performance & Accuracy

We use the same script to test the performance and accuracy of the FP32 and INT8 models.

Each test is pinned to 4 CPU cores.
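
The test script `profiling_inc.py` is not listed in this notebook. As a rough idea of what such a script typically measures, the sketch below times a callable after a few warm-up runs and derives latency and throughput. The `benchmark` helper and its parameters are hypothetical and may differ from what the script actually does.

```python
import time


def benchmark(run_once, batch_size, warmup=2, iterations=10):
    """Return (latency in ms per batch, throughput in samples per second)."""
    for _ in range(warmup):
        run_once()  # warm-up runs are not timed
    start = time.perf_counter()
    for _ in range(iterations):
        run_once()
    # guard against a zero reading from a very coarse timer
    elapsed = max(time.perf_counter() - start, 1e-9)
    latency_ms = elapsed / iterations * 1000.0
    throughput = batch_size * iterations / elapsed
    return latency_ms, throughput


# Stand-in workload; in practice run_once would execute one inference batch.
latency_ms, throughput = benchmark(lambda: sum(range(10000)), batch_size=32)
print("latency: {:.3f} ms, throughput: {:.1f} samples/s".format(latency_ms, throughput))
```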


### Execute the Tests

#### Test FP32 Model

In [None]:
%%time
!source env_itex/bin/activate && numactl -C 0-3 python profiling_inc.py --input-graph=./model_keras.fp32 --omp-num-threads=4 --num-inter-threads=1 --num-intra-threads=4 --index=32

#### Test INT8 Model

In [None]:
%%time
!source env_itex/bin/activate && numactl -C 0-3 python profiling_inc.py --input-graph=./model_pb.int8 --omp-num-threads=4 --num-inter-threads=1 --num-intra-threads=4 --index=8

### Compare the Result

In [None]:
!python compare_perf.py
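
The contents of `compare_perf.py` are not shown here. A comparison of this kind usually reduces to a few ratios, as in the sketch below. The dictionary keys and the `compare` function are assumptions, and the numbers are placeholders rather than measured results.

```python
def compare(fp32, int8):
    """Summarize INT8 results relative to FP32 results."""
    return {
        # values above 1.0 mean the INT8 model processes more samples per second
        "throughput_speedup": int8["throughput"] / fp32["throughput"],
        # values below 1.0 mean the INT8 model has lower latency
        "latency_ratio": int8["latency_ms"] / fp32["latency_ms"],
        # negative values mean accuracy dropped after quantization
        "accuracy_change": int8["accuracy"] - fp32["accuracy"],
    }


# Placeholder inputs for illustration only, not measurements.
placeholder_fp32 = {"throughput": 100.0, "latency_ms": 10.0, "accuracy": 0.90}
placeholder_int8 = {"throughput": 300.0, "latency_ms": 5.0, "accuracy": 0.89}
print(compare(placeholder_fp32, placeholder_int8))
```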