## ⚠️ **DEPRECATED**

This notebook is deprecated and may no longer be maintained.
Please use it with caution or refer to updated resources.


# Accelerate VGG19 Inference on Intel® Gen4 Xeon® Sapphire Rapids

## Introduction


This example walks through the whole pipeline:

1. Train a [VGG19](https://arxiv.org/abs/1409.1556) image classification model by transfer learning from a pre-trained model on [TensorFlow Hub](https://tfhub.dev).

2. Quantize the FP32 Keras model to an INT8 PB model using Intel® Neural Compressor.

3. Test and compare the performance of the FP32 and INT8 models.

This example runs on Intel® CPUs that support Intel® AVX-512 Vector Neural Network Instructions (VNNI) or Intel® Advanced Matrix Extensions (AMX). The performance improvement is larger on Intel® CPUs with AMX.
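On Linux, you can check whether your CPU exposes these instruction sets by reading the `flags` line of `/proc/cpuinfo`. The following is a minimal sketch, assuming a Linux host; `avx512_vnni`, `amx_tile`, `amx_bf16` and `amx_int8` are the flag names the Linux kernel reports, while the helper `cpu_isa_support` is our own illustration.

```python
from pathlib import Path

# Flag names as the Linux kernel reports them in /proc/cpuinfo
ISA_FLAGS = ("avx512_vnni", "amx_tile", "amx_bf16", "amx_int8")


def cpu_isa_support(cpuinfo_text):
    """Return {flag: bool} for ISA_FLAGS, based on the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # e.g. "flags\t\t: fpu vme ... avx512_vnni ... amx_tile ..."
            present = set(line.split(":", 1)[1].split())
            return {flag: flag in present for flag in ISA_FLAGS}
    # No 'flags' line (e.g. not an x86 Linux host)
    return {flag: False for flag in ISA_FLAGS}


cpuinfo = Path("/proc/cpuinfo")
if cpuinfo.exists():
    for flag, ok in cpu_isa_support(cpuinfo.read_text()).items():
        print(f"{flag:12s} {'yes' if ok else 'no'}")
else:
    print("/proc/cpuinfo not found: this check only works on Linux")
```

If `amx_tile` and `amx_int8` are reported, the CPU has AMX; `avx512_vnni` alone indicates VNNI support only.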

In [None]:
# Enable oneDNN optimizations in case they are not enabled; this must run before TensorFlow is imported
%env TF_ENABLE_ONEDNN_OPTS=1

## Import all the required libraries

In [None]:
%matplotlib inline

import os
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds

import neural_compressor as inc
print("neural_compressor version {}".format(inc.__version__))
print("tensorflow {}".format(tf.__version__))

from IPython import display

## Transfer Learning

### Dataset

We use the publicly available [ibean](https://github.com/AI-Lab-Makerere/ibean/) dataset, downloaded from the internet through TensorFlow Datasets (as `beans`). At about 170 MB, it is small enough to download quickly and to train on in a short time.

It contains images of bean leaves in 3 classes: 2 disease classes and 1 healthy class. The dataset is split into 3 parts: train, validation, and test.

Each record includes:
1. Image: shape (500, 500, 3), data type `uint8`
2. Label: class label (num_classes=3), data type `int64`


In [None]:
# number of classes in the beans dataset
class_num = 3

def load_raw_dataset():
    raw_datasets, raw_info = tfds.load(name = 'beans', with_info = True,
                                       as_supervised = True, 
                                       split = ['train', 'test'])
    return raw_datasets, raw_info

### Pre-Trained Model

We download a pre-trained VGG19 FP32 model from TensorFlow Hub.

The pre-trained model takes (32, 32, 3) images as input and produces a 10-dimensional output, one score for each of its 10 original classes.

We therefore need to resize the input images to (32, 32, 3).


In [None]:
# input image width and height expected by the pre-trained model
w = h = 32

### Build Model

We call `hub.KerasLayer()` to download the pre-trained model and wrap it as a Keras layer.

We freeze the pre-trained FP32 part (`trainable = False`) and add three `tf.keras.layers.Dense` layers and two `tf.keras.layers.Dropout` layers. The final `Dense` layer has one unit per class and uses the **softmax** activation.

During training, only the added layers are updated. Because the pre-trained layers already work as a feature extractor, the model can be trained on the custom dataset in a short time.
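To see how small the trainable part is, the sketch below counts the parameters of the added head by hand. It assumes, as stated above, that the frozen hub layer outputs a 10-dimensional vector and that `class_num = 3`. A `Dense(units)` layer with `n_in` inputs has `n_in * units` weights plus `units` biases, and `Dropout` has no parameters. The helper `dense_params` is illustrative only.

```python
def dense_params(n_in, units):
    # weight matrix (n_in x units) plus one bias per unit
    return n_in * units + units


# Assumed shapes: hub layer output = 10 features, head = 256 -> 256 -> 3
head = [
    dense_params(10, 256),   # Dense(256) on the 10 hub outputs
    dense_params(256, 256),  # Dense(256)
    dense_params(256, 3),    # Dense(class_num) with class_num = 3
]
trainable = sum(head)
print(head, trainable)  # [2816, 65792, 771] 69379
```

If the assumed hub output size is right, this total should match the "Trainable params" line printed by `model.summary()`, with the frozen hub weights counted under "Non-trainable params".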

In [None]:
def build_model(w, h, class_num):
    url = "https://www.kaggle.com/models/deepmind/ganeval-cifar10-convnet/frameworks/TensorFlow1/variations/ganeval-cifar10-convnet/versions/1"
    feature_extractor_layer = hub.KerasLayer(url, input_shape = (w, h, 3))
    feature_extractor_layer.trainable = False

    model = tf.keras.Sequential(
        [
            feature_extractor_layer,
            tf.keras.layers.Dense(256, activation = 'relu'),
            tf.keras.layers.Dropout(0.4),
            tf.keras.layers.Dense(256, activation = 'relu'),
            tf.keras.layers.Dropout(0.4),            
            tf.keras.layers.Dense(class_num, activation = 'softmax')
        ]
    )

    model.summary()

    model.compile(
        optimizer = tf.keras.optimizers.Adam(),
        # The last layer already applies softmax, so the model outputs probabilities, not logits
        loss = tf.keras.losses.CategoricalCrossentropy(from_logits = False),
        metrics = ['acc']
    )
    return model

model = build_model(w, h, class_num)
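The final `Dense` layer applies softmax, so the model outputs probabilities rather than logits. The plain-Python sketch below, with made-up probabilities, shows why that matters for the loss: treating probabilities as logits applies softmax a second time, which flattens the distribution and inflates the loss for a confident, correct prediction. With a softmax output layer, `CategoricalCrossentropy` should therefore use `from_logits=False`, which is its default. The helpers `softmax` and `categorical_crossentropy` are our own re-implementations for illustration, not the Keras code.

```python
import math


def softmax(xs):
    # subtract the max for numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot, probs):
    # -sum(y * log(p)) over classes; terms with y == 0 contribute nothing
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs) if y)


y_true = [0.0, 1.0, 0.0]
probs = [0.05, 0.90, 0.05]  # made-up output of a softmax layer

correct = categorical_crossentropy(y_true, probs)          # like from_logits=False
double = categorical_crossentropy(y_true, softmax(probs))  # like from_logits=True on probabilities
print(f"{correct:.4f} vs {double:.4f}")
```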

### Data Preprocessing

The pre-trained model's input shape is (32, 32, 3), so we must resize the dataset images to the same shape for transfer learning.

The raw images are `uint8` (values 0 to 255), so we convert them to FP32 and scale them to [0, 1].
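For illustration, here is what `preprocess` does to a single record, written with NumPy only. This is a sketch, not a drop-in replacement: it shrinks the image with nearest-neighbor sampling, whereas `tf.image.resize` defaults to bilinear interpolation, and a random array stands in for a real `beans` record. The function name `preprocess_np` is hypothetical.

```python
import numpy as np


def preprocess_np(image, label, size=32, num_classes=3):
    """Scale uint8 pixels to [0, 1] float32, resize to (size, size), one-hot the label."""
    image = image.astype(np.float32) / 255.0
    h, w = image.shape[:2]
    # Nearest-neighbor resize: pick evenly spaced source rows and columns
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = image[rows[:, None], cols[None, :]]
    one_hot = np.eye(num_classes, dtype=np.float32)[label]
    return resized, one_hot


# A random stand-in for one (500, 500, 3) uint8 record with class index 2
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(500, 500, 3)).astype(np.uint8)
img, y = preprocess_np(raw, 2)
print(img.shape, img.dtype, y)  # (32, 32, 3) float32 [0. 0. 1.]
```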

In [None]:
def preprocess(image, label):
    image = tf.cast(image, tf.float32)/255.0
    return tf.image.resize(image, [w, h]), tf.one_hot(label, class_num)

### Dataset Loader

In [None]:
def load_dataset(batch_size = 32):
    datasets, info = load_raw_dataset()
    return [dataset.map(preprocess).batch(batch_size) for dataset in datasets]

### Training Model

Train the model for 5 epochs.

In [None]:
def train_model(model, epochs=1):
    train_dataset, test_dataset = load_dataset()
    hist = model.fit(train_dataset, epochs = epochs, validation_data = test_dataset)
    # Final loss and accuracy on the test split
    result = model.evaluate(test_dataset)
    return hist, result

epochs=5
train_model(model, epochs)

### Save Model

In [None]:
def save_model(model, model_path):
    model.save(model_path)
    print("Saved model to {}".format(model_path))
    
model_fp32_path="model_keras.fp32"
save_model(model, model_fp32_path)

### Test Model on Single Image

In [None]:
%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np


def verify_single_image(model, test_dataset, info):
    # Take one raw (500x500 uint8) sample from the dataset passed in
    for image, label in test_dataset.take(1):
        image_fp32, _ = preprocess(image, label)
        # Add a batch dimension: (32, 32, 3) -> (1, 32, 32, 3)
        image_fp32 = np.expand_dims(image_fp32, axis = 0)
        pred = model(image_fp32)


        plt.figure()
        plt.imshow(image)
        plt.show()

        print("Actual Label : %s" %info.features['label'].names[label.numpy()])
        print("Predicted Label : %s" %info.features['label'].names[np.argmax(pred)])
        
datasets, info = load_raw_dataset()
verify_single_image(model, datasets[-1], info)

## Model Quantization using Intel® Neural Compressor (INC)

### Custom Dataset

The custom dataset class must provide two methods: `__len__()` and `__getitem__()`.

Here we use the built-in metric of Intel® Neural Compressor, so the dataset must follow the format that the default metric expects: each label is a class index rather than a categorical (one-hot encoded) vector.
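The required contract is just Python's sequence protocol. The sketch below uses a hypothetical `ToyDataset` filled with random data in place of the `beans` images, and a hand-written `iterate_batches` helper as a stand-in for what a dataloader does with such a dataset (the real `neural_compressor.data.DataLoader` handles this internally and may differ in detail). It also shows why labels stay as class indices: top-1 accuracy compares the `argmax` of each prediction directly with the integer label.

```python
import numpy as np


class ToyDataset:
    """Hypothetical dataset: (image, class_index) pairs via __len__/__getitem__."""

    def __init__(self, n=10, size=32, num_classes=3, seed=0):
        rng = np.random.default_rng(seed)
        self.images = rng.random((n, size, size, 3), dtype=np.float32)
        self.labels = rng.integers(0, num_classes, size=n)  # class indices, not one-hot

    def __getitem__(self, index):
        return self.images[index], self.labels[index]

    def __len__(self):
        return len(self.labels)


def iterate_batches(dataset, batch_size):
    # Stand-in for a dataloader: stack consecutive samples into batches
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        images, labels = zip(*(dataset[i] for i in range(start, stop)))
        yield np.stack(images), np.array(labels)


def top1_accuracy(pred_probs, labels):
    # A prediction is correct when its highest-scoring class equals the integer label
    return float(np.mean(np.argmax(pred_probs, axis=1) == labels))


ds = ToyDataset()
print([imgs.shape[0] for imgs, _ in iterate_batches(ds, batch_size=4)])  # [4, 4, 2]
```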

In [None]:
def preprocess_1(image, label):
    # Same as preprocess(), but keep the label as a class index for the built-in metric
    image = tf.cast(image, tf.float32)/255.0
    return tf.image.resize(image, [w, h]), label


class Dataset(object):
    def __init__(self):
        datasets, info = load_raw_dataset()
        # Preprocess the train split once and keep the samples in memory
        self.train_dataset = [preprocess_1(v, l) for v, l in datasets[0]]

    def __getitem__(self, index):
        return self.train_dataset[index]

    def __len__(self):
        return len(self.train_dataset)


### Quantization 

#### Quantization Plus BF16 on Sapphire Rapids (SPR) (Optional)

To try Quantization Plus BF16 on a **non-SPR** CPU, force-enable it by setting the environment variables below before running quantization.

The quantized model is accelerated when inference runs on SPR.

```
import os
os.environ["FORCE_BF16"] = "1"
os.environ["MIX_PRECISION_TEST"] = "1"
```
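BF16 keeps FP32's sign bit and 8-bit exponent but only the top 7 mantissa bits, so it covers the same numeric range as FP32 with less precision. A minimal sketch of the conversion, using plain truncation of the low 16 bits for simplicity (hardware and frameworks typically round to nearest even instead); `to_bf16` is our own helper, not an Intel® Neural Compressor API.

```python
import struct


def to_bf16(x):
    """Round-trip a Python float through FP32 -> BF16 (truncated) -> FP32."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))  # raw FP32 bit pattern
    bf16_bits = bits & 0xFFFF0000  # keep sign, exponent and top 7 mantissa bits
    (y,) = struct.unpack("<f", struct.pack("<I", bf16_bits))
    return y


print(to_bf16(3.14159))  # 3.140625
print(to_bf16(1e38))     # still finite: BF16 shares FP32's exponent range
```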

#### Quantization using Intel® Neural Compressor (INC) API

Create a dataloader from the custom dataset defined above, then call the Intel® Neural Compressor API to quantize the FP32 model.

The execution time depends on the size of the dataset and the accuracy target.
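Conceptually, post-training quantization uses the calibration data to choose a scale for each tensor and then maps FP32 values onto 8-bit integers. The NumPy sketch below shows one common scheme, symmetric per-tensor INT8 with a max-abs scale. It illustrates the idea only and is not necessarily the exact recipe Intel® Neural Compressor selects during tuning; the function names are our own.

```python
import numpy as np


def quantize_symmetric_int8(x):
    """Map FP32 values to INT8 using a single max-abs scale (symmetric, per tensor)."""
    scale = float(np.max(np.abs(x))) / 127.0
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any scale works
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    # Recover approximate FP32 values from the INT8 codes
    return q.astype(np.float32) * scale


x = np.array([-1.0, -0.4, 0.0, 0.25, 1.0], dtype=np.float32)
q, scale = quantize_symmetric_int8(x)
error = np.abs(dequantize(q, scale) - x).max()
print(q, scale, error)
```

Each value is stored in one byte instead of four, and the rounding error per element is at most half a quantization step (`scale / 2`), which is why a representative calibration set matters: outliers widen the range and coarsen the step.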

#### Run Quantization on a Local SPR Server

In [None]:
from neural_compressor.data import DataLoader
from neural_compressor.quantization import fit
from neural_compressor.config import PostTrainingQuantConfig, AccuracyCriterion
from neural_compressor import Metric


def auto_tune(input_graph_path, batch_size, int8_pb_file):
    dataset = Dataset()
    dataloader = DataLoader(framework='tensorflow', dataset=dataset, batch_size = batch_size)
    
    # Define the accuracy criterion: at most 1% relative accuracy loss
    config = PostTrainingQuantConfig(
        accuracy_criterion = AccuracyCriterion(
            higher_is_better=True,
            criterion='relative',
            tolerable_loss=0.01
        )
    )

    top1 = Metric(name="topk", k=1)
    
    q_model = fit(
        model=input_graph_path,
        conf=config,
        calib_dataloader=dataloader,
        eval_dataloader=dataloader,
        eval_metric=top1
        )

    return q_model



batch_size = 32
model_fp32_path="model_keras.fp32"
int8_pb_file = "model_pb.int8"
q_model = auto_tune(model_fp32_path, batch_size, int8_pb_file)
q_model.save(int8_pb_file)
print("Saved quantized model to {}".format(int8_pb_file))
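The `AccuracyCriterion` in the quantization cell uses `criterion='relative'` with `tolerable_loss=0.01`. Our reading of that setting, sketched here rather than taken from INC's code, is that a tuned INT8 model is accepted when its accuracy drops by at most 1% relative to the FP32 baseline. `meets_relative_criterion` is a hypothetical helper and the accuracies below are made-up inputs.

```python
def meets_relative_criterion(fp32_acc, int8_acc, tolerable_loss=0.01):
    """True if (fp32_acc - int8_acc) / fp32_acc <= tolerable_loss (higher is better)."""
    return (fp32_acc - int8_acc) / fp32_acc <= tolerable_loss


# Made-up accuracies: with an FP32 baseline of 0.80, the INT8 floor is 0.80 * (1 - 0.01) = 0.792
print(meets_relative_criterion(0.80, 0.795))  # True
print(meets_relative_criterion(0.80, 0.790))  # False
```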

## Test the Performance & Accuracy

We use the same script, `profiling_inc.py`, to test the performance and accuracy of both the FP32 and INT8 models.

The test process is pinned to 4 CPU cores (`numactl -C 0-3`).
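A throughput measurement of this kind usually times repeated inference calls after a few warm-up runs, so that one-off costs do not skew the result. The sketch below shows that pattern with the standard library only; `benchmark` and `speedup` are hypothetical helpers, not functions from `profiling_inc.py` or `compare_perf.py`, and `fake_infer` stands in for a real model call.

```python
import statistics
import time


def benchmark(infer, batch, warmup=5, runs=20):
    """Return (median latency in seconds, throughput in samples/s) for infer(batch)."""
    for _ in range(warmup):
        infer(batch)  # warm-up: let caches, thread pools and lazy initialization settle
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(batch)
        latencies.append(time.perf_counter() - start)
    median = statistics.median(latencies)
    return median, len(batch) / median


def speedup(fp32_throughput, int8_throughput):
    # > 1.0 means the INT8 model processes more samples per second
    return int8_throughput / fp32_throughput


def fake_infer(batch):
    # Stand-in for model(batch): a bit of CPU work per sample
    return [sum(i * i for i in range(1000)) for _ in batch]


latency, throughput = benchmark(fake_infer, batch=list(range(32)))
print(f"median latency {latency * 1e3:.2f} ms, throughput {throughput:.0f} samples/s")
```

Using the median rather than the mean keeps a single slow run (for example, an OS interrupt) from distorting the result.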


### Run profiling_inc.py for Inference Results on a Local SPR Server

**Note:** We recommend using the full path of the Python interpreter in your environment. Change `~/.conda/envs/env_inc/bin/python` in the commands below accordingly.

#### Test FP32 Model

In [None]:
%%time
!numactl -C 0-3 ~/.conda/envs/env_inc/bin/python profiling_inc.py --input-graph=./model_keras.fp32 --omp-num-threads=4 --num-inter-threads=1 --num-intra-threads=4 --index=32

#### Test INT8 Model

In [None]:
%%time
!numactl -C 0-3 ~/.conda/envs/env_inc/bin/python profiling_inc.py --input-graph=./model_pb.int8 --omp-num-threads=4 --num-inter-threads=1 --num-intra-threads=4 --index=8

### Compare the Results

In [None]:
!~/.conda/envs/env_inc/bin/python compare_perf.py

Show the results as charts.

In [None]:
from IPython.display import Image, display

listOfImageNames = ['fp32_int8_aboslute.png',
                    'fp32_int8_times.png']

for imageName in listOfImageNames:
    display(Image(filename=imageName))

# Citation

```
@ONLINE {beansdata,
    author="Makerere AI Lab",
    title="Bean disease dataset",
    month="January",
    year="2020",
    url="https://github.com/AI-Lab-Makerere/ibean/"
}
```

In [None]:
!which python 