## ⚠️ **DEPRECATED**

This notebook is deprecated and may no longer be maintained.
Please use it with caution or refer to updated resources.


# Intel® Neural Compressor Sample for TensorFlow

## Introduction

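This demo shows how to use Intel® Neural Compressor to apply post-training quantization to a pretrained FP32 ResNet50 model in TensorFlow. It produces an INT8 model and then compares the performance of the two models.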
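Quantization replaces FP32 values with 8-bit integers plus a scale (and optionally a zero point). A value is stored as `q = round(x / scale) + zero_point` and recovered approximately as `(q - zero_point) * scale`. Below is a minimal, standard-library-only sketch of this generic affine scheme. It is an illustration, not Intel® Neural Compressor's implementation. The real TensorFlow INT8 kernels may use symmetric ranges or per-channel scales derived from calibration statistics. All function names here are hypothetical.

```python
# Illustrative affine (asymmetric) int8 quantization of a list of floats.
# Hypothetical helpers; not Intel Neural Compressor code.

def quant_params(values, qmin=-128, qmax=127):
    """Choose scale and zero point so [min(values), max(values)] maps onto [qmin, qmax]."""
    lo = min(min(values), 0.0)  # include 0.0 in the range so it is exactly representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for an all-zero input
    zero_point = round(qmin - lo / scale)
    return scale, zero_point

def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to clamped int8 codes."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]

def dequantize(qvalues, scale, zero_point):
    """Map int8 codes back to approximate floats."""
    return [(q - zero_point) * scale for q in qvalues]

weights = [-1.0, -0.25, 0.0, 0.5, 2.0]
scale, zp = quant_params(weights)
q = quantize(weights, scale, zp)
restored = dequantize(q, scale, zp)
print("scale:", scale, "zero point:", zp, "codes:", q)
# Rounding error per value is at most scale / 2 for values inside the calibrated range.
print("max abs error:", max(abs(a - b) for a, b in zip(weights, restored)))
```

The calibration step later in this notebook exists to pick ranges like `lo`/`hi` from real activations, so the rounding error stays small on typical inputs.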

## Prepare Environment

In [None]:
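import sys
# create a Python 3.10 environment
!conda install python==3.10 -y
# install the notebook's dependencies into the interpreter that runs this kernel
!{sys.executable} -m pip install -r requirements.txt

# download the pretrained FP32 ResNet50 frozen graph (-nc: skip if the file already exists)
!wget -nc https://storage.googleapis.com/intel-optimized-tensorflow/models/v1_6/resnet50_fp32_pretrained_model.pb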


In [None]:
print(sys.executable)
!{sys.executable} -m pip list
import tensorflow as tf
import numpy as np
import datasets


## Create Dataloader

In [None]:
# Log in to Hugging Face to download the imagenet-1k dataset.
# imagenet-1k is a gated dataset: accept its terms on the Hugging Face Hub first, then replace the
# placeholder token below with your own read-only token, created at https://huggingface.co/settings/tokens
from huggingface_hub.hf_api import HfFolder
HfFolder.save_token('hf_xxxxxxxxxxxxxxxxxxxxxx')


In [None]:
from datasets import load_dataset
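# streaming=True returns an IterableDataset: records are downloaded lazily while iterating
# instead of fetching the whole split up front; token=True uses the token saved above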
calib_dataset = load_dataset('imagenet-1k', split='train', streaming=True, token=True)
eval_dataset = load_dataset('imagenet-1k', split='validation', streaming=True, token=True)


In [None]:
# For this demo, use only a subset of each split: the first 1,000 samples
MAX_SAMPLE_LENGTH = 1000

def sample_data(dataset, max_sample_length):
    """Materialize the first `max_sample_length` records of a streaming dataset as an in-memory Dataset."""
    data = {"image": [], "label": []}
    for i, record in enumerate(dataset):
        if i >= max_sample_length:
            break
        data["image"].append(record['image'])
        data["label"].append(record['label'])
    return datasets.Dataset.from_dict(data)

sub_calib_dataset = sample_data(calib_dataset, MAX_SAMPLE_LENGTH)
sub_eval_dataset = sample_data(eval_dataset, MAX_SAMPLE_LENGTH)
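The loop in `sample_data` is a "take the first N records" operation on a stream. The standard library expresses the same idea with `itertools.islice`, which stops pulling from the source after N items, so a streaming dataset is only read as far as needed. The sketch below uses a hypothetical generator, `fake_stream`, as a stand-in for the streaming `IterableDataset`. It builds the same column dictionary that `sample_data` passes to `datasets.Dataset.from_dict`.

```python
import itertools

def take_first(records, n):
    """Return the first n records of a (possibly infinite or streaming) iterable."""
    # islice stops consuming the source after n items.
    return list(itertools.islice(records, n))

def fake_stream():
    """Hypothetical stand-in for a streaming dataset: yields dict records forever."""
    i = 0
    while True:
        yield {"image": f"img_{i}", "label": i % 1000}
        i += 1

subset = take_first(fake_stream(), 5)
# Column-oriented layout, as built by sample_data before Dataset.from_dict.
columns = {"image": [r["image"] for r in subset], "label": [r["label"] for r in subset]}
print(columns["label"])  # [0, 1, 2, 3, 4]
```

Taking the first N records is not a random sample. With the Hugging Face `datasets` library, `IterableDataset.shuffle(seed=..., buffer_size=...)` followed by `IterableDataset.take(n)` is one way to draw a more varied subset.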


In [None]:
import math
from neural_compressor.data.transforms.imagenet_transform import TensorflowResizeCropImagenetTransform

height = width = 224
transform = TensorflowResizeCropImagenetTransform(height, width)

class CustomDataloader:
    def __init__(self, dataset, batch_size=1):
        '''`dataset` is an iterable of {'image', 'label'} records that are loaded one by one at runtime.'''
        self.dataset = dataset
        self.batch_size = batch_size
        # upper bound on the number of batches (records with unexpected shapes are skipped)
        self.length = math.ceil(len(self.dataset) / self.batch_size)

    def __iter__(self):
        batch_inputs = []
        labels = []
        for record in self.dataset:
            # record e.g.: {'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=408x500 ...>, 'label': 91}
            img = record['image']
            label = record['label']
            # skip images that are not 3-channel (e.g. grayscale or RGBA)
            shape = np.array(img).shape
            if len(shape) != 3 or shape[-1] != 3:
                continue
            # the transform takes and returns an (image, label) pair; keep only the image
            img_resized = transform((img, label))[0]
            batch_inputs.append(np.array(img_resized))
            labels.append(label)
            # count only kept records, so every full batch has exactly batch_size samples
            if len(batch_inputs) == self.batch_size:
                yield np.array(batch_inputs), np.array(labels)   # (bs, 224, 224, 3), (bs,)
                batch_inputs = []
                labels = []
        # yield the last, possibly smaller, batch instead of dropping it
        if batch_inputs:
            yield np.array(batch_inputs), np.array(labels)

    def __len__(self):
        return self.length
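A batching loop that skips bad records should count only the records it keeps and should flush the final partial batch. Otherwise, batch sizes drift and trailing samples are silently dropped. The standard-library sketch below isolates that logic from the image handling. The `batched` helper and the toy records are hypothetical; the "wrong number of channels" flag mimics the shape check above.

```python
import math

def batched(records, batch_size, keep=lambda r: True):
    """Group records that pass `keep` into lists of at most batch_size items.

    Only kept records count toward a batch, and the final partial batch
    is yielded instead of dropped.
    """
    batch = []
    for record in records:
        if not keep(record):
            continue  # e.g. an image whose shape is not (H, W, 3)
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

# Toy data: 100 "images"; every 10th one has the wrong number of channels.
records = [{"channels": 1 if i % 10 == 0 else 3, "label": i} for i in range(100)]
batches = list(batched(records, 32, keep=lambda r: r["channels"] == 3))
print([len(b) for b in batches])     # [32, 32, 26]
print(math.ceil(len(records) / 32))  # 4: a ceil-based length over-counts once records are skipped
```

This is why `__len__` computed as `ceil(len(dataset) / batch_size)` should be read as an upper bound rather than an exact count.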


In [None]:
calib_dataloader = CustomDataloader(dataset=sub_calib_dataset, batch_size=32)
eval_dataloader = CustomDataloader(dataset=sub_eval_dataset, batch_size=32)


## Quantization

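Next comes the core quantization logic. `quantization.fit` is the main entry point for converting the FP32 base model into a quantized model. We pass it two things:

- the calibration dataloader, used to collect the activation statistics needed for INT8 conversion;
- an evaluation function, `eval_func`, which measures top-1 accuracy on the evaluation dataloader so the tuning process can check each quantized candidate.

After conversion, we save the resulting INT8 model locally.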
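The tuning in `quantization.fit` is accuracy-driven. It evaluates quantized candidates with `eval_func` and accepts one whose accuracy loss versus the FP32 baseline stays within a tolerance. As far as we know, Intel® Neural Compressor 2.x defaults to a relative 1% tolerance, configurable through `AccuracyCriterion` in `neural_compressor.config`. The function below is a hypothetical restatement of that acceptance check for intuition only, not the library's code.

```python
def meets_criterion(fp32_acc, int8_acc, tolerable_loss=0.01, criterion="relative"):
    """Hypothetical accuracy check where higher accuracy is better."""
    if criterion == "relative":
        loss = (fp32_acc - int8_acc) / fp32_acc  # fraction of baseline accuracy lost
    elif criterion == "absolute":
        loss = fp32_acc - int8_acc               # accuracy points lost
    else:
        raise ValueError(f"unknown criterion: {criterion!r}")
    return loss <= tolerable_loss

print(meets_criterion(0.75, 0.745))                        # True  (~0.67% relative loss)
print(meets_criterion(0.75, 0.74))                         # False (~1.33% relative loss)
print(meets_criterion(0.75, 0.742, criterion="absolute"))  # True  (0.008 absolute loss)
```

To loosen or tighten the tolerance in this notebook, something like `PostTrainingQuantConfig(..., accuracy_criterion=AccuracyCriterion(tolerable_loss=0.02))` should work in INC 2.x. Please confirm against the documentation for your installed version.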

In [None]:
import time
import numpy as np
from tqdm import tqdm
from neural_compressor import quantization
from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor.model import Model

# candidate calibration sample counts tried during tuning; bf16 is excluded so only int8/fp32 are used
conf = PostTrainingQuantConfig(calibration_sampling_size=[50, 100], excluded_precisions=['bf16'])

def eval_func(model):
    """Return top-1 accuracy of `model` on eval_dataloader and print the average per-sample latency."""
    model = Model(model)  # wrap the graph to access its session and input/output tensors
    total_cnt = 0
    total_hit = 0
    latency_list = []
    for batch_inputs, labels in tqdm(eval_dataloader):
        feed_dict = dict(zip(model.input_tensor, [batch_inputs]))
        start = time.time()
        preds = model.sess.run(model.output_tensor, feed_dict)
        end = time.time()
        latency_list.append(end - start)
        ans = np.argmax(preds[0], axis=-1)
        # label shift: the pretrained graph predicts 1001 classes with index 0 as background,
        # so ImageNet labels 0-999 are offset by 1
        labels += 1
        total_cnt += len(labels)
        total_hit += np.sum(ans == labels)
    acc = total_hit / total_cnt
    latency = np.array(latency_list).mean() / eval_dataloader.batch_size
    print(f"Accuracy: {acc:.4f}, average latency: {latency * 1000:.2f} ms/sample")
    return acc

q_model = quantization.fit("./resnet50_fp32_pretrained_model.pb", conf=conf, calib_dataloader=calib_dataloader, eval_func=eval_func)
q_model.save("resnet50_int8.pb")
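The accuracy arithmetic in `eval_func` can be checked without a model. Take the argmax of each logit row, shift the labels by 1 for the background class, and count matches. Below is a standard-library sketch on made-up toy logits; `summarize` is hypothetical. One deliberate difference from `eval_func`: it divides total time by the number of samples. That stays correct when the last batch is smaller than the nominal batch size.

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)

def summarize(batches, label_offset=1):
    """batches: (logits_rows, labels, seconds) tuples -> (top-1 accuracy, seconds per sample)."""
    total_hit = total_cnt = 0
    total_time = 0.0
    for logits, labels, seconds in batches:
        preds = [argmax(row) for row in logits]
        shifted = [label + label_offset for label in labels]  # background class at index 0
        total_hit += sum(p == t for p, t in zip(preds, shifted))
        total_cnt += len(labels)
        total_time += seconds
    return total_hit / total_cnt, total_time / total_cnt

# Toy data: 4 output classes (index 0 = background), labels in 0..2.
batches = [
    ([[0.1, 0.7, 0.1, 0.1], [0.0, 0.1, 0.8, 0.1]], [0, 1], 0.02),  # predicts 1, 2: both correct
    ([[0.1, 0.1, 0.1, 0.7]], [0], 0.01),                           # predicts 3, expected 1: wrong
]
acc, latency = summarize(batches)
print(acc, latency)
```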


## Benchmark

We now have two models in the current directory: the original FP32 model `resnet50_fp32_pretrained_model.pb` and the quantized INT8 model `resnet50_int8.pb`. Next, we compare their performance.


To avoid conflicts between the Jupyter notebook kernel and the benchmark process, we run the benchmarks in a separate script, `resnet_benchmark.py`, instead of inside the notebook.
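The `!... 2>&1|tee ...` cells in this section start the benchmark in a fresh Python process. One plausible reason this avoids conflicts is that the child does not share threads or TensorFlow state with the notebook kernel. The same thing can be done from plain Python with `subprocess`. Below is a minimal standard-library sketch: `run_and_tee` is a hypothetical helper, and the demo runs a trivial command rather than the real benchmark script.

```python
import os
import subprocess
import sys
import tempfile

def run_and_tee(cmd, log_path):
    """Run cmd in a child process, echo combined stdout/stderr, and save it to log_path.

    Roughly equivalent to the shell pipeline `cmd 2>&1 | tee log_path`.
    """
    with open(log_path, "w") as log, subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as proc:
        for line in proc.stdout:
            sys.stdout.write(line)  # show progress live
            log.write(line)         # and keep a copy on disk
    return proc.returncode          # set once the Popen context manager has waited

# Demo with a trivial child command; for the real benchmark you would pass
# [sys.executable, "resnet_benchmark.py", "--input_model", "resnet50_int8.pb"].
demo_log = os.path.join(tempfile.mkdtemp(), "demo.log")
exit_code = run_and_tee([sys.executable, "-c", "print('benchmark done')"], demo_log)
print("exit code:", exit_code)
```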

### FP32 benchmark

In [None]:
!{sys.executable} resnet_benchmark.py --input_model resnet50_fp32_pretrained_model.pb 2>&1|tee fp32_benchmark.log


### INT8 benchmark

In [None]:
!{sys.executable} resnet_benchmark.py --input_model resnet50_int8.pb 2>&1|tee int8_benchmark.log


Each log contains a benchmark summary similar to the following:

* fp32_benchmark.log

```
2023-08-28 22:46:39 [INFO] ********************************************
2023-08-28 22:46:39 [INFO] |****Multiple Instance Benchmark Summary*****|
2023-08-28 22:46:39 [INFO] +---------------------------------+----------+
2023-08-28 22:46:39 [INFO] |              Items              |  Result  |
2023-08-28 22:46:39 [INFO] +---------------------------------+----------+
2023-08-28 22:46:39 [INFO] | Latency average [second/sample] | 0.027209 |
2023-08-28 22:46:39 [INFO] | Throughput sum [samples/second] |  36.753  |
2023-08-28 22:46:39 [INFO] +---------------------------------+----------+
```

* int8_benchmark.log

```
2023-08-28 22:48:35 [INFO] ********************************************
2023-08-28 22:48:35 [INFO] |****Multiple Instance Benchmark Summary*****|
2023-08-28 22:48:35 [INFO] +---------------------------------+----------+
2023-08-28 22:48:35 [INFO] |              Items              |  Result  |
2023-08-28 22:48:35 [INFO] +---------------------------------+----------+
2023-08-28 22:48:35 [INFO] | Latency average [second/sample] | 0.006855 |
2023-08-28 22:48:35 [INFO] | Throughput sum [samples/second] | 145.874  |
2023-08-28 22:48:35 [INFO] +---------------------------------+----------+
```

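As shown in the logs, the INT8 model's throughput is about 145.87 / 36.75 ≈ 3.97x that of the FP32 model. Its average latency per sample is correspondingly about 3.97x lower (0.027209 / 0.006855). These numbers come from the machine used for this run and will vary with hardware.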
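The speedup can also be computed directly from the two log files instead of by hand. The standard-library sketch below assumes the summary-table layout shown in the logs above; the regular expressions and `parse_summary` are hypothetical and would need adjusting if the table format changes. The demo strings are the result rows copied from those logs.

```python
import re

# Patterns for the two result rows of the benchmark summary table.
LATENCY_RE = re.compile(r"Latency average \[second/sample\]\s*\|\s*([0-9.]+)")
THROUGHPUT_RE = re.compile(r"Throughput sum \[samples/second\]\s*\|\s*([0-9.]+)")

def parse_summary(log_text):
    """Return (latency in s/sample, throughput in samples/s) from a benchmark log."""
    latency = LATENCY_RE.search(log_text)
    throughput = THROUGHPUT_RE.search(log_text)
    if latency is None or throughput is None:
        raise ValueError("benchmark summary not found in log")
    return float(latency.group(1)), float(throughput.group(1))

# Rows copied from fp32_benchmark.log and int8_benchmark.log.
fp32_log = ("2023-08-28 22:46:39 [INFO] | Latency average [second/sample] | 0.027209 |\n"
            "2023-08-28 22:46:39 [INFO] | Throughput sum [samples/second] |  36.753  |\n")
int8_log = ("2023-08-28 22:48:35 [INFO] | Latency average [second/sample] | 0.006855 |\n"
            "2023-08-28 22:48:35 [INFO] | Throughput sum [samples/second] | 145.874  |\n")

fp32 = parse_summary(fp32_log)
int8 = parse_summary(int8_log)
speedup = int8[1] / fp32[1]        # throughput ratio
latency_ratio = fp32[0] / int8[0]  # how many times lower the INT8 latency is
print(f"{speedup:.2f}x throughput, {latency_ratio:.2f}x latency")  # 3.97x throughput, 3.97x latency
```

With the real files, read them first, e.g. `parse_summary(open("fp32_benchmark.log").read())`.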