## BigDL-Nano Keras Inference Example
---
This example shows the usage of BigDL-Nano Tensorflow Keras inference pipeline.

In [1]:
import os
from time import time

import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow import keras
from tensorflow.keras import layers
from bigdl.nano.tf.keras import Model

In [2]:
IMG_SIZE=224
BATCH_SIZE=64
DATASET_NAME="stanford_dogs"

### Loading data
---
Here we load data from tensorflow_datasets (hereafter TFDS). Stanford Dogs dataset is provided in TFDS as stanford_dogs. It features 20,580 images that belong to 120 classes of dog breeds (12,000 for training and 8,580 for testing).

In [3]:
(_, ds_test), ds_info=tfds.load(
    DATASET_NAME, data_dir="tensorflow_datasets/" ,split=["train", "train"], with_info=True, as_supervised=True
)
NUM_CLASSES = ds_info.features["label"].num_classes

size = (IMG_SIZE, IMG_SIZE)
ds_test = ds_test.map(lambda image, label: (tf.image.resize(image, size), label))

2022-06-08 17:47:33.383491: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


In [4]:
def input_preprocess(image, label):
    label = tf.one_hot(label, NUM_CLASSES)
    return image, label

ds_test = ds_test.map(input_preprocess)
TEST_STEPS = int(os.environ.get('TEST_STEPS', len(ds_test)))
ds_test = ds_test.take(TEST_STEPS).cache().prefetch(buffer_size=tf.data.AUTOTUNE).batch(BATCH_SIZE)

### Loading Model
---
Load the EfficientNetB0 Model using the H5 file saving using bigdl.nano.tf.keras.Model after single process training in the nano-keras-fit-example.

In [5]:
model = keras.models.load_model("EfficientNetB0.h5")

In [6]:
start = time()
model.predict(ds_test)
inference_time_model_basic = time() - start

tcmalloc: large alloc 1073741824 bytes == 0x55ee1ecca000 @  0x7ff3ddcd4d3f 0x7ff3ddd0b0c0 0x7ff3ddd0e082 0x7ff3ddd0e243 0x7ff3c009e402 0x7ff3b4438eb0 0x7ff3b44590b5 0x7ff3b445c9ea 0x7ff3b445cf69 0x7ff3b445d2d1 0x7ff3b4451ce3 0x7ff3afb17051 0x7ff3af97238d 0x7ff3af705087 0x7ff3af70591e 0x7ff3af705b1d 0x7ff3bb0f6bf5 0x7ff3afb18d7c 0x7ff3afaa2cec 0x7ff3b4e6576e 0x7ff3b4e621f3 0x7ff3b01f4313 0x7ff3ddc6a609 0x7ff3ddb8f133
tcmalloc: large alloc 2147483648 bytes == 0x55ee61c8c000 @  0x7ff3ddcd4d3f 0x7ff3ddd0b0c0 0x7ff3ddd0e082 0x7ff3ddd0e243 0x7ff3c009e402 0x7ff3b4438eb0 0x7ff3b44590b5 0x7ff3b445c9ea 0x7ff3b445cf69 0x7ff3b445d2d1 0x7ff3b4451ce3 0x7ff3afb17051 0x7ff3af97238d 0x7ff3af705087 0x7ff3af70591e 0x7ff3af705b1d 0x7ff3b56e5bfd 0x7ff3afb18d7c 0x7ff3afaa2cec 0x7ff3b4e6576e 0x7ff3b4e621f3 0x7ff3b01f4313 0x7ff3ddc6a609 0x7ff3ddb8f133
tcmalloc: large alloc 4294967296 bytes == 0x55eee5ef0000 @  0x7ff3ddcd4d3f 0x7ff3ddd0b0c0 0x7ff3ddd0e082 0x7ff3ddd0e243 0x7ff3c009e402 0x7ff3b4438eb0 0x7ff3b445

### Quantize Model
---
Use Model.quantize from bigdl.nano.tf.keras to calibrate a keras model for post-training quantization.<br>
Here are some parameters:
```
    :param calib_dataset:   A tf.data.Dataset object for calibration. Required for
                            static quantization. It's also used as validation dataloader.
    :param precision:       Global precision of quantized model,
                            supported type: 'int8', 'bf16', 'fp16', defaults to 'int8'.
    :param accelerator:     Use accelerator 'None', 'onnxruntime', 'openvino', defaults to None.
                            None means staying in tensorflow.
    :param metric:          A tensorflow.keras.metrics.Metric object for evaluation.
    :param accuracy_criterion:  Tolerable accuracy drop.
                                accuracy_criterion = {'relative': 0.1, 'higher_is_better': True}
                                allows relative accuracy loss: 1%. accuracy_criterion =
                                {'absolute': 0.99, 'higher_is_better':False} means accuracy
                                must be smaller than 0.99.
    :param tuning_strategy:    'bayesian', 'basic', 'mse', 'sigopt'. Default: 'bayesian'.
    
```
Access more details from [Source](https://github.com/intel-analytics/BigDL/blob/main/python/nano/src/bigdl/nano/tf/quantization.py#L23)

In [7]:
model_quantized = model.quantize(calib_dataset=ds_test,
                                 tuning_strategy='basic'
                                 )

2022-06-08 17:48:26 [INFO] Generate a fake evaluation function.
2022-06-08 17:48:26.817804: I tensorflow/core/grappler/devices.cc:75] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0 (Note: TensorFlow was not compiled with CUDA or ROCm support)
2022-06-08 17:48:26.817985: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-06-08 17:48:26.841506: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1149] Optimization results for grappler item: graph_to_optimize
  function_optimizer: function_optimizer did nothing. time = 0.007ms.
  function_optimizer: function_optimizer did nothing. time = 0.002ms.

2022-06-08 17:48:28.023431: I tensorflow/core/grappler/devices.cc:75] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0 (Note: TensorFlow was not compiled with CUDA or ROCm support)
2022-06-08 17:48:28.023599: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-06-08 17:48:28.181

Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`


Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
2022-06-08 17:48:30 [INFO] Pass StripUnusedNodesOptimizer elapsed time: 453.15 ms
2022-06-08 17:48:30 [INFO] Pass GraphCseOptimizer elapsed time: 116.53 ms
2022-06-08 17:48:31 [INFO] Pass FoldBatchNormNodesOptimizer elapsed time: 784.02 ms
2022-06-08 17:48:31 [INFO] Pass UpdateEnterOptimizer elapsed time: 23.74 ms
2022-06-08 17:48:31 [INFO] Pass ConvertLeakyReluOptimizer elapsed time: 27.11 ms
2022-06-08 17:48:31 [INFO] Pass ConvertAddToBiasAddOptimizer elapsed time: 26.93 ms
2022-06-08 17:48:31 [INFO] Pass FuseTransposeReshapeOptimizer elapsed time: 27.61 ms
2022-06-08 17:48:31 [INFO] Pass FuseConvWithMathOptimizer elapsed time: 27.29 ms
2022-06-08 17:48:31 [INFO] Pass ExpandDimsOptimizer elapsed time: 26.73 ms
2022-06-08 17:48:31 [INFO] Pass InjectDummyBiasAddOptimizer elapsed time: 27.31 ms
2022-06-08 17:48:31 [INFO] Pass MoveSqueezeAfterReluOptimizer elapsed time: 27.07 ms
2022-06-08 17:48:31 [INFO] Pass Pre

;EfficientNet/block5b_se_expand/Conv2D_eightbit_requant_range__print__;__requant_min:[-5.81148911]
;EfficientNet/block5b_se_expand/Conv2D_eightbit_requant_range__print__;__requant_max:[3.22957158]
;EfficientNet/block5b_project_conv/Conv2D_eightbit_min_EfficientNet/block5b_se_excite/mul__print__;__min:[-0.267710477]
;EfficientNet/block5b_project_conv/Conv2D_eightbit_max_EfficientNet/block5b_se_excite/mul__print__;__max:[10.466321]
;EfficientNet/block5b_project_conv/Conv2D_eightbit_requant_range__print__;__requant_min:[-17.544323]
;EfficientNet/block5b_project_conv/Conv2D_eightbit_requant_range__print__;__requant_max:[19.8302059]
;EfficientNet/block5c_expand_conv/Conv2D_eightbit_min_EfficientNet/block5b_add/add__print__;__min:[-36.190834]
;EfficientNet/block5c_expand_conv/Conv2D_eightbit_max_EfficientNet/block5b_add/add__print__;__max:[38.600174]
;EfficientNet/block5c_expand_conv/Conv2D_eightbit_requant_range__print__;__requant_min:[-10.4459524]
;EfficientNet/block5c_expand_conv/Conv2D_e

;EfficientNet/block2a_se_reduce/Conv2D_eightbit_min_EfficientNet/block2a_se_reshape/Reshape__print__;__min:[-0.255523384]
;EfficientNet/block2a_se_reduce/Conv2D_eightbit_max_EfficientNet/block2a_se_reshape/Reshape__print__;__max:[16.6014957]
;EfficientNet/block2a_se_reduce/Conv2D_eightbit_requant_range__print__;__requant_min:[-10.1076632]
;EfficientNet/block2a_se_reduce/Conv2D_eightbit_requant_range__print__;__requant_max:[-1.31568098]
;EfficientNet/block2a_se_expand/Conv2D_eightbit_min_EfficientNet/block2a_se_reduce/mul__print__;__min:[-0.2783162]
;EfficientNet/block2a_se_expand/Conv2D_eightbit_max_EfficientNet/block2a_se_reduce/mul__print__;__max:[-0.000412048597]
;EfficientNet/block2a_se_expand/Conv2D_eightbit_requant_range__print__;__requant_min:[-0.45660764]
;EfficientNet/block2a_se_expand/Conv2D_eightbit_requant_range__print__;__requant_max:[1.06961787]
;EfficientNet/block2a_project_conv/Conv2D_eightbit_min_EfficientNet/block2a_se_excite/mul__print__;__min:[-0.186194316]
;Efficie

2022-06-08 17:48:43 [INFO] Pass QuantizedRNNConverter elapsed time: 19.16 ms
2022-06-08 17:48:43 [INFO] Pass StripUnusedNodesOptimizer elapsed time: 39.66 ms
2022-06-08 17:48:43 [INFO] Pass RemoveTrainingNodesOptimizer elapsed time: 15.02 ms
2022-06-08 17:48:43 [INFO] Pass FoldBatchNormNodesOptimizer elapsed time: 13.8 ms
2022-06-08 17:48:43 [INFO] Pass MetaOpOptimizer elapsed time: 8.4 ms
2022-06-08 17:48:44 [INFO] Pass PostCseOptimizer elapsed time: 120.17 ms
2022-06-08 17:48:44 [INFO] |**********Mixed Precision Statistics*********|
2022-06-08 17:48:44 [INFO] +-----------------------+-------+------+------+
2022-06-08 17:48:44 [INFO] |        Op Type        | Total | INT8 | FP32 |
2022-06-08 17:48:44 [INFO] +-----------------------+-------+------+------+
2022-06-08 17:48:44 [INFO] |         Conv2D        |   65  |  65  |  0   |
2022-06-08 17:48:44 [INFO] | DepthwiseConv2dNative |   16  |  0   |  16  |
2022-06-08 17:48:44 [INFO] |         MatMul        |   1   |  1   |  0   |
2022-06-0

In [8]:
start = time()
for x, _ in ds_test.as_numpy_iterator():
    model_quantized(x)
infer_time_model_quantized = time() - start

In [9]:
template = """
|    Precision   | Inference Time(s) |
|      FP32      |       {:5.2f}       |
|      INT8      |       {:5.2f}       |
| Improvement(%) |       {:5.2f}       |
"""
summary = template.format(
    inference_time_model_basic,
    infer_time_model_quantized,
    (1 - infer_time_model_quantized /inference_time_model_basic) * 100
)
print(summary)


|    Precision   | Inference Time(s) |
|      FP32      |       42.63       |
|      INT8      |       40.95       |
| Improvement(%) |        3.94       |

